What are Accumulators in Apache Spark?

Accumulators are shared variables that are only "added" to through an associative and commutative "add" operation. They act as containers for accumulating partial values across multiple tasks running on executors: each task adds to its own local copy, and Spark merges the partial results on the driver. Because updates are merge-only, accumulators can be used safely and efficiently in parallel and distributed Spark computations, and they are intended for distributed counters and sums (e.g. task metrics).
You can create built-in accumulators for longs, doubles, or collections (via SparkContext.longAccumulator, doubleAccumulator, and collectionAccumulator), or register a custom AccumulatorV2 subclass using the SparkContext.register methods. Accumulators can be created with or without a name, but only named accumulators are displayed in the web UI (under the Stages tab for a given stage).
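
As a minimal sketch (assuming a local SparkSession and made-up sample data), counting "error" lines with a built-in long accumulator looks like this:

import org.apache.spark.sql.SparkSession

object AccumulatorExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AccumulatorExample")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Built-in long accumulator; naming it makes it visible in the web UI.
    val errorCount = sc.longAccumulator("errorCount")

    val lines = sc.parallelize(Seq("ok", "error: disk full", "ok", "error: timeout"))

    // foreach is an action: Spark guarantees each task's update is applied exactly once.
    lines.foreach { line =>
      if (line.startsWith("error")) errorCount.add(1)
    }

    // Only the driver can read the merged value; tasks can only add to it.
    println(s"errors = ${errorCount.value}")  // errors = 2

    spark.stop()
  }
}

One caveat worth remembering: inside transformations such as map, accumulator updates may be applied more than once if a task or stage is re-executed, so updates that must be exact should happen in actions like foreach.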

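For a custom accumulator, a possible sketch (the SetAccumulator class here is invented for illustration) subclasses AccumulatorV2 and is registered with SparkContext.register:

import org.apache.spark.util.AccumulatorV2

// Hypothetical accumulator that collects distinct error messages into a Set.
class SetAccumulator extends AccumulatorV2[String, Set[String]] {
  private var set: Set[String] = Set.empty
  override def isZero: Boolean = set.isEmpty
  override def copy(): SetAccumulator = { val acc = new SetAccumulator; acc.set = set; acc }
  override def reset(): Unit = set = Set.empty
  override def add(v: String): Unit = set += v
  override def merge(other: AccumulatorV2[String, Set[String]]): Unit = set ++= other.value
  override def value: Set[String] = set
}

val distinctErrors = new SetAccumulator
sc.register(distinctErrors, "distinctErrors")  // named, so it shows up in the web UI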

References:

Jacek Laskowski, Mastering Apache Spark - Accumulators: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-accumulators.html
Apache Spark RDD Programming Guide (2.2.0) - Accumulators: https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#accumulators
