What are Accumulators in Apache Spark?

Accumulators are shared variables that are only "added" to through an associative and commutative "add" operation. They act as containers for accumulating partial values across multiple tasks running on executors: each task adds to its own local copy, and Spark merges the partial results on the driver. Because updates are merge-only, accumulators can be used safely and efficiently in parallel and distributed Spark computations, and they are intended for distributed counters and sums (e.g. task metrics).
You can create built-in accumulators for longs, doubles, or collections (via SparkContext.longAccumulator, doubleAccumulator, and collectionAccumulator), or register a custom AccumulatorV2 subclass using the SparkContext.register methods. Accumulators can be created with or without a name, but only named accumulators are displayed in the web UI (under the Stages tab for a given stage).
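
As a minimal sketch (assuming a local SparkSession and made-up sample data), counting "error" lines with a built-in long accumulator looks like this:

import org.apache.spark.sql.SparkSession

object AccumulatorExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AccumulatorExample")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Built-in long accumulator; naming it makes it visible in the web UI.
    val errorCount = sc.longAccumulator("errorCount")

    val lines = sc.parallelize(Seq("ok", "error: disk full", "ok", "error: timeout"))

    // foreach is an action: Spark guarantees each task's update is applied exactly once.
    lines.foreach { line =>
      if (line.startsWith("error")) errorCount.add(1)
    }

    // Only the driver can read the merged value; tasks can only add to it.
    println(s"errors = ${errorCount.value}")  // errors = 2

    spark.stop()
  }
}

One caveat worth remembering: inside transformations such as map, accumulator updates may be applied more than once if a task or stage is re-executed, so updates that must be exact should happen in actions like foreach.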

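For a custom accumulator, a possible sketch (the SetAccumulator class here is invented for illustration) subclasses AccumulatorV2 and is registered with SparkContext.register:

import org.apache.spark.util.AccumulatorV2

// Hypothetical accumulator that collects distinct error messages into a Set.
class SetAccumulator extends AccumulatorV2[String, Set[String]] {
  private var set: Set[String] = Set.empty
  override def isZero: Boolean = set.isEmpty
  override def copy(): SetAccumulator = { val acc = new SetAccumulator; acc.set = set; acc }
  override def reset(): Unit = set = Set.empty
  override def add(v: String): Unit = set += v
  override def merge(other: AccumulatorV2[String, Set[String]]): Unit = set ++= other.value
  override def value: Set[String] = set
}

val distinctErrors = new SetAccumulator
sc.register(distinctErrors, "distinctErrors")  // named, so it shows up in the web UI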

References:

Jacek Laskowski, Mastering Apache Spark - Accumulators: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-accumulators.html
Apache Spark RDD Programming Guide (2.2.0) - Accumulators: https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#accumulators
