What are Accumulators in Apache Spark?
Accumulators are variables that are only "added" to through an associative and commutative operation, which lets Spark support them safely and efficiently in parallel and distributed computations. They act as containers for accumulating partial values across multiple tasks running on executors: each task adds to its own local copy, and Spark merges the partial results back on the driver, where the final value can be read. They are meant for distributed counters and sums (e.g. task metrics).
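As a minimal sketch of this counter pattern (the application name and sample data here are illustrative), a built-in long accumulator can count bad records across tasks:

```scala
import org.apache.spark.sql.SparkSession

object AccumulatorExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AccumulatorExample")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // A built-in long accumulator used as a distributed counter
    val badRecords = sc.longAccumulator("badRecords")

    sc.parallelize(Seq("1", "2", "oops", "4")).foreach { s =>
      // Each task adds to its local copy; Spark merges the partial
      // counts back on the driver
      if (scala.util.Try(s.toInt).isFailure) badRecords.add(1)
    }

    // Only the driver can reliably read the merged value
    println(s"Bad records: ${badRecords.value}")
    spark.stop()
  }
}
```

Note that only the driver should read an accumulator's value; tasks on executors should only add to it.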
Spark provides built-in accumulators for longs, doubles, and collections (via SparkContext.longAccumulator, doubleAccumulator, and collectionAccumulator), and you can register custom accumulators using the SparkContext.register methods. An accumulator can be created with or without a name, but only named accumulators are displayed in the web UI (under the Stages tab for a given stage), as shown in the sketch below.
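For the custom case, here is a rough sketch (the class name, application name, and sample data are my own for illustration): a custom accumulator extends AccumulatorV2 and is then registered, with a name, via SparkContext.register.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.AccumulatorV2
import scala.collection.mutable

// A custom accumulator that collects the distinct strings seen across tasks
class DistinctStringAccumulator extends AccumulatorV2[String, Set[String]] {
  private val items = mutable.Set.empty[String]

  override def isZero: Boolean = items.isEmpty
  override def copy(): DistinctStringAccumulator = {
    val acc = new DistinctStringAccumulator
    acc.items ++= items
    acc
  }
  override def reset(): Unit = items.clear()
  override def add(v: String): Unit = items += v
  override def merge(other: AccumulatorV2[String, Set[String]]): Unit =
    items ++= other.value
  override def value: Set[String] = items.toSet
}

object CustomAccumulatorExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CustomAccumulatorExample")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Registering with a name makes the accumulator visible in the
    // web UI under the Stages tab
    val errorCodes = new DistinctStringAccumulator
    sc.register(errorCodes, "errorCodes")

    sc.parallelize(Seq("E1", "E2", "E1")).foreach(errorCodes.add)
    println(errorCodes.value) // Set(E1, E2)

    spark.stop()
  }
}
```

The key design point is that merge must be associative and commutative, since Spark may combine the per-task copies in any order.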
References:
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-accumulators.html
https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#accumulators