What is an Accumulator in Spark?
Accumulators are variables that are only “added” to through an associative operation and can therefore be efficiently supported in parallel. They can be used to implement counters (as in MapReduce) or sums. Spark natively supports accumulators of numeric types, and programmers can add support for new types. If accumulators are created with a name, they will be displayed in Spark’s UI. This can be useful for understanding the progress of running stages.
An accumulator is created from an initial value v by calling SparkContext.accumulator(v). Tasks running on the cluster can then add to it using the add method or the += operator (in Scala and Python). However, they cannot read its value; only the driver program can read the accumulator’s value, using its value method.

scala> val accum = sc.accumulator(0, "My Accumulator")
accum: spark.Accumulator[Int] = 0
scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
...
10/09/29 18:41:08 INFO SparkContext: Tasks finished in 0.317106 s
scala> accum.value
res2: Int = 10
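To see why the “only added to through an associative operation” requirement matters, here is a minimal sketch in plain Python (no Spark required) of what happens conceptually under the hood: each task builds a partial result for its own partition starting from the zero value, and the driver merges the partials. The two-partition split and the helper names here are illustrative, not Spark internals.

```python
# Minimal sketch: per-partition partial sums merged at the driver.
from functools import reduce

data = [1, 2, 3, 4]
partitions = [data[:2], data[2:]]  # pretend Spark split the RDD into two partitions

# Each task "adds" into its own local copy, starting from the zero value 0.
partials = [reduce(lambda acc, x: acc + x, part, 0) for part in partitions]

# The driver merges the per-task partials; because + is associative,
# the grouping (and hence the partitioning) does not change the result.
total = reduce(lambda a, b: a + b, partials, 0)
print(total)  # 10, matching accum.value in the shell session above
```

Because the merge order is not guaranteed, any custom accumulator type you add must supply an associative (and, in practice, commutative) add operation, or different partitionings would yield different results.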