In Summary: 1) Apache Spark is written in Scala and because of its scalability on JVM 2) Scala programming retains a perfect balance between productivity and performance. 3) Organizations want to enjoy the expressive power of dynamic programming language without having to lose type safety 4) Scala is designed with parallelism and concurrency in mind for big data applications 5) Scala collaborates well within the MapReduce big data model because of its functional paradigm 6)Scala programming language provides the best path for building scalable big data applications in terms of data size and program complexity 7) Scala programming is comparatively less complex unlike Java. 8) Scala has well-designed libraries for scientific computing, linear algebra and random number generation. 9) Efficiency and speed play a vital role regardless of increasing processor speeds. 10) Other programming languages like Python or Java have lag in the API coverage. 1...
Accumulators are variables that are only “added” to through an associative operation and can therefore be efficiently supported in parallel. They can be used to implement counters (as in MapReduce) or sums. Spark naively supports accumulators of numeric types, and programmers can add support for new types. If accumulators are created with a name, they will be displayed in Spark’s UI. This can be useful for understanding the progress of running stages. An accumulator is created from an initial value v by calling SparkContext.accumulator(v) . Tasks running on the cluster can then add to it using the add method or the += operator (in Scala and Python). However, they cannot read its value. Only the driver program can read the accumulator’s value, using its value method. scala > val accum = sc . accumulator ( 0 , "My Accumulator" ) accum : spark.Accumulator [ Int ] = 0 scala > sc . parallelize ( Array ( 1 , 2 ,...
Comments
Post a Comment