Why Scala for Spark?

In Summary:

1) Apache Spark is itself written in Scala and scales well on the JVM.
2) Scala programming retains a perfect balance between productivity and performance.
3) Organizations want the expressive power of a dynamic programming language without losing type safety.
4) Scala is designed with parallelism and concurrency in mind for big data applications.
5) Scala fits well with the MapReduce big data model because of its functional paradigm.
6) Scala provides the best path for building big data applications that scale in both data size and program complexity.
7) Scala programming is comparatively less verbose and complex than Java.
8) Scala has well-designed libraries for scientific computing, linear algebra, and random number generation.
9) Efficiency and speed remain vital despite ever-increasing processor speeds.
10) Other programming languages like Python and Java lag behind in Spark API coverage.
11) As a functional language, Scala offers immutability out of the box, which is exactly what Spark needs to achieve fault tolerance in both processing and persistence.

In Detail:


1) Apache Spark is written in Scala and scales well on the JVM. Scala is the most prominently used programming language among big data developers working on Spark projects. Developers say that using Scala helps them dig deep into Spark's source code, so they can easily access and implement the newest Spark features. Scala's interoperability with Java is its biggest attraction, as Java developers can get on the learning path quickly by building on object-oriented concepts they already know, as sketched below.
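A minimal sketch of that interoperability, using nothing but JDK classes (no wrappers or bindings involved):

// Scala can call any Java API directly; here, two plain JDK classes.
import java.time.LocalDate
import java.util.concurrent.ConcurrentHashMap

object JavaInteropDemo extends App {
  val today: LocalDate = LocalDate.now()            // a Java class, used natively
  println(s"Today is $today")

  val cache = new ConcurrentHashMap[String, Int]()  // a Java collection from Scala
  cache.put("spark", 1)
  println(cache.get("spark"))                       // prints 1
}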

2) Scala programming retains a perfect balance between productivity and performance. Most big data developers come from a Python or R programming background, and Scala's syntax is less intimidating than Java's or C++'s. For a new Spark developer with no prior experience, knowing the basic syntax, the collections, and lambdas is enough to become productive in big data processing using Apache Spark. The performance achieved with Scala is also better than that of many traditional data analysis tools like R or Python. Over time, as a developer's skills grow, it becomes easy to transition from imperative code to more elegant functional code to improve performance. Collections and lambdas alone already go a long way, as the sketch below shows.
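A minimal word-count sketch using only collections-style methods and lambdas. It assumes Spark is on the classpath; the app name and the input.txt file are placeholders:

import org.apache.spark.sql.SparkSession

object WordCount extends App {
  // Local session for experimenting; on a cluster the master is set by spark-submit.
  val spark = SparkSession.builder().appName("WordCount").master("local[*]").getOrCreate()
  val sc = spark.sparkContext

  val counts = sc.textFile("input.txt")   // RDD[String], one element per line
    .flatMap(_.split("\\s+"))             // split lines into words
    .map(word => (word, 1))               // pair each word with a count of 1
    .reduceByKey(_ + _)                   // sum the counts per word

  counts.take(10).foreach(println)
  spark.stop()
}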

3) Organizations want the expressive power of a dynamic programming language without losing type safety. Scala programming has this potential, and this can be judged from its increasing adoption rates in the enterprise. Type inference is a large part of it, as the snippet below illustrates.
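A small illustration: the code reads like a dynamic language, yet every type is checked at compile time.

// No type annotations needed; the compiler infers everything.
val words   = List("spark", "scala", "akka")   // inferred as List[String]
val lengths = words.map(_.length)              // inferred as List[Int]

// Mistakes are still caught before the program ever runs:
// lengths.map(_.toUpperCase)   // does not compile: Int has no toUpperCase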

4) Scala is designed with parallelism and concurrency in mind for big data applications. Scala has excellent built-in concurrency support, and libraries like Akka make it easy for developers to build truly scalable applications; see the sketch below.
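A small sketch using only the standard library's Futures (Akka's actors are the heavier-duty option; nothing beyond the standard library is assumed here):

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ConcurrencyDemo extends App {
  // Two independent computations run concurrently on the global thread pool.
  val a = Future { Thread.sleep(100); 21 }
  val b = Future { Thread.sleep(100); 21 }

  // Combine the results without blocking until the very end.
  val sum = for { x <- a; y <- b } yield x + y
  println(Await.result(sum, 1.second))   // prints 42
}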

5) Scala fits well with the MapReduce big data model because of its functional paradigm. Many Scala data frameworks follow abstract data types that are consistent with Scala's collection APIs. Developers just need to learn the standard collections, and it becomes easy to work with the other libraries; the sketch below shows the same map/reduce shape on both.
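The same map/reduce shape works on a local collection and on a distributed RDD. This sketch assumes an existing SparkContext named sc:

// Plain Scala collection: runs in one JVM.
val local = List(1, 2, 3, 4).map(_ * 2).reduce(_ + _)

// Spark RDD: the same shape, distributed across the cluster.
val distributed = sc.parallelize(List(1, 2, 3, 4)).map(_ * 2).reduce(_ + _)

// Both evaluate to 20; only the execution engine differs.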

6) Scala provides the best path for building big data applications that scale in both data size and program complexity. With support for immutable data structures, for-comprehensions, and immutable named values, Scala provides remarkable support for functional programming, as the snippet below shows.
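A small example of immutable values and a for-comprehension (the names and data are made up for illustration):

// Everything here is immutable; the comprehension builds a new collection.
val users  = List("ada", "alan", "grace")
val scores = Map("ada" -> 10, "alan" -> 8, "grace" -> 9)

val report = for {
  user  <- users
  score <- scores.get(user)   // Option fits naturally into the comprehension
  if score > 8
} yield s"$user: $score"

println(report)   // List(ada: 10, grace: 9)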

7) Scala programming is comparatively less verbose and complex than Java. A few lines of Scala can replace 20 to 25 lines of equivalent Java code, making it a preferable choice for big data processing on Apache Spark. Case classes, sketched below, are the classic example.
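One line of Scala stands in for an entire Java bean (the Employee class here is a made-up example):

// A single case class gives you what a Java bean needs dozens of lines for:
// constructor, getters, equals, hashCode, toString, and copy.
case class Employee(name: String, department: String, salary: Double)

val e1 = Employee("Ada", "Engineering", 120000)
val e2 = e1.copy(salary = 130000)   // non-destructive update
println(e2)                         // Employee(Ada,Engineering,130000.0)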

8) Scala has well-designed libraries for scientific computing, linear algebra, and random number generation. The standard scientific library Breeze contains non-uniform random generation, numerical algebra, and other special functions. Saddle is a Scala data library that provides a solid foundation for data manipulation through 2D data structures, robustness to missing values, array-backed support, and automatic data alignment. A small Breeze sketch follows.
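A minimal Breeze linear-algebra sketch. It assumes the breeze dependency is on the classpath, and the exact API can vary slightly between Breeze versions:

import breeze.linalg.{DenseMatrix, DenseVector}

val m = DenseMatrix((1.0, 2.0),
                    (3.0, 4.0))
val v = DenseVector(0.5, 0.5)

println(m * v)   // matrix-vector product: DenseVector(1.5, 3.5)
println(m.t)     // transpose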

9) Efficiency and speed remain vital despite ever-increasing processor speeds. Scala is fast and efficient, making it an ideal choice of language for computationally intensive algorithms. Compute-cycle and memory efficiency are also well tuned when using Scala for Spark programming.


10) Other programming languages like Python and Java lag behind in Spark API coverage. Scala has bridged this gap and is gaining traction in the Spark community. The rule of thumb is that Scala or Python lets developers write the most concise code, while Java or Scala achieves the best runtime performance. The best trade-off is to use Scala for Spark, as it offers all the mainstream features without developers having to master advanced constructs. The typed Dataset API below is one example of coverage that Python lacks.
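A sketch of the typed Dataset API, which exists in Scala (and Java) but not in PySpark. It assumes an existing SparkSession named spark and a hypothetical people.json file:

import spark.implicits._   // enables the Person encoder below

case class Person(name: String, age: Long)

val people = spark.read.json("people.json").as[Person]   // rows become typed Person objects
val adults = people.filter(_.age >= 18)                  // a plain lambda over typed objects
adults.show()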



