Apache Spark driver logs need to be directed to a file in both cluster and client mode, so that application users can capture the useful information logged by the driver class. There are two cases to consider: 1) a Spark application run in yarn-client mode, and 2) a Spark application run in yarn-cluster mode.

Spark application run in yarn-client mode

When running a job in yarn-client mode, the driver logs are printed to the console, which is not useful for long-running jobs because the terminal session may be closed. It is therefore good practice to direct the driver logs to a fixed location. The following approach for yarn-client mode is discussed in the Hortonworks community article: https://community.hortonworks.com/articles/138849/how-to-capture-spark-driver-and-executor-logs-in-y.html. Here are the steps:

1. Place a driver_log4j.properties file in a certain location (say /tmp) on the machine from which you will submit the job in yarn-client mode. Contents o...
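As a sketch of step 1, a driver_log4j.properties file of this kind typically defines a file appender for the root logger (the exact file path and pattern here are illustrative assumptions, following the log4j 1.x style used by older Spark versions):

```
# /tmp/driver_log4j.properties -- route all driver logs to a file
log4j.rootCategory=INFO, FILE
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=/tmp/SparkDriver.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

The driver JVM can then be pointed at this file at submit time via the driver Java options, for example:

```
spark-submit --master yarn --deploy-mode client \
  --driver-java-options "-Dlog4j.configuration=file:/tmp/driver_log4j.properties" \
  --class com.example.MyApp myapp.jar
```

With this in place, everything the driver logs lands in /tmp/SparkDriver.log instead of scrolling by on the console.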
In summary:
1) Apache Spark is written in Scala, largely because of Scala's scalability on the JVM.
2) Scala strikes a good balance between productivity and performance.
3) Organizations want the expressive power of a dynamic-feeling language without having to lose type safety.
4) Scala is designed with parallelism and concurrency in mind, which suits big data applications.
5) Scala fits well with the MapReduce big data model because of its functional paradigm.
6) Scala provides a strong path for building big data applications that scale in both data size and program complexity.
7) Scala programs are comparatively less verbose than Java.
8) Scala has well-designed libraries for scientific computing, linear algebra, and random number generation.
9) Efficiency and speed play a vital role regardless of increasing processor speeds.
10) Other programming languages such as Python or Java lag behind in API coverage.
1...
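Point 5 above can be illustrated with a minimal sketch: a word count written against Scala's standard collection API, whose map/group/reduce shape mirrors the MapReduce model (the `WordCount` object and its input are hypothetical, not from the article):

```scala
// Minimal word count using Scala's functional collection API.
// The pipeline mirrors MapReduce: map (split), shuffle (group), reduce (count).
object WordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))             // map: split each line into words
      .filter(_.nonEmpty)                   // drop empty tokens
      .groupBy(identity)                    // shuffle: group equal words together
      .map { case (w, ws) => (w, ws.size) } // reduce: count each group

  def main(args: Array[String]): Unit =
    println(count(Seq("spark scala", "scala")))
}
```

The same three-stage shape carries over almost verbatim to Spark's RDD and Dataset APIs, which is one reason Scala code translates so naturally to distributed execution.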