What are the Spark Driver and Executors?

The main() method of the program runs in the driver. The driver is the process that runs the user code: it creates the SparkContext, builds RDDs, and performs transformations and actions. Launching the Spark Shell also creates a driver program. When the driver terminates, the application is finished.
The driver program splits the Spark application into tasks and schedules them to run on the executors. The task scheduler resides in the driver and distributes tasks among the workers. The two key roles of the driver are (see the sketch after this list):
  • Converting the user program into tasks.
  • Scheduling tasks on the executors.
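As a minimal sketch of a driver program (the application name, input path, and word-count logic here are illustrative, not from the original post): the main() method below runs in the driver, creates the SparkContext, builds RDDs through transformations, and triggers tasks on the executors with an action.

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        // Creating the SparkContext marks this process as the driver.
        val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
        val sc   = new SparkContext(conf)

        // Transformations: recorded by the driver, not yet executed.
        val lines  = sc.textFile("input.txt")   // hypothetical input file
        val counts = lines.flatMap(_.split("\\s+"))
                          .map(word => (word, 1))
                          .reduceByKey(_ + _)

        // Action: the driver converts the DAG of operations into tasks,
        // schedules them on the executors, and collects the results.
        counts.collect().foreach(println)

        sc.stop()  // terminating the driver finishes the application
      }
    }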
At a high level, a Spark program has the following structure: RDDs are created from some input data, new RDDs are derived from existing ones through various transformations, and then an action is performed to compute a result. Along the way, Spark implicitly builds a DAG (directed acyclic graph) of these operations, and when an action runs, the driver converts that DAG into a physical execution plan.
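A short sketch of this laziness, assuming the SparkContext sc from the example above (the numbers are illustrative):

    // Transformations only extend the DAG; nothing is computed yet.
    val evens = sc.parallelize(1 to 10).filter(_ % 2 == 0).map(_ * 10)

    // toDebugString prints the RDD lineage, i.e. the logical DAG that
    // the driver turns into a physical execution plan when an action runs.
    println(evens.toDebugString)

    // The action triggers the actual computation on the executors.
    println(evens.sum())  // 300.0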

Apache Spark Executors

The individual tasks in a Spark job run in the Spark executors. Executors are launched once at the beginning of a Spark application and run for its entire lifetime. Even if an executor fails, the Spark application can continue, because the driver reschedules the failed tasks on other executors. Executors have two main roles (see the sketch after this list):
  • Run the tasks that make up the application and return the results to the driver.
  • Provide in-memory storage for RDDs that are cached by the user.
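The second role is what cache() relies on. A minimal sketch, again assuming the SparkContext sc from above (the file name app.log and the filter strings are illustrative):

    // cache() asks the executors to keep this RDD's partitions in their
    // memory once the first action has computed them.
    val errors = sc.textFile("app.log").filter(_.contains("ERROR")).cache()

    println(errors.count())  // first action: computes and caches the partitions
    println(errors.filter(_.contains("Timeout")).count())  // reuses the cache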

Ref: https://data-flair.training/blogs/how-apache-spark-works/
