In how many ways can we create RDDs in Apache Spark?

There are three methods to create an RDD:
1. The first method is used when the data is already available in an external system such as the local filesystem, HDFS, or HBase.
An RDD can be created by calling the textFile method of SparkContext with a path or URL as the argument.
scala> val data = sc.textFile("File1.txt")
Here, sc is the SparkContext object.
You need to create the file File1.txt in the SPARK_HOME directory (or pass a full path or URL instead).
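The same method also accepts HDFS or S3 URLs. A minimal sketch (the HDFS path below is hypothetical; count() is an action that triggers the actual read):
scala> val hdfsData = sc.textFile("hdfs://namenode:9000/user/File1.txt")
scala> data.count()   // returns the number of lines in File1.txt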
2. The second approach parallelizes an existing collection in the driver program.
scala> val arr1 = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> val rdd1 = sc.parallelize(arr1)
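parallelize also takes an optional second argument specifying the number of partitions; a minimal sketch (4 is an arbitrary choice):
scala> val rdd2 = sc.parallelize(arr1, 4)   // distribute the array across 4 partitions
scala> rdd2.getNumPartitions                // Int = 4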
3. The third way is to create a new RDD from an existing one by applying a transformation. Note that the transformation must be applied to an RDD (here rdd1), not to the original array:
scala> val newRDD = rdd1.map(data => data * 2)
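Transformations such as map are lazy; the new RDD is computed only when an action is called on it. For example:
scala> newRDD.collect()   // Array(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)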
Ref: http://data-flair.training/forums/topic/in-how-many-ways-can-we-create-rdds-in-apache-spark-explain
