In how many ways can we create RDDs in Apache Spark?

There are three methods to create an RDD:
1. The first method is used when the data is already available in an external system such as the local filesystem, HDFS, or HBase.
An RDD can be created by calling the textFile method of SparkContext with a path or URL as the argument.
scala> val data = sc.textFile("File1.txt")
Here, sc is the SparkContext object.
You need to create the file File1.txt in the SPARK_HOME directory (or pass a full path or URL instead).
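The same method also accepts HDFS or S3 URLs. A minimal sketch (the HDFS path below is hypothetical; count() is an action that triggers the actual read):
scala> val hdfsData = sc.textFile("hdfs://namenode:9000/user/File1.txt")
scala> data.count()   // returns the number of lines in File1.txt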
2. The second approach parallelizes an existing collection in the driver program.
scala> val arr1 = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> val rdd1 = sc.parallelize(arr1)
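parallelize also takes an optional second argument specifying the number of partitions; a minimal sketch (4 is an arbitrary choice):
scala> val rdd2 = sc.parallelize(arr1, 4)   // distribute the array across 4 partitions
scala> rdd2.getNumPartitions                // Int = 4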
3. The third way is to create a new RDD from an existing one by applying a transformation. Note that the transformation must be applied to an RDD (here rdd1), not to the original array:
scala> val newRDD = rdd1.map(data => data * 2)
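Transformations such as map are lazy; the new RDD is computed only when an action is called on it. For example:
scala> newRDD.collect()   // Array(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)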
Ref: http://data-flair.training/forums/topic/in-how-many-ways-can-we-create-rdds-in-apache-spark-explain
