In how many ways can we create RDDs in Apache Spark?
There are three ways to create an RDD.
1. The first method is used when the data is already available in an external storage system such as the local filesystem, HDFS, or HBase.
An RDD can be created by calling the textFile method of SparkContext with a path or URL as the argument.
scala> val data = sc.textFile("File1.txt")
Here sc is the SparkContext object, which the Spark shell creates automatically.
You need to create the file File1.txt in the SPARK_HOME directory (or pass a full path or URL instead).
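To verify that the RDD was created (this assumes File1.txt is readable from the path you passed), you can run a simple action such as count, which returns the number of lines in the file:

scala> val lineCount = data.count()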
2. The second approach can be used with an existing collection in your driver program.
scala> val arr1 = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> val rdd1 = sc.parallelize(arr1)
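As a small optional sketch, parallelize also accepts a second argument that sets the number of partitions; the value 4 here is only an illustration:

scala> val rdd2 = sc.parallelize(arr1, 4)
scala> rdd2.getNumPartitions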
3. The third way is to create a new RDD from an existing one by applying a transformation.
scala> val newRDD = rdd1.map(data => (data * 2))
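To inspect the result of the transformation, you can collect the new RDD back to the driver (fine here because the data set is tiny):

scala> newRDD.collect()    // Array(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)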
Ref: http://data-flair.training/forums/topic/in-how-many-ways-can-we-create-rdds-in-apache-spark-explain