There are three ways to create an RDD in Apache Spark.

1. The first method is used when the data is already available in an external system such as the local filesystem, HDFS, or HBase. An RDD can be created by calling the textFile method of SparkContext with a path/URL as the argument:

scala> val data = sc.textFile("File1.txt")

Here sc is the SparkContext object (provided automatically in the spark-shell). You need to create the file File1.txt in the SPARK_HOME directory, or pass a full path to it.

2. The second method parallelizes an existing collection in the driver program:

scala> val arr1 = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> val rdd1 = sc.parallelize(arr1)

3. The third method creates a new RDD by applying a transformation to an existing RDD:

scala> val newRDD = rdd1.map(data => data * 2)

Ref : http://data-flair.training/forums/topic/in-how-many-ways-can-we-create-rdds-in-apache-spark-explain
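Putting the three methods together, a minimal spark-shell session might look like the sketch below (File1.txt and the sample values are illustrative; sc is the SparkContext the shell provides). Note that map is a lazy transformation, so an action such as collect() is needed to actually compute the result:

```scala
// 1. From external storage (assumes File1.txt exists at this path)
val data = sc.textFile("File1.txt")

// 2. From an in-memory collection via parallelize
val rdd1 = sc.parallelize(Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

// 3. From an existing RDD via a transformation
val newRDD = rdd1.map(_ * 2)

// Transformations are lazy; collect() triggers the computation
newRDD.collect()  // Array(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
```

Because newRDD is derived from rdd1 rather than from the original Array, Spark records the lineage of the transformation and can recompute the partition if needed, which is the point of creating RDDs from existing ones.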