There are three methods to create an RDD.

1. The first method is used when the data is already available in an external system such as the local filesystem, HDFS, or HBase. An RDD can be created by calling the textFile method of SparkContext with the path/URL as the argument.

scala> val data = sc.textFile("File1.txt")

Here sc is the SparkContext object. You need to create the file File1.txt in the Spark home directory (or pass a full path).

2. The second approach can be used with an existing collection, by passing it to the parallelize method of SparkContext.

scala> val arr1 = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> val rdd1 = sc.parallelize(arr1)

3. The third way is to create a new RDD from an existing one by applying a transformation such as map (note that the transformation must be applied to the RDD, not to the original array).

scala> val newRDD = rdd1.map(data => data * 2)

A combined sketch of all three methods as a standalone application is given below the reference.

Ref : http://data-flair.training/forums/topic/in-how-many-ways-can-we-create-rdds-in-apache-spark-explain
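Below is a minimal, self-contained sketch that puts the three methods together as a standalone application instead of spark-shell commands. The SparkConf/SparkContext setup, the local[*] master, and the File1.txt path are assumptions added for illustration; in the spark-shell, sc is already provided for you.

// A minimal sketch of the three RDD creation methods, assuming Spark is on the classpath
// and a file named File1.txt exists in the working directory.
import org.apache.spark.{SparkConf, SparkContext}

object RDDCreationExample {
  def main(args: Array[String]): Unit = {
    // In the spark-shell this setup is not needed; sc is created automatically.
    val conf = new SparkConf().setAppName("RDDCreationExample").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // 1. From external storage: read a text file into an RDD of lines.
    val fileRDD = sc.textFile("File1.txt")

    // 2. From an existing collection: distribute a local array across the cluster.
    val arr1 = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    val rdd1 = sc.parallelize(arr1)

    // 3. From an existing RDD: apply a transformation (map) to derive a new RDD.
    val newRDD = rdd1.map(data => data * 2)

    // Actions trigger the evaluation of the lazy transformations above.
    println(s"Lines in File1.txt: ${fileRDD.count()}")
    println(s"Doubled values: ${newRDD.collect().mkString(", ")}")

    sc.stop()
  }
}

Running this (for example with spark-submit) should print the line count of File1.txt and the values 2, 4, ..., 20 produced by the map transformation.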