RDD definition
An RDD (Resilient Distributed Dataset) represents an immutable, partitioned collection of elements that can be operated on in parallel.
An RDD can be created either by parallelizing an existing collection of data (a list, a dictionary, ...) or by loading it from an external storage system, such as a shared file system, HDFS, HBase, or any data source that offers a Hadoop InputFormat.
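To make the idea concrete, here is a minimal toy model in plain Python. It is not Spark's API: the helper names `parallelize`, `map_partitions`, and `collect` below are hypothetical, and a local thread pool stands in for the cluster workers. It only illustrates the three properties from the definition: the data is split into partitions, each partition is immutable (a tuple), and an operation runs on the partitions in parallel and yields a new collection instead of modifying the old one. In PySpark itself, the two creation routes described above correspond to `SparkContext.parallelize` (for a local collection) and `SparkContext.textFile` (for a file in external storage).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")

# Hypothetical type alias: a tuple of immutable partitions.
Partitions = Tuple[Tuple[T, ...], ...]


def parallelize(data: Sequence[T], num_partitions: int) -> Partitions:
    """Split a local collection into immutable, roughly equal partitions."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    items = tuple(data)
    size, extra = divmod(len(items), num_partitions)
    partitions = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions receive one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(items[start:end])
        start = end
    return tuple(partitions)


def map_partitions(partitions: Partitions, func: Callable[[T], U]) -> Partitions:
    """Apply func to every element, one worker per partition.

    The input is never modified; a new set of partitions is returned.
    """
    def run(partition: Tuple[T, ...]) -> Tuple[U, ...]:
        return tuple(func(x) for x in partition)

    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # pool.map preserves the order of the input partitions.
        return tuple(pool.map(run, partitions))


def collect(partitions: Partitions) -> List[T]:
    """Gather all partitions back into a single local list."""
    return [x for partition in partitions for x in partition]


numbers = parallelize([1, 2, 3, 4, 5, 6, 7], num_partitions=3)
squares = map_partitions(numbers, lambda x: x * x)

print(numbers)           # ((1, 2, 3), (4, 5), (6, 7))
print(collect(squares))  # [1, 4, 9, 16, 25, 36, 49]
```

Note that `numbers` is unchanged after the map: transforming the data produces a new collection, which is the behaviour the word "immutable" refers to in the definition. A real RDD additionally tracks how it was derived so that lost partitions can be recomputed, which this sketch does not attempt.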