Creating datasets
Simple RDD to Dataset
An example of creating a Dataset from an RDD of simple values.
import spark.implicits._  // supplies the Encoder that createDataset needs

val rdd = sc.parallelize(List(1, 2, 3, 4, 5))
val ds = spark.createDataset(rdd)
ds.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    4|
|    5|
+-----+
Classes to Dataset
An example of creating a Dataset from instances of a case class that holds the data.
import spark.implicits._

case class Person(name: String, surname: String, age: Integer, salary: Integer)

val person1 = Person("Peter", "Garcia", 24, 24000)
val person2 = Person("Juan", "Garcia", 26, 27000)
val person3 = Person("Lola", "Martin", 29, 31000)
val person4 = Person("Sara", "Garcia", 35, 34000)

val data = Seq(person1, person2, person3, person4)
val ds = spark.createDataset(data)
ds.show()
+------+--------+----+-------+
|name  |surname |age |salary |
+------+--------+----+-------+
| Peter|  Garcia|  24|  24000|
|  Juan|  Garcia|  26|  27000|
|  Lola|  Martin|  29|  31000|
|  Sara|  Garcia|  35|  34000|
+------+--------+----+-------+
Transforming RDD to Dataset
An example of converting an RDD of tuples to a Dataset directly with toDS().
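// toDS() relies on the implicits imported above (import spark.implicits._)
val rdd = sc.parallelize(Seq(
  ("Paco", "Garcia", 24, 24000),
  ("Juan", "Garcia", 26, 27000),
  ("Lola", "Martin", 29, 31000),
  ("Sara", "Garcia", 35, 34000)
))
val ds = rdd.toDS()
display(ds)  // display() is a Databricks notebook function; use ds.show() elsewhere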
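Calling toDS() on an RDD of tuples gives a Dataset whose columns are named _1, _2, _3 and _4. If named, typed fields are wanted, one possible approach is to map each tuple to a case class before converting. The sketch below assumes a local SparkSession built only for illustration (in a notebook or spark-shell, `spark` already exists and `getOrCreate()` returns it) and a hypothetical `PersonRow` case class that is not part of the examples above.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session created only for this sketch.
val spark = SparkSession.builder()
  .appName("tuple-rdd-to-dataset")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical case class used to give the tuple fields names and types.
case class PersonRow(name: String, surname: String, age: Int, salary: Int)

val rdd = spark.sparkContext.parallelize(Seq(
  ("Paco", "Garcia", 24, 24000),
  ("Juan", "Garcia", 26, 27000),
  ("Lola", "Martin", 29, 31000),
  ("Sara", "Garcia", 35, 34000)
))

// Converting the tuples directly: columns are named _1, _2, _3, _4.
val tupleDs = rdd.toDS()
tupleDs.printSchema()

// Mapping to the case class first: columns take the field names.
val people = rdd
  .map { case (name, surname, age, salary) => PersonRow(name, surname, age, salary) }
  .toDS()
people.printSchema()
people.show()
```

With the case class in place, later transformations can refer to fields by name (for example `people.filter(_.age > 25)`) rather than by tuple position.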