Create DataFrames
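The following examples show several ways to create a DataFrame in Scala. They assume a spark-shell session, where spark (the SparkSession) and sc (the SparkContext) are predefined and spark.implicits._ is already imported.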
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Raw rows: name, surname, age, salary
val data = List(
  Row("Peter", "Garcia", 24, 24000),
  Row("Juan", "Garcia", 26, 27000),
  Row("Lola", "Martin", 29, 31000),
  Row("Sara", "Garcia", 35, 34000)
)
val rdd = sc.parallelize(data)

// Explicit schema; fields are nullable unless stated otherwise
val schema = StructType(
  List(
    StructField("name", StringType, nullable = false),
    StructField("surname", StringType, nullable = false),
    StructField("age", IntegerType),
    StructField("salary", IntegerType)
  )
)

val df = spark.createDataFrame(rdd, schema)
df.printSchema()
df.show()
root
 |-- name: string (nullable = false)
 |-- surname: string (nullable = false)
 |-- age: integer (nullable = true)
 |-- salary: integer (nullable = true)

+-----+-------+---+------+
| name|surname|age|salary|
+-----+-------+---+------+
|Peter| Garcia| 24| 24000|
| Juan| Garcia| 26| 27000|
| Lola| Martin| 29| 31000|
| Sara| Garcia| 35| 34000|
+-----+-------+---+------+
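When the rows have a fixed shape, a case class can stand in for the explicit StructType. The sketch below is one possible variant, not part of the original example: the Person case class, the CaseClassExample object, and the local SparkSession settings are all assumptions made for illustration. The nullable flags are inferred from the field types here, so printSchema() may report different nullability than the explicit schema above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case class mirroring the columns of the example above
case class Person(name: String, surname: String, age: Int, salary: Int)

object CaseClassExample {
  // Builds the DataFrame from a local Seq; column names and types come from Person
  def peopleDF(spark: SparkSession): DataFrame = {
    import spark.implicits._ // brings toDF() into scope outside spark-shell
    Seq(
      Person("Peter", "Garcia", 24, 24000),
      Person("Juan", "Garcia", 26, 27000),
      Person("Lola", "Martin", 29, 31000),
      Person("Sara", "Garcia", 35, 34000)
    ).toDF()
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, spark already exists
    val spark = SparkSession.builder()
      .appName("case-class-example")
      .master("local[*]")
      .getOrCreate()

    val df = peopleDF(spark)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

This avoids building Row objects and an RDD by hand, at the cost of less control over the schema.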
Creating a DataFrame with random data
import scala.util.Random

// Five rows of random (salary, age) pairs: salary in [0, 100000), age in [0, 100)
val df = sc.parallelize(
  Seq.fill(5) {
    (Math.abs(Random.nextLong() % 100000L), Math.abs(Random.nextLong() % 100L))
  }
).toDF("salary", "age")
df.show()
+------+---+
|salary|age|
+------+---+
| 41772| 17|
| 74772| 66|
|  6326| 60|
| 72581| 70|
| 53037|  0|
+------+---+
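The random example above produces different values on every run. If repeatable data is needed (for tests, say), a seeded generator is one option. This is a minimal sketch in plain Scala; the SeededRows object and its generate method are hypothetical names introduced here, and the value ranges simply mirror the example above.

```scala
import scala.util.Random

object SeededRows {
  // Hypothetical helper: n (salary, age) pairs drawn from a generator with a fixed seed.
  // salary is in [0, 100000) and age is in [0, 100), as in the example above.
  def generate(seed: Long, n: Int): Seq[(Long, Long)] = {
    val rng = new Random(seed)
    Seq.fill(n)((rng.nextInt(100000).toLong, rng.nextInt(100).toLong))
  }
}
```

In spark-shell, SeededRows.generate(42L, 5).toDF("salary", "age") should then give the same rows each time it is called with the same seed.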
Transforming an RDD to a DataFrame
// Column names, passed to toDF as varargs with ": _*"
val name_cols = Array("id", "name", "values")

val df = sc.parallelize(Seq(
  (1, "Mario", Seq(0, 2, 5)),
  (2, "Sonia", Seq(1, 20, 5))
)).toDF(name_cols: _*)
df.show()
+---+-----+----------+
| id| name|    values|
+---+-----+----------+
|  1|Mario| [0, 2, 5]|
|  2|Sonia|[1, 20, 5]|
+---+-----+----------+
Transforming a Dataset to a DataFrame
import org.apache.spark.sql.functions._

val wordsDataset = sc.parallelize(
  Seq("Hello world hello world", "no hello no world no more", "count words")
).toDS()

val result = wordsDataset
  .flatMap(_.split(" "))               // Split sentences into words
  .filter(_ != "")                     // Drop empty strings
  .map(_.toLowerCase())                // Normalize case
  .toDF()                              // Convert to a DataFrame to aggregate and sort
  .groupBy($"value")                   // Group by word
  .agg(count("*") as "occurrences")    // Count occurrences of each word
  .orderBy($"occurrences".desc)        // Most frequent words first

result.show()

+-----+-----------+
|value|occurrences|
+-----+-----------+
|   no|          3|
|hello|          3|
|world|          3|
| more|          1|
|count|          1|
|words|          1|
+-----+-----------+

Rows with the same count can appear in any order.
Transforming lists to a DataFrame
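val A = List("Paco", "Sara", "Tom", "Rosa")
val B = List(1, 2, 3, 4)
val C = List(5, 6, 7, 8)

// Pair the lists element by element, giving nested pairs ((a, b), c)
val zip = A.zip(B).zip(C)
// Flatten the nested pairs into 3-tuples, one per row
val tup = zip.map { case ((w, x), y) => (w, x, y) }

val df = tup.toDF("A", "B", "C")
df.show()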
+----+---+---+
|   A|  B|  C|
+----+---+---+
|Paco|  1|  5|
|Sara|  2|  6|
| Tom|  3|  7|
|Rosa|  4|  8|
+----+---+---+
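One caveat with the zip approach above: zip stops at the shortest list, so lists of different lengths would silently lose rows. A small guard can make that failure explicit. The sketch below is plain Scala; the ZipColumns object and its rows method are hypothetical names introduced for illustration.

```scala
object ZipColumns {
  // Hypothetical helper: zips three column lists into row tuples,
  // failing fast if the lists do not all have the same length.
  def rows[A, B, C](a: List[A], b: List[B], c: List[C]): List[(A, B, C)] = {
    require(
      a.length == b.length && b.length == c.length,
      "all columns must have the same length"
    )
    // Same zip-and-flatten step as in the example above
    a.zip(b).zip(c).map { case ((x, y), z) => (x, y, z) }
  }
}
```

In spark-shell, ZipColumns.rows(A, B, C).toDF("A", "B", "C") could replace the zip and map steps, assuming the lists defined above.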