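Creating DataFrames
An example of how to create a DataFrame in Scala from an RDD of Row objects and an explicit schema. The examples assume spark-shell, where spark, sc and spark.implicits._ are available without extra setup.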
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Sample rows: nombre, apellido, edad, salario
val data = List(
  Row("Paco", "Garcia", 24, 24000),
  Row("Juan", "Garcia", 26, 27000),
  Row("Lola", "Martin", 29, 31000),
  Row("Sara", "Garcia", 35, 34000)
)
val rdd = sc.parallelize(data)

// Explicit schema: column name, type and nullability
val schema = StructType(
  List(
    StructField("nombre", StringType, nullable = false),
    StructField("apellido", StringType, nullable = false),
    StructField("edad", IntegerType),
    StructField("salario", IntegerType)
  )
)

val df = spark.createDataFrame(rdd, schema)
df.printSchema()
df.show()
root
 |-- nombre: string (nullable = false)
 |-- apellido: string (nullable = false)
 |-- edad: integer (nullable = true)
 |-- salario: integer (nullable = true)
+------+--------+----+-------+
|nombre|apellido|edad|salario|
+------+--------+----+-------+
|  Paco|  Garcia|  24|  24000|
|  Juan|  Garcia|  26|  27000|
|  Lola|  Martin|  29|  31000|
|  Sara|  Garcia|  35|  34000|
+------+--------+----+-------+
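When the schema does not need to be hand-tuned, Spark can also derive it from a Scala case class. The sketch below assumes Scala 2.12/2.13 with spark-sql on the classpath and runs outside spark-shell, so it builds its own local `SparkSession`; the names `Persona` and `CaseClassDataFrame` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case class mirroring the columns of the example above.
// Spark derives the column names and types from its fields.
case class Persona(nombre: String, apellido: String, edad: Int, salario: Int)

object CaseClassDataFrame {
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._ // provides toDF() on local Seqs of case classes
    Seq(
      Persona("Paco", "Garcia", 24, 24000),
      Persona("Juan", "Garcia", 26, 27000),
      Persona("Lola", "Martin", 29, 31000),
      Persona("Sara", "Garcia", 35, 34000)
    ).toDF()
  }

  def main(args: Array[String]): Unit = {
    // Local session for running the sketch outside spark-shell
    val spark = SparkSession.builder().master("local[*]").appName("case-class-df").getOrCreate()
    val df = build(spark)
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```

With this approach the nullability flags differ from the explicit schema: `String` fields are marked `nullable = true` and `Int` fields `nullable = false`, since a primitive `Int` cannot hold null. Use `Option[Int]` for nullable numbers, or keep an explicit `StructType` when the exact flags matter.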
Create a DataFrame with random data
import scala.util.Random

// Five random (salario, edad) pairs generated on the driver
val df = sc.parallelize(
  Seq.fill(5) {
    (Math.abs(Random.nextLong % 100000L), Math.abs(Random.nextLong % 100L))
  }
).toDF("salario", "edad")
df.show()
+-------+----+
|salario|edad|
+-------+----+
|  41772|  17|
|  74772|  66|
|   6326|  60|
|  72581|  70|
|  53037|   0|
+-------+----+
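If the random rows need to be reproducible between runs, a seeded generator helps. A minimal plain-Scala sketch, assuming a hypothetical helper `RandomRows.rows`; it draws `Int` values with `nextInt(bound)`, which already returns values in `[0, bound)`, so no `Math.abs` step is needed.

```scala
import scala.util.Random

object RandomRows {
  // Generate n (salario, edad) pairs from a generator seeded with `seed`,
  // so the same seed always yields the same rows.
  // salario is in [0, 100000) and edad in [0, 100).
  def rows(n: Int, seed: Long): Seq[(Int, Int)] = {
    val rnd = new Random(seed)
    Seq.fill(n)((rnd.nextInt(100000), rnd.nextInt(100)))
  }
}
```

In spark-shell the result can be passed on as `sc.parallelize(RandomRows.rows(5, 42L)).toDF("salario", "edad")`; unlike the example above, the columns are `Int` rather than `Long`. For random columns generated on the executors instead of the driver, `org.apache.spark.sql.functions` also provides `rand(seed)`.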
Convert an RDD to a DataFrame
// Column names, expanded into toDF's varargs parameter with `: _*`
val nombre_cols = Array("id", "nombre", "valores")
val df = sc.parallelize(Seq(
  (1, "Mario", Seq(0, 2, 5)),
  (2, "Sonia", Seq(1, 20, 5))
)).toDF(nombre_cols: _*)
df.show()
+---+------+----------+
| id|nombre|   valores|
+---+------+----------+
|  1| Mario| [0, 2, 5]|
|  2| Sonia|[1, 20, 5]|
+---+------+----------+
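The `: _*` in `toDF(nombre_cols: _*)` is Scala's syntax for passing a collection to a repeated (varargs) parameter; `toDF` is declared as `toDF(colNames: String*)`. A small plain-Scala sketch with a hypothetical method `joinCols` shows the same expansion:

```scala
object VarargsDemo {
  // Hypothetical method with a repeated parameter, declared like toDF(colNames: String*)
  def joinCols(cols: String*): String = cols.mkString(",")

  val nombreCols = Seq("id", "nombre", "valores")

  // `: _*` passes each element as a separate argument,
  // equivalent to joinCols("id", "nombre", "valores")
  val joined: String = joinCols(nombreCols: _*)
}
```

The sketch uses a `Seq`; the `Array` from the example above is expanded the same way.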
Convert a Dataset to a DataFrame
import org.apache.spark.sql.functions._

val wordsDataset = sc.parallelize(Seq(
  "Hola mundo hola mundo",
  "ni hola ni mundo ni nada",
  "cuenta palabras"
)).toDS()

val result = wordsDataset
  .flatMap(_.split(" "))             // Split sentences into words
  .filter(_ != "")                   // Drop empty words
  .map(_.toLowerCase())
  .toDF()                            // Convert to a DataFrame to aggregate and sort
  .groupBy($"value")                 // Count occurrences of each word
  .agg(count("*").as("ocurrencias"))
  .orderBy($"ocurrencias".desc)

// Show the number of occurrences of each word
result.show()
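+--------+-----------+
|   value|ocurrencias|
+--------+-----------+
|    nada|          1|
|palabras|          1|
|  cuenta|          1|
|      ni|          3|
|    hola|          3|
|   mundo|          3|
+--------+-----------+
Note that the desc in orderBy sorts by descending count, so the words with 3 occurrences should be listed first; the rows above appear in ascending order. The order among words with the same count is not guaranteed.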
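Because the sample input is small, the expected counts can be cross-checked with the same steps on a local collection. A plain-Scala sketch, assuming Scala 2.13+ for `groupMapReduce`; `WordCount.count` is a hypothetical helper, and unlike the Spark output it breaks ties between equal counts alphabetically:

```scala
object WordCount {
  // Same steps as the Dataset example, but on a local Seq[String]:
  // split into words, drop empty strings, lowercase, count, sort by count descending.
  def count(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.split(" "))
      .filter(_.nonEmpty)
      .map(_.toLowerCase)
      .groupMapReduce(identity)(_ => 1)(_ + _) // word -> number of occurrences
      .toSeq
      .sortBy { case (word, n) => (-n, word) } // highest count first, ties alphabetical
}
```

To get the same deterministic order in Spark, a second sort column can be added, for example `.orderBy($"ocurrencias".desc, $"value")`.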
Convert lists to a DataFrame
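val A = List("Paco", "Sara", "Flor", "Rosa")
val B = List(1, 2, 3, 4)
val C = List(5, 6, 7, 8)

// Zipping twice yields ((String, Int), Int); flatten it into (String, Int, Int)
val zip = A.zip(B).zip(C)
val tup = zip.map { case ((w, x), y) => (w, x, y) }

val df = tup.toDF("A", "B", "C")
df.show()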
+----+---+---+
|   A|  B|  C|
+----+---+---+
|Paco|  1|  5|
|Sara|  2|  6|
|Flor|  3|  7|
|Rosa|  4|  8|
+----+---+---+
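On Scala 2.13 and later, `lazyZip` can combine the three lists into triples in one step, which avoids the nested tuples and the pattern match. A sketch assuming Scala 2.13+ (the object name `ZipLists` is hypothetical):

```scala
object ZipLists {
  val A = List("Paco", "Sara", "Flor", "Rosa")
  val B = List(1, 2, 3, 4)
  val C = List(5, 6, 7, 8)

  // lazyZip pairs up the elements of A, B and C position by position;
  // map then builds one (String, Int, Int) triple per position.
  val tup: List[(String, Int, Int)] =
    A.lazyZip(B).lazyZip(C).map((w, x, y) => (w, x, y))
}
```

As with `zip`, the result is truncated to the length of the shortest list; `tup.toDF("A", "B", "C")` then works as in the example above.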