by Diego Calvo | Sep 5, 2018 | Apache Spark, Big Data, Scala-example
Example: Grouping data in a simple way
Example in which the people table is grouped by last name.
df.groupBy("surname").count().show()
+-------+-----+
|surname|count|
+-------+-----+
| Martin|    1|
| Garcia|    3|
...

by Diego Calvo | Aug 27, 2018 | Apache Spark, Big Data, Scala-example
Filter data with like
This filter selects the people whose surname contains "Garc" and whose age is under 30.
val df = sc.parallelize(Seq( ("Paco","Garcia",24,24000,"2018-08-06 00:00:00"),...

by Diego Calvo | Aug 17, 2018 | Apache Spark, Big Data, Scala-example
The following post shows the steps to recreate an example of linear regression in Scala.
Set the data set
Defines the data set to which the model is applied.
import org.apache.spark.ml.linalg.Vectors
val df = spark.createDataFrame(Seq( (0, 60), (0, 56), (0, 54), (0, 62),...

by Diego Calvo | Aug 10, 2018 | Apache Hadoop, Apache Spark, Big Data
Write data to HDFS
Example of how to write RDD data to HDFS in Hadoop.
Delete the file if it exists
import scala.sys.process._
"hdfs dfs -rm -R /pruebas".!
Save an RDD to HDFS
val rdd = sc.parallelize(List( (0, 60), (0, 56), (0, 54), (0,...

by Diego Calvo | Jul 23, 2018 | Apache Spark, Big Data, Scala-example
Create DataFrames
Example of how to create a DataFrame in Scala.
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
val data = List( Row("Peter","Garcia",24,24000),...

by Diego Calvo | Jul 21, 2018 | Apache Spark, Big Data, Scala-example
Creating datasets
Simple RDD to Dataset
Example of creating a Dataset from an RDD.
val rdd = sc.parallelize(List(1,2,3,4,5))
val ds = spark.createDataset(rdd)
ds.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    4|
|    5|
+-----+
Classes to...
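
The RDD-to-Dataset snippet above relies on the `sc` and `spark` values that spark-shell provides. Outside the shell, the same idea could look roughly like the sketch below. The object name `RddToDatasetSketch`, the `build` helper, the application name and the `local[*]` master are assumptions made for illustration and do not come from the original post.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical wrapper object, not part of the original post.
object RddToDatasetSketch {

  // Builds a Dataset[Int] from an RDD, mirroring the excerpt above.
  def build(spark: SparkSession): Dataset[Int] = {
    // Brings the implicit Encoder[Int] required by createDataset into scope.
    import spark.implicits._
    val rdd = spark.sparkContext.parallelize(List(1, 2, 3, 4, 5))
    spark.createDataset(rdd)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("rdd-to-dataset-sketch")
      .master("local[*]")
      .getOrCreate()
    try {
      // Prints a single "value" column holding the five integers.
      build(spark).show()
    } finally {
      spark.stop()
    }
  }
}
```

Keeping the conversion in a function that takes the `SparkSession` as a parameter makes it callable both from spark-shell (passing the predefined `spark`) and from a compiled application.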