Big Data Archives - Page 2 of 8

Generate a Kerberos authentication keytab in a Hadoop cluster

by Diego Calvo | Sep 4, 2018 | Big Data

Access the cluster by SSH (ssh user_name@server_cluster_name), then authenticate in the shell (kinit user_name@REINO.COM). If authentication is successful, we will receive a ticket-granting ticket (TGT) from the KDC. This means that we have authenticated with the server, but...

Scala Filter DataFrame

by Diego Calvo | Aug 27, 2018 | Apache Spark, Big Data, Scala-example

Filter data with like: the filter selects the people whose surname contains “Garc” and whose age is under 30. val df = sc.parallelize(Seq(("Paco","Garcia",24,24000,"2018-08-06 00:00:00"),...
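The excerpt above is cut off, so here is a minimal sketch of the same idea rather than the post's own code. It assumes a local SparkSession; the column names and every row except Paco's name, surname and age are hypothetical, added only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session; in spark-shell `spark` already exists
val spark = SparkSession.builder()
  .appName("filter-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data; only the first row echoes the excerpt
val people = Seq(
  ("Paco", "Garcia", 24),
  ("Juan", "Garcia", 36),
  ("Lola", "Martin", 29)
).toDF("name", "surname", "age")

// Keep rows whose surname contains "Garc" and whose age is under 30
val filtered = people.filter(col("surname").like("%Garc%") && col("age") < 30)
filtered.show()
```

The like pattern uses SQL wildcards, so "%Garc%" matches the substring anywhere in the surname.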

Apache Sqoop Examples

by Diego Calvo | Aug 19, 2018 | Big Data, Data bases

Prerequisites of Apache Sqoop examples: the prerequisites for these examples are the same as for the previous Sqoop post. The examples create a database “myddbb”, a table “mytable” populated with values, and another empty table...

Linear Regression in Scala

by Diego Calvo | Aug 17, 2018 | Apache Spark, Big Data, Scala-example

This post shows the steps to recreate an example of linear regression in Scala. Define the data set: this is the data to which the model is applied. import org.apache.spark.ml.linalg.Vectors val df = spark.createDataFrame(Seq( (0, 60), (0, 56), (0, 54), (0, 62),...
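Since the excerpt stops before the model is built, the following is only a sketch of how a linear regression can be fitted with Spark ML, not the post's own code. The data (points on y = 2x + 1), the column names and the local session are all assumptions made for illustration.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell `spark` already exists
val spark = SparkSession.builder()
  .appName("linear-regression-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical data lying exactly on y = 2x + 1
val raw = Seq((1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)).toDF("x", "label")

// Spark ML expects the predictors packed into a single vector column
val assembler = new VectorAssembler()
  .setInputCols(Array("x"))
  .setOutputCol("features")
val training = assembler.transform(raw)

// Fit an ordinary least squares model with default settings (no regularization)
val model = new LinearRegression().fit(training)
println(s"slope=${model.coefficients(0)} intercept=${model.intercept}")
```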

Connect with Scala to the HDFS of Hadoop

by Diego Calvo | Aug 10, 2018 | Apache Hadoop, Apache Spark, Big Data

Write data to HDFS: an example of how to write RDD data to Hadoop HDFS. Delete the file if it exists: import scala.sys.process._ "hdfs dfs -rm -r /pruebas" ! Save an RDD to HDFS: val rdd = sc.parallelize(List((0, 60), (0, 56), (0, 54), (0,...
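The excerpt is truncated before the write itself, so this is a sketch of the delete-then-write pattern under stated assumptions. Instead of shelling out to the hdfs command as the excerpt does, it uses the Hadoop FileSystem API, which is a swapped-in technique. The output path /tmp/pruebas-sketch is hypothetical, and with a local master it resolves to the local filesystem rather than a real HDFS.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell `sc` already exists
val spark = SparkSession.builder()
  .appName("hdfs-write-sketch")
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext

// Hypothetical output directory; on a cluster it resolves against fs.defaultFS
val output = new Path("/tmp/pruebas-sketch")
val fs = output.getFileSystem(sc.hadoopConfiguration)

// saveAsTextFile fails if the target exists, so remove it recursively first
if (fs.exists(output)) fs.delete(output, true)

// Write each tuple as one line of text, one part file per partition
val rdd = sc.parallelize(List((0, 60), (0, 56), (0, 54)))
rdd.saveAsTextFile(output.toString)
```

Deleting through the FileSystem API avoids depending on the hdfs binary being on the PATH of the driver.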

Scala DataFrames

by Diego Calvo | Jul 23, 2018 | Apache Spark, Big Data, Scala-example

Create DataFrames: an example of how to create a DataFrame in Scala. import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}; val data = List(Row("Peter","Garcia",24,24000),...
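The excerpt ends before the schema is defined, so the following is a sketch of building a DataFrame from Row objects and an explicit schema, not the post's own code. The column names, the second row and the local session are assumptions made for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Assumption: a local session; in spark-shell `spark` already exists
val spark = SparkSession.builder()
  .appName("dataframe-sketch")
  .master("local[*]")
  .getOrCreate()

// First row echoes the excerpt; the second is hypothetical
val data = List(
  Row("Peter", "Garcia", 24, 24000),
  Row("Ana", "Lopez", 31, 30000)
)

// Hypothetical column names; the types match the imports shown in the excerpt
val schema = StructType(List(
  StructField("name", StringType, nullable = true),
  StructField("surname", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true),
  StructField("salary", IntegerType, nullable = true)
))

// Pair the rows with the schema to obtain a typed DataFrame
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.printSchema()
df.show()
```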