Write data to HDFS
An example of how to write RDD data to Hadoop HDFS.
Delete the file if it exists
import scala.sys.process._
"hdfs dfs -rm -r /pruebas".!
Save an RDD to HDFS
val rdd = sc.parallelize(List(
  (0, 60),
  (0, 56),
  (0, 54),
  (0, 62),
  (0, 61),
  (0, 53),
  (0, 55),
  (0, 62),
  (0, 64),
  (1, 73),
  (1, 78),
  (1, 67),
  (1, 68),
  (1, 78)
))
rdd.saveAsTextFile("hdfs:///pruebas/prueba1.csv")
rdd.collect
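Note that saveAsTextFile writes the string form of each element, so a tuple is stored as (0,60), parentheses included, and the target path becomes a directory of part files. If the output is meant to be parsed as CSV, as the file name suggests, each pair can be formatted first. The helper below is a small sketch; toCsvLine is a hypothetical name introduced for illustration.

```scala
// Hypothetical helper: formats one (group, value) pair as a CSV line
// without the parentheses that Tuple2.toString would add.
def toCsvLine(pair: (Int, Int)): String = s"${pair._1},${pair._2}"

// Intended use with the RDD defined above (needs a running SparkContext):
// rdd.map(toCsvLine).saveAsTextFile("hdfs:///pruebas/prueba1.csv")
```

Mapping before saving keeps the stored lines in plain comma-separated form, so a later CSV read does not have to strip the parentheses.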
Write data to HDFS (second method)
An example of how to write plain-text data to Hadoop HDFS.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import java.io.PrintWriter
object App {
  def main(args: Array[String]): Unit = {
    println("Writing test in HDFS...")
    val conf = new Configuration()
    val fs = FileSystem.get(conf)
    // Create (or overwrite) the target file and wrap the stream in a text writer
    val output = fs.create(new Path("hdfs://sandbox-hdp.hortonworks.com:8020/Tests/test2.txt"))
    val writer = new PrintWriter(output)
    try {
      writer.write("Hello World")
      writer.write("\n")
    }
    finally {
      // Closing the writer flushes the data and closes the HDFS stream
      writer.close()
    }
    print("Finished!")
  }
}
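The write-then-close pattern above can be factored into a small function that accepts any OutputStream, which makes it easy to try out without a cluster. This is a sketch only; writeLines is a hypothetical helper name.

```scala
import java.io.{OutputStream, PrintWriter}

// Hypothetical helper: writes each line followed by a newline,
// then closes the writer (and with it the underlying stream).
def writeLines(out: OutputStream, lines: Seq[String]): Unit = {
  val writer = new PrintWriter(out)
  try lines.foreach(line => writer.write(line + "\n"))
  finally writer.close()
}

// Intended use with a Hadoop FileSystem value named fs, as in the example above:
// writeLines(fs.create(new Path("/Tests/test2.txt")), Seq("Hello World"))
```

Because fs.create returns an OutputStream subclass, the same function works for HDFS files and for in-memory buffers.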
Append data to HDFS
An example of appending DataFrame data to HDFS.
// toDF needs the session implicits (imported automatically in spark-shell)
import spark.implicits._
val df = Seq((1, 2), (3, 4), (5, 6), (0, 0)).toDF("Col_0", "Col_1")
df.show()
// The first write replaces any existing data; the second appends the same rows again
df.write.mode("overwrite").format("parquet").save("hdfs:///incrementar_datos.parquet")
df.write.mode("append").format("parquet").save("hdfs:///incrementar_datos.parquet")
val df2 = spark
  .read
  .format("parquet")
  .option("inferSchema", true)
  .load("hdfs:///incrementar_datos.parquet")
df2.show()
df
+-----+-----+
|Col_0|Col_1|
+-----+-----+
|    1|    2|
|    3|    4|
|    5|    6|
|    0|    0|
+-----+-----+
df2
+-----+-----+
|Col_0|Col_1|
+-----+-----+
|    1|    2|
|    3|    4|
|    1|    2|
|    3|    4|
|    5|    6|
|    0|    0|
|    5|    6|
|    0|    0|
+-----+-----+
Read RDDs from HDFS
A simple example of how to read data from HDFS.
val rdd2 = sc.textFile("hdfs:///pruebas/prueba1.csv")
rdd2.collect()
Note: sc refers to the SparkContext. Many big data development environments instantiate it for you; otherwise you must create the object yourself.
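Where sc is not provided, it can be created along these lines. This is a minimal sketch, assuming Spark is on the classpath; the application name and the local[*] master are illustrative values, and on a cluster the master is normally supplied by the launcher (for example spark-submit) rather than hard-coded.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed values for illustration: app name and a local master.
val conf = new SparkConf()
  .setAppName("hdfs-read-example")
  .setMaster("local[*]")

// getOrCreate reuses an already running context instead of failing on a second one.
val sc = SparkContext.getOrCreate(conf)
```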
Read DataFrames from HDFS
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.DataFrame
val df: DataFrame = spark
  .read
  .format("csv")
  .option("header", false)
  .option("inferSchema", true)
  .load("hdfs:///pruebas/prueba1.csv")
df.show()
Note: spark refers to the SparkSession. Many big data development environments instantiate it for you; otherwise you must create the object yourself.
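Where spark is not provided, a session can be built as follows. This is a minimal sketch, assuming Spark SQL is on the classpath; the application name and the local[*] master are illustrative values, not settings taken from this article.

```scala
import org.apache.spark.sql.SparkSession

// Assumed values for illustration: app name and a local master.
val spark = SparkSession.builder()
  .appName("hdfs-dataframe-example")
  .master("local[*]")
  .getOrCreate()

// The SparkContext used in the RDD examples is available from the session.
val sc = spark.sparkContext
```

Building the session first and taking sc from it keeps both entry points backed by the same underlying context.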



