Write data to HDFS
An example of how to write RDD data to Hadoop HDFS.
    // Delete the output directory if it exists
    import scala.sys.process._
    "hdfs dfs -rm -R /pruebas".!

    // Save an RDD to HDFS
    val rdd = sc.parallelize(List(
      (0, 60), (0, 56), (0, 54), (0, 62), (0, 61), (0, 53), (0, 55), (0, 62), (0, 64),
      (1, 73), (1, 78), (1, 67), (1, 68), (1, 78)
    ))
    rdd.saveAsTextFile("hdfs:///pruebas/prueba1.csv")
    rdd.collect()
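One detail worth knowing: saveAsTextFile writes each element using its toString, so a tuple is stored as the line "(0,60)" rather than "0,60", and the target path becomes a directory of part files rather than a single file. If real CSV lines are wanted, the pairs can be formatted first. The following is a minimal sketch; the helper name toCsvLine and the sample list are illustrative assumptions, not part of the original example.

```scala
// Hypothetical helper: format an (Int, Int) pair as one CSV line.
def toCsvLine(pair: (Int, Int)): String = s"${pair._1},${pair._2}"

// Demonstrated on a plain Scala list so it runs without a cluster.
val sample = List((0, 60), (1, 73))
val csvLines = sample.map(toCsvLine)
// csvLines == List("0,60", "1,73")
```

With the RDD from the example above, the same helper could be applied before saving, for instance rdd.map(toCsvLine).saveAsTextFile("hdfs:///pruebas/prueba1.csv").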
Write data to HDFS (second method)
An example of how to write plain text data to Hadoop HDFS.
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem
    import org.apache.hadoop.fs.Path
    import java.io.PrintWriter

    object App {
      def main(args: Array[String]): Unit = {
        println("Writing test in HDFS...")
        val conf = new Configuration()
        val fs = FileSystem.get(conf)
        // Create (or overwrite) the target file and wrap the stream in a writer
        val output = fs.create(new Path("hdfs://sandbox-hdp.hortonworks.com:8020/Tests/test2.txt"))
        val writer = new PrintWriter(output)
        try {
          writer.write("Hello World")
          writer.write("\n")
        } finally {
          // Always close the writer so the data is flushed to HDFS
          writer.close()
        }
        print("Finished!")
      }
    }
Append data to HDFS
An example of appending DataFrame data to an existing dataset in HDFS.
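    // Needed for toDF; already imported in spark-shell
    import spark.implicits._

    val df = Seq((1, 2), (3, 4), (5, 6), (0, 0)).toDF("Col_0", "Col_1")
    df.show()

    // First write replaces any existing data, second write appends the same rows
    df.write.mode("overwrite").format("parquet").save("hdfs:///incrementar_datos.parquet")
    df.write.mode("append").format("parquet").save("hdfs:///incrementar_datos.parquet")

    // Parquet files carry their own schema, so inferSchema has no effect here
    val df2 = spark
      .read
      .format("parquet")
      .option("inferSchema", true)
      .load("hdfs:///incrementar_datos.parquet")
    df2.show()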
    df
    +-----+-----+
    |Col_0|Col_1|
    +-----+-----+
    |    1|    2|
    |    3|    4|
    |    5|    6|
    |    0|    0|
    +-----+-----+

    df2
    +-----+-----+
    |Col_0|Col_1|
    +-----+-----+
    |    1|    2|
    |    3|    4|
    |    1|    2|
    |    3|    4|
    |    5|    6|
    |    0|    0|
    |    5|    6|
    |    0|    0|
    +-----+-----+
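The second table has eight rows because the append wrote the same four rows again. The sketch below checks that with a row count and uses the typed SaveMode constants instead of strings. It assumes a reachable HDFS as the default file system; the application name and the local[*] master are placeholder assumptions for running outside a preconfigured shell.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Assumed settings: app name and master are placeholders.
val spark = SparkSession.builder()
  .appName("append-check")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val path = "hdfs:///incrementar_datos.parquet"
val df = Seq((1, 2), (3, 4), (5, 6), (0, 0)).toDF("Col_0", "Col_1")

// Overwrite replaces existing data; Append adds the rows to what is there.
df.write.mode(SaveMode.Overwrite).parquet(path)
df.write.mode(SaveMode.Append).parquet(path)

// After one overwrite and one append the dataset holds every row twice.
val rowsWritten = spark.read.parquet(path).count()
```

Row order in the result is not guaranteed, which is why the rows in the second table appear interleaved.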
Read RDDs from HDFS
A simple example of how to read data from HDFS.
    val rdd2 = sc.textFile("hdfs:///pruebas/prueba1.csv")
    rdd2.collect()
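textFile returns an RDD of strings, one per line. If the file was written from tuples with saveAsTextFile, each line looks like "(0,60)" and has to be parsed to get the numbers back. A minimal sketch of such a parser follows; the names PairLine and parsePair are illustrative assumptions, and it is demonstrated on a plain list so it runs without a cluster.

```scala
import scala.util.Try

// Matches a whole line of the form "(int,int)".
val PairLine = """\((-?\d+),(-?\d+)\)""".r

// Hypothetical helper: returns None for lines that do not match.
def parsePair(line: String): Option[(Int, Int)] = line.trim match {
  case PairLine(k, v) => Try((k.toInt, v.toInt)).toOption
  case _              => None
}

val lines = List("(0,60)", "(1,73)", "not a pair")
val pairs = lines.flatMap(parsePair)
// pairs == List((0,60), (1,73))
```

On the RDD read above, the same function could be applied as rdd2.flatMap(line => parsePair(line)), which silently drops malformed lines.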
Note: sc refers to the SparkContext. Many big data development environments create it for you; in a standalone application you must instantiate it yourself.
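For environments where sc is not predefined, one way to obtain it is sketched below. The application name and the local[*] master are placeholder assumptions; on a real cluster the master is normally supplied by the launcher instead of being hard-coded.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed values: the application name and the local master are placeholders.
val conf = new SparkConf()
  .setAppName("hdfs-examples")
  .setMaster("local[*]")

// Reuse an already running context if there is one, otherwise create it.
val sc = SparkContext.getOrCreate(conf)
```

Using getOrCreate rather than the constructor avoids an error when a context is already active in the same JVM.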
Read DataFrames from HDFS
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.DataFrame

    // Read the text files as CSV without a header row, letting Spark infer column types
    val df: DataFrame = spark
      .read
      .format("csv")
      .option("header", false)
      .option("inferSchema", true)
      .load("hdfs:///pruebas/prueba1.csv")
    df.show()
Note: spark refers to the SparkSession. Many big data development environments create it for you; in a standalone application you must instantiate it yourself.
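When spark is not predefined, a session can be built explicitly. This is a minimal sketch under the same placeholder assumptions as before: the application name and the local[*] master are illustrative and would normally come from the deployment.

```scala
import org.apache.spark.sql.SparkSession

// Assumed values: the application name and the local master are placeholders.
val spark = SparkSession.builder()
  .appName("hdfs-examples")
  .master("local[*]")
  .getOrCreate()

// The underlying SparkContext is available from the session.
val sc = spark.sparkContext
```

Creating the session first and taking sc from it keeps both entry points tied to the same underlying context.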