Read and write in parquet format in Python

Oct 10, 2018 | Big Data, Python-example

Generate data to use for reading and writing in parquet format

Example of random data to use in the following sections

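import random

# `spark` is the SparkSession (already defined in the pyspark shell and in most notebooks)
data = []
for x in range(5):
    # each row is a (label, data) pair of random integers between 0 and 9
    data.append((random.randint(0, 9), random.randint(0, 9)))
df = spark.createDataFrame(data, ("label", "data"))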

df.show()
+-----+----+
|label|data|
+-----+----+
|    4|   0|
|    7|   0|
|    1|   1|
|    3|   8|
|    3|   5|
+-----+----+
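
Because the rows are random, the table above will differ on every run. The following is a minimal sketch of a seeded generator that returns the same rows each time, which makes it easier to compare what is written with what is read back. The helper name `make_rows` and its `seed` parameter are assumptions for illustration, not part of the original example.

```python
import random


def make_rows(n, seed=None):
    """Return n (label, data) tuples with values between 0 and 9.

    `make_rows` is a hypothetical helper; pass a seed to get repeatable rows.
    """
    rng = random.Random(seed)  # private generator, leaves the global state alone
    return [(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(n)]


rows = make_rows(5, seed=42)
# The list can be handed to Spark in the same way as before:
# df = spark.createDataFrame(rows, ("label", "data"))
```

Using a private `random.Random` instance keeps the seed from affecting any other code that draws random numbers.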

Write data in parquet format

path_parquet = "/prueba.parquet"  # path when writing to HDFS
path_parquet = "/prueba.parquet"  # path when writing to a local file
df.write \
    .mode("overwrite") \
    .format("parquet") \
    .save(path_parquet)

Read data in parquet format

df2 = spark \
    .read \
    .parquet(path_parquet)

df2.show()
+-----+----+
|label|data|
+-----+----+
|    4|   0|
|    7|   0|
|    1|   1|
|    3|   8|
|    3|   5|
+-----+----+
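
Comparing the two `show()` outputs by eye works for five rows, but a programmatic check is more reliable, especially since Spark does not guarantee the row order after a read. The following is a rough sketch, assuming a hypothetical helper `same_rows`, small DataFrames that fit in driver memory, and no null values (which would make the rows unsortable in Python).

```python
def same_rows(df_a, df_b):
    """Return True when both DataFrames hold the same rows, ignoring order.

    Hypothetical helper: it relies only on `collect()`, which returns a list
    of Row objects (Row is a tuple subclass, so the rows can be sorted).
    """
    return sorted(df_a.collect()) == sorted(df_b.collect())


# Usage with the DataFrames from this post:
# assert same_rows(df, df2)
```

Since `collect()` pulls every row to the driver, this is only suitable for small test data like the sample above.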

Write gzip-compressed data in parquet format

path_parquet_gzip = "/prueba_gzip.parquet"  # path when writing to HDFS
path_parquet_gzip = "D:/prueba_gzip.parquet"  # path when writing to a local file
df.write\
    .mode("overwrite")\
    .format("parquet")\
    .option("compression", "gzip")\
    .save(path_parquet_gzip)

Read gzip-compressed data in parquet format

df2 = spark \
    .read \
    .parquet(path_parquet_gzip)

df2.show()
+-----+----+
|label|data|
+-----+----+
|    4|   0|
|    7|   0|
|    1|   1|
|    3|   8|
|    3|   5|
+-----+----+
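
The read call is the same with or without compression because the codec is recorded inside the Parquet files themselves. Spark writes the output as a directory of part files rather than a single file, and the part file names normally carry the codec as a suffix. The sketch below lists those files for a local output directory; the helper `codecs_in` is hypothetical, and the suffix mapping (`.gz.parquet`, `.snappy.parquet`) is an assumption about Spark's naming that may vary between versions. It does not work on HDFS paths.

```python
from pathlib import Path

# Assumed file-name suffixes that Spark uses for the codecs in this post.
SUFFIX_TO_CODEC = {".gz.parquet": "gzip", ".snappy.parquet": "snappy"}


def codecs_in(output_dir):
    """Map each part file in a local Spark output directory to its codec."""
    found = {}
    for part in sorted(Path(output_dir).glob("part-*.parquet")):
        codec = "uncompressed or unknown"
        for suffix, name in SUFFIX_TO_CODEC.items():
            if part.name.endswith(suffix):
                codec = name
        found[part.name] = codec
    return found


# Usage on a local path, for example:
# print(codecs_in(path_parquet_gzip))
```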

Write snappy-compressed data in parquet format

path_parquet_snappy = "/prueba_snappy.parquet"  # path when writing to HDFS
path_parquet_snappy = "D:/prueba_snappy.parquet"  # path when writing to a local file

df.write\
    .mode("overwrite")\
    .format("parquet")\
    .option("compression", "snappy")\
    .save(path_parquet_snappy)

Read snappy-compressed data in parquet format

df2 = spark \
    .read \
    .parquet(path_parquet_snappy)
df2.show()
+-----+----+
|label|data|
+-----+----+
|    4|   0|
|    7|   0|
|    1|   1|
|    3|   8|
|    3|   5|
+-----+----+
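
The write examples in this post differ only in the value of the `compression` option, so they can be folded into one function. The following is a minimal sketch under stated assumptions: `write_parquet` and `KNOWN_CODECS` are hypothetical names, and the codec list is deliberately limited to the ones used here plus `none`, although Spark accepts others.

```python
# Codecs covered by this sketch; Spark accepts more, which are left out here.
KNOWN_CODECS = ("none", "gzip", "snappy")


def write_parquet(df, path, compression="snappy"):
    """Overwrite `path` with `df` in Parquet format using the given codec."""
    if compression not in KNOWN_CODECS:
        raise ValueError(f"unsupported codec for this sketch: {compression!r}")
    (df.write
        .mode("overwrite")
        .format("parquet")
        .option("compression", compression)
        .save(path))


# Usage with the DataFrame and paths from this post:
# write_parquet(df, path_parquet_gzip, compression="gzip")
# write_parquet(df, path_parquet_snappy)  # snappy
```

Checking the codec before calling Spark turns a typo into an immediate `ValueError` instead of a failure in the middle of a job.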
