by Diego Calvo | Oct 10, 2018 | Big Data, Python-example
Generate data to use for reading and writing in parquet format
Example of random data to use in the following sections:
    import random
    data = []
    for x in range(5):
        data.append((random.randint(0,9), random.randint(0,9)))
    df = spark.createDataFrame(data, ("label",...

by Diego Calvo | Oct 9, 2018 | Big Data, Python-example
Generate data to use to read & write JSON
Example of random data to use in the following sections:
    import random
    data = []
    for x in range(5):
        data.append((random.randint(0,9), random.randint(0,9)))
    df = spark.createDataFrame(data, ("label", "data"))...

by Diego Calvo | Sep 24, 2018 | Python-example
Create a date from a string
    import pandas as pd
    startdate = "10/10/2018"
    my_date = pd.to_datetime(startdate)
    print(my_date.strftime("%Y-%m-%d"))
    2018-10-10
Create the current date
    import datetime
    my_date = datetime.datetime.now()...

by Diego Calvo | May 24, 2018 | Python-example
Define a virtual environment from the command line
    > python -m venv develop_virtual_enviroment
Activate the environment
    > ..\develop_virtual_enviroment\Scripts\activate.bat (for Windows)
    > source ../develop_virtual_enviroment/bin/activate (for Linux)
Disable the...

by Diego Calvo | Apr 26, 2018 | Apache Spark, Big Data, Python-example
Load CSV in Databricks
Databricks Community Edition provides a graphical interface for loading files. It is accessed from DataBase > Create New Table. Once inside, the following fields must be filled in:
Upload to DBFS: name of the file to load.
Select a cluster...

by Diego Calvo | Jan 17, 2018 | Apache Spark, Python-example
Example of pipeline concatenation
This example shows how several elements are added to a pipeline so that they all finally converge at a single point, which we call “features”.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature...