por Adrian Atienza | Ene 17, 2018 | Python, Spark
from pyspark.ml import Pipeline from pyspark.ml.feature import VectorAssembler # Definir el ‘df’ Spark a utilizar df = spark.createDataFrame([ (‘line_1’, 100, 10, 1), (‘line_2’, 200, 20, 2), (‘line_3’, 300,... por Adrian Atienza | Dic 24, 2017 | Python, Spark
Cargar datos # Cargar un dataframe df = sqlContext.read.format(«com.databricks.spark.csv»).options(delimiter=’\t’,header=’true’,inferschema=’true’).load(«/databricks-datasets/power-plant/data») display(df) AT V AP RH PE 14.96 41.76...