by Diego Calvo | Jun 20, 2018 | Apache Spark, Big Data
Components
Spark Core: Spark Core is the core on which the whole architecture rests. It provides task distribution, scheduling and input/output operations, using Java, Python, Scala and R programming interfaces focused on the RDD abstraction. It establishes a...
by Diego Calvo | May 30, 2018 | Apache Spark, Big Data
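Not part of the post, but as a rough illustration of the RDD abstraction that Spark Core exposes, the following sketch assumes a local PySpark installation; the application name and numbers are made up.

# Minimal sketch (not from the post): the RDD abstraction exposed by Spark Core,
# assuming a local PySpark installation.
from pyspark import SparkContext

sc = SparkContext("local[2]", "rdd-sketch")   # two local worker threads

numbers = sc.parallelize(range(1, 11), 4)     # Spark Core distributes 4 partitions
squares = numbers.map(lambda x: x * x)        # transformation, scheduled lazily
print(squares.reduce(lambda a, b: a + b))     # the action runs the job: 385

sc.stop()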
Download Hortonworks Data Platform (HDP) Sandbox
VirtualBox installation: First install VirtualBox and, once it is installed, go to the Hortonworks virtual machine and run it; the machine will then appear as an installation in VirtualBox. Configure the features of the...
by Diego Calvo | Apr 26, 2018 | Apache Spark, Big Data, Python-example
Load CSV in Databricks
Databricks Community Edition provides a graphical interface for loading files. It is reached through Database > Create New Table. Once inside, the following fields must be filled in: Upload to DBFS: name of the file to load. Select a cluster...
by Diego Calvo | Jan 17, 2018 | Apache Spark, Python-example
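The post describes the graphical loader; as a complementary sketch, not the post's method, a file already uploaded to DBFS can also be read programmatically from a notebook, where the SparkSession named spark is predefined. The path below is a hypothetical example.

# Sketch (not from the post): reading an uploaded CSV into a DataFrame inside a
# Databricks notebook, where the SparkSession `spark` is predefined.
# The path is a hypothetical example of a file uploaded through the UI.
df = (spark.read
      .option("header", "true")        # first line holds the column names
      .option("inferSchema", "true")   # infer column types from the data
      .csv("/FileStore/tables/example.csv"))

df.show(5)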
Example of pipeline concatenation
This example shows how elements are chained in a pipeline so that they all finally converge on the same point, which we call “features”. from pyspark.ml import Pipeline from pyspark.ml.feature...
by Diego Calvo | Nov 23, 2017 | Apache Spark, Neo4J Data Base, Python-example, R-example, Scala-example
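The excerpt is cut off at the imports; the sketch below guesses at the general shape of such a pipeline rather than reproducing the post's code, with hypothetical data and column names, and assumes a SparkSession named spark as in a notebook or the pyspark shell. The stages feed columns that a final VectorAssembler merges into the single "features" column.

# Sketch of a pipeline whose stages converge on one "features" column
# (not the post's code; data and column names are hypothetical).
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler

df = spark.createDataFrame(
    [("a", 1.0, 10.0), ("b", 2.0, 20.0), ("a", 3.0, 30.0)],
    ["category", "x1", "x2"])

indexer = StringIndexer(inputCol="category", outputCol="category_idx")
assembler = VectorAssembler(inputCols=["category_idx", "x1", "x2"],
                            outputCol="features")   # everything converges here

pipeline = Pipeline(stages=[indexer, assembler])
pipeline.fit(df).transform(df).select("features").show(truncate=False)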
Comparison of programming languages
by Diego Calvo | Nov 23, 2017 | Apache Spark, Python-example
Function example in Spark Python
Displays an example of the mapPartitions function with Spark.
def my_func(iterator): yield sum(iterator)
list = range(1,10)
parallel = sc.parallelize(list, 5)
parallel.mapPartitions(my_func).collect()
[1, 5, 9, 13,...
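The excerpt is truncated; a self-contained version of this kind of mapPartitions example, not necessarily the post's exact code, would look like the sketch below, where the function receives one whole partition and yields its sum.

# Self-contained sketch of the mapPartitions example (assumes a local PySpark
# installation; the application name is made up).
from pyspark import SparkContext

sc = SparkContext("local[*]", "map-partitions-example")

def my_func(iterator):
    # receives an iterator over one partition and yields a single value: its sum
    yield sum(iterator)

numbers = list(range(1, 10))               # [1, 2, ..., 9]
parallel = sc.parallelize(numbers, 5)      # spread over 5 partitions
print(parallel.mapPartitions(my_func).collect())
# one sum per partition, e.g. [1, 5, 9, 13, 17]

sc.stop()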