Diego Calvo, Author at Diego Calvo

Read & write JSON in Python

by Diego Calvo | Oct 9, 2018 | Big Data, Python-example

Generate data to read & write JSON. Example of random data to use in the following sections:

import random

# Assumes an existing SparkSession named spark
data = []
for x in range(5):
    data.append((random.randint(0, 9), random.randint(0, 9)))
df = spark.createDataFrame(data, ("label", "data"))
...
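The excerpt stops before the read and write step. As a rough sketch of the same idea, assuming plain Python and the standard-library json module instead of the Spark DataFrame API used in the post, the random (label, data) pairs can be written to a JSON file and read back. The helper names and the file name random_data.json are hypothetical and do not come from the post.

```python
import json
import os
import random
import tempfile


def make_rows(n=5, seed=None):
    """Build n records with random 'label' and 'data' values in 0..9."""
    rng = random.Random(seed)
    return [{"label": rng.randint(0, 9), "data": rng.randint(0, 9)} for _ in range(n)]


def write_json(rows, path):
    """Serialize the list of records to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(rows, fh)


def read_json(path):
    """Load the list of records back from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


rows = make_rows(5, seed=42)
# Hypothetical file name, placed in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "random_data.json")
write_json(rows, path)
restored = read_json(path)
print(restored == rows)  # the round trip preserves the records
```

Passing a seed makes the generated values reproducible, which is convenient when comparing what was written with what was read.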

Dates in Python

by Diego Calvo | Sep 24, 2018 | Python-example

Create a date from a string:

import pandas as pd
startdate = "10/10/2018"
my_date = pd.to_datetime(startdate)
print(my_date.strftime("%Y-%m-%d"))

Output: 2018-10-10

Create the current date:

import datetime
my_date = datetime.datetime.now()
...
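For comparison, the same two operations can be sketched with only the standard-library datetime module, without pandas. This version assumes the input string is in month/day/year order. The excerpt does not settle that, since "10/10/2018" reads the same either way. The variable names are illustrative.

```python
from datetime import datetime

# Parse a date string; the "%m/%d/%Y" order is an assumption about the input
start_date = "10/10/2018"
my_date = datetime.strptime(start_date, "%m/%d/%Y")

# Format it as ISO-style year-month-day
formatted = my_date.strftime("%Y-%m-%d")
print(formatted)  # 2018-10-10

# Current local date and time
now = datetime.now()
print(now.strftime("%Y-%m-%d %H:%M:%S"))
```

Stating the format explicitly in strptime avoids the day/month ambiguity that automatic parsing can run into with strings such as "03/04/2018".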

Machine Learning (Supervised & unsupervised)

by Diego Calvo | Sep 21, 2018 | Machine learning

Machine learning definition: Machine learning is a discipline of artificial intelligence. Its main objective is to create systems that learn automatically, i.e., systems that can find complex patterns in large data sets on their own. Types of...

Apache Hadoop YARN

by Diego Calvo | Sep 12, 2018 | Apache Hadoop, Big Data

YARN definition: YARN (Yet Another Resource Negotiator) is a data operating system and distributed resource manager, introduced with Hadoop 2 as the evolution of Hadoop MapReduce. The most significant change in Hadoop 2 over Hadoop 1 is that the thread technology...

Group dataframe elements in Scala

by Diego Calvo | Sep 5, 2018 | Apache Spark, Big Data, Scala-example

Example: Grouping data in a simple way. The people table is grouped by last name.

df.groupBy("surname").count().show()

+-------+-----+
|surname|count|
+-------+-----+
| Martin|    1|
| Garcia|    3|
...
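To show what the grouped count computes, here is a small in-memory sketch in Python rather than Scala, using collections.Counter in place of the Spark DataFrame API. The sample rows are hypothetical. They are chosen only so that the counts match the two rows visible in the excerpt.

```python
from collections import Counter

# Hypothetical sample rows (first name, surname); not taken from the post
people = [
    ("Ana", "Garcia"),
    ("Luis", "Garcia"),
    ("Eva", "Garcia"),
    ("Juan", "Martin"),
]

# Count rows per surname, the in-memory analogue of groupBy("surname").count()
surname_counts = Counter(surname for _, surname in people)

for surname, count in sorted(surname_counts.items()):
    print(f"{surname}: {count}")
```

Counter keeps one tally per distinct key, which is the same result a group-by followed by a count produces, only without distribution across a cluster.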