by Diego Calvo | Oct 9, 2018 | Big Data, Python-example
Generate data to use to read & write JSON
Example of random data to use in the following sections:
    data = []
    for x in range(5):
        data.append((random.randint(0,9), random.randint(0,9)))
    df = spark.createDataFrame(data, ("label", "data"))...

by Diego Calvo | Sep 24, 2018 | Python-example
Create date from a String
    import pandas as pd
    startdate = "10/10/2018"
    my_date = pd.to_datetime(startdate)
    print(my_date.strftime("%Y-%m-%d"))
    2018-10-10
Create current date
    import datetime
    my_date = datetime.datetime.now()...

by Diego Calvo | Sep 21, 2018 | Machine learning
Machine Learning definition
Machine learning is a discipline of artificial intelligence. Its main objective is to create systems that learn automatically, i.e. systems that can find complex patterns in large data sets on their own. Types of...

by Diego Calvo | Sep 12, 2018 | Apache Hadoop, Big Data
Yarn definition
YARN (Yet Another Resource Negotiator) is a data operating system and distributed resource manager, introduced with Hadoop 2 as the evolution of Hadoop MapReduce. The most significant change in Hadoop 2 over Hadoop 1 is that the thread technology...

by Diego Calvo | Sep 5, 2018 | Apache Spark, Big Data, Scala-example
Example: Grouping data in a simple way
Example where the people table is grouped by last name.
    df.groupBy("surname").count().show()
    +-------+-----+
    |surname|count|
    +-------+-----+
    | Martin|    1|
    | Garcia|    3|...
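
The same group-and-count idea can be tried without a Spark session. The following is a minimal sketch in plain Python using `collections.Counter`, not the Spark code from the post. The `people` rows, the first names, and the `count_by` helper are hypothetical and exist only for illustration; the only values taken from the excerpt are the two visible result rows (Martin once, Garcia three times).

```python
from collections import Counter

# Hypothetical sample rows standing in for the "people" table;
# only the surname counts (Martin: 1, Garcia: 3) come from the excerpt.
people = [
    {"name": "Ana", "surname": "Garcia"},
    {"name": "Luis", "surname": "Martin"},
    {"name": "Eva", "surname": "Garcia"},
    {"name": "Juan", "surname": "Garcia"},
]


def count_by(rows, key):
    """Count how many rows share each distinct value of `key`."""
    return Counter(row[key] for row in rows)


# Plain-Python analogue of df.groupBy("surname").count()
surname_counts = count_by(people, "surname")

# Print one line per group, sorted by surname for a stable order
for surname, count in sorted(surname_counts.items()):
    print(f"{surname}: {count}")
```

Unlike a Spark DataFrame, this runs on a single machine and holds all rows in memory, so it is only suitable for checking the logic on small samples.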