Diego Calvo, Author at Diego Calvo

Apache Spark libraries and installation in Python

by Diego Calvo | Nov 23, 2017 | Apache Spark, Python-example

Prerequisites: Java 6 or higher and a Python interpreter 2.6 or higher. Installation: installing is very simple; just download the latest version of Spark and unzip it: wget http://apache.rediris.es/spark/spark-1.5-0/spark-1.5.0-bin-hadoop2.6.tgz tar -xf spark-1.5.0-bin-hadoop2.6.tgz...
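The prerequisites listed above can be checked before downloading anything. The following is a minimal sketch, not part of the original post: the helper names `python_version_ok` and `java_on_path` are hypothetical, and the Java check only confirms that a `java` executable is on the PATH without verifying its version.

```python
import shutil
import sys

MIN_PYTHON = (2, 6)  # minimum interpreter version named in the excerpt


def python_version_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= minimum


def java_on_path():
    """Return True if a `java` executable is found on PATH (version not checked)."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    print("Python OK:", python_version_ok())
    print("Java found:", java_on_path())
```

Note that `shutil.which` requires Python 3.3 or later, so this sketch assumes a more recent interpreter than the minimum the post mentions.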

Big Data definition

by Diego Calvo | Nov 21, 2017 | Big Data

The term Big Data refers to a volume of data that exceeds the capabilities of the software commonly used to capture, manage, and process data. As computing capacity keeps growing and the number from which is considered a...

Lambda Architecture (batch and stream processing combination)

by Diego Calvo | Nov 15, 2017 | Big Data

Before focusing on the Lambda architecture, it is worth specifying the two types of data processing that compose it. Batch processing handles volumes of data at spaced intervals, for example every 10 minutes, every hour, or daily....
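To make the batch side concrete, here is a small sketch of grouping timestamped events into fixed intervals, such as the 10-minute windows mentioned above. It is an illustration only: the function `batch_by_interval` and the sample events are hypothetical, and it uses plain Python rather than any particular big data framework.

```python
from collections import defaultdict


def batch_by_interval(events, interval_seconds):
    """Group (timestamp, value) pairs into fixed windows of `interval_seconds`.

    Returns a dict mapping each window's start timestamp to its list of values.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % interval_seconds)
        batches[window_start].append(value)
    return dict(batches)


if __name__ == "__main__":
    # Hypothetical events: (seconds since start, payload)
    events = [(5, "a"), (590, "b"), (610, "c"), (1250, "d")]
    print(batch_by_interval(events, 600))  # 10-minute windows
    # → {0: ['a', 'b'], 600: ['c'], 1200: ['d']}
```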

Create a directory if it does not exist in Python

by Diego Calvo | Jul 20, 2017 | Python-example

Example of how to create a directory in Python:

import os

directory = "/Users/diego/test/"
try:
    os.stat(directory)  # raises OSError if the path does not exist
except OSError:
    os.mkdir(directory)  # create the missing directory
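On Python 3.2 and later, the same result can be had without the try/except by passing `exist_ok=True` to `os.makedirs`, which also creates any missing parent directories. The sketch below is an alternative to the example above, not the post's own method; the wrapper name `ensure_directory` is hypothetical, and a temporary directory stands in for a real path.

```python
import os
import tempfile


def ensure_directory(path):
    """Create `path` (and any missing parents) if it does not already exist."""
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # scratch location so the demo leaves real paths alone
    target = os.path.join(base, "test", "nested")
    ensure_directory(target)
    ensure_directory(target)  # second call is a no-op, no exception raised
    print(os.path.isdir(target))
```

This form also avoids the small race between checking for the directory and creating it, since the existence check happens inside the single `os.makedirs` call.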

Business Intelligence

by Diego Calvo | Jul 18, 2017 | Business Intelligence

Business intelligence is the set of strategies, applications, data, products, technologies, and technical architecture aimed at obtaining and managing knowledge through the analysis of an organization's existing data. It should be noted that the creation of a data...