Temporal Evolution of Big Data

2003 – Google File System.
2004 – MapReduce: Simplified Data Processing on Large Clusters.
2005 – Doug Cutting starts developing Hadoop.
2006 – Yahoo starts working on Hadoop.
2008 – Hadoop...

RDD definition

A Resilient Distributed Dataset (RDD) represents an immutable, partitioned collection of elements that can be operated on in parallel. An RDD can be created either by parallelizing a collection of data (list, dictionary, ...) or by loading it from an external storage...
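The three properties named in the definition (immutable, partitioned, operated on in parallel) can be illustrated with a small toy model. This is a minimal sketch in plain Python, not the Spark API: the `ToyRDD` class and its methods are hypothetical names chosen for illustration, and threads stand in for the cluster nodes a real RDD would be spread across.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class ToyRDD:
    """Hypothetical toy model of an RDD: an immutable, partitioned collection."""

    def __init__(self, partitions):
        # Tuples make every partition read-only once the object exists.
        self._partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        # Split a local collection into contiguous chunks (the partitions).
        items = list(data)
        size = max(1, -(-len(items) // num_partitions))  # ceiling division
        return cls(items[i:i + size] for i in range(0, len(items), size))

    def map(self, fn):
        # Each partition is transformed independently of the others;
        # the result is a new ToyRDD and the original is left untouched.
        with ThreadPoolExecutor() as pool:
            mapped = list(
                pool.map(lambda part: [fn(x) for x in part], self._partitions)
            )
        return ToyRDD(mapped)

    def reduce(self, fn):
        # Combine within each partition first, then merge the partial results.
        partials = [reduce(fn, part) for part in self._partitions if part]
        return reduce(fn, partials)

    def collect(self):
        # Gather all partitions back into one local list.
        return [x for part in self._partitions for x in part]


numbers = ToyRDD.parallelize(range(1, 6), num_partitions=2)
squares = numbers.map(lambda x: x * x)
total = squares.reduce(lambda a, b: a + b)

print(squares.collect())  # [1, 4, 9, 16, 25]
print(total)              # 55
print(numbers.collect())  # [1, 2, 3, 4, 5]
```

Because `map` returns a new object instead of modifying the existing one, `numbers` still holds the original values after `squares` is computed, which is the behaviour the word "immutable" refers to.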

Apache HBase

HBase is a column-oriented database management system that runs on top of HDFS and is typically used for sparse data sets. Unlike relational database managers, HBase does not support a structured query language such as SQL. The system provides...
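To make the contrast with SQL concrete, the sketch below models how a column-oriented store is usually addressed: by row key and by a `family:qualifier` column name, through put, get and scan operations rather than queries. It is a toy in-memory model under stated assumptions, not the HBase client API; `ToyTable`, its methods, and the sample row keys are hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of a column-family table."""

    def __init__(self, families):
        self._families = set(families)
        # row key -> {"family:qualifier": value}; a row stores only the
        # columns it actually has.
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Column families are fixed up front; qualifiers are free-form.
        family = column.split(":", 1)[0]
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][column] = value

    def get(self, row_key, column=None):
        # Look up a whole row, or a single cell, by row key.
        row = self._rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)

    def scan(self, start_row="", stop_row=None):
        # Iterate rows in sorted key order over [start_row, stop_row).
        for key in sorted(self._rows):
            if key >= start_row and (stop_row is None or key < stop_row):
                yield key, dict(self._rows[key])


users = ToyTable(["info", "stats"])
users.put("user#001", "info:name", "Ada")
users.put("user#001", "stats:logins", 3)
users.put("user#002", "info:name", "Linus")

print(users.get("user#001", "info:name"))     # Ada
print(users.get("user#002", "stats:logins"))  # None
print(list(users.scan(start_row="user#002")))
```

The second row has no `stats:logins` cell at all, and nothing is stored for it, which is why this layout suits data sets where most columns are empty for most rows.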

Apache Kafka

Apache Kafka is a message brokering system based on the publish/subscribe model. Kafka is considered a persistent, scalable, replicated, and fault-tolerant system. Added to these features is the speed of its reads and writes, which makes it an...
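The publish/subscribe model with persistence can be sketched as an append-only log in which each group of subscribers keeps its own read position. The code below is a simplified in-memory illustration, not the Kafka client API: `ToyTopic`, `publish`, `poll`, and the group names are hypothetical, and partitions, brokers, and replication are deliberately left out.

```python
class ToyTopic:
    """Hypothetical in-memory model of a persistent publish/subscribe topic."""

    def __init__(self):
        self._log = []      # append-only message log
        self._offsets = {}  # subscriber group -> next offset to read

    def publish(self, message):
        # Publishers only append; the offset of the new message is returned.
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, group, max_messages=10):
        # Each group reads from its own position. Reading does not remove
        # messages, so other groups can still read them later.
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


orders = ToyTopic()
for event in ("order-1", "order-2", "order-3"):
    orders.publish(event)

billing_first = orders.poll("billing", max_messages=2)
audit_all = orders.poll("audit")
billing_rest = orders.poll("billing")

print(billing_first)  # ['order-1', 'order-2']
print(audit_all)      # ['order-1', 'order-2', 'order-3']
print(billing_rest)   # ['order-3']
```

Keeping the messages in the log after they are read is what lets the `audit` group see all three events even though `billing` had already consumed two of them.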

Apache NiFi

Apache NiFi is an integrated real-time data processing and logistics platform that automates data movement between different systems quickly, easily, and securely. Apache NiFi is an ETL tool that is responsible for loading data from different sources,...