by Diego Calvo | Jun 29, 2018 | Big Data
Temporal evolution timeline
2003 – Google File System
2004 – MapReduce: Simplified Data Processing on Large Clusters
2005 – Doug Cutting starts developing Hadoop
2006 – Yahoo starts working on Hadoop
2008 – Hadoop...
by Diego Calvo | Jun 27, 2018 | Apache Spark
RDD definition
A resilient distributed dataset (RDD) is an immutable, partitioned collection of elements that can be operated on in parallel. An RDD can be created either by parallelizing a collection of data (list, dictionary, ...) or by loading it from an external storage...
by Diego Calvo | Jun 27, 2018 | Big Data
HBase definition
HBase is a column-oriented database management system that runs on top of HDFS and is typically used for sparse data sets. Unlike relational database management systems, HBase does not support a structured query language such as SQL. The system provides...
by Diego Calvo | Jun 27, 2018 | Big Data
Kafka definition
Apache Kafka is a message broker based on the publish/subscribe model. Kafka is considered a persistent, scalable, replicated, and fault-tolerant system. Added to these features is the speed of its reads and writes, which make it an...
by Diego Calvo | Jun 27, 2018 | Big Data
Definition of NiFi
Apache NiFi is an integrated real-time data processing and logistics platform that automates data movement between different systems quickly, easily, and securely. Apache NiFi is an ETL tool that is responsible for loading data from different sources,...