by Diego Calvo | Jul 5, 2018 | Apache Hadoop, Big Data
The data processing frameworks of the Big Data ecosystem are classified into the following blocks: Batch processing: Hadoop MapReduce, a batch processing engine. Real-time processing: Apache Storm, Apache Samza, IBM InfoSphere, Apache S4 (Yahoo), Apache complexion. Hybrid...
by Diego Calvo | Jul 5, 2018 | Big Data
Storm definition: Apache Storm is a low-latency, high-availability, real-time distributed computing system based on a master-slave architecture. Storm is ideal for data that must be analyzed in real time, where latency is a factor to take into account, an...
by Diego Calvo | Jul 3, 2018 | Apache Hadoop, Big Data
RabbitMQ definition: RabbitMQ is a message queuing (MQ) system that lets many actors communicate in a fast, secure, asynchronous and reliable way. RabbitMQ acts as middleware between message producers and consumers. Features: Guarantees the...
by Diego Calvo | Jul 2, 2018 | Big Data
Flume definition: Apache Flume is a distributed service that reliably and efficiently moves large amounts of data, especially logs. It is ideal for online analytics applications in Hadoop environments. Flume has a simple and flexible architecture based on streaming data,...
by Diego Calvo | Jun 29, 2018 | Big Data
Timeline:
2003 – Google File System
2004 – MapReduce: Simplified Data Processing on Large Clusters
2005 – Doug Cutting starts developing Hadoop.
2006 – Yahoo starts working on Hadoop.
2008 – Hadoop...
by Diego Calvo | Jun 27, 2018 | Big Data
HBase definition: HBase is a column-oriented database management system that runs on top of HDFS and is typically used for sparse data sets. Unlike relational database management systems, HBase does not support a structured query language such as SQL. The system provides...
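The column-oriented, non-SQL data model described in the HBase excerpt can be illustrated with a small in-memory sketch. Everything below (the `ToyColumnTable` class and its `put`, `get` and `scan` methods) is hypothetical and is not the HBase client API; it only assumes the commonly described HBase model in which each cell is addressed by row key, column family, column qualifier and timestamp, and cells that were never written are simply absent. It models the logical layout, not HBase's physical storage.

```python
from collections import defaultdict
import time


class ToyColumnTable:
    """Hypothetical in-memory model of an HBase-style table (not the HBase API)."""

    def __init__(self, families):
        # Column families are declared up front; qualifiers inside them are free-form.
        self.families = set(families)
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self.rows = defaultdict(dict)

    def put(self, row, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time_ns() if ts is None else ts
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        # Keep versions ordered so the newest timestamp comes first.
        versions.sort(key=lambda v: v[0], reverse=True)

    def get(self, row, column):
        # Newest version of the cell, or None if it was never written (sparse data).
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start, stop):
        # Range scan over row keys in sorted order, returning newest values only.
        for row in sorted(self.rows):
            if start <= row < stop:
                yield row, {col: vs[0][1] for col, vs in sorted(self.rows[row].items())}


if __name__ == "__main__":
    t = ToyColumnTable(["info", "stats"])
    t.put("user1", "info:name", "Ana", ts=1)
    t.put("user1", "info:name", "Ana M.", ts=2)
    t.put("user2", "stats:visits", 7, ts=1)
    print(t.get("user1", "info:name"))  # Ana M.
    print(t.get("user2", "info:name"))  # None
    print(list(t.scan("user1", "user3")))
```

Data is reached by row key or by a row-key range rather than through SQL queries, which mirrors the point made in the excerpt; a real deployment would go through an HBase client library instead of a structure like this.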