by Diego Calvo | Jul 5, 2018 | Apache Spark, Big Data
Spark Streaming definition
Apache Spark Streaming is an extension of the Spark core API that processes real-time data in a scalable, high-performance, fault-tolerant manner. Spark Streaming was developed by the University of California at Berkeley,...
by Diego Calvo | Jul 5, 2018 | Apache Hadoop, Big Data
Flink definition
Apache Flink is a native, low-latency data flow processing engine that provides data distribution, communication, and fault tolerance capabilities. Flink was developed in Java and Scala by the Technical University of Berlin and is currently the start-up...
by Diego Calvo | Jul 5, 2018 | Big Data
The main data storage systems for big data ecosystems are:
HDFS: the storage system par excellence of Hadoop.
Apache HBase: a column-oriented database management system that runs on top of HDFS and is typically used for sparse data sets.
S3: Amazon's storage system,...
by Diego Calvo | Jul 5, 2018 | Big Data
Data ingestion tools for big data ecosystems fall into the following groups:
Apache NiFi: an ETL tool that loads data from different sources, passes it through a processing flow, and writes it to another destination.
Apache Sqoop:...
by Diego Calvo | Jul 5, 2018 | Apache Hadoop, Big Data
Data visualization tools for big data ecosystems fall into the following groups:
Notebooks: Jupyter, Zeppelin
Graphic libraries: Google Charts, D3.js, Plotly
Graphic analysis tools: Kibana, Shiny
Video Recorder: Loggy
Proprietary tools: Splunk, Tableau, Qlik, Google...
by Diego Calvo | Jul 5, 2018 | Big Data
Messaging systems provide a communication channel between applications in the big data ecosystem. These systems usually implement message queues, such as:
Apache Kafka: a message broker based on the publish/subscribe model.
RabbitMQ: Message Queuing...