Big Data Archives - Page 4 of 8

Spark Streaming (Batch & Streaming processing)

by Diego Calvo | Jul 5, 2018 | Apache Spark, Big Data

Spark Streaming definition: Apache Spark Streaming is an extension of the Spark core API that provides scalable, high-performance, fault-tolerant processing of real-time data. Spark Streaming was developed by the University of California at Berkeley,...

Apache Flink (batch & streaming processing)

by Diego Calvo | Jul 5, 2018 | Apache Hadoop, Big Data

Flink definition: Apache Flink is a native, low-latency dataflow processing engine that provides data distribution, communication, and fault tolerance capabilities. Flink was developed in Java and Scala by the Technical University of Berlin and is currently the start-up...

Massive data storage systems – Big data

by Diego Calvo | Jul 5, 2018 | Big Data

The main data storage systems for big data ecosystems are: HDFS: the storage system par excellence of Hadoop. Apache HBase: a column-oriented database management system that runs on top of HDFS and is typically used to distribute data sets. S3: Amazon's storage system,...

Data ingestion Tools – Big data

by Diego Calvo | Jul 5, 2018 | Big Data

Data ingestion tools for big data ecosystems are classified into the following blocks: Apache NiFi: an ETL tool that loads data from different sources, passes it through a processing flow, and writes it to another destination. Apache Sqoop:...
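
The load, process, and write flow described for an ETL tool can be illustrated with a minimal sketch in plain Python. This is not NiFi's API: the function names (`extract`, `transform`, `load`), the sample CSV, and the in-memory sink are all assumptions made up for illustration.

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator, List


def extract(csv_text: str) -> Iterator[Dict[str, str]]:
    """Read rows from a CSV source (an in-memory string in this sketch)."""
    yield from csv.DictReader(io.StringIO(csv_text))


def transform(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Drop rows with no reading, trim names, and convert readings to floats."""
    for row in rows:
        if not row["temp_c"]:
            continue  # skip incomplete records
        yield {"sensor": row["sensor"].strip(), "temp_c": float(row["temp_c"])}


def load(records: Iterable[Dict[str, object]], sink: List[str]) -> int:
    """Write each record to the destination as a JSON line; return the count."""
    count = 0
    for record in records:
        sink.append(json.dumps(record))
        count += 1
    return count


# Hypothetical sample data: three rows, one of them missing its reading.
SOURCE = "sensor,temp_c\n a1 ,21.5\nb2,\nc3,19.0\n"

sink: List[str] = []
loaded = load(transform(extract(SOURCE)), sink)
```

Because each stage is a generator, records flow through one at a time, which loosely mirrors how a flow-based tool passes data between processing steps.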

Big data – Data visualization tools

by Diego Calvo | Jul 5, 2018 | Apache Hadoop, Big Data

Data visualization tools for big data ecosystems are classified into the following blocks: Notebooks: Jupyter, Zeppelin. Graphics libraries: Google Charts, D3.js, Plotly. Graphical analysis tools: Kibana, Shiny. Video Recorder: Loggy. Proprietary tools: Splunk, Tableau, Qlik, Google...

Messaging Systems – Big data

by Diego Calvo | Jul 5, 2018 | Big Data

Messaging systems provide a communication channel between applications in the big data ecosystem. These systems usually implement queues, such as: Apache Kafka: a message broker based on the publisher/subscriber model. RabbitMQ: Message Queuing...
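
To make the publisher/subscriber model concrete, here is a toy in-memory sketch in Python. It is not the Kafka or RabbitMQ client API: the `Broker` class, the topic names, and the messages are hypothetical, and a real broker would add persistence, partitioning, and network transport.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Toy broker: routes each published message to every subscriber of its topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        """Register a callback that is invoked for each message on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to all subscribers; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# Two independent consumers that simply collect what they receive.
received_by_a: List[str] = []
received_by_b: List[str] = []

broker = Broker()
broker.subscribe("events", received_by_a.append)
broker.subscribe("events", received_by_b.append)

delivered = broker.publish("events", "sensor-1: 21.5")
undelivered = broker.publish("other", "nobody listening")
```

The publisher only knows the topic name, never the consumers, which is the decoupling that lets applications in the ecosystem be added or removed independently.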