by Diego Calvo | Jul 5, 2018 | Apache Hadoop, Big Data
Flink definition
Apache Flink is a native, low-latency dataflow processing engine that provides data distribution, communication, and fault tolerance capabilities. Flink was developed in Java and Scala by the Technical University of Berlin and is currently the start-up...
by Diego Calvo | Jul 5, 2018 | Trick
Security Tools
Apache Ranger is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform.
Apache Sentry is a system for enforcing fine-grained, role-based authorization to data and metadata stored in a Hadoop...
by Diego Calvo | Jul 5, 2018 | Big Data
The main data storage systems for big data ecosystems are:
HDFS: The storage system par excellence of Hadoop.
Apache HBase: A column-oriented database management system that runs on top of HDFS and is typically used for sparse data sets.
S3: Amazon's storage system,...
by Diego Calvo | Jul 5, 2018 | Big Data
Data ingest tools for big data ecosystems are classified into the following blocks:
Apache NiFi: An ETL tool that loads data from different sources, passes it through a processing flow, and writes it to another destination.
Apache Sqoop:...
by Diego Calvo | Jul 5, 2018 | Apache Hadoop, Big Data
Data visualization tools for big data ecosystems are classified into the following blocks:
Notebooks: Jupyter, Zeppelin
Graphic libraries: Google Charts, D3.js, Plotly
Graphic analysis tools: Kibana, Shiny
Video Recorder: Loggly
Proprietary tools: Splunk, Tableau, Qlik, Google...