Diego Calvo, Author at Diego Calvo

Apache Flink (batch & streaming processing)

by Diego Calvo | Jul 5, 2018 | Apache Hadoop, Big Data

Flink definition: Apache Flink is a native, low-latency dataflow processing engine that provides data distribution, communication, and fault tolerance capabilities. Flink was developed in Java and Scala by the Technical University of Berlin and is currently the start-up...

Big data – security tools, machine learning, labelling,…

by Diego Calvo | Jul 5, 2018 | Trick

Security tools: Apache Ranger is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. Apache Sentry is a system for enforcing fine-grained, role-based authorization to data and metadata stored in a Hadoop...

Massive data storage systems – Big data

by Diego Calvo | Jul 5, 2018 | Big Data

The main data storage systems for big data ecosystems are: HDFS: the storage system par excellence of Hadoop. Apache HBase: a column-oriented database management system that runs on top of HDFS and is typically used for sparse data sets. S3: Amazon's storage system,...

Data ingestion tools – Big data

by Diego Calvo | Jul 5, 2018 | Big Data

Data ingestion tools for big data ecosystems are classified into the following blocks: Apache NiFi: an ETL tool that loads data from different sources, passes it through a processing flow, and writes it to another destination. Apache Sqoop:...

Big data – Data visualization tools

by Diego Calvo | Jul 5, 2018 | Apache Hadoop, Big Data

Data visualization tools for big data ecosystems are classified into the following blocks: Notebooks: Jupyter, Zeppelin. Graphics libraries: Google Charts, D3.js, Plotly. Graphical analysis tools: Kibana, Shiny. Video recorder: Loggy. Proprietary tools: Splunk, Tableau, Qlik, Google...