Big Data definition
The term Big Data refers to volumes of data that exceed the capacity of the software commonly used to capture, manage, and process data.
As computing capacity grows, the size above which a dataset is considered Big Data keeps rising; in 2012 this boundary had already been set at around 12 terabytes.
All of this would not have been possible without Hadoop, a framework that can be deployed on relatively low-cost hardware to analyze large volumes of data of the three existing types (structured, semi-structured, and unstructured).
It is also worth noting the growing importance of Spark as an evolution of Hadoop.
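The shift from Hadoop MapReduce to Spark is easiest to see in code. Below is a minimal sketch, assuming PySpark is installed and a local text file named input.txt exists (the file name and local setup are illustrative assumptions, not from the article); it counts word occurrences using the same map/reduce pattern that Hadoop popularized.

```python
# Minimal word-count sketch with Spark; "input.txt" is an assumed local file.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()

# Read the file as an RDD of lines and count word occurrences in parallel,
# following the classic map/reduce pattern.
counts = (
    spark.sparkContext.textFile("input.txt")
    .flatMap(lambda line: line.split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)

# Print a small sample of the results.
for word, n in counts.take(10):
    print(word, n)

spark.stop()
```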
Time evolution of Big Data tools
Big Data processing
The processing of large volumes of data is divided into several phases; each one is listed below together with the most important tools for it (a short Spark sketch illustrating the processing and storage phases follows the list):
Ingestion
Transformation
Processing
- MapReduce, Storm, Samza, IBM InfoSphere, S4, complexion, Spark Streaming, Flink,…
Storage
- HDFS, HBase, S3, Kudu, Elasticsearch, Cassandra, MongoDB, MariaDB,…
Visualization
- Jupyter, Zeppelin, Google Charts, D3.js, Plotly, Kibana, Shiny, Recorda, Loggly, Splunk, Tableau, Qlik, Google Cloud Platform, Power BI,…
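As a hedged illustration of the processing and storage phases above, here is a small Spark Structured Streaming sketch. The socket source on localhost:9999 (for example fed with `nc -lk 9999`) and the console sink are illustrative assumptions; in a real pipeline the sink would be one of the storage systems listed, such as HDFS, Elasticsearch, or Cassandra.

```python
# Sketch of a streaming word count; the socket source and console sink are
# assumed for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamingPipeline").getOrCreate()

# Processing phase: consume a stream of text lines as they arrive.
lines = (
    spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()
)

# Split each line into words and keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Storage/visualization phase: here the running counts are simply printed to
# the console; a real pipeline would write to HDFS, Elasticsearch, etc.
query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")
    .start()
)
query.awaitTermination()
```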
Big Data management Platforms
Commercial platforms
- Cloudera
- Hortonworks
- MapR
- Pivotal
Commercial platforms in the cloud
- Amazon Web Services
- Microsoft Azure
- Google Cloud Platform
- IBM InfoSphere