Diego Calvo, Author at Diego Calvo

File formats – Big Data

by Diego Calvo | Jul 19, 2018 | Big Data

Format: Textfile The Textfile format is the simplest storage format of all and is the default for tables in Hadoop systems. It is plain text in which fields are separated by a delimiter and each record occupies its own line. Within this format...
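
As a rough illustration of the layout described in this excerpt (delimiter-separated fields, one record per line), here is a minimal Python sketch. The sample data, the "|" delimiter, and the function name parse_textfile are assumptions made for the example and do not come from the original post.

```python
import csv
import io

# Hypothetical sample: three fields per record, "|" as the field delimiter,
# and one record per line. A real table may use a different delimiter.
SAMPLE = "1|Alice|Madrid\n2|Bob|Lisbon\n"


def parse_textfile(text, delimiter="|"):
    """Split delimited plain text into a list of records (lists of fields)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip empty lines so a trailing newline does not produce an empty record.
    return [row for row in reader if row]


records = parse_textfile(SAMPLE)
for record in records:
    print(record)
```

Every field comes back as a string, since a plain-text file carries no type information; any typing has to be applied by whatever reads the file.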

Apache Sqoop

by Diego Calvo | Jul 6, 2018 | Big Data

Sqoop definition Apache Sqoop is a command-line tool developed to transfer large volumes of data from relational databases to Hadoop, hence its name, which comes from merging SQL and Hadoop. Specifically, it transforms relational data into Hive or HBase in one direction...

Cluster management tools – Big Data

by Diego Calvo | Jul 5, 2018 | Trick

Big Data application and resource managers Hadoop MapReduce is a distributed resource manager and data processing framework. It provides a scheduling infrastructure and algorithms for performing distributed calculations. YARN is a data operating system and...

Massive data search tools – Big Data

by Diego Calvo | Jul 5, 2018 | Big Data

Elasticsearch is a real-time, open-source search server for massive data that provides indexed, distributed, Lucene-based storage. It offers the full power of Lucene for full-text search but simplifies queries through its RESTful web interface. Apache Solr is a...

Spark Streaming (Batch & Streaming processing)

by Diego Calvo | Jul 5, 2018 | Apache Spark, Big Data

Spark Streaming definition Apache Spark Streaming is an extension of the Spark core API that supports real-time data processing in a scalable, high-performance, fault-tolerant manner. Spark Streaming was developed at the University of California, Berkeley,...