by Diego Calvo | Jun 27, 2018 | Big Data
Kafka definition: Apache Kafka is a message brokering system based on the publisher/subscriber model. Kafka is considered a persistent, scalable, replicated, and fault-tolerant system. Added to these features is the speed of its reads and writes, which make it an...
by Diego Calvo | Jun 27, 2018 | Big Data
Definition of NiFi: Apache NiFi is an integrated real-time data processing and logistics platform that automates data movement between different systems quickly, easily, and securely. Apache NiFi is an ETL tool that is responsible for loading data from different sources,...
by Diego Calvo | Jun 22, 2018 | Apache Spark, Big Data, Scala-example
IF: Example of a conditional that determines whether a grade is a pass or a fail
var x = 6
if (x >= 5) { println("approved") } else { println("failed") }
x: Int = 6
approved
FOR: Example of using "for" where...
by Diego Calvo | Jun 22, 2018 | Big Data
Elasticsearch definition: Elasticsearch is an open-source real-time search server that provides indexed and distributed storage based on Lucene. It provides all of Lucene's search power for full-text searches, but simplifies queries through its RESTful web interface....
by Diego Calvo | Jun 20, 2018 | Big Data
Metric           Scala      Java       Python        R
Type             Compiled   Compiled   Interpreted   Interpreted
Based on JVM     Yes        Yes        No            No
Cumbersome       (-)        (+)        (-)           (-)
Length of code   (-)        (+)        (-)           (-)
Productivity     (+)        (-)        (+)           (+)
Scalability      (+)        (+)        (-)...
by Diego Calvo | Jun 20, 2018 | Apache Spark, Big Data
Spark definition: Apache Spark is a free-software distributed computing system that processes large data sets across a set of machines simultaneously, providing horizontal scalability and fault tolerance. To deliver these features, it provides a program...