Big Data Archives

Scala Dataset

by Diego Calvo | Jul 21, 2018 | Apache Spark, Big Data, Scala-example

Creating datasets

RDD Simple to Dataset

Example of creating a dataset from an RDD:

    val rdd = sc.parallelize(List(1,2,3,4,5))
    val ds = spark.createDataset(rdd)
    ds.show()

    +-----+
    |value|
    +-----+
    |    1|
    |    2|
    |    3|
    |    4|
    |    5|
    +-----+

Classes to...
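The snippet in this excerpt relies on the `sc` and `spark` values that spark-shell predefines. Outside the shell, the same idea might look roughly like the sketch below. The object name `DatasetFromRddSketch`, the helper `intsToDataset`, and the local master setting are assumptions made for illustration; they are not taken from the post.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object DatasetFromRddSketch {

  // Hypothetical helper: wraps the RDD -> Dataset step from the excerpt.
  def intsToDataset(spark: SparkSession, values: Seq[Int]): Dataset[Int] = {
    // Brings the implicit Encoder[Int] into scope, which createDataset requires.
    import spark.implicits._
    val rdd = spark.sparkContext.parallelize(values)
    spark.createDataset(rdd)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for a quick experiment.
    val spark = SparkSession.builder()
      .appName("dataset-from-rdd")
      .master("local[*]")
      .getOrCreate()
    try {
      // Prints a single-column table named "value", as in the excerpt.
      intsToDataset(spark, List(1, 2, 3, 4, 5)).show()
    } finally {
      spark.stop()
    }
  }
}
```

In spark-shell the import of `spark.implicits._` happens automatically, which is why the original two-line version works there without it.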

Scala Lists

by Diego Calvo | Jul 20, 2018 | Apache Spark, Big Data, Scala-example

Create lists

Examples that define the lists used in the rest of the post:

    val list1 = 1::2::3::4::5::Nil
    val list2 = List(1,2,3,4,5)
    val list3 = List.range(1,6)
    val list4 = List.range(1,6,2)
    val list5 = List.fill(5)(1)
    val list6 =...
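To make clear what the first five definitions evaluate to, here is a small standalone sketch. The object and value names are illustrative choices, and the truncated `list6` is deliberately left out.

```scala
object ListSketch {
  // Cons operator: prepends each element onto the empty list Nil.
  val consList: List[Int] = 1 :: 2 :: 3 :: 4 :: 5 :: Nil
  // Factory call with explicit elements.
  val literalList: List[Int] = List(1, 2, 3, 4, 5)
  // range(start, end): the end bound is exclusive.
  val rangeList: List[Int] = List.range(1, 6)
  // range(start, end, step): every second value from 1 up to (not including) 6.
  val steppedList: List[Int] = List.range(1, 6, 2)
  // fill(n)(elem): n copies of the same element.
  val filledList: List[Int] = List.fill(5)(1)

  def main(args: Array[String]): Unit = {
    println(consList)    // List(1, 2, 3, 4, 5)
    println(literalList) // List(1, 2, 3, 4, 5)
    println(rangeList)   // List(1, 2, 3, 4, 5)
    println(steppedList) // List(1, 3, 5)
    println(filledList)  // List(1, 1, 1, 1, 1)
  }
}
```

The first three definitions are three routes to the same list; which one to use is mostly a matter of readability.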

HDFS – compress & decompress in Scala

by Diego Calvo | Jul 20, 2018 | Apache Spark, Big Data, Scala-example

A series of examples of compressing and decompressing files, covering different file formats and different compression codecs.

Compress Json Files

    val rdd = sc.parallelize( Array(1, 2, 3, 4, 5) ) // Define RDD
    val df = rdd.toDF() // df transform...
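The excerpt cuts off before the write step, so the following is only a guess at how a compressed JSON round trip could be done with Spark's standard `DataFrameWriter` and `DataFrameReader`. The gzip codec, the column name `value`, the output path, and the names `GzipJsonSketch` and `roundTrip` are all assumptions; the original post may use other codecs or an HDFS path.

```scala
import org.apache.spark.sql.SparkSession

object GzipJsonSketch {

  // Hypothetical helper: writes the values as gzip-compressed JSON,
  // then reads them back to show that decompression is transparent.
  // Assumes `values` is non-empty so a schema can be inferred on read.
  def roundTrip(spark: SparkSession, values: Seq[Int], path: String): Seq[Long] = {
    import spark.implicits._

    // Define the RDD and turn it into a single-column DataFrame.
    val df = spark.sparkContext.parallelize(values).toDF("value")

    // The "compression" option selects the codec for the output files.
    df.write.mode("overwrite").option("compression", "gzip").json(path)

    // Spark picks the codec from the file extension when reading.
    // JSON integers are inferred as Long.
    spark.read.json(path).as[Long].collect().sorted.toSeq
  }
}
```

On a cluster the `path` would typically be an `hdfs://` location; a local directory is enough to try it out.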

File formats – Big Data

by Diego Calvo | Jul 19, 2018 | Big Data

Format: Textfile. The Textfile format is the simplest storage format of all and is the default for tables in Hadoop systems. It is plain text in which the fields are separated by a delimiter and each record occupies its own line. Within this format...
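As a rough illustration of what "fields separated by a delimiter, one record per line" means in practice, the sketch below parses one such line in plain Scala. The `Person` record, its three fields, and the `|` delimiter are made up for the example; real tables define their own columns and delimiter.

```scala
import scala.util.Try

object TextfileSketch {
  // Hypothetical record layout: id|name|city
  final case class Person(id: Int, name: String, city: String)

  // Parses one line; returns None when the field count or the id is invalid.
  def parseLine(line: String, delimiter: Char = '|'): Option[Person] =
    line.split(delimiter) match {
      case Array(id, name, city) =>
        Try(id.trim.toInt).toOption.map(Person(_, name, city))
      case _ => None
    }

  // Parses a whole file body: one record per line, bad lines are skipped.
  def parseAll(text: String): List[Person] =
    text.split("\n").toList.flatMap(line => parseLine(line))
}
```

Because the format carries no schema or types of its own, all of the validation has to live in the reader, which is the main trade-off of plain text storage.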

Apache Sqoop

by Diego Calvo | Jul 6, 2018 | Big Data

Sqoop definition. Apache Sqoop is a command-line tool developed to transfer large volumes of data from relational databases to Hadoop, hence its name, which comes from merging SQL and Hadoop. Specifically, it transfers relational data into Hive or HBase in one direction...
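Since Sqoop is driven from the command line, one way to use it from Scala is to assemble the argument list and hand it to a process runner. The sketch below only builds the arguments for a relational-to-Hive import. The helper name, the JDBC URL, the user, and the table in the usage note are placeholders, and password handling is intentionally omitted.

```scala
object SqoopCommandSketch {

  // Hypothetical helper: builds the arguments for `sqoop import` into Hive.
  def importToHive(jdbcUrl: String, user: String, table: String, mappers: Int): Seq[String] =
    Seq(
      "sqoop", "import",
      "--connect", jdbcUrl,               // JDBC URL of the source database
      "--username", user,
      "--table", table,                   // source table to copy
      "--hive-import",                    // load the result into Hive
      "--num-mappers", mappers.toString   // degree of parallelism
    )
}
```

On a machine where `sqoop` is installed, a call such as `SqoopCommandSketch.importToHive("jdbc:mysql://dbhost/shop", "etl", "orders", 4)` returns a sequence that can be executed with `scala.sys.process` (`import scala.sys.process._` followed by `cmd.!`).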

Massive data search tools – Big data

by Diego Calvo | Jul 5, 2018 | Big Data

Elasticsearch is a real-time, open-source search server for massive data sets that provides indexed, distributed storage based on Lucene. It offers the full power of Lucene for full-text search but simplifies queries through its RESTful web interface. Apache Solr is a...
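To give an idea of what querying through the REST interface looks like, this sketch builds (but does not send) a URI search request with the JDK's HTTP client classes, which require Java 11 or later. The base URL, the index name `posts`, and the query `title:spark` in the usage note are invented for the example.

```scala
import java.net.{URI, URLEncoder}
import java.net.http.HttpRequest
import java.nio.charset.StandardCharsets

object EsSearchSketch {

  // Hypothetical helper: GET <base>/<index>/_search?q=<lucene query string>
  def searchRequest(baseUrl: String, index: String, query: String): HttpRequest = {
    // The query uses Lucene query-string syntax and must be URL-encoded.
    val encoded = URLEncoder.encode(query, StandardCharsets.UTF_8)
    HttpRequest.newBuilder(URI.create(s"$baseUrl/$index/_search?q=$encoded"))
      .GET()
      .build()
  }
}
```

For instance, `EsSearchSketch.searchRequest("http://localhost:9200", "posts", "title:spark")` targets `http://localhost:9200/posts/_search?q=title%3Aspark`. Sending it with `java.net.http.HttpClient` would return the matching documents as JSON, assuming a server is listening at that address.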