Sqoop definition
Apache Sqoop is a command line tool developed to transfer large volumes of data from databases to relate to Hadoop, hence its name that comes from the merger of SQL and Hadoop. Specifically transforms data relating to Hive or Hbase in one direction and the other from HDFS to relational data such as MySQL, Oracle, Postgress or a data warehouse.
The transfer process consists of reading row by row each table in the database and importing them to HDFS, the output of these is a set of files that can be in CSV, Avro, binary or sequence format.
Sqoop2 has recently been created to improve the usability of Sqoop, providing a Web application and greater integration with Oozie.
Features
- Provides a Java API to perform ingest processing. Allowing to program applications that perform some type of transformation on the data.
- Provides commands for listing tables and schemas.
- Supports incremental data loads.
- Provides many connectors such as FTP, JDBC, Kafka, Kite, SFTP
Example of how to pass data between MySQL and HDFS using Sqoop
Prerequisites
Have installed Hortonworks (Virtual Box) for Spark
Creating a MySQL database table
A database is created with a table on which to perform the tests, the following commands will be used.
accessing MYSQL
$ mysql
mysql -u root-P
Note: Keep in mind that the MySQL and system wash can be different.
Consult database
mysql> show databases;
Create Database
mysql> create database myddbb;
Use database
mysql> use myddbb;
Create base table
mysql> CREATE TABLE mytable (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
name CHAR (30) NOT NULL,
age INTEGER (30),
salary INTEGER (30),
PRIMARY KEY (id));
CREATE TABLE 2
mysql> CREATE TABLE mytable2 (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
name CHAR (30) NOT NULL,
age INTEGER (30),
salary INTEGER (30),
PRIMARY KEY (id));
Insert Data
mysql> INSERT INTO mytable (name, age, salary) values
("Peter", 24, 21000), ("Maria", 26, 24000), ("John", 28, 25000), ("Louis", 35, 28000), ("Monica", 42, 30000), ("Rose", 43, 25000), ("Susana", 45, 39000);
Note: If you have permission problems enter MySQL and give all permissions:
grant all privileges on *.* to 'root'@'localhost' IDENTIFIED BY 'MySQL_Key' WITH GRANT OPTION;
Exit MYSQL
mysql> exit;
Load MySQL data to HDFS
Example of loading data from the table “MyTable” of the Database “MIBBDD” to the folder HDFs name “Mitabla_hdfs”
$ sqoop Import \ --connect jdbc:mysql://localhost/myddbb \ --username = root -P \ --table = mytable \ --driver = com.mysql.jdbc.driver \ --target-dir =/my_hdfs_table \ --fields-terminated-by = ',' \ --lines-terminated-by '\n'
Target-dir: File HDFS where it is stored.
Table: Identifies the table to be copied.
Clear-Staging-table: Indicates that past data can be deleted.
VERBOSE: Prints additional information to facilitate debugging.
Fields-terminated-by: defining the delimiter.
Loading data from HDFS to MySQL
Example of loading data from the HDFS folder named “my_hdfs_table” to the “mytable2” table in the “myddbb” database.
$ sqoop export --connect jdbc: mysql://localhost/myddbb --username = root -P --table = mytable2 --export-dir =/my_hdfs_table -m 1
Note: If you have problems “set $ ACCUMULO_HOME to the root in your ACCUMULO intallation”, it can be avoided with:
$ ACCUMULO_HOME = '/var/lib/accumulo' $ Export ACCUMULO_HOME $ sudo mkdir/var/lib/accumulo
To practice with Sqoop then consult: “Examples of Sqoop”, in this section compiled many examples that can serve you useful.
Source: official Shell documentation
0 Comments