Prerequisites
See the section “commands to manipulate HDFS files”
Databases
Before you can use it it is necessary to start the service of Hadoop
And start the service of HIVE to execute commands, to do so find the folder HIVE and once there run:
$ cd hive $ bin/hive
Check Existing DB
hive> SHOW DATABASES;
Create db
Being inside we can execute the sentence to create a database, as an example:
hive> CREATE DATABASE diegocalvo
Select the database to work with
hive> use diegocalvo;
Tables
Before manupular tables of a databases in the same state, indicate to the system that databases to use, using the command “use NOMBRE_BBDD”
See tables
hive> show tables;
Create External Table
But the Hive service is initiated to perform all the steps indicated in the creation of the database and then:
CREATE EXTERNAL TABLE localization (localizationID INT, zone STRING, neighborhood STRING) COMMENT 'location table' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/hadoop/localization';
CREATE EXTERNAL TABLE consumption (TIMESTAMP date, company STRING, consumptionID INT) COMMENT ' Table of consumption' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/hadoop/consumption';
Create internal table for joins
CREATE EXTERNAL TABLE union (date TIMESTAMP, company STRING, ID INT, zone STRING, neighborhood STRING) COMMENT 'table of union between consumption and localization' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS PARQUET
Insert data into internal table
INSERT INTO TABLE union SELECT C.date, C.company, C.ID, L.Zone, L.neighborhood FROM consumption C INNER JOIN location L ON C.consumptionID = L.localizationID
Queries: Counting repeated days of the week
SELECT count(*), date_format(date, 'EEEE') FROM union GROUP BY date_format(date, 'EEEE');
Queries: The 10 most active neighborhoods
SELECT count(*) AS cnt, neighborhood FROM union GROUP BY neighborhood SORT BY cnt DESC LIMIT 10;
Queries: Count how many items for each day of the month
SELECT DayOfMonth(date), count(*) FROM union GROUP BY dayofmonth(date);
Queries: More Active time slots
SELECT hour(date), COUNT(*) as cnt FROM union GROUP BY hour(date) ORDER BY cnt DESC LIMIT 5;
0 Comments