As part of this topic, let us explore the below tools in the lab.
- Sqoop – an import/export tool to copy data between traditional relational databases and HDFS
- Hive – a query tool whose syntax and semantics closely resemble SQL
- Sqoop is used to copy data between traditional databases and HDFS
- It uses MapReduce to connect to the source database over JDBC and copies data one table at a time in a distributed fashion.
- Let us see what a sample Sqoop command looks like, to confirm the tool is available for our usage
- Sqoop requires a source database, such as MySQL, to import data from
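The commands above can be sketched as follows. This is a minimal illustration, assuming a MySQL server reachable over JDBC; the hostname, database name (`retail_db`), table name (`orders`), credentials, and HDFS paths are all placeholders you would replace with your own.

```shell
# Confirm Sqoop is on the PATH and check its version
sqoop version

# List databases on the source MySQL server
# (hostname, username, and password are placeholders)
sqoop list-databases \
  --connect jdbc:mysql://localhost:3306 \
  --username retail_user \
  --password retail_pass

# Import a single table from MySQL into HDFS;
# Sqoop launches MapReduce mappers to copy the rows in parallel
sqoop import \
  --connect jdbc:mysql://localhost:3306/retail_db \
  --table orders \
  --username retail_user \
  --password retail_pass \
  --target-dir /user/hive/warehouse/orders \
  --num-mappers 2
```

Note that `--num-mappers` controls the degree of parallelism: each mapper opens its own JDBC connection and copies a slice of the table.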
- As part of Big Data, we process structured as well as unstructured data
- As SQL is popular for processing structured data, the open-source Hadoop ecosystem community came up with Hive, which follows SQL syntax
- We can create tables, load data into tables, and process data using queries that follow ANSI SQL syntax
- Let us see how to create a table, load data into it, and run a simple query
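The three steps above can be sketched as the following HiveQL session. This is an illustrative example, assuming you are in the Hive shell (or Beeline); the table name, columns, and file path are placeholders, not part of the original notes.

```sql
-- Step 1: create a table (name and columns are illustrative)
CREATE TABLE orders (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- Step 2: load data from a local file into the table
-- (the path below is a placeholder)
LOAD DATA LOCAL INPATH '/data/retail_db/orders' INTO TABLE orders;

-- Step 3: run a simple query, e.g. count orders by status
SELECT order_status, count(*) AS order_count
FROM orders
GROUP BY order_status;
```

Behind the scenes, Hive translates the query into jobs that run on the cluster, so the same SQL-style statement scales to large datasets.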