Let us perform a simple Sqoop import and understand its execution life cycle.
Here is the list of control arguments that need to be used for a simple Sqoop import:
- --target-dir or --warehouse-dir
Execution Life Cycle
Here is the execution life cycle of Sqoop.
- Connect to the source database and get the table metadata
- Generate a Java file from the metadata and compile it into a JAR file
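- Run the bounding values query (logged as BoundingValsQuery; it fetches the minimum and maximum of the split column) to work out the split boundaries; the number of splits defaults to 4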
- Use the split boundaries to issue queries against the source database
- Each thread uses its own connection to issue its query
- Each thread gets a mutually exclusive subset of the data
- Data is written to HDFS in a separate file per thread
- By default, sqoop import fails if the target directory already exists
- The directory can be overwritten by using --delete-target-dir
- Data can be appended to an existing directory by using --append
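
To make the split step in the life cycle more concrete, here is a minimal Python sketch of how a minimum and maximum value can be divided into mutually exclusive ranges, one per thread. It is a simplified illustration, not Sqoop's actual splitter code: the function names `split_ranges` and `split_queries`, the table `orders`, and the column `order_id` are all hypothetical, and it assumes an integer split column.

```python
def split_ranges(min_val, max_val, num_splits=4):
    """Divide the closed range [min_val, max_val] into contiguous,
    non-overlapping ranges (a simplified stand-in for Sqoop's split logic)."""
    if max_val < min_val:
        raise ValueError("max_val must be >= min_val")
    total = max_val - min_val + 1
    # Never create more splits than there are distinct values.
    num_splits = min(num_splits, total)
    base, extra = divmod(total, num_splits)
    ranges = []
    lo = min_val
    for i in range(num_splits):
        # The first `extra` splits each take one additional value.
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def split_queries(table, column, min_val, max_val, num_splits=4):
    """Build one query per split; each thread would run exactly one of them."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} <= {hi}"
        for lo, hi in split_ranges(min_val, max_val, num_splits)
    ]


# Hypothetical table and column names, with MIN = 1 and MAX = 100.
for query in split_queries("orders", "order_id", 1, 100):
    print(query)
```

With a minimum of 1 and a maximum of 100, the default of 4 splits yields the ranges 1 to 25, 26 to 50, 51 to 75, and 76 to 100. No row can fall into two ranges, which is why each thread ends up writing its own file with a distinct subset of the data.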