Apache Flume is an open source tool that captures data generated by web logs in real time and delivers it into the Hadoop ecosystem.
- Distributed and reliable
- It can collect, aggregate and move large amounts of log data
- It is robust, fault tolerant and tunable
- Uses a simple, extensible data model that allows for online analytic applications
- The Flume User Guide will be provided at the time of the certification exam.
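As a sketch of how Flume wires these pieces together, the agent configuration below tails a web server log and streams the events into HDFS. All names here (the agent name `agent1`, the log path, the HDFS directory) are illustrative placeholders, not values from this lesson.

```properties
# Illustrative Flume agent config -- names and paths are placeholders
agent1.sources = weblog-source
agent1.channels = mem-channel
agent1.sinks = hdfs-sink

# Source: tail a web server access log with an exec source
agent1.sources.weblog-source.type = exec
agent1.sources.weblog-source.command = tail -F /var/log/httpd/access_log
agent1.sources.weblog-source.channels = mem-channel

# Channel: buffer events in memory between source and sink
agent1.channels.mem-channel.type = memory
agent1.channels.mem-channel.capacity = 10000

# Sink: write events to date-partitioned HDFS directories
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = /user/flume/weblogs/%Y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-sink.channel = mem-channel
```

An agent like this would be started with `flume-ng agent --conf-file agent1.conf --name agent1`.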
Topics covered in this lesson
The skills needed to transfer data between external systems and your cluster, including:
- Import data from a MySQL database into HDFS using Sqoop
- Export data to a MySQL database from HDFS using Sqoop
- Change the delimiter and file format of data during import using Sqoop
- Ingest real-time and near-real-time (NRT) streaming data into HDFS using Flume
- Load data into and out of HDFS using the Hadoop File System (FS) commands
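The Sqoop and FS-shell tasks above correspond to commands along the following lines. These are hedged sketches: the hostname `dbhost`, database `retail_db`, table names, user, and paths are all placeholders, and the commands assume a cluster with Sqoop and the Hadoop clients installed.

```sh
# Import a MySQL table into HDFS (host/db/table/credentials are placeholders)
sqoop import \
  --connect jdbc:mysql://dbhost/retail_db \
  --username dbuser --password-file /user/dbuser/.password \
  --table orders \
  --target-dir /user/dbuser/orders

# Change the delimiter and file format during import
sqoop import \
  --connect jdbc:mysql://dbhost/retail_db \
  --username dbuser --password-file /user/dbuser/.password \
  --table orders \
  --fields-terminated-by '\t' \
  --as-avrodatafile \
  --target-dir /user/dbuser/orders_avro

# Export HDFS data back into a MySQL table
sqoop export \
  --connect jdbc:mysql://dbhost/retail_db \
  --username dbuser --password-file /user/dbuser/.password \
  --table order_summary \
  --export-dir /user/dbuser/order_summary

# Load data into and out of HDFS with the FS shell
hdfs dfs -put /local/data/orders.csv /user/dbuser/staging/
hdfs dfs -get /user/dbuser/orders/part-m-00000 /local/data/
```

Using `--password-file` rather than `--password` on the command line keeps the database credential out of the shell history; both options exist in Sqoop 1.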