Let us briefly understand Flume.
- It is a data ingestion tool
- It can get log messages from log files, syslog, and many other sources
- No design changes are required in the applications that generate the logs
Documentation and Simple Example
A Flume agent is configured with
- One or more sources
- One or more sinks
- One channel for each sink
In the simple example, we will see a Flume agent configuration with
- One source – netcat (a service listening on a TCP port)
- One sink – logger
- One channel – memory
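
The three components above can be sketched as a Flume properties file. The agent name (a1) and the port (44444) are assumptions, not values from the original text:

```properties
# Name the components of the agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source: netcat, listening on localhost:44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Sink: logger (logs events at INFO level, useful for testing)
a1.sinks.k1.type = logger

# Channel: memory (buffers events in RAM)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Such a configuration is typically started with flume-ng, for example: flume-ng agent --name a1 --conf-file example.conf -Dflume.root.logger=INFO,console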
Integration with HDFS – Introduction
HDFS is one of the most common sinks to which Flume agents push data. Several properties can be overridden while pushing data to HDFS.
- File Suffix
- File Prefix
- Roll Properties
- and many more
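
A sketch of an HDFS sink showing these properties (the agent name, channel name, and HDFS path are assumptions for illustration):

```properties
# HDFS sink for agent a1, bound to channel c1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /user/flume/logs/%Y-%m-%d

# File prefix and suffix for the files created in HDFS
a1.sinks.k1.hdfs.filePrefix = FlumeData
a1.sinks.k1.hdfs.fileSuffix = .txt

# Roll properties: close the current file and start a new one
# every 30 seconds, every 1 MB, or every 100 events,
# whichever limit is reached first
a1.sinks.k1.hdfs.rollInterval = 30
a1.sinks.k1.hdfs.rollSize = 1048576
a1.sinks.k1.hdfs.rollCount = 100
```

Setting a roll property to 0 disables that particular rolling criterion.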
Setting up data
Let us see how we can set up data so that simulated log messages are pushed to HDFS. We use the gen_logs application to continuously generate log messages in the /opt/gen_logs/logs/alert.log file.
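
One common way to pick up a continuously growing file like this is Flume's exec source running tail -F. A minimal sketch (the agent and channel names are assumptions):

```properties
# Exec source for agent a1: tail the simulated log file
# and feed each new line into channel c1 as a Flume event
a1.sources = r1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/gen_logs/logs/alert.log
a1.sources.r1.channels = c1
```

Note that the exec source does not guarantee delivery if the agent dies; for production pipelines the Flume documentation recommends more reliable sources such as the spooling directory source.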