This lesson covers the following HDFS topics in detail:
- Describe the function of the HDFS daemons
- Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing.
- Identify current features of computing systems that motivate a system like Apache Hadoop.
- Classify the major goals of HDFS design
- Given a scenario, identify an appropriate use case for HDFS Federation
- Identify the components and daemons of an HDFS HA-Quorum cluster
- Analyze the role of HDFS security (Kerberos)
- Determine the best data serialization choice for a given scenario
- Describe the file read and write paths
- Identify the commands to manipulate files in the Hadoop File System Shell
- HDFS has two main layers:
  - Namespace: manages directories, files and blocks. It supports file system operations such as creating, modifying, deleting and listing files and directories.
  - Block Storage, which has two parts:
    - Block Management (performed by the Namenode) maintains the membership of Datanodes in the cluster. It supports block-related operations such as creating, deleting and modifying blocks and getting their locations. It also takes care of replica placement and replication.
    - Physical Storage (provided by the Datanodes) stores the blocks and provides read/write access to them.
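- Limitations with a typical HDFS configuration (a single namespace managed by a single Namenode)
  - Namespace scalability: the Namenode keeps the entire namespace in memory, so the number of files, directories and blocks the cluster can hold is bounded by the memory of one Namenode.
- Federation is applicable to the Namenode layer and provides:
  - Namenode scalability: the namespace scales horizontally by adding Namenodes.
  - Better performance: file system throughput is no longer limited by a single Namenode.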
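The two-layer split described in the notes above (a namespace layer on top of a block storage layer) can be illustrated with a toy in-memory model. This is a minimal Python sketch, not HDFS code: the class and method names (`Namespace`, `BlockManager`, `Datanode`, `write_block`, and so on) are hypothetical, the round-robin replica placement stands in for HDFS's real rack-aware placement policy, and for brevity the block manager writes replicas itself, whereas in HDFS the client streams data to the Datanodes.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Namespace layer: directories, files, and the block IDs making up each file."""
    directories: set = field(default_factory=lambda: {"/"})
    files: dict = field(default_factory=dict)  # file path -> list of block IDs

    def mkdir(self, path):
        self.directories.add(path)

    def create(self, path, block_ids):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.directories:
            raise FileNotFoundError(parent)
        self.files[path] = list(block_ids)

    def delete(self, path):
        # Return the file's block IDs so the block layer can release them.
        return self.files.pop(path)

    def listing(self, directory):
        # Files directly inside `directory` (not in subdirectories).
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self.files
                      if p.startswith(prefix) and "/" not in p[len(prefix):])


class Datanode:
    """Physical storage: keeps block contents and serves reads and writes."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block ID -> bytes

    def write(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class BlockManager:
    """Block management: Datanode membership, block locations, replica placement."""

    def __init__(self, replication=3):
        self.replication = replication
        self.datanodes = {}  # Datanode name -> Datanode (cluster membership)
        self.locations = {}  # block ID -> names of Datanodes holding a replica
        self._ids = itertools.count(1)

    def register(self, datanode):
        self.datanodes[datanode.name] = datanode

    def allocate_block(self):
        names = list(self.datanodes)
        if len(names) < self.replication:
            raise RuntimeError("not enough Datanodes for the replication factor")
        block_id = next(self._ids)
        # Hypothetical round-robin placement; real HDFS uses a rack-aware policy.
        start = (block_id - 1) % len(names)
        self.locations[block_id] = [names[(start + i) % len(names)]
                                    for i in range(self.replication)]
        return block_id

    def write_block(self, data):
        # Simplification: the manager writes each replica to its chosen Datanode.
        block_id = self.allocate_block()
        for name in self.locations[block_id]:
            self.datanodes[name].write(block_id, data)
        return block_id

    def delete_block(self, block_id):
        for name in self.locations.pop(block_id):
            self.datanodes[name].blocks.pop(block_id, None)


bm = BlockManager(replication=3)
for name in ["dn1", "dn2", "dn3", "dn4"]:
    bm.register(Datanode(name))
ns = Namespace()
ns.mkdir("/logs")
ns.create("/logs/a.txt", [bm.write_block(b"hello")])
print(ns.listing("/logs"))  # ['/logs/a.txt']
print(bm.locations[1])      # ['dn1', 'dn2', 'dn3']
```

The namespace only knows which block IDs make up a file; where those blocks live is the block layer's concern, which is the separation that federation later builds on.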
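The namespace scalability limitation comes down to simple arithmetic: every file, directory and block is an object in the Namenode's memory. The sketch below assumes roughly 150 bytes of Namenode heap per object (a commonly quoted rule of thumb, not an exact figure) and assumes federation splits the namespace evenly across Namenodes; the function names are hypothetical.

```python
# Assumption: ~150 bytes of Namenode heap per file, directory, or block object,
# a commonly quoted rule of thumb; actual usage varies by Hadoop version and JVM.
BYTES_PER_OBJECT = 150


def namespace_heap_bytes(files, directories, blocks, bytes_per_object=BYTES_PER_OBJECT):
    """Approximate heap needed to hold a whole namespace in one Namenode's memory."""
    return (files + directories + blocks) * bytes_per_object


def heap_per_namenode(files, directories, blocks, namenodes):
    """Heap per Namenode if federation splits the namespace evenly (an idealization)."""
    total = namespace_heap_bytes(files, directories, blocks)
    return -(-total // namenodes)  # integer ceiling division


single = namespace_heap_bytes(100_000_000, 10_000_000, 150_000_000)
print(single / 1e9)  # 39.0
print(heap_per_namenode(100_000_000, 10_000_000, 150_000_000, 4) / 1e9)  # 9.75
```

Under these assumed figures, 100 million files, 10 million directories and 150 million blocks need about 39 GB of heap on a single Namenode, or about 9.75 GB each when split across four federated Namenodes.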
HDFS Federation – Implementation
Here are the high-level implementation details of HDFS Federation:
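- Multiple namespaces
- Multiple Namenodes, which are independent and do not coordinate with each other
- The same set of Datanodes for all namespaces
- Block Pool: the set of blocks that belongs to a single namespace; each Datanode stores blocks for all block pools, and each block pool is managed independently
- Namespace Volume (a Block Pool and its associated namespace)
  - Self-contained unit of management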
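The block pool and namespace volume ideas above can be sketched as a small model. This is a simplified Python illustration under stated assumptions, not the HDFS implementation: the pool IDs `BP-1`/`BP-2` and all class and function names are hypothetical, and each write places a replica on every listed Datanode. It shows two independent Namenodes sharing the same Datanodes, each allocating block IDs within its own block pool, and one namespace volume being deleted as a self-contained unit.

```python
from dataclasses import dataclass, field


@dataclass
class Datanode:
    """Shared Datanode: stores blocks for every block pool in the cluster."""
    name: str
    pools: dict = field(default_factory=dict)  # block pool ID -> {block ID -> bytes}

    def store(self, pool_id, block_id, data):
        self.pools.setdefault(pool_id, {})[block_id] = data

    def drop_pool(self, pool_id):
        self.pools.pop(pool_id, None)


@dataclass
class Namenode:
    """One federated Namenode: owns one namespace and one block pool."""
    name: str
    pool_id: str  # hypothetical ID; real block pool IDs are generated by HDFS
    files: dict = field(default_factory=dict)  # path -> list of block IDs
    next_block_id: int = 1

    def write_file(self, path, data, datanodes):
        # Block IDs are allocated within this pool only, so no cross-Namenode coordination.
        block_id = self.next_block_id
        self.next_block_id += 1
        for dn in datanodes:  # simplification: a replica on every given Datanode
            dn.store(self.pool_id, block_id, data)
        self.files[path] = [block_id]


def delete_namespace_volume(namenode, datanodes):
    """Delete a namespace volume: its namespace and its block pool on every Datanode."""
    namenode.files.clear()
    for dn in datanodes:
        dn.drop_pool(namenode.pool_id)


datanodes = [Datanode("dn1"), Datanode("dn2")]
nn1 = Namenode("nn1", "BP-1")
nn2 = Namenode("nn2", "BP-2")
nn1.write_file("/user/a.txt", b"a", datanodes)
nn2.write_file("/data/b.txt", b"b", datanodes)
print(datanodes[0].pools)  # {'BP-1': {1: b'a'}, 'BP-2': {1: b'b'}}
delete_namespace_volume(nn2, datanodes)
print(datanodes[0].pools)  # {'BP-1': {1: b'a'}}
```

Because each Datanode keys blocks by block pool ID, both Namenodes can hand out block ID 1 without coordinating, and dropping one pool leaves the other untouched.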