In the next sections, we will cover how to set up Hadoop core components such as HDFS and YARN (MRv2) using Cloudera Manager.
The following are the steps for installing HDFS.
- Log in to Cloudera Manager (http://cdhnew0:7180)
- Default user/password: admin/admin
- Click on Add Service
- Select service “HDFS”
- Assign the Namenode, Secondary Namenode, Datanode and client (gateway) roles to specific hosts
- Next, review and, if needed, change important parameters such as the block size and the Namenode and Datanode directories
- Install the configured components. Cloudera Manager performs the following steps:
- Deploy client configurations – E.g.: core-site.xml, hdfs-site.xml
- Create the mount point directories that are configured
- Format Namenode
- Start HDFS and create some directories (/tmp)
- The HDFS components should now be installed and started successfully
- Validate by copying a file into HDFS from a client/gateway node
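The validation step above can be sketched as a small Python script that shells out to the `hdfs dfs` client on a gateway node. This is only a sketch under assumptions: the local file, the HDFS target directory (`/tmp`, created during installation) and the helper names `build_validation_commands` and `validate_hdfs` are all illustrative and not part of Cloudera Manager.

```python
import shutil
import subprocess


def build_validation_commands(local_file, hdfs_dir="/tmp"):
    """Return the `hdfs dfs` commands that copy a file into HDFS and list it.

    Both arguments are illustrative: use any small local file and any
    HDFS directory you are allowed to write to.
    """
    file_name = local_file.rstrip("/").split("/")[-1]
    target = hdfs_dir.rstrip("/") + "/" + file_name
    return [
        ["hdfs", "dfs", "-put", "-f", local_file, target],  # copy local -> HDFS
        ["hdfs", "dfs", "-ls", target],                     # confirm it is there
    ]


def validate_hdfs(local_file, hdfs_dir="/tmp"):
    """Run the commands; raises CalledProcessError if any of them fails."""
    if shutil.which("hdfs") is None:
        raise RuntimeError("hdfs client not found; run this on a gateway node")
    for cmd in build_validation_commands(local_file, hdfs_dir):
        subprocess.run(cmd, check=True)


# Example call on a gateway node (assumes /etc/hosts exists there):
# validate_hdfs("/etc/hosts")
```

If both commands exit without error, the client configuration was deployed correctly and the Namenode and at least one Datanode accepted the write.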
Components of the HDFS ecosystem
- We need to configure the Namenode, Secondary Namenode, Datanodes and Balancer as part of the HDFS service. Here the Namenode and Secondary Namenode act as masters and the Datanodes act as slaves.
- Datanodes send heartbeats to the Namenode at regular intervals. Each heartbeat also carries information about the storage available on that Datanode.
- The Balancer is a service that tracks how full each Datanode is and moves blocks between them so that data is distributed evenly across all the nodes.
- Namenode Web Interface : http://<namenode-server>:50070
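The information that Datanodes report through heartbeats can also be read programmatically. The sketch below assumes the Namenode web port exposes its metrics as JSON under `/jmx` and that the `FSNamesystemState` bean carries the attributes used here, as it does on the Hadoop 2.x releases I am aware of. Verify the bean and attribute names on your own cluster. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Bean name assumed from Hadoop 2.x; check http://<namenode-server>:50070/jmx
JMX_QUERY = "Hadoop:service=NameNode,name=FSNamesystemState"


def jmx_url(namenode_host, port=50070):
    """Build the JMX query URL on the Namenode web interface."""
    return "http://{}:{}/jmx?qry={}".format(namenode_host, port, JMX_QUERY)


def summarize(jmx_payload):
    """Extract Datanode counts and free capacity from a parsed /jmx response."""
    bean = jmx_payload["beans"][0]
    return {
        "live_datanodes": bean["NumLiveDataNodes"],
        "dead_datanodes": bean["NumDeadDataNodes"],
        "capacity_remaining_gb": round(bean["CapacityRemaining"] / 1024 ** 3, 1),
    }


def fetch_summary(namenode_host, port=50070):
    """Query a running Namenode and return the summary dictionary."""
    with urlopen(jmx_url(namenode_host, port), timeout=10) as resp:
        return summarize(json.load(resp))
```

A Datanode that stops sending heartbeats would eventually show up in the `dead_datanodes` count, which makes this a simple health check after installation.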
Features of HDFS
- Fault tolerant – HDFS keeps multiple copies (replicas) of each block, and dfs.replication controls how many copies are made
- Logical Distributed File System – Files are divided into blocks based on dfs.blocksize and stored on the servers designated as Datanodes.
- Rack Awareness – HDFS is rack aware: in a 2-rack Hadoop cluster with a replication factor of 2 or more, at least one copy of each block is placed on each rack
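To make the effect of dfs.blocksize and dfs.replication concrete, here is a small worked calculation. It assumes a block size of 128 MiB and a replication factor of 3, which are the usual Hadoop 2.x defaults. Substitute the values configured on your cluster. The function name `block_layout` is hypothetical.

```python
import math

MIB = 1024 ** 2


def block_layout(file_size, block_size=128 * MIB, replication=3):
    """Return (blocks, block_replicas, raw_bytes) for a single file.

    The last block only occupies as much disk as the data it holds, so raw
    usage is file_size * replication rather than blocks * block_size.
    """
    blocks = math.ceil(file_size / block_size)
    return blocks, blocks * replication, file_size * replication


# A 1 GiB file with the assumed defaults: 8 blocks, 24 replicas, 3 GiB raw.
print(block_layout(1024 * MIB))  # → (8, 24, 3221225472)
```

A 200 MiB file under the same assumptions is split into 2 blocks (128 MiB and 72 MiB) and consumes 600 MiB of raw storage across the Datanodes.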