Once you have a cluster up and running, you are responsible for managing it and keeping it available at all times. A single point of failure (SPOF) is any component in the cluster whose failure renders the whole cluster nonfunctional.
Hadoop, being a distributed system, is not only prone to failures but is expected to fail. However, not all Hadoop components currently support high availability configurations. The master nodes, such as the namenode and the jobtracker, are single points of failure. We should plan for namenode HA and jobtracker HA in a standard cluster configuration, since the cluster becomes unavailable in the following cases:
- Unplanned events, such as hardware or software crashes
- Planned events, such as maintenance or upgrades
To manage HDFS efficiently and eliminate these single points of failure in the Hadoop cluster, the following two HA configurations should be considered:
- HDFS high availability, using either Quorum-based storage or shared storage over Network File System (NFS); see the configuration sketch after this list
- Jobtracker high availability
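As an illustration of the first option, the following hdfs-site.xml fragment sketches an HA nameservice backed by quorum-based storage (the Quorum Journal Manager). The nameservice name mycluster and all host names and ports are hypothetical placeholders; for NFS-based shared storage, dfs.namenode.shared.edits.dir would instead point to a file:// path on the shared mount.

```xml
<!-- hdfs-site.xml: minimal HA sketch with quorum-based storage.
     "mycluster" and all host names/ports are example values. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <!-- Two namenodes forming an active/passive pair -->
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>master1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>master2.example.com:8020</value>
</property>
<property>
  <!-- Quorum of JournalNodes to which the active namenode writes its edits -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
<property>
  <!-- Lets HDFS clients locate whichever namenode is currently active -->
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```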
In the coming steps, we will implement HDFS high availability by configuring two namenodes, and jobtracker high availability by configuring two jobtrackers, each pair in an active/passive configuration. Both HDFS HA and jobtracker HA can be configured for manual failover, or for automatic failover using Apache ZooKeeper.
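To give a flavor of the two failover modes on the HDFS side, the sketch below first enables automatic failover through ZooKeeper (the ZooKeeper host names are assumptions), and then shows the haadmin commands an administrator would use for manual failover with the nn1/nn2 pair configured above; an analogous choice applies to the jobtracker.

```xml
<!-- dfs.ha.automatic-failover.enabled goes in hdfs-site.xml;
     ha.zookeeper.quorum goes in core-site.xml.
     ZooKeeper host names are example values. -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```

```sh
# Initialize the HA state in ZooKeeper once; afterward the
# ZKFailoverController daemons handle failover automatically:
$ hdfs zkfc -formatZK

# With manual failover instead, an administrator checks the state
# and switches the active namenode by hand:
$ hdfs haadmin -getServiceState nn1
$ hdfs haadmin -failover nn1 nn2
```

Note that a fencing method (dfs.ha.fencing.methods) must also be configured so that a namenode that loses its active status can be reliably cut off from the shared edits storage.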