All of the following prerequisites must be completed on every node of the cluster. This section explains the tasks you must perform before deploying CDH on a cluster.
- Define host names
- Passwordless ssh must be set up from master to slaves
- Disable SELinux
- Set up NTP
- Disable firewall or iptables and ip6tables
- Increase nofile and nproc limits for the Hadoop users
- Set umask to 022
- Disable swappiness
Once each of the above prerequisites is set up, we should validate it on each node.
Define host names
Make sure the /etc/hosts file on each system contains the IP addresses and fully qualified domain names (FQDN) of all the members of the cluster.
Sample lines to add to the /etc/hosts file:
192.168.100.210 cdhn0 cdhn0
192.168.100.211 cdhn1 cdhn1
192.168.100.212 cdhn2 cdhn2
192.168.100.213 cdhn3 cdhn3
192.168.100.214 cdhn4 cdhn4
192.168.100.215 cdhn5 cdhn5
Note: Short (non-FQDN) host names also work, but prefer FQDNs in production clusters.
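As a rough sketch of how the hosts file could be checked on each node, the helper below reports any cluster member that is not listed. The function name `hosts_file_has` and the `example.com` domain in the usage line are illustrative assumptions, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every given host name appears as a
# name or alias on a non-comment line of the hosts file.
# Missing names are reported on stderr.
hosts_file_has() {
    file=$1; shift
    missing=0
    for name in "$@"; do
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }                      # skip comment lines
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
            echo "missing from $file: $name" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, `hosts_file_has /etc/hosts cdhn0 cdhn1 cdhn2` returns success only when all three names are present. Field 1 (the IP address) is deliberately ignored, so only names and aliases are compared.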
Verifying or troubleshooting the hostname setup:
To verify the setup, or to troubleshoot host name check failures, use the following commands.
- Run the “uname -a” and “hostname” commands and verify that both report the same host name.
uname -a
hostname
Passwordless ssh must be set up from master to slaves
Refer to the topic “Learn ssh” to learn more about ssh and related commands. In a cluster we should set up a passwordless connection from the main node to all other nodes so that the required services can be set up across the cluster.
- To generate a private and public key pair for the machine, use the ssh-keygen command.
- To copy the key from the master to each slave (the other nodes in the cluster), use the ssh-copy-id command.
- How to validate or test?
Run `ssh user@slave-hostname`
It should get connected without prompting for a password.
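The key generation, key copy and test steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the `DRY_RUN` switch, the `hadoop` user in the usage line, the RSA 4096-bit key type and the default key path `~/.ssh/id_rsa` are all illustrative choices, not requirements from this guide.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Generate a key pair on the master if none exists (empty passphrase),
# then copy the public key to each slave. ssh-copy-id asks for the
# remote password once per host.
setup_passwordless_ssh() {
    user=$1; shift
    key="$HOME/.ssh/id_rsa"                     # assumed default key path
    [ -f "$key" ] || run ssh-keygen -t rsa -b 4096 -N "" -f "$key"
    for host in "$@"; do
        run ssh-copy-id -i "$key.pub" "$user@$host"
    done
}

# Succeed only if key-based login works; BatchMode makes ssh fail
# instead of prompting for a password.
check_passwordless_ssh() {
    run ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

A possible invocation from the master is `setup_passwordless_ssh hadoop cdhn1 cdhn2`, followed by `check_passwordless_ssh hadoop@cdhn1`. Setting `DRY_RUN=1` first prints the commands without running them.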
Disable SELinux
SELinux enforces security policies through a Mandatory Access Control (MAC) security structure at the kernel level. On RHEL-based distributions (e.g., CentOS), SELinux is enabled by default, and some applications on the server may not work under its policies. To let our applications run normally, we need to disable SELinux on the server. The following are the commands related to SELinux.
SELinux commands usage:
- To check whether SELinux is enabled (enforcing) or disabled, run the getenforce command.
- To disable SELinux, edit the “/etc/selinux/config” file and set SELINUX=disabled, or run the following command instead.
sudo sed -i 's/^\(SELINUX\s*=\s*\).*$/\1disabled/' /etc/selinux/config
- The system must be restarted after changing the SELinux status for the change to take effect. We can use the following command.
sudo init 6
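To make the config change easier to verify before rebooting, the sketch below reads and rewrites the `SELINUX=` line of a config file passed as an argument. The function names are hypothetical, and `sed -i` assumes GNU sed, as found on RHEL and CentOS.

```shell
#!/bin/sh
# Print the mode configured in an SELinux config file
# (defaults to /etc/selinux/config). Does not match SELINUXTYPE=.
selinux_config_mode() {
    sed -n 's/^SELINUX[[:space:]]*=[[:space:]]*\([a-z]*\).*$/\1/p' \
        "${1:-/etc/selinux/config}"
}

# Rewrite the SELINUX= line to "disabled". The running system keeps its
# current mode until the next reboot.
disable_selinux_config() {
    sed -i 's/^\(SELINUX[[:space:]]*=[[:space:]]*\).*$/\1disabled/' \
        "${1:-/etc/selinux/config}"
}
```

On a real node the second function needs root, for example by running the script with sudo. The runtime mode can be compared with the configured one by running `getenforce` after the reboot.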
Set up NTP
NTP is an Internet protocol used to synchronize the clocks of computers to some time reference.
There are three basic requirements you must satisfy before joining the NTP Pool Project:
- Your server must have a static IP address.
- Your server must have a permanent and stable internet connection.
- Your IP address must not change, or must change only infrequently (once a year or less).
Follow these steps to install NTP service on all nodes:
- Install NTP service
sudo yum -y install ntp
- To run NTP as a system service
sudo systemctl enable ntpd.service
- Start NTP
sudo systemctl start ntpd.service
- The hardware clock is independent of the operating system and keeps running even when the machine is shut down. The hwclock utility reads and sets this clock. Set the hardware clock to the current system time by using this command:
sudo hwclock --systohc
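Once ntpd has been running for a few minutes, `ntpq -pn` lists its peers and marks the currently selected synchronization source with a leading `*`. The sketch below extracts that peer from the listing; the function name `selected_ntp_peer` is a hypothetical helper, and it reads the listing from standard input so it can be tried on saved output.

```shell
#!/bin/sh
# Read `ntpq -pn` output on stdin and print the peer flagged with "*"
# (the selected synchronization source). Returns non-zero if no peer
# is flagged, which usually means the clock is not yet synchronized.
selected_ntp_peer() {
    awk '/^\*/ { sub(/^\*/, "", $1); print $1; found = 1 }
         END   { exit !found }'
}
```

Typical use on a node would be `ntpq -pn | selected_ntp_peer`, checking the exit status to decide whether the node has picked a time source.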
Disable firewall or iptables and ip6tables
iptables and firewalld are applications that configure packet filtering using the rule tables provided by the kernel. One of them is usually enabled on a server, and it may block communication with other nodes in the cluster. So we should stop the iptables or firewalld service as a prerequisite before installing Cloudera services.
- To stop iptables or firewalld
sudo service iptables stop
sudo service firewalld stop
- To check whether iptables or firewalld is running or stopped
sudo service iptables status
sudo service firewalld status
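Stopping a service does not keep it from starting again at the next boot. On systemd hosts (the NTP steps above already use systemctl) the service can also be disabled. The sketch below assumes systemd; the function name and the `DRY_RUN` switch are illustrative.

```shell
#!/bin/sh
# Stop each named service now and disable it at boot (systemd hosts).
# Set DRY_RUN=1 to print the commands instead of running them.
stop_and_disable() {
    for svc in "$@"; do
        for action in stop disable; do
            if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "sudo systemctl $action $svc"
            else
                sudo systemctl "$action" "$svc"
            fi
        done
    done
}
```

For example, `stop_and_disable firewalld` stops and disables firewalld. Pass whichever service names are actually installed on the node.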
Increase nofile and nproc limits
Linux is a multi-user system, and the kernel refers to open files using file descriptors. The default limits on open files and on processes per user are too low for Hadoop services, so we increase both values for the relevant users in the cluster with the following commands.
To increase the maximum number of open files for the hdfs, mapred and hbase users
sudo /bin/sh -c 'echo hdfs - nofile 32768 >> /etc/security/limits.conf'
sudo /bin/sh -c 'echo hbase - nofile 32768 >> /etc/security/limits.conf'
sudo /bin/sh -c 'echo mapred - nofile 32768 >> /etc/security/limits.conf'
To increase the maximum number of processes
sudo /bin/sh -c 'echo hbase - nproc 32768 >> /etc/security/limits.conf'
sudo /bin/sh -c 'echo mapred - nproc 32768 >> /etc/security/limits.conf'
sudo /bin/sh -c 'echo hdfs - nproc 32768 >> /etc/security/limits.conf'
Note: Cloudera Manager will do this for you.
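If the limits are set by hand, repeating the echo commands adds duplicate lines to limits.conf. One way to avoid that is sketched below: a hypothetical `add_limit` helper that appends an entry only when the user and item are not already present. The file path is passed as an argument so the helper can be tried on a scratch file first.

```shell
#!/bin/sh
# Append "<user> - <item> <value>" to a limits file unless an entry for
# that user and item already exists, so reruns do not duplicate lines.
add_limit() {
    file=$1 user=$2 item=$3 value=$4
    if ! grep -Eq "^${user}[[:space:]]+-[[:space:]]+${item}[[:space:]]" "$file"; then
        echo "$user - $item $value" >> "$file"
    fi
}

# Apply the nofile and nproc limits from this section to a limits file.
apply_hadoop_limits() {
    for u in hdfs hbase mapred; do
        add_limit "$1" "$u" nofile 32768
        add_limit "$1" "$u" nproc 32768
    done
}
```

Run as root, `apply_hadoop_limits /etc/security/limits.conf` would leave exactly six entries no matter how often it is repeated. The helper does not update the value of an entry that already exists.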
Set umask to 022
When a user creates a file or directory on Linux or UNIX, it is created with a default set of permissions determined by the umask. The default umask for a normal user is 002, which gives default directory permissions of 775 and default file permissions of 664. The default umask for the root user is 022, which gives default directory permissions of 755 and default file permissions of 644.
To permanently set a umask of 022, which lets only you write data while anyone can read it, use the following steps.
Open the profile file with the vi command, then verify or change the lines as shown below.
vi /etc/profile
if [ $UID -gt 199 ] && [ "`id -gn`" = "`id -un`" ]; then
    umask 022
else
    umask 022
fi
The next time you log in to a terminal or start a new session, the new umask value is applied. You can verify it with the following command.
$ umask
022
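The permission values quoted above follow from clearing the umask bits from the base modes 777 (directories) and 666 (files). The small worked example below shows the arithmetic; the function name `modes_for_umask` is illustrative.

```shell
#!/bin/sh
# Print the default directory and file modes that result from a umask.
# The argument is read as octal; base modes are 0777 and 0666.
modes_for_umask() {
    mask=$((0$1))
    printf 'dir=%o file=%o\n' $((0777 & ~mask)) $((0666 & ~mask))
}

modes_for_umask 022   # prints: dir=755 file=644
modes_for_umask 002   # prints: dir=775 file=664
```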
Disable swappiness
Swapping is a memory management mechanism in which the OS writes memory blocks to disk and reads them back into memory when needed. Hadoop needs fast I/O and predictable JVM memory allocation, so we limit swapping at the kernel level. This can be done using the following commands.
sudo sysctl vm.swappiness=1
sudo /bin/sh -c 'echo "vm.swappiness = 1" >> /etc/sysctl.conf'
Reboot the machine using the following command
sudo init 6
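Appending to sysctl.conf on every run can leave several conflicting `vm.swappiness` lines. As a hedged alternative, the sketch below replaces an existing entry or appends a new one. The function name `set_sysctl_conf` is hypothetical, `sed -i` assumes GNU sed, and the key is used as a regular expression, so a dot in it matches any character.

```shell
#!/bin/sh
# Set "<key> = <value>" in a sysctl-style file: replace the line if the
# key is already present, otherwise append it.
set_sysctl_conf() {
    file=$1 key=$2 value=$3
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        sed -i "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        echo "${key} = ${value}" >> "$file"
    fi
}
```

Run as root, `set_sysctl_conf /etc/sysctl.conf vm.swappiness 1` keeps a single entry for the key. The current kernel value can be read back with `cat /proc/sys/vm/swappiness`.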
Install the Oracle JDK
Install the Oracle Java Development Kit (JDK) on the Cloudera Manager Server host. The JDK is included in the Cloudera Manager 5 repositories. After downloading and editing the repo or list file, install the JDK as follows:
$ sudo yum install oracle-j2sdk1.7
Validate the prerequisites on each node
- Check passwordless ssh from the master to the other nodes
ssh <user>@<hostname or IP>
It should log in without prompting for a password.
- Check that all nodes are listed in the /etc/hosts file
- Check the SELinux config file (SELINUX=disabled) or run getenforce
- Check the ntpd status (sudo service ntpd status) or look for the process (ps -ef | grep ntp)
- Check umask: the umask command should print 022
- Check firewall or iptables
sudo service firewalld status
sudo service iptables status
- Check swappiness (cat /proc/sys/vm/swappiness should print 1)
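The checks in this list can be collected into one script that prints a PASS or FAIL line per item. The following is only a sketch: the function names are made up for illustration, it assumes a systemd host with the ntpd and firewalld service names used earlier, and a node without the getenforce command is treated as not enforcing.

```shell
#!/bin/sh
failures=0

# Run one check, print PASS or FAIL with its label, and count failures.
check() {
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $label"
    else
        echo "FAIL $label"
        failures=$((failures + 1))
    fi
}

# The shell may print the umask as 022 or 0022; compare it as octal.
umask_is_022() { [ "$((0$(umask)))" -eq "$((0022))" ]; }

swappiness_is_low() { [ "$(cat /proc/sys/vm/swappiness)" -le 1 ]; }

# Run all node-local checks; succeed only if none failed.
validate_node() {
    check "SELinux not enforcing" sh -c '[ "$(getenforce)" != "Enforcing" ]'
    check "ntpd running"          systemctl is-active ntpd
    check "firewalld stopped"     sh -c '! systemctl is-active firewalld'
    check "umask is 022"          umask_is_022
    check "swappiness <= 1"       swappiness_is_low
    [ "$failures" -eq 0 ]
}
```

Calling `validate_node` on each node gives a quick summary. The passwordless ssh and /etc/hosts checks are left out here because they depend on the cluster's user and host names.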