IT Versity

October 9, 2017 By Vinod Nerella

Setting up prerequisites for all nodes

All of the following prerequisites need to be completed on every node of the cluster. This section explains the tasks you must perform before deploying CDH on a cluster.

  • Define host names
  • Passwordless ssh must be set up from master to slaves
  • Disable SELinux
  • Set up NTP
  • Disable firewall or iptables and ip6tables
  • For Hadoop users we need to increase nofile and nproc limits
  • Set umask to 022
  • Disable swappiness

Once we complete setting up each of the above prerequisites, we should validate them on each node.

  • Define host names

Make sure the /etc/hosts file on each system contains the IP addresses and fully qualified domain names (FQDN) of all the members of the cluster.

Sample lines to add to the /etc/hosts file:

192.168.100.210 cdhn0 cdhn0
192.168.100.211 cdhn1 cdhn1
192.168.100.212 cdhn2 cdhn2
192.168.100.213 cdhn3 cdhn3
192.168.100.214 cdhn4 cdhn4
192.168.100.215 cdhn5 cdhn5

Note: Host names that are not FQDNs also work, but prefer FQDNs in production clusters.
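
A quick way to confirm the entries on a node is to search the file for every expected host name. The following is a minimal sketch, not part of the original setup: the function name missing_hosts is hypothetical, and it assumes the names appear after the IP address, as in the sample lines above.

```shell
# Print every expected host name that has no entry in a hosts file.
# Usage: missing_hosts <hosts-file> <name>...
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name as a whole field after the IP.
        if ! sed 's/#.*//' "$hosts_file" | awk -v n="$name" '
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }'; then
            echo "$name"
        fi
    done
}
```

For example, `missing_hosts /etc/hosts cdhn0 cdhn1 cdhn2 cdhn3 cdhn4 cdhn5` prints nothing when every node is listed.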

 

Verifying or troubleshooting the hostname setup:

In order to verify the setup, or to troubleshoot hostname check failures, you can use the following commands.

  1. Run the “uname -a” and “hostname” commands and verify that the node name printed by uname matches the output of hostname.
uname -a
hostname
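
The comparison can also be scripted. This is a small sketch under the assumption that `uname -n` (the node name field of `uname -a`) is the value to compare; the function name compare_names is hypothetical.

```shell
# Compare two names and report whether they match.
# Usage on a node: compare_names "$(uname -n)" "$(hostname)"
compare_names() {
    node_name=$1
    host_name=$2
    if [ "$node_name" = "$host_name" ]; then
        echo "OK: $host_name"
    else
        echo "MISMATCH: uname reports '$node_name', hostname reports '$host_name'"
        return 1
    fi
}
```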

 

  • Passwordless ssh must be set up from master to slaves

Refer to the topic “Learn ssh” to know more about ssh and related commands. In a cluster, we should be able to set up a passwordless connection from the main node to all other nodes, so that the required services can be set up across the cluster.

  • In order to generate a private and public key pair on the master, we can use the following command.
ssh-keygen
  • To copy the public key from the master to a slave (each of the other nodes in the cluster), we use the ssh-copy-id command.
ssh-copy-id root@slave.hadoop.com
  • How to validate or test?
Run `ssh user@slave-hostname`

It should get connected without prompting for a password.
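
With several nodes it is easier to test all of them in one pass. The sketch below is one possible way to do that, assuming OpenSSH's BatchMode and ConnectTimeout options; the function name and the SSH_CMD override are hypothetical helpers, not part of the original steps.

```shell
# SSH_CMD can be overridden (for example with a stub) when testing.
SSH_CMD=${SSH_CMD:-ssh}

# Try a non-interactive login to each host and report the result.
# Usage: check_passwordless_ssh <user> <host>...
check_passwordless_ssh() {
    user=$1
    shift
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `check_passwordless_ssh root cdhn1 cdhn2 cdhn3` run on the master prints one line per node and returns non-zero if any node still asks for a password.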

  • Disable SELinux

SELinux enforces security policies through a Mandatory Access Control (MAC) structure at the kernel level. On RHEL-based distributions (e.g., CentOS), SELinux is enabled by default, and some applications on the server may not work under its policies. So to make our applications run normally, we need to disable SELinux on the server. The following are commands related to SELinux.

SELinux commands usage:

  • To check whether SELinux is enabled/enforcing or disabled
sestatus
  • To disable SELinux, edit the “/etc/selinux/config” file and set SELINUX=disabled, or run the following command to make the same change.
sudo sed -i 's/^\(SELINUX\s*=\s*\).*$/\1disabled/' /etc/selinux/config
  • The system must be restarted for the change to take effect. We can use the following command.
sudo init 6
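
Before rebooting, it can help to confirm what the config file now says. The following sketch reads the configured value back; the function name configured_selinux_mode is hypothetical, and it assumes the usual one-setting-per-line layout of the file.

```shell
# Print the SELINUX= value configured in a file (default: /etc/selinux/config).
configured_selinux_mode() {
    config=${1:-/etc/selinux/config}
    # Match only uncommented SELINUX= lines (not SELINUXTYPE=) and keep the
    # last one, with any spaces around the '=' ignored.
    sed -n 's/^[[:space:]]*SELINUX[[:space:]]*=[[:space:]]*\([a-z]*\).*$/\1/p' "$config" |
        tail -n 1
}
```

After the edit above, `configured_selinux_mode` should print disabled. The `getenforce` command reports the mode the running kernel is actually using, which only changes after the reboot.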

  • Set up NTP

NTP (Network Time Protocol) is an Internet protocol used to synchronize the clocks of computers to a time reference.

If a server is going to join the NTP Pool Project as a time source, it must satisfy three basic requirements:

  • Your server must have a static IP address.
  • Your server must have a permanent and stable internet connection.
  • Your IP address must not change, or only change infrequently (once a year or less).

Follow these steps to install NTP service on all nodes:

  • Install NTP service
sudo yum -y install ntp
  • To run NTP as a system service that starts at boot
sudo systemctl enable ntpd.service
  • Start NTP
sudo systemctl start ntpd.service
  • The hardware clock is independent of the operating system and keeps running even when the machine is shut down. The hwclock utility displays and sets the hardware clock. Set the hardware clock to the current system time by using this command:
sudo hwclock --systohc
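
Starting ntpd does not by itself prove that the clock is synchronized. As a rough check, `ntpq -p` lists the configured peers and marks the one currently selected for synchronization with a leading `*`. The helper below is a sketch built on that convention; its name is hypothetical.

```shell
# Read `ntpq -p` output on stdin and succeed if a peer is marked with '*',
# which is how ntpq flags the peer currently selected for synchronization.
# Usage: ntpq -p | has_selected_peer && echo "synchronized"
has_selected_peer() {
    grep -q '^\*'
}
```

It can take a few minutes after ntpd starts before a peer is selected, so a failing check right after installation is not necessarily a problem.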

  • Disable firewall or iptables and ip6tables

Iptables and firewalld are applications for configuring the packet filtering rules provided by the kernel. One of them is usually enabled on the server, and it may block communication with other nodes in the cluster. So we should stop the iptables or firewalld service as a prerequisite before installing Cloudera services.

  • To stop iptables or firewalld, run the command for whichever service your system uses
sudo service iptables stop
sudo service firewalld stop
  • To check whether iptables or firewalld is running or stopped
sudo service iptables status
sudo service firewalld status
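
Stopping a service only lasts until the next boot, and later steps on this page reboot the machine. On systemd-based systems the service can also be disabled so that it stays off. The following is a sketch under that assumption; the function names and the SYSTEMCTL override are hypothetical, and the commands need to be run as root.

```shell
# SYSTEMCTL can be overridden (for example with a stub) when testing.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Stop a service now and keep it from starting at boot.
# Usage: stop_and_disable firewalld
stop_and_disable() {
    unit=$1
    "$SYSTEMCTL" stop "$unit" && "$SYSTEMCTL" disable "$unit"
}

# Succeed only if the given unit is not currently active.
# Usage: is_stopped firewalld && echo "firewalld is stopped"
is_stopped() {
    ! "$SYSTEMCTL" is-active --quiet "$1"
}
```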

  • Increase nofile and nproc limits

Linux is a multi-user system, and the kernel refers to open files using file descriptors. The default limits on open files and on processes per user are often too low for Hadoop services. So we manually increase both values for the Hadoop service users in the cluster. We can use the following commands.

To increase the maximum number of open files for the hdfs, mapred and hbase users

sudo /bin/sh -c 'echo hdfs - nofile 32768 >> /etc/security/limits.conf'
sudo /bin/sh -c 'echo hbase - nofile 32768 >> /etc/security/limits.conf'
sudo /bin/sh -c 'echo mapred - nofile 32768 >> /etc/security/limits.conf'

To increase the maximum number of processes

sudo /bin/sh -c 'echo hbase - nproc 32768 >> /etc/security/limits.conf'
sudo /bin/sh -c 'echo mapred - nproc 32768 >> /etc/security/limits.conf'
sudo /bin/sh -c 'echo hdfs - nproc 32768 >> /etc/security/limits.conf'

  Note: Cloudera Manager will do this for you.
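
The echo commands above append a new line every time they are run, so repeating the setup leaves duplicate entries. A minimal sketch of an append that skips entries already present is shown below; the function name add_limit is hypothetical, and it relies on the limits.conf field order of domain, type, item, value.

```shell
# Append "<user> - <item> <value>" to a limits file unless a line for that
# user and item is already present.
# Usage: add_limit <file> <user> <item> <value>
add_limit() {
    file=$1 user=$2 item=$3 value=$4
    # Field 1 is the user (domain) and field 3 is the item (nofile, nproc).
    if awk -v u="$user" -v i="$item" \
            '$1 == u && $3 == i { found = 1 } END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s - %s %s\n' "$user" "$item" "$value" >> "$file"
}
```

For example, as root: `add_limit /etc/security/limits.conf hdfs nofile 32768`.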

  • Set umask to 022

When a user creates a file or directory under Linux or UNIX, it is created with a default set of permissions. The default umask for a normal user is 002. With this mask, default directory permissions are 775 and default file permissions are 664. The default umask for the root user is 022, which results in default directory permissions of 755 and default file permissions of 644.

To set a umask of 022 permanently, which allows only you to write data while anyone can read it, use the following steps.

Open the profile file using the vi command, then verify or change the lines as below.

vi /etc/profile

if [ $UID -gt 199 ] && [ "`id -gn`" = "`id -un`" ]; then
     umask 022
else
     umask 022
fi

The next time you log in to a terminal or start a new session, the new umask value will be applied, which you can verify using the command below.

$ umask
0022
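
The permission values quoted above follow from clearing the umask bits out of the base modes 666 (files) and 777 (directories). The arithmetic can be sketched as follows; the function name default_modes is hypothetical.

```shell
# Print the default file and directory modes that result from a umask.
# New files start from 666 and new directories from 777; the bits set in
# the umask are cleared from those base modes.
# Usage: default_modes 022
default_modes() {
    mask=$1
    printf 'file %03o dir %03o\n' "$((0666 & ~0$mask))" "$((0777 & ~0$mask))"
}
```

`default_modes 022` prints `file 644 dir 755`, and `default_modes 002` prints `file 664 dir 775`.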

  • Disable swappiness

Swapping is a memory management mechanism in the OS that writes memory pages to disk and reads them back into memory when needed. Since Hadoop needs I/O throughput to be maximized and JVM memory to stay resident, we limit or nearly disable swapping at the kernel level. This can be done using the following commands.

sudo sysctl vm.swappiness=1
sudo /bin/sh -c 'echo "vm.swappiness = 1" >> /etc/sysctl.conf'

Reboot the machine using the following command

sudo init 6
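
Appending with echo adds another vm.swappiness line each time it is run. One way to keep a single entry is to drop any existing line for the key before writing the new one. The helper below is a sketch of that idea; its name is hypothetical, and it assumes the simple "key = value" layout of /etc/sysctl.conf.

```shell
# Set "key = value" in a sysctl-style file, replacing an existing line for
# the key if there is one and appending otherwise.
# Usage: set_sysctl_conf <file> <key> <value>
set_sysctl_conf() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Copy every line whose key differs, then append the new setting.
    awk -v k="$key" '{
        line = $0
        sub(/^[ \t]+/, "", line)
        split(line, parts, /[ \t]*=/)
        if (parts[1] != k) print
    }' "$file" > "$tmp" &&
        printf '%s = %s\n' "$key" "$value" >> "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, as root: `set_sysctl_conf /etc/sysctl.conf vm.swappiness 1`, followed by `sysctl -p` to load the file.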

Install the Oracle JDK

Install the Oracle Java Development Kit (JDK) on the Cloudera Manager Server host. The JDK is included in the Cloudera Manager 5 repositories. After downloading and editing the repo or list file, install the JDK as follows:

$ sudo yum install oracle-j2sdk1.7

Validating pre-requisites

  • Check passwordless ssh from the master to the other nodes; it should log in without asking for a password

ssh <user>@<hostname or IP>

  • Check that the /etc/hosts file contains all nodes
cat /etc/hosts
  • Check the SELinux status (it should report disabled)
sestatus
  • Check that ntpd is running
ps -ef | grep ntp
  • Check umask (expected value: 022)
umask
  • Check firewall or iptables
sudo service firewalld status
sudo service iptables status
  • Check swappiness
cat /proc/sys/vm/swappiness
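
When there are many nodes, some of these checks can be wrapped so that each one prints a single PASS or FAIL line. The sketch below shows one possible shape for such a wrapper; the function names are hypothetical, and the three sample checks assume the values used on this page (swappiness of 1, ntpd as the time service, firewalld as the firewall).

```shell
# Run a command quietly and print PASS or FAIL with a label.
# Usage: run_check <label> <command> [args...]
run_check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

# A few sample checks; returns non-zero if any of them fails.
validate_node() {
    rc=0
    run_check "swappiness is 1" test "$(cat /proc/sys/vm/swappiness)" = 1 || rc=1
    run_check "ntpd is running" pgrep -x ntpd || rc=1
    run_check "firewalld is stopped" sh -c '! systemctl is-active --quiet firewalld' || rc=1
    return "$rc"
}
```

Calling `validate_node` on a node prints one line per check.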
