Why lab on cloud?
- While setting up the Cloudera QuickStart VM is the easiest way to set up the lab, it requires an expensive laptop with a high-end configuration (quad-core i7, 16 GB RAM).
- Some people, especially students, might not be able to afford one. In that case it is better to use a cloud platform such as Amazon Web Services (AWS), Google Cloud, or Microsoft Azure. Azure and Google Cloud provide some credits which can be used to set up the environment.
- Be diligent about understanding the costs associated with the cloud provider, and make sure the instance or server is stopped when not in use.
- Sign up for a cloud account
- Set up the prerequisites for the cloud account
- Steps to provision a host on the cloud (for AWS)
- Create a VPC and key pair, and take care of the other prerequisites
- Provision an EC2 instance with 15 GB RAM, 4 or 8 cores, and CentOS 6.5
- Open up ports in the security group (open all for convenience; in production, only the necessary ports should be exposed).
- Make sure instance is running, resize root file system using
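The provisioning steps above (key pair, security group, instance launch) can also be scripted with the AWS CLI. The following is only a minimal sketch, assuming the AWS CLI is installed and configured; it does not cover VPC creation or the file system resize. The names `lab-key` and `lab-sg`, the `REPLACE_ME` IDs, and the `m3.xlarge` instance type (chosen here as an assumed match for 4 cores and 15 GB RAM) are placeholders, not values from this guide. Commands are only printed unless `DRY_RUN=0` is set, so nothing billable is started by accident.

```shell
#!/bin/sh
# Sketch only: every name and ID below is a placeholder to replace.
KEY_NAME="lab-key"              # hypothetical key pair name
SG_NAME="lab-sg"                # hypothetical security group name
VPC_ID="vpc-REPLACE_ME"
SUBNET_ID="subnet-REPLACE_ME"
AMI_ID="ami-REPLACE_ME"         # a CentOS 6.5 image available in your region
INSTANCE_TYPE="m3.xlarge"       # assumption: verify cores and RAM for your region

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Prints the private key on stdout when executed; redirect it to a .pem file.
create_key_pair() {
  run aws ec2 create-key-pair --key-name "$KEY_NAME" \
    --query KeyMaterial --output text
}

create_security_group() {
  run aws ec2 create-security-group --group-name "$SG_NAME" \
    --description "Cloudera lab" --vpc-id "$VPC_ID"
}

# $1 = security group ID, $2 = source CIDR (for example your own IP with /32)
open_ports() {
  run aws ec2 authorize-security-group-ingress --group-id "$1" \
    --protocol tcp --port 0-65535 --cidr "$2"
}

# $1 = security group ID returned by create_security_group
launch_instance() {
  run aws ec2 run-instances --image-id "$AMI_ID" \
    --instance-type "$INSTANCE_TYPE" --key-name "$KEY_NAME" \
    --security-group-ids "$1" --subnet-id "$SUBNET_ID" --count 1
}

# $1 = instance ID; stop the instance whenever the lab is not in use.
stop_instance() {
  run aws ec2 stop-instances --instance-ids "$1"
}
```

After sourcing the script, calling a function such as `stop_instance` with your instance ID prints the command for review; setting `DRY_RUN=0` first executes it for real.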
- Install Cloudera Manager
- Log in to the host using the ssh command
wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
chmod +x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin
- Launch Cloudera Manager and install CDH
- Make sure Cloudera Manager is successfully installed
- Get the public DNS from AWS or any other cloud provider
- Go to browser and run
- Accept enterprise trial license for 60 days
- Run the hostname -f command on the cloud host
- In the wizard, use the private DNS retrieved by the previous command, then follow the wizard
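The public DNS and private DNS play different roles in the steps above: the public name is what the browser uses, and the private name is what goes into the wizard. A minimal sketch of both lookups follows. It assumes Cloudera Manager listens on its default admin console port, 7180, and that `curl` is available; the function names are hypothetical helpers, not part of any Cloudera tooling.

```shell
#!/bin/sh
# Assumption: Cloudera Manager's default admin console port.
CM_PORT="${CM_PORT:-7180}"

# cm_url prints the admin console URL for a given public DNS name.
cm_url() {
  echo "http://$1:$CM_PORT"
}

# cm_is_up succeeds once something answers HTTP requests on that URL.
# Run it from your laptop; it needs the port to be open in the security group.
cm_is_up() {
  curl --silent --max-time 5 --output /dev/null "$(cm_url "$1")"
}

# private_dns prints the fully qualified name to enter in the wizard.
# Run it on the cloud host itself, not on your laptop.
private_dns() {
  hostname -f
}
```

For example, `cm_url` with the public DNS as its argument prints the address to open in the browser, and `cm_is_up` with the same argument can be retried until the installer has finished starting the server.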
- Validate all the tools
- Set up retail_db and get-logs to practice Sqoop and Flume
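One quick way to start validating the tools is to confirm that their command-line clients are on the PATH of the cloud host. The sketch below is an assumption about what a first check could look like, not a full validation: it only tests that each command exists, and the tool list in the usage note is an assumed selection that depends on which services were chosen in the wizard.

```shell
#!/bin/sh
# check_tools reports which of the given commands are on PATH and
# returns the number of missing ones as its exit status.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok      $tool"
    else
      echo "missing $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

On the cloud host, a call such as `check_tools hadoop hdfs hive sqoop flume-ng` lists each client as ok or missing; a non-zero exit status means at least one of them was not found.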
Here is the playlist which covers all the steps to set up a single-node lab on a public cloud (e.g., AWS). The setup process is almost the same for a cloud host from any vendor.