Most of the leading Big Data vendors provide Hadoop and Spark based virtual machine images:
- Cloudera QuickStart VM
- Hortonworks Sandbox VM
- MapR Sandbox
- IBM BigInsights Docker container
- Oracle Big Data VM
All of these virtual machine images come with:
- Linux based operating system
- All Hadoop and Spark based tools
- All necessary programming languages installed and configured
- Management tools (such as Cloudera Manager or Hortonworks Ambari)
- A relational database (typically MySQL)
The first step in exploring the Big Data ecosystem is to set up one of these virtual machines and install any missing tools. You can then use it to practice for Big Data Engineer roles or for certifications from the leading vendors.
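After a VM is running, one quick way to find missing tools is to check which commands are on the PATH. The Python sketch below does this with `shutil.which` from the standard library. The command names in `EXPECTED_TOOLS` are a hypothetical selection based on the tools mentioned above. Edit the list to match the syllabus you are preparing for.

```python
import shutil
import sys

# Hypothetical selection of command-line entry points for the tools mentioned
# above (Hadoop, Spark, MySQL); adjust to what your target certification covers.
EXPECTED_TOOLS = ["hadoop", "hdfs", "yarn", "spark-shell", "spark-submit", "pyspark", "mysql"]


def find_missing_tools(tools=EXPECTED_TOOLS):
    """Return the commands from `tools` that cannot be found on the current PATH."""
    # shutil.which returns None when a command is not found or not executable.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing), file=sys.stderr)
    else:
        print("All expected tools were found on PATH.")
```

Run this inside the VM, not on the host. A tool can also be installed but not on the PATH, so treat a "missing" result as a prompt to investigate.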
Pre-requisites for installation
Most of these virtual machines typically require the following configuration on your PC or Mac:
- 16 GB RAM
- Quad-core processor (Intel Core i7 highly recommended)
- A good amount of free disk storage (SSD recommended)
- There are workarounds for lower configurations, but expect to run into many issues
- If you do not have the above configuration, I recommend setting up a single-node lab in the cloud instead (for example, on AWS)
- VirtualBox, which can be used to run these images, is completely free
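Before you download a multi-gigabyte image, it can help to check your host against the list above. The Python sketch below is illustrative only and is not an official requirement checker. It hard-codes the 16 GB RAM and quad-core figures from this list. It also applies a tolerance, which is an assumption: operating systems usually report slightly less RAM than is installed.

Be aware of two limitations. `os.cpu_count()` counts logical cores, so a dual-core CPU with hyper-threading also passes the core check. Automatic RAM detection relies on `os.sysconf`, which exists only on Unix-like systems such as Linux.

```python
import os

# Figures taken from the prerequisites list above.
REQUIRED_RAM_GB = 16
REQUIRED_CORES = 4


def check_host(ram_gb, cores, tolerance_gb=1.0):
    """Return a list of shortfalls; an empty list means the host qualifies.

    tolerance_gb is an assumption: a 16 GB machine often reports about
    15.5 GiB of usable memory, which should not count as a shortfall.
    """
    problems = []
    if ram_gb + tolerance_gb < REQUIRED_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GiB found, {REQUIRED_RAM_GB} GB recommended")
    if cores < REQUIRED_CORES:
        problems.append(f"CPU: {cores} logical cores found, {REQUIRED_CORES} recommended")
    return problems


def detect_host():
    """Best-effort detection of total RAM (GiB) and logical CPU count.

    Returns None where os.sysconf or the needed keys are unavailable
    (for example on Windows); check the host manually in that case.
    """
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        phys_pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    ram_gib = page_size * phys_pages / 1024 ** 3
    return ram_gib, os.cpu_count() or 1


if __name__ == "__main__":
    detected = detect_host()
    if detected is None:
        print("Could not detect RAM automatically; compare against the list manually.")
    else:
        issues = check_host(*detected)
        print("\n".join(issues) if issues else "Host meets the recommended configuration.")
```

If the check reports shortfalls, that is when the cloud single-node lab mentioned above becomes the more practical route.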
Set up Hortonworks Sandbox
Here is the video to set up Hortonworks Sandbox.
Here is the video to set up the admin password.
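If you prefer the command line to the VirtualBox GUI, you can also script the import step with VirtualBox's `VBoxManage` CLI. This sketch assumes the sandbox was downloaded as an `.ova` appliance for VirtualBox. The file name, the VM name `hdp-sandbox`, and the 8192 MB / 4 CPU sizing are hypothetical values, so adjust them to your download and host. By default the script only prints the commands. It runs them only when `dry_run=False` is passed.

```python
import shlex
import subprocess


def sandbox_commands(ova_path, vm_name, memory_mb=8192, cpus=4):
    """Build VBoxManage commands to import, size and start a sandbox VM.

    memory_mb and cpus are assumed defaults (half of a 16 GB, quad-core host).
    """
    return [
        # Import the appliance and give virtual system 0 a predictable VM name.
        ["VBoxManage", "import", ova_path, "--vsys", "0", "--vmname", vm_name],
        # Set RAM (in MB) and the number of virtual CPUs before the first boot.
        ["VBoxManage", "modifyvm", vm_name, "--memory", str(memory_mb), "--cpus", str(cpus)],
        # Boot the VM without opening a console window.
        ["VBoxManage", "startvm", vm_name, "--type", "headless"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))  # shlex.join requires Python 3.8+
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file name; replace with the .ova you actually downloaded.
    run(sandbox_commands("HDP_sandbox.ova", "hdp-sandbox"))
```

Building the commands as lists and running them with `check=True` stops the sequence at the first failure. For example, `modifyvm` is never attempted if the import fails.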