In this lesson we look at Data Frames and Spark SQL in detail, covering the following topics:
- Data Frames and Data Frame operations
- Structured Query Language – SQL
- Spark SQL – Native Context
- Spark SQL – Hive Context
- Spark SQL – JDBC
- Different file formats
If you need a well-integrated environment to practice these topics, please sign up for the lab.
Rationale
- In many cases data will be structured
- Rather than writing custom code, we can use simple interfaces such as SQL to process structured data
- Spark can apply additional optimizations to structured queries
- The same execution engine is used across the different programming languages
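To illustrate the second point, here is a minimal sketch that computes the same result twice: once with hand-written code and once with a SQL query. It uses Python's built-in sqlite3 module as a stand-in SQL engine so that it runs without a Spark installation; it is not Spark itself. The sample rows and the column names (order_id, order_status) are assumptions made up for the illustration.

```python
import sqlite3

# Hypothetical sample rows: (order_id, order_status). Not taken from retail_db.
orders = [
    (1, "COMPLETE"),
    (2, "PENDING"),
    (3, "COMPLETE"),
    (4, "CLOSED"),
    (5, "PENDING"),
    (6, "COMPLETE"),
]


def count_by_status(rows):
    """Approach 1: hand-written code that counts orders per status."""
    counts = {}
    for _order_id, status in rows:
        counts[status] = counts.get(status, 0) + 1
    return sorted(counts.items())


def count_by_status_sql(rows):
    """Approach 2: the same result, declared as a SQL query."""
    # sqlite3 is only a stand-in engine here (assumption: no Spark available).
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (order_id INTEGER, order_status TEXT)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        query = """
            SELECT order_status, COUNT(*) AS order_count
            FROM orders
            GROUP BY order_status
            ORDER BY order_status
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


code_result = count_by_status(orders)
sql_result = count_by_status_sql(orders)
print(sql_result)
```

The SQL version states what result is wanted and leaves the how to the engine, which is what gives an engine room to optimize. In Spark, a query of this shape would be submitted through a SQL context instead of sqlite3.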
Typical transformations
- Filtering the data
- Data Cleansing
- Data Standardization
- Aggregation
- Sorting and Ranking
- Joining the data sets
- Analytic or Windowing functions
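Several of the transformations listed above can be combined in a single query. The sketch below is one possible illustration, again using Python's built-in sqlite3 module as a stand-in SQL engine rather than Spark. The table layouts, column names, and rows are assumptions invented for the example, and the window function requires SQLite 3.25 or later.

```python
import sqlite3

# Stand-in engine (assumption): in-memory SQLite instead of Spark SQL.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified tables; the real retail_db layout may differ.
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_date TEXT, order_status TEXT);
    CREATE TABLE order_items (order_item_order_id INTEGER, order_item_subtotal REAL);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "2014-01-01", " complete "),   # needs cleansing and standardization
    (2, "2014-01-01", "PENDING"),
    (3, "2014-01-01", "Closed"),
    (4, "2014-01-02", "COMPLETE"),
])
conn.executemany("INSERT INTO order_items VALUES (?, ?)", [
    (1, 100.0), (1, 50.0),
    (2, 300.0),
    (3, 200.0),
    (4, 75.0), (4, 25.0),
])

query = """
    WITH clean_orders AS (
        -- Cleansing and standardization: trim whitespace, upper-case the status
        SELECT order_id, order_date, UPPER(TRIM(order_status)) AS order_status
        FROM orders
    ),
    order_revenue AS (
        -- Joining, filtering and aggregation: revenue per finished order
        SELECT o.order_id, o.order_date, SUM(oi.order_item_subtotal) AS revenue
        FROM clean_orders o
        JOIN order_items oi ON oi.order_item_order_id = o.order_id
        WHERE o.order_status IN ('COMPLETE', 'CLOSED')
        GROUP BY o.order_id, o.order_date
    )
    -- Windowing and sorting: rank orders by revenue within each date
    SELECT order_id, order_date, revenue,
           RANK() OVER (PARTITION BY order_date ORDER BY revenue DESC) AS revenue_rank
    FROM order_revenue
    ORDER BY order_date, revenue_rank
"""
ranked = conn.execute(query).fetchall()
conn.close()

for row in ranked:
    print(row)
```

Each common table expression maps to one or two of the listed transformations, so the stages can be read and tested separately before being chained together.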
Prepare for Data Frames
To explore Data Frames and perform the necessary operations, we will use the retail_db database (especially orders and order_items).