### Introduction to sorting and ranking

Sorting can be broadly categorized into global sorting and sorting by key. In this topic we cover global sorting.

- Load data from HDFS and store results back to HDFS using Spark
- Join disparate datasets together using Spark
- Calculate aggregate statistics (e.g., average or sum) using Spark
- Filter data into a smaller dataset using Spark
- **Write a query that produces ranked or sorted data using Spark**

- Global sorting and ranking
- Here are a few examples of global sorting with transformations and actions such as `sortByKey`, `take` and `top`.

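```python
# sc is the SparkContext provided by the pyspark shell
orders = sc.textFile("/user/cloudera/sqoop_import/orders")

# Key each record by its order id (first field) and sort ascending;
# collect() brings every record back to the driver
for i in orders.map(lambda rec: (int(rec.split(",")[0]), rec)).sortByKey().collect():
    print(i)

# sortByKey(False) sorts in descending order; take(5) keeps the first 5
for i in orders.map(lambda rec: (int(rec.split(",")[0]), rec)).sortByKey(False).take(5):
    print(i)

# top(5) returns the 5 largest pairs in descending order (pairs compare by key first)
for i in orders.map(lambda rec: (int(rec.split(",")[0]), rec)).top(5):
    print(i)
```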
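To check what these calls return without a cluster, here is a plain-Python model of the same semantics. It is a sketch, not Spark's implementation: the `orders` list is a small hypothetical sample whose only assumption is that the first comma-separated field is an integer order id, and `sorted`/`heapq.nlargest` stand in for `sortByKey` and `top`.

```python
import heapq

# Hypothetical sample records; only the first field (order id) matters here
orders = [
    "3,2013-07-25 00:00:00.0,12111,COMPLETE",
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "4,2013-07-25 00:00:00.0,8827,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
]

# Same shape as the map() step: (order_id, record) pairs
pairs = [(int(rec.split(",")[0]), rec) for rec in orders]

# Models sortByKey(): ascending by key
ascending = sorted(pairs, key=lambda kv: kv[0])

# Models sortByKey(False).take(5): descending by key, first 5 elements
descending_first5 = sorted(pairs, key=lambda kv: kv[0], reverse=True)[:5]

# Models top(5): the 5 largest elements in descending order;
# tuples compare by their first element (the key) first
top5 = heapq.nlargest(5, pairs)

for pair in top5:
    print(pair)
```

With unique keys, `top(n)` and a descending sort followed by `take(n)` give the same elements; `top` avoids sorting the whole data set just to keep a few.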

- Global sorting using custom logic
- Here are examples of `takeOrdered`, which returns the first n elements according to a key function, so we can sort by any field of our delimited data set.

```python
orders = sc.textFile("/user/cloudera/sqoop_import/orders")

# 5 smallest order ids, using the key of each (order_id, record) pair
for i in orders.map(lambda rec: (int(rec.split(",")[0]), rec)).takeOrdered(5, lambda x: x[0]):
    print(i)

# Negating the numeric key reverses the order: 5 largest order ids
for i in orders.map(lambda rec: (int(rec.split(",")[0]), rec)).takeOrdered(5, lambda x: -x[0]):
    print(i)

# No pair RDD needed: parse the field inside the key function
for i in orders.takeOrdered(5, lambda x: int(x.split(",")[0])):
    print(i)

for i in orders.takeOrdered(5, lambda x: -int(x.split(",")[0])):
    print(i)
```
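The custom-key behaviour can be modelled in plain Python with `heapq.nsmallest`, which, like `takeOrdered`, returns the n smallest items by a key function in ascending order. This is a sketch under assumptions: `take_ordered` is a hypothetical helper (not a Spark API) and the sample records are made up, with an integer in the third field used to show sorting by a field other than the first.

```python
import heapq

# Hypothetical sample records (order id in field 0, an integer in field 2)
orders = [
    "3,2013-07-25 00:00:00.0,12111,COMPLETE",
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "4,2013-07-25 00:00:00.0,8827,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
]

def take_ordered(data, num, key=None):
    """Hypothetical model of RDD.takeOrdered: num smallest items by key, ascending."""
    return heapq.nsmallest(num, data, key=key)

pairs = [(int(rec.split(",")[0]), rec) for rec in orders]

# Smallest order ids first
smallest = take_ordered(pairs, 2, lambda x: x[0])

# Negated numeric key: largest order ids first
largest = take_ordered(pairs, 2, lambda x: -x[0])

# Same ordering straight from the raw lines, parsing the field in the key
largest_raw = take_ordered(orders, 2, lambda x: -int(x.split(",")[0]))

# Any field can drive the order, e.g. the integer in the third field
by_third_field = take_ordered(orders, 2, lambda x: int(x.split(",")[2]))

for rec in by_third_field:
    print(rec)
```

Negating the key only works for numeric fields; for a descending order on a string field, a different approach (such as `top` or `sortByKey(False)` on a keyed RDD) is needed.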

