In this topic we will learn how to write a word count program in Spark.
Here is the video showing how to get the word count using the flatMap, map and reduceByKey functions.
Here is the code snippet with details about the logic. Run one line at a time to understand what is going on.
- Developing word count program
- Create a file, type a few lines, save it as wordcount.txt, and copy it to HDFS as /user/cloudera/wordcount.txt
- Reading data from HDFS location
val data = sc.textFile("/user/cloudera/wordcount.txt")
- Flatten each line into multiple words using " " (space) as the delimiter.
val dataFlatMap = data.flatMap(x => x.split(" "))
- Associate the value 1 with each input word using the map function.
val dataMap = dataFlatMap.map(x => (x, 1))
- Aggregating by key (the keys are the unique words), adding up the 1s to get the count for each word
val dataReduceByKey = dataMap.reduceByKey((x,y) => x + y)
- Saving to HDFS
- Validating the data set
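The last two steps are listed without code, so here is a minimal sketch of how they might look. It assumes the same spark-shell session as above, where sc and dataReduceByKey are already defined. The output path /user/cloudera/wordcountoutput is a hypothetical name chosen for illustration, not one given in this topic.

```scala
// Continues the spark-shell session above (sc and dataReduceByKey already exist).
// Hypothetical output directory; it must not exist yet, or saveAsTextFile fails.
val outputPath = "/user/cloudera/wordcountoutput"

// Each (word, count) pair is written as one line of text, e.g. (hello,2)
dataReduceByKey.saveAsTextFile(outputPath)

// Validate: read the saved part files back and print the first few lines
val saved = sc.textFile(outputPath)
saved.take(10).foreach(println)
```

The output is a directory of part files rather than a single file. From a terminal, hadoop fs -ls on that directory should list them.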
Here is the video which explains the details of the word count program.
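To see what each transformation produces without a cluster, the same logic can be tried on plain Scala collections. This is only a sketch: the sample lines are made up, the object name WordCountLocal is hypothetical, and groupBy followed by a sum is a local stand-in for reduceByKey, not how Spark implements it.

```scala
object WordCountLocal {
  // Made-up sample lines standing in for the contents of wordcount.txt
  val lines: List[String] = List("hello world", "hello spark")

  // Same as data.flatMap(x => x.split(" ")): one flat list of words
  val words: List[String] = lines.flatMap(x => x.split(" "))

  // Same as dataFlatMap.map(x => (x, 1)): pair each word with 1
  val pairs: List[(String, Int)] = words.map(x => (x, 1))

  // Local stand-in for reduceByKey: group the pairs by word, then add up the 1s
  val counts: Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, ps) =>
      (word, ps.map(_._2).reduce((x, y) => x + y))
    }

  def main(args: Array[String]): Unit =
    // Prints (hello,2), (spark,1), (world,1), one per line
    counts.toSeq.sortBy(_._1).foreach(println)
}
```

Printing words and pairs one at a time shows the intermediate results, much like running the Spark version line by line.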