There are several APIs to perform aggregations:

- By-key aggregations
  - countByKey
  - reduceByKey
  - aggregateByKey

- groupByKey can also be used for aggregations, but it should be the last choice because it does not use a combiner, so every value is shuffled before it is aggregated

## Using countByKey
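
countByKey counts the number of records for each key and ignores the values. In Spark it is an action, so the counts are returned to the driver as a dictionary-like object rather than as another RDD. The block below is a minimal plain-Python sketch of that behaviour. The `orders` data and the `count_by_key` helper are hypothetical names used only for illustration, and the Spark call itself appears only in a comment.

```python
from collections import Counter

# Hypothetical sample data: (order_status, order_id) pairs.
orders = [
    ("CLOSED", 1),
    ("PENDING", 2),
    ("CLOSED", 3),
    ("COMPLETE", 4),
    ("CLOSED", 5),
]


def count_by_key(pairs):
    """Count the records for each key; the values are ignored."""
    return dict(Counter(key for key, _ in pairs))


status_counts = count_by_key(orders)

# Rough PySpark equivalent, assuming an existing SparkContext named `sc`:
#   sc.parallelize(orders).countByKey()
```

Because the result is collected on the driver, this approach suits cases where the number of distinct keys is small.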

## Using groupByKey

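- groupByKey can be used for any aggregation
- It is the least preferred option because no combiner is used, so all values are shuffled before any aggregation happens
- groupByKey is a generic API that groups all the values for a given key into a single collection
- Beyond aggregations, groupByKey supports many other per-key transformations, such as sorting the values within each key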
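
The points above can be sketched in plain Python: first collect every value for a key, then apply any logic to the grouped values. The `order_items` data and the `group_by_key` helper are hypothetical and only imitate the semantics. In Spark the grouping happens after a full shuffle of all values.

```python
from collections import defaultdict

# Hypothetical sample data: (order_id, item_subtotal) pairs.
order_items = [
    (1, 300.0),
    (2, 200.0),
    (2, 250.0),
    (2, 130.0),
    (4, 50.0),
    (4, 300.0),
]


def group_by_key(pairs):
    """Collect all values that share a key into one list per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


grouped = group_by_key(order_items)

# Aggregation on top of the grouped values: revenue per order.
revenue_per_order = {key: sum(values) for key, values in grouped.items()}

# A non-aggregation transformation: item subtotals sorted in descending order.
sorted_items_per_order = {
    key: sorted(values, reverse=True) for key, values in grouped.items()
}

# Rough PySpark equivalent, assuming an existing RDD of pairs named `rdd`:
#   rdd.groupByKey().mapValues(sum)
```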

## Using reduceByKey

- reduceByKey uses a combiner
- It is used when the logic to compute intermediate values and the logic to compute the final value from those intermediate values are the same
- It is very straightforward to implement
- It takes one function (typically an anonymous or lambda function) with 2 arguments, both of the value type, and returns a result of the same type
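
To illustrate why one function is enough, here is a minimal plain-Python sketch that simulates two partitions. The same function first reduces the values inside each partition (the combiner step) and then merges the partial results. The `partitions` data and the helpers `reduce_within` and `reduce_by_key` are hypothetical and are not part of Spark.

```python
# Hypothetical data: two partitions of (order_id, item_subtotal) pairs.
partitions = [
    [(2, 200.0), (2, 250.0), (4, 50.0)],
    [(2, 130.0), (4, 300.0), (1, 300.0)],
]


def reduce_within(pairs, func):
    """Reduce the values of each key in one list of pairs using func."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


def reduce_by_key(partitions, func):
    """Simulate reduceByKey: combine per partition, then merge partials."""
    # Combiner step: intermediate values are computed inside each partition.
    partials = [reduce_within(partition, func) for partition in partitions]
    # Final step: the same function merges the intermediate values.
    merged_input = [item for partial in partials for item in partial.items()]
    return reduce_within(merged_input, func)


revenue_per_order = reduce_by_key(partitions, lambda total, amount: total + amount)
min_item_per_order = reduce_by_key(partitions, lambda a, b: a if a < b else b)

# Rough PySpark equivalent, assuming an existing RDD of pairs named `rdd`:
#   rdd.reduceByKey(lambda total, amount: total + amount)
```

Sum and minimum both work here because combining partial results uses exactly the same logic as combining raw values.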

## Using aggregateByKey

- aggregateByKey uses a combiner
- It is used when the logic to compute intermediate values differs from the logic to compute the final value from those intermediate values
- It is a bit trickier to implement
- It takes 3 arguments
  - Initial value (zero value): its type is driven by the output value type
  - Combine function or seqOp: takes 2 arguments
    - first argument: driven by the output value type
    - second argument: driven by the input value type
  - Reduce function or combOp: takes 2 arguments, both driven by the output value type
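
The three arguments can be sketched in plain Python by computing both revenue and item count per order, where the output type (a tuple) differs from the input type (a float). The `partitions` data and the `aggregate_by_key` helper are hypothetical and only simulate how the two functions are applied. This is not Spark's implementation.

```python
# Hypothetical data: two partitions of (order_id, item_subtotal) pairs.
partitions = [
    [(2, 200.0), (2, 250.0), (4, 50.0)],
    [(2, 130.0), (4, 300.0), (1, 300.0)],
]


def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    """Simulate aggregateByKey over a list of partitions."""
    partials = []
    for partition in partitions:
        acc = {}
        for key, value in partition:
            # seq_op: (output type, input type) -> output type
            acc[key] = seq_op(acc.get(key, zero_value), value)
        partials.append(acc)

    result = {}
    for partial in partials:
        for key, acc_value in partial.items():
            # comb_op: (output type, output type) -> output type
            result[key] = (
                comb_op(result[key], acc_value) if key in result else acc_value
            )
    return result


revenue_and_count = aggregate_by_key(
    partitions,
    (0.0, 0),  # initial value: (revenue, count), shaped like the output
    lambda acc, amount: (acc[0] + amount, acc[1] + 1),  # seqOp
    lambda a, b: (a[0] + b[0], a[1] + b[1]),  # combOp
)

# Rough PySpark equivalent, assuming an existing RDD of pairs named `rdd`:
#   rdd.aggregateByKey((0.0, 0), seq_op, comb_op)
```

The count shows why the two functions differ: within a partition each record adds 1, while across partitions the partial counts are added together.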
