The Avro file format is one of the popular file formats used in Hadoop-based applications. Avro is an Apache™ open source project that provides data serialization and data exchange services for Hadoop®. These services can be used together or independently. Because an Avro data file stores its schema alongside the data, big data can be exchanged between programs written in different languages, provided an Avro library is available for each language.
The highlighted certification item is covered as part of this topic.
Use Data Definition Language (DDL) to create tables in the Hive metastore for use by Hive and Impala.
- Read and/or create a table in the Hive metastore in a given schema
- Extract an Avro schema from a set of datafiles using avro-tools
- Create a table in the Hive metastore using the Avro file format and an external schema file
- Improve query performance by creating partitioned tables in the Hive metastore
- Evolve an Avro schema by changing JSON files
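To make the last item concrete, here is a minimal sketch in Python of what evolving a schema by editing its JSON can look like. It uses only the standard `json` module and does not call Avro itself. The record name `Order`, the field names, and the helpers `add_field` and `resolve` are hypothetical and chosen for illustration. `resolve` is a simplified imitation of Avro's schema resolution rule, assuming that a field missing from old data takes the default declared in the newer schema.

```python
import json

# Hypothetical version 1 of a schema file (for example orders.avsc).
SCHEMA_V1 = json.loads("""
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "int"},
    {"name": "order_status", "type": "string"}
  ]
}
""")


def add_field(schema, name, avro_type, default):
    """Return a copy of the schema with one extra field that carries a default."""
    evolved = json.loads(json.dumps(schema))  # deep copy via a JSON round trip
    evolved["fields"].append({"name": name, "type": avro_type, "default": default})
    return evolved


def resolve(record, reader_schema):
    """Project a record written with an older schema onto the reader schema.

    Simplified illustration, not the Avro library: a field missing from the
    record takes the reader's default; with no default it is an error.
    Fields in the record that the reader schema does not list are dropped.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        if field["name"] in record:
            resolved[field["name"]] = record[field["name"]]
        elif "default" in field:
            resolved[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value or default for field {field['name']!r}")
    return resolved


# Evolve the schema: add a nullable field whose default is null.
SCHEMA_V2 = add_field(SCHEMA_V1, "order_channel", ["null", "string"], None)

# A record written before the new field existed.
old_record = {"order_id": 1, "order_status": "CLOSED"}
new_view = resolve(old_record, SCHEMA_V2)

print(json.dumps(SCHEMA_V2, indent=2))
print(new_view)
```

Giving the new field a default is the design choice that matters here: without one, data written with the old schema could not be read through the new schema.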
Here is a video that explains the concepts of evolving an Avro schema.