The Avro file format is one of the popular file formats in Hadoop-based applications. Avro is an Apache™ open source project that provides data serialization and data exchange services for Hadoop®. These services can be used together or independently. With Avro, big data can be exchanged between programs written in different languages.
The highlighted certification item is covered as part of this topic.
Use Data Definition Language (DDL) to create tables in the Hive metastore for use by Hive and Impala.
- Read and/or create a table in the Hive metastore in a given schema
- Extract an Avro schema from a set of datafiles using avro-tools
- Create a table in the Hive metastore using the Avro file format and an external schema file
- Improve query performance by creating partitioned tables in the Hive metastore
- Evolve an Avro schema by changing JSON files
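The last item, evolving an Avro schema by changing JSON files, can be sketched in a few lines. The snippet below is a minimal illustration only. The starting schema is hypothetical: its field names follow the orders columns used in this topic, but the record name, the namespace, the new `order_comment` field and the `add_optional_field` helper are all made up for the example. The new field is added as a union with `"null"` and a `"default": null`. This keeps existing datafiles readable, because Avro schema resolution uses the reader schema's default when the writer's schema lacks the field.

```python
import json

# Hypothetical starting schema for the orders data. Field names follow the
# Hive DDL in this topic; the record name and namespace are assumptions.
ORDERS_SCHEMA = {
    "type": "record",
    "name": "orders",
    "namespace": "example.retail",
    "fields": [
        {"name": "order_id", "type": ["null", "int"], "default": None},
        {"name": "order_date", "type": ["null", "long"], "default": None},
        {"name": "order_customer_id", "type": ["null", "int"], "default": None},
        {"name": "order_status", "type": ["null", "string"], "default": None},
    ],
}


def add_optional_field(schema, name, avro_type):
    """Return a copy of an Avro record schema with one nullable field appended.

    The new field is a union of "null" and avro_type with a default of null,
    so data written with the old schema can still be read with the new one.
    """
    if any(field["name"] == name for field in schema["fields"]):
        raise ValueError(f"field {name!r} already exists")
    evolved = json.loads(json.dumps(schema))  # deep copy via a JSON round trip
    evolved["fields"].append(
        {"name": name, "type": ["null", avro_type], "default": None}
    )
    return evolved


evolved = add_optional_field(ORDERS_SCHEMA, "order_comment", "string")
# In practice this JSON would be saved as the .avsc file that the table's
# avro.schema.url points to, replacing the previous version on HDFS.
print(json.dumps(evolved, indent=2))
```

In this setup, Hive takes the table's columns from the file named in `avro.schema.url`. Overwriting that JSON file on HDFS (for example with `hdfs dfs -put -f`) should therefore be enough to change the table definition.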
Here is a video that explains the process of creating a Hive partitioned table using the Avro file format.
- Here is an example of creating a table with names different from the column names in the avsc files:
CREATE TABLE orders_part_avro (
  order_id int,
  order_date bigint,
  order_customer_id int,
  order_status string
)
PARTITIONED BY (order_month string)
STORED AS AVRO
LOCATION 'hdfs:///user/cloudera/sqoop_import/orders_part_avro'
TBLPROPERTIES ('avro.schema.url'='hdfs://quickstart.cloudera/user/cloudera/avsc_files/orders_part_avro.avsc');
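With `PARTITIONED BY (order_month string)`, Hive keeps each partition in its own `order_month=<value>` directory under the table location. The partition column is therefore not stored in the Avro data files and does not belong in the .avsc schema. The sketch below shows how one row would map to a partition directory. It rests on a few assumptions: `order_date` holds milliseconds since the Unix epoch (consistent with its `bigint` type), the month value uses a `YYYY-MM` format, and the conversion is done in UTC. The helper names `order_month` and `partition_dir` are hypothetical.

```python
from datetime import datetime, timezone

# Table location taken from the DDL above.
TABLE_LOCATION = "hdfs:///user/cloudera/sqoop_import/orders_part_avro"


def order_month(order_date_millis):
    """Derive a 'YYYY-MM' partition value from an epoch-milliseconds timestamp.

    Assumes order_date holds milliseconds since the Unix epoch and uses UTC;
    Hive's from_unixtime would use the server's local time zone instead.
    """
    ts = datetime.fromtimestamp(order_date_millis / 1000, tz=timezone.utc)
    return ts.strftime("%Y-%m")


def partition_dir(order_date_millis):
    """Return the directory Hive would use for this row's partition.

    Hive stores each partition under <table location>/<column>=<value>,
    so order_month itself is not written into the Avro data files.
    """
    return f"{TABLE_LOCATION}/order_month={order_month(order_date_millis)}"


# 1374710400000 ms is 2013-07-25T00:00:00Z, so this prints the 2013-07 partition.
print(partition_dir(1374710400000))
```

Queries that filter on `order_month` let Hive skip the directories of all other months. This partition pruning is how partitioned tables improve query performance.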