How to create an AWS Glue table where partitions have different columns? ('HIVE_PARTITION_SCHEMA_MISMATCH')

Question

As per this AWS Forum Thread, does anyone know how to use AWS Glue to create an AWS Athena table whose partitions contain different schemas (in this case different subsets of columns from the table schema)?

At the moment, when I run the crawler over this data and then run a query in Athena, I get the error 'HIVE_PARTITION_SCHEMA_MISMATCH'.

My use case is:

  • Partitions represent days
  • Files represent events
  • Each event is a JSON blob in a single S3 file
  • An event contains a subset of columns (dependent on the type of event)
  • The 'schema' of the entire table is the full set of columns for all the event types (the Glue crawler assembles this correctly)
  • The 'schema' of each partition is the subset of columns for the event types that occurred on that day (hence in Glue each partition potentially has a different subset of columns from the table schema)
  • I think this inconsistency is what causes the error in Athena
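To make the mismatch concrete, here is a toy sketch in plain Python (no AWS calls). It is a simplified model of the situation described above, not of Athena's actual check; the event types, column names, and dates are invented for illustration. The table schema is the union of every event's keys, while each day's partition only sees the keys of the events that occurred that day, so the two column sets differ.

```python
import json

# Hypothetical events grouped by day (partition). Each event is one JSON blob,
# and different event types carry different subsets of columns.
events_by_day = {
    "2018-01-01": ['{"event_type": "click", "user_id": 1, "button": "buy"}'],
    "2018-01-02": ['{"event_type": "view", "user_id": 2, "page": "/home"}'],
}


def columns_of(blobs):
    """Return the set of top-level keys seen across a list of JSON blobs."""
    cols = set()
    for blob in blobs:
        cols.update(json.loads(blob).keys())
    return cols


# Per-partition schema: only the columns present on that day.
partition_schemas = {day: columns_of(blobs) for day, blobs in events_by_day.items()}

# Table schema: the union of all partitions' columns.
table_schema = set().union(*partition_schemas.values())

# Partitions whose column set differs from the table's column set.
mismatched = sorted(
    day for day, cols in partition_schemas.items() if cols != table_schema
)

if __name__ == "__main__":
    print(sorted(table_schema))
    print(mismatched)
```

Here every partition is missing at least one column of the table, which is the kind of per-partition divergence the crawler records in the partition metadata.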

If I wrote the schema manually, this would work fine: there would be just one table schema, and keys missing from a JSON file would be treated as nulls.
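The single-schema behaviour described here, where every row is read against one full column list and absent keys come back as nulls, can be mimicked in a few lines of Python. This is only an illustration of the idea, not how Athena's JSON SerDe is implemented; the column list is a hypothetical example.

```python
import json

# Hypothetical full table schema: the union of columns across all event types.
TABLE_COLUMNS = ["event_type", "user_id", "button", "page"]


def to_row(blob, columns=TABLE_COLUMNS):
    """Project one JSON event onto the full column list.

    Keys absent from the event become None (the Python analogue of NULL);
    keys that are not in the column list are ignored.
    """
    record = json.loads(blob)
    return {col: record.get(col) for col in columns}


if __name__ == "__main__":
    print(to_row('{"event_type": "view", "user_id": 2, "page": "/home"}'))
```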

Thanks in advance!

Solution

I had the same issue and solved it by configuring the crawler to update the metadata of pre-existing partitions from the table as well.
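In the Glue console this corresponds to the crawler output option "Update all new and existing partitions with metadata from the table". Through the API, the same setting lives in the crawler's `Configuration` JSON string, where `CrawlerOutput.Partitions.AddOrUpdateBehavior` set to `InheritFromTable` makes partitions inherit the table's schema. The sketch below assumes boto3 and AWS credentials for the update step; the crawler name `my-events-crawler` is hypothetical. Because `update_crawler` replaces the whole `Configuration` string, the sketch merges the new key into whatever configuration the crawler already has.

```python
import json


def build_configuration(existing=None):
    """Return a crawler Configuration JSON string in which partitions inherit
    metadata (including the column list) from their table.

    `existing` is the crawler's current Configuration string, if any; its
    other settings are kept so the update does not wipe them out.
    """
    config = json.loads(existing) if existing else {}
    config.setdefault("Version", 1.0)
    partitions = config.setdefault("CrawlerOutput", {}).setdefault("Partitions", {})
    partitions["AddOrUpdateBehavior"] = "InheritFromTable"
    return json.dumps(config)


def apply_to_crawler(crawler_name):
    """Apply the setting to an existing crawler (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_configuration works without boto3

    glue = boto3.client("glue")
    current = glue.get_crawler(Name=crawler_name)["Crawler"].get("Configuration")
    glue.update_crawler(Name=crawler_name, Configuration=build_configuration(current))


if __name__ == "__main__":
    # Only prints the resulting configuration; call
    # apply_to_crawler("my-events-crawler") (hypothetical name) to update a real crawler.
    print(build_configuration())
```

After updating the crawler, re-run it so that the existing partitions are rewritten with the table's metadata.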