How to create an AWS Athena table via a Glue crawler when the S3 data store has both json and .gz compressed files?

Views: 133 Published: 2020/6/3 23:06:35 Tags: amazon-web-services amazon-s3 amazon-athena aws-glue


Problem description

My intended solution has two problems:

1. My S3 data store is structured as follows:

mainfolder/date=2019-01-01/hour=14/abcd.json
mainfolder/date=2019-01-01/hour=13/abcd2.json.gz
...
mainfolder/date=2019-01-15/hour=13/abcd74.json.gz

All json files have the same schema, and I want to point a crawler at mainfolder/ so that it creates a table I can query in Athena.

I have already tried with a single file format: if the files are all json or all gz, the crawler works perfectly. I am looking for a solution that handles both file types automatically. I am open to writing a custom script or using any out-of-the-box solution, but I need pointers on where to start.
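Since the crawler reportedly works when every file shares one format, one possible starting point for a custom script is to normalize the plain .json objects to .json.gz before crawling. The following is only a minimal sketch under that assumption. The function name `normalize_to_gzip` is hypothetical, and reading from and writing back to S3 (for example with an SDK) is deliberately left out.

```python
import gzip
from typing import Tuple


def normalize_to_gzip(key: str, body: bytes) -> Tuple[str, bytes]:
    """Return (key, body) such that the object is gzip-compressed.

    Assumption: keys ending in ".gz" are already compressed and are
    returned unchanged; any other key is treated as plain JSON text.
    """
    if key.endswith(".gz"):
        return key, body
    # Append ".gz" to the key and compress the raw bytes.
    return key + ".gz", gzip.compress(body)


if __name__ == "__main__":
    new_key, new_body = normalize_to_gzip(
        "mainfolder/date=2019-01-01/hour=14/abcd.json", b'{"id": 1}\n'
    )
    print(new_key)
    print(gzip.decompress(new_body))
```

A script along these lines would be run over every object under mainfolder/ so that only .json.gz files remain. Whether the original uncompressed objects should be deleted afterwards depends on the pipeline and is not addressed here.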

2. The second issue is that my json data has a field (column) that the crawler interprets as struct data, but I want that field's type to be string. The reason is that if the type remains struct, the date/hour partitions get a mismatch error, because the struct data does not have the same internal schema across the files. I have tried to make a custom classifier, but it has no options for describing data types.
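One way a preprocessing script might approach the struct problem is to re-serialize the nested field as a JSON string before the data is crawled, so that schema inference sees a string value rather than a nested object. This is a sketch, not a confirmed fix. It assumes one JSON object per line, the field name `payload` is hypothetical, and whether the crawler then types the column as string needs to be verified.

```python
import json


def stringify_field(line: str, field: str) -> str:
    """Re-encode a nested field of one JSON record as a JSON string.

    Assumption: `line` holds exactly one JSON object. If `field` is
    missing or is not a dict/list, the record is left as it was.
    """
    record = json.loads(line)
    value = record.get(field)
    if isinstance(value, (dict, list)):
        # Store the nested value as text instead of a nested structure.
        record[field] = json.dumps(value, sort_keys=True)
    return json.dumps(record)


if __name__ == "__main__":
    # "payload" is a made-up field name used only for illustration.
    print(stringify_field('{"id": 1, "payload": {"a": 1}}', "payload"))
```

Queries could then parse the string column at read time where the inner values are needed.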
