AWS Glue Crawler Classifies JSON File as UNKNOWN

Views: 31 Published: 2021/12/22 21:41:12 Tags: json amazon-web-services pyspark aws-glue

Problem description

I'm working on an ETL job that will ingest JSON files into an RDS staging table. The crawler I've configured classifies JSON files without issue as long as they are under 1MB in size. If I minify a file (instead of pretty-printing it), the crawler classifies it without issue, provided the minified result is under 1MB.
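For readers who want to reproduce the minification step, here is a minimal sketch using only the Python standard library. The helper names `minify_json` and `fits_under_limit` are hypothetical, and the threshold of 1024 * 1024 bytes is an assumption; the post only says "1MB" and does not state how the crawler measures size.

```python
import json

# Assumed threshold: the post only says "1MB", so 1024 * 1024 bytes is a guess.
ONE_MB = 1024 * 1024


def minify_json(text: str) -> str:
    """Re-serialize JSON text without indentation or optional whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def fits_under_limit(text: str, limit: int = ONE_MB) -> bool:
    """Return True if the UTF-8 encoded text is smaller than `limit` bytes."""
    return len(text.encode("utf-8")) < limit


# Tiny illustrative document, pretty-printed and then minified.
pretty = json.dumps({"id": 1, "tags": ["a", "b"]}, indent=4)
compact = minify_json(pretty)

print(len(pretty), len(compact), fits_under_limit(compact))
```

Minifying only removes whitespace, so it helps only when the pretty-printed file is slightly over the size at which classification starts to fail.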

I'm having trouble coming up with a workaround. I tried converting the JSON to BSON and also gzipping the JSON file, but it is still classified as UNKNOWN.

Has anyone else run into this issue? Is there a better way to do this?
