AWS Glue Crawler Classifies JSON File as UNKNOWN

Published: 2019/11/23 21:21:52. Tags: json, amazon-web-services, pyspark, aws-glue

Problem description

I'm working on an ETL job that ingests JSON files into an RDS staging table. The crawler I've configured classifies JSON files without issue as long as they are under 1 MB in size. If I minify a file (instead of pretty-printing it) and the result is under 1 MB, the crawler classifies it without issue.
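For reference, the minification step described above can be sketched in Python as follows. This is only an illustration of what "minify instead of pretty-print" means here, not the exact tooling used in the job: the helper names `minify_json` and `fits_under_limit` are hypothetical, and treating "1 MB" as 1024 * 1024 bytes is an assumption, since the precise threshold the crawler applies is not something I have confirmed.

```python
import json


def minify_json(text: str) -> str:
    """Re-serialize JSON text with no indentation and no spaces after separators."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def fits_under_limit(text: str, limit_bytes: int = 1024 * 1024) -> bool:
    """Check the UTF-8 encoded size against a limit.

    The default of 1024 * 1024 bytes is an assumed reading of "1 MB".
    """
    return len(text.encode("utf-8")) < limit_bytes


# A small pretty-printed document standing in for a real input file.
pretty = json.dumps({"id": 1, "tags": ["a", "b"], "nested": {"ok": True}}, indent=4)
compact = minify_json(pretty)

# Same data, fewer bytes.
print(len(pretty.encode("utf-8")), "->", len(compact.encode("utf-8")))
print(fits_under_limit(compact))
```

Minifying only removes whitespace, so it helps only when the pretty-printed file is slightly over the limit; a file whose actual data exceeds 1 MB stays over the limit after this step.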

I'm having trouble coming up with a workaround. I tried converting the JSON to BSON and gzipping the JSON file, but in both cases the file is still classified as UNKNOWN.

Has anyone else run into this issue? Is there a better way to do this?
