雅典娜-保留字和表格,无法查询 [英] Athena - reserved words and table that cannot be queried

查看:150
本文介绍了雅典娜-保留字和表格,无法查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我将JSON数据文件放入S3,并使用AWS-Glue构建表定义.每个json行"有大约120个字段.字段之一被称为时间戳".小写.我有数千个大文件,不愿全部更改.

此处(

更新#1:

几周前,我在Athena上使用了CSV,效果很好.这次我使用的是JSON.

我用一个包含以下内容的文件创建了一个新文件夹,运行了Glue Crawler:

  [{"firstName":"Neal","lastName":"Walters",城市":烦人",状态","TX"}{"firstName":"Fred","lastname":"Flintstone",城市":基岩",状态","TX"}{"firstName":"Barney","lastName":"Rubble",城市":静水",状态",确定"}] 

,并且此SQL给出与上述相同的错误:

  SELECT * FROM"relatixcurrdayjson".限制10; 

解决方案

让Glue搜寻器创建在Athena中不起作用的表非常容易,考虑到这是它的主要目标,这令人惊讶./p>

如果您发布的JSON正是您针对搜寻器运行的问题,则是Athena不支持多行JSON文档.您的文件每行必须只有一个JSON文档.请参阅使用多行JSON进行交易吗?(还有加分,CRLF)多配置单元中的在线JSON文件查询创建雅典娜(Athena)中的表格来自嵌套JSON

I'm putting JSON data files into S3, and use AWS-Glue to build the table definition. I have about 120 fields per each json "row". One of the fields is called "timestamp" in lower case. I have 1000s of large files, and would hate to change them all.

Here (https://docs.aws.amazon.com/athena/latest/ug/reserved-words.html), I see TIMESTAMP in DDL is a reserved word. Does that mean I won't be able to read those JSON file from Athena.

I'm getting this error, which lead me to the above being a potential reason.

I clicked the 3 dots to the right of the tablename, and clicked "Preview Table", which built and ran this select statement:

SELECT * FROM "relatixcurrdayjson"."table_currday" limit 10;

That lead to an error which seems wrong or misleading:

Your query has the following error(s):

SYNTAX_ERROR: line 1:8: SELECT * not allowed in queries without FROM clause

This query ran against the "relatixcurrdayjson" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: c448f0ea-5086-4436-9107-2b60dab0f04f.

If I click the option that says "Generate Create Table DDL", it builds this line to execute:

SHOW CREATE TABLE table_currday;

and results in this error:

Your query has the following error(s):

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.NullPointerException

This query ran against the "relatixcurrdayjson" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 6ac5d90f-8d52-4e3e-8f16-cd42e1edcfa3.

This is the AWS Glue Log:

UPDATE #1:

I used Athena a couple of weeks ago with CSV and it worked great. This time I'm using JSON.

I created a new folder with one file containing the following, ran the Glue Crawler:

[
    {"firstName": "Neal",
     "lastName": "Walters",
     "city": "Irving",
     "state", "TX"
     }
     
    {"firstName": "Fred",
     "lastName": "Flintstone",
     "city": "Bedrock",
     "state", "TX"
     }
     
    {"firstName": "Barney",
     "lastName": "Rubble",
     "city": "Stillwater",
     "state", "OK"
     }
     
]

and this SQL gives the same error as above:

SELECT * FROM "relatixcurrdayjson"."tbeasyeasytest" limit 10;

解决方案

It's very easy to get Glue crawlers to create tables that don't work in Athena, which is surprising given that it's the primary goal it was designed for.

If the JSON you posted is exactly what you ran your crawler against the problem is that Athena does not support multi-line JSON documents. Your files must have exactly one JSON document per line. See Dealing with multi-line JSON? (And, bonus points, CRLF), Multi-line JSON file querying in hive, and Create Table in Athena From Nested JSON

这篇关于雅典娜-保留字和表格,无法查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆