Amazon Athena - Column cannot be resolved on basic SQL WHERE query


Problem description

I am currently evaluating Amazon Athena and Amazon S3. I have created a database (testdb) with one table (awsevaluationtable). The table has two columns, x (bigint) and y (bigint).

When I run:

SELECT * 
FROM testdb."awsevaluationtable"

I get all of the test data.

However, when I try a basic WHERE query:

SELECT * 
FROM testdb."awsevaluationtable" 
WHERE x > 5

I get:

SYNTAX_ERROR: line 3:7: Column 'x' cannot be resolved

I have tried all sorts of variations:

SELECT * FROM testdb.awsevaluationtable WHERE x > 5
SELECT * FROM awsevaluationtable WHERE x > 5
SELECT * FROM testdb."awsevaluationtable" WHERE X > 5
SELECT * FROM testdb."awsevaluationtable" WHERE testdb."awsevaluationtable".x > 5
SELECT * FROM testdb.awsevaluationtable WHERE awsevaluationtable.x > 5

I have also confirmed that the x column exists with:

SHOW COLUMNS IN sctawsevaluation

This seems like an extremely simple query yet I can't figure out what is wrong. I don't see anything obvious in the documentation. Any suggestions would be appreciated.

Solution

I have edited my response to this issue based on my current findings and my contact with both the AWS Glue and Athena support teams.

We were having the same issue: an inability to query on the first column in our CSV files. The problem comes down to the encoding of the CSV file. In short, AWS Glue and Athena currently do not support CSV files encoded as UTF-8 with a byte order mark (UTF-8-BOM). If you open a CSV file that starts with a byte order mark (BOM) in Excel or Notepad++, it looks like any other comma-delimited text file. Opening it in a hex editor, however, reveals the underlying issue: the file starts with the three bytes EF BB BF, which show up as the special characters  when read as Latin-1. That is the BOM.
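If no hex editor is at hand, the same check can be scripted. The following is a minimal Python sketch, not part of the original answer; the function name `has_utf8_bom` is made up for illustration, and it assumes the CSV is available as a local file.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        # codecs.BOM_UTF8 is the three-byte sequence b"\xef\xbb\xbf".
        return f.read(3) == codecs.BOM_UTF8
```

Calling `has_utf8_bom("data.csv")` on a local copy of the file (the file name here is only a placeholder) tells you whether the BOM is present before the data is uploaded to S3.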

When a UTF-8-BOM CSV file is processed in AWS Glue, it retains these special characters and associates them with the first column name. When you then try to query on the first column in Athena, the query fails with an error.
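The mechanism can be reproduced outside AWS. The Python sketch below is only an illustration of how a BOM ends up glued to the first header name when the bytes are decoded as plain UTF-8; it does not show how Glue itself parses files, and the sample data and the `header` helper are invented for the example.

```python
import csv
import io

# Sample CSV bytes that start with a UTF-8 BOM (hypothetical data).
raw = b"\xef\xbb\xbfx,y\n6,1\n3,2\n"


def header(data, encoding):
    """Decode CSV bytes and return the first row (the column names)."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


naive = header(raw, "utf-8")      # BOM survives as U+FEFF in the first name
aware = header(raw, "utf-8-sig")  # BOM is stripped while decoding

print(naive[0] == "x")  # False: the name is "\ufeffx", not "x"
print(aware[0] == "x")  # True
```

The first column looks like `x` when printed, yet it does not compare equal to `x`, which matches the symptom of a column that is listed but cannot be referenced by name.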

There are ways around this on AWS:

  • In AWS Glue, edit the table schema and delete the first column, then add it back with the proper column name, OR

  • In AWS Athena, run SHOW CREATE TABLE to script out the problematic table, remove the special characters from the generated DDL, then run the script to create a new table that you can query.

To make your life simple, just make sure your CSV files are encoded as plain UTF-8, without a BOM.
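One way to do that is to rewrite the file before uploading it. This is a minimal Python sketch under the assumption that the CSV is a local UTF-8 file; `strip_utf8_bom` is a hypothetical helper name, not something provided by AWS.

```python
def strip_utf8_bom(src_path, dst_path):
    """Copy a UTF-8 text file to dst_path, dropping a leading BOM if present."""
    # "utf-8-sig" skips a leading BOM on read and also accepts files without one.
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Because the `utf-8-sig` codec also reads files that have no BOM, the function is safe to run over every CSV in a batch, whether or not each file is affected.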
