How to skip documents that do not match schema in Athena?

Views: 74 Published: 2020/6/3 23:08:58 amazon-athena

Problem description

Suppose I have an external table like this:

CREATE EXTERNAL TABLE my.data (
  `id` string,
  `timestamp` string,
  `profile` struct<
    `name`: string,
    `score`: int>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1',
  'ignore.malformed.json' = 'true'
)
LOCATION 's3://my-bucket-of-data'
TBLPROPERTIES ('has_encrypted_data'='false');

A few of my documents have an invalid profile.score (a string rather than an integer).

This causes queries in Athena to fail:

"Status": { "State": "FAILED", "StateChangeReason": "HIVE_BAD_DATA: Error parsing field value for field 0: For input string: \"4099999.9999999995\"",
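
To make the mismatch concrete, here is a minimal Python sketch of a local check for a single JSON document against the `profile.score` column type from the table definition above. It is not an Athena feature; `matches_schema` and the sample documents are hypothetical, and treating a missing `profile` or `score` as acceptable is an assumption.

```python
import json


def matches_schema(line: str) -> bool:
    """Return True if a JSON line has an integer (or absent) profile.score."""
    try:
        doc = json.loads(line)
    except json.JSONDecodeError:
        return False  # not valid JSON at all
    if not isinstance(doc, dict):
        return False
    profile = doc.get("profile")
    if profile is None:
        return True  # assumption: a missing struct is acceptable
    if not isinstance(profile, dict):
        return False
    score = profile.get("score")
    if score is None:
        return True  # assumption: a missing score is acceptable
    # bool is a subclass of int in Python, so rule it out explicitly
    return isinstance(score, int) and not isinstance(score, bool)


# Hypothetical sample documents; the bad value is the one from the error message
good = '{"id": "a", "timestamp": "t", "profile": {"name": "n", "score": 7}}'
bad = '{"id": "b", "timestamp": "t", "profile": {"name": "n", "score": "4099999.9999999995"}}'

print(matches_schema(good))
print(matches_schema(bad))
```

Running a check like this over the objects in the bucket is one way to learn which files contain documents that would trigger the error.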

How can I configure Athena to skip the documents that do not fit the external table schema?

A related question covers finding the problematic documents; this question is about skipping them.

Recommended answer

Here is a sample query that excludes a particular file:

SELECT
   * 
FROM 
    "some_database"."some_table"
WHERE(
  "$PATH" != 's3://path/to/a/file'
)
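
If more than one file is affected, the same filter can be generated for a list of paths. The following is a minimal Python sketch that only builds the query text; `build_exclusion_query` and the second path are hypothetical, and the `"$PATH"` spelling simply follows the query above.

```python
from typing import Iterable


def build_exclusion_query(database: str, table: str, bad_paths: Iterable[str]) -> str:
    """Build a SELECT that skips rows read from the given S3 objects."""
    paths = list(bad_paths)
    query = f'SELECT *\nFROM "{database}"."{table}"'
    if paths:
        # Double any single quote so each path stays a valid SQL string literal
        quoted = ", ".join("'" + p.replace("'", "''") + "'" for p in paths)
        query += f'\nWHERE "$PATH" NOT IN ({quoted})'
    return query


query = build_exclusion_query(
    "some_database",
    "some_table",
    ["s3://path/to/a/file", "s3://path/to/another/file"],  # second path is made up
)
print(query)
```

Note that this skips whole files rather than individual documents, so valid rows stored in an excluded file are dropped as well.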

I just tested this approach with the following queries:

SELECT 
   COUNT(*)
FROM 
    "some_database"."some_table"
-- Result: 68491573

SELECT 
   COUNT(*)
FROM 
    "some_database"."some_table"
WHERE(
  "$PATH" != 's3://path/to/a/file'
)
-- Result: 68041452

SELECT 
   COUNT(*)
FROM 
    "some_database"."some_table"
WHERE(
  "$PATH" = 's3://path/to/a/file'
)
-- Result: 450121

Total: 450121 + 68041452 = 68491573
