Athena can't resolve CSV files from AWS DMS

Views: 110 Published: 2020/6/3 23:07:36 Tags: amazon-athena aws-dms

Problem description

I have DMS configured to continuously replicate data from MySQL RDS to S3. This creates two types of CSV files: a full load and change data capture (CDC). According to my tests, I have the following files:

testdb/addresses/LOAD001.csv.gz
testdb/addresses/20180405_205807186_csv.gz

Once DMS is running properly, I trigger an AWS Glue crawler to build the Data Catalog for the S3 bucket that contains the MySQL replication files, so that Athena users can build queries against our S3-based data lake.

Unfortunately, the crawler does not build the correct table schema for the tables stored in S3. For the example above, it creates two tables for Athena:

addresses
20180405_205807186_csv_gz

The file 20180405_205807186_csv.gz contains a one-line update, but the crawler is not capable of merging the two pieces of information (taking the first load from LOAD001.csv.gz and applying the update described in 20180405_205807186_csv.gz).
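To make the expected merge concrete, here is a minimal Python sketch of replaying CDC rows on top of a full load. It assumes the usual DMS S3 layout, in which CDC rows carry a leading operation column (`I`, `U` or `D`) that full-load rows lack, and it assumes the first data column is the primary key. The function name `apply_cdc` and the sample rows are hypothetical, not taken from the files above.

```python
import csv
import io


def apply_cdc(full_load_csv, cdc_csv, key_index=0):
    """Replay CDC rows on top of full-load rows, keyed by primary key.

    Assumption: CDC rows start with an operation flag (I, U or D)
    followed by the same columns as the full-load rows.
    """
    table = {}

    # Full-load rows have no operation column: store them as-is.
    for row in csv.reader(io.StringIO(full_load_csv)):
        if row:
            table[row[key_index]] = row

    # CDC rows: first field is the operation, the rest is the row data.
    for row in csv.reader(io.StringIO(cdc_csv)):
        if not row:
            continue
        op, data = row[0], row[1:]
        key = data[key_index]
        if op == "D":
            table.pop(key, None)   # delete removes the row
        elif op in ("I", "U"):
            table[key] = data      # insert or update replaces the row

    return list(table.values())


# Hypothetical sample data, for illustration only.
full_load = "1,Main St\n2,Oak Ave\n"
cdc = "U,2,Pine Rd\nI,3,Elm St\nD,1,Main St\n"

current = apply_cdc(full_load, cdc)
print(current)
```

The extra leading column in the CDC file also means the two files do not share a column layout, which may be why the crawler treats them as separate tables.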

I also tried to create the table in the Athena console, as described in this blog post: https://aws.amazon.com/pt/blogs/database/using-aws-database-migration-service-and-amazon-athena-to-replicate-and-run-ad-hoc-queries-on-a-sql-server-database/, but that does not yield the desired output either.

From the blog post:

When you query data using Amazon Athena (later in this post), you simply point the folder location to Athena, and the query results include existing and new data inserts by combining data from both files.

Am I missing something?
