Query CSV tables stored in S3 through Athena
Problem description
Recently we started storing our backups in AWS S3. They are all CSV files that we need to query through AWS Athena. We tried adding the tables one by one, but it is taking too long because there is a fair amount of data. Is there an API we can use, or something that is already set up? We were about to build something with Spark, but maybe there is a simpler way, or something that has already been done. Thanks.
Recommended answer
You can simply create an external table on top of CSV files with the required properties.
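As a minimal sketch of that approach, the Python below builds a CREATE EXTERNAL TABLE statement for a folder of CSV files and submits it to Athena through boto3. The function names, the example table, and the bucket paths are hypothetical, and the choice of OpenCSVSerde with a one-line header is an assumption about how the backups are formatted.

```python
def build_csv_table_ddl(table, columns, s3_location, skip_header=True):
    """Return an Athena CREATE EXTERNAL TABLE statement for a CSV folder.

    columns: list of (name, athena_type) pairs.
    s3_location: folder prefix, e.g. 's3://example-bucket/backups/orders/'.
    """
    column_sql = ",\n".join(f"  `{name}` {col_type}" for name, col_type in columns)
    # Assumption: each file starts with one header row that should be skipped.
    header_clause = (
        "\nTBLPROPERTIES ('skip.header.line.count' = '1')" if skip_header else ""
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n"
        f"{column_sql}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        "WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '\"')\n"
        f"LOCATION '{s3_location}'"
        f"{header_clause}"
    )


def run_ddl(ddl, database, output_location):
    """Submit the DDL to Athena; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the DDL builder works without it

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=ddl,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Calling build_csv_table_ddl in a loop over a list of table definitions avoids writing each statement by hand. Declaring every column as string is a cautious starting point with this SerDe; values can be cast in queries afterwards.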
You can also use an AWS Glue crawler and configure it to populate the tables for you automatically.
Reference: Catalog tables
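A rough sketch of the crawler route with boto3 might look like the following. The helper names are hypothetical, and the crawler name, IAM role ARN, database, and S3 path are placeholders to be replaced with real values; the role is assumed to already have read access to the bucket.

```python
def crawler_config(name, role_arn, database, s3_path):
    """Build the keyword arguments for Glue's create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # The crawler scans everything under this path and infers the schemas.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(config):
    """Create the crawler and run it once; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so crawler_config works without it

    glue = boto3.client("glue")
    glue.create_crawler(**config)
    glue.start_crawler(Name=config["Name"])
```

Once the crawler finishes, the tables it created in the Glue Data Catalog should be queryable from Athena under the chosen database.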
There are several AWS SDKs available to automate tasks such as uploading files to S3, creating Athena tables, or cataloging tables through a Glue crawler.
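For the upload step, one possible sketch with the Python SDK (boto3) is shown below. Athena reads every file under a table's LOCATION folder, so this example places each CSV under its own folder named after the file. The helper names, the "backups" prefix, and the one-file-per-table layout are assumptions made for illustration.

```python
from pathlib import Path


def plan_uploads(filenames, prefix="backups"):
    """Map each local CSV path to an S3 key, one folder per table."""
    plan = []
    for filename in sorted(filenames):
        path = Path(filename)
        table = path.stem.lower()  # folder name doubles as the table name
        plan.append((filename, f"{prefix}/{table}/{path.name}"))
    return plan


def upload_all(plan, bucket):
    """Upload every planned file; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so plan_uploads works without it

    s3 = boto3.client("s3")
    for local_path, key in plan:
        s3.upload_file(local_path, bucket, key)
```

Each resulting folder can then serve as the LOCATION of an external table or as part of a crawler's target path.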