Query CSV tables stored in S3 through Athena


Question

Recently we started storing our backups in AWS S3. They are all CSV files that we need to query through AWS Athena. We tried to add the tables one by one, but it is taking too long because there is a fair amount of data. Is there an API we can use, or something that is already set up for this? We were about to build something with Spark, but maybe there is a simpler way, or something that has already been done. Thanks.

Recommended answer

You can simply create an external table on top of the CSV files with the required properties.

Reference: Creating an external table on AWS Athena
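As a rough illustration of the external-table approach, the sketch below builds a `CREATE EXTERNAL TABLE` statement for a folder of CSV files. The helper name `build_csv_table_ddl`, the database, table, column, and bucket names are all hypothetical, and it assumes comma-separated files with a header row read through `OpenCSVSerde`; adjust these to match the actual backups.

```python
def build_csv_table_ddl(database, table, columns, s3_location, skip_header=True):
    """Return an Athena CREATE EXTERNAL TABLE statement for CSV files.

    s3_location should be a folder/prefix (ending in '/'), not a single file.
    """
    # Declaring every column as string is the simplest choice for CSV;
    # values can be cast to other types in the queries themselves.
    column_lines = ",\n".join(f"  `{name}` string" for name in columns)
    ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        "WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '\"')\n"
        f"LOCATION '{s3_location}'"
    )
    if skip_header:
        # Assumption: the first line of each file is a header row.
        ddl += "\nTBLPROPERTIES ('skip.header.line.count' = '1')"
    return ddl


# Hypothetical names, for illustration only.
print(
    build_csv_table_ddl(
        "backups",
        "orders",
        ["order_id", "customer", "total"],
        "s3://example-bucket/backups/orders/",
    )
)
```

The generated statement can be pasted into the Athena query editor or submitted through an SDK. Because the table is external, no data is loaded; Athena reads the files in place.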

You can also use a Glue crawler and configure it to populate the tables for you automatically.

Reference: Cataloging tables
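A crawler can also be defined in code. The following is a minimal sketch using the boto3 Glue client's `create_crawler` and `start_crawler` calls; the crawler name, IAM role ARN, database, and S3 path are placeholders, and the role is assumed to already exist with permission to read the bucket.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for the Glue CreateCrawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Glue database the discovered tables go into
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request):
    """Create the crawler and start one run. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the request builder works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values, for illustration only.
request = build_crawler_request(
    name="backup-csv-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="backups",
    s3_path="s3://example-bucket/backups/",
)
print(request)
# create_and_start_crawler(request)  # uncomment to run against a real account
```

Pointing the crawler at the common parent prefix lets it discover the CSV folders underneath and register a table for each, which avoids defining the tables one by one.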

AWS SDKs are available for several languages and can automate tasks such as uploading files to S3, creating Athena tables, or cataloging tables through a Glue crawler.
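To give an idea of what SDK automation might look like, here is a small sketch that submits a statement to Athena and waits for it to finish. The helper `run_athena_query` is hypothetical; it expects a client object exposing `start_query_execution` and `get_query_execution`, as the boto3 Athena client does, and the database and results location are placeholders.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Submit a statement to Athena and block until it reaches a final state.

    Returns the final state: 'SUCCEEDED', 'FAILED', or 'CANCELLED'.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes query results to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING


def run_all(athena, statements, database, output_location):
    """Run several statements in order, e.g. one CREATE EXTERNAL TABLE per folder."""
    return [
        run_athena_query(athena, sql, database, output_location)
        for sql in statements
    ]
```

With boto3 installed and credentials configured, `boto3.client("athena")` can be passed as the `athena` argument, and `run_all` can loop over one DDL statement per backup folder.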
