Is there a way to iterate through S3 object content using a SQL expression?
Problem description
I would like to iterate through every object in an S3 bucket and use a SQL expression to find all the content that matches it.
I was able to create a python script that lists all the objects inside my bucket.
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucketname')
startAfter = 'bucketname/directory'  # defined in the original script but never used

# Print the key of every object in the bucket.
for obj in bucket.objects.all():
    print(obj.key)
I was also able to create a python script that uses a sql expression to look through the object content.
import boto3

S3_BUCKET = 'bucketname'
s3 = boto3.client('s3')

# Serial numbers to search for.
var1 = 'aj9c03869'
var2 = 'b3bu11043'

# Run the S3 Select query against a single gzip-compressed JSON object.
r = s3.select_object_content(
    Bucket=S3_BUCKET,
    Key='name_of_object',
    ExpressionType='SQL',
    # %r inserts each value as a quoted string literal.
    Expression='select * from s3object s where s."serialnumber" in (%r,%r)' % (var1, var2),
    OutputSerialization={'JSON': {}},
    InputSerialization={
        'CompressionType': 'GZIP',
        'JSON': {'Type': 'DOCUMENT'},
    },
)

# The response payload is an event stream; matching rows arrive in Records events.
for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
I would like to create a loop that goes through each bucket object, uses the sql expression to find the data within the object, and returns all the matches.
The reason why I am trying to query all the objects is to find content within the objects and delete specific data. I appreciate the answers about Athena but I don't think that would work in my case.
Recommended answer
Take a look at Amazon Athena – Interactive SQL Queries for Data in Amazon S3
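If Athena is not an option, as the question's follow-up suggests, the two scripts from the question can probably be combined into a single loop. The following is only a sketch under some assumptions: every object in the bucket is gzip-compressed JSON like the one queried in the question, and the helper names build_expression and select_from_all_objects are made up for this illustration. The boto3 calls used (get_paginator with list_objects_v2, and select_object_content) are the same client API the question already relies on.

```python
import json


def build_expression(serial_numbers):
    """Build the query from the question for any number of serial numbers."""
    # Double any single quote so each value stays a valid SQL string literal.
    quoted = ", ".join("'" + s.replace("'", "''") + "'" for s in serial_numbers)
    return 'select * from s3object s where s."serialnumber" in (%s)' % quoted


def select_from_all_objects(s3_client, bucket, expression, prefix=""):
    """Yield (key, record) for each matching record in each object under prefix.

    s3_client is expected to behave like boto3.client('s3').
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # An empty page has no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            response = s3_client.select_object_content(
                Bucket=bucket,
                Key=key,
                ExpressionType="SQL",
                Expression=expression,
                # Assumption: all objects are gzip-compressed JSON documents.
                InputSerialization={
                    "CompressionType": "GZIP",
                    "JSON": {"Type": "DOCUMENT"},
                },
                OutputSerialization={"JSON": {}},
            )
            # A record may be split across events, so join the bytes first.
            chunks = []
            for event in response["Payload"]:
                if "Records" in event:
                    chunks.append(event["Records"]["Payload"])
            body = b"".join(chunks).decode("utf-8")
            # JSON output is newline-delimited by default: one record per line.
            for line in body.splitlines():
                if line.strip():
                    yield key, json.loads(line)
```

To try it against a real bucket, pass boto3.client('s3') as s3_client, 'bucketname' as bucket, and build_expression(['aj9c03869', 'b3bu11043']) as expression, then iterate over the result and print each (key, record) pair. Taking the client as a parameter keeps the loop easy to test with a stub. Objects that are not gzip-compressed JSON will likely make select_object_content raise an error, so a prefix filter or a try/except around the call may be needed.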