S3 Select retrieve headers in the CSV


Problem description

I am trying to fetch a subset of records from a CSV stored in an S3 bucket using the following code:

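import boto3
import pandas as pd
from io import StringIO

s3 = boto3.client('s3')
# `bucket` (bucket name) and `file` (object key) are assumed to be defined earlier
file_name = file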

sql_stmt = """SELECT S.* FROM s3object S LIMIT 10"""


req = s3.select_object_content(
    Bucket=bucket,
    Key=file_name,
    ExpressionType='SQL',
    Expression=sql_stmt,
    InputSerialization = {'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization = {'CSV': {}},
)

records = []
for event in req['Payload']:
    if 'Records' in event:
        records.append(event['Records']['Payload'])
    elif 'Stats' in event:
        stats = event['Stats']['Details']


file_str = ''.join(r.decode('utf-8') for r in records)

# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
df = pd.read_csv(StringIO(file_str))
print(df)

This successfully returns the records, but the header row is missing.

I read in "S3 Select CSV Headers" that S3 Select does not return headers at all. So, is there any other way to retrieve the headers of a CSV file in S3?

Recommended answer

Change

InputSerialization={'CSV': {"FileHeaderInfo": "USE"}},

to

InputSerialization={'CSV': {"FileHeaderInfo": "NONE"}},

Then it will print the full content, including the header. The header line comes back as the first record of the result, so with LIMIT 10 it counts as one of the ten rows.
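The change above can be sketched as a small helper. This is only an illustration under a few assumptions: the function name `fetch_with_header` and its parameters are made up for this example, `s3_client` is expected to be a `boto3.client('s3')` object, and the standard `csv` module is used for parsing instead of pandas to keep the sketch dependency-free.

```python
import csv
from io import StringIO


def fetch_with_header(s3_client, bucket, key, limit=10):
    """Return (header, rows) for the first `limit` lines of a CSV object.

    Hypothetical helper: the name and signature are illustrative only and
    are not part of boto3.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType='SQL',
        # With NONE the header line counts toward LIMIT like any other record.
        Expression=f'SELECT * FROM s3object s LIMIT {int(limit)}',
        InputSerialization={'CSV': {'FileHeaderInfo': 'NONE'}},
        OutputSerialization={'CSV': {}},
    )

    # Collect the raw bytes from every Records event in the event stream.
    chunks = [
        event['Records']['Payload']
        for event in response['Payload']
        if 'Records' in event
    ]
    # Join the bytes before decoding so that a multi-byte character split
    # across two events does not raise a UnicodeDecodeError.
    text = b''.join(chunks).decode('utf-8')

    rows = list(csv.reader(StringIO(text)))
    if not rows:
        return [], []
    # The first returned record is the header line of the file.
    return rows[0], rows[1:]
```

Called as `header, rows = fetch_with_header(boto3.client('s3'), bucket, file_name)`, it should give the column names in `header` and the remaining records in `rows`.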

Explanation:

FileHeaderInfo accepts one of "NONE", "USE", or "IGNORE".

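Use the NONE option rather than USE. NONE tells S3 Select that the first line is not a header, so the header line is treated as an ordinary record and returned with the rest of the output. With USE, the first line is consumed as column names that can be referenced in the SQL expression and is not returned; with IGNORE, the first line is skipped. With NONE or IGNORE, columns have to be referenced by position (_1, _2, ...) rather than by name.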
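If only the column names are needed, the same option can be combined with LIMIT 1. The following is a minimal sketch, assuming `s3_client` is a `boto3.client('s3')` object; the helper name `fetch_header` is hypothetical and chosen just for this example.

```python
import csv
from io import StringIO


def fetch_header(s3_client, bucket, key):
    """Return the column names of a CSV object as a list of strings.

    Hypothetical helper, not part of boto3.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType='SQL',
        # NONE: the first line is an ordinary record, so LIMIT 1 returns it.
        Expression='SELECT * FROM s3object s LIMIT 1',
        InputSerialization={'CSV': {'FileHeaderInfo': 'NONE'}},
        OutputSerialization={'CSV': {}},
    )
    payload = b''.join(
        event['Records']['Payload']
        for event in response['Payload']
        if 'Records' in event
    )
    # csv.reader copes with quoted column names that contain commas.
    return next(csv.reader(StringIO(payload.decode('utf-8'))), [])
```

The returned list could then be passed to pandas when the data itself was fetched with "USE", for example `pd.read_csv(StringIO(file_str), names=fetch_header(s3, bucket, file_name))`, so that the first data row is not mistaken for a header.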

Here is the reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.select_object_content

Hope it helps.
