AWS Lambda: read a zip file, perform validation, and unzip to an S3 bucket if validation passes

Problem description

I have a requirement in which a zip file arrives in an S3 bucket. I need to write a Lambda in Python that reads the zip file, performs some validation, and unzips it to another S3 bucket.

The zip file contains the following:

a.csv b.csv c.csv trigger_file.txt

trigger_file.txt contains the names of the files in the zip and their record counts (example: a.csv:120 , b.csv:10 , c.csv:50 )
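To make the trigger file format concrete, here is a minimal sketch that turns its content into a dictionary. It assumes the layout is exactly what the example shows: comma-separated name:count pairs with optional spaces. The function name parse_trigger is hypothetical and not part of the original code.

```python
def parse_trigger(text):
    """Parse 'a.csv:120 , b.csv:10' into {'a.csv': 120, 'b.csv': 10}.

    Assumption: entries are separated by commas (or newlines) and each
    entry has the form name:count.
    """
    expected = {}
    for entry in text.replace("\n", ",").split(","):
        entry = entry.strip()
        if not entry:
            continue  # skip empty pieces such as a trailing comma
        # Split on the last colon so the file name is kept intact
        name, _, count = entry.rpartition(":")
        expected[name.strip()] = int(count)  # raises ValueError on a malformed entry
    return expected
```

A dictionary keeps the record counts available for later checks, instead of flattening everything into one string.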

So, using Lambda, I need to read the trigger file and check whether the number of files in the zip equals the number of files mentioned in the trigger file. If the check passes, the zip should be unzipped to the S3 bucket.
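The check described above could be sketched as follows. This version compares file names rather than only counts, which is a slightly stricter assumption than the requirement states. The function name compare_members and the default trigger file name are hypothetical.

```python
import io
import zipfile


def compare_members(zip_bytes, expected_names, trigger_name="trigger_file.txt"):
    """Return (missing, unexpected) member names for a zip held in memory.

    missing:    names listed in the trigger file but absent from the zip
    unexpected: names present in the zip but not listed in the trigger file
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zipf:
        # Ignore the trigger file itself and any directory entries
        actual = {
            name for name in zipf.namelist()
            if name != trigger_name and not name.endswith("/")
        }
    expected = set(expected_names)
    return sorted(expected - actual), sorted(actual - expected)
```

Validation passes when both returned lists are empty; a count-only check would compare len(expected_names) with the number of non-trigger members instead.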

Here is the code I have prepared:

def write_to_s3(config_dict):
    inp_bucket = config_dict["inp_bucket"]
    inp_key = config_dict["inp_key"]
    out_bucket = config_dict["out_bucket"]
    des_key = config_dict["des_key"]
    processed_key = config_dict["processed_key"]

    obj = S3_CLIENT.get_object(Bucket=inp_bucket, Key=inp_key)
    putObjects = []
    with io.BytesIO(obj["Body"].read()) as tf:
        # rewind the file
        tf.seek(0)

    # Read the file as a zipfile perform transformations and process the members
    with zipfile.ZipFile(tf, mode='r') as zipf:
        for file in zipf.infolist():
            fileName = file.filename
            print("file name before while loop :",fileName)
            try:
                found = False
                while not found :
                    if fileName == "Trigger_file.txt" :
                        with zipf.open(fileName , 'r') as thefile:
                            my_list = [i.decode('utf8').split(' ') for i in thefile]
                            my_list = str(my_list)[1:-1]
                            print("my_list :",my_list)
                            print("fileName :",fileName)
                            found = True
                            break
                            thefile.close()
                    else:
                        print("Trigger file not found ,try again")
            except Exception as exp_handler:
                    raise exp_handler

            if 'csv' in fileName :
                try:
                    if fileName in my_list:
                        print("Validation Success , all files in Trigger file  are present procced for extraction")
                    else:
                        print("Validation Failed")
                except Exception as exp_handler:
                    raise exp_handler

    # *****FUNCTION TO UNZIP ********


def lambda_handler(event, context):
    try:
        inp_bucket = event['Records'][0]['s3']['bucket']['name']
        inp_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
        config_dict = build_conf_obj(os.environ['config_bucket'],os.environ['config_file'], os.environ['param_name'])
        write_to_s3(config_dict)
    except Exception as exp_handler:
        print("ERROR")

Everything was going well; the only issue I am facing is in the validation part. I think the while loop is wrong, since it goes into an infinite loop.

Expected behavior:

Search for trigger_file.txt in the zip. If it is found, break out of the loop, run the validation, and unzip the contents to the S3 folder. If it is not found, keep searching until the end of the member list.

Error output (timeout):

Response:
{
  "errorMessage": "2020-06-16T20:09:06.168Z 39253b98-db87-4e65-b288-b585d268ac5f Task timed out after 60.06 seconds"
}

Request ID:
"39253b98-db87-4e65-b288-b585d268ac5f"

Function Logs:
 again
Trigger file not found ,try again
Trigger file not found ,try again
Trigger file not found ,try again
Trigger file not found ,try again
Trigger file not found ,trEND RequestId: 39253b98-db87-4e65-b288-b585d268ac5f
REPORT RequestId: 39253b98-db87-4e65-b288-b585d268ac5f  Duration: 60060.06 ms   Billed Duration: 60000 ms   Memory Size: 3008 MB    Max Memory Used: 83 MB  Init Duration: 389.65 ms    
2020-06-16T20:09:06.168Z 39253

Recommended answer

In the following while loop in your code, if fileName is not "Trigger_file.txt", nothing inside the loop ever changes found or fileName, so the loop never terminates and the function runs until the Lambda timeout.

found = False
while not found:
    if fileName == "Trigger_file.txt":
        with zipf.open(fileName , 'r') as thefile:
            my_list = [i.decode('utf8').split(' ') for i in thefile]
            my_list = str(my_list)[1:-1]
            print("my_list :",my_list)
            print("fileName :",fileName)
            found = True
            break
            thefile.close()
    else:
        print("Trigger file not found ,try again")
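Since the zip member list is finite, a single pass over it is enough to locate the trigger file, and no while loop is needed. The sketch below illustrates this with next() and a default value. The helper name find_member is hypothetical, and the case-insensitive match is an assumption, added because the question writes trigger_file.txt while the code compares against "Trigger_file.txt".

```python
import io
import zipfile


def find_member(zip_bytes, wanted):
    """Return the actual member name matching `wanted`, or None if absent.

    Assumption: names are compared case-insensitively.
    """
    wanted = wanted.lower()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zipf:
        # next() stops at the first match and falls back to None when
        # the generator is exhausted, so the search always terminates.
        return next(
            (name for name in zipf.namelist() if name.lower() == wanted),
            None,
        )
```

A None result corresponds to the "trigger file not found" case and can be reported once, after the whole archive has been scanned.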


I think you can replace part of your write_to_s3 function with the following code:

def write_to_s3(config_dict):

    ######################
    #### Do something ####
    ######################

    # Read the file as a zipfile perform transformations and process the members
    # (tf is the in-memory zip buffer from the elided part above; it must still be open here)
    with zipfile.ZipFile(tf, mode='r') as zipf:
        # First pass: look for the trigger file; the for loop ends on its own
        found = False
        for file in zipf.infolist():
            fileName = file.filename
            if fileName == "Trigger_file.txt":
                # The with block closes the member, so no explicit close() is needed
                with zipf.open(fileName, 'r') as thefile:
                    my_list = [i.decode('utf8').split(' ') for i in thefile]
                    my_list = str(my_list)[1:-1]
                    print("my_list :", my_list)
                    print("fileName :", fileName)
                found = True
                break

        if not found:
            print("Trigger file not found")
            return

        # Second pass: every CSV member must be mentioned in the trigger file
        for file in zipf.infolist():
            fileName = file.filename
            if 'csv' in fileName:
                if fileName not in my_list:
                    print("Validation Failed")
                    return

        print("Validation Success, all files in Trigger file are present, proceed for extraction")

    # *****FUNCTION TO UNZIP ********
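The unzip step left as a placeholder above might look like the following sketch. It assumes s3_client is a boto3 S3 client (only its put_object(Bucket=..., Key=..., Body=...) call is used) and that the whole archive fits in Lambda memory. The names extract_to_s3, prefix, and skip are hypothetical.

```python
import io
import zipfile


def extract_to_s3(s3_client, zip_bytes, out_bucket, prefix="",
                  skip=("trigger_file.txt",)):
    """Upload each zip member to out_bucket and return the keys written.

    Assumptions: s3_client exposes put_object like a boto3 S3 client,
    and members listed in `skip` (the trigger file) are not uploaded.
    """
    uploaded = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zipf:
        for info in zipf.infolist():
            if info.is_dir() or info.filename in skip:
                continue
            key = prefix + info.filename
            # zipf.read() returns the decompressed bytes of one member
            s3_client.put_object(Bucket=out_bucket, Key=key, Body=zipf.read(info))
            uploaded.append(key)
    return uploaded
```

Passing the client in as an argument keeps the function easy to exercise with a stub object before wiring it to the real bucket, for example with out_bucket and des_key taken from config_dict.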
