Copy files from S3 to EMR local using Lambda


Question

I need to move the files from S3 to EMR's local dir /home/hadoop programmatically using Lambda.

S3DistCp copies the files over to HDFS. I then log in to the EMR cluster and run the HDFS copyToLocal command (hdfs dfs -copyToLocal) on the command line to get the files into /home/hadoop.
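
The manual two-step workflow above (S3DistCp into HDFS, then copyToLocal) could itself be submitted as two EMR steps through command-runner.jar, so nobody has to log in to the cluster. This is a minimal sketch, not a tested recipe: it only builds the step definitions and hands them to an EMR client you pass in, the S3, HDFS and local paths are hypothetical placeholders, and it assumes the cluster runs steps one at a time (the default step concurrency of 1), so the second step starts only after the first.

```python
from typing import Any, Dict, List


def build_copy_steps(s3_src: str, hdfs_dir: str, local_dir: str) -> List[Dict[str, Any]]:
    """Build two EMR step definitions: S3 -> HDFS, then HDFS -> local disk."""
    return [
        {
            "Name": "S3DistCp to HDFS",
            # If this step fails, cancel the steps queued after it,
            # so copyToLocal never runs against a partial copy
            "ActionOnFailure": "CANCEL_AND_WAIT",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["s3-dist-cp", f"--src={s3_src}", f"--dest={hdfs_dir}"],
            },
        },
        {
            "Name": "HDFS copyToLocal",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Steps run on the master node, so local_dir is a path on that node
                "Args": ["hdfs", "dfs", "-copyToLocal", hdfs_dir, local_dir],
            },
        },
    ]


def submit_steps(emr_client: Any, cluster_id: str, steps: List[Dict[str, Any]]) -> List[str]:
    """Add the steps to one cluster and return the new step IDs."""
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]
```

Usage, with hypothetical names: submit_steps(boto3.client("emr", region_name="us-west-2"), "j-EXAMPLE", build_copy_steps("s3://my-bucket/input/", "hdfs:///tmp/input/", "/home/hadoop/")). Passing the client in keeps the step-building logic separate from the AWS call.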

Is there a programmatic way, using boto3 in Lambda, to copy files from S3 to EMR's local directory?

Answer

I wrote a test Lambda function to submit a job step to EMR that copies files from S3 to EMR's local dir. This worked.

import boto3

emrclient = boto3.client('emr', region_name='us-west-2')


def lambda_handler(event, context):
    # List active clusters (first page of results only)
    EMRS = emrclient.list_clusters(ClusterStates=['STARTING', 'RUNNING', 'WAITING'])
    clusters = EMRS["Clusters"]
    print(clusters)
    for cluster in clusters:
        ID = cluster["Id"]
        # command-runner.jar runs the AWS CLI on the master node,
        # so the files end up in /home/hadoop/copy/ on that node
        response = emrclient.add_job_flow_steps(
            JobFlowId=ID,
            Steps=[
                {
                    'Name': 'AWS S3 Copy',
                    'ActionOnFailure': 'CONTINUE',
                    'HadoopJarStep': {
                        'Jar': 'command-runner.jar',
                        'Args': ["aws", "s3", "cp", "s3://XXX/", "/home/hadoop/copy/", "--recursive"],
                    },
                }
            ],
        )
        print(response["StepIds"])
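
The handler above adds the copy step to every cluster in the STARTING, RUNNING or WAITING state, and it reads only the first page of list_clusters results. If only one cluster should receive the step, a minimal sketch that looks it up by name and follows the Marker pagination token might look like the following; the cluster name is a hypothetical example and the EMR client is passed in as a parameter.

```python
from typing import Any, Optional

ACTIVE_STATES = ["STARTING", "RUNNING", "WAITING"]


def find_cluster_id(emr_client: Any, cluster_name: str) -> Optional[str]:
    """Return the ID of the first active cluster named cluster_name, or None."""
    kwargs = {"ClusterStates": ACTIVE_STATES}
    while True:
        page = emr_client.list_clusters(**kwargs)
        for cluster in page["Clusters"]:
            if cluster["Name"] == cluster_name:
                return cluster["Id"]
        # list_clusters is paginated; a Marker in the response means more pages remain
        marker = page.get("Marker")
        if not marker:
            return None
        kwargs["Marker"] = marker
```

In the handler, ID = find_cluster_id(emrclient, "my-cluster") would replace the loop, with add_job_flow_steps called once for that ID and skipped when the result is None.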

If there are better ways to do the copy, please do let me know.
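
One gap in the code above is that it never confirms the copy finished: add_job_flow_steps returns as soon as the step is queued. A minimal polling sketch using describe_step follows; the timeout and poll interval are arbitrary assumptions. Because a Lambda invocation is capped at 15 minutes, long copies are better checked outside the invocation that submits them. boto3 also provides a built-in waiter for the same purpose, emrclient.get_waiter('step_complete').

```python
import time
from typing import Any

# Step states after which the step will not change any more
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def wait_for_step(emr_client: Any, cluster_id: str, step_id: str,
                  timeout_s: float = 600.0, poll_s: float = 15.0) -> str:
    """Poll describe_step until the step reaches a terminal state; return that state."""
    deadline = time.monotonic() + timeout_s
    while True:
        step = emr_client.describe_step(ClusterId=cluster_id, StepId=step_id)["Step"]
        state = step["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"step {step_id} still {state} after {timeout_s} s")
        time.sleep(poll_s)
```

For example, calling wait_for_step(emrclient, ID, response["StepIds"][0]) right after add_job_flow_steps returns "COMPLETED" when the copy succeeded and another terminal state such as "FAILED" otherwise.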
