Amazon Web Services - how to run a script daily


Question


I have an R script that I run every day that scrapes data from a couple of different websites and then writes the scraped data to a couple of different CSV files. Each day, at a specific time (that changes daily), I open RStudio, open the file, and run the script. I check that it runs correctly each time, and then I save the output to a CSV file. It is a pain to have to do this every day (it takes ~10-15 minutes a day). I would love it if I could somehow have this script run automatically at a pre-defined time, and a buddy of mine said AWS is capable of doing this?


Is this true? If so, what is the specific feature / aspect of AWS that can do this, so that I can look into it further?

Thanks!

Answer

Two options come to mind:
  • Host an EC2 instance with R on it and configure a cron job to execute your R script regularly.
    One easy way to get started: use this AMI.
    To execute the script, R offers a CLI, Rscript. See e.g. here on how to set this up.
  • Go serverless: AWS Lambda is a hosted microservice. Currently R is not natively supported, but on the official AWS Blog here they offer a step-by-step guide on how to run R. Basically, you execute R from Python using the rpy2 package.
    Once you have this set up, schedule the function via CloudWatch Events (~ a hosted cron job). Here you can find a step-by-step guide on how to do that.
    One more thing: you say that your function outputs CSV files. To save them properly you will need to put them in a file storage like AWS S3. You can do this in R via the aws.s3 package. Another option would be to use the AWS SDK for Python, which is preinstalled in the Lambda function. You could e.g. write a CSV file to the /tmp/ dir and, after the R script is done, move the file to S3 via boto3's S3 upload_file function.
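For the first option, the cron job on the EC2 instance can be a single crontab line. A minimal sketch, assuming the script lives at /home/ubuntu/scrape.R (a hypothetical path) and you want a 06:30 run in server time:

```shell
# Hypothetical crontab entry (install with `crontab -e` on the instance).
# Fields: minute hour day-of-month month day-of-week command
# Runs the scraper daily at 06:30 server time and appends all output to a log.
30 6 * * * /usr/bin/Rscript /home/ubuntu/scrape.R >> /home/ubuntu/scrape.log 2>&1
```

One caveat: cron fires at a fixed time, while you said your run time changes daily, so you may need to schedule for the earliest possible slot or have the script itself wait until the right moment.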
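For the serverless route, the CloudWatch Events schedule mentioned above can also be created from the AWS CLI. A sketch, assuming a Lambda function named r-scraper and a placeholder account ID:

```shell
# Create a rule that fires daily at 06:30 UTC. CloudWatch cron expressions
# have six fields and require `?` in either day-of-month or day-of-week.
aws events put-rule --name daily-r-scrape \
  --schedule-expression "cron(30 6 * * ? *)"

# Point the rule at the Lambda function (the ARN here is a placeholder).
aws events put-targets --rule daily-r-scrape \
  --targets 'Id=1,Arn=arn:aws:lambda:us-east-1:123456789012:function:r-scraper'
```

The Lambda function also needs a resource-based permission allowing events.amazonaws.com to invoke it (via aws lambda add-permission), which the linked step-by-step guide covers.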
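The /tmp-then-S3 handoff at the end of the second option could look like this in the Lambda's Python wrapper. A minimal sketch, assuming a hypothetical bucket name and file path:

```python
import boto3

def upload_results(local_path="/tmp/scraped.csv",
                   bucket="my-scraper-bucket",   # hypothetical bucket name
                   key="daily/scraped.csv"):
    """Move a CSV written by the R script out of /tmp/ into S3.

    Lambda's /tmp/ storage is ephemeral, so anything the R script
    writes there must be copied to S3 before the function exits.
    """
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)  # boto3's managed upload
```

You would call upload_results() at the end of the handler, after rpy2 has finished running the R script.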


IMHO the first option is easier to set up, but the second one is more robust.
