Using Java code to count the number of lines in a file on S3

Question

Using Java code, is it possible to count the number of lines in a file on AWS S3 without downloading it to the local machine?

Recommended answer

It depends on what you mean by "download".

There is no remote processing in S3: you can't upload code that will execute inside the S3 service. Possible alternatives:

  • If the issue is that the file is too big to hold in memory or on your local disk, you can still download the file in chunks and process each chunk separately. You read from the Java InputStream (or whatever other API you are using) one chunk at a time, say 4 KB, process it (scan for line endings), and continue without storing anything to disk. The downside is that every byte of the file still travels from S3 to your machine.
  • Use AWS Lambda: create a Lambda function that does the processing for you. This code runs in the Amazon cloud, so there is no I/O to your machine, only inside the cloud. The function would do the same thing as the previous option; it just runs remotely.
  • Use EC2: if you need more control over your code, a custom operating system, etc., you can have a dedicated VM on EC2 that handles this.
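The chunked approach from the first option could look roughly like the sketch below. It is only an illustration under some assumptions: the class name `LineCounter` and the method `countLines` are made up for this example, and a `ByteArrayInputStream` stands in for the S3 object so the code runs without AWS credentials. In a real program the stream would come from the S3 client (for instance, in the AWS SDK for Java 2.x, `S3Client.getObject(...)` returns a `ResponseInputStream`, which is an `InputStream`). The same method could be called unchanged from a Lambda handler or from code running on EC2.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any AWS library.
final class LineCounter {

    // Reads the stream in 4 KB chunks and counts '\n' bytes.
    // Nothing is written to disk and only one chunk is held in memory.
    // A non-empty final line without a trailing newline is counted too.
    static long countLines(InputStream in) throws IOException {
        byte[] buffer = new byte[4096];
        long lines = 0;
        int lastByte = -1; // -1 means "no data seen yet"
        int read;
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') {
                    lines++;
                }
            }
            if (read > 0) {
                // Mask to 0..255 so a 0xFF byte is not confused with the sentinel.
                lastByte = buffer[read - 1] & 0xFF;
            }
        }
        if (lastByte != -1 && lastByte != '\n') {
            lines++; // last line had no terminating newline
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the S3 object stream (assumption for this demo).
        byte[] data = "a\nb\nc".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(countLines(in)); // prints 3
        }
    }
}
```

Counting only `\n` also handles Windows-style `\r\n` endings; files that use a lone `\r` as the line separator would need an extra check.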

Given the information in your question, I would say that a Lambda function is probably the best option.
