Processing large compressed files in apache camel
Question
I am trying to fetch a single .zip-compressed file from an FTP server and store it in S3 with gzip compression using Camel. The following is the route I currently have.
from("sftp://username@host/file_path/?password=<password>&noop=true&streamDownload=true")
    .routeId("route_id")
    .setExchangePattern(ExchangePattern.InOut)
    .unmarshal().zipFile()
    .marshal().gzip()
    .to("aws-s3://s3_bucket_name?amazonS3Client=#client");
This works fine for smaller files. But I have files that are ~700 MB when compressed, and for files of that size I get an OutOfMemoryError for Java heap space.
I know there is a streaming option in Camel (.split(body().tokenize("\n")).streaming()), but I am not sure whether I can unmarshal and marshal while streaming. (I saw a similar solution here, but in that case the source file was plain text/CSV.)
The second part of the problem is streaming the file back to S3. I am aware of the multiPartUpload option in the camel-aws component, but it seems to require the source to be a file, and I do not know how to achieve that.
Can this be achieved without processing (unzipping and then gzipping) the file in Java code in a custom processor?
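For what it's worth, if a small custom processor does turn out to be acceptable, the re-compression step itself can run in constant memory using nothing but the JDK's own streaming classes. A minimal sketch (the class and method names are mine, and it assumes the archive holds a single entry, as described in the question):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Streams the first (and, per the question, only) entry of a zip archive
// into a gzip stream, 8 KB at a time, so heap use stays constant no matter
// how large the file is. Names here are illustrative, not a Camel API.
class ZipToGzip {
    static void convert(InputStream zippedIn, OutputStream gzippedOut) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(zippedIn);
             GZIPOutputStream gzip = new GZIPOutputStream(gzippedOut)) {
            ZipEntry entry = zip.getNextEntry();
            if (entry == null) {
                throw new IOException("empty zip archive");
            }
            byte[] buffer = new byte[8192];
            int n;
            while ((n = zip.read(buffer)) != -1) {
                // Copy chunk by chunk; the whole payload is never in memory.
                gzip.write(buffer, 0, n);
            }
        }
    }
}
```

Wired into the route, something like this could replace the unmarshal/marshal pair (for example from a Processor that reads the exchange body as an InputStream), keeping only one small buffer live at a time.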
Environment: Camel 2.19.3, Java 8
Thanks
Answer
I solved it using streamCaching(). So the way I would do that is:
from("xyz")
    .streamCaching()
    .unmarshal().gzip()
    .to("abc")
    .end();
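The reason streamCaching() helps with the OutOfMemoryError is that Camel's stream cache can spool large payloads to disk instead of holding them on the heap (the spool threshold is configurable via the context's StreamCachingStrategy). Conceptually it behaves like the sketch below; this is only an illustration of the spooling idea, not Camel's actual implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Payloads below the threshold stay in memory; anything larger is spilled
// to a temp file, so heap use stays bounded regardless of body size.
class SpoolingCache extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream spool; // non-null once we have spilled to disk
    private Path spoolFile;

    SpoolingCache(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] { (byte) b }, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (spool == null && memory.size() + len > threshold) {
            // Crossing the threshold: move the buffered bytes to a temp file.
            spoolFile = Files.createTempFile("spool", ".tmp");
            spool = Files.newOutputStream(spoolFile);
            memory.writeTo(spool);
            memory = null;
        }
        if (spool != null) {
            spool.write(b, off, len);
        } else {
            memory.write(b, off, len);
        }
    }

    // Re-readable view of the cached payload; ends the writing phase.
    InputStream newStream() throws IOException {
        if (spool != null) {
            spool.close();
            return Files.newInputStream(spoolFile);
        }
        return new ByteArrayInputStream(memory.toByteArray());
    }
}
```

Because the cached body is re-readable, downstream steps such as a multipart upload can consume it more than once without the whole file ever living in the heap.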