Processing large compressed files in Apache Camel
Question
I am trying to get a single file with .zip compression from an FTP server and store it in S3 with .gzip compression using Camel. The following is the route I currently have.
from("sftp://username@host/file_path/?password=<password>&noop=true&streamDownload=true")
.routeId("route_id")
.setExchangePattern(ExchangePattern.InOut)
.unmarshal().zipFile()
.marshal().gzip()
.to("aws-s3://s3_bucket_name?amazonS3Client=#client");
This works fine for smaller files, but I have files that are ~700 MB in size when compressed. For files of that size I get an OutOfMemoryError for Java heap space.
I know there is a streaming option in Camel (.split(body().tokenize("\n")).streaming()), but I am not sure whether I can unmarshal and marshal while streaming. (I have seen a similar solution here, but in that case the source file is plain text/CSV.)
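For what it's worth, the unzip-then-gzip step itself can run in constant memory with plain java.util.zip streams, copying the single zip entry straight into a gzip stream in fixed-size chunks. A self-contained sketch (the in-memory buffers here only stand in for the FTP download and S3 upload; a single-entry archive is assumed):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipToGzip {

    // Re-compress the first entry of a zip stream as gzip, copying in
    // fixed-size chunks so heap use stays constant regardless of file size.
    static void zipToGzip(InputStream zipped, OutputStream gzipped) throws Exception {
        try (ZipInputStream zin = new ZipInputStream(zipped);
             GZIPOutputStream gzout = new GZIPOutputStream(gzipped)) {
            ZipEntry entry = zin.getNextEntry();   // single-file archive assumed
            if (entry == null) {
                throw new IllegalArgumentException("empty zip");
            }
            byte[] buf = new byte[8192];
            int n;
            while ((n = zin.read(buf)) > 0) {
                gzout.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a small zip in memory to stand in for the FTP download.
        byte[] payload = "hello streaming world".getBytes("UTF-8");
        ByteArrayOutputStream zipBytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(zipBytes)) {
            zout.putNextEntry(new ZipEntry("data.txt"));
            zout.write(payload);
            zout.closeEntry();
        }

        ByteArrayOutputStream gzBytes = new ByteArrayOutputStream();
        zipToGzip(new ByteArrayInputStream(zipBytes.toByteArray()), gzBytes);

        // Verify the round trip: gunzip and compare with the original payload.
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (GZIPInputStream gin = new GZIPInputStream(
                new ByteArrayInputStream(gzBytes.toByteArray()))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gin.read(buf)) > 0) {
                restored.write(buf, 0, n);
            }
        }
        System.out.println(new String(restored.toByteArray(), "UTF-8"));
    }
}
```

With real endpoints, the same loop works against the FTP InputStream and an upload OutputStream, so nothing larger than the 8 KB buffer ever sits on the heap.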
The second part of the problem is streaming the file back to S3. I am aware of the multiPartUpload option in the camel-aws component, but it seems to require the source to be a file. I do not know how to achieve that.
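One commonly suggested workaround for that File requirement is to stage the re-compressed stream to a temp file and let a file consumer feed the S3 producer. A sketch, with the caveats that the staging paths are illustrative and that the multiPartUpload and partSize options of camel-aws-s3 may require a newer Camel release than 2.19.3:

```java
// Stage the re-compressed stream to disk first...
from("sftp://username@host/file_path/?password=<password>&streamDownload=true")
    .unmarshal().zipFile()
    .marshal().gzip()
    .to("file:/tmp/staging?fileName=payload.gz");

// ...then let a file-based route hand a java.io.File to the S3 producer,
// which can multipart-upload it in chunks (partSize is in bytes).
from("file:/tmp/staging?fileName=payload.gz")
    .to("aws-s3://s3_bucket_name?amazonS3Client=#client"
        + "&multiPartUpload=true&partSize=26214400");
```

This trades disk space for heap, which is usually the right trade for ~700 MB payloads.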
Can this be achieved without processing (unzipping and then gzipping) the file in Java code in a custom processor?
Environment: Camel 2.19.3, Java 8
Thanks
Answer
I solved it using streamCaching(). So the way I would do that is:
from("xyz")
    .streamCaching()
    .unmarshal().gzip()
    .to("abc");
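By default streamCaching() buffers in memory and only spools to disk once a cached stream passes a threshold (128 KB in Camel 2.x). If heap is still tight with very large payloads, the strategy can be tuned on the CamelContext; a sketch, with illustrative directory and threshold values:

```java
// Tune stream caching so large payloads spool to disk instead of heap.
CamelContext context = new DefaultCamelContext();
context.setStreamCaching(true);
context.getStreamCachingStrategy().setSpoolDirectory("/tmp/camel-spool");
// Spool to disk once a cached stream exceeds ~1 MB (default is 128 KB).
context.getStreamCachingStrategy().setSpoolThreshold(1024 * 1024);
```

Make sure the spool directory sits on a volume with enough free space for the decompressed payload.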