Stream decoding of Base64 data
Problem description
I have some large Base64-encoded data (stored in Snappy files in the Hadoop filesystem). This data was originally gzipped text. I need to be able to read chunks of this encoded data, decode them, and then flush them to a GZIPOutputStream.
Any ideas on how I could do this without loading all of the Base64 data into an array and calling Base64.decodeBase64(byte[])?
Would it be correct to read characters up to the '\r\n' delimiter and decode the data line by line? For example:
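// Split byteData on CR/LF and decode each Base64 line separately
for (int i = 0; i < byteData.length; i++) {
    if (byteData[i] == CARRIAGE_RETURN || byteData[i] == NEWLINE) {
        // Treat "\r\n" as one separator; the loop's own i++ already skips the current byte,
        // so advancing i any further would drop the first character of the next line
        if (byteData[i] == CARRIAGE_RETURN && i < byteData.length - 1 && byteData[i + 1] == NEWLINE)
            i += 1;
        // Decode only the byteCounter bytes collected for this line, not the whole buffer
        byteBuffer.put(Base64.decodeBase64(Arrays.copyOf(record, byteCounter)));
        byteCounter = 0;
    } else {
        record[byteCounter++] = byteData[i];
    }
}
// Decode the last line if the data does not end with a line break
if (byteCounter > 0)
    byteBuffer.put(Base64.decodeBase64(Arrays.copyOf(record, byteCounter)));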
Sadly, this approach doesn't give any human-readable output. Ideally, I would like to read the data as a stream, decode it, and stream it out.
Right now, I'm trying to wrap the decoded bytes in an InputStream and then copy them to the GZIPOutputStream:
byteBuffer.get(bufferBytes);
InputStream inputStream = new ByteArrayInputStream(bufferBytes);
inputStream = new GZIPInputStream(inputStream);
IOUtils.copy(inputStream , gzipOutputStream);
and it fails with java.io.IOException: Corrupt GZIP trailer.
Recommended answer
Let's take it step by step:
You need a GZIPInputStream to read zipped data (and not a GZIPOutputStream; the output stream is used to compress data). With this stream you will be able to read the uncompressed, original binary data. It requires an InputStream in its constructor.
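To illustrate the two directions, here is a minimal JDK-only sketch (the class and method names are hypothetical): GZIPOutputStream turns plain bytes into gzip data, while GZIPInputStream wraps an InputStream of gzip data and yields the original bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipDirections {

    // GZIPOutputStream: plain bytes in, compressed gzip bytes out (the writing side).
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } // close() calls finish(), which appends the CRC-32 and size trailer
        return buffer.toByteArray();
    }

    // GZIPInputStream: wraps an InputStream of gzip bytes and yields the
    // original, uncompressed bytes (the reading side).
    static String decompress(InputStream compressed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(compressed)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = compress("hello, gzip");
        System.out.println(decompress(new ByteArrayInputStream(gz))); // hello, gzip
    }
}
```

The trailer written on close is what GZIPInputStream verifies once it reaches the end of the compressed data.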
You need an input stream capable of reading the Base64 encoded data. I suggest the handy Base64InputStream from apache-commons-codec. Its constructor lets you set the line length and the line separator, and doEncode=false makes it decode the data. This in turn requires another input stream: the raw, Base64 encoded data.
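The answer recommends commons-codec's Base64InputStream. As a dependency-free alternative (a swapped-in technique, not what the answer uses), Java 8's java.util.Base64 offers the same streaming behaviour: Base64.getMimeDecoder().wrap(InputStream) returns a stream that decodes on the fly and, because it is a MIME decoder, ignores line separators. A minimal sketch with hypothetical class and method names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class StreamingBase64Decode {

    // Wraps a stream of (possibly line-wrapped) Base64 text so that reads return decoded bytes.
    // The MIME decoder ignores "\r\n" and other characters outside the Base64 alphabet,
    // so no manual line splitting is needed.
    static InputStream decoding(InputStream base64Text) {
        return Base64.getMimeDecoder().wrap(base64Text);
    }

    // Drains a stream chunk by chunk, as one would for large input.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("line ").append(i).append('\n');
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        // The MIME encoder splits its output into 76-character lines separated by "\r\n".
        byte[] encoded = Base64.getMimeEncoder().encode(original);
        try (InputStream in = decoding(new ByteArrayInputStream(encoded))) {
            System.out.println(Arrays.equals(original, readAll(in))); // true
        }
    }
}
```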
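This stream depends on how you get your data; ideally the data is already available as an InputStream, and the problem is solved. If not, you may have to use a ByteArrayInputStream (if binary), a StringBufferInputStream (if string), etc. Note that StringBufferInputStream is deprecated because it does not convert characters to bytes properly; for a String, calling getBytes(StandardCharsets.US_ASCII) and wrapping the result in a ByteArrayInputStream is the safer option.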
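For data that is not already available as a stream, these helpers (hypothetical names, JDK only) sketch the options mentioned above, with the String case using getBytes instead of the deprecated StringBufferInputStream:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SourceStreams {

    // Base64 data already held as bytes: wrap it directly.
    static InputStream fromBytes(byte[] base64Bytes) {
        return new ByteArrayInputStream(base64Bytes);
    }

    // Base64 data held as a String: Base64 text is pure ASCII, so encoding it
    // with US_ASCII is lossless. (StringBufferInputStream is deprecated because
    // it simply truncates each char to its low byte.)
    static InputStream fromString(String base64Text) {
        return new ByteArrayInputStream(base64Text.getBytes(StandardCharsets.US_ASCII));
    }

    // Base64 data in a local file: stream it instead of loading it into memory.
    static InputStream fromFile(Path path) throws IOException {
        return Files.newInputStream(path);
    }
}
```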
Roughly, the logic is:
InputStream fromHadoop = ...; // 3rd paragraph
Base64InputStream b64is = // 2nd paragraph
new Base64InputStream(fromHadoop, false, 80, "\n".getBytes("UTF-8"));
GZIPInputStream zis = new GZIPInputStream(b64is); // 1st paragraph
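Here is an end-to-end sketch of the same chain that runs without Hadoop or commons-codec. It assumes JDK 8+ and, as a swapped-in technique, uses java.util.Base64's MIME decoder in place of Base64InputStream; the class, methods and sample text are hypothetical. It builds gzipped, line-wrapped Base64 in memory, then decodes and decompresses it as a stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class Base64GzipPipeline {

    // source (Base64 text) -> Base64 decoder -> gzip decompressor -> out.
    // Memory use is bounded by small stream buffers, not by the size of the input.
    // Closing the chain also closes base64Source.
    static void decodeAndDecompress(InputStream base64Source, OutputStream out) throws IOException {
        InputStream decoded = Base64.getMimeDecoder().wrap(base64Source); // stands in for Base64InputStream
        try (GZIPInputStream unzipped = new GZIPInputStream(decoded)) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = unzipped.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
    }

    // Builds sample input shaped like the question's data: gzip the text, then Base64-encode
    // it with line breaks (the MIME encoder uses 76-character lines and "\r\n").
    static byte[] sampleInput(String text) throws IOException {
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream zip = new GZIPOutputStream(gz)) {
            zip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getMimeEncoder().encode(gz.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        byte[] base64 = sampleInput("first line\nsecond line\n");
        ByteArrayOutputStream text = new ByteArrayOutputStream();
        decodeAndDecompress(new ByteArrayInputStream(base64), text);
        System.out.print(new String(text.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Because the Base64-decoded bytes are the original gzip data, a caller who only needs to restore the .gz content could copy the decoding stream straight to its destination and skip GZIPInputStream (and any GZIPOutputStream), avoiding a decompress/recompress round trip.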
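Please pay attention to the arguments of Base64InputStream (line length and end-of-line byte array); you may need to tweak them. That said, according to the commons-codec Javadoc both arguments are ignored when doEncode is false, so for decoding the shorter Base64InputStream(fromHadoop, false) constructor should work just as well.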