Java buffered base64 encoder for streams


Question

I have lots of PDF files whose content I need to encode using base64. I have an Akka app which fetches the files as streams and distributes them to many workers, which encode the files and return the base64 string for each file. I have a basic solution for encoding:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import org.apache.commons.codec.binary.Base64InputStream;
    ...
    StringBuilder sb = new StringBuilder();
    // try-with-resources closes the reader and the wrapped streams, even on error
    try (Base64InputStream b64IStream = new Base64InputStream(input, true);
         BufferedReader br = new BufferedReader(
                 new InputStreamReader(b64IStream, StandardCharsets.US_ASCII))) {
        String line;
        // readLine() drops any line breaks the encoder inserts into its output
        while ((line = br.readLine()) != null) {
            sb.append(line);
        }
    }

It works, but I would like to know the best way to encode the files using a buffer, and whether there is a faster alternative.

I tested some other approaches such as:


  • Base64.getEncoder

  • sun.misc.BASE64Encoder

  • Base64.encodeBase64

  • javax.xml.bind.DatatypeConverter.printBase64

  • com.google.guava.BaseEncoding.base64

They are faster, but they need the entire file, correct? Also, I do not want to block other threads while encoding a single PDF file.

Any input is really helpful. Thank you!

Recommended answer

Fun fact about Base64: it takes three bytes and converts them into four characters. This means that if you read binary data in chunks whose size is divisible by three, you can feed the chunks to any Base64 encoder, and the concatenated output will be the same as if you had fed it the entire file at once.
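
To make the three-bytes-to-four-characters property concrete, here is a minimal sketch that compares chunked encoding against encoding the whole array at once. The class name ChunkedBase64Demo and the helper encodeInChunks are made up for this illustration; only java.util.Base64 is a real API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class, not part of the original answer.
class ChunkedBase64Demo {

    // Encodes data chunk by chunk and concatenates the results.
    static String encodeInChunks(byte[] data, int chunkSize) {
        Base64.Encoder encoder = Base64.getEncoder();
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, data.length);
            sb.append(encoder.encodeToString(Arrays.copyOfRange(data, off, end)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "Hello, Base64 streams!".getBytes(StandardCharsets.US_ASCII);
        String whole = Base64.getEncoder().encodeToString(data);

        // Chunk size divisible by 3: same output as encoding everything at once.
        System.out.println(whole.equals(encodeInChunks(data, 3)));
        // Chunk size not divisible by 3: padding appears mid-stream, so the output differs.
        System.out.println(whole.equals(encodeInChunks(data, 4)));
    }
}
```

With a chunk size of 4, every full chunk ends in padding characters, which is why the concatenated result no longer matches.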

Now, if you want your output stream to just be one long line of Base64 data - which is perfectly legal - then all you need to do is something along the lines of:

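import java.io.BufferedInputStream;
import java.util.Arrays;
import java.util.Base64;

private static final int BUFFER_SIZE = 3 * 1024; // must be a multiple of 3

StringBuilder result = new StringBuilder();
try ( BufferedInputStream in = new BufferedInputStream(input, BUFFER_SIZE); ) {
    Base64.Encoder encoder = Base64.getEncoder();
    byte[] chunk = new byte[BUFFER_SIZE];
    while ( true ) {
        // read() may return fewer bytes than requested before the end of the
        // stream, so keep reading until the chunk is full or the stream ends
        int len = 0;
        int n;
        while ( len < BUFFER_SIZE && (n = in.read(chunk, len, BUFFER_SIZE - len)) != -1 ) {
            len += n;
        }
        if ( len == BUFFER_SIZE ) {
            result.append( encoder.encodeToString(chunk) );
        } else {
            // last chunk: may be shorter, and is the only one that can be padded
            if ( len > 0 ) {
                result.append( encoder.encodeToString(Arrays.copyOf(chunk, len)) );
            }
            break;
        }
    }
}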

This means that only the last chunk may have a length that is not divisible by three and will therefore contain the padding characters.

The above example is with Java 8 Base64, but you can really use any encoder that takes a byte array of an arbitrary length and returns the base64 string of that byte array.

This means that you can play around with the buffer size as you wish, as long as it stays a multiple of three.
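
If keeping the buffer a multiple of three is inconvenient, one alternative (a different technique from the chunking described above, shown only as a sketch) is to let java.util.Base64 do the bookkeeping through Base64.Encoder.wrap(OutputStream). The class name WrapBase64Demo and its encode helper are hypothetical; the ByteArrayOutputStream sink is just for demonstration, and any OutputStream could take its place.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class, not part of the original answer.
class WrapBase64Demo {

    // Copies input into a Base64-encoding stream using a buffer of any size.
    static String encode(InputStream input, int bufferSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapped stream writes the final group and any padding.
        try (OutputStream b64 = Base64.getEncoder().wrap(sink)) {
            byte[] buffer = new byte[bufferSize];
            int n;
            while ((n = input.read(buffer)) != -1) {
                b64.write(buffer, 0, n);
            }
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hello, Base64 streams!".getBytes(StandardCharsets.US_ASCII);
        // A buffer size of 5 is deliberately not a multiple of 3.
        System.out.println(encode(new ByteArrayInputStream(data), 5));
    }
}
```

The wrapping stream carries leftover bytes from one write to the next, so short reads and odd buffer sizes do not introduce padding in the middle of the output.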

If you want your output to be MIME compatible, however, you need to have the output separated into lines. In this case, I would set the chunk size in the above example to something that, when multiplied by 4/3, gives you a round number of lines. For example, if you want to have 64 characters per line, each line encodes 64 / 4 * 3, which is 48 bytes. If you encode 48 bytes, you'll get one line. If you encode 480 bytes, you'll get 10 full lines.
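
The arithmetic above can be written down as a tiny helper. This is only an illustrative sketch: the class MimeChunkMath and its method names are invented here, and the line length is assumed to be a multiple of 4.

```java
// Hypothetical helper class, not part of the original answer.
class MimeChunkMath {

    // Input bytes that encode to exactly one line of lineLength characters.
    // Assumes lineLength is a multiple of 4.
    static int bytesPerLine(int lineLength) {
        return lineLength / 4 * 3;
    }

    // Chunk size in bytes that yields a whole number of full lines.
    static int chunkSize(int lineLength, int linesPerChunk) {
        return bytesPerLine(lineLength) * linesPerChunk;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerLine(64));    // 48
        System.out.println(chunkSize(64, 10));   // 480
        System.out.println(chunkSize(64, 100));  // 4800
    }
}
```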

So change the BUFFER_SIZE above to something like 4800, and instead of Base64.getEncoder() use Base64.getMimeEncoder(64, new byte[] { 13, 10 }). When it encodes, you'll get 100 full-sized lines from each chunk except the last. Java's MIME encoder does not add a line separator at the end of its output, so add a result.append("\r\n") after each full chunk in the read loop to keep the line break between chunks.
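
As a rough sketch of the MIME variant, the following encodes a byte array in 4800-byte chunks and joins the chunks with "\r\n", then checks the result against encoding everything in one call. It works on an in-memory array to stay short; the class MimeChunkedDemo and the helper encodeMimeInChunks are hypothetical names.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class, not part of the original answer.
class MimeChunkedDemo {
    static final int LINE_LENGTH = 64;
    static final byte[] CRLF = {13, 10};
    static final int CHUNK_SIZE = 4800; // 100 lines of 48 bytes each

    static String encodeMimeInChunks(byte[] data) {
        Base64.Encoder encoder = Base64.getMimeEncoder(LINE_LENGTH, CRLF);
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += CHUNK_SIZE) {
            if (off > 0) {
                // The encoder leaves its last line unterminated, so the
                // line break between two chunks has to be added by hand.
                sb.append("\r\n");
            }
            int end = Math.min(off + CHUNK_SIZE, data.length);
            sb.append(encoder.encodeToString(Arrays.copyOfRange(data, off, end)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        String whole = Base64.getMimeEncoder(LINE_LENGTH, CRLF).encodeToString(data);
        System.out.println(whole.equals(encodeMimeInChunks(data)));
    }
}
```

Placing the separator before every chunk except the first, rather than after each chunk, avoids a trailing line break when the input length happens to be an exact multiple of the chunk size.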
