Why do popular Java Base64 encoding libraries use OutputStreams for encoding and InputStreams for decoding?


Question

I have been trying to solve a memory issue in a Java program where we load an entire file into memory, base64 encode it and then use it as a form parameter in a POST request. This causes an OutOfMemoryError (OOME) because the file is extremely large.

I am working on a solution that streams the file through a base64 encoder into the request body of an HTTP POST request. One common pattern I have noticed in all of the popular encoding libraries (Guava, java.util.Base64, android.util.Base64 and org.apache.batik.util) is that if the library supports encoding with streams, the encoding is always done through an OutputStream and the decoding is always done through an InputStream.

I am having trouble finding or determining the reasoning behind these decisions. Given that so many of these popular and well-written libraries align with this API design, I assume that there is a reason for it. It doesn't seem very difficult to adapt one of these encoders to become an InputStream or accept an InputStream, but I am wondering if there is a valid architectural reason these encoders are designed this way.

Why do common libraries do Base64 encoding through an OutputStream and Base64 decoding through an InputStream?

Examples supporting my claim:

java.util.Base64
 - Base64.Decoder.wrap(InputStream stream)
 - Base64.Encoder.wrap(OutputStream stream)

android.util.Base64
 - Base64InputStream  // An InputStream that does Base64 decoding on the data read through it.
 - Base64OutputStream // An OutputStream that does Base64 encoding

com.google.common.io.BaseEncoding
 - decodingStream(Reader reader)
 - encodingStream(Writer writer)

org.apache.batik.util
 - Base64DecodeStream extends InputStream
 - Base64EncodeStream extends OutputStream
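
To make the pattern concrete, here is a minimal sketch of the java.util.Base64 pair from the list above, using in-memory streams in place of a file or socket. The class and method names (Base64WrapRoundTrip, encode, decode) are made up for illustration; only Base64.Encoder.wrap and Base64.Decoder.wrap come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64WrapRoundTrip {

    /** Pushes raw bytes into an encoding OutputStream and returns the Base64 text. */
    static String encode(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapper writes the final padding; it also closes the sink.
        try (OutputStream out = Base64.getEncoder().wrap(sink)) {
            out.write(raw);
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    /** Pulls raw bytes out of a decoding InputStream that wraps the Base64 text. */
    static byte[] decode(String base64) throws IOException {
        InputStream source = new ByteArrayInputStream(base64.getBytes(StandardCharsets.US_ASCII));
        try (InputStream in = Base64.getDecoder().wrap(source)) {
            return in.readAllBytes(); // requires Java 9 or later
        }
    }

    public static void main(String[] args) throws IOException {
        String text = encode("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(text); // aGVsbG8=
        System.out.println(new String(decode(text), StandardCharsets.UTF_8)); // hello
    }
}
```

In both directions the application side of the stream carries raw bytes and the far side carries Base64 text.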

Recommended answer

Well, yes, you can reverse it, but this arrangement makes the most sense. Base64 is used to make binary data, generated or operated on by the application, compatible with a text-based outside environment. So the base 64 encoded data is always required on the outside and the decoded binary data is required on the inside.
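
For what "reversing it" could look like, here is a rough sketch of an InputStream that encodes as it is read, so a consumer that pulls from an InputStream receives base 64 text. The class Base64EncodingInputStream is hypothetical and not part of any of the libraries mentioned. It assumes Java 11 or later for readNBytes, and for brevity it only overrides the single-byte read(), which is correct but slow.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical adapter (not a library class): reading from it yields Base64 text. */
class Base64EncodingInputStream extends InputStream {
    // A multiple of 3 raw bytes encodes to whole 4-character groups,
    // so padding can only appear in the final chunk.
    private static final int RAW_CHUNK_SIZE = 3 * 1024;

    private final InputStream source;
    private final Base64.Encoder encoder = Base64.getEncoder();
    private byte[] encoded = new byte[0];
    private int position = 0;
    private boolean exhausted = false;

    Base64EncodingInputStream(InputStream source) {
        this.source = source;
    }

    @Override
    public int read() throws IOException {
        if (position == encoded.length && !refill()) {
            return -1;
        }
        return encoded[position++] & 0xFF;
    }

    /** Encodes the next chunk of the source; returns false once the source is empty. */
    private boolean refill() throws IOException {
        if (exhausted) {
            return false;
        }
        // readNBytes blocks until the chunk is full or the source ends (Java 11+).
        byte[] raw = source.readNBytes(RAW_CHUNK_SIZE);
        if (raw.length < RAW_CHUNK_SIZE) {
            exhausted = true;
        }
        if (raw.length == 0) {
            return false;
        }
        encoded = encoder.encode(raw);
        position = 0;
        return true;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "hello".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new Base64EncodingInputStream(new ByteArrayInputStream(raw))) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.US_ASCII)); // aGVsbG8=
        }
    }
}
```

The adapter has to keep its own chunk buffer and track the end of the source itself, which the push-style OutputStream wrapper does not need to expose.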

An application generally doesn't perform any operations on the base 64 encoded data itself; the encoding is only needed to communicate binary data to another application when a text interface is required or expected.

If you want to export your binary data to the outside, naturally you would use an output stream. If that data needs to be encoded in base 64, you make sure you send the data to an output stream that encodes to base 64.

If you want to import your binary data from the outside, then you would use an input stream. If that data is encoded in base 64, you first need to decode it, so you make sure it is decoded before you treat it as a binary stream.

Let's create a bit of a picture. Say you have an application that runs in a text-oriented environment but operates on binary data. The important part is the direction of the arrows, seen from the application on the left.

For the input (read calls) you then get:

{APPLICATION} <- (binary data decoding) <- (base64 decoding) <- (file input stream) <- [BASE 64 ENCODED FILE]

For this you naturally use input streams.

So let's look at the output (write calls):

{APPLICATION} -> (binary data encoding) -> (base64 encoding) -> (file output stream) -> [BASE 64 ENCODED FILE]

For this you naturally use output streams.

These streams can be connected by chaining them together, i.e. by using one stream as the parent (the underlying stream) of the other.

Here is an example in Java. Note that creating the binary encoder/decoder in the data class itself is a bit ugly; generally you would use another class for that. I hope it suffices for demonstration purposes.

import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

public class BinaryHandlingApplication {

    /**
     * A data class that encodes to binary output, e.g. to interact with an application in another language.
     * 
     * Binary format: [32 bit int element string size][UTF-8 element string][32 bit element count]
     * The integers are signed, big endian values.
     * The UTF-8 string should not contain a BOM.
     * Note that this class doesn't know anything about files or base 64 encoding.
     */
    public static class DataClass {
        private String element;
        private int elementCount;

        public DataClass(String element) {
            this.element = element;
            this.elementCount = 1;
        }

        public String getElement() {
            return element;
        }

        public void setElementCount(int count) {
            this.elementCount = count;
        }

        public int getElementCount() {
            return elementCount;
        }

        @Override
        public String toString() {
            return String.format("%s count is %d", element, elementCount);
        }

        public void save(OutputStream out) throws IOException {

            DataOutputStream dataOutputStream = new DataOutputStream(out);

            // so here we have a chain of:
            // a dataoutputstream on a base 64 encoding stream on a fileoutputstream 


            byte[] utf8EncodedString = element.getBytes(UTF_8);
            dataOutputStream.writeInt(utf8EncodedString.length);
            dataOutputStream.write(utf8EncodedString);

            dataOutputStream.writeInt(elementCount);
        }

        public void load(InputStream in) throws IOException {
            DataInputStream dataInputStream = new DataInputStream(in);

            // so here we have a chain of:
            // a datainputstream on a base 64 decoding stream on a fileinputstream 

            int utf8EncodedStringSize = dataInputStream.readInt();
            byte[] utf8EncodedString = new byte[utf8EncodedStringSize];
            dataInputStream.readFully(utf8EncodedString);
            this.element = new String(utf8EncodedString, UTF_8);

            this.elementCount = dataInputStream.readInt();
        }

    }

    /**
     * Create a base 64 output stream to a file; the file is the text oriented
     * environment.
     */
    private static OutputStream createBase64OutputStreamToFile(String filename) throws FileNotFoundException {
        FileOutputStream textOutputStream = new FileOutputStream(filename);
        return Base64.getUrlEncoder().wrap(textOutputStream);
    }

    /**
     * Create a base 64 input stream from a file; the file is the text oriented
     * environment.
     */
    private static InputStream createBase64InputStreamFromFile(String filename) throws FileNotFoundException {
        FileInputStream textInputStream = new FileInputStream(filename);
        return Base64.getUrlDecoder().wrap(textInputStream);
    }

    public static void main(String[] args) throws IOException {
        // this text file acts as the text oriented environment for which we need to encode
        String filename = "apples.txt";

        // create the initial class
        DataClass instance = new DataClass("them apples");
        System.out.println(instance);

        // perform some operation on the data
        int newElementCount = instance.getElementCount() + 2;
        instance.setElementCount(newElementCount);

        // write it away
        try (OutputStream out = createBase64OutputStreamToFile(filename)) {
            instance.save(out);
        }

        // read it into another instance, who cares
        DataClass changedInstance = new DataClass("Uh yeah, forgot no-parameter constructor");
        try (InputStream in = createBase64InputStreamFromFile(filename)) {
            changedInstance.load(in);
        }
        System.out.println(changedInstance);
    }
}

Especially note the chaining of the streams and, of course, the absence of any buffers whatsoever. I've used URL-safe base 64 (in case you want to use HTTP GET instead).

In your case, of course, you could generate an HTTP POST request using a URL and encode directly to the retrieved OutputStream by wrapping it. That way no base 64 encoded data needs to be (extensively) buffered. See examples on how to get to the OutputStream here.
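
As a rough sketch of that idea, assuming HttpURLConnection and a server that accepts the base 64 text as the raw request body (the question sends it as a form parameter, which would need extra framing), it could look like the code below. The class name Base64PostUpload, its method names and the Content-Type value are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

class Base64PostUpload {

    /** Copies the file into the target as Base64 text without loading it into memory. */
    static void writeBase64(Path file, OutputStream target) throws IOException {
        // Closing the wrapper emits the final padding and closes the target too.
        try (OutputStream encoding = Base64.getEncoder().wrap(target)) {
            Files.copy(file, encoding);
        }
    }

    /** Streams the file as the Base64-encoded body of a POST and returns the status code. */
    static int post(URL endpoint, Path file) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) endpoint.openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        // Without a streaming mode, HttpURLConnection buffers the whole body
        // in memory to compute Content-Length; 0 selects the default chunk size.
        connection.setChunkedStreamingMode(0);
        // Assumed content type; use whatever the receiving server expects.
        connection.setRequestProperty("Content-Type", "text/plain; charset=US-ASCII");
        try {
            writeBase64(file, connection.getOutputStream());
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }
}
```

A call such as Base64PostUpload.post(someUrl, Path.of("big-file.bin")), with placeholder arguments, would then send the file without holding either the raw or the encoded data in memory as a whole.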

Remember: if you need to buffer, you're doing it wrong.

As mentioned in the comments, HTTP POST doesn't need base 64 encoding, but whatever; now you know how you can encode base 64 directly to an HTTP connection.

java.util.Base64 specific note: although base 64 is text, the base 64 streams generate and consume bytes; they simply assume ASCII encoding (this can be fun for UTF-16 text). Personally I think this is a terrible design decision; they should have wrapped a Reader and Writer instead, even if that slows down encoding slightly.

In their defense, the various base 64 standards and RFCs also get this wrong.
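
If the base 64 text has to end up in a character stream, one possible workaround is a small bridge between the byte-oriented encoder and a Writer. This is only a sketch: the class AsciiToWriterOutputStream and its helper method are hypothetical, and it forwards one character at a time, so it favours clarity over speed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Base64;

/** Hypothetical bridge: forwards each ASCII byte from the encoder to a Writer as one char. */
class AsciiToWriterOutputStream extends OutputStream {
    private final Writer writer;

    AsciiToWriterOutputStream(Writer writer) {
        this.writer = writer;
    }

    @Override
    public void write(int b) throws IOException {
        // Base64 output only contains ASCII, so widening a byte to a char is lossless here.
        writer.write(b & 0xFF);
    }

    @Override
    public void flush() throws IOException {
        writer.flush();
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }

    /** Encodes raw bytes into a String by way of a Writer. */
    static String encodeToText(byte[] raw) throws IOException {
        StringWriter text = new StringWriter();
        try (OutputStream out = Base64.getEncoder().wrap(new AsciiToWriterOutputStream(text))) {
            out.write(raw);
        }
        return text.toString();
    }
}
```

Because the Writer decides the character encoding, the same bridge works whether the text is finally stored as ASCII, UTF-8 or UTF-16.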
