Get size of String w/ encoding in bytes without converting to byte[]

Question

I have a situation where I need to know the size of a String/encoding pair, in bytes, but cannot use the getBytes() method because 1) the String is very large, and duplicating it in a byte[] array would use a large amount of memory, and, more to the point, 2) getBytes() allocates a byte[] array sized as the length of the String * the maximum possible bytes per character. So if I have a String with 1.5B characters and UTF-16 encoding, getBytes() will try to allocate a 3GB array and fail, since Java arrays are limited to 2^31 - X elements (X is Java version specific).

So - is there some way to calculate the byte size of a String/encoding pair directly from the String object?

Update

Here's a working implementation of jtahlborn's answer:

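// Requires: import java.io.OutputStream;
private class CountingOutputStream extends OutputStream {
    // long rather than int: the byte count of a very large String
    // can exceed Integer.MAX_VALUE
    long total;

    @Override
    public void write(int i) {
        // the Writer is expected to call only the array overloads
        throw new RuntimeException("don't use");
    }

    @Override
    public void write(byte[] b) {
        total += b.length;
    }

    @Override
    public void write(byte[] b, int offset, int len) {
        total += len;
    }
}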


Recommended answer

Simple, just write it to a dummy output stream:

// Requires: import java.io.OutputStream;
class CountingOutputStream extends OutputStream {
  // long rather than int: the total can exceed Integer.MAX_VALUE
  private long _total;

  @Override public void write(int b) {
    ++_total;
  }

  @Override public void write(byte[] b) {
    _total += b.length;
  }

  @Override public void write(byte[] b, int offset, int len) {
    _total += len;
  }

  public long getTotalSize() {
    return _total;
  }
}

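// Requires: import java.io.OutputStreamWriter; import java.io.Writer;
CountingOutputStream cos = new CountingOutputStream();
// "my_encoding" is a placeholder for the real charset name
Writer writer = new OutputStreamWriter(cos, "my_encoding");
//writer.write(myString);

// UPDATE: writer.write(myString) copies the _entire_ input string first;
// to avoid that, write it in fixed-size chunks:
for(int i = 0; i < myString.length(); i+=8096) {
  int end = Math.min(myString.length(), i+8096);
  writer.write(myString, i, end - i);
}

// push any bytes still buffered in the writer through to the counter
writer.flush();

System.out.println("Total bytes: " + cos.getTotalSize());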

It's not only simple, but probably just as fast as the other "complex" answers.
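For reference, the pieces above can be put together into one compilable file. This is only a minimal sketch of the same counting-stream idea: the class name `EncodedSizeCounter`, the helper `encodedSize`, and the 8192-character chunk size are illustrative choices, not part of the original answer, and it takes a `Charset` object rather than a charset name.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper class, named for illustration only.
public class EncodedSizeCounter {

    /** Discards every byte written to it and only keeps a running count. */
    static final class CountingOutputStream extends OutputStream {
        private long total; // long, so counts above Integer.MAX_VALUE still fit

        @Override public void write(int b) { total++; }

        // OutputStream.write(byte[]) delegates here, so one override covers both
        @Override public void write(byte[] b, int off, int len) { total += len; }

        long getTotalSize() { return total; }
    }

    /** Returns the number of bytes s occupies when encoded with charset. */
    static long encodedSize(String s, Charset charset) throws IOException {
        final int chunk = 8192; // assumed chunk size; any moderate value works
        CountingOutputStream cos = new CountingOutputStream();
        try (Writer writer = new OutputStreamWriter(cos, charset)) {
            // write in chunks so the writer never copies the whole string at once
            for (int i = 0; i < s.length(); i += chunk) {
                int end = Math.min(s.length(), i + chunk);
                writer.write(s, i, end - i);
            }
        } // closing the writer flushes the remaining encoded bytes
        return cos.getTotalSize();
    }

    public static void main(String[] args) throws IOException {
        String sample = "h\u00e9llo"; // "héllo"
        System.out.println(encodedSize(sample, StandardCharsets.UTF_8));    // prints 6
        System.out.println(encodedSize(sample, StandardCharsets.UTF_16BE)); // prints 10
    }
}
```

Note that the plain UTF-16 charset writes a byte-order mark when encoding through OutputStreamWriter, so its count is 2 bytes larger than UTF-16BE or UTF-16LE for the same text.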
