Truncating Strings by Bytes

Question

I created the following to truncate a string in Java to a new string with a given number of bytes.

        String truncatedValue = "";
        String currentValue = string;
        int pivotIndex = (int) Math.round(((double) string.length())/2);
        while(!truncatedValue.equals(currentValue)){
            currentValue = string.substring(0,pivotIndex);
            byte[] bytes = null;
            bytes = currentValue.getBytes(encoding);
            if(bytes==null){
                return string;
            }
            int byteLength = bytes.length;
            int newIndex =  (int) Math.round(((double) pivotIndex)/2);
            if(byteLength > maxBytesLength){
                pivotIndex = newIndex;
            } else if(byteLength < maxBytesLength){
                pivotIndex = pivotIndex + 1;
            } else {
                truncatedValue = currentValue;
            }
        }
        return truncatedValue;

This is the first thing that came to my mind, and I know I could improve on it. I saw another post that was asking a similar question there, but they were truncating Strings using the bytes instead of String.substring. I think I would rather use String.substring in my case.

I just removed the UTF8 reference because I would rather be able to do this for different storage types as well.

Recommended answer

Why not convert to bytes and walk forward--obeying UTF8 character boundaries as you do it--until you've got the max number, then convert those bytes back into a string?
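
A minimal sketch of this first suggestion, assuming UTF-8 as the target encoding. The class name `Utf8ByteTruncator` and the method `truncate` are hypothetical names chosen for illustration. Instead of walking forward from the start, this variant steps back from the byte limit until it reaches the start of a character; for valid UTF-8 the resulting cut point should be the same.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ByteTruncator {
  // Hypothetical helper: returns the longest prefix of s whose UTF-8
  // encoding is at most maxBytes long, without splitting a character.
  public static String truncate(String s, int maxBytes) {
    if (maxBytes <= 0) return "";
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    if (utf8.length <= maxBytes) return s;

    // utf8[end] is the first byte that would be dropped. If it is a
    // continuation byte (bit pattern 10xxxxxx), the cut falls inside a
    // multi-byte character, so back up to that character's lead byte.
    int end = maxBytes;
    while (end > 0 && (utf8[end] & 0xC0) == 0x80) {
      end--;
    }
    return new String(utf8, 0, end, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // "h\u00e9llo" is 6 bytes in UTF-8; \u00e9 occupies bytes 1 and 2.
    System.out.println(truncate("h\u00e9llo", 2)); // cut lands inside \u00e9
    System.out.println(truncate("h\u00e9llo", 3)); // cut lands after \u00e9
  }
}
```

Because the bytes are decoded back into a new string, this version never needs to map byte offsets to `char` indices, at the cost of one extra decode.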

Or you could just cut the original string if you keep track of where the cut should occur:

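import java.nio.charset.StandardCharsets;

// Assuming that Java will always produce valid UTF8 from a string, so no error checking!
// (Is this always true, I wonder?)
public class UTF8Cutter {
  // Returns the longest prefix of s whose UTF-8 encoding fits in n bytes.
  public static String cut(String s, int n) {
    // Request UTF-8 explicitly; the no-argument getBytes() uses the platform default charset.
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    if (utf8.length < n) n = utf8.length;
    int n16 = 0;      // number of UTF-16 chars of s to keep
    int advance = 1;  // UTF-16 chars covered by the current UTF-8 sequence
    int i = 0;        // index into the UTF-8 bytes
    while (i < n) {
      advance = 1;
      if ((utf8[i] & 0x80) == 0) i += 1;          // 1-byte sequence (ASCII)
      else if ((utf8[i] & 0xE0) == 0xC0) i += 2;  // 2-byte sequence
      else if ((utf8[i] & 0xF0) == 0xE0) i += 3;  // 3-byte sequence
      else { i += 4; advance = 2; }               // 4-byte sequence, a surrogate pair in the String
      if (i <= n) n16 += advance;                 // count the character only if it fits entirely
    }
    return s.substring(0, n16);
  }
}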

Note: edited on August 25, 2014 to fix bugs.
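
The question also asks for something that works with encodings other than UTF-8. One possible sketch, which is not part of the original answer, lets a `java.nio.charset.CharsetEncoder` encode into a `ByteBuffer` of exactly the allowed size and then reads back how many chars it consumed. The class name `CharsetTruncator` and its `truncate` method are hypothetical. It assumes a simple, stateless charset; charsets that emit a byte order mark (such as `UTF-16`) or need shift sequences would likely require extra care.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetTruncator {
  // Hypothetical helper: returns the longest prefix of s that encodes to
  // at most maxBytes bytes in the given charset.
  public static String truncate(String s, Charset charset, int maxBytes) {
    if (maxBytes <= 0) return "";
    // REPLACE makes unencodable input behave like String.getBytes(Charset)
    // instead of stopping with an error result.
    CharsetEncoder encoder = charset.newEncoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
    CharBuffer in = CharBuffer.wrap(s);
    ByteBuffer out = ByteBuffer.allocate(maxBytes);
    // Encoding stops when the output buffer cannot hold the next character;
    // the input position then marks how many chars fit completely.
    encoder.encode(in, out, true);
    return s.substring(0, in.position());
  }

  public static void main(String[] args) {
    String s = "h\u00e9llo";
    // \u00e9 takes 2 bytes in UTF-8 but 1 byte in ISO-8859-1.
    System.out.println(truncate(s, StandardCharsets.UTF_8, 2));
    System.out.println(truncate(s, StandardCharsets.ISO_8859_1, 2));
  }
}
```

This keeps the `String.substring` style the question prefers, since the encoder is only used to find the cut index and the encoded bytes are discarded.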