How to establish the codepoint of encoded characters?

Views: 127 Published: 2020/7/19 22:33:48 Tags: java unicode codepoint

Question

Given a stream of bytes (that represent characters) and the encoding of the stream, how would I obtain the code points of the characters?

InputStreamReader r = new InputStreamReader(bla, Charset.forName("UTF-8"));
int whatIsThis = r.read();

What is returned by read() in the above snippet? Is it the Unicode code point?

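Recommended answer

read() returns a single UTF-16 code unit as an int in the range 0 to 65535, or -1 at end of stream. That value is not necessarily a complete Unicode code point. A char is (implicitly) a 16-bit code unit in the UTF-16 encoding. This encoding can represent basic multilingual plane (BMP) characters with a single char. Code points in the supplementary range are represented using two-char sequences (surrogate pairs).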
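To make the code unit versus code point distinction concrete, here is a minimal sketch that decodes UTF-8 bytes through an InputStreamReader and records every value read() returns. The class name CodeUnitDemo and the helper readCodeUnits are made up for this illustration, and the sample text (the letter A followed by U+1D11E, a supplementary character) is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
final class CodeUnitDemo {
  // Collects every value returned by Reader.read() until end of stream (-1).
  static List<Integer> readCodeUnits(byte[] utf8Bytes) throws IOException {
    List<Integer> units = new ArrayList<>();
    try (Reader r = new InputStreamReader(
        new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
      int unit;
      while ((unit = r.read()) != -1) {
        units.add(unit);
      }
    }
    return units;
  }

  public static void main(String[] args) throws IOException {
    // "A" is U+0041 (in the BMP); U+1D11E lies in the supplementary range.
    String text = "A" + new String(Character.toChars(0x1D11E));
    byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
    for (int unit : readCodeUnits(bytes)) {
      System.out.printf("0x%04X%n", unit); // prints 0x0041, 0xD834, 0xDD1E
    }
  }
}
```

Two characters go in, but read() yields three values: the supplementary character arrives as a high surrogate followed by a low surrogate.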

The Character type contains methods for translating UTF-16 code units to Unicode code points:

A code point that requires two chars is stored as a surrogate pair: when you pass in two sequential values from a sequence, the first satisfies isHighSurrogate and the second satisfies isLowSurrogate. The codePointAt methods can be used to extract code points from code unit sequences. There are similar methods, such as toChars, for working from code points to UTF-16 code units.
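The Character methods mentioned above can be tried out in isolation. The following is a small sketch, not taken from the original answer; SurrogateDemo and codePointsOf are illustrative names, and U+1D11E is again just a convenient supplementary code point.

```java
// Hypothetical demo class, not part of the original answer.
final class SurrogateDemo {
  // Walks a UTF-16 sequence and returns the code points it encodes.
  static int[] codePointsOf(CharSequence text) {
    int[] result = new int[Character.codePointCount(text, 0, text.length())];
    int count = 0;
    for (int i = 0; i < text.length(); ) {
      int cp = Character.codePointAt(text, i);
      result[count++] = cp;
      // Advance by 1 char for BMP code points, 2 for supplementary ones.
      i += Character.charCount(cp);
    }
    return result;
  }

  public static void main(String[] args) {
    // Code point -> UTF-16 code units.
    char[] pair = Character.toChars(0x1D11E);
    System.out.println(pair.length);                        // 2
    System.out.println(Character.isHighSurrogate(pair[0])); // true
    System.out.println(Character.isLowSurrogate(pair[1]));  // true

    // UTF-16 code units -> code point.
    System.out.printf("U+%X%n", Character.toCodePoint(pair[0], pair[1])); // U+1D11E

    // Mixed BMP and supplementary text.
    for (int cp : codePointsOf("A" + new String(pair))) {
      System.out.printf("U+%04X%n", cp); // U+0041, then U+1D11E
    }
  }
}
```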

A sample implementation of a code point stream reader:

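import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;

/** Reads Unicode code points from a Reader that supplies UTF-16 code units. */
public class CodePointReader implements Closeable {
  private final Reader charSource;
  // Look-ahead: the next unread code unit, or -1 at end of stream.
  private int codeUnit;

  public CodePointReader(Reader charSource) throws IOException {
    this.charSource = charSource;
    codeUnit = charSource.read();
  }

  public boolean hasNext() { return codeUnit != -1; }

  // Returns the next code point; call only while hasNext() is true.
  public int nextCodePoint() throws IOException {
    try {
      char high = (char) codeUnit;
      if (Character.isHighSurrogate(high)) {
        // A high surrogate must be followed by a low surrogate.
        int next = charSource.read();
        if (next == -1) { throw new IOException("malformed character"); }
        char low = (char) next;
        if (!Character.isLowSurrogate(low)) {
          throw new IOException("malformed sequence");
        }
        return Character.toCodePoint(high, low);
      } else {
        // BMP code unit (or an unpaired low surrogate): returned as-is.
        return codeUnit;
      }
    } finally {
      // Refill the look-ahead whether we are returning or throwing.
      codeUnit = charSource.read();
    }
  }

  public void close() throws IOException { charSource.close(); }
}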
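For experimenting without a separate class file, the same surrogate-pairing logic can be condensed into a single static method. This is only a sketch under assumptions: CodePointDrain and readAll are invented names, the whole stream is collected into a list rather than read lazily, and a StringReader stands in for a decoded byte stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class CodePointDrain {
  // Reads UTF-16 code units until end of stream and pairs up surrogates.
  static List<Integer> readAll(Reader source) throws IOException {
    List<Integer> codePoints = new ArrayList<>();
    int unit;
    while ((unit = source.read()) != -1) {
      char high = (char) unit;
      if (Character.isHighSurrogate(high)) {
        int next = source.read();
        if (next == -1 || !Character.isLowSurrogate((char) next)) {
          throw new IOException("malformed surrogate sequence");
        }
        codePoints.add(Character.toCodePoint(high, (char) next));
      } else {
        // BMP code unit; an unpaired low surrogate is passed through as-is.
        codePoints.add(unit);
      }
    }
    return codePoints;
  }

  public static void main(String[] args) throws IOException {
    String text = "A" + new String(Character.toChars(0x1D11E)) + "B";
    try (Reader r = new StringReader(text)) {
      for (int cp : readAll(r)) {
        System.out.printf("U+%04X%n", cp); // U+0041, U+1D11E, U+0042
      }
    }
  }
}
```

Collecting into a list keeps the sketch short; for large inputs the iterator-style hasNext()/nextCodePoint() design is the better fit because it never holds more than one code unit of look-ahead.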
