Opening InputStreamReader in the middle of a UTF-8 stream


Question

I am using a seekable InputStream which returns the stream to me at a specific position. The underlying data in the stream is encoded as UTF-8. I want to open this stream with an InputStreamReader and read one character at a time.

Here is my code snippet:

inputStream.seek(position-1);
InputStreamReader reader = new InputStreamReader(inputStream, "UTF-8");

The problem is that position-1 could be pointing to the middle of a multi-byte UTF-8 sequence. How can I detect that and make sure the reader starts at the beginning of a new UTF-8 encoded sequence? Thanks in advance.

Recommended answer

Assuming you can reposition the stream whenever you want, you can simply read bytes while the top two bits are "10". In UTF-8, only continuation bytes have that form (10xxxxxx), so the first byte that does not match starts a new encoded character. So something like:

// InputStream doesn't actually have a seek method, but I'll assume you're using
// a subclass which does...
inputStream.seek(position);
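while (true) {
    int nextByte = inputStream.read();
    // Stop at end of stream, or at the first byte that is not a
    // continuation byte (continuation bytes look like 10xxxxxx, i.e. 0x80 after masking).
    if (nextByte == -1 || (nextByte & 0xc0) != 0x80) {
        break;
    }
    position++;
}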
// Undo the last read, effectively
inputStream.seek(position);
InputStreamReader reader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
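The same check can be tried without a seekable stream. The following is only a minimal sketch, assuming the data is already in a byte array; the class Utf8Boundary and the methods nextCharBoundary and decodeFrom are hypothetical names used for illustration, not part of any library. It skips continuation bytes forward from the given position and then decodes from the aligned offset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Boundary {

    // Returns the first index >= position whose byte is not a UTF-8
    // continuation byte (10xxxxxx), or data.length if none remains.
    static int nextCharBoundary(byte[] data, int position) {
        int i = position;
        while (i < data.length && (data[i] & 0xC0) == 0x80) {
            i++;
        }
        return i;
    }

    // Aligns to the next character boundary, then reads one char at a
    // time through an InputStreamReader until the end of the data.
    static String decodeFrom(byte[] data, int position) {
        int start = nextCharBoundary(data, position);
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data, start, data.length - start),
                StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Bytes: 61 | E2 82 AC | 62  ("a", the euro sign, "b")
        byte[] data = "a\u20ACb".getBytes(StandardCharsets.UTF_8);
        System.out.println(nextCharBoundary(data, 2)); // prints 4
        System.out.println(decodeFrom(data, 2));       // prints b
    }
}
```

Note that this approach, like the loop above, moves forward: a position that lands inside a multi-byte character is advanced to the start of the next character, so the partially covered character is skipped rather than re-read.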
