The difference between InputStream and InputStreamReader when reading multi-byte characters

Question

The difference between InputStream and InputStreamReader is that InputStream reads bytes, while InputStreamReader reads chars. For example, if the text in a file is abc, then both of them work fine. But if the text is a你们, which is composed of an a and two Chinese characters, then InputStream does not work.

So we should use InputStreamReader, but my question is:

How does InputStreamReader recognize characters?

a is one byte, but a Chinese character is two bytes. Does it read a as one byte and recognize each of the other characters as two bytes, or does InputStreamReader read every character in this text as two bytes?

Recommended answer

An InputStream reads raw octet (8-bit) data. In Java, the byte type plays roughly the role of the char type in C, which can represent either character data or binary data (Java's byte, however, is always signed). The Java char type has more in common with the C wchar_t type.

An InputStreamReader then transforms the data from some encoding into UTF-16. If "a你们" is encoded as UTF-8 on disk, it is the byte sequence 61 E4 BD A0 E4 BB AC. When you pass the InputStream to an InputStreamReader with the UTF-8 encoding, it is read as the char sequence 0061 4F60 4EEC. In other words, a is decoded from one byte and each of the two Chinese characters from three bytes; how many bytes make up a character is determined by the encoding, not fixed at two.
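
The byte and char sequences quoted above can be reproduced with a small sketch. It assumes the file content is replaced by an in-memory byte array; the class name ByteVsCharDemo and the helper names readAsBytes and readAsChars are made up for this illustration and are not part of the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ByteVsCharDemo {
    // UTF-8 bytes of "a你们", standing in for the file on disk (an assumption of this sketch).
    static final byte[] UTF8_BYTES = {
        0x61, (byte) 0xE4, (byte) 0xBD, (byte) 0xA0, (byte) 0xE4, (byte) 0xBB, (byte) 0xAC
    };

    // Reads through a plain InputStream: read() returns one byte (0..255) per call, -1 at the end.
    static List<String> readAsBytes(byte[] data) {
        List<String> units = new ArrayList<>();
        try (InputStream in = new ByteArrayInputStream(data)) {
            int b;
            while ((b = in.read()) != -1) {
                units.add(String.format("%02X", b));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return units;
    }

    // Reads through an InputStreamReader: read() returns one UTF-16 code unit per call, -1 at the end.
    static List<String> readAsChars(byte[] data) {
        List<String> units = new ArrayList<>();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                units.add(String.format("%04X", c));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(readAsBytes(UTF8_BYTES)); // [61, E4, BD, A0, E4, BB, AC]
        System.out.println(readAsChars(UTF8_BYTES)); // [0061, 4F60, 4EEC]
    }
}
```

The same seven bytes come back as seven values from the stream but as three values from the reader, because the reader consumes as many bytes as the encoding requires for each character.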

The character encoding API in Java contains the algorithms to perform this transformation. Oracle publishes a list of the encodings supported by its JRE. The ICU project is a good place to start if you want to understand the internals of how this works in practice.
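
As a rough illustration of what such an algorithm has to do for UTF-8, the sketch below derives the length of a sequence from its lead byte and then decodes the same bytes with the JDK's CharsetDecoder from java.nio.charset. The sequenceLength function is a simplified rendering of the UTF-8 rule written for this example, not the JDK's actual implementation; the class name Utf8DecodingSketch and the method decodeStrict are likewise hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8DecodingSketch {
    // Length in bytes of the UTF-8 sequence that starts with this lead byte,
    // or -1 if the byte cannot start a sequence (continuation or invalid byte).
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;                  // view the signed Java byte as 0..255
        if (b < 0x80) return 1;               // 0xxxxxxx: ASCII, one byte
        if (b >= 0xC2 && b <= 0xDF) return 2; // 110xxxxx: two bytes
        if (b >= 0xE0 && b <= 0xEF) return 3; // 1110xxxx: three bytes
        if (b >= 0xF0 && b <= 0xF4) return 4; // 11110xxx: four bytes
        return -1;
    }

    // Decodes with the JDK's UTF-8 decoder, reporting malformed input
    // instead of silently substituting a replacement character.
    static String decodeStrict(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("input is not valid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {
            0x61, (byte) 0xE4, (byte) 0xBD, (byte) 0xA0, (byte) 0xE4, (byte) 0xBB, (byte) 0xAC
        };
        // The input is known to be valid here, so sequenceLength never returns -1 in this loop.
        for (int i = 0; i < data.length; i += sequenceLength(data[i])) {
            System.out.printf("lead byte %02X starts a %d-byte sequence%n",
                    data[i] & 0xFF, sequenceLength(data[i]));
        }
        System.out.println(decodeStrict(data).length()); // 3
    }
}
```

Other encodings use different rules (GBK, for instance, uses one or two bytes per character), which is why the reader has to be told which encoding the bytes are in.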

As Alexander Pogrebnyak points out, you should almost always provide the encoding explicitly. Byte-to-char methods that do not specify an encoding rely on the JRE default, which depends on the operating system and user settings.
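
To see why the explicit encoding matters, the following sketch decodes the same UTF-8 bytes twice, once with the right charset and once with ISO-8859-1 standing in for a mismatched platform default. The class name ExplicitCharsetDemo and the helper decode are hypothetical names for this example; the choice of ISO-8859-1 as the "wrong" charset is also just an assumption for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Decodes the bytes through an InputStreamReader configured with the given charset.
    static String decode(byte[] data, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a你们" written with Unicode escapes so the source file's own encoding does not matter.
        byte[] data = "a\u4F60\u4EEC".getBytes(StandardCharsets.UTF_8);

        // Correct charset: three chars come back.
        System.out.println(decode(data, StandardCharsets.UTF_8).length());      // 3

        // Mismatched charset: every byte becomes its own char, so the text is garbled.
        System.out.println(decode(data, StandardCharsets.ISO_8859_1).length()); // 7

        // What new InputStreamReader(in) would use when no charset is given; varies by environment.
        System.out.println(Charset.defaultCharset());
    }
}
```

Passing the charset to the InputStreamReader constructor keeps the result the same on every machine, whatever the default happens to be.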
