In Java: why do some stream methods take an int instead of a byte or even a char?


Question


Why do some methods that write bytes/chars to streams take an int instead of a byte/char?

Someone told me, regarding int instead of char: a char in Java is only 2 bytes long, which is enough for most characters in use, but certain characters (Chinese or others) are represented in more than 2 bytes, and hence an int is used instead.

How close is this explanation to the truth?

EDIT: I use the word "stream" to mean both binary and character streams (not just binary streams).

Thanks.

Solution

"Someone told me, regarding int instead of char: a char in Java is only 2 bytes long, which is enough for most characters in use, but certain characters (Chinese or others) are represented in more than 2 bytes, and hence an int is used instead."

Assuming that at this point you are talking specifically about the Reader.read() method, the statement from "someone" that you have recounted is in fact incorrect.

It is true that some Unicode codepoints have values greater than 65535 and therefore cannot be represented as a single Java char. However, the Reader API actually produces a sequence of Java char values (or -1), not a sequence of Unicode codepoints. This is clearly stated in the javadoc.

If your input includes a (suitably encoded) Unicode code point that is greater than 65535, then you will actually need to call the read() method twice to see it. What you will get will be a UTF-16 surrogate pair; i.e. two Java char values that together represent the codepoint. In fact, this fits in with the way that the Java String, StringBuilder and StringBuffer classes all work; they all use a UTF-16 based representation ... with embedded surrogate pairs.
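
As a rough illustration of the point above, the sketch below reads a single code point above 65535 through a StringReader and collects what read() returns. The class name SurrogateReadDemo and the helper readAll are made up for this example, and U+1F600 is just an arbitrary supplementary code point.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of any library.
final class SurrogateReadDemo {

    // Collects every value Reader.read() returns before the -1 end marker.
    static List<Integer> readAll(Reader reader) {
        List<Integer> units = new ArrayList<>();
        try {
            int value;
            while ((value = reader.read()) != -1) {
                units.add(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return units;
    }

    public static void main(String[] args) {
        // U+1F600 is greater than 65535, so it does not fit in one char.
        String text = new String(Character.toChars(0x1F600));
        List<Integer> units = readAll(new StringReader(text));

        // One code point arrives as two char values (a surrogate pair).
        char high = (char) units.get(0).intValue();
        char low = (char) units.get(1).intValue();
        System.out.println(units.size());                         // 2
        System.out.println(Character.isSurrogatePair(high, low)); // true
        System.out.printf("U+%X%n", Character.toCodePoint(high, low)); // U+1F600
    }
}
```

Each value returned by read() here fits in 16 bits; the int return type is not used to carry the full code point.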

The real reason that Reader.read() returns an int not a char is to allow it to return -1 to signal that there are no more characters to be read. The same logic explains why InputStream.read() returns an int not a byte.
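
To make the -1 argument concrete, here is a minimal sketch of why the wider return type matters for InputStream.read(): the method returns 0 to 255 for data and -1 for end of stream, so a data byte of 0xFF stays distinguishable from the end marker only while the value is kept as an int. The class EndOfStreamDemo and both method names are invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of any library.
final class EndOfStreamDemo {

    // Conventional loop: keep the result as an int and compare it with -1.
    static int countBytes(InputStream in) {
        try {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deliberately buggy: narrowing to byte first turns the data value
    // 0xFF (255) into -1, which is then mistaken for end of stream.
    static int countBytesNarrowed(InputStream in) {
        try {
            int count = 0;
            while ((byte) in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0x01, (byte) 0xFF, 0x02};
        System.out.println(countBytes(new ByteArrayInputStream(data)));         // 3
        System.out.println(countBytesNarrowed(new ByteArrayInputStream(data))); // 1
    }
}
```

If read() returned a byte, every caller would be in the position of the second method, with no spare value left to signal the end of the stream.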

Hypothetically, I suppose that the Java designers could have specified that the read() methods throw an exception to signal the "end of stream" condition. However, that would have just replaced one potential source of bugs (failure to test the result) with another (failure to deal with the exception). Besides, exceptions are relatively expensive, and an end of stream is not really an unexpected / exceptional event. In short, the current approach is better, IMO.

(Another clue to the 16 bit nature of the Reader API is the signature of the read(char[], ...) method. How would that deal with codepoints greater than 65535 if surrogate pairs weren't used?)

EDIT

The case of DataOutputStream.writeChar(int) does seem a bit strange. However, the javadoc states that the argument is written as a 2-byte value, and the implementation does in fact write only the bottom two bytes to the underlying stream.

I don't think that there is a good reason for this. Anyway, there is a bug database entry for it (4957024), which is marked as "11-Closed, Not a Defect" with the following comment:
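
The truncation can be observed with a small sketch like the following, which writes an int wider than 16 bits through writeChar and inspects the resulting bytes. WriteCharDemo and writeCharBytes are names chosen for this example only, and 0x12345 is an arbitrary test value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of any library.
final class WriteCharDemo {

    // Returns the bytes that DataOutputStream.writeChar(int) emits for a value.
    static byte[] writeCharBytes(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeChar(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // 0x12345 needs more than 16 bits; only the low 16 bits are written.
        byte[] bytes = writeCharBytes(0x12345);
        System.out.println(bytes.length);                      // 2
        System.out.printf("%02X %02X%n", bytes[0], bytes[1]);  // 23 45
    }
}
```

So the int parameter does not let a caller write a supplementary code point in one call; the high-order bits are silently dropped.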

"This isn't a great design or excuse, but it's too baked in for us to change."

... which is kind of an acknowledgement that it is a defect, at least from the design perspective.

But this is not something worth making a fuss about, IMO.
