Why is the parameter to the String.indexOf method an int in Java?

Problem description

I am wondering why the parameter to the indexOf method is an int when the description says it is a char.

public int indexOf(int ch)

Returns the index within this string of the first occurrence of the specified **character**

http://download.oracle.com/javase/1,5.0/docs/api/java/lang/String.html#indexOf%28int%29

Also, both of these calls compile fine:
char c = 'p';
str.indexOf(2147483647);
str.indexOf(c);

a] Basically, what confuses me is that an int in Java is 32 bits, while Unicode characters are 16 bits.

b] Why not use the characters themselves rather than an int? Is this a performance optimization? Are chars harder to represent than ints? How?

I assume there is some simple reasoning behind this, which makes me want to understand it even more!

Thanks!

Solution

The real reason is that indexOf(int) expects a Unicode code point, not a 16-bit UTF-16 "character". Unicode code points are actually up to 21 bits in length.

(The UTF-16 representation of a longer code point is actually two 16-bit "character" values. These values are known as leading and trailing surrogates; 0xD800 to 0xDBFF, and 0xDC00 to 0xDFFF respectively; see Unicode FAQ - UTF-8, UTF-16, UTF-32 & BOM for the gory details.)
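As a rough illustration of the surrogate encoding described above, the sketch below encodes one code point above 0xFFFF with the standard Character.toChars method. The class name SurrogatePairDemo and the choice of U+1F600 are assumptions made for this example only; they do not appear in the original answer.

```java
public class SurrogatePairDemo {
    // U+1F600 lies above U+FFFF, so UTF-16 needs two chars to represent it.
    // (Example code point chosen for illustration.)
    static final int CODE_POINT = 0x1F600;

    /** Returns the UTF-16 code units that encode the given code point. */
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] units = encode(CODE_POINT);
        // Number of 16-bit code units used for this code point.
        System.out.println(units.length);
        // The leading and trailing surrogate values in hex.
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]);
        // Each half falls in the surrogate range named in the text.
        System.out.println(Character.isHighSurrogate(units[0]));
        System.out.println(Character.isLowSurrogate(units[1]));
    }
}
```

For U+1F600 the two code units are D83D and DE00, which fall inside the leading and trailing surrogate ranges respectively.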

If you give indexOf(int) a code point greater than 65535 (0xFFFF), it will search for the pair of UTF-16 characters that encode the code point.

This is stated by the javadoc (albeit not very clearly), and an examination of the code indicates that this is indeed how the method is implemented.
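The behaviour described above can be checked with a small sketch. The class name IndexOfCodePointDemo, the helper sample(), and the sample string are hypothetical and exist only for this illustration.

```java
public class IndexOfCodePointDemo {
    // A code point above 0xFFFF, chosen for illustration.
    static final int SUPPLEMENTARY = 0x1F600;

    /** Builds "ab", then the two-char encoding of U+1F600, then "cd". */
    static String sample() {
        return "ab" + new String(Character.toChars(SUPPLEMENTARY)) + "cd";
    }

    public static void main(String[] args) {
        String s = sample();
        // The string holds 5 code points but 6 chars.
        System.out.println(s.length());
        // indexOf(int) finds the surrogate pair and reports the index
        // of its first char.
        System.out.println(s.indexOf(SUPPLEMENTARY));
        // 2147483647 is not a valid code point, so nothing can match.
        System.out.println(s.indexOf(2147483647));
    }
}
```

The index returned for the supplementary code point is a char index (2 here), pointing at the leading surrogate. The call with 2147483647 from the question compiles because the parameter is an int, but it returns -1.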


Why not just use 16-bit characters?

That's pretty obvious. If they did that, there wouldn't be an easy way to locate code points greater than 65535 in Strings. That would be a major inconvenience for people who develop internationalized applications where text may contain such code points. (A lot of supposedly internationalized applications make the incorrect assumption that a char represents a code point. Often it doesn't matter, but sometimes it does.)
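To make the "a char is a code point" assumption concrete, here is a minimal sketch of where it breaks. The class name CharVsCodePointDemo and the sample text are assumptions for illustration; the String and Character methods used are standard.

```java
public class CharVsCodePointDemo {
    /** One supplementary code point (U+1F600) followed by '!'. */
    static String sample() {
        return new String(Character.toChars(0x1F600)) + "!";
    }

    public static void main(String[] args) {
        String s = sample();
        // length() counts 16-bit chars, not code points.
        System.out.println(s.length());
        // codePointCount counts actual code points.
        System.out.println(s.codePointCount(0, s.length()));
        // charAt(0) yields only half of the character: a leading surrogate.
        System.out.println(Character.isHighSurrogate(s.charAt(0)));
        // codePointAt(0) reassembles the full code point from the pair.
        System.out.println(Integer.toHexString(s.codePointAt(0)));
    }
}
```

Code that iterates with charAt and treats each char as a whole character would split the first character of this string in two, which is the kind of mistake the answer refers to.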

But it shouldn't make any difference to you. The method will still work if your Strings consist of only 16 bit codes ... or, for that matter, of only ASCII codes.
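As for why passing a char compiles at all: Java widens a char argument to int implicitly, so the char's numeric value becomes the code point being searched for. A small sketch, assuming a hypothetical class name WideningDemo and the sample string "apple":

```java
public class WideningDemo {
    /** Searches using a char argument; Java widens it to int automatically. */
    static int findByChar(String s, char c) {
        return s.indexOf(c);
    }

    /** Searches using the numeric code point directly. */
    static int findByCodePoint(String s, int codePoint) {
        return s.indexOf(codePoint);
    }

    public static void main(String[] args) {
        char c = 'p';
        // 'p' has the numeric value 112, so both calls search for the same thing.
        System.out.println(findByChar("apple", c));
        System.out.println(findByCodePoint("apple", 112));
    }
}
```

Both calls return the same index, so for text that stays within 16-bit values the int parameter behaves exactly as a char parameter would.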
