Can we switch between ASCII and Unicode?

Question

I came across the statement "a char variable is in Unicode format, but also maps well to ASCII". Why does that need to be mentioned? Of course ASCII is 1 byte and Unicode is 2, and Unicode itself contains the ASCII codes (by default, since that is part of the standard). So are there languages in which a char variable supports Unicode but not ASCII?

Also, the character format (Unicode/ASCII) is decided by the platform we use, right (UNIX, Linux, Windows, etc.)? So suppose my platform uses ASCII: is it not possible to switch to Unicode, or vice versa?

Recommended answer

Java uses Unicode internally. Always. Actually, it uses UTF-16 most of the time, but that's too much detail for now.

It cannot use ASCII internally (for a String, for example). Any String that can be represented in ASCII can also be represented in Unicode, so that should not be a problem.
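To make that concrete, here is a minimal sketch (the class and method names are made up for illustration) showing that a Java `char` is a UTF-16 code unit: an ASCII character keeps its ASCII value, and a character outside the Basic Multilingual Plane takes two `char`s.

```java
class CharDemo {
    // Returns the numeric value of a char, i.e. its UTF-16 code unit.
    static int codeUnit(char c) {
        return c;
    }

    public static void main(String[] args) {
        // 'A' has the same numeric value in ASCII and in Unicode.
        System.out.println(codeUnit('A'));       // prints 65

        // U+00E9 (e with acute accent) is outside ASCII but fits in one char.
        System.out.println(codeUnit('\u00E9'));  // prints 233

        // U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair.
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length());                           // prints 2
        System.out.println(emoji.codePointCount(0, emoji.length()));  // prints 1
    }
}
```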

The only place where the platform comes into play is when Java has to choose an encoding because you didn't specify one. For example, when you create a FileWriter to write String values to a file, Java needs an encoding that defines how each character is mapped to bytes. If you don't specify one, the platform's default encoding is used. That default encoding is almost never ASCII. Most Linux platforms use UTF-8, Windows often uses some ISO-8859-* derivative (or another culture-specific 8-bit encoding), but no current OS uses ASCII (simply because ASCII can't represent a lot of important characters).
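As a rough illustration, the sketch below writes the same String through an `OutputStreamWriter` with an explicitly chosen charset, so the result does not depend on the platform default. It writes into an in-memory buffer rather than a file to stay self-contained; the class and method names are hypothetical. Note that on JDK 18 and later the default charset of the standard APIs is UTF-8 regardless of platform, so the first line printed will vary by Java version and OS.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Encodes text with an explicitly chosen charset instead of the default.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // "cafe" with an accented final letter

        // What Java would pick if no charset were given (varies by setup).
        System.out.println("default: " + Charset.defaultCharset());

        // UTF-8 needs two bytes for U+00E9, ISO-8859-1 needs one.
        System.out.println(write(text, StandardCharsets.UTF_8).length);      // prints 5
        System.out.println(write(text, StandardCharsets.ISO_8859_1).length); // prints 4

        // US-ASCII cannot represent U+00E9; the encoder substitutes '?'.
        byte[] ascii = write(text, StandardCharsets.US_ASCII);
        System.out.println((char) ascii[3]);                                 // prints ?
    }
}
```

Passing the charset explicitly, as `write` does here, is the usual way to keep output identical across machines.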

In fact, pure ASCII is almost irrelevant these days: hardly anyone uses it. ASCII matters mainly as the common subset shared by most 8-bit encodings (and by UTF-8): the lower 128 Unicode code points map 1:1 to the numeric values 0-127 in many, many encodings. But pure ASCII (where the values 128-255 are undefined) is no longer in active use.
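The "common subset" point can be checked directly. This is a small sketch (class and method names are illustrative only) that compares the bytes produced by three charsets: for ASCII-only text they are identical, and they diverge as soon as a character above 127 appears.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiSubsetDemo {
    // True if US-ASCII, ISO-8859-1 and UTF-8 all produce the same bytes.
    static boolean sameBytesEverywhere(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, latin1) && Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        // Only code points 0-127: every encoding agrees.
        System.out.println(sameBytesEverywhere("Hello"));     // prints true

        // U+00E9 is above 127: the encodings disagree.
        System.out.println(sameBytesEverywhere("caf\u00E9")); // prints false
    }
}
```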

As a side note, Java 9 introduced an internal optimization called "compact strings", where Strings that contain only characters representable in Latin-1 use a single byte per character instead of two. This optimization is very useful for all kinds of "computer speak" like XML and similar protocols, where the majority of the text is in the ASCII range. It is also fully transparent to the developer: all of that handling is done inside the String class and is not visible from the outside.
