Umlaut characters in Unicode


Question


Hello,

Do you think that this file is a proper Unicode file?

http://belnet.dl.sourceforge.net/sou...t-example3.xml

<?xml version="1.0" encoding="UTF-8"?>
...
<resource id="1" name="Andreas Plüschke" function="10" contacts=""/>

I am asking because of the ü Umlaut character.
I am guessing that the author used an ISO-8859-1
environment but forgot to change the encoding
declaration from UTF-8 to ISO-8859-1.
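One way to test this guess is to check whether the file's bytes decode as strict UTF-8. The following is a minimal Java sketch under that assumption; `Utf8Check` and `isValidUtf8` are names invented for illustration and do not come from the thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class Utf8Check {
    /** Returns true if the given bytes are well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            // A decoder configured with REPORT throws on bad input
            // instead of silently substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

To apply it to a file, the bytes could be read with `Files.readAllBytes` and passed to `isValidUtf8`. A `false` result means the UTF-8 declaration is wrong for that file. A `true` result only shows that the bytes are consistent with UTF-8; a pure ASCII file, for instance, passes under either encoding.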

Answers



Jürgen Kahrs wrote:

> do you think that this file is a proper Unicode file?
>
> http://belnet.dl.sourceforge.net/sou...t-example3.xml
> <?xml version="1.0" encoding="UTF-8"?>
> ...
> <resource id="1" name="Andreas Plüschke" function="10" contacts=""/>
>
> I am asking because of the ü Umlaut character.



Why is an umlaut a problem? Unicode certainly contains/allows umlaut
characters.
--

Martin Honnen
http://JavaScript.FAQTs.com/


Martin Honnen wrote:

> Why is an umlaut a problem? Unicode certainly contains/allows umlaut
> characters.



An umlaut is not a problem for Unicode.
An umlaut is a problem if you write a text
with an editor in ISO-8859-1 mode and
view the text with an editor in UTF-8
mode.

For example, while writing this posting,
I am using ISO-8859-1 mode and this is a u-umlaut: ü
Now, switch your news reader to UTF-8 and you
will find that the character does not look like
a u-umlaut anymore.
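The mismatch described above can be reproduced in a few lines of Java. This is only an illustrative sketch; the class and method names (`UmlautDemo`, `latin1ReadAsUtf8`, `utf8ReadAsLatin1`) are made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
final class UmlautDemo {
    /** Encodes text as ISO-8859-1, then wrongly decodes the bytes as UTF-8. */
    static String latin1ReadAsUtf8(String s) {
        // In ISO-8859-1, ü is the single byte 0xFC.
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        // 0xFC is not valid UTF-8, so the decoder substitutes U+FFFD.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Encodes text as UTF-8, then wrongly decodes the bytes as ISO-8859-1. */
    static String utf8ReadAsLatin1(String s) {
        // In UTF-8, ü is the two bytes 0xC3 0xBC.
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        // Each byte becomes its own ISO-8859-1 character: Ã followed by ¼.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

Both directions lose the ü, but differently: ISO-8859-1 bytes read as UTF-8 are malformed and show up as a replacement character, while UTF-8 bytes read as ISO-8859-1 always decode, just to the wrong two characters.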


In article <2v*************@uni-berlin.de>,
Jürgen Kahrs <Ju*********************@vr-web.de> wrote:

:Martin Honnen wrote:
:
:> Why is an umlaut a problem? Unicode certainly contains/allows umlaut
:> characters.
:
:An umlaut is not a problem for Unicode.
:An umlaut is a problem if you write a text
:with an editor in ISO-8859-1 mode and
:view the text with an editor in UTF-8
:mode.
:
:For example, while writing this posting,
:I am using ISO-8859-1 mode and this is a u-umlaut: ü
:Now, switch your news reader to UTF-8 and you
:will find that the character does not look like
:a u-umlaut anymore.



That's precisely the problem we've encountered with our application,
which stores its data in UTF-8 encoded XML documents.

We maintain everything internally in our Java application as part of a
DOM, and it's saved to an external file on request. But we failed to
force the byte stream written to the file to be encoded as UTF-8, so it
used the default ISO-8859-1 on our American systems. When the next
attempt was made to read the file (only if such characters appeared),
errors occurred because the file contained bytes that are not valid UTF-8.

The solution we found was to serialize the DOM with UTF-8 encoding
specified (which we were already doing) and then also specify UTF-8
encoding on the output file stream when writing. When this was done,
opening such an XML file in an editor clearly showed something that did
not resemble a letter with an umlaut, accent, or other special feature.

= Steve =
--
Steve W. Jackson
Montgomery, Alabama
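The fix described in the last post might look roughly like the following in Java, using the standard JAXP `Transformer` API. The thread does not say which serializer the application used, so this is a sketch under that assumption; `DomWriter` and `writeUtf8` are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper for illustration only.
final class DomWriter {
    /** Serializes the DOM so that declaration and bytes both use UTF-8. */
    static void writeUtf8(Document doc, OutputStream out)
            throws TransformerException, IOException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Step 1: ask the serializer to declare encoding="UTF-8".
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        // Step 2: make the stream encode characters as UTF-8 explicitly.
        // A Writer created without a charset uses the platform default,
        // which is how declaration and bytes can end up disagreeing.
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        writer.flush();
    }
}
```

The output stream would typically come from something like `Files.newOutputStream` on the target path. An alternative design is to pass the `OutputStream` directly to `StreamResult`, in which case the transformer performs the byte encoding itself according to the `ENCODING` output property.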

