Comparing UTF-8 strings in Java


Question

In my Java program, I retrieve some data from XML. The XML contains a few international characters and is encoded in UTF-8. I read it with an XML parser. Once I retrieve a particular international string from the parser, I need to compare it with a set of predefined strings. The problem is that when I use String.equals on an international string, the comparison fails.

How do I compare strings containing international characters in Java? I am using SAXParser and XMLReader to read strings from the XML.

Here is the line that compares the strings:

 String country;
 country = getXMLNodeString();

 if(country.equals("Côte d'Ivoire"))
 {    

 } 

  String getXMLNodeString() throws Exception
  {

  /* Get a SAXParser from the SAXParserFactory. */  
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();

        /* Get the XMLReader of the SAXParser we created. */
        XMLReader xr = sp.getXMLReader();
        /* Create a new ContentHandler and apply it to the XML-Reader*/
        XmlParser xmlParser = new XmlParser();  //my class to parse xml
        xr.setContentHandler(xmlParser);  

        /* Parse the xml-data from our URL. */
        xr.parse(new InputSource(url.openStream()));
        /* Parsing has finished. */


       //return string here
  }

Recommended answer

Java represents a String as a sequence of chars, which are 16-bit unsigned values (UTF-16 code units). This design was based on an earlier version of the Unicode standard that supported only 64K characters.

Your String constant "Côte d'Ivoire" is in this format. If the character encoding of your XML document is correct, then the String read from it will also be in the correct format. So the possible errors are:

  1. The XML document doesn't declare a character encoding;

  2. The declared character encoding does not match the actual character encoding used.
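The second error can be reproduced with a small sketch. It is not the asker's parser: the class name EncodingCheck, the helper readText, and the sample documents are assumptions made up for illustration. Both documents contain the same UTF-8 bytes, but one declares ISO-8859-1, so the parser is expected to decode the two bytes of "ô" as two separate characters. The constant is written with a \u escape so the example does not depend on the encoding of the Java source file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EncodingCheck {

    /** Parses the XML bytes with SAX and returns all character data found. */
    static String readText(byte[] xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text node in several chunks, so append.
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new ByteArrayInputStream(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return text.toString();
    }

    /** Builds a sample document whose bytes are always UTF-8 (hypothetical data). */
    static byte[] sample(String declaredEncoding) {
        String doc = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                + "<country>C\u00f4te d'Ivoire</country>";
        return doc.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String expected = "C\u00f4te d'Ivoire";
        // Declaration matches the real encoding of the bytes.
        System.out.println(readText(sample("UTF-8")).equals(expected));
        // Declaration claims ISO-8859-1 although the bytes are UTF-8.
        System.out.println(readText(sample("ISO-8859-1")).equals(expected));
    }
}
```

If the second comparison fails while the first succeeds, the declaration rather than String.equals is the likely culprit.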

Perhaps the XML string is being treated as US-ASCII instead of UTF-8. I would output both strings and eyeball them. If they look the same, compare them character by character to see where the comparison fails. You may also want to compare the UTF-8 encoding of your constant String to what's in the XML document:

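// Requires: import java.nio.charset.StandardCharsets; (avoids the checked UnsupportedEncodingException)
byte[] bytes = "Côte d'Ivoire".getBytes(StandardCharsets.UTF_8);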
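The character-by-character comparison and the byte dump suggested above could look roughly like the following sketch. The class StringDiff and its helpers are hypothetical names, and the fromXml value is an invented example of a mis-decoded string, not output from the asker's program.

```java
import java.nio.charset.StandardCharsets;

class StringDiff {

    /** Returns the index of the first differing char, or -1 if the strings are equal. */
    static int firstDifference(String a, String b) {
        int common = Math.min(a.length(), b.length());
        for (int i = 0; i < common; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // Equal up to the shorter length: they differ only if one is longer.
        return a.length() == b.length() ? -1 : common;
    }

    /** Formats each char as a four-digit hex code unit. */
    static String codeUnits(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("%04x", (int) s.charAt(i)));
        }
        return out.toString();
    }

    /** Formats the UTF-8 encoding of s as two-digit hex bytes. */
    static String utf8Hex(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%02x", b & 0xff));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String expected = "C\u00f4te d'Ivoire";       // the predefined constant
        String fromXml = "C\u00c3\u00b4te d'Ivoire";  // assumed mis-decoded parser output
        System.out.println("first difference at index " + firstDifference(expected, fromXml));
        System.out.println("expected: " + codeUnits(expected));
        System.out.println("from XML: " + codeUnits(fromXml));
        System.out.println("expected as UTF-8: " + utf8Hex(expected));
    }
}
```

Printing code units instead of the characters themselves sidesteps any console encoding problem, which can otherwise make two different strings look identical or two identical strings look different.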

It gets more complicated when you start getting into "supplementary characters". These are characters whose code points (the Unicode term for a character's numeric value) lie beyond the originally intended 64K range, so Java represents each of them as a pair of chars (a surrogate pair). See "Supplementary Characters in the Java Platform". This isn't an issue with any of the characters you're using, but it's worth noting for completeness.
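As a brief illustration of that point, the sketch below contrasts the number of chars with the number of code points. The class name SupplementaryDemo and the choice of U+1D11E (MUSICAL SYMBOL G CLEF) as the sample supplementary character are assumptions for the example only.

```java
class SupplementaryDemo {

    /** Number of 16-bit chars (UTF-16 code units) in the string. */
    static int charCount(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String country = "C\u00f4te d'Ivoire";
        // U+1D11E lies outside the 64K range, so it needs a surrogate pair.
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(charCount(country) + " chars, "
                + codePointCount(country) + " code points");
        System.out.println(charCount(clef) + " chars, "
                + codePointCount(clef) + " code points");
    }
}
```

For the country name both counts agree, which is why supplementary characters are not the cause here; for the clef the char count is twice the code point count.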
