Parsing a UTF-8 Encoded XML File

Problem description

I have an XML file, retrieved from a URL, that contains some Arabic characters, so I had to encode it in UTF-8 to handle them.

XML file:

<Entry>
    <lstItems>
        <item>
            <id>1</id>
            <title>News Test 1</title>
            <subtitle>16/7/2012</subtitle>
            <img>joelle.mobi-mind.com/imgs/news1.jpg</img>
        </item>
        <item>
            <id>2</id>
            <title>كريم</title>
            <subtitle>16/7/2012</subtitle>
            <img>joelle.mobi-mind.com/imgs/news2.jpg</img>
        </item>
        <item>
            <id>3</id>
            <title>News Test 333</title>
            <subtitle>16/7/2012</subtitle>
            <img>joelle.mobi-mind.com/imgs/news3.jpg</img>
        </item>
        <item>
            <id>4</id>
            <title>ربيع</title>
            <subtitle>16/7/2012</subtitle>
            <img>joelle.mobi-mind.com/imgs/cont20.jpg</img>
        </item>
        <item>
            <id>5</id>
            <title>News Test 55555</title>
            <subtitle>16/7/2012</subtitle>
            <img>joelle.mobi-mind.com/imgs/cont21.jpg</img>
        </item>
        <item>
            <id>6</id>
            <title>News Test 666666</title>
            <subtitle>16/7/2012</subtitle>
            <img>joelle.mobi-mind.com/imgs/cont22.jpg</img>
        </item>
    </lstItems>
</Entry>

I retrieve the XML from the URL as a String, as shown below:

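import android.os.AsyncTask;
import java.util.concurrent.ExecutionException;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

public String getXmlFromUrl(String url) {
    try {
        // The download runs on a background thread, but get() still blocks
        // the calling thread until it finishes.
        return new AsyncTask<String, Void, String>() {
            @Override
            protected String doInBackground(String... params) {
                String xml = null;
                try {
                    DefaultHttpClient httpClient = new DefaultHttpClient();
                    HttpGet httpGet = new HttpGet(params[0]);
                    HttpResponse httpResponse = httpClient.execute(httpGet);
                    HttpEntity httpEntity = httpResponse.getEntity();
                    // Decode the body as UTF-8 unless the Content-Type header
                    // declares another charset.
                    xml = EntityUtils.toString(httpEntity, "UTF-8");
                } catch (Exception e) {
                    e.printStackTrace();
                }
                return xml;
            }
        }.execute(url).get();
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (ExecutionException e) {
        e.printStackTrace();
    }
    return null;
}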
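How the response body is decoded matters for the Arabic titles. When `EntityUtils.toString(entity)` is called without a charset and the server's `Content-Type` doesn't declare one, Apache HttpCore 4.x falls back to ISO-8859-1. The stand-alone sketch below (plain desktop Java; `CharsetDemo` and its methods are illustrative names, not part of any library) decodes the UTF-8 bytes of an Arabic title like item 2's both ways:

```java
import java.nio.charset.StandardCharsets;

// Illustrative demo: the same UTF-8 bytes decoded with two different charsets.
class CharsetDemo {

    // Decodes UTF-8 bytes correctly.
    static String decodeUtf8(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes as ISO-8859-1: every byte becomes one char,
    // so each two-byte Arabic letter turns into two Latin-1 characters.
    static String decodeLatin1(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String title = "\u0643\u0631\u064A\u0645"; // four Arabic letters
        byte[] body = title.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(body).equals(title)); // true
        System.out.println(decodeLatin1(body).length());    // 8, not 4
    }
}
```

With UTF-8 the title round-trips unchanged; with ISO-8859-1 the four-letter title comes back as eight unrelated Latin-1 characters.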

The returned String is then passed to this method to build a Document for later use:

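import android.util.Log;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public Document getDomElement(String xml) {
    Document doc = null;
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        InputSource is = new InputSource();
        StringReader xmlstring = new StringReader(xml);
        is.setCharacterStream(xmlstring);
        // No effect: a character stream is already decoded text.
        is.setEncoding("UTF-8");
        // Code stops here!
        doc = db.parse(is);
    } catch (ParserConfigurationException e) {
        Log.e("Error: ", e.getMessage());
        return null;
    } catch (SAXException e) {
        Log.e("Error: ", e.getMessage());
        return null;
    } catch (IOException e) {
        Log.e("Error: ", e.getMessage());
        return null;
    }
    // return DOM
    return doc;
}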
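For comparison, here is a desktop-Java sketch of the same parsing step (no Android `Log`; `TitleReader` and `readTitles` are hypothetical names) that reads back every `<title>` from a string shaped like the file above. With clean input, the Arabic title comes through intact:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: parse an XML string from a Reader and collect all <title> texts.
class TitleReader {

    static String[] readTitles(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // A Reader supplies already-decoded characters, so no encoding is set here.
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        NodeList titles = doc.getElementsByTagName("title");
        String[] result = new String[titles.getLength()];
        for (int i = 0; i < titles.getLength(); i++) {
            result[i] = titles.item(i).getTextContent();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Entry><lstItems>"
                + "<item><id>1</id><title>News Test 1</title></item>"
                + "<item><id>2</id><title>\u0643\u0631\u064A\u0645</title></item>"
                + "</lstItems></Entry>";
        System.out.println(readTitles(xml).length); // 2
    }
}
```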

An error occurred with this message:

09-18 07:51:40.441: E/Error:(1210): Unexpected token (position:TEXT @1:4 in java.io.StringReader@4144c240) 
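One plausible reading of `@1:4`, assuming the body was decoded as ISO-8859-1: a UTF-8 BOM is the three bytes `EF BB BF`, which Latin-1 turns into three visible characters (`ï»¿`), so the first `<` lands in column 4. Decoded as UTF-8, the same bytes become the single character U+FEFF. A quick check (the class name and sample bytes are illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative check: where does the first '<' land when a UTF-8 BOM
// precedes it and the bytes are decoded with different charsets?
class BomColumnDemo {

    // UTF-8 byte order mark followed by the start of a document.
    static final byte[] BODY = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'E'};

    // 1-based column of the first '<' after decoding with the given charset.
    static int columnOfFirstTag(byte[] body, Charset cs) {
        return new String(body, cs).indexOf('<') + 1;
    }

    public static void main(String[] args) {
        System.out.println(columnOfFirstTag(BODY, StandardCharsets.ISO_8859_1)); // 4
        System.out.println(columnOfFirstTag(BODY, StandardCharsets.UTF_8));      // 2
    }
}
```

Either way, characters precede the root element's `<`, which is consistent with the parser reporting a TEXT event at the very start of the document.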

So the code fails at the line marked above, and the app crashes with the following error:

09-18 07:51:40.451: E/AndroidRuntime(1210): java.lang.RuntimeException: Unable to start activity ComponentInfo{com.example.university1/com.example.university1.MainActivity}: java.lang.NullPointerException

Kindly note that the code works fine with ISO encoding.

Recommended answer

You've added a BOM (byte order mark) to your UTF-8 file, which is bad.
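To confirm the diagnosis without relying on an editor, you could scan the raw bytes for the UTF-8 BOM sequence `EF BB BF`. A minimal sketch (the `BomFinder` class and the file path passed to `main` are hypothetical); it reports every offset, not just offset 0:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: list every offset where the UTF-8 BOM bytes EF BB BF occur.
class BomFinder {

    static List<Integer> findBomOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i + 2 < data.length; i++) {
            // Mask with 0xFF because Java bytes are signed.
            if ((data[i] & 0xFF) == 0xEF
                    && (data[i + 1] & 0xFF) == 0xBB
                    && (data[i + 2] & 0xFF) == 0xBF) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a local copy of the XML file to inspect.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("BOM offsets: " + findBomOffsets(data));
    }
}
```

An offset other than 0 points at the spot to clean up by hand.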

Maybe you edited the file with Notepad; in any case, check that your editor doesn't add a BOM.

Since the BOM seems to be inside the text rather than at the start, you also need to remove it by pressing the Delete key around its position (it's invisible in most editors). This may have happened when files were concatenated.
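If you can't fix the file at its source, a programmatic alternative to deleting the BOM by hand is to remove every U+FEFF character from the decoded string before parsing. This sketch assumes the text was decoded as UTF-8 (under ISO-8859-1 the BOM shows up as `ï»¿` instead), and `BomStripper` and its methods are illustrative names. Note that it also drops any U+FEFF that was intentionally part of element text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative workaround: strip stray BOM characters, then parse.
class BomStripper {

    // U+FEFF: what a UTF-8 BOM becomes once the bytes are decoded as UTF-8.
    static final String BOM = "\uFEFF";

    // Removes every BOM character, wherever concatenation left it.
    static String stripBoms(String xml) {
        return xml.replace(BOM, "");
    }

    // Strips BOMs, then parses the remaining text from a Reader.
    static Document parseWithoutBoms(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(stripBoms(xml))));
    }
}
```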
