Special characters with XDocument


Question

I'm trying to read a file (not XML, but the structure is similar), but I'm getting this exception:

'┴', hexadecimal value 0x15, is an invalid character. Line 8, position 7.

The file has a lot of these symbols, and I can't replace them because, for my purposes, I can't modify the content of the file.

Here is the code:

try
{
    XDocument doc = new XDocument(new XDeclaration("1.0", "utf-16", "yes"));
    doc = XDocument.Load(arquivo);
}
catch (Exception e)
{
    MessageBox.Show(e.Message.ToString());
}



And here is part of the file:

<Codepage>UTF16</Codepage>
<Segment>0000016125
    <Control>0003┴300000┴English(U.S.)PORTUGUESE┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf
    </Control>
    <Source>To blablablah the   firewall to blablablah local IP address.    </Source>
    <Target>Para blablablah a uma blablablah local específico.  </Target>
</Segment>

Note: the file does not have an XML declaration specifying the encoding.

Recommended answer

This XML is pretty bad:


  1. You have <Segment>0000016125 in there, which, while not technically illegal (it is a text node), is rather odd.
  2. Your <Control> element contains invalid characters that are not wrapped in an XML CDATA section.

You can normalize the XML manually, or do it in C# via string manipulation, a regular expression, or something similar.
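A minimal sketch of the regular-expression route, assuming the offending character really is the control character with hexadecimal value 0x15 that the exception reports. XML 1.0 does not allow control characters below U+0020 in text, apart from tab, line feed, and carriage return, so one option is to substitute a visible placeholder before parsing. The sample string, the `|` placeholder, and the variable names are illustrative assumptions, not part of the original file or code.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical sample containing U+0015, standing in for the file's contents.
string raw = "<Control>0003\u0015300000\u0015English(U.S.)PORTUGUESE</Control>";

// Control characters that XML 1.0 does not allow: everything below U+0020
// except tab (U+0009), line feed (U+000A) and carriage return (U+000D).
var invalidXmlChars = new Regex(@"[\u0000-\u0008\u000B\u000C\u000E-\u001F]");

// Replace each one with a placeholder (an arbitrary choice for this sketch).
// Only the in-memory string changes; the file on disk is untouched.
string cleaned = invalidXmlChars.Replace(raw, "|");

XElement control = XElement.Parse(cleaned);
Console.WriteLine(control.Value);
```

Because the substitution happens on a copy of the text in memory, this stays compatible with the constraint that the file itself must not be modified. If the separator has to round-trip, the placeholder can be mapped back to U+0015 when the values are written out.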

In your simple example, only the <Control> element has invalid characters, so it is relatively simple to fix: use the string.Replace() method to add a CDATA section, so that it looks like this:

<Control><![CDATA[0003┴300000┴English(U.S.)PORTUGUESE┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf]]></Control>



You can then load the good XML into your XDocument using XDocument.Parse(string xml):

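// <temproot> wraps the fragment so that it has a single root element, which XDocument.Parse requires
string badXml = @"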
    <temproot>
        <Codepage>UTF16</Codepage>
        <Segment>0000016125
            <Control>0003┴300000┴English(U.S.)PORTUGUESE┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf</Control>
            <Source>To blablablah the   firewall to blablablah local IP address.    </Source>
            <Target>Para blablablah a uma blablablah local específico.  </Target>
        </Segment>
    </temproot>";

// assuming only the <Control> element has the invalid characters
string goodXml = badXml
    .Replace("<Control>", "<Control><![CDATA[")
    .Replace("</Control>", "]]></Control>");

XDocument xDoc = XDocument.Parse(goodXml);
xDoc.Declaration = new XDeclaration("1.0", "utf-16", "yes");

// do stuff with xDoc
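
One caveat worth checking, assuming the file really contains the control character U+0015 named in the exception rather than a printable glyph like the one in the string literal above: XML 1.0 excludes that character inside CDATA sections as well, so the CDATA wrapper on its own would not be expected to satisfy the parser. The sketch below illustrates this with a hypothetical one-line sample and then combines the CDATA wrapper with a character substitution. The sample string and the `|` placeholder are assumptions made for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical sample with a real U+0015 between the fields.
string sample = "<Control>0003\u0015300000</Control>";

// Same CDATA wrapping as in the answer's snippet.
string withCData = sample
    .Replace("<Control>", "<Control><![CDATA[")
    .Replace("</Control>", "]]></Control>");

bool cdataAloneParses;
try
{
    XElement.Parse(withCData);
    cdataAloneParses = true;
}
catch (XmlException)
{
    // Reached when the reader rejects U+0015 even inside the CDATA section.
    cdataAloneParses = false;
}

// Substituting the control character (in memory only) before parsing.
XElement control = XElement.Parse(withCData.Replace('\u0015', '|'));

Console.WriteLine(cdataAloneParses);
Console.WriteLine(control.Value);
```

If the characters in the real file turn out to be printable ones, the substitution step is harmless and the CDATA wrapper then only protects against markup characters such as `<` and `&` in the <Control> text.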
