Special characters with XDocument
Question
I'm trying to read a file (not XML, but with a similar structure), but I'm getting this exception:
'┴', hexadecimal value 0x15, is an invalid character. Line 8, position 7.
The file contains a lot of these symbols, and I can't replace them because, for my purposes, I'm not allowed to modify the file's content.
Here's the code:
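try
{
    // XDocument.Load returns a brand-new document, so an XDocument created
    // beforehand just to hold new XDeclaration("1.0", "utf-16", "yes") is discarded;
    // set doc.Declaration after loading if a declaration is needed.
    XDocument doc = XDocument.Load(arquivo);
}
catch (Exception e)
{
    MessageBox.Show(e.Message); // Message is already a string
}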
And here's part of the file:
<Codepage>UTF16</Codepage>
<Segment>0000016125
<Control>0003┴300000┴English(U.S.)PORTUGUESE┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf
</Control>
<Source>To blablablah the firewall to blablablah local IP address. </Source>
<Target>Para blablablah a uma blablablah local específico. </Target>
</Segment>
Note: The file doesn't have an XML declaration specifying the encoding.
Recommended answer
This XML is pretty bad:
- You have <Segment>0000016125 in there which, while not technically illegal (it is a text node), is just kind of odd.
- Your <Control> element contains invalid characters without an XML CDATA section.
You can manually normalize the XML or do it in C# via string manipulation, or RegEx, or something similar.
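As one possible take on the string-manipulation route, here is a minimal sketch that works on an in-memory copy of the text, so the file itself is never modified. The class and method names (XmlTextCleaner, ReplaceInvalidXmlChars) are hypothetical; the only framework calls are XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair. It replaces every character that XML 1.0 forbids, such as the 0x15 reported by the exception, with a placeholder of your choice.

```csharp
using System.Text;
using System.Xml;

public static class XmlTextCleaner
{
    // Hypothetical helper (not part of .NET): returns a copy of 'text' in which every
    // character XML 1.0 does not allow (e.g. U+0015) is replaced by 'replacement'.
    // Only the in-memory string changes; the file on disk is left untouched.
    // The caller must pass a replacement that is itself valid XML text.
    public static string ReplaceInvalidXmlChars(string text, string replacement = "\uFFFD")
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                // Valid non-surrogate XML character: keep it.
                sb.Append(c);
            }
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // A valid high/low surrogate pair encodes a character above U+FFFF:
                // keep both halves and skip the low surrogate.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            else
            {
                // Control characters like U+0015, lone surrogates, U+FFFE/U+FFFF.
                sb.Append(replacement);
            }
        }
        return sb.ToString();
    }
}
```

For the real file, one could read the whole text into a string with File.ReadAllText(arquivo, Encoding.Unicode) (assuming UTF-16 little-endian, as the <Codepage> element suggests; a byte order mark, if present, is detected automatically), pass it through the helper, and wrap the result in a single root element like the <temproot> in the example further down, since the file has several top-level elements. Because ┴ appears to act as a field separator inside <Control>, a distinct placeholder such as "|" may be more useful than the default U+FFFD, provided it doesn't already occur in the data.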
In your simple example, only the <Control> element has invalid characters, so it is relatively easy to fix by using the string.Replace() method to add a CDATA section, making it look like this:
<Control><![CDATA[0003┴300000┴English(U.S.)PORTUGUESE┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf]]></Control>
Then you can load the well-formed XML into your XDocument using XDocument.Parse(string xml):
string badXml = @"
<temproot>
<Codepage>UTF16</Codepage>
<Segment>0000016125
<Control>0003┴300000┴English(U.S.)PORTUGUESE┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf</Control>
<Source>To blablablah the firewall to blablablah local IP address. </Source>
<Target>Para blablablah a uma blablablah local específico. </Target>
</Segment>
</temproot>";
// assumes only the <Control> element contains the invalid characters
string goodXml = badXml
.Replace("<Control>", "<Control><![CDATA[")
.Replace("</Control>", "]]></Control>");
XDocument xDoc = XDocument.Parse(goodXml);
xDoc.Declaration = new XDeclaration("1.0", "utf-16", "yes");
// do stuff with xDoc
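One caveat worth checking before relying on the CDATA trick: a CDATA section only stops markup characters such as < and & from being interpreted; it does not make characters outside the XML 1.0 Char range legal. The exception reports hexadecimal value 0x15, which means the file contains the control character U+0015, and that character is rejected inside CDATA as well. (As it appears on this page, the ┴ in the sample string is U+2534, an ordinary valid character, which is why the sample parses.) The hypothetical helper below is a small sketch for testing candidate strings against XDocument.Parse.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class CdataCheck
{
    // Hypothetical helper: true if XDocument.Parse accepts the text,
    // false if the underlying XmlReader rejects it with an XmlException.
    public static bool ParsesAsXml(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

With this, ParsesAsXml("<Control><![CDATA[0003\u0015300000]]></Control>") should return false, while the same string after .Replace('\u0015', '|') returns true. So if the real file contains U+0015, those characters still have to be replaced in the in-memory string (the placeholder "|" is an arbitrary choice) before calling XDocument.Parse; the file on disk can stay as it is.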