Making XmlReaderSettings CheckCharacters work for an XML string

Question

I have an XML string coming from Adobe PDF AcroForms, which apparently allows form field names that start with numeric characters. I'm trying to parse this string into an XDocument:

XDocument xDocument = XDocument.Parse(xmlString);

But whenever I encounter a form field whose name starts with a numeric character, the XML parsing throws an XmlException:

Name cannot begin with the 'number' character

Other solutions I found suggested using XmlReaderSettings.CheckCharacters:

using (XmlReader xmlReader = XmlReader.Create(new StringReader(xmlString), new XmlReaderSettings() { CheckCharacters = false }))
{
    XDocument xDocument = XDocument.Load(xmlReader);
}

But this didn't work either. Some articles pointed to the reason, which is one of the points mentioned in the MSDN article:

If the XmlReader is processing text data, it always checks that the XML names and text content are valid, regardless of the property setting. Setting CheckCharacters to false turns off character checking for character entity references.

So I tried using:

using(MemoryStream memoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(xmlString)))
using (XmlReader xmlReader = XmlReader.Create(memoryStream, new XmlReaderSettings() { CheckCharacters = false }))
{
    XDocument xDocument = XDocument.Load(xmlReader);
}

This didn't work either. Can anyone please help me figure out how to parse an XML string that contains elements whose names start with numeric characters? How is the XmlReaderSettings.CheckCharacters flag supposed to be used?

Recommended answer

You can't make a standard XML parser parse your format, even if it "looks like" XML, so stop trying. Standard-compliant XML parsers are not allowed to accept XML that is not well-formed. This was a design decision, based on all the problems quirks mode caused with HTML parsing.
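To see what the flag does and does not relax, here is a minimal sketch. `CheckCharactersDemo` and `CanRead` are names made up for this illustration; only `XmlReader`, `XmlReaderSettings` and `XmlException` come from System.Xml.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper for this illustration; not part of System.Xml.
public static class CheckCharactersDemo
{
    // Returns true if the reader consumes the whole input without an XmlException.
    public static bool CanRead(string xml, bool checkCharacters)
    {
        var settings = new XmlReaderSettings { CheckCharacters = checkCharacters };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                // Read every node so that any well-formedness error surfaces.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

As far as I know, `CanRead("<a>&#x1;</a>", true)` returns false and `CanRead("<a>&#x1;</a>", false)` returns true, because the flag only governs character entity references. `CanRead("<1a/>", false)` still returns false: the element name is rejected regardless of the setting, which matches the MSDN note quoted in the question.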

Writing your own parser isn't that hard. XML is very strict and, unless you need advanced features, the syntax is simple.

  1. An LL parser can be written by hand. Both the lexer and the parser are simple.

  2. A parser can also be generated from a simple grammar using a parser generator such as ANTLR. Most likely, you'll even find example XML grammars.
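As a rough idea of how small a hand-written parser can be, here is a sketch of a recursive-descent (LL) parser that accepts names starting with a digit. `LenientNode` and `LenientXmlParser` are hypothetical names invented for this example. It assumes a very small subset of XML: elements, quoted attributes, text, and an optional XML declaration. It does not handle comments, CDATA, DOCTYPE, namespaces or entity decoding.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical node type for this sketch; not part of System.Xml.
public sealed class LenientNode
{
    public string Name = "";
    public Dictionary<string, string> Attributes = new Dictionary<string, string>();
    public List<LenientNode> Children = new List<LenientNode>();
    public StringBuilder Text = new StringBuilder();
}

// Hypothetical recursive-descent parser for a small XML-like subset.
public sealed class LenientXmlParser
{
    private readonly string _s;
    private int _i;

    private LenientXmlParser(string s) { _s = s; }

    public static LenientNode Parse(string xml)
    {
        var p = new LenientXmlParser(xml);
        p.SkipWhitespace();
        // Skip an optional declaration such as <?xml version="1.0"?>.
        if (p.StartsWith("<?")) p._i = p.IndexOfOrFail("?>") + 2;
        p.SkipWhitespace();
        return p.ParseElement();
    }

    private LenientNode ParseElement()
    {
        Expect('<');
        var node = new LenientNode { Name = ReadName() };
        SkipWhitespace();
        // Attributes: name = "value" or name = 'value'.
        while (Peek() != '>' && Peek() != '/')
        {
            string key = ReadName();
            SkipWhitespace(); Expect('='); SkipWhitespace();
            char quote = Next();
            if (quote != '"' && quote != '\'') throw Error("quote expected");
            int end = _s.IndexOf(quote, _i);
            if (end < 0) throw Error("unterminated attribute value");
            node.Attributes[key] = _s.Substring(_i, end - _i);
            _i = end + 1;
            SkipWhitespace();
        }
        // Self-closing element: <name/>.
        if (Peek() == '/') { _i++; Expect('>'); return node; }
        Expect('>');
        // Content: child elements and raw text until the closing tag.
        while (!StartsWith("</"))
        {
            if (Peek() == '<') node.Children.Add(ParseElement());
            else node.Text.Append(Next());
        }
        _i += 2;
        if (ReadName() != node.Name) throw Error("mismatched closing tag");
        SkipWhitespace(); Expect('>');
        return node;
    }

    // Unlike real XML, a name may start with a digit here.
    private string ReadName()
    {
        int start = _i;
        while (_i < _s.Length && (char.IsLetterOrDigit(_s[_i]) || "_-.:".IndexOf(_s[_i]) >= 0)) _i++;
        if (_i == start) throw Error("name expected");
        return _s.Substring(start, _i - start);
    }

    private char Peek() => _i < _s.Length ? _s[_i] : '\0';
    private char Next() => _i < _s.Length ? _s[_i++] : throw Error("unexpected end of input");
    private void Expect(char c) { if (Next() != c) throw Error("'" + c + "' expected"); }
    private bool StartsWith(string t) => string.CompareOrdinal(_s, _i, t, 0, t.Length) == 0;
    private void SkipWhitespace() { while (_i < _s.Length && char.IsWhiteSpace(_s[_i])) _i++; }
    private FormatException Error(string m) => new FormatException(m + " at offset " + _i);

    private int IndexOfOrFail(string t)
    {
        int k = _s.IndexOf(t, _i, StringComparison.Ordinal);
        if (k < 0) throw Error("'" + t + "' expected");
        return k;
    }
}
```

For example, `LenientXmlParser.Parse("<form><1stField id='a'>42</1stField></form>")` should return a `form` node with one child named `1stField`. The resulting tree is not an XDocument, so any code downstream would have to work with this custom node type or map the names to valid XML names first.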

You can also take the source code of either of .NET's XML parsers and remove the validation you don't need. You can find both XmlDocument and XDocument in .NET Core's repository on GitHub.
