Fixing a bad XML file (e.g. unescaped & etc.)

Problem description

I received an XML file from a third party that I must import into my app. The XML has elements with an unescaped & in their inner text, and the provider doesn't want to fix that! What is the best way to deal with this problem?

The XML is pretty big and the fix has to be fast. My first solution is simply to replace the & character with &amp;, but I really don't like this "solution" for obvious reasons. I don't know how to use XmlStringReader with such XML, because it throws an exception on such lines, so I can't use HtmlEncode on the inner text. I tried setting the XmlTextReader Settings.CheckCharacters to false, but with no result.

Here is a sample. The & is in the <naziv> element, and that field can contain anything that can appear in a company name, so my replace fix may not work for some other company name. I would like to use HtmlEncode somehow, but only on the inner text, of course.

<komitent ID="001398">
  <sifra>001398</sifra>
  <redni_broj>001398</redni_broj>
  <naziv>LJUBICA & ŽARKO</naziv>
  <adresa1>Odvrtnica 27</adresa1>
  <adresa2></adresa2>
  <drzava>HRVATSKA</drzava>
  <grad>Zagreb</grad>
</komitent>
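
The failure is not specific to one reader class: a bare & is a well-formedness error, so any conforming XML parser has to reject the sample. Below is a minimal sketch that reproduces this, written in Python for brevity rather than the .NET the question uses. The helper name `first_parse_error` and the shortened `SAMPLE` string are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample above, still containing the bare "&".
SAMPLE = """<komitent ID="001398">
  <sifra>001398</sifra>
  <naziv>LJUBICA & ŽARKO</naziv>
</komitent>"""


def first_parse_error(xml_text):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err.position  # (line, column) as reported by the parser
    return None


# Reports where the parser gives up on the broken sample.
print(first_parse_error(SAMPLE))
# The same document with the ampersand escaped parses cleanly.
print(first_parse_error(SAMPLE.replace(" & ", " &amp; ")))
```

The reported position can help in locating which elements of a large file are affected before deciding on a fix.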

Solution

The key message below is that unless you know the exact format of the input file, and have guarantees that any deviation from XML is consistent, you can't fix it programmatically without risking that your fixes will be incorrect.

Fixing it by replacing & with &amp; is an acceptable solution if and only if:

  1. There is no acceptable well-formed source of these data.

    • As @Darin Dimitrov comments, try to find a better provider, or get this provider to fix it.
    • JSON (for example) is preferable to poorly formed XML, even if you aren't using JavaScript.
  2. This is a one-off (or at least extremely infrequent) import.

    • If you have to fetch this at runtime, then this solution will not work.
  3. You can keep iterating through, devising new fixes for it, adding a solution to each problem as you come across it.

    • You will probably find that once you have "fixed" it by escaping & characters, there will be other errors.
  4. You have the resources to manually check the integrity of the "fixed" data.

    • The errors you "fix" may be more subtle than you realise.
  5. There are no correctly formatted entities in the document.

    • Simply replacing & with &amp; will erroneously change &quot; to &amp;quot;. You may be able to get around this, but don't be naive about how tricky it might be (entities may be defined in a DTD, may refer to a Unicode code point, ...).

    • If it is a particular element that misbehaves, you could consider wrapping the content of the element in <![CDATA[ ]]>, but that still relies on you being able to find the start and end tags reliably.
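
The two workarounds from point 5 can be sketched as follows, again in Python rather than .NET. This is an illustration under stated assumptions, not a general repair tool: the names `BARE_AMP`, `escape_bare_ampersands` and `wrap_element_in_cdata` are hypothetical, the CDATA helper assumes the element has no attributes and is never nested inside itself, and the regex only checks whether an ampersand *looks like* the start of an entity reference.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that does NOT begin something shaped like a reference:
# a named entity (&name;), a decimal (&#38;) or a hex (&#x26;) character reference.
BARE_AMP = re.compile(r"&(?![A-Za-z_][\w.\-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)")


def escape_bare_ampersands(text):
    """Escape only ampersands that are not already part of a reference."""
    return BARE_AMP.sub("&amp;", text)


def wrap_element_in_cdata(text, tag):
    """Wrap the content of every <tag>...</tag> in a CDATA section.

    Assumes the start tag has no attributes and the element is not nested
    inside itself; anything else needs a real understanding of the format.
    """
    pattern = re.compile(
        r"(<{0}>)(.*?)(</{0}>)".format(re.escape(tag)), re.DOTALL
    )

    def repl(match):
        # "]]>" cannot appear inside CDATA, so split it across two sections.
        body = match.group(2).replace("]]>", "]]]]><![CDATA[>")
        return match.group(1) + "<![CDATA[" + body + "]]>" + match.group(3)

    return pattern.sub(repl, text)


broken = "<komitent><naziv>LJUBICA & ŽARKO</naziv></komitent>"

# Either repair makes the fragment parse, and the text comes back unchanged.
for repaired in (escape_bare_ampersands(broken),
                 wrap_element_in_cdata(broken, "naziv")):
    assert ET.fromstring(repaired).find("naziv").text == "LJUBICA & ŽARKO"
```

Both helpers still carry the risks described above. Text such as "R&D; next" looks like an entity reference to the regex and is left untouched, and references to entities the parser does not know (for example ones that would have been declared in a DTD) still fail to parse. Parsing the repaired output and checking it by hand remains necessary.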
