What is the fastest way to programmatically check the well-formedness of XML files in C#?

Problem description

I have large batches of XHTML files that are updated manually. During the review phase of the updates, I would like to programmatically check the well-formedness of the files. I am currently using an XmlReader, but the time required on an average CPU is much longer than I expected.

The XHTML files range in size from 4KB to 40KB, and verifying takes several seconds per file. Checking is essential, but I would like to keep the time as short as possible, because the check is performed while the files are being read into the next process step.

Is there a faster way of doing a simple XML well-formedness check? Maybe using external XML libraries?


I can confirm that validating "regular" XML-based content with the XmlReader is lightning fast and that, as suggested, the problem seems to be that the XHTML DTD is read each time a file is validated.

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Note that in addition to the DTD, corresponding .ent files (xhtml-lat1.ent, xhtml-symbol.ent, xhtml-special.ent) are also downloaded.

Ignoring the DTD completely is not really an option for XHTML, because well-formedness is closely tied to the allowed HTML entities (e.g., an &nbsp; will immediately produce errors when the DTD is ignored).
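
To illustrate that last point, here is a small sketch, assuming .NET with C# 9+ top-level statements (ReadIgnoringDtd is a hypothetical helper name, not from the question). It reads a short XHTML document with DtdProcessing.Ignore: the DOCTYPE is skipped, so nothing is downloaded, but &nbsp; is then never declared and the reader should stop with an XmlException about an undeclared entity even though the markup itself is balanced.

```csharp
using System;
using System.IO;
using System.Xml;

// A small XHTML document that uses &nbsp;, an entity declared only in the XHTML DTD.
const string Xhtml =
    "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" " +
    "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">" +
    "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>a&nbsp;b</p></body></html>";

// Reads the document with the DOCTYPE ignored (no download, no entity declarations).
// Returns the parser's error message, or null if the whole document was read.
string? ReadIgnoringDtd(string xml)
{
    var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
    try
    {
        using var reader = XmlReader.Create(new StringReader(xml), settings);
        while (reader.Read()) { }
        return null;
    }
    catch (XmlException ex)
    {
        return ex.Message;
    }
}

Console.WriteLine(ReadIgnoringDtd(Xhtml) ?? "well-formed");
```

The same function returns null for plain XML without entities, which matches the observation that "regular" content is checked quickly.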


The problem was solved by using a custom XmlResolver as suggested, in combination with local (embedded) copies of both the DTD and entity files.
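
The author's resolver code was not posted. As a stand-in, here is a minimal sketch of a different, built-in route to the same effect: XmlPreloadedResolver (namespace System.Xml.Resolvers, .NET Framework 4.0 and later), constructed with XmlKnownDtds.Xhtml10, carries embedded copies of the XHTML 1.0 DTDs and their .ent files. It assumes C# 9+ top-level statements; IsWellFormed is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Resolvers;

const string Xhtml =
    "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" " +
    "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">" +
    "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>a&nbsp;b</p></body></html>";

// Shared settings: the DTD is parsed (so &nbsp; and friends are declared), but every
// DTD and .ent lookup is answered from copies built into the framework. No fallback
// resolver is passed, so nothing is fetched from w3.org.
var settings = new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Parse,
    XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10),
};

// Hypothetical helper: true if the whole input can be read without an XmlException.
bool IsWellFormed(TextReader input)
{
    try
    {
        using var reader = XmlReader.Create(input, settings);
        while (reader.Read()) { }
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

Console.WriteLine(IsWellFormed(new StringReader(Xhtml)));
```

Creating the settings and resolver once and reusing them for every file keeps per-file work to parsing; the DOCTYPE is still processed for each document, but from memory rather than over the network.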

I will post the solution here once I have cleaned up the code.

Recommended answer

I would expect XmlReader with while (reader.Read()) {} to be the fastest managed approach. It certainly shouldn't take seconds to read 40KB... What input approach are you using?
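
A minimal sketch of that while (reader.Read()) {} loop wrapped as a per-file check, assuming C# 9+ top-level statements; CheckWellFormed and the temp-file name are hypothetical. It reports the line and position of the first error, which is handy during a manual review.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: streams one file through a forward-only XmlReader. Nothing is
// kept in memory; the reader throws XmlException at the first well-formedness error.
// Returns null when the file is well-formed, otherwise "line:position message".
string? CheckWellFormed(string path, XmlReaderSettings settings)
{
    try
    {
        // Open the file ourselves so settings.XmlResolver is only asked for DTDs/entities.
        using FileStream stream = File.OpenRead(path);
        using XmlReader reader = XmlReader.Create(stream, settings);
        while (reader.Read()) { }
        return null;
    }
    catch (XmlException ex)
    {
        return $"{ex.LineNumber}:{ex.LinePosition} {ex.Message}";
    }
}

// Usage: a file whose <a> element is never closed.
string sample = Path.Combine(Path.GetTempPath(), "wellformed-sample.xml");
File.WriteAllText(sample, "<root><a></root>");
Console.WriteLine(CheckWellFormed(sample, new XmlReaderSettings()) ?? "well-formed");
File.Delete(sample);
```

I/O errors such as a missing file are not caught and propagate to the caller. Passing the settings in lets the same helper handle XHTML once those settings carry a local resolver.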

Do you perhaps have some external entities (schemas, etc.) to resolve? If so, you might be able to write a custom XmlResolver (set via XmlReaderSettings) that uses locally cached schemas rather than a remote fetch...
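
One way to get such a locally cached resolver without writing a subclass: XmlPreloadedResolver.Add(Uri, byte[]) registers the bytes to hand back for a given URI. The sketch below assumes you already have local copies of the DTD and entity files; CreateLocalResolver, CreateSettings and the URI-to-path map are hypothetical. Subclassing XmlUrlResolver and overriding GetEntity is the other common route.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Resolvers;

// Hypothetical helper: builds a resolver that answers only from local copies.
// remoteToLocal maps each remote DTD/entity URI to a cached local file; the files
// are read once, here, and served from memory afterwards.
XmlResolver CreateLocalResolver(IReadOnlyDictionary<string, string> remoteToLocal)
{
    // XmlKnownDtds.None: start with no built-in DTDs. No fallback resolver either,
    // so an unmapped URI makes the read fail instead of going to the network.
    var resolver = new XmlPreloadedResolver(XmlKnownDtds.None);
    foreach (KeyValuePair<string, string> entry in remoteToLocal)
    {
        resolver.Add(new Uri(entry.Key), File.ReadAllBytes(entry.Value));
    }
    return resolver;
}

// Settings to reuse for every document: parse the DTD, resolve it locally.
XmlReaderSettings CreateSettings(XmlResolver resolver) => new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Parse,
    XmlResolver = resolver,
};
```

For the XHTML DOCTYPE in the question, the map would need http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd plus the entity files as they resolve relative to the DTD's URI (for example http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent), each pointing at a local copy.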

The following does ~300KB virtually instantly:

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Xml;

    using (MemoryStream ms = new MemoryStream())
    {
        // Generate ~300KB of simple XML in memory.
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.CloseOutput = false; // keep ms open after the writer is disposed
        using (XmlWriter writer = XmlWriter.Create(ms, settings))
        {
            writer.WriteStartElement("xml");
            for (int i = 0; i < 15000; i++)
            {
                writer.WriteElementString("value", i.ToString());
            }
            writer.WriteEndElement();
        }
        Console.WriteLine(ms.Length + " bytes");

        // Rewind and time a plain forward-only read of every node.
        ms.Position = 0;
        int nodes = 0;
        Stopwatch watch = Stopwatch.StartNew();
        using (XmlReader reader = XmlReader.Create(ms))
        {
            while (reader.Read()) { nodes++; }
        }
        watch.Stop();
        Console.WriteLine("{0} nodes in {1}ms", nodes, watch.ElapsedMilliseconds);
    }
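
Applied to the question's batch scenario, here is a sketch that times each file the same way while reusing one XmlReaderSettings with the built-in XHTML resolver (XmlPreloadedResolver, .NET Framework 4.0 and later, which the original answer does not mention). The directory, the *.xhtml pattern and CheckBatch are assumptions, as are C# 9+ top-level statements. Per-file times in the low milliseconds would suggest the DTD download was the bottleneck.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Xml;
using System.Xml.Resolvers;

// One settings object for the whole batch: the XHTML 1.0 DTDs and .ent files come
// from copies built into the framework (no fallback resolver, so no network access).
var settings = new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Parse,
    XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10),
};

// Hypothetical helper: checks every *.xhtml file in a directory, printing the time
// taken per file, and returns the paths of the files that are not well-formed.
List<string> CheckBatch(string directory)
{
    var failures = new List<string>();
    foreach (string path in Directory.EnumerateFiles(directory, "*.xhtml"))
    {
        Stopwatch watch = Stopwatch.StartNew();
        bool ok = true;
        try
        {
            using FileStream stream = File.OpenRead(path);
            using XmlReader reader = XmlReader.Create(stream, settings);
            while (reader.Read()) { }
        }
        catch (XmlException)
        {
            ok = false;
            failures.Add(path);
        }
        watch.Stop();
        Console.WriteLine("{0}: {1} in {2}ms", Path.GetFileName(path),
            ok ? "ok" : "NOT well-formed", watch.ElapsedMilliseconds);
    }
    return failures;
}
```

Files are opened with File.OpenRead rather than XmlReader.Create(path, settings) so that the resolver, which has no fallback, is only asked for DTD and entity URIs and not for the file itself.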
