What is the fastest way to go through an XML file in C#?

Question

I have a program that goes through thousands of files and has to check whether each one has the correct XML format. The problem is that it takes ages to complete, and I think that's because of the type of XML reader I use.

The method below contains three different versions that I tried. The first one is the fastest, but only by 5%. (The method does not need to check whether the file is XML at all.)

private bool HasCorrectXmlFormat(string filePath)
{
    try
    {
        //-Version 1----------------------------------------------------------------------------------------
        XmlReader reader = XmlReader.Create(filePath, new XmlReaderSettings() { IgnoreComments = true, IgnoreWhitespace = true });

        string[] elementNames = new string[] { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };

        int i = 0;

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                if (reader.Name != elementNames.ElementAt(i))
                {
                    return false;
                }

                if (i >= 4)
                {
                    return true;
                }

                i++;
            }

        }

        return false;
        //--------------------------------------------------------------------------------------------------


        //-  Version 2  ------------------------------------------------------------------------------------
        IEnumerable<XElement> xmlElements = XDocument.Load(filePath).Descendants();

        string[] elementNames = new string[] { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };

        for (int i = 0; i < 5; i++)
        {
            if (xmlElements.ElementAt(i).Name != elementNames.ElementAt(i))
            {
                return false;
            }
        }

        return true;
        //--------------------------------------------------------------------------------------------------


        //-  Version 3  ------------------------------------------------------------------------------------
        XDocument doc = XDocument.Load(filePath);

        if (doc.Root.Name != "DocumentElement")
        {
            return false;
        }

        if (doc.Root.Elements().First().Name != "Protocol")
        {
            return false;
        }

        if (doc.Root.Elements().First().Elements().ElementAt(0).Name != "DateTime")
        {
            return false;
        }

        if (doc.Root.Elements().First().Elements().ElementAt(1).Name != "Item")
        {
            return false;
        }

        if (doc.Root.Elements().First().Elements().ElementAt(2).Name != "Value")
        {
            return false;
        }

        return true;
        //--------------------------------------------------------------------------------------------------
    }
    catch (Exception)
    {
        return false;
    }
}

What I need is a faster way to do this. Is there a faster way to go through an XML file? I only have to check whether the first 5 elements have the correct names.
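For reference, Version 1 can be written so that it stands on its own and releases the file handle as soon as the check is decided. This is only a sketch of the same early-exit idea, not a different algorithm: the class name XmlFormatCheck is made up for illustration, and the expected element names are the ones from the question.

```csharp
using System;
using System.Xml;

public static class XmlFormatCheck
{
    // Element names expected in document order (taken from the question).
    private static readonly string[] ExpectedNames =
        { "DocumentElement", "Protocol", "DateTime", "Item", "Value" };

    public static bool HasCorrectXmlFormat(string filePath)
    {
        try
        {
            var settings = new XmlReaderSettings
            {
                IgnoreComments = true,
                IgnoreWhitespace = true
            };

            // 'using' closes the underlying file even when we return early.
            using (XmlReader reader = XmlReader.Create(filePath, settings))
            {
                int i = 0;
                while (reader.Read())
                {
                    if (reader.NodeType != XmlNodeType.Element)
                    {
                        continue;
                    }

                    if (reader.Name != ExpectedNames[i])
                    {
                        return false;
                    }

                    // Stop reading once all expected names have matched.
                    if (++i == ExpectedNames.Length)
                    {
                        return true;
                    }
                }
            }

            // Fewer elements than expected.
            return false;
        }
        catch (Exception)
        {
            // Unreadable or malformed files count as "wrong format", as in the question.
            return false;
        }
    }
}
```

Because XmlReader is a forward-only streaming reader, this stops after the fifth element instead of building the whole document in memory, which is why it should not be slower than the XDocument versions.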

UPDATE

The XML files are only 2-5 KB in size, rarely more than that. The files are located on a local server. I am on a laptop that has an SSD.

Here are some test results:

I should also add that the files are filtered beforehand, so only XML files are given to the method. I get the files with the following method:

public List<FileInfo> GetCompatibleFiles()
{
    return new DirectoryInfo(folderPath)
        .EnumerateFiles("*", searchOption)
        .AsParallel()
        .Where(file => file.Extension == ".xml" ? HasCorrectXmlFormat(file.FullName) : false)
        .ToList();
}

This method is not in my code in this form (I combined two methods here); it is just to show how the HasCorrectXmlFormat method is called. You don't have to correct this method, I know it can be improved.

UPDATE 2

Here are the two full methods mentioned at the end of update 1:

private void WriteAllFilesInList()
{
    allFiles = new DirectoryInfo(folderPath)
        .EnumerateFiles("*", searchOption)
        .AsParallel()
        .ToList();
}

private void WriteCompatibleFilesInList()
{
    compatibleFiles = allFiles
        .Where(file => file.Extension == ".xml" ? HasCorrectXmlFormat(file.FullName) : false)
        .ToList();
}

Both methods are called only once in the entire program (if either the allFiles or the compatibleFiles list is null).

UPDATE 3

It seems that the WriteAllFilesInList method is the real problem, as shown here:

FINAL UPDATE

It seems my method doesn't need any improvement, as the bottleneck is something else.
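Since the file listing turned out to be the slow part, one thing that might be worth trying is to let the enumeration itself filter on "*.xml" and to apply AsParallel() to the per-file check rather than to the listing. The following is an untested sketch under that assumption: the class name CompatibleFileFinder and the isCompatible parameter (standing in for HasCorrectXmlFormat) are made up for illustration, and whether it helps depends on why the enumeration is slow, which the question does not say.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CompatibleFileFinder
{
    // 'isCompatible' is a placeholder for a per-file check such as HasCorrectXmlFormat.
    public static List<FileInfo> GetCompatibleFiles(
        string folderPath, SearchOption searchOption, Func<string, bool> isCompatible)
    {
        return new DirectoryInfo(folderPath)
            // Only ask the file system for *.xml entries instead of every file.
            .EnumerateFiles("*.xml", searchOption)
            // Keep the explicit extension check from the question, because the
            // search pattern can also match longer extensions on some platforms.
            .Where(file => file.Extension == ".xml")
            // Parallelize the expensive part: opening and parsing each file.
            .AsParallel()
            .Where(file => isCompatible(file.FullName))
            .ToList();
    }
}
```

A call would look like CompatibleFileFinder.GetCompatibleFiles(folderPath, searchOption, HasCorrectXmlFormat). Note that AsParallel() does not preserve the order of the files unless AsOrdered() is added.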

Solution

Here is an example that reads a sample XML file and compares LINQ to XML, XmlReader, and XmlDocument.

LINQ is the fastest.

Sample Code

using System;
using System.Diagnostics;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

namespace ReadXMLInCsharp
{
  class Program
  {
    static void Main(string[] args)
    {
     
        // Returns the URL of the output directory, which contains "\bin\Debug"
        var url = System.IO.Path.GetDirectoryName(
            System.Reflection.Assembly.GetExecutingAssembly().GetName().CodeBase);

        // Strip "\bin\Debug" so that the path points to the root directory
        var mainpath = url.Replace("\\bin\\Debug", "") + "\\books.xml";

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        //create XMLDocument object
        XmlDocument xmlDoc = new XmlDocument();
        //load xml file
        xmlDoc.Load(mainpath);
        //save all nodes in XMLnodelist
        XmlNodeList nodeList = xmlDoc.DocumentElement.SelectNodes("/catalog/book");

        //loop through each node and append its values to NodeStr
        var NodeStr = "";

        foreach (XmlNode node in nodeList)
        {
            NodeStr = NodeStr + "\nAuthor " + node.SelectSingleNode("author").InnerText;
            NodeStr = NodeStr + "\nTitle " + node.SelectSingleNode("title").InnerText;
            NodeStr = NodeStr + "\nGenre " + node.SelectSingleNode("genre").InnerText;
            NodeStr = NodeStr + "\nPrice " + node.SelectSingleNode("price").InnerText;
            NodeStr = NodeStr + "\nDescription -" + node.SelectSingleNode("description").InnerText;


        }
        //print all Authors details
        Console.WriteLine(NodeStr);
        stopwatch.Stop();
        Console.WriteLine();
        Console.WriteLine("Time elapsed using XmlDocument (ms)= " + stopwatch.ElapsedMilliseconds);
        Console.WriteLine();

        stopwatch.Reset();

        stopwatch.Start();
        NodeStr = "";
        //linq method
        //get all elements inside book
        foreach (XElement level1Element in XElement.Load(mainpath).Elements("book"))
        {
            //print each element value
            //you can also print XML attribute value, instead of .Element use .Attribute
            NodeStr = NodeStr + "\nAuthor " + level1Element.Element("author").Value;
            NodeStr = NodeStr + "\nTitle " + level1Element.Element("title").Value;
            NodeStr = NodeStr + "\nGenre " + level1Element.Element("genre").Value;
            NodeStr = NodeStr + "\nPrice " + level1Element.Element("price").Value;
            NodeStr = NodeStr + "\nDescription -" + level1Element.Element("description").Value;
        }

        //print all Authors details
        Console.WriteLine(NodeStr);
        stopwatch.Stop();
        Console.WriteLine();
        Console.WriteLine("Time elapsed using linq(ms)= " + stopwatch.ElapsedMilliseconds);
        Console.WriteLine();

        stopwatch.Reset();
        stopwatch.Start();
        //method 3
        //XMLReader
        XmlReader xReader = XmlReader.Create(mainpath);

        xReader.ReadToFollowing("book");
        NodeStr = "";
        while (xReader.Read())
        {
            switch (xReader.NodeType)
            {
                case XmlNodeType.Element:
                    NodeStr = NodeStr + "\nElement name:" + xReader.Name;
                    break;
                case XmlNodeType.Text:
                    NodeStr = NodeStr + "\nElement value:" + xReader.Value;
                    break;
                case XmlNodeType.None:
                    //do nothing
                    break;

            }
        }

        //print all Authors details
        Console.WriteLine(NodeStr);
        stopwatch.Stop();
        Console.WriteLine();
        Console.WriteLine("Time elapsed using XMLReader (ms)= " + stopwatch.ElapsedMilliseconds);
        Console.WriteLine();
        stopwatch.Reset();


        Console.ReadKey();
    }
  }
}

Output:

-- First Run
Time elapsed using XmlDocument (ms)= 15

Time elapsed using linq(ms)= 7

Time elapsed using XMLReader (ms)= 12

-- Second Run
Time elapsed using XmlDocument (ms)= 18

Time elapsed using linq(ms)= 3

Time elapsed using XMLReader (ms)= 15

I have removed some output to show only comparison data.

Source: Open and Read XML in C# (Examples using Linq, XMLReader, XMLDocument)

Edit: If I comment out 'Console.WriteLine(NodeStr)' in all methods and print only the time comparison, this is what I get:

Time elapsed using XmlDocument (ms)= 11


Time elapsed using linq(ms)= 0


Time elapsed using XMLReader (ms)= 0

Basically, it depends on how you process the data and how you read the XML. LINQ and XmlReader look more promising in terms of speed.
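Single-run timings of a few milliseconds like the ones above are easily dominated by JIT compilation, console output, and repeated string concatenation, so a comparison is more telling when each parser gets a warm-up call and is then timed over many iterations. The sketch below shows one way to do that. It is not the code from the linked source: the class name XmlParseTiming, the in-memory sample document, and the iteration count are all assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlParseTiming
{
    // Builds a hypothetical sample document in memory so file I/O is not measured.
    public static string BuildSampleXml(int bookCount)
    {
        var sb = new StringBuilder("<catalog>");
        for (int i = 0; i < bookCount; i++)
        {
            sb.Append("<book><author>A").Append(i)
              .Append("</author><title>T").Append(i)
              .Append("</title></book>");
        }
        return sb.Append("</catalog>").ToString();
    }

    public static int CountWithXmlDocument(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes("/catalog/book").Count;
    }

    public static int CountWithXDocument(string xml)
    {
        return XDocument.Parse(xml).Root.Elements("book").Count();
    }

    public static int CountWithXmlReader(string xml)
    {
        int count = 0;
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "book")
                {
                    count++;
                }
            }
        }
        return count;
    }

    // Times 'iterations' calls of 'parse' after one untimed warm-up call.
    public static TimeSpan Measure(Func<string, int> parse, string xml, int iterations)
    {
        parse(xml); // warm-up, so JIT compilation is not part of the measurement
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            parse(xml);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Call this from Main to print the three timings.
    public static void RunComparison()
    {
        string xml = BuildSampleXml(100);
        const int iterations = 1000;

        Console.WriteLine("XmlDocument: " + Measure(CountWithXmlDocument, xml, iterations).TotalMilliseconds + " ms");
        Console.WriteLine("XDocument:   " + Measure(CountWithXDocument, xml, iterations).TotalMilliseconds + " ms");
        Console.WriteLine("XmlReader:   " + Measure(CountWithXmlReader, xml, iterations).TotalMilliseconds + " ms");
    }
}
```

The absolute numbers depend on the machine and runtime, so no expected output is given here. The point is only that all three parsers do the same work on the same input and that start-up cost is kept out of the measurement.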
