Efficient Parsing of XML


Question

Good day,

I'm writing a program in C# (.NET) to manage the products of my store.

From a given link I can retrieve an XML file that contains all the possible products I can list on my storefront.

The XML structure is as follows:

<Product StockCode="103-10440">
    <lastUpdated><![CDATA[Fri, 20 May 2016 17:00:03 GMT]]></lastUpdated>
    <StockCode><![CDATA[103-10440]]></StockCode>
    <Brand><![CDATA[3COM]]></Brand>
    <BrandID><![CDATA[14]]></BrandID>
    <ProdName><![CDATA[BIG FLOW BLOWING JUNCTION FLEX BLOCK, TAKES 32, 40]]>     </ProdName>
    <ProdDesc/>
    <Categories>
        <TopCat><![CDATA[Accessories]]></TopCat>
        <TopCatID><![CDATA[24]]></TopCatID>
    </Categories>
    <ProdImg/>
    <ProdPriceExclVAT><![CDATA[30296.79]]></ProdPriceExclVAT>
    <ProdQty><![CDATA[0]]></ProdQty>
    <ProdExternalURL><![CDATA[http://pinnacle.eliance.co.za/#!/product/4862]]></ProdExternalURL>
</Product>

Here are the entries I'm looking for:

  • lastUpdated
  • StockCode
  • Brand
  • ProdName
  • ProdDesc
  • TopCat <--- nested in the Categories tag.
  • ProdImg
  • ProdPriceExclVAT
  • ProdQty
  • ProdExternalURL

This is all fine to handle, and in fact I did:

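// Note: tempList is not declared in this snippet; it is presumably a
// ProductList field of the enclosing class.
public ProductList Parse() {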

    XmlDocument doc = new XmlDocument();
    doc.Load(XMLLink);

    XmlNodeList ProductNodeList = doc.GetElementsByTagName("Product");
    foreach (XmlNode node in ProductNodeList) {
        Product Product = new Product();

        for (int i = 0; i < node.ChildNodes.Count; i++) {
            if (node.ChildNodes[i].Name == "StockCode") {
                Product.VariantSKU = Convert.ToString(node.ChildNodes[i].InnerText);
            }
            if (node.ChildNodes[i].Name == "Brand") {
                Product.Vendor = Convert.ToString(node.ChildNodes[i].InnerText);
            }
            if (node.ChildNodes[i].Name == "ProdName") {
                Product.Title = Convert.ToString(node.ChildNodes[i].InnerText);
                Product.SEOTitle = Product.Title;
                Product.Handle = Product.Title;
            }
            if (node.ChildNodes[i].Name == "ProdDesc") {
                Product.Body = Convert.ToString(node.ChildNodes[i].InnerText);
                Product.SEODescription = Product.Body;
                if (Product.Body == "") {
                    Product.Body = "ERROR";
                    Product.SEODescription = "ERROR";
                }
            }
            if (node.ChildNodes[i].Name == "Categories") {
                if (!tempList.Categories.Contains(node.ChildNodes[i].ChildNodes[0].InnerText)) {
                    if (!tempList.Categories.Contains("All")) {
                        tempList.Categories.Add("All");
                    }
                    tempList.Categories.Add(node.ChildNodes[i].ChildNodes[0].InnerText);
                }

                Product.Type = Convert.ToString(node.ChildNodes[i].ChildNodes[0].InnerText);
            }
            if (node.ChildNodes[i].Name == "ProdImg") {
                Product.ImageSrc = Convert.ToString(node.ChildNodes[i].InnerText);
                if (Product.ImageSrc == "") {
                    Product.ImageSrc = "ERROR";
                }
                Product.ImageAlt = Product.Title;
            }
            if (node.ChildNodes[i].Name == "ProdPriceExclVAT") {
                float baseprice = float.Parse(node.ChildNodes[i].InnerText);
                double Costprice = ((baseprice * 0.14) + (baseprice * 0.15) + baseprice);
                Product.VariantPrice = Costprice.ToString("0.##");
            }
        }
        Product.Supplier = "Pinnacle";
        if (!tempList.Suppliers.Contains(Product.Supplier)) {
            tempList.Suppliers.Add(Product.Supplier);
        }
        tempList.Products.Add(Product);
    }
    return tempList;
}

The problem, however, is that this approach takes about 10 seconds to finish, and this is only the first of several such files that I have to parse.

I am looking for the most efficient way to parse this XML file and get the data for all the fields mentioned above.

EDIT: I benchmarked the code both with a pre-downloaded copy of the file and when downloading the file from the server at runtime:

  • With local copy: 5 seconds.
  • Without local copy: 7.30 seconds.

Recommended answer

With large XML files you should use an XmlReader, which streams through the file instead of loading the whole document into memory the way XmlDocument does. The code below reads one Product at a time.

using System;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // "filename" is a placeholder for the path or URL of the XML file.
            using (XmlReader reader = XmlReader.Create("filename"))
            {
                while (!reader.EOF)
                {
                    // ReadFrom leaves the reader on the node after </Product>, which may
                    // already be the next <Product>; only skip ahead when it is not.
                    if (reader.Name != "Product")
                    {
                        reader.ReadToFollowing("Product");
                    }
                    if (!reader.EOF)
                    {
                        // Materialize only this one Product element, not the whole document.
                        XElement product = (XElement)XElement.ReadFrom(reader);
                        string lastUpdated = (string)product.Element("lastUpdated");
                    }
                }
            }
        }
    }
}
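
The answer's loop only reads lastUpdated. As a minimal sketch of how the same streaming approach might be extended to every field the question lists, the example below maps each Product element to a small record. The names ProductRecord, ProductFeed, and Demo, the <Products> root element, and the choice to fall back to 0 for missing or malformed numbers are all assumptions made for illustration; none of them come from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical record type for this sketch; it is not the asker's Product class.
class ProductRecord
{
    public string LastUpdated, StockCode, Brand, Name, Description;
    public string TopCategory, ImageUrl, ExternalUrl;
    public decimal PriceExclVat;
    public int Quantity;
}

static class ProductFeed
{
    // Yields one record per <Product>, holding only that element in memory at a time.
    public static IEnumerable<ProductRecord> Read(XmlReader reader)
    {
        while (!reader.EOF)
        {
            // Skip ahead unless the reader already sits on a <Product> start tag.
            if (reader.NodeType != XmlNodeType.Element || reader.Name != "Product")
            {
                reader.ReadToFollowing("Product");
            }
            if (reader.EOF) yield break;

            XElement p = (XElement)XNode.ReadFrom(reader);
            yield return new ProductRecord
            {
                LastUpdated = Text(p.Element("lastUpdated")),
                StockCode = Text(p.Element("StockCode")),
                Brand = Text(p.Element("Brand")),
                Name = Text(p.Element("ProdName")),
                Description = Text(p.Element("ProdDesc")),
                // TopCat is nested inside Categories, so guard against a missing parent.
                TopCategory = Text(p.Element("Categories")?.Element("TopCat")),
                ImageUrl = Text(p.Element("ProdImg")),
                ExternalUrl = Text(p.Element("ProdExternalURL")),
                PriceExclVat = ParseDecimal(Text(p.Element("ProdPriceExclVAT"))),
                Quantity = ParseInt(Text(p.Element("ProdQty")))
            };
        }
    }

    // Returns the trimmed text (CDATA included) or "" for a missing or empty element.
    static string Text(XElement e) => e == null ? "" : e.Value.Trim();

    // Assumption: a missing or malformed number becomes 0 instead of throwing.
    static decimal ParseDecimal(string s) =>
        decimal.TryParse(s, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal v) ? v : 0m;

    static int ParseInt(string s) =>
        int.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out int v) ? v : 0;
}

static class Demo
{
    static void Main()
    {
        // Tiny inline document; the <Products> root name is assumed.
        const string xml =
            "<Products><Product StockCode=\"103-10440\">" +
            "<StockCode><![CDATA[103-10440]]></StockCode>" +
            "<Brand><![CDATA[3COM]]></Brand>" +
            "<Categories><TopCat><![CDATA[Accessories]]></TopCat></Categories>" +
            "<ProdPriceExclVAT><![CDATA[30296.79]]></ProdPriceExclVAT>" +
            "</Product></Products>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (ProductRecord r in ProductFeed.Read(reader))
            {
                Console.WriteLine($"{r.StockCode} | {r.Brand} | {r.TopCategory} | {r.PriceExclVat}");
            }
        }
    }
}
```

Parsing numbers with CultureInfo.InvariantCulture keeps a price such as 30296.79 from being misread on machines whose locale uses a comma as the decimal separator. To read the real feed, XmlReader.Create can be given a file path or URL in place of the StringReader.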
