Efficient Parsing of XML
Problem description
Good day,
I'm writing a program in C# .NET to manage the products of my store.
Following a given link, I can retrieve an XML file that contains all the possible products that I can list on my storefront.
The XML structure looks like this:
<Product StockCode="103-10440">
    <lastUpdated><![CDATA[Fri, 20 May 2016 17:00:03 GMT]]></lastUpdated>
    <StockCode><![CDATA[103-10440]]></StockCode>
    <Brand><![CDATA[3COM]]></Brand>
    <BrandID><![CDATA[14]]></BrandID>
    <ProdName><![CDATA[BIG FLOW BLOWING JUNCTION FLEX BLOCK, TAKES 32, 40]]> </ProdName>
    <ProdDesc/>
    <Categories>
        <TopCat><![CDATA[Accessories]]></TopCat>
        <TopCatID><![CDATA[24]]></TopCatID>
    </Categories>
    <ProdImg/>
    <ProdPriceExclVAT><![CDATA[30296.79]]></ProdPriceExclVAT>
    <ProdQty><![CDATA[0]]></ProdQty>
    <ProdExternalURL><![CDATA[http://pinnacle.eliance.co.za/#!/product/4862]]></ProdExternalURL>
</Product>
Here are the entries I'm looking for:
- lastUpdated
- StockCode
- Brand
- ProdName
- ProdDesc
- TopCat (nested in the Categories tag)
- ProdImg
- ProdPriceExclVAT
- ProdQty
- ProdExternalURL
This is all fine to handle, and in fact I did:
    public ProductList Parse() {
        XmlDocument doc = new XmlDocument();
        doc.Load(XMLLink);
        XmlNodeList ProductNodeList = doc.GetElementsByTagName("Product");
        foreach (XmlNode node in ProductNodeList) {
            Product Product = new Product();
            for (int i = 0; i < node.ChildNodes.Count; i++) {
                if (node.ChildNodes[i].Name == "StockCode") {
                    Product.VariantSKU = Convert.ToString(node.ChildNodes[i].InnerText);
                }
                if (node.ChildNodes[i].Name == "Brand") {
                    Product.Vendor = Convert.ToString(node.ChildNodes[i].InnerText);
                }
                if (node.ChildNodes[i].Name == "ProdName") {
                    Product.Title = Convert.ToString(node.ChildNodes[i].InnerText);
                    Product.SEOTitle = Product.Title;
                    Product.Handle = Product.Title;
                }
                if (node.ChildNodes[i].Name == "ProdDesc") {
                    Product.Body = Convert.ToString(node.ChildNodes[i].InnerText);
                    Product.SEODescription = Product.Body;
                    if (Product.Body == "") {
                        Product.Body = "ERROR";
                        Product.SEODescription = "ERROR";
                    }
                }
                if (node.ChildNodes[i].Name == "Categories") {
                    if (!tempList.Categories.Contains(node.ChildNodes[i].ChildNodes[0].InnerText)) {
                        if (!tempList.Categories.Contains("All")) {
                            tempList.Categories.Add("All");
                        }
                        tempList.Categories.Add(node.ChildNodes[i].ChildNodes[0].InnerText);
                    }
                    Product.Type = Convert.ToString(node.ChildNodes[i].ChildNodes[0].InnerText);
                }
                if (node.ChildNodes[i].Name == "ProdImg") {
                    Product.ImageSrc = Convert.ToString(node.ChildNodes[i].InnerText);
                    if (Product.ImageSrc == "") {
                        Product.ImageSrc = "ERROR";
                    }
                    Product.ImageAlt = Product.Title;
                }
                if (node.ChildNodes[i].Name == "ProdPriceExclVAT") {
                    float baseprice = float.Parse(node.ChildNodes[i].InnerText);
                    double Costprice = ((baseprice * 0.14) + (baseprice * 0.15) + baseprice);
                    Product.VariantPrice = Costprice.ToString("0.##");
                }
            }
            Product.Supplier = "Pinnacle";
            if (!tempList.Suppliers.Contains(Product.Supplier)) {
                tempList.Suppliers.Add(Product.Supplier);
            }
            tempList.Products.Add(Product);
        }
        return tempList;
    }
}
The problem, however, is that this approach takes about 10 seconds to finish, and this is only the first of several such files that I have to parse.
I am looking for the most efficient way to parse this XML file and get the data for all the fields I mentioned above.
EDIT: I benchmarked the code when running with a pre-downloaded copy of the file, and when downloading the file from the server at runtime:
- With a local copy: 5 seconds.
- Without a local copy: 7.30 seconds.
Recommended answer
With large XML files you should use an XmlReader, which streams through the document instead of loading all of it into memory the way XmlDocument does. The code below reads one Product at a time.
using System;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // "filename" is a placeholder for the path or URL of the XML file.
            using (XmlReader reader = XmlReader.Create("filename"))
            {
                while (!reader.EOF)
                {
                    // Skip ahead unless the reader is already on a <Product> start tag.
                    if (reader.Name != "Product")
                    {
                        reader.ReadToFollowing("Product");
                    }
                    if (!reader.EOF)
                    {
                        // Load only this one <Product> into memory; afterwards the
                        // reader is positioned just past its closing tag.
                        XElement product = (XElement)XElement.ReadFrom(reader);
                        string lastUpdated = (string)product.Element("lastUpdated");
                    }
                }
            }
        }
    }
}
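As a rough sketch of how the loop above could be extended to fill in the fields from the question, something along these lines might work. The Product class here is a trimmed-down, hypothetical stand-in for the asker's own class, and ProductFeed, Read, and ToProduct are names made up for illustration. It assumes every Product has a ProdPriceExclVAT value written with a dot as the decimal separator, and it keeps the question's "ERROR" placeholders and its 14% + 15% markup.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical, trimmed-down stand-in for the asker's Product class.
public class Product
{
    public string VariantSKU { get; set; }
    public string Vendor { get; set; }
    public string Title { get; set; }
    public string Body { get; set; }
    public string Type { get; set; }
    public string ImageSrc { get; set; }
    public string VariantPrice { get; set; }
}

public static class ProductFeed
{
    // Streams <Product> elements one at a time, so only a single
    // product is held in memory as an XElement at any moment.
    public static IEnumerable<Product> Read(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Product")
                {
                    // ReadFrom consumes the whole element and leaves the reader on
                    // the node that follows it, so no extra Read() is needed here.
                    var element = (XElement)XNode.ReadFrom(reader);
                    yield return ToProduct(element);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    private static Product ToProduct(XElement p)
    {
        // Casting a missing element to string yields null; an empty tag yields "".
        string body = (string)p.Element("ProdDesc");
        string image = (string)p.Element("ProdImg");

        // Assumption: the price is always present and uses '.' as the decimal
        // separator; decimal.Parse throws if the element is missing or malformed.
        decimal basePrice = decimal.Parse(
            (string)p.Element("ProdPriceExclVAT"), CultureInfo.InvariantCulture);

        return new Product
        {
            VariantSKU = (string)p.Element("StockCode"),
            Vendor = (string)p.Element("Brand"),
            Title = (string)p.Element("ProdName"),
            Body = string.IsNullOrEmpty(body) ? "ERROR" : body,
            // TopCat is nested inside Categories.
            Type = (string)p.Element("Categories")?.Element("TopCat"),
            ImageSrc = string.IsNullOrEmpty(image) ? "ERROR" : image,
            // Same markup as in the question: base + 14% + 15% = base * 1.29.
            VariantPrice = (basePrice * 1.29m).ToString("0.##", CultureInfo.InvariantCulture)
        };
    }
}
```

A caller could then iterate with something like foreach (Product p in ProductFeed.Read(File.OpenText("products.xml"))), where products.xml is a placeholder file name, and add each product to its own list. Whether this is actually faster than the XmlDocument version for a given feed is worth measuring rather than assuming.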