Search for a value in XML without loading it in memory
Problem description
Hi,
Is there a way to search for a particular value in an XML file without loading the whole file into memory? XmlDocument works fine for my requirement, but I want the file to be handled without loading it into memory, since the actual XML file might be 5 GB or larger.
XmlReader is the alternative I tried, using examples such as "How to Open Large XML files without Loading the XML Files?"
But I can't work out the logic to traverse the XML nodes and search for a specific value.
The sample XML:
<backup>
  <project>
    <issues>
      <issue>
        <fieldvalue id="fld1">1</fieldvalue>
        <fieldvalue id="fld2">test01</fieldvalue>
        <fieldvalue id="fld3">some desc</fieldvalue>
      </issue>
      <issue>
        <fieldvalue id="fld1">2</fieldvalue>
        <fieldvalue id="fld2">test02</fieldvalue>
        <fieldvalue id="fld3">some desc</fieldvalue>
      </issue>
      <issue>
        <fieldvalue id="fld1">3</fieldvalue>
        <fieldvalue id="fld2">test03</fieldvalue>
        <fieldvalue id="fld3">some desc</fieldvalue>
      </issue>
      <issue>
        <fieldvalue id="fld1">4</fieldvalue>
        <fieldvalue id="fld2">test04</fieldvalue>
        <fieldvalue id="fld3">some desc</fieldvalue>
      </issue>
    </issues>
  </project>
</backup>
Here "fld1" is the ID of the issue. I want to search by ID. If the ID exists in the XML, I want to take the entire <issue> node for further processing.
And the code snippet:
//Using XMLDocument
protected void Button1_Click(object sender, EventArgs e)
{
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(Server.MapPath(@"export_sample.xml"));
    XmlNodeList addlst = xmlDoc.SelectNodes("backup/project/issues/issue/fieldvalue[@id='fld1']");
    foreach (XmlNode issueNode in addlst)
    {
        if (issueNode.InnerText == IDTextBox.Text)
        {
            Status.Text = issueNode.ParentNode.InnerXml;
            break;
        }
        else
        {
            Status.Text = "ID does not Exists";
        }
    }
}

//Using XMLTextReader
protected void Button2_Click(object sender, EventArgs e)
{
    XmlTextReader myTextReader = new XmlTextReader(Server.MapPath(@"export_sample.xml"));
    myTextReader.WhitespaceHandling = WhitespaceHandling.None;
    while (myTextReader.Read())
    {
        currentIssueNode = "";
        if (myTextReader.NodeType == XmlNodeType.Element &&
            myTextReader.LocalName == "issue"
            && myTextReader.IsStartElement() == true)
        {
            currentIssueNode = myTextReader.ReadOuterXml();
            if (currentIssueNode.Contains("<fieldvalue id=\"fld1\">" + IDTextBox.Text + "</fieldvalue>"))
            {
                squishStatus.Text = "ID exist";
                currentIssueNode = "";
                myTextReader.Skip();
            }
            else
            {
                currentIssueNode = "";
                squishStatus.Text = "ID does not exist";
            }
        }
        myTextReader.MoveToContent();
    }
    myTextReader.Close();
}
Recommended answer
Please read my comment to the question.
I'd suggest using LINQ, but I need to warn you: if the data is huge, the performance of the code below might be unsatisfactory.
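// Requires System.Linq and System.Xml.Linq.
// Note: XDocument.Load reads the whole file into memory.
var xDoc = XDocument.Load(Server.MapPath(@"export_sample.xml"));
var qry = xDoc.Element("backup")
    .Descendants("project")
    .Descendants("issues")
    .Descendants("issue")
    .Where(x => x.Elements("fieldvalue")
        .Any(f => (string)f.Attribute("id") == "fld1" && f.Value == IDTextBox.Text));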
The above LINQ query returns <issue> nodes.
Don't use Maciej's approach; you will definitely get an OutOfMemoryException if you are expecting an XML file of 5 GB or more.
Your second approach, the one using XmlTextReader, is the right way to go. A forward-only reader is how you read XML without loading the entire document at once, but you need to tweak it a bit.
First, note that it's recommended to use the XmlReader.Create method instead of instantiating an XmlTextReader. Second, when you are reading an "issue" element you want to use ReadInnerXml instead of ReadOuterXml. Third, you don't want to use Skip; instead you want to continue reading the sibling "issue" elements.
Try this:
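protected void Button2_Click(object sender, EventArgs e)
{
    string currentIssueNode = null;
    XmlReaderSettings settings = new XmlReaderSettings() { IgnoreWhitespace = true };
    using (var reader = XmlReader.Create(Server.MapPath(@"export_sample.xml"), settings))
    {
        // Text to look for inside each <issue>; assumes the ID needs no XML escaping.
        string fieldvalue = string.Format("<fieldvalue id=\"fld1\">{0}</fieldvalue>", IDTextBox.Text);
        bool onIssue = reader.ReadToFollowing("issue");
        while (onIssue)
        {
            // ReadInnerXml also moves the reader past the closing </issue> tag.
            string candidate = reader.ReadInnerXml();
            if (candidate.Contains(fieldvalue))
            {
                currentIssueNode = candidate;
                break;
            }
            // With whitespace ignored, the reader may already be on the next <issue>;
            // calling ReadToNextSibling here would skip that element.
            onIssue = (reader.NodeType == XmlNodeType.Element && reader.LocalName == "issue")
                || reader.ReadToFollowing("issue");
        }
    }
    if (!string.IsNullOrEmpty(currentIssueNode))
        Status.Text = currentIssueNode;
    else
        Status.Text = "ID does not exist";
}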
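As a variation on the reader loop above, here is a minimal sketch that compares the ID through the element's attribute and value instead of matching on raw XML text. It is not from the original answers: the class name IssueSearch and the method FindIssueById are hypothetical, and it assumes the elements carry no XML namespace. Only one <issue> element is materialized at a time, so memory use stays small regardless of file size.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class IssueSearch
{
    // Hypothetical helper: streams through the input and returns the first <issue>
    // whose <fieldvalue id="fld1"> text equals issueId, or null if there is none.
    public static XElement FindIssueById(TextReader input, string issueId)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "issue")
                {
                    // XNode.ReadFrom loads only this one element and leaves the
                    // reader on the node after </issue>, so no extra Read() here.
                    var issue = (XElement)XNode.ReadFrom(reader);
                    bool match = issue.Elements("fieldvalue").Any(f =>
                        (string)f.Attribute("id") == "fld1" && f.Value == issueId);
                    if (match)
                        return issue;
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return null;
    }
}
```

In the button handler this could be called with a reader opened via File.OpenText(Server.MapPath(@"export_sample.xml")) and IDTextBox.Text as the ID; a null result means the ID was not found, and ToString() on the returned XElement gives the whole <issue> node.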