Search for a value in XML without loading it in memory


Question





Hi,

Is there a way to search for a particular value in an XML file without loading the whole file into memory? XmlDocument works fine for my requirement, but I want the file to be processed without loading it into memory, since the actual XML file might be 5 GB or larger.

I tried XmlReader as an alternative, following examples such as "How to Open Large XML files without Loading the XML Files?"

However, I can't work out the logic to traverse the XML nodes and search for a specific value.

Sample XML:

<backup>
  <project>
    <issues>
       <issue>
          <fieldvalue id="fld1">1</fieldvalue>
          <fieldvalue id="fld2">test01</fieldvalue>
          <fieldvalue id="fld3">some desc</fieldvalue>
       </issue>
       <issue>
          <fieldvalue id="fld1">2</fieldvalue>
          <fieldvalue id="fld2">test02</fieldvalue>
          <fieldvalue id="fld3">some desc</fieldvalue>
       </issue>
       <issue>
          <fieldvalue id="fld1">3</fieldvalue>
          <fieldvalue id="fld2">test03</fieldvalue>
          <fieldvalue id="fld3">some desc</fieldvalue>
       </issue>
       <issue>
          <fieldvalue id="fld1">4</fieldvalue>
          <fieldvalue id="fld2">test04</fieldvalue>
          <fieldvalue id="fld3">some desc</fieldvalue>
       </issue>
    </issues>
  </project>
</backup>





Here "fld1" is the ID of the issue. I want to search by ID. If the ID exists in the XML, I want to take the entire <issue> node for further processing.

And the code snippet:

// Using XmlDocument
protected void Button1_Click(object sender, EventArgs e)
{
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(Server.MapPath(@"export_sample.xml"));
    XmlNodeList addlst = xmlDoc.SelectNodes("backup/project/issues/issue/fieldvalue[@id='fld1']");
    foreach (XmlNode issueNode in addlst)
    {
        if (issueNode.InnerText == IDTextBox.Text)
        {
            Status.Text = issueNode.ParentNode.InnerXml;
            break;
        }
        else
        {
            Status.Text = "ID does not exist";
        }
    }
}
// Using XmlTextReader
protected void Button2_Click(object sender, EventArgs e)
{
    XmlTextReader myTextReader = new XmlTextReader(Server.MapPath(@"export_sample.xml"));
    myTextReader.WhitespaceHandling = WhitespaceHandling.None;
    while (myTextReader.Read())
    {
        currentIssueNode = "";
        if (myTextReader.NodeType == XmlNodeType.Element &&
            myTextReader.LocalName == "issue"
            && myTextReader.IsStartElement() == true)
        {
            currentIssueNode = myTextReader.ReadOuterXml();
            if (currentIssueNode.Contains("<fieldvalue id=\"fld1\">" + IDTextBox.Text + "</fieldvalue>"))
            {
                squishStatus.Text = "ID exist";
                currentIssueNode = "";
                myTextReader.Skip();
            }
            else
            {
                currentIssueNode = "";
                squishStatus.Text = "ID does not exist";
            }
        }
        myTextReader.MoveToContent();
    }
    myTextReader.Close();
}

Recommended answer

Please read my comment to the question.

I'd suggest using LINQ, but I need to warn you: if a portion of the data is huge, the performance of the code below might be unsatisfactory.

var qry = xDoc.Element("backup")
            .Descendants("project")
            .Descendants("issues")
            .Descendants("issue")
            .Where(x=>x.Element("fieldvalue").Attribute("id").Value=="fld1");





The LINQ query above returns <issue> nodes.
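
As written, the query only checks that the first fieldvalue of each issue has id "fld1", so on the sample XML it returns every issue rather than the one with a given ID. A minimal sketch of also filtering on the value might look like the following. It assumes xDoc is an XDocument (the answer does not show how it is created), and IssueQuery and FindIssueById are hypothetical names used only for illustration. Note that an XDocument holds the whole file in memory, so this does not address the 5 GB case.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class IssueQuery
{
    // Hypothetical helper: returns the first <issue> whose fld1 field
    // equals the requested id, or null when no issue matches.
    public static XElement FindIssueById(XDocument xDoc, string id)
    {
        return xDoc.Descendants("issue")
            .FirstOrDefault(issue => issue.Elements("fieldvalue")
                // The cast to string yields null (no exception) if the
                // id attribute is missing.
                .Any(f => (string)f.Attribute("id") == "fld1" && f.Value == id));
    }
}
```

With the sample XML loaded into xDoc, IssueQuery.FindIssueById(xDoc, IDTextBox.Text) would return the matching <issue> element, or null if the ID is not present.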


Don't use Maciej's approach: with an XML file of 5 GB or more, loading the whole document will almost certainly end in an OutOfMemoryException.

The second approach in your question, the one using XmlTextReader, is the right way to go. A forward-only reader is how you read XML without loading the entire document at once, but the code needs a few tweaks.

First, it's recommended to use the XmlReader.Create method instead of instantiating an XmlTextReader. Second, when you are positioned on an "issue" element, use ReadInnerXml instead of ReadOuterXml. Third, don't call Skip: ReadOuterXml and ReadInnerXml already move the reader past the element they read, so a following Skip discards the next element. Instead, continue reading the sibling "issue" elements.

Try this:
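protected void Button2_Click(object sender, EventArgs e)
{
    string currentIssueNode = null;
    XmlReaderSettings settings = new XmlReaderSettings() { IgnoreWhitespace = true };
    using (var reader = XmlReader.Create(Server.MapPath(@"export_sample.xml"), settings))
    {
        // Serialized form of the field we are looking for.
        string fieldvalue = string.Format("<fieldvalue id=\"fld1\">{0}</fieldvalue>", IDTextBox.Text);
        bool onIssue = reader.ReadToFollowing("issue");
        while (onIssue)
        {
            // ReadInnerXml returns the element's content and leaves the
            // reader on the node that follows </issue>.
            currentIssueNode = reader.ReadInnerXml();
            if (currentIssueNode.Contains(fieldvalue))
                break;
            currentIssueNode = null;
            // If the reader already sits on the next <issue>, stay there;
            // calling ReadToNextSibling at that point would skip over it.
            onIssue = (reader.NodeType == XmlNodeType.Element && reader.Name == "issue")
                      || reader.ReadToNextSibling("issue");
        }
    }
    if (!string.IsNullOrEmpty(currentIssueNode))
        Status.Text = currentIssueNode;
    else
        Status.Text = "ID does not exist";
}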
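
The snippet above matches on serialized XML text, which depends on details such as attribute quoting and character escaping in the ID. A sketch of an alternative that still streams the file is to let XmlReader walk to each "issue" element and use XNode.ReadFrom to materialize only that one element as an XElement, so the comparison is made on parsed values. StreamingIssueSearch and FindIssue are hypothetical names, and taking a Stream as input is an assumption made so the sketch works with either a file or an in-memory buffer.

```csharp
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class StreamingIssueSearch
{
    // Hypothetical helper: scans the XML forward-only and returns the first
    // <issue> whose fld1 field equals id, or null when none matches.
    // Only one <issue> element is held in memory at a time.
    public static XElement FindIssue(Stream xml, string id)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (var reader = XmlReader.Create(xml, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "issue")
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node after </issue>, so no extra Read() is needed.
                    var issue = (XElement)XNode.ReadFrom(reader);
                    bool match = issue.Elements("fieldvalue")
                        .Any(f => (string)f.Attribute("id") == "fld1" && f.Value == id);
                    if (match)
                        return issue;
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return null;
    }
}
```

In the page handler this could be called with a stream from File.OpenRead(Server.MapPath(@"export_sample.xml")), and the returned element's ToString() gives the entire <issue> node for further processing.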

