Copying & Appending an Element to an XML Document without buffering to RAM


Question

As the title suggests, I need to append log data to an XML file without buffering it to RAM. The XML file is made up of LogEntry elements, each of which contains 82 child elements holding data. These files can get quite large, and since this will form part of a Windows CE6 program, we have very limited memory.

Having done a fair amount of research, it's apparent that the most common methods are to use XDocument or LINQ to XML to read in the existing document before appending to it and writing out the new document. Using XmlWriter and XmlReader in concert seems to be the best way for me to append to the file, but all my attempts so far are hugely impractical and require if statements to direct what to write, in order to prevent duplicate or data-less elements from being written.

The essence of what I'm doing is:

//Create an XmlReader to read current WorkLog.
using (XmlReader xmlRead = XmlTextReader.Create("WorkLog.xml"))
{
   //Create a XmlWriterSettings and set indent 
   //to true to correctly format the document
   XmlWriterSettings writerSettings = new XmlWriterSettings();
   writerSettings.Indent = true;
   writerSettings.IndentChars = "\t";

   //Create a new XmlWriter to output to
   using (XmlWriter xmlWriter = XmlWriter.Create("New.xml", writerSettings))
   {
      //Starts the document
      xmlWriter.WriteStartDocument();

      //While the XmlReader is still reading (essentially !EOF)
      while (xmlRead.Read())
      {
         //FSM to direct writing of OLD Log data to new file
         switch (xmlRead.NodeType)
         {
            case XmlNodeType.Element:
               //Handle the copying of an element node
               //Contains many if statements to handle root node &  
               //attributes and to skip nodes that contain text
               break;
            case XmlNodeType.Text:
               //Handle the copying of an text node
               break;
            case XmlNodeType.EndElement: 
               //Handle the copying of an End Element node
               break;
         }
      }

      xmlWriter.WriteEndDocument();
   }
}



I'm confident I could append to the file this way, but it is highly impractical to do so. Does anyone know of any memory-efficient methods that my hours of searching haven't turned up?

I'm happy to post my current code for this if required, but as I mentioned it is extremely large and pretty nasty at the moment, so I'll leave it out for now.

Recommended answer

Your approach of using XmlReader is actually the way to go... but as you also say, it's very impractical.

So what's a reasonable hack?

The reason for this is that XML has a bunch of features you might encounter that require you to read the document from top to bottom. Normally XmlReader copes with these situations, leaving you with plain tags and so on. For example, given the following declarations:

<!ENTITY % pub    "&#xc9;ditions Gallimard" >
<!ENTITY   rights "All rights reserved" >
<!ENTITY   book   "La Peste: Albert Camus, &#xA9; 1947 %pub;. &rights;" >



then the replacement text for the entity "book" is:

La Peste: Albert Camus,
© 1947 Éditions Gallimard. &rights;

If you haven't read the ENTITY declarations, it's impossible to do the 'translation' to the correct XML. That said, fortunately not many people use these kinds of constructions, so it's okay to assume your XML doesn't use them to rewrite the root tag.

That said, the only valid way in XML to close a tag is to use </Foo> with optional whitespace before the trailing >. (See http://www.w3.org/TR/2008/REC-xml-20081126/#sec-starttags.) This basically means you can skip to the end, read enough data, and check whether it contains the end tag; if it does, you can insert your own content there. If not, seek back a bit and try again.

Nasty little encodings

The last thing to be aware of is the encoding of your file. While you can construct an XmlTextReader from a stream, the stream works in bytes; your reader detects the encoding and starts reading. Fortunately, XmlTextReader exposes the encoding through its Encoding property, so you can use that. Encoding is important because you might need more than 1 byte per character, especially when you encounter UTF-16 or UTF-32. The way to handle this is to convert your token to bytes and then do the matching on bytes.
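As a small illustration of why the token has to be converted with the file's own encoding, here is a minimal sketch. The class and method names (EndTagBytes, For) are made up for this example; only Encoding.GetBytes is a real framework call.

```csharp
using System;
using System.Text;

public static class EndTagBytes
{
    // Hypothetical helper: builds the byte pattern to search for,
    // e.g. "</root", in whatever encoding the file uses.
    // GetBytes does not emit a byte order mark, so the result is
    // just the encoded characters.
    public static byte[] For(string rootElement, Encoding encoding)
    {
        return encoding.GetBytes("</" + rootElement);
    }
}
```

For a root element named root, the pattern is 6 bytes in UTF-8 but 12 bytes in UTF-16 (Encoding.Unicode), so a search for the UTF-8 bytes would never match in a UTF-16 file.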

The root = trailer assumption

Since I don't really feel like checking for the whitespace and the trailing '>' (see the W3C link above), I also assume it's a valid XML file, which means that every opening tag is closed as well. This means you can simply check for </root, making the matching process a bit easier. (NOTE: you might even just check for the last </ in the file, but I prefer my code to be a bit more robust against incorrect XML.)

Putting it all together

Here goes... (I haven't tested it, but it should more or less work)

using System;
using System.IO;
using System.Text;
using System.Xml;

public bool FindAppendPoint(Stream stream)
{
    XmlTextReader xr = new XmlTextReader(stream);
    string rootElement = null;
    while (xr.Read())
    {
        if (xr.NodeType == XmlNodeType.Element)
        {
            rootElement = xr.Name;
            break;
        }
    }

    if (rootElement == null)
    {
        // Well, apparently there's no root... You can start a new file I suppose
        return false;
    }
    else
    {
        // The reader buffers its input, so stream.Position is not a reliable
        // "end of start tag" marker (for a small file it is already at the end).
        // Searching down to offset 0 is safe: "</" never occurs in a start tag.
        long start = 0;
        long len = stream.Length;
        long end = Math.Max(start, len - 1024); // offset where the scanned tail begins

        Encoding encoding = xr.Encoding;
        byte[] endTag = encoding.GetBytes("</" + rootElement);

        while (end >= start)
        {
            // Read everything from 'end' up to the end of the file
            byte[] data = new byte[len - end];
            stream.Seek(end, SeekOrigin.Begin);
            int total = 0;
            while (total < data.Length)
            {
                // Read may return fewer bytes than requested, so keep going
                int n = stream.Read(data, total, data.Length - total);
                if (n <= 0) break;
                total += n;
            }

            // Loop backwards till we find the end tag
            for (int i = total - endTag.Length; i >= 0; --i)
            {
                int j;
                for (j = 0; j < endTag.Length && endTag[j] == data[i + j]; ++j) { }
                if (j == endTag.Length)
                {
                    // We found a match! Position the stream on the '<' of the end tag.
                    stream.Seek(end + i, SeekOrigin.Begin);
                    // AppendXml is not shown here; it is expected to write the new
                    // content followed by the closing root tag, which it overwrites.
                    AppendXml(stream, encoding);
                    return true;
                }
            }

            // No match in this window (e.g. a lot of trailing whitespace)... oh well
            //
            // It's okay to skip back a bit, just have to make sure that we don't skip <0
            if (end == start)
            {
                end = start - 1; // end the loop
            }
            else
            {
                end = Math.Max(start, end - 1024);
            }
        }

        // Nope, no go.
        return false;
    }
}
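
The code above calls AppendXml without defining it. One possible shape for it is sketched below, under stated assumptions: the stream is writable, seekable, and already positioned on the '<' of the closing root tag. The extra parameters (rootElement, message), the class name LogAppender, and the single Message child element are hypothetical and only there to keep the example short. The entry is built in a StringBuilder first, which holds just one log entry in memory rather than the whole file.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class LogAppender
{
    // Hypothetical helper: overwrites the closing root tag with a new
    // LogEntry element, then writes the closing root tag again.
    public static void AppendXml(Stream stream, Encoding encoding,
                                 string rootElement, string message)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.ConformanceLevel = ConformanceLevel.Fragment; // a lone element, not a full document
        settings.OmitXmlDeclaration = true;                    // the file already has its declaration

        // Build only the new entry in memory; the existing file is never loaded.
        StringBuilder sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("LogEntry");
            writer.WriteElementString("Message", message);
            writer.WriteEndElement();
        }

        // Re-add the closing root tag that is about to be overwritten.
        sb.Append("</").Append(rootElement).Append(">");

        // Encode with the file's encoding; GetBytes adds no byte order mark.
        byte[] bytes = encoding.GetBytes(sb.ToString());
        stream.Write(bytes, 0, bytes.Length);

        // Drop any leftover bytes from the old tail of the file.
        stream.SetLength(stream.Position);
    }
}
```

With a file containing `<Log><LogEntry>...</LogEntry></Log>` and the stream positioned on `</Log>`, calling `LogAppender.AppendXml(stream, Encoding.UTF8, "Log", "b")` would leave the existing entries untouched and end the file with the new entry followed by `</Log>`. Anything that followed the old closing tag, such as a trailing newline, is discarded by the SetLength call.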
