使用 XmlReader 读取文件时更新 XLSX 文件更改 [英] Update XLSX file changes whilst reading the file with XmlReader

查看:35
本文介绍了使用 XmlReader 读取文件时更新 XLSX 文件更改的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们有一个代码将 Excel XLSX 文档加载到内存中,对其进行一些修改并将其保存回来.

We had a code which was loading the Excel XLSX document into the memory, doing some modifications with it and saving it back.

XmlDocument doc = new XmlDocument();
doc.Load(pp.GetStream());
XmlNode rootNode = doc.DocumentElement;

if (rootNode == null) return;
ProcessNode(rootNode);

if (this.fileModified)
{
    doc.Save(pp.GetStream(FileMode.Create, FileAccess.Write));
}

这对小文件很有效,但对于一些大 Excel 文件会抛出 OutOfMemory 异常.所以我们决定改变方法并使用 XmlReader 类不立即将文件加载到内存中.

This was working good with small files, but throwing OutOfMemory exceptions with some large Excel files. So we decided to change the approach and use XmlReader class to not load the file into the memory at once.

PackagePartCollection ppc = this.Package.GetParts();
foreach (PackagePart pp in ppc)
{
     if (!this.xmlContentTypesXlsx.Contains(pp.ContentType)) continue;

     using (XmlReader reader = XmlReader.Create(pp.GetStream()))
     {
          reader.MoveToContent();
          while (reader.EOF == false)
          {
             XmlDocument doc;
             XmlNode rootNode;
             if (reader.NodeType == XmlNodeType.Element && reader.Name == "hyperlinks")
             {
                   doc = new XmlDocument();
                   rootNode = doc.ReadNode(reader);
                   if (rootNode != null)
                   {
                        doc.AppendChild(rootNode);
                        ProcessNode(rootNode);  // how can I save updated changes back to the file?
                   }
              }
              else if (reader.NodeType == XmlNodeType.Element && reader.Name == "row")
              {
                    doc = new XmlDocument();
                    rootNode = doc.ReadNode(reader);

                    if (rootNode != null)
                    {
                        doc.AppendChild(rootNode);
                        ProcessNode(rootNode); // how can I save updated changes back to the file?
                    }
              }
              else
              {
                    reader.Read();
              }
          }
     }
}

这会逐个节点读取文件并处理我们需要的节点(并更改其中的一些值).但是,我不确定如何将这些值更新回原始 Excel 文件.我尝试将 XmlWriterXmlReader 一起使用,但无法使其工作.有什么想法吗?

This reads the file node by node and processes nodes we need (and changes some values there). However, I'm not sure how we can update those values back to the original Excel file. I tried to use XmlWriter together with the XmlReader, but was not able to make it work. Any ideas?

更新:

我尝试从评论部分使用@dbc 的建议,但对我来说似乎太慢了.对于大文件,它可能不会抛出 OutOfMemory 异常,但处理将花费很长时间.

I tried to use @dbc's suggestions from the comments section, but it seems too slow to me. It probably will not throw OutOfMemory exceptions for huge files, but processing will take forever.

PackagePartCollection ppc = this.Package.GetParts();
foreach (PackagePart pp in ppc)
{
     if (!this.xmlContentTypesXlsx.Contains(pp.ContentType)) continue;

     StringBuilder strBuilder = new StringBuilder();
     
     using (XmlReader reader = XmlReader.Create(pp.GetStream()))
     {
        using (XmlWriter writer = this.Package.FileOpenAccess == FileAccess.ReadWrite ? XmlWriter.Create(strBuilder) : null)
        {
          reader.MoveToContent();
          while (reader.EOF == false)
          {
             XmlDocument doc;
             XmlNode rootNode;
             if (reader.NodeType == XmlNodeType.Element && reader.Name == "hyperlinks")
             {
                   doc = new XmlDocument();
                   rootNode = doc.ReadNode(reader);
                   if (rootNode != null)
                   {
                        doc.AppendChild(rootNode);
                        ProcessNode(rootNode);
                        writer?.WriteRaw(rootNode.OuterXml);
                   }
              }
              else if (reader.NodeType == XmlNodeType.Element && reader.Name == "row")
              {
                    doc = new XmlDocument();
                    rootNode = doc.ReadNode(reader);

                    if (rootNode != null)
                    {
                        doc.AppendChild(rootNode);
                        ProcessNode(rootNode);
                        writer?.WriteRaw(rootNode.OuterXml);
                    }
              }
              else
              {
                    WriteShallowNode(writer, reader); // Used from the @dbc's suggested stackoverflow answers
                    reader.Read();
              }
            }

            writer?.Flush();
         }
      }
}

注意 1:我使用 StringBuilder 进行测试,但最终计划切换到临时文件.注意 2:我尝试在每 100 个元素后刷新 XmlWriter,但它仍然很慢.

NOTE 1: I'm using StringBuilder for the test, but was planning to switch to a temp file in the end. NOTE 2: I tried flushing the XmlWriter after every 100 elements, but it's still slow.

有什么想法吗?

推荐答案

我在 @dbc 的帮助下做了一些修改,现在它可以正常工作了.

I did some more modifications with @dbc's help and now it works as I wanted.

PackagePartCollection ppc = this.Package.GetParts();
foreach (PackagePart pp in ppc)
{
  try
  {
     if (!this.xmlContentTypesXlsx.Contains(pp.ContentType)) continue;

     string tempFilePath = GetTempFilePath();
     
     using (XmlReader reader = XmlReader.Create(pp.GetStream()))
     {
        using (XmlWriter writer = this.Package.FileOpenAccess == FileAccess.ReadWrite ? XmlWriter.Create(tempFilePath) : null)
        {
          while (reader.EOF == false)
          {
             if (reader.NodeType == XmlNodeType.Element && reader.Name == "hyperlinks")
             {
                   XmlDocument doc = new XmlDocument();
                   XmlNode rootNode = doc.ReadNode(reader);
                   if (rootNode != null)
                   {
                        ProcessNode(rootNode);
                        if (writer != null)
                        {
                            rootNode.WriteTo(writer);
                        }
                   }
              }
              else if (reader.NodeType == XmlNodeType.Element && reader.Name == "row")
              {
                    XmlDocument doc = new XmlDocument();
                    XmlNode rootNode = doc.ReadNode(reader);

                    if (rootNode != null)
                    {
                        ProcessNode(rootNode);
                        if (writer != null)
                        {
                            rootNode.WriteTo(writer);
                        }
                    }
              }
              else
              {
                    WriteShallowNode(writer, reader); // Used from the @dbc's suggested StackOverflow answers
                    reader.Read();
              }
            }
         }
      }


      if (this.packageChanged) // is being set in ProcessNode method
      {
          this.packageChanged = false;

          using (var tempFile = File.OpenRead(tempFilePath))
          {
               tempFile.CopyTo(pp.GetStream(FileMode.Create, FileAccess.Write));
          }
       }
   }
   catch (OutOfMemoryException)
   {
        throw;
   }
   catch (Exception ex)
   {
      Log.Exception(ex, @"Failed to process a file."); // our inner log method
   }
   finally
   {
       if (!string.IsNullOrWhiteSpace(tempFilePath))
       {
            // Delete temp file
       }
   }
}

这篇关于使用 XmlReader 读取文件时更新 XLSX 文件更改的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆