Word OpenXml: Word Found Unreadable Content


Problem description

We are trying to manipulate a Word document to remove a paragraph based on certain conditions. However, the Word file we produce is always corrupted. When we try to open it, Word reports the error:

Word found unreadable content

The code below corrupts the file, but if we remove the line:

Document document = mdp.Document;

then the file is saved and opens without issue. Is there an obvious issue that I am missing?

var readAllBytes = File.ReadAllBytes(@"C:\Original.docx");

using (var stream = new MemoryStream(readAllBytes))
{
    using (WordprocessingDocument wpd = WordprocessingDocument.Open(stream, true))
    {
        MainDocumentPart mdp = wpd.MainDocumentPart;
        Document document = mdp.Document;
    }
}

File.WriteAllBytes(@"C:\New.docx", readAllBytes);

UPDATE:

using (WordprocessingDocument wpd = WordprocessingDocument.Open(@"C:\Original.docx", true))
{
    MainDocumentPart mdp = wpd.MainDocumentPart;
    Document document = mdp.Document;

    document.Save();
}

When we run the code above on a physical file, we can still open Original.docx without the error, so the problem seems to be limited to modifying a stream.

Solution

Here's a method that reads a document into a MemoryStream:

public static MemoryStream ReadAllBytesToMemoryStream(string path)
{
    byte[] buffer = File.ReadAllBytes(path);
    var destStream = new MemoryStream(buffer.Length);
    destStream.Write(buffer, 0, buffer.Length);
    destStream.Seek(0, SeekOrigin.Begin);
    return destStream;
}

Note how the MemoryStream is instantiated: I am passing the capacity rather than the buffer (which is what your code does). Why does that matter?

When you use MemoryStream() or MemoryStream(int), you create a resizable MemoryStream instance, which is what you want if you make changes to your document. When you use MemoryStream(byte[]) (as your code does), the MemoryStream instance is not resizable. That is a problem unless you make no changes to your document, or your changes only ever make it shrink in size.
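The difference can be observed without the Open XML SDK at all. The following is a minimal sketch in plain .NET using top-level statements. TryGrow is a hypothetical helper written for illustration. It tries to write past the initial size of each kind of stream. The stream constructed over an existing array throws NotSupportedException, while the one constructed with a capacity simply grows.

```csharp
using System;
using System.IO;

// Hypothetical helper: appends `extra` zero bytes at the end of the stream.
// Returns false if the stream refuses to grow beyond its current capacity.
static bool TryGrow(MemoryStream stream, int extra)
{
    stream.Seek(0, SeekOrigin.End);
    try
    {
        stream.Write(new byte[extra], 0, extra);
        return true;
    }
    catch (NotSupportedException)
    {
        // Thrown when a non-expandable MemoryStream would have to grow.
        return false;
    }
}

byte[] original = new byte[16];

// Wraps `original` directly: fixed size, and writes land in that very array.
var fixedStream = new MemoryStream(original);

// Allocates its own buffer: the capacity is only a starting size.
var resizableStream = new MemoryStream(original.Length);
resizableStream.Write(original, 0, original.Length);

bool fixedGrew = TryGrow(fixedStream, 8);
bool resizableGrew = TryGrow(resizableStream, 8);

Console.WriteLine($"MemoryStream(byte[]) grew: {fixedGrew}");      // False
Console.WriteLine($"MemoryStream(int) grew: {resizableGrew}");     // True
Console.WriteLine($"Resizable length: {resizableStream.Length}");  // 24
```

A Word package that is re-saved larger than the original file hits exactly the first case.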

Now, to read a Word document into a MemoryStream, manipulate that Word document in memory, and end up with a consistent MemoryStream, you will have to do the following:

// Get a MemoryStream.
// In this example, the MemoryStream is created by reading a file stored
// in the file system. Depending on the Stream you "receive", it makes
// sense to copy the Stream to a MemoryStream before processing.
MemoryStream stream = ReadAllBytesToMemoryStream(@"C:\Original.docx");

// Open the Word document on the MemoryStream.
using (WordprocessingDocument wpd = WordprocessingDocument.Open(stream, true))
{
    MainDocumentPart mdp = wpd.MainDocumentPart;
    Document document = mdp.Document;
    // Manipulate document ...
}

// After having closed the WordprocessingDocument (by leaving the using statement),
// you can use the MemoryStream for whatever comes next, e.g., to write it to a
// file stored in the file system.
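// ToArray() returns exactly stream.Length bytes, whereas GetBuffer() returns the
// whole internal buffer, which can be longer than the document itself.
File.WriteAllBytes(@"C:\New.docx", stream.ToArray());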
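A .docx file is a ZIP package, so the same read, modify in memory, write back pattern can be sketched with System.IO.Compression.ZipArchive standing in for WordprocessingDocument. This is a substitute used only for illustration, not the Open XML SDK API. The helper names CopyToResizableMemoryStream and CreateSamplePackage are hypothetical. The sketch does the following:

- copies an arbitrary input Stream into a resizable MemoryStream;
- updates the package with leaveOpen: true;
- reads the bytes only after the archive has been disposed;
- uses ToArray() so that only stream.Length bytes are written out.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: copies any readable Stream into a new, resizable MemoryStream.
static MemoryStream CopyToResizableMemoryStream(Stream source)
{
    var destination = new MemoryStream(); // expandable, unlike new MemoryStream(byte[])
    source.CopyTo(destination);
    destination.Seek(0, SeekOrigin.Begin);
    return destination;
}

// Hypothetical helper: builds a tiny ZIP in memory to play the role of Original.docx.
static byte[] CreateSamplePackage()
{
    using var buffer = new MemoryStream();
    using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
    {
        ZipArchiveEntry entry = archive.CreateEntry("word/document.xml");
        using var writer = new StreamWriter(entry.Open(), Encoding.UTF8);
        writer.Write("<w:document/>");
    }
    return buffer.ToArray();
}

byte[] originalBytes = CreateSamplePackage();

using var source = new MemoryStream(originalBytes);
using MemoryStream stream = CopyToResizableMemoryStream(source);

// Open the package for update; leaveOpen keeps `stream` usable afterwards.
using (var package = new ZipArchive(stream, ZipArchiveMode.Update, leaveOpen: true))
{
    // Adding a part makes the package larger, so the stream must be able to grow.
    ZipArchiveEntry added = package.CreateEntry("word/comments.xml");
    using var writer = new StreamWriter(added.Open(), Encoding.UTF8);
    writer.Write("<w:comments/>");
}
// Disposing the archive is what writes the updated package into `stream`.

byte[] newBytes = stream.ToArray(); // exactly stream.Length bytes
// In a real program: File.WriteAllBytes(@"C:\New.docx", newBytes);

using var check = new ZipArchive(new MemoryStream(newBytes), ZipArchiveMode.Read);
Console.WriteLine(check.Entries.Count); // 2
```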

Note that you have to reset the stream.Position property by calling stream.Seek(0, SeekOrigin.Begin) whenever your next action depends on it (e.g., CopyTo or CopyToAsync). Right after you leave the using statement, the stream's position is equal to its length.
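The effect of the position can be reproduced with any MemoryStream. In the minimal sketch below, the initial Write only simulates the state a stream is left in after the package has been saved: content written, Position equal to Length. CopyTo reads from the current position onward, so it copies nothing until the stream is rewound.

```csharp
using System;
using System.IO;

using var stream = new MemoryStream();
// Simulated "after saving" state: content written, Position == Length.
stream.Write(new byte[] { 1, 2, 3, 4 }, 0, 4);

using var first = new MemoryStream();
stream.CopyTo(first);               // reads from Position (4) to the end
Console.WriteLine(first.Length);    // 0

stream.Seek(0, SeekOrigin.Begin);   // rewind before the next Position-based call
using var second = new MemoryStream();
stream.CopyTo(second);
Console.WriteLine(second.Length);   // 4
```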
