如何通过页码访问OpenXML内容? [英] How to access OpenXML content by page number?

查看:276
本文介绍了如何通过页码访问OpenXML内容?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用OpenXML,我可以按页码读取文档内容吗?

Using OpenXML, can I read the document content by page number?

wordDocument.MainDocumentPart.Document.Body提供完整文档的内容.

wordDocument.MainDocumentPart.Document.Body gives content of full document.

  public void OpenWordprocessingDocumentReadonly()
        {
            string filepath = @"C:\...\test.docx";
            // Open a WordprocessingDocument based on a filepath.
            using (WordprocessingDocument wordDocument =
                WordprocessingDocument.Open(filepath, false))
            {
                // Assign a reference to the existing document body.  
                Body body = wordDocument.MainDocumentPart.Document.Body;
                int pageCount = 0;
                if (wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text != null)
                {
                    pageCount = Convert.ToInt32(wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text);
                }
                for (int i = 1; i <= pageCount; i++)
                {
                    //Read the content by page number
                }
            }
        }

MSDN 参考

MSDN Reference

更新1:

看起来分页符设置如下

<w:p w:rsidR="003328B0" w:rsidRDefault="003328B0">
        <w:r>
            <w:br w:type="page" />
        </w:r>
    </w:p>

因此,现在我需要通过上述检查将XML拆分,并为每个检查取InnerTex,这将为我提供页面老虎钳文本.

So now I need to split the XML with above check and take InnerTex for each, that will give me page vise text.

现在的问题变成了如何通过上述检查拆分XML?

Now question becomes how can I split the XML with above check?

更新2:

仅当您有分页符时才设置分页符,但是如果文本从一页浮动到另一页,则没有设置分页符XML元素,因此它又回到了相同的挑战,即如何识别分页符

Page breaks are set only when you have page breaks, but if text is floating from one page to other pages, then there is no page break XML element is set, so it revert back to same challenge how o identify the page separations.

推荐答案

这就是我最终这样做的方式.

This is how I ended up doing it.

  public void OpenWordprocessingDocumentReadonly()
        {
            string filepath = @"C:\...\test.docx";
            // Open a WordprocessingDocument based on a filepath.
            Dictionary<int, string> pageviseContent = new Dictionary<int, string>();
            int pageCount = 0;
            using (WordprocessingDocument wordDocument =
                WordprocessingDocument.Open(filepath, false))
            {
                // Assign a reference to the existing document body.  
                Body body = wordDocument.MainDocumentPart.Document.Body;
                if (wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text != null)
                {
                    pageCount = Convert.ToInt32(wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text);
                }
                int i = 1;
                StringBuilder pageContentBuilder = new StringBuilder();
                foreach (var element in body.ChildElements)
                {
                    if (element.InnerXml.IndexOf("<w:br w:type=\"page\" />", StringComparison.OrdinalIgnoreCase) < 0)
                    {
                        pageContentBuilder.Append(element.InnerText);
                    }
                    else
                    {
                        pageviseContent.Add(i, pageContentBuilder.ToString());
                        i++;
                        pageContentBuilder = new StringBuilder();
                    }
                    if (body.LastChild == element && pageContentBuilder.Length > 0)
                    {
                        pageviseContent.Add(i, pageContentBuilder.ToString());
                    }
                }
            }
        }

缺点:这不适用于所有情况.仅当您有分页符时才有效,但是如果文本从第1页扩展到第2页,则没有标识符可知道您在第2页中.

这篇关于如何通过页码访问OpenXML内容?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆