How to get formatted content of a Word document?

Posted: 2019/6/17 18:52:45 · oxmlsdk


Question


Hello everyone,

Solution

Hi,

The document text is contained in the w:t (text) elements, which sit inside w:r runs within each w:p paragraph. To get the text, you could use the following sample code:

    using System;
    using System.IO;
    using System.Text;
    using System.Xml;
    using DocumentFormat.OpenXml.Packaging;

    class TextExtractor
    {
        // Returns the plain text of a .docx file, one line per w:p paragraph.
        public string Text(string fileName)
        {
            const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

            StringBuilder textBuilder = new StringBuilder();

            // Open the package read-only.
            using (WordprocessingDocument wdDoc = WordprocessingDocument.Open(fileName, false))
            {
                // Manage namespaces to perform XPath queries.
                NameTable nt = new NameTable();
                XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
                nsManager.AddNamespace("w", wordmlNamespace);

                // Get the main document part from the package and load its XML
                // into an XmlDocument instance.
                XmlDocument xdoc = new XmlDocument(nt);
                using (Stream partStream = wdDoc.MainDocumentPart.GetStream())
                {
                    xdoc.Load(partStream);
                }

                XmlNodeList paragraphNodes = xdoc.SelectNodes("//w:p", nsManager);
                foreach (XmlNode paragraphNode in paragraphNodes)
                {
                    // A paragraph's text may be split across several runs,
                    // so concatenate every w:t inside it.
                    XmlNodeList textNodes = paragraphNode.SelectNodes(".//w:t", nsManager);
                    foreach (XmlNode textNode in textNodes)
                    {
                        textBuilder.Append(textNode.InnerText);
                    }
                    textBuilder.Append(Environment.NewLine);
                }
            }

            return textBuilder.ToString();
        }
    }
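The code above returns plain text only, while the question asks for formatted content. In WordprocessingML, a run's direct formatting lives in the optional w:rPr element of each w:r run (for example w:b for bold and w:i for italic). The following is a minimal sketch, not part of the original answer, that walks runs instead of bare w:t nodes and marks bold and italic text. Assumptions: the class name FormattedTextExtractor, the SampleXml string and the **bold** / _italic_ markers are hypothetical and chosen only for illustration; it parses a hand-written WordprocessingML string so it runs with System.Xml alone. It reads only direct run formatting; anything inherited from styles (styles.xml) or paragraph properties is not resolved.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;

// Hypothetical sketch: paragraph text plus direct bold/italic run formatting.
public static class FormattedTextExtractor
{
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hand-written stand-in for the main document part (document.xml).
    public const string SampleXml =
        "<w:document xmlns:w=\"" + W + "\"><w:body>" +
        "<w:p>" +
        "<w:r><w:t xml:space=\"preserve\">Hello </w:t></w:r>" +
        "<w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>" +
        "<w:r><w:t xml:space=\"preserve\"> and </w:t></w:r>" +
        "<w:r><w:rPr><w:i/></w:rPr><w:t>italic</w:t></w:r>" +
        "</w:p>" +
        "<w:p><w:r><w:rPr><w:b w:val=\"0\"/></w:rPr><w:t>plain</w:t></w:r></w:p>" +
        "</w:body></w:document>";

    // Returns one string per w:p; bold runs become **text**, italic runs _text_.
    public static List<string> ExtractParagraphs(XmlDocument xdoc)
    {
        var ns = new XmlNamespaceManager(xdoc.NameTable);
        ns.AddNamespace("w", W);

        var paragraphs = new List<string>();
        foreach (XmlNode p in xdoc.SelectNodes("//w:p", ns))
        {
            var line = new StringBuilder();
            // Each w:r is a run; its optional w:rPr holds the run's direct formatting.
            foreach (XmlNode r in p.SelectNodes(".//w:r", ns))
            {
                var text = new StringBuilder();
                foreach (XmlNode t in r.SelectNodes("w:t", ns))
                    text.Append(t.InnerText);
                if (text.Length == 0) continue; // skip runs without text (breaks, tabs, ...)

                string s = text.ToString();
                if (IsOn(r.SelectSingleNode("w:rPr/w:i", ns))) s = "_" + s + "_";
                if (IsOn(r.SelectSingleNode("w:rPr/w:b", ns))) s = "**" + s + "**";
                line.Append(s);
            }
            paragraphs.Add(line.ToString());
        }
        return paragraphs;
    }

    // Toggle properties such as <w:b/> are on unless w:val is "0", "false" or "off".
    static bool IsOn(XmlNode prop)
    {
        if (prop == null) return false;
        string val = prop.Attributes?["val", W]?.Value;
        return val == null || (val != "0" && val != "false" && val != "off");
    }

    public static void Main()
    {
        var xdoc = new XmlDocument();
        xdoc.LoadXml(SampleXml);
        foreach (string paragraph in ExtractParagraphs(xdoc))
            Console.WriteLine(paragraph);
    }
}
```

Running it prints `Hello **bold** and _italic_` and then `plain`; the second paragraph shows that `<w:b w:val="0"/>` explicitly turns bold off. To use it on a real file, load `wdDoc.MainDocumentPart.GetStream()` into the XmlDocument as in the answer's code and pass that document to ExtractParagraphs.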
