How to grab text from a Word (docx) document in C#?

Views: 474 Published: 2020/5/21 18:41:18 Tags: xpath docx openxml wordprocessingml


Question

I'm trying to get the plain text from a Word document. Specifically, the XPath is giving me trouble. How do you select the <w:t> tags? Here's the code I have.

public static string TextDump(Package package)
{
    StringBuilder builder = new StringBuilder();

    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(package.GetPart(new Uri("/word/document.xml", UriKind.Relative)).GetStream());

    foreach (XmlNode node in xmlDoc.SelectNodes("/descendant::w:t"))
    {
        builder.AppendLine(node.InnerText);
    }
    return builder.ToString();
}

Recommended answer

Your problem is the XML namespaces. SelectNodes doesn't know how to resolve the w prefix in <w:t/> to the full namespace URI. Therefore, you need to use the overload that takes an XmlNamespaceManager as the second argument. I modified your code a bit, and it seems to work:

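    // Requires: using System; using System.IO; using System.IO.Packaging; using System.Text; using System.Xml;
    public static string TextDump(Package package)
    {
        StringBuilder builder = new StringBuilder();

        XmlDocument xmlDoc = new XmlDocument();
        PackagePart part = package.GetPart(new Uri("/word/document.xml", UriKind.Relative));
        // Dispose the part stream once the XML has been loaded.
        using (Stream stream = part.GetStream())
        {
            xmlDoc.Load(stream);
        }

        // Map the "w" prefix used in the XPath to the WordprocessingML namespace.
        XmlNamespaceManager mgr = new XmlNamespaceManager(xmlDoc.NameTable);
        mgr.AddNamespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

        foreach (XmlNode node in xmlDoc.SelectNodes("/descendant::w:t", mgr))
        {
            builder.AppendLine(node.InnerText);
        }
        return builder.ToString();
    }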
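
To try the namespace fix without a .docx file at hand, here is a minimal sketch that runs the same kind of query against an in-memory XML string. The class name NamespaceDemo, the helper ExtractText, and the sample markup are made up for illustration and are not part of the original answer. The sketch uses the shorthand "//w:t", which selects the same nodes as "/descendant::w:t".

```csharp
using System;
using System.Text;
using System.Xml;

public static class NamespaceDemo
{
    // WordprocessingML main namespace, the same URI registered in the answer.
    private const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: collects the text of every w:t element in an XML string.
    public static string ExtractText(string documentXml)
    {
        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(documentXml);

        // The prefix in the XPath is resolved through this manager,
        // not through the prefixes declared inside the document.
        var mgr = new XmlNamespaceManager(xmlDoc.NameTable);
        mgr.AddNamespace("w", W);

        var builder = new StringBuilder();
        foreach (XmlNode node in xmlDoc.SelectNodes("//w:t", mgr))
        {
            builder.AppendLine(node.InnerText);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // Made-up sample markup standing in for /word/document.xml.
        string sample =
            "<w:document xmlns:w=\"" + W + "\">" +
            "<w:body><w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t>World</w:t></w:r></w:p></w:body>" +
            "</w:document>";

        Console.Write(ExtractText(sample));
    }
}
```

Because each w:t run is appended on its own line, text that Word splits across several runs within one paragraph comes out on separate lines. If paragraph-level lines are wanted, one option is to select the w:p elements instead and append the InnerText of each.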
