How to get the formatted content of a Word document?
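Problem description
Hello everyone,
Solution
Hi,
The document text is contained in the w:t nodes of the main document part. To get the text, you could use the following sample code: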
using System;
using System.Text;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

class TextExtractor
{
    public string Text(string fileName)
    {
        const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        StringBuilder textBuilder = new StringBuilder();
        // Open the package read-only (second argument: isEditable = false).
        using (WordprocessingDocument wdDoc = WordprocessingDocument.Open(fileName, false))
        {
            // Manage namespaces to perform XPath queries.
            NameTable nt = new NameTable();
            XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
            nsManager.AddNamespace("w", wordmlNamespace);
            // Get the document part from the package.
            // Load the XML in the document part into an XmlDocument instance.
            XmlDocument xdoc = new XmlDocument(nt);
            xdoc.Load(wdDoc.MainDocumentPart.GetStream());
            // Select every paragraph (w:p) in the document.
            XmlNodeList paragraphNodes = xdoc.SelectNodes("//w:p", nsManager);
            foreach (XmlNode paragraphNode in paragraphNodes)
            {
                // Collect the text (w:t) nodes below the current paragraph.
                XmlNodeList textNodes = paragraphNode.SelectNodes(".//w:t", nsManager);
                foreach (XmlNode textNode in textNodes)
                {
                    textBuilder.Append(textNode.InnerText);
                }
                // End each paragraph with a line break.
                textBuilder.Append(Environment.NewLine);
            }
        }
        return textBuilder.ToString();
    }
}
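The sample above returns plain text only. Since the question asks about formatted content, here is a minimal sketch of how the same XPath approach could also report formatting that is applied directly to a run (w:r) through its run properties (w:rPr). The class name RunFormattingSketch, the inline sample XML, and the [b]...[/b] markers are assumptions made for illustration. The sketch checks only direct bold (w:b) and ignores formatting inherited from styles, so treat it as a starting point rather than a complete solution.

```csharp
using System;
using System.Text;
using System.Xml;

static class RunFormattingSketch
{
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: one paragraph with a plain run followed by a bold run.
    public const string SampleXml =
        "<w:document xmlns:w=\"" + W + "\"><w:body><w:p>" +
        "<w:r><w:t xml:space=\"preserve\">Hello </w:t></w:r>" +
        "<w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>" +
        "</w:p></w:body></w:document>";

    // Returns the text of all runs, wrapping directly bolded runs in [b]...[/b].
    public static string Describe(string xml)
    {
        var xdoc = new XmlDocument();
        xdoc.LoadXml(xml);
        var ns = new XmlNamespaceManager(xdoc.NameTable);
        ns.AddNamespace("w", W);

        var sb = new StringBuilder();
        foreach (XmlNode run in xdoc.SelectNodes("//w:p/w:r", ns))
        {
            // A w:b element switches bold on unless its w:val says otherwise.
            XmlNode b = run.SelectSingleNode("w:rPr/w:b", ns);
            string val = b?.Attributes?["val", W]?.Value;
            bool bold = b != null && val != "false" && val != "0" && val != "off";

            foreach (XmlNode t in run.SelectNodes("w:t", ns))
            {
                sb.Append(bold ? "[b]" + t.InnerText + "[/b]" : t.InnerText);
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Describe(SampleXml)); // prints "Hello [b]world[/b]"
    }
}
```

To run this against a real file, the XML string could be replaced by the stream from wdDoc.MainDocumentPart.GetStream(), as in the sample above. Other run properties such as w:i (italic) or w:u (underline) could be tested in the same way.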