How can I grab each page of text in a Word doc separately (using .NET)?

Question

I need to determine which pages of a Word document a keyword occurs on. I have some tools that can get me the text of the document, but nothing that tells me which pages the text occurs on. Does anyone have a good starting place for me? I'm using .NET.

Thanks!

edit: Additional constraint: I can't use any of the Interop stuff.

edit2: If anybody knows of stable libraries that can do this, that'd also be helpful. I use Aspose, but as far as I know that doesn't have anything.

Recommended answer

This is how I get the text out. I believe you can set the selection range to a single page and then test that text for the keyword. It might be a little backwards from what you need, but it could be a place to start.

Microsoft.Office.Interop.Word.Application wordApplication = new Microsoft.Office.Interop.Word.Application();
object missing = Type.Missing;
object fileName = @"c:\file.doc";
object objFalse = false;

wordApplication.DisplayAlerts = Microsoft.Office.Interop.Word.WdAlertLevel.wdAlertsNone;
Microsoft.Office.Interop.Word.Document doc = wordApplication.Documents.Open(ref fileName, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,ref objFalse, ref missing, ref missing, ref missing, ref missing);

// I believe you can define a selection range for a single page here instead of selecting the whole story
doc.ActiveWindow.Selection.WholeStory();
doc.ActiveWindow.Selection.Copy();

// Clipboard and DataFormats come from System.Windows.Forms; the calling thread must be STA
IDataObject data = Clipboard.GetDataObject();
string text = data.GetData(DataFormats.Text).ToString();

doc.Close(ref missing, ref missing, ref missing);
doc = null;

wordApplication.Quit(ref missing, ref missing, ref missing);
wordApplication = null;
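
The answer suggests testing the text page by page. A rough alternative sketch, still relying on Interop (so it does not satisfy the asker's no-Interop constraint), is to search the document for the keyword and ask Word which page each match ends on. The class and method names below (KeywordPageFinder, FindPages) are hypothetical and not part of the original answer; the sketch assumes the document has already been opened as in the code above and that Word can paginate it.

```csharp
using System;
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class KeywordPageFinder
{
    // Hypothetical helper: returns the 1-based page numbers on which
    // `keyword` occurs in an already opened document.
    public static SortedSet<int> FindPages(Word.Document doc, string keyword)
    {
        var pages = new SortedSet<int>();
        object findText = keyword;
        object missing = Type.Missing;

        // Search the main text story from the start, without wrapping around.
        Word.Range range = doc.Content;
        range.Find.ClearFormatting();
        range.Find.Forward = true;
        range.Find.Wrap = Word.WdFindWrap.wdFindStop;

        // Each successful Execute redefines `range` to cover the next match.
        while (range.Find.Execute(
            ref findText, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing))
        {
            // Page number on which the end of the current match falls.
            object page = range.get_Information(
                Word.WdInformation.wdActiveEndPageNumber);
            pages.Add(Convert.ToInt32(page));
        }

        return pages;
    }
}
```

Calling KeywordPageFinder.FindPages(doc, "keyword") before doc.Close would give the set of pages to report. Page numbers depend on how Word lays the document out on the current machine (fonts, printer settings), so results may differ between environments.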
