Is there a way to read a Word document line by line?

Views: 114  Posted: 2016/8/28 15:24:24  c# ms-word


Question

I am trying to extract all the words in a Word document. I am able to do it all in one go as follows...

using System;
using System.Collections.Generic;
using System.Linq;
using Word = Microsoft.Office.Interop.Word;

var charLocation = new Dictionary<int, string>(); // assumed type: column index -> substring

Word.Application word = new Word.Application();
Word.Document doc = word.Documents.Open(@"C:\SampleText.doc");
doc.Activate();

foreach (Word.Range docRange in doc.Words) // loads all words in document
{
    string text = docRange.Text; // read once: every .Text access is a COM call
    IEnumerable<string> sortedSubstrings = Enumerable.Range(0, text.Trim().Length)
        .Select(i => text.Substring(i))
        .OrderBy(s => s.Length < 3 ? s : s.Remove(2, Math.Min(s.Length - 2, 2)));

    int wordPosition = (int)docRange.get_Information(
        Word.WdInformation.wdFirstCharacterColumnNumber);

    foreach (var substring in sortedSubstrings)
    {
        int index = text.IndexOf(substring) + wordPosition;
        charLocation[index] = substring;
    }
}

However, I would prefer to load the document one line at a time. Is it possible to do so?

I can load it by paragraph; however, I am unable to iterate through each paragraph to extract all of its words.

foreach (Word.Paragraph para in doc.Paragraphs)
{
    foreach (Word.Range docRange in para) // Error: foreach cannot operate on Word.Paragraph (it is not enumerable)
    {
        IEnumerable<string> sortedSubstrings = Enumerable.Range(0, docRange.Text.Trim().Length)
            .Select(i => docRange.Text.Substring(i))
            .OrderBy(s => s.Length < 3 ? s : s.Remove(2, Math.Min(s.Length - 2, 2)));

        wordPosition =
            (int)
            docRange.get_Information(
                Microsoft.Office.Interop.Word.WdInformation.wdFirstCharacterColumnNumber);

        foreach (var substring in sortedSubstrings)
        {
            index = docRange.Text.IndexOf(substring) + wordPosition;
            charLocation[index] = substring;
        }

    }
}

Solution

I would suggest following the code on this page here

The crux of it is that you open the document with a Word.ApplicationClass (Microsoft.Office.Interop.Word) object, although where the author of that page gets the "Doc" object from is beyond me. I would assume you create it with the ApplicationClass.

EDIT: The document is retrieved by calling this:

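object file = @"C:\SampleText.doc";                     // path taken from the question
object nullobj = System.Reflection.Missing.Value;       // stands in for each omitted optional argument
Word.Application wordApp = new Word.Application();      // or new Word.ApplicationClass() if interop types are not embedded
Word.Document doc = wordApp.Documents.Open(ref file, ref nullobj, ref nullobj,
                                      ref nullobj, ref nullobj, ref nullobj,
                                      ref nullobj, ref nullobj, ref nullobj,
                                      ref nullobj, ref nullobj, ref nullobj);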

Sadly, the code on the page I linked is not formatted in a way that is easy to read.

EDIT2: From there you can loop through the document's paragraphs; however, as far as I can see, there is no way to loop through lines. I would suggest using some pattern matching to find line breaks.
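A minimal sketch of the pattern-matching idea, assuming the input is a paragraph's Range.Text and that Word encodes a manual line break (Shift+Enter) as '\v' (char 11) and a paragraph mark as '\r' (char 13). SplitOnBreaks is a hypothetical helper, not part of the Word API. It can only find explicit breaks: lines produced by Word's automatic wrapping are not characters in the text, so they are invisible to it.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: treats any run of line-break characters as one separator.
// Assumption: Range.Text uses '\v' for manual line breaks and '\r' for paragraph marks.
static string[] SplitOnBreaks(string text) =>
    Regex.Split(text, @"[\v\r\n]+")      // in .NET regex, \v matches U+000B
         .Where(part => part.Length > 0) // drop the empty piece left by a trailing '\r'
         .ToArray();

string sample = "Line one\vLine two\rLine three\r";
foreach (string line in SplitOnBreaks(sample))
    Console.WriteLine(line);
```

Because of the `+` quantifier, consecutive breaks collapse into a single separator, so blank lines are not returned.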

To extract the text from a paragraph, use Word.Paragraph.Range.Text, which returns all the text inside the paragraph. Then search that string for line-break characters; I'd use string.IndexOf().
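The IndexOf approach can be sketched without Word at all, since it only works on the string. This version swaps in string.IndexOfAny so a single scan handles several break characters, assuming Word uses '\v' for manual line breaks and '\r' for paragraph marks. SplitIntoLines is a hypothetical helper; in a real program you would call it with para.Range.Text for each Word.Paragraph in doc.Paragraphs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits one paragraph's text into lines by scanning
// for break characters with string.IndexOfAny.
static List<string> SplitIntoLines(string paragraphText)
{
    // Assumption: '\v' (char 11) = manual line break, '\r' (char 13) = paragraph mark.
    char[] breaks = { '\v', '\r', '\n' };
    var lines = new List<string>();
    int start = 0;
    while (start < paragraphText.Length)
    {
        int end = paragraphText.IndexOfAny(breaks, start);
        if (end < 0) end = paragraphText.Length;           // no more breaks: the rest is the last line
        if (end > start)                                    // skip empty lines between adjacent breaks
            lines.Add(paragraphText.Substring(start, end - start));
        start = end + 1;                                    // step past the break character
    }
    return lines;
}

foreach (string line in SplitIntoLines("First line\vSecond line\r"))
    Console.WriteLine(line);
```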

Alternatively, if by "lines" you mean sentences, you can simply iterate through Range.Sentences to extract one sentence at a time.
