How to read text from a Word file, excluding table data?


Problem description

Hi,

In my application I have to read the content of a Word (DOCX) file line by line, excluding other objects such as tables and charts. With the code below I can read the content, but it also includes the text from tables.

private void GetParaDetail(Word.Document doc)
{
    foreach (Word.Paragraph para in doc.Paragraphs)
    {
        string temp = para.Range.Text.Trim();
    }
}

I uploaded a sample file to this location (https://1drv.ms/w/s!Ah-Jh2Ok5SuHcCKzdzlY6etFDv8). Running the code above on that file gives me the following paragraphs, in order:

1111111111111
2222222222222
3333333333333
4444444444444
5555555555555
.
.
.
.
kkkkkkkkkkk

But I need only the text below. I searched a lot but didn't find any helpful information; every result just refers to the code above.

1111111111111
2222222222222
kkkkkkkkkkk

Solution

The simplest method might be to delete all shapes, inline shapes and tables from the document. Instead of deleting tables, though, you might consider converting them to text. Once you've deleted or converted that content, you can read the whole document in one pass. In VBA that could be as simple as:

Sub Demo()
With ActiveDocument
  Do While .InlineShapes.Count > 0
    .InlineShapes(1).Delete
  Loop
  Do While .Shapes.Count > 0
    .Shapes(1).Delete
  Loop
  Do While .Tables.Count > 0
    .Tables(1).Delete
  Loop
End With
End Sub

I'll leave it to you to do the C# implementation.
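
For readers who would rather not modify the document, here is a rough C# sketch of a different, non-destructive technique than the deletion loops above: it walks the paragraphs as in the question and skips any paragraph whose range reports wdWithInTable. The class and method names (WordTextReader, GetParagraphsOutsideTables) are made up for illustration, and the sketch assumes C# 4.0 or later with a reference to Microsoft.Office.Interop.Word. It has not been run against the sample file.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class WordTextReader
{
    // Hypothetical helper: collects the text of every paragraph
    // that does not lie inside a table.
    public static List<string> GetParagraphsOutsideTables(Word.Document doc)
    {
        var lines = new List<string>();

        foreach (Word.Paragraph para in doc.Paragraphs)
        {
            Word.Range range = para.Range;

            // Range.Information returns an object; for wdWithInTable
            // it is a boolean that is true inside a table cell.
            bool inTable = (bool)range.Information[Word.WdInformation.wdWithInTable];
            if (inTable)
            {
                continue;
            }

            // Trim the paragraph mark and drop paragraphs that are empty.
            string text = (range.Text ?? string.Empty).Trim();
            if (text.Length > 0)
            {
                lines.Add(text);
            }
        }

        return lines;
    }
}
```

Because nothing is deleted, the document can be left open or closed without saving. Note that this only filters out table content; an inline picture or chart sitting in a normal paragraph may still contribute a placeholder character to Range.Text, so the deletion approach above (or an extra check on para.Range.InlineShapes.Count) may still be needed for those.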

