C#.net: reading a Word document


Question


I want to read a Word document and split it into individual words, with no special characters and no numbers in the extracted words.

The extracted words are stored in a database.

Recommended answer

The basic steps would be:

1) Extract the document text using the Word API (COM based). There are numerous articles on this on the web, including on MSDN.

2) Split the text into words. You can use String.Split or Regex for this.

3) Iterate through the words and remove those containing characters you don't want. LINQ works for this too. Search online for samples.
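Steps 2 and 3 can be sketched with Regex and LINQ. This is a minimal sketch under stated assumptions: a "word" is taken to be a run of letters or digits, and any word that contains a digit is dropped entirely (step 3). The names WordSplitter and ExtractWords are hypothetical. Apostrophes and hyphens count as separators here, so "don't" becomes "don" and "t"; adjust the pattern if that matters for your data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating steps 2 and 3 of the answer.
public static class WordSplitter
{
    // Splits text into words and keeps only words made entirely of letters,
    // so tokens such as "2011" or "abc123" are dropped.
    public static List<string> ExtractWords(string text)
    {
        return Regex.Split(text, @"[^\p{L}\p{N}]+")  // step 2: split on anything that is not a letter or digit
                    .Where(w => w.Length > 0)         // Regex.Split yields empty strings at the edges
                    .Where(w => w.All(char.IsLetter)) // step 3: drop words containing digits
                    .ToList();
    }
}
```

For example, `WordSplitter.ExtractWords("Hello, world! Version 2.0 shipped in 2011 (abc123).")` returns Hello, world, Version, shipped and in. The resulting list is what you would then insert into the database.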


See these links:

http://www.c-sharpcorner.com/UploadFile/Globalking/fileAccessingusingcsharp02242006050050AM/fileAccessingusingcsharp02.aspx

http://www.c-sharpcorner.com/UploadFile/Globalking/fileAccessingusingcsharp02242006050207AM/fileAccessingusingcsharp.aspx
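Word COM automation needs Word installed on the machine that runs the code. For .docx files, one swapped-in alternative (not what the answers above describe) is to read the package directly: a .docx is a zip archive whose body text lives in word/document.xml. The sketch below assumes a .docx input (not the older binary .doc format) and uses only System.IO.Compression and LINQ to XML; the class name DocxText is hypothetical, and headers, footers, footnotes and text boxes are not handled specially.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: extracts body text from a .docx without Word installed.
public static class DocxText
{
    // WordprocessingML namespace used by word/document.xml in a .docx package
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of a .docx file, one line per paragraph.
    public static string ReadText(string path)
    {
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("Not a .docx file: word/document.xml is missing.");

            using (Stream stream = entry.Open())
            {
                XDocument doc = XDocument.Load(stream);
                // Each <w:p> is a paragraph; its text is split across one or more <w:t> runs.
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                return string.Join("\n", paragraphs);
            }
        }
    }
}
```

The returned string can be passed straight to the word-splitting step.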


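// Requires: using System.IO; using System.Text;
using (TextReader trs = new StreamReader(@"D:\path.txt"))
{
    StringBuilder cleaned = new StringBuilder();   // collects the cleaned text
    string s = trs.ReadLine();
    while (s != null)
    {
        StringBuilder x = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] != '.') // add the other special characters you want to remove here
                x.Append(s[i]);
        }
        cleaned.AppendLine(x.ToString()); // keep the cleaned line instead of discarding it
        s = trs.ReadLine();
    }
    // cleaned.ToString() now holds the file text without the removed characters
}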





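Try this, replacing the path with your own. Note that StreamReader reads plain text: pointed at a .doc or .docx file it returns unreadable content, so the document's text has to be extracted first (for example with the Word API from step 1).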
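The same character filter can be written more compactly with File.ReadLines and LINQ. This is an alternative sketch, not the answer's original code; the class name LineCleaner and the set of unwanted characters are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: a LINQ version of the character-removal loop above.
public static class LineCleaner
{
    // Removes every character in `unwanted` from a single line.
    public static string Clean(string line, ISet<char> unwanted)
    {
        return new string(line.Where(c => !unwanted.Contains(c)).ToArray());
    }

    // Streams a text file and yields each line with the unwanted characters removed.
    // File.ReadLines reads lazily, so a large file is not loaded into memory at once.
    public static IEnumerable<string> CleanFile(string path, ISet<char> unwanted)
    {
        return File.ReadLines(path).Select(line => Clean(line, unwanted));
    }
}
```

For example, with `var unwanted = new HashSet<char>(".,!0123456789");`, `LineCleaner.Clean("Hello, world 42.", unwanted)` returns "Hello world " (the space before the removed digits stays). Using a HashSet keeps each lookup cheap when the list of unwanted characters grows.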

