Split a doc file using C#


Question






I have a doc file that contains many articles, and each article is separated from the next by four Enters (four line breaks)...
I want to separate the articles. I'm using this code:


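// Assumes the usual interop alias: using Word = Microsoft.Office.Interop.Word;
string[] separator = new string[] { "\n\n\n\n" };   // four consecutive line breaks
foreach (Word.Range range in doc.StoryRanges)
{
    // Splits the RichTextBox's text (not range.Text) on the separator
    string[] files = richTextBox1.Text.Split(separator, StringSplitOptions.None);
}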








But it's not working...
Please help me with this.

Recommended answer









"I first loaded the whole file into a RichTextBox and then used the same code on the RichTextBox, and now the code works... I still don't get why it works for the RichTextBox but not when I access the file directly. Can you please explain it to me?"


Pretty simple: the Text property of a RichTextBox returns the content without any formatting information. A raw DOC file is different: it contains formatting information, and very likely does not use a single '\n' on its own to indicate a line break, or "\n\n\n\n" to indicate four in a row.
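To make the line-break point concrete, here is a minimal sketch (not the answerer's code) that splits already-extracted text into articles regardless of which line-break style it uses. It assumes the text is available as a string, for example from Word's Range.Text, where a paragraph mark appears as '\r', or from a WinForms RichTextBox's Text, which uses '\n'. The helper name SplitArticles is hypothetical. It normalizes every break to '\n' before splitting, so one separator works for both sources.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Example input: Word-style paragraph marks ('\r') mixed with Windows "\r\n" endings.
string sample = "First article\r\r\r\rSecond article\r\n\r\n\r\n\r\nThird article";

foreach (string article in SplitArticles(sample))
{
    Console.WriteLine($"[{article}]");
}

// Hypothetical helper: splits text into articles separated by four or more
// consecutive line breaks, whatever line-break style the source uses.
static string[] SplitArticles(string text)
{
    // Normalize "\r\n" and lone '\r' (Word's paragraph mark) to '\n'.
    string normalized = text.Replace("\r\n", "\n").Replace('\r', '\n');

    // Split on runs of 4+ newlines, then trim each piece and drop empty ones.
    return Regex.Split(normalized, @"\n{4,}")
                .Select(piece => piece.Trim())
                .Where(piece => piece.Length > 0)
                .ToArray();
}
```

Splitting on four or more breaks (rather than exactly four, as the original "\n\n\n\n" separator does) and trimming each piece keeps stray extra blank lines from producing empty or whitespace-prefixed articles.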

There is an easy way to check what the file actually uses: load it in a text editor that doesn't interpret the formatting codes (Notepad, or I use PSPad) and look at it. Find the end of the first article and see what is there. If the file contains binary data, a hex editor may be more helpful than a text editor.
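If the text has already been pulled out of the document as a string, a similar check can be done in C# by making the invisible characters visible. This is a sketch, not part of the original answer: ShowControlChars is a hypothetical name, and the sample string assumes Word's conventions ('\r' for a paragraph mark, char 11 for a manual line break).

```csharp
using System;
using System.Text;

// Sample text as Word might return it: '\r' ends a paragraph, '\v' (char 11) is a manual line break.
string extracted = "End of article one.\r\r\r\rStart of article two.\vSame paragraph.";
Console.WriteLine(ShowControlChars(extracted));

// Hypothetical helper: renders control characters as escapes or hex codes,
// similar to what a raw-text or hex editor would reveal at the end of an article.
static string ShowControlChars(string text)
{
    var sb = new StringBuilder();
    foreach (char c in text)
    {
        switch (c)
        {
            case '\r': sb.Append("\\r"); break;
            case '\n': sb.Append("\\n"); break;
            case '\t': sb.Append("\\t"); break;
            default:
                // Other control characters (e.g. char 11 or 12) are shown as hex codes.
                if (char.IsControl(c)) sb.Append($"<0x{(int)c:X2}>");
                else sb.Append(c);
                break;
        }
    }
    return sb.ToString();
}
```

Seeing "\r\r\r\r" rather than "\n\n\n\n" at an article boundary would explain why a "\n"-based separator finds nothing in the raw document text.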


