Split a DOC file using C#
Problem description
I have a DOC file that contains many articles, and each article is separated from the next by four line breaks (four presses of Enter).
I want to separate the articles, and I'm using this code:
string[] separator = new string[] { "\n\n\n\n" };
foreach (Word.Range range in doc.StoryRanges)
{
    string[] files = richTextBox1.Text.Split(separator, StringSplitOptions.None);
}
But it's not working.
Please help me with this.
Recommended answer
"I first loaded the whole file into a RichTextBox and then used the same code on the RichTextBox, and the code is working now. I still don't get why it works for the RichTextBox but not when I access the file directly. Can you please explain it to me?"
Pretty simple: the Text property of a RichTextBox returns the content without any formatting information. A raw DOC file is different: it contains formatting information as well as the text, and very likely does not use a single '\n' on its own to indicate a line break, or "\n\n\n\n" to indicate four in a sequence.
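To illustrate the point about line breaks, here is a minimal sketch of a splitter that does not depend on which line-break character the text source uses. It assumes you already have the plain text as a string (from RichTextBox.Text or, for example, from a Word interop Range.Text, which as far as I know marks paragraphs with '\r' rather than '\n'). The class and method names (ArticleSplitter, SplitArticles) are hypothetical and not part of the original question.

```csharp
using System;

public static class ArticleSplitter
{
    // Hypothetical helper: splits plain text into articles wherever
    // four consecutive line breaks occur, whatever style of line
    // break ("\r\n", "\r" or "\n") the text source produced.
    public static string[] SplitArticles(string text)
    {
        // Normalize every line-break style to a single '\n' first,
        // so that one separator string matches all of them.
        string normalized = text.Replace("\r\n", "\n").Replace('\r', '\n');

        // Split on four line breaks and drop empty pieces, such as the
        // one left behind by a separator at the very end of the text.
        return normalized.Split(
            new string[] { "\n\n\n\n" },
            StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With this in place, something like ArticleSplitter.SplitArticles(richTextBox1.Text) should return one string per article. Normalizing first is a design choice: it keeps the separator in one place instead of testing each line-break style separately.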
There is an easy way to check: load your file in a text editor that doesn't interpret the formatting codes (Notepad, or I use PSPad) and look at it. Find the end of the first article, and see what is there. If the file content contains binary data, it may help to use a hex editor instead of a text editor.
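If you would rather do the same check from code, the sketch below reads the first bytes of the file and prints them as hex. It assumes the file is a binary Word 97-2003 .doc, which is stored as an OLE compound file and therefore starts with the bytes D0 CF 11 E0 A1 B1 1A E1. A file with a .doc extension may actually be something else (RTF, for instance), so treat the signature check as a hint only. The names DocInspector, ToHex, LooksLikeCompoundFile and ReadHead are hypothetical.

```csharp
using System;
using System.IO;

public static class DocInspector
{
    // Signature at the start of every OLE compound file, the container
    // format used by binary .doc files.
    private static readonly byte[] CompoundFileSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Formats bytes as space-separated hex pairs, e.g. "0D 0A".
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    // Returns true when the given bytes begin with the compound file signature.
    public static bool LooksLikeCompoundFile(byte[] header)
    {
        if (header.Length < CompoundFileSignature.Length)
        {
            return false;
        }
        for (int i = 0; i < CompoundFileSignature.Length; i++)
        {
            if (header[i] != CompoundFileSignature[i])
            {
                return false;
            }
        }
        return true;
    }

    // Reads up to 'count' bytes from the start of the file at 'path'.
    public static byte[] ReadHead(string path, int count)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] buffer = new byte[count];
            int read = stream.Read(buffer, 0, count);
            Array.Resize(ref buffer, read);
            return buffer;
        }
    }
}
```

For example, Console.WriteLine(DocInspector.ToHex(DocInspector.ReadHead(path, 16))) prints the first 16 bytes of the file at path. If they match the signature, the file is a binary container, and splitting its raw contents on "\n\n\n\n" is not expected to work.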