Encoding doc and docx documents
Problem description
Hi all,
I'm trying to read a Word document in code and replace some expressions using a Regex.
My document is not in English, so I can't read it properly and apply my Regex pattern.
When I do this on a plain text file (.txt) it works fine, but not with MS Word documents.
I tried reading the document with many different encodings, without success.
Can someone recommend the best way to read (Hebrew) MS Word documents in C#?
My code:
string text = System.IO.File.ReadAllText(TemplatePath, Encoding.Unicode);
MatchCollection coll = Regex.Matches(text, "\\[\\[.*?\\]\\]", RegexOptions.RightToLeft);
Even though this expression appears in my text, the Regex cannot find it, because the text it receives is wrong.
Thanks a lot!
Recommended answer
Hi shira co_,
Thank you for posting here.
>>Can someone recommend the best way to read (Hebrew) MS Word documents in C#?
You can try using StreamReader(String, Encoding) to read the Word document.
This constructor initializes a new instance of the StreamReader class for the specified file name, using the specified character encoding.
Please refer to the following code:
// Create a StreamReader that decodes the file as ASCII, with
// byte order mark detection disabled and a 512-byte buffer.
StreamReader srAsciiFromFileFalse512 =
    new StreamReader("C:\\Temp\\Test.txt",
        System.Text.Encoding.ASCII, false, 512);
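One caveat worth keeping in mind: a .docx file is a ZIP package of XML parts, not plain text, so choosing a different text encoding may still not yield readable content. Below is a minimal sketch, assuming the input is a .docx (not a legacy binary .doc) and that only the main body text in word/document.xml is needed. The names DocxTextSketch, ReadDocxText and FindPlaceholders are hypothetical and used only for illustration. The sketch leaves out RegexOptions.RightToLeft, because that option controls the direction in which the input string is scanned, not the script of the text.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper class, for illustration only.
public static class DocxTextSketch
{
    // WordprocessingML main namespace used by word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text of a .docx file, one line per paragraph.
    public static string ReadDocxText(string path)
    {
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found");

            using (Stream stream = entry.Open())
            {
                // The XML declaration carries the encoding, so Hebrew
                // text is decoded without guessing a code page.
                XDocument doc = XDocument.Load(stream);
                var sb = new StringBuilder();
                foreach (XElement paragraph in doc.Descendants(W + "p"))
                {
                    // Word may split one phrase across several runs;
                    // joining the w:t elements restores it.
                    foreach (XElement t in paragraph.Descendants(W + "t"))
                        sb.Append(t.Value);
                    sb.AppendLine();
                }
                return sb.ToString();
            }
        }
    }

    // Finds [[...]] placeholders, the same pattern as in the question.
    public static List<string> FindPlaceholders(string text)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\[\[.*?\]\]"))
            result.Add(m.Value);
        return result;
    }
}
```

A call such as DocxTextSketch.FindPlaceholders(DocxTextSketch.ReadDocxText(TemplatePath)) would then list the placeholders. On .NET Framework the project needs references to System.IO.Compression and System.IO.Compression.FileSystem. This sketch only reads text; writing replacements back into the document is a separate step and is not covered here.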
I hope this reply is helpful to you.
Best regards,
Hart