Encoding doc and docx documents

Question

Hi all,

I'm trying to read a Word document in code and replace certain expressions using a regex.

My document is not in English, so I can't read it correctly and apply my regex pattern.

When I do this with a plain text file (.txt) it works fine, but it does not work with MS Word documents.

I tried reading the document with many different encodings, without success.

Can someone recommend the best way to read (Hebrew) MS Word documents in C#?

My code:

string text = System.IO.File.ReadAllText(TemplatePath,Encoding.Unicode);
MatchCollection coll = Regex.Matches(text,"\\[\\[.*?\\]\\]",RegexOptions.RightToLeft);

Even though my text contains this expression, the regex cannot find it because the text it receives is wrong.

Thanks a lot!

Recommended answer

Hi shira co_,

Thank you for posting here.

>>Can someone recommend the best way to read (Hebrew) MS Word documents in C#?

You can try using StreamReader(String, Encoding) to read the Word document.

This constructor initializes a new instance of the StreamReader class for the specified file name, with the specified character encoding.

Please refer to the following code:

// Get a new StreamReader in ASCII format from a file,
// using a buffer size of 512 and no byte order mark detection
// (the third argument, detectEncodingFromByteOrderMarks, is false).
StreamReader srAsciiFromFileFalse512 =
    new StreamReader("C:\\Temp\\Test.txt",
        System.Text.Encoding.ASCII, false, 512);
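
One caveat on the approach above: a .docx file is a ZIP package whose body text is typically stored as XML in word/document.xml, so decoding the raw file bytes with any text encoding is unlikely to yield readable text, and ASCII cannot represent Hebrew characters in any case. The following is a minimal sketch, not part of the original reply, of one way to get the text out before applying the regex. It assumes a .docx file (not a legacy binary .doc) and uses only System.IO.Compression and System.Xml.Linq; the names DocxText and Read are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class DocxText
{
    // WordprocessingML namespace used inside word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text of a .docx stream, one line per paragraph.
    public static string Read(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the main document part is at the usual location.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found");

            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);
                // Each w:p is a paragraph; its text is split across w:t runs,
                // so the runs are joined before any pattern matching.
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(
                        p.Descendants(W + "t").Select(t => t.Value)));
                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.Error.WriteLine("usage: DocxText <path-to-docx>");
            return;
        }

        using (FileStream fs = File.OpenRead(args[0]))
        {
            string text = Read(fs);
            // Same [[...]] placeholder pattern as in the question.
            foreach (Match m in Regex.Matches(text, @"\[\[.*?\]\]"))
                Console.WriteLine(m.Value);
        }
    }
}
```

Joining the runs per paragraph matters because Word often splits a single placeholder such as [[name]] across several w:t elements. RegexOptions.RightToLeft only changes the direction in which the regex engine scans the input and is not required for matching Hebrew text, so it is left out here. This sketch reads text only; writing replacements back into the document, or handling text boxes and headers, would need more work or a library such as the Open XML SDK.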

I hope this reply is helpful to you.

Best regards,

Hart

