Encoding of an RTF file


Question

I get a base64-encoded string that represents an RTF file.

If I look at the original text representation (before base64 encoding), I see the character sequence F¸r. This should stand for Für when displayed in a viewer. The header of the RTF file contains ansicpg1252, so this should be the encoding unless it is changed elsewhere (escape sequences, font definitions, ...).

My problem now is that I can't correctly decode the base64 string to its original representation. I never get F¸r back. Instead I get Für or even F\'fcr. As a result, the umlaut is displayed incorrectly when the decoded RTF is opened in a viewer.

So what is the original encoding of the RTF file? Or what is going wrong here?

You can have a look at a sample file here. This is the base64-encoded string I get.

Edit:

I don't have the code that performs the encoding, but I think I can reconstruct it. This is my code for it:

string path = "/some/path/ltxt1 Kopie.rtf";
byte[] document = File.ReadAllBytes(path);
string base64string = Convert.ToBase64String(document);
var isoBytes = Convert.FromBase64String(base64string);

File.WriteAllText ("/some/path/sketch.rtf", System.Text.Encoding.GetEncoding("iso-8859-1").GetString(isoBytes));

I tried to change the encoding, but with windows-1252 I get an error (in the sketch: encoding name not supported; in the real project: array not null).

Recommended answer

Your issue is not the encoding of the file. If you run your code and compare the results, the text is the same in each.

Your issue is that the source file is ANSI encoded, while your second file is UTF-8 encoded, because File.WriteAllText writes UTF-8 when no encoding is passed. However, the RTF directive in the text (the ansicpg1252 part) tells whatever is interpreting the RTF that it is ANSI encoded. The viewer therefore decodes the UTF-8 bytes as ANSI and makes a total mess of it because of the mismatch.
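The mismatch described above can be reproduced on three bytes. This is only a minimal sketch, not the asker's project code: it assumes a runtime that supports C# top-level statements, the variable names are made up for illustration, and it uses ISO-8859-1 as a stand-in for code page 1252 because that encoding is always available and maps ü to the same byte (0xFC).

```csharp
using System;
using System.Text;

// Assumption: ISO-8859-1 stands in for Windows-1252; both encode "ü" as 0xFC.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// The word "Für" as it would be stored in the original ANSI file.
byte[] ansiBytes = { 0x46, 0xFC, 0x72 };
string text = latin1.GetString(ansiBytes);

// File.WriteAllText(path, text) without an encoding argument writes UTF-8,
// so the single byte 0xFC becomes the two bytes 0xC3 0xBC.
byte[] utf8Bytes = new UTF8Encoding(false).GetBytes(text);
Console.WriteLine(BitConverter.ToString(utf8Bytes)); // 46-C3-BC-72

// A viewer that trusts \ansicpg1252 decodes every byte on its own,
// which turns the two UTF-8 bytes into two separate characters.
string mangled = latin1.GetString(utf8Bytes);
Console.WriteLine(mangled); // FÃ¼r
```

The text in memory is correct the whole time; only the bytes that reach the file disagree with what the RTF header promises.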

The simplest way around this is to make sure you write it back to disk using the matching encoding:

var iso = Encoding.GetEncoding("ISO-8859-1");
File.WriteAllText("/some/path/sketch.rtf", iso.GetString(isoBytes), iso);

Or, more simply, skip the text conversion and write the decoded bytes directly:

File.WriteAllBytes("/some/path/sketch.rtf", isoBytes);
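To check that the byte-level approach really preserves the file, a round trip can be compared byte for byte. The following is a hedged sketch under a few assumptions: the short byte array is a hypothetical stand-in for the real RTF file, the temporary file path is generated only for this example, and top-level statements are assumed to be available.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical stand-in for the original file: the bytes of "{\rtf1 Für}"
// with "ü" stored as the single ANSI byte 0xFC.
byte[] original = { 0x7B, 0x5C, 0x72, 0x74, 0x66, 0x31, 0x20, 0x46, 0xFC, 0x72, 0x7D };

// Encode to base64 and back, as in the question.
string base64 = Convert.ToBase64String(original);
byte[] decoded = Convert.FromBase64String(base64);

// Write the raw bytes; no text encoding is involved at any point.
string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
File.WriteAllBytes(path, decoded);

// Read the file back and compare it with the original bytes.
bool identical = File.ReadAllBytes(path).SequenceEqual(original);
File.Delete(path);

Console.WriteLine(identical); // True
```

Because the data never passes through a string, there is no encoding to get wrong, which is why this variant is the more robust of the two.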
