File.ReadAllText with UTF-7 ignoring + characters

Question

I have a file on disk, written by the program, that contains some JSON-encoded data.

I read it back later with C#'s File.ReadAllText(string path, Encoding encoding). For unrelated reasons, we have to work with UTF-7.

Our line looks like this:

var content = File.ReadAllText(fileName, Encoding.UTF7);

Writing and then reading works fine for basically everything we need. The only exception is the plus sign (+): if the file contains a + sign, this code returns the whole string with every + removed. So

{ "commandValue": "testvalue + otherValue" }

becomes

{ "commandValue": "testvalue  otherValue" }

I have checked the file bytes, and the + sign is indeed the byte 0x2B, which is the right character in UTF-7 (it is also the same byte in UTF-8, not sure if that matters).

I can't figure out why they disappear when reading it.

For testing, I tried using

var content = File.ReadAllText(fileName, Encoding.UTF8);

and it worked fine. The chars did not disappear.

What could I possibly be doing wrong, and how could I make File.ReadAllText(fileName, Encoding.UTF7) not ignore those characters?

As of now, I haven't found another char that has this problem, but I obviously did not test all of them.

Recommended answer

The file is not being written as UTF-7. In UTF-7, '+' is a special character that marks the start of a "modified base64" sequence; a literal plus sign is written as the two bytes "+-". So when the file is read as UTF-7, the decoder sees the '+', expects a modified base64 sequence, finds a space (which is not a base64 character) instead, ends the empty sequence and carries on decoding as usual. The '+' is consumed as the sequence marker and never reaches the output.
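
A minimal C# sketch of this mechanism, using .NET's own `Encoding.UTF7`. It assumes (the writing code is not shown in the question) that the program wrote plain UTF-8 bytes, decodes those bytes as UTF-7 the way `File.ReadAllText(fileName, Encoding.UTF7)` would, and then shows what a real UTF-7 encoder emits for '+'. The class and method names are illustrative only. `Encoding.UTF7` is marked obsolete (SYSLIB0001) in .NET 5 and later but still works, hence the pragma.

```csharp
#pragma warning disable SYSLIB0001 // Encoding.UTF7 is obsolete in .NET 5+, but still functional
using System;
using System.Text;

public static class Utf7PlusDemo
{
    // Assumption: the writer produced UTF-8 bytes. Decode them with the UTF-7
    // decoder, as File.ReadAllText(path, Encoding.UTF7) would.
    public static string ReadUtf8BytesAsUtf7(string text) =>
        Encoding.UTF7.GetString(Encoding.UTF8.GetBytes(text));

    // Show the bytes a UTF-7 encoder actually writes (UTF-7 output is 7-bit ASCII).
    public static string EncodeAsUtf7(string text) =>
        Encoding.ASCII.GetString(Encoding.UTF7.GetBytes(text));

    public static void Main()
    {
        // '+' opens a base64 run; the following space is not base64, so the run
        // ends immediately and the '+' itself is dropped.
        Console.WriteLine(ReadUtf8BytesAsUtf7("testvalue + otherValue")); // testvalue  otherValue

        // A UTF-7 encoder writes a literal '+' as the two bytes "+-".
        Console.WriteLine(EncodeAsUtf7("testvalue + otherValue"));        // testvalue +- otherValue

        // "+-" decodes back to a single '+'.
        Console.WriteLine(ReadUtf8BytesAsUtf7("a +- b"));                  // a + b
    }
}
```

The space after the '+' is what limits the damage in the question's example. If the '+' were immediately followed by a base64 character (a letter, a digit, '+' or '/'), the decoder would consume those characters as base64 data and produce unrelated characters rather than simply dropping the '+'.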

To fix this, either read the file as UTF-8 (your own test suggests that is how it was written), or change the code that writes the file so that it really uses UTF-7. Either way, the writer and the reader must agree on the encoding.
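
Both fixes come down to making the writer and the reader agree on one encoding. The sketch below assumes the original writer produced BOM-less UTF-8 (what `File.WriteAllText(path, text)` writes by default); it writes the sample JSON to a temporary file and reads it back with mismatched and matching encodings. `MatchedEncodingDemo` and `RoundTrip` are hypothetical names used only for this illustration.

```csharp
#pragma warning disable SYSLIB0001 // Encoding.UTF7 is obsolete in .NET 5+, but still functional
using System;
using System.IO;
using System.Text;

public static class MatchedEncodingDemo
{
    // Hypothetical helper: write text with one encoding, read it back with another.
    public static string RoundTrip(string text, Encoding writeEncoding, Encoding readEncoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, writeEncoding);
            return File.ReadAllText(path, readEncoding);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        const string json = "{ \"commandValue\": \"testvalue + otherValue\" }";
        // Assumption: mimics the original writer's BOM-less UTF-8 output.
        var utf8NoBom = new UTF8Encoding(false);

        // Mismatch, as in the question: UTF-8 on disk, UTF-7 on read, so '+' is lost.
        Console.WriteLine(RoundTrip(json, utf8NoBom, Encoding.UTF7));

        // Fix 1: read with the encoding the file was written in.
        Console.WriteLine(RoundTrip(json, utf8NoBom, Encoding.UTF8));

        // Fix 2: if UTF-7 is really required, write UTF-7 too ('+' is stored as "+-").
        Console.WriteLine(RoundTrip(json, Encoding.UTF7, Encoding.UTF7));
    }
}
```

When the file is written as UTF-7, the '+' is stored on disk as "+-", so the UTF-7 reader gets it back intact.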