MemoryStream from string - confusion about Encoding to use

Question

I have a piece of code that converts a string into a memory stream:

using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(applicationForm)))

However, I'm a bit confused about whether it's correct. Encodings in .NET have always confused me.

Bottom line: am I using the correct encoding object (UTF8) to get the bytes?

I know that internally .NET stores strings as UTF-16, but my applicationForm variable was based on a file whose text was saved in UTF-8 encoding.

Thanks, Pavel

EDIT 1: Let me explain exactly how I get the applicationForm variable. I have access to an assembly that exposes a class with a GenerateApplicationForm method, which returns a string. However, I know that somewhere behind the scenes the component uses files stored on disk, and the contents of those files are encoded in UTF-8. I can't read those files directly; I only have that string, and I know that it originally came from a UTF-8 encoded file. In the client code that calls GenerateApplicationForm, I have to convert the applicationForm variable into a stream, because other components (from another assembly) expect a Stream. That's where the using... statement mentioned in the question comes into play.

Recommended Answer

Assume applicationForm is a string you read from some UTF-8 text file. In memory it will be UTF-16 (what .NET calls "Unicode"), whatever the encoding of the source file; the conversion happened when the file was loaded into the string.
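A minimal sketch of that decoding step, assuming the form text is the made-up string "héllo" written to a temporary file (the file name is hypothetical). The file holds UTF-8 bytes, but once read, the string holds UTF-16 code units:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical file standing in for the component's UTF-8 source file.
string path = Path.Combine(Path.GetTempPath(), "applicationForm-demo.txt");

// Write "héllo" as UTF-8: 'é' (U+00E9) takes two bytes, the other letters one each.
byte[] fileBytes = Encoding.UTF8.GetBytes("h\u00e9llo");
File.WriteAllBytes(path, fileBytes);

// Reading the file decodes its UTF-8 bytes into a .NET (UTF-16) string.
string applicationForm = File.ReadAllText(path, Encoding.UTF8);
File.Delete(path);

Console.WriteLine(fileBytes.Length);                               // 6 bytes on disk
Console.WriteLine(applicationForm.Length);                         // 5 UTF-16 chars in memory
Console.WriteLine(Encoding.Unicode.GetByteCount(applicationForm)); // 10 bytes if stored as UTF-16
```

The string no longer remembers that it came from a UTF-8 file; only its characters survive.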

Your code encodes the applicationForm string into a MemoryStream of UTF-8 bytes.
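A minimal sketch of what a consumer of that stream sees, assuming it reads the stream back with a StreamReader set to UTF-8 (the form content below is a hypothetical stand-in for what GenerateApplicationForm returns). Note that Encoding.GetBytes never writes a byte-order mark, so the stream starts directly with the text:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for the string returned by GenerateApplicationForm.
// 'ł' (U+0142) is a two-byte character in UTF-8.
string applicationForm = "<form title=\"Zg\u0142oszenie\" />";

// Same conversion as in the question; GetBytes emits no BOM.
byte[] bytes = Encoding.UTF8.GetBytes(applicationForm);

string roundTripped;
using (var stream = new MemoryStream(bytes))
using (var reader = new StreamReader(stream, Encoding.UTF8))
{
    // A consumer that decodes the stream as UTF-8 gets the original string back.
    roundTripped = reader.ReadToEnd();
}

Console.WriteLine(bytes.Length);                   // 28: 27 chars, one of them two bytes
Console.WriteLine(roundTripped == applicationForm); // True
```

The conversion is correct as long as whatever reads the stream also assumes UTF-8.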

This may or may not be correct, depending on what you want to do with it.

.NET strings are always UTF-16 (what .NET calls "Unicode"). When strings are converted to files, streams, or byte[], they can be encoded in different ways. One byte is not enough to store all the different characters used in all languages, so some encodings represent a single character with more than one byte: sometimes (UTF-8) or always (UTF-16), depending on the encoding used.
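A short sketch comparing how many bytes the same short string (a made-up example, "héllo") needs in several encodings built into .NET:

```csharp
using System;
using System.Text;

string text = "h\u00e9llo"; // "héllo": four ASCII letters plus 'é' (U+00E9)

int utf8  = Encoding.UTF8.GetByteCount(text);    // 1 byte per ASCII letter, 2 for 'é'
int utf16 = Encoding.Unicode.GetByteCount(text); // 2 bytes for each of these chars
int utf32 = Encoding.UTF32.GetByteCount(text);   // 4 bytes per code point

Console.WriteLine($"UTF-8: {utf8}, UTF-16: {utf16}, UTF-32: {utf32}");
// UTF-8: 6, UTF-16: 10, UTF-32: 20
```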

If you use a simple encoding like ASCII, each character always takes exactly one byte, but the data is limited to the ASCII character set. Converting to ASCII from any UTF encoding can lose data if any non-ASCII (multi-byte) characters are used.
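A minimal sketch of that data loss, using the same made-up string: by default Encoding.ASCII replaces any character it cannot represent with '?', so the original text cannot be recovered afterwards.

```csharp
using System;
using System.Text;

string original = "h\u00e9llo"; // "héllo"

// Encoding.ASCII substitutes '?' for characters outside the ASCII range.
byte[] asciiBytes = Encoding.ASCII.GetBytes(original);
string decoded = Encoding.ASCII.GetString(asciiBytes);

Console.WriteLine(asciiBytes.Length); // 5: one byte per character
Console.WriteLine(decoded);           // h?llo: the 'é' is gone
```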

For the full picture on Unicode, go here.

EDIT 1: Barring further information about the GenerateApplicationForm component, UTF-8 encoding is likely to be the right choice. If that doesn't work, try ASCII or UTF-16. Best of all, consult the component's source code or its provider.
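Why "if that doesn't work" matters: the bytes only round-trip when the reader decodes with the same encoding the writer used. A small sketch, assuming a made-up stand-in for the form text, showing that guessing a different encoding silently yields different text rather than an error:

```csharp
using System;
using System.Text;

string applicationForm = "h\u00e9llo"; // hypothetical stand-in for the real form text
byte[] utf8Bytes = Encoding.UTF8.GetBytes(applicationForm);

// Decoding with the matching encoding recovers the text...
string asUtf8 = Encoding.UTF8.GetString(utf8Bytes);
// ...while a mismatched encoding produces different text without throwing.
string asUtf16 = Encoding.Unicode.GetString(utf8Bytes);
string asAscii = Encoding.ASCII.GetString(utf8Bytes);

Console.WriteLine(asUtf8 == applicationForm);  // True
Console.WriteLine(asUtf16 == applicationForm); // False
Console.WriteLine(asAscii == applicationForm); // False
```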

EDIT 2: Definitely UTF-8 then; you were right all along.