Storing a string as UTF8 in C#

Question

I'm doing a lot of string manipulation in C#, and I really need the strings to be stored at one byte per character. I need gigabytes of text in memory simultaneously, and it's causing low-memory issues. I know for certain that this text will never contain non-ASCII characters, so for my purposes the fact that System.String and System.Char store everything as two bytes per character is both unnecessary and a real problem.

I'm about to start coding my own CharAscii and StringAscii classes. The string class will basically hold its data as a byte[] and expose string manipulation methods similar to those of System.String. However, this seems like a lot of work for what looks like a very standard problem, so I'm really posting here to check that there isn't already an easier solution. Is there, for example, some way to make System.String store its data internally as UTF8 that I haven't noticed, or some other way around the problem?

Recommended answer

As you've found, the CLR uses UTF-16 for character encoding. Your best bet may be to use the Encoding classes and a BitConverter to handle the text. This question has some good examples of converting between the two encodings:

Converting a string (UTF-16) to UTF-8 in C#
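To make the Encoding-based approach concrete, here is a minimal sketch of the kind of byte[]-backed wrapper the question describes. The `AsciiString` type and its members are hypothetical names chosen for illustration and are not part of .NET. Only `Encoding.ASCII.GetBytes` and `Encoding.ASCII.GetString` are framework calls. The sketch assumes the text is pure ASCII, in which case the ASCII and UTF-8 byte sequences are identical. It does not use BitConverter.

```csharp
using System;
using System.Text;

// Hypothetical helper type (not part of .NET): keeps ASCII text
// as one byte per character instead of a two-byte UTF-16 char.
public sealed class AsciiString
{
    private readonly byte[] _bytes;

    public AsciiString(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // Reject non-ASCII input up front; Encoding.ASCII would otherwise
        // silently replace such characters with '?'.
        foreach (char c in text)
        {
            if (c > 127)
                throw new ArgumentException("Non-ASCII character in input.", nameof(text));
        }

        _bytes = Encoding.ASCII.GetBytes(text);
    }

    // Number of characters, which equals the number of bytes stored.
    public int Length => _bytes.Length;

    // Single-character access, widened back to a UTF-16 char on demand.
    public char this[int index] => (char)_bytes[index];

    // Decode a slice into a temporary System.String only when one is needed.
    public string Substring(int start, int length) =>
        Encoding.ASCII.GetString(_bytes, start, length);

    public override string ToString() => Encoding.ASCII.GetString(_bytes);
}
```

For example, `new AsciiString("hello world")` reports a `Length` of 11, and `Substring(6, 5)` returns "world". Every call that returns a System.String allocates a fresh UTF-16 copy, so the saving only holds if most operations (searching, comparing, slicing) are implemented directly on the bytes and decoding is left for the final output.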
