How to generate all the characters in the UTF-8 charset in .NET

Question

I have been given the task of generating all the characters in the UTF-8 character set to test how a system handles each of them. I do not have much experience with character encoding. The approach I was going to try was to increment a counter and then translate that base-ten number into its equivalent UTF-8 character, but so far I have not been able to find an effective way to do this in C# 3.5.

Any suggestions would be greatly appreciated.

Recommended answer

// Download the UnicodeData.txt listing (one semicolon-separated entry per line).
System.Net.WebClient client = new System.Net.WebClient();
string definedCodePoints = client.DownloadString(
                         "http://unicode.org/Public/UNIDATA/UnicodeData.txt");
System.IO.StringReader reader = new System.IO.StringReader(definedCodePoints);
System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
string line;
while ((line = reader.ReadLine()) != null) {
  if (line.Length == 0) continue; // guard against blank lines
  // The first field is the code point, written in hexadecimal.
  int codePoint = Convert.ToInt32(line.Substring(0, line.IndexOf(';')), 16);
  if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
    // Surrogate boundary: listed in the document, but not a valid code point.
    continue;
  }
  // Note: large blocks are listed only as "<..., First>" / "<..., Last>" pairs,
  // so this loop visits just the two endpoints of each such range.
  string utf16 = char.ConvertFromUtf32(codePoint);
  byte[] utf8 = encoder.GetBytes(utf16);
  // TODO: do something with the UTF-8-encoded character
}


The above code should iterate over the currently assigned Unicode characters. You'll probably want to parse the UnicodeData file locally and fix any C# blunders I've made.
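One way to parse the file locally is sketched below. It is only an illustration under some assumptions: the class name UnicodeDataReader and its two methods are hypothetical, and the caller is expected to supply the lines of a locally saved copy of UnicodeData.txt. The sketch also expands the "<..., First>" / "<..., Last>" pairs that the file uses to abbreviate large blocks, so every code point inside such a range is visited rather than only its two endpoints.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of the original answer.
static class UnicodeDataReader
{
    // Yields every code point described by UnicodeData.txt-style lines,
    // expanding "<..., First>" / "<..., Last>" pairs into the full range.
    public static IEnumerable<int> AssignedCodePoints(IEnumerable<string> lines)
    {
        int rangeStart = -1; // start of a pending First/Last range, or -1 if none
        foreach (string line in lines)
        {
            if (line.Length == 0) continue;

            string[] fields = line.Split(';');
            int codePoint = int.Parse(fields[0], NumberStyles.HexNumber,
                                      CultureInfo.InvariantCulture);
            string name = fields[1];

            if (name.EndsWith(", First>", StringComparison.Ordinal))
            {
                // Remember the start; the matching "Last" line closes the range.
                rangeStart = codePoint;
                continue;
            }

            int first = codePoint;
            if (name.EndsWith(", Last>", StringComparison.Ordinal) && rangeStart >= 0)
            {
                first = rangeStart;
                rangeStart = -1;
            }

            for (int cp = first; cp <= codePoint; cp++)
            {
                // Surrogates are listed in the file but cannot be encoded alone.
                if (cp >= 0xD800 && cp <= 0xDFFF) continue;
                yield return cp;
            }
        }
    }

    // Encodes a single code point as UTF-8 bytes.
    public static byte[] ToUtf8(int codePoint)
    {
        return Encoding.UTF8.GetBytes(char.ConvertFromUtf32(codePoint));
    }
}
```

With a local copy saved as, say, UnicodeData.txt (an assumed path), the lines could be supplied with System.IO.File.ReadAllLines("UnicodeData.txt") and each yielded code point passed to ToUtf8. Private-use ranges are also listed as First/Last pairs in the file, so this sketch yields them as well; filter them out if the system under test should not receive them.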

The set of currently assigned Unicode characters is smaller than the set that could be defined. Of course, whether you see a character when you print one of them out depends on a great many other factors, like fonts and the other applications it'll pass through before it is emitted to your eyeball.
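If the test should cover the larger set, meaning every code point that could be defined rather than only those currently assigned, the counter approach described in the question can work. The following is a minimal sketch under that assumption; the class and method names are hypothetical. It walks every value from 0 to 0x10FFFF, skips the surrogate range (which char.ConvertFromUtf32 rejects), and encodes the rest as UTF-8.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original answer.
static class AllScalarValues
{
    // Yields the UTF-8 encoding of every Unicode scalar value,
    // whether or not a character is currently assigned to it.
    public static IEnumerable<byte[]> Utf8ForEveryScalarValue()
    {
        // No byte order mark; throw on invalid input instead of substituting.
        UTF8Encoding utf8 = new UTF8Encoding(false, true);
        for (int cp = 0; cp <= 0x10FFFF; cp++)
        {
            // Surrogates (U+D800..U+DFFF) are not valid on their own.
            if (cp >= 0xD800 && cp <= 0xDFFF) continue;
            yield return utf8.GetBytes(char.ConvertFromUtf32(cp));
        }
    }
}
```

This yields 1,112,064 byte sequences (0x110000 values minus 2,048 surrogates), each between one and four bytes long. Most of them are unassigned code points, so a system that rejects or replaces some of them is not necessarily mishandling UTF-8.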
