How to convert a Unicode character to its ASCII equivalent

Question

Here's the problem:

In C#, I'm reading information from a legacy Access database. .NET converts the content of the database (in this case, a string) to Unicode before handing it to me.

How do I convert this Unicode string back to its ASCII equivalent?

Edit
Unicode char 710 is indeed MODIFIER LETTER CIRCUMFLEX ACCENT. Here's the problem stated a bit more precisely:

 -> (Extended) ASCII character ê (Extended ASCII 136) was inserted in the database.
 -> Either Access or the reading component in .NET converted this to U+02C6 U+0065
    (MODIFIER LETTER CIRCUMFLEX ACCENT + LATIN SMALL LETTER E)
 -> I need the (Extended) ASCII character 136 back.

Here's what I've tried (I see now why this did not work...):

string myInput = Convert.ToString(Convert.ToChar(710));
byte[] asBytes = Encoding.ASCII.GetBytes(myInput);

But this does not result in 94; it results in a byte with value 63...
Here's a new try, but it still does not work:

byte[] bytes = Encoding.ASCII.GetBytes("ê");
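The value 63 is the ASCII code for '?'. Encoding.ASCII covers only the 7-bit range, and by default it substitutes a question mark for anything outside it, so both attempts above lose the original character. The following is a minimal sketch of that behavior, not code from the original post; the names AsciiFallbackDemo, EncodeLossy and TryEncodeStrict are made up for illustration.

```csharp
using System;
using System.Text;

static class AsciiFallbackDemo
{
    // Default ASCII encoding: characters outside 0..127 are replaced by '?' (63).
    public static byte[] EncodeLossy(string text) => Encoding.ASCII.GetBytes(text);

    // ASCII with an exception fallback: unmappable characters are reported
    // instead of being silently replaced.
    public static bool TryEncodeStrict(string text, out byte[] bytes)
    {
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            bytes = strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            bytes = Array.Empty<byte>();
            return false;
        }
    }

    static void Main()
    {
        // "\u02C6" is MODIFIER LETTER CIRCUMFLEX ACCENT (char 710).
        Console.WriteLine(EncodeLossy("\u02C6")[0]);           // 63
        // "\u00EA" is the letter e with circumflex.
        Console.WriteLine(EncodeLossy("\u00EA")[0]);           // 63
        Console.WriteLine(TryEncodeStrict("\u00EA", out _));   // False
        Console.WriteLine(TryEncodeStrict("e", out _));        // True
    }
}
```

Using the exception fallback is one way to notice data loss early rather than discovering stray question marks later.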

Solution
Thanks to both csgero and bzlm for pointing me in the right direction. I solved the problem here.

Recommended answer

Okay, let's elaborate. Both csgero and bzlm pointed in the right direction.

Because of bzlm's reply I looked up the Windows-1252 page on Wikipedia and found that it's called a code page. The Wikipedia article on code pages stated the following:

No formal standard existed for these ‘extended character sets’; IBM merely referred to the variants as code pages, as it had always done for variants of EBCDIC encodings.

This led me to code page 437:

In ASCII-compatible code pages, the lower 128 characters maintained their standard US-ASCII values, and different pages (or sets of characters) could be made available in the upper 128 characters. DOS computers built for the North American market, for example, used code page 437, which included accented characters needed for French, German, and a few other European languages, as well as some graphical line-drawing characters.

So code page 437 was the code page I had been calling 'extended ASCII'. It has ê as character 136, and I looked up some other characters as well; they seem right.

csgero provided the Encoding.GetEncoding() hint, which I used to write the following statement that solves my problem:

byte[] bytes = Encoding.GetEncoding(437).GetBytes("ê");
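For readers who want to try this, here is a small sketch built around that statement. It assumes a modern runtime (.NET Core 3.0 or later, or .NET 5+), where code pages such as 437 and 1252 are only available after registering CodePagesEncodingProvider; on the classic .NET Framework that registration line can likely be dropped. The names Cp437Demo, Encode and Decode are hypothetical helpers, not part of the original answer.

```csharp
using System;
using System.Text;

static class Cp437Demo
{
    static Cp437Demo()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before Encoding.GetEncoding(437) can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encode a string with code page 437, as in the statement above.
    public static byte[] Encode(string text) =>
        Encoding.GetEncoding(437).GetBytes(text);

    // Decode raw bytes with the given code page.
    public static string Decode(byte[] bytes, int codePage) =>
        Encoding.GetEncoding(codePage).GetString(bytes);

    static void Main()
    {
        // "\u00EA" is the letter e with circumflex; code page 437 stores it as 136.
        Console.WriteLine(Encode("\u00EA")[0]);                          // 136

        byte[] raw = { 136 };
        // Byte 136 read as code page 437 gives U+00EA (decimal 234).
        Console.WriteLine((int)Decode(raw, 437)[0]);                     // 234
        // The same byte read as Windows-1252 gives U+02C6 (decimal 710).
        Console.WriteLine((int)Decode(raw, 1252)[0]);                    // 710
    }
}
```

The last line hints at where char 710 in the question may have come from: byte 136 maps to MODIFIER LETTER CIRCUMFLEX ACCENT in Windows-1252. That is only a plausible explanation, since the original post does not say which code page the reading component actually used.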
