Char 返回 29 个 unicode 字符的错误值 - 需要 .NET 将 nchar 转换为 char [英] Char returns the wrong value for 29 unicode characters - Need .NET cast / convert of nchar to char

查看:19
本文介绍了Char 返回 29 个 unicode 字符的错误值 - 需要 .NET 将 nchar 转换为 char的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

需要 .NET 转换/将 SQL nchar 转换为 char.
更具体地说,将 nchar UNICODE 转换为 char ASCII.

Need a .NET cast / convert of the SQL nchar to char.
More specifically cast of the nchar UNICODE to the char ASCII.

这很复杂的地方是 SQL char 使用完整字节.
不是 128 的纯 ASCII.
TSQL 函数 ASCII 返回 0-255.

Where this is complicated is SQL char uses the full byte.
Not the pure ASCII of 128.
The TSQL function ASCII returns 0-255.

理想情况下会有一个 FormByte 的 NormalizationForm.
它不会是一个精确的文本值——而是一个接近的逻辑值或 ?.
SQL 将使用 FormByte 将 nchar 转换为 char.
NormalizationForm

Ideally there would be a NormalizationForm of FormByte.
It would not be an exact textual value - rather a close logical value or ?.
And SQL would use the FormByte to cast from nchar to char.
NormalizationForm

编码解码对我不起作用,我尝试了所有口味.

Encode Decode did not work for me and I tried all flavors.

在 SQL 中,许多字符(字节)被映射到 63.63是?.
不仅仅是超过 255 的字符映射到 63.
130 到 140 都映射到 63.

In SQL many char (byte) are mapped to 63. 63 is ?.
Not just char over 255 that are mapped to 63.
130 to 140 are all mapped to 63.

字符 160-255 都返回 160-255

Characters 160-255 all return 160-255

超过 255 并非全部都映射到 63.
例如,许多变音符号被映射到 ASCII.

Over 255 not all are mapped to 63.
For example many diacritics are mapped to the ASCII.

TSQL 具有 UNICODE 和 ACSII 功能.
所以我只是将所有 Unicode 字符加载到 char 和 nchar 列中.

TSQL has UNICODE and ACSII functions.
So I just loaded all Unicode characters into both char and nchar columns.

SQL 返回的字符错误为 29 个字符.
并且为坏字符返回的 ASCII() 没有意义 - 130 - 160 范围内的所有控制字符.
检查二进制文件中是否有错误的 29,存储的是 ASCII() 返回的内容.
对于 27,从 char 返回的是 nchar,而对于 2,它甚至不是正确的 nchar.它们都应该映射到 ?或等效的 ACSII.
和"映射到(但会占用?)‘和’映射到‘
– (破折号) 和 — (破折号) 映射到 -
...

The char returned by SQL is wrong for 29 characters.
And the ASCII() returned for the bad chars does not make sense - all control chars in the 130 - 160 range.
Checked the binary for the incorrect 29 and what is is stored is what is returned by ASCII().
For 27 what is being return from char is nchar and for 2 it is not even the correct nchar. They should all be mapped to ? or a an ACSII equivalent.
" and " mapped to " (but will take ?) ‘ and ’ mapped to '
– (en dash) and — (em dash) mapped to -
...

我知道你不相信我.
将 'Œ' 插入字符列并选择它 - 它将返回 'Œ'.
你甚至可以搜索它 - char = 'Œ' 返回 true.
选择 ASCII('Œ') 返回 140,这就是实际存储的内容(检查二进制).
140/8C 的 UNICODE 定义是 Partial Line Backward.
我检查了那个字符的二进制值,它是 8C (140).
返回的是 unicode 'Œ' Int16 338.
看起来 SQL 正在做一些输入输出映射并且出错了.

I know you don't believe me.
Insert 'Œ' into a char column and select it - it will return 'Œ'.
And you can even search on it - char = 'Œ' returns true.
Select ASCII('Œ') returns 140 and that is what is actually stored (check the binary).
The UNICODE definition of 140 / 8C is Partial Line Backward.
I checked the binary value for that char and it is 8C (140).
What is returned is the unicode 'Œ' Int16 338.
It appears SQL is doing some input output mapping and getting it wrong.

ASCII 函数对于 575 个未映射到 ? 的 unicode 字符是正确的.
char 值与 ACSII 匹配,它们都有意义.
例如.u 的 12 种不同形式都映射到 u.
32163 个字符,除了 ?被映射到?(63).

ASCII function is correct for 575 unicode characters that are not mapped to ?.
The char value matches the ACSII and they all make sense.
E.G. 12 different forms of u are all mapped to u.
32163 characters other than ? are mapped to ? (63).

以下是返回错误值的 29 个字符.
列顺序:
字符
nchar
ASCII(字符)
UNICODE(nchar)

Below are the 29 char that return the wrong value.
Column order:
char
nchar
ASCII(char)
UNICODE(nchar)

     sqlCharASCIIbackToString did not match  ΠΠ140 338
     sqlCharASCIIbackToString did not match  œ œ 156 339
     sqlCharASCIIbackToString did not match  Š Š 138 352
     sqlCharASCIIbackToString did not match  š š 154 353
     sqlCharASCIIbackToString did not match  Ÿ Ÿ 159 376
     sqlCharASCIIbackToString did not match  Ž Ž 142 381
     sqlCharASCIIbackToString did not match  ž ž 158 382
     sqlCharASCIIbackToString did not match  ƒ Ƒ 131 401
     sqlCharASCIIbackToString did not match  ƒ ƒ 131 402
     sqlCharASCIIbackToString did not match  ˆ ˆ 136 710
     sqlCharASCIIbackToString did not match  ˜ ˜ 152 732
     sqlCharASCIIbackToString did not match  – – 150 8211
     sqlCharASCIIbackToString did not match  — — 151 8212
     sqlCharASCIIbackToString did not match  ‘ ‘ 145 8216
     sqlCharASCIIbackToString did not match  ’ ’ 146 8217
     sqlCharASCIIbackToString did not match  ‚ ‚ 130 8218
     sqlCharASCIIbackToString did not match  " " 147 8220
     sqlCharASCIIbackToString did not match  " " 148 8221
     sqlCharASCIIbackToString did not match  „ „ 132 8222
     sqlCharASCIIbackToString did not match  † † 134 8224
     sqlCharASCIIbackToString did not match  ‡ ‡ 135 8225
     sqlCharASCIIbackToString did not match  • • 149 8226
     sqlCharASCIIbackToString did not match 
     … … 133 8230
     sqlCharASCIIbackToString did not match  ‰ ‰ 137 8240
     sqlCharASCIIbackToString did not match  ‹ ‹ 139 8249
     sqlCharASCIIbackToString did not match  › › 155 8250
     sqlCharASCIIbackToString did not match  € € 128 8364
     sqlCharASCIIbackToString did not match  ™ ™ 153 8482
     sqlCharASCIIbackToString did not match  ˜ ≈ 152 8776
     count63 =  32163 countMis =  29 countCorrect =  575

运行以下 .NET 以查看 SQL 返回哪个 'Œ'

Ran the following .NET to see which 'Œ' if being returned by SQL

char char338 = (char)338;
System.Diagnostics.Debug.WriteLine(char338);
sqlCmd.CommandText = "select [char] from [charNchar] where [char] = @char;";
sqlCmd.Parameters.Add("@char", SqlDbType.Char).Value = char338;
string string338= sqlCmd.ExecuteScalar().ToString();
char338 = string338.ToCharArray()[0];
System.Diagnostics.Debug.WriteLine(char338 + " " + ((Int16)char338).ToString());

以上代码返回 Œ 338.
SQL 返回一个大于字节的值,数据类型应该存储为字节.
如果我搜索 (char)140 那么?63 返回.

The above code returns Π338.
SQL is returning a value larger than byte to datatype is supposedly stored as byte.
If I search on (char)140 then ? 63 is returned.

有趣的是在 char 上搜索 'Œ' 和 N'Œ' 会产生不同的结果.
即在左侧搜索 (140) Œ.
在右侧搜索 (338) Œ 字符搜索未找到任何内容.
Nchar 使用任一输入查找两个结果.

What is interesting is a search on 'Œ' versus N'Œ' on char produce different results.
That is searching on the left (140) Œ.
Search on the right (338) Πthe char search finds nothing.
Nchar finds both results with either input.

  SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
                ,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
  FROM [test].[dbo].[charNchar]
  where [char] = 'Œ'
  SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
                ,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
  FROM [test].[dbo].[charNchar]
  where [char] = N'Œ'
  SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
                ,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
  FROM [test].[dbo].[charNchar]
  where [nchar] = 'Œ'
  SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
                ,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
  FROM [test].[dbo].[charNchar]
  where [nchar] = N'Œ'


int16  char                                               ASCII       nchar                                              UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338    Π                                                 140         Π                                                 338

int16  char                                               ASCII       nchar                                              UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338    Π                                                 140         Π                                                 338
339    œ                                                  156         œ                                                  339

int16  char                                               ASCII       nchar                                              UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338    Π                                                 140         Π                                                 338
339    œ                                                  156         œ                                                  339

int16  char                                               ASCII       nchar                                              UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338    Π                                                 140         Π                                                 338
339    œ                                                  156         œ                                                  339

≈ search 对四个查询中的任何一个都没有找到任何内容.检查图表,这是 8776 的正确字符,并且数学几乎等于.

≈ search finds nothing with any of the four queries.   Checked the charts and that is the correct char for 8776 and is math almost equal to. 

~ 是粘贴到 SSMS 中的零宽度,但它好像是粘贴到从蓝色变为黑色的东西.

˜ is zero width pasted into SSMS but it is something as if it is pasted into the FROM the blue turn to black. 

我是否遗漏了什么 - 这对我来说似乎是一个错误.
这不仅仅是他错误的值,而是一个无效的值.
返回一个 Int16.
假设我想使用字节来存储字符以节省空间 - 它会在 SQL 字符上中断,因为 29 个字符不是作为字节返回的.

Am I missing something - this seems like a bug to me.
It is not just he wrong value it is an invalid value.
An Int16 is returned.
Lets say I wanted to use byte to store character to save space - it would break on SQL char as 29 chars are not returned as byte.

这是我使用的代码:

public void SQLchar()
{

    SqlConnection sqlCon = new SqlConnection(connString);  
    try
    {         
        sqlCon.Open();
        SqlCommand sqlCmd = sqlCon.CreateCommand();
        SqlDataReader rdr;
        sqlCmd.CommandText = "delete charNchar";
        sqlCmd.ExecuteNonQuery();
        for(Int16 i = 0; i < Int16.MaxValue; i ++)
        {
            sqlCmd.CommandText = "insert into charNchar (int16,char,nchar) values (@int16, @char, @nchar);";
            sqlCmd.CommandType = System.Data.CommandType.Text;
            sqlCmd.Parameters.Clear();
            sqlCmd.Parameters.Add("@int16", SqlDbType.Int).Value = i;
            sqlCmd.Parameters.Add("@char", SqlDbType.Char).Value = (char)i;
            sqlCmd.Parameters.Add("@nchar", SqlDbType.NChar).Value = (char)i;
            sqlCmd.ExecuteNonQuery();
        }
        string sqlChar;
        string sqlNChar;
        Int16 sqlCharASCII;
        Int16 sqlNCharUnicode;
        string sqlCharASCIIbackToString;
        sqlCmd.CommandText = "select char,nchar,ASCII(char),UNICODE(nchar) from charNchar order by int16;";
        rdr = sqlCmd.ExecuteReader();
        Int16 count63 = 0;
        Int16 countMis = 0;
        Int16 countCorrect = 0;
        while (rdr.Read())
        {
            sqlChar = rdr.IsDBNull(0) ? "dbNull" : rdr.GetString(0);
            sqlNChar = rdr.IsDBNull(1) ? "dbNull" : rdr.GetString(1);
            sqlCharASCII = rdr.IsDBNull(2) ? Int16.Parse("-1") : (Int16)rdr.GetInt32(2);
            sqlNCharUnicode = rdr.IsDBNull(3) ? Int16.Parse("-1") : (Int16)rdr.GetInt32(3);
            if(sqlCharASCII == 63 && sqlNCharUnicode != 63)
            {
                count63 ++;
                continue;  // ?
            }
            if (sqlCharASCII < 0)
            {
                System.Diagnostics.Debug.WriteLine("ASCII(char) null for " + sqlChar + " " + sqlNChar);
            }
            else
            {
                sqlCharASCIIbackToString = ((char)sqlCharASCII).ToString();
                if (string.CompareOrdinal(sqlChar, sqlCharASCIIbackToString) != 0)
                {
                    countMis++;
                    System.Diagnostics.Debug.WriteLine(" sqlCharASCIIbackToString did not match " + sqlCharASCIIbackToString + " " + sqlChar + " " + sqlNChar + " " + sqlCharASCII + " " + sqlNCharUnicode);
                }
                else
                {
                    countCorrect++;
                }
            }
        }
        rdr.Close();
        System.Diagnostics.Debug.WriteLine("count63 =  " + count63.ToString() + " countMis =  " + countMis.ToString() + " countCorrect =  " + countCorrect.ToString());
    }
    catch (Exception Ex)
    {
        System.Diagnostics.Debug.WriteLine(Ex.Message);
    }
    finally 
    {
        sqlCon.Close();
    }
}

至于为什么.
解析 .NET 中的字符串数据,该数据是 FK.
而不是往返 SQL 来获取 FK 的 ID,而是使用 .NET 字典来提高速度.
字典是反向查找以从值中获取键.
解析器具有已被解析器使用的字符的 Int16.
因此,如果字符的 ASCII 错误,则反向查找将失败.
我想我可以对不正确的 ASCII 结果进行硬编码修复.
但我想在走上一条从补丁开始的道路之前了解这里发生了什么.
char 是否存在一些根本性缺陷?
可以只使用 nchar,但我们更喜欢 char.
应用程序的本质是我们想要匹配.
u 的 6 个变音符号全部匹配 ascii u 是一件好事.

As for why.
Parse string data in .NET and that data is a FK.
Rather than round trip to SQL to get the ID for the FK use a .NET Dictionary for speed.
The Dictionary is a reverse lookup to to get the key from the value.
The parser has the Int16 of the char as that is already used by the parser.
So if the ASCII of the char is wrong then the reverse lookup fails.
I think I could hard code fixes for the incorrect ASCII results.
But I want to understand what is going on here before I go down a path that starts with patches.
Does char have some fundamental flaws?
Could just use nchar but we prefer char.
The nature of the application is we want matches.
6 diacritics of u to all match ascii u is a good thing.

推荐答案

你严重混淆了代码点值和编码字节值.

You are massively confusing code point values and encoded byte values.

代码点 U+0152(338 或 Œ)在 Windows-1252 中被编码为字节 0x8C 或十进制的 140,这就是名称错误的 ASCII() 函数返回的内容.巧合的是,Windows-1252 中的许多代码点的编码方式是,被编码的代码点与该代码点的编码字节值具有相同的值.

The code point U+0152 (338 or Œ), is encoded in Windows-1252 as the byte 0x8C or 140 in decimal, that's what the badly named ASCII() function returns you. It is just a coincidence that many code points in Windows-1252 are encoded in a way that the code point being encoded has the same value as the encoded byte's value for that code point.

Windows-1252 只能编码:

Windows-1252 can only encode:

0-127
160-255

那些在一个范围内不整齐的:

And these that don't go neatly in a range:

338,339,352,353,376,381,382,402,
710,732,8211,8212,8216,8217,8218,
8220,8221,8222,8224,8225,8226,
8230,8240,8249,8250,8364,8482

第二批中的任何代码点都不会以字节值<->代码点值进行编码,这正是您所期望的.

None of the code points in the second batch will be encoded in byte value<->code point value which is what you seem to be expecting.

Windows-1252 无法编码范围 128-159,因此尝试转换任何内容在该范围内(例如 130 或 140)只会被编码为 ? 或 0x3F.那个范围反正都是无用的 C1 控制字符.

Windows-1252 cannot encode the range 128-159, so attempting to convert anything in that range (e.g. 130 or 140) just gets encoded as ? or 0x3F. That range is pretty much useless C1 control characters anyway.

它也没有利用它拥有的完整 256 个字符空间,它只编码251 个不同的字符.所以你不能将它用作伪字节,因为5 个字节是无效的 Windows-1252.如果这就是你试图做的,它就行不通了.

It also doesn't utilize the full 256 character space it has, it only encodes 251 different characters. So you cannot use it as pseudo-bytes because 5 bytes are invalid Windows-1252. If that's what you tried to do, it won't work.

实际上并不清楚你想要做什么高级别的事情,所以我会有一个猜测.

It is not actually clear what high level thing you are even trying to do so I'll have a guess.

如果您想不区分重音进行匹配,那么只需使用不区分重音的排序规则.然后üúù等将全部匹配 u.与编码无关.

If you wanted to match accent-insensitively, then just use an accent-insensitive collation. Then ü,ú, ù etc will all match u. Nothing to do with encodings.

CREATE TABLE Mytable (
    Mycolumn NVARCHAR(10) COLLATE Latin1_General_CI_AI
)

INSERT INTO Mytable (myColumn) VALUES( 'ü' ), ('ú'), ( 'ù' )

SELECT Mycolumn
FROM Mytable
WHERE Mycolumn = 'u'

--Results

MYCOLUMN
ü
ú
ù

这是一个演示 http://sqlfiddle.com/#!3/67752/2.

要将 SQLAscii 转换为 'Œ',试试这个:

To convert the SQLAscii to 'Œ', try this:

public static char Windows1252CPtoChar(int cp)
{
    Encoding win1252 = Encoding.GetEncoding("Windows-1252"); //this could be made static
    return win1252.GetString(new byte[] { (byte)cp })[0];
}

public static void Main(string[] args) {
    Console.WriteLine(Windows1252CPtoChar(140) == 'Œ');
}

所以代替:

sqlCharASCIIbackToString = ((char)sqlCharASCII).ToString();

sqlCharASCIIbackToString = (Windows1252CPtoChar(sqlCharASCII)).ToString();

这篇关于Char 返回 29 个 unicode 字符的错误值 - 需要 .NET 将 nchar 转换为 char的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆