Char 返回 29 个 unicode 字符的错误值 - 需要 .NET 将 nchar 转换为 char [英] Char returns the wrong value for 29 unicode characters - Need .NET cast / convert of nchar to char

查看：19 发布时间：2021/9/10 19:19:00 .net sql-server tsql unicode

本文介绍了Char 返回 29 个 unicode 字符的错误值 - 需要 .NET 将 nchar 转换为 char的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

需要 .NET 转换/将 SQL nchar 转换为 char.
更具体地说，将 nchar UNICODE 转换为 char ASCII.

Need a .NET cast / convert of the SQL nchar to char.
More specifically cast of the nchar UNICODE to the char ASCII.

这很复杂的地方是 SQL char 使用完整字节.
不是 128 的纯 ASCII.
TSQL 函数 ASCII 返回 0-255.

Where this is complicated is SQL char uses the full byte.
Not the pure ASCII of 128.
The TSQL function ASCII returns 0-255.

理想情况下会有一个 FormByte 的 NormalizationForm.
它不会是一个精确的文本值——而是一个接近的逻辑值或 ?.
SQL 将使用 FormByte 将 nchar 转换为 char.
NormalizationForm

Ideally there would be a NormalizationForm of FormByte.
It would not be an exact textual value - rather a close logical value or ?.
And SQL would use the FormByte to cast from nchar to char.
NormalizationForm

编码解码对我不起作用，我尝试了所有口味.

Encode Decode did not work for me and I tried all flavors.

在 SQL 中，许多字符(字节)被映射到 63.63是?.
不仅仅是超过 255 的字符映射到 63.
130 到 140 都映射到 63.

In SQL many char (byte) are mapped to 63. 63 is ?.
Not just char over 255 that are mapped to 63.
130 to 140 are all mapped to 63.

字符 160-255 都返回 160-255

Characters 160-255 all return 160-255

超过 255 并非全部都映射到 63.
例如，许多变音符号被映射到 ASCII.

Over 255 not all are mapped to 63.
For example many diacritics are mapped to the ASCII.

TSQL 具有 UNICODE 和 ACSII 功能.
所以我只是将所有 Unicode 字符加载到 char 和 nchar 列中.

TSQL has UNICODE and ACSII functions.
So I just loaded all Unicode characters into both char and nchar columns.

SQL 返回的字符错误为 29 个字符.
并且为坏字符返回的 ASCII() 没有意义 - 130 - 160 范围内的所有控制字符.
检查二进制文件中是否有错误的 29，存储的是 ASCII() 返回的内容.
对于 27，从 char 返回的是 nchar，而对于 2，它甚至不是正确的 nchar.它们都应该映射到 ?或等效的 ACSII.
和"映射到(但会占用?)‘和’映射到‘
– (破折号) 和 — (破折号) 映射到 -
...

The char returned by SQL is wrong for 29 characters.
And the ASCII() returned for the bad chars does not make sense - all control chars in the 130 - 160 range.
Checked the binary for the incorrect 29 and what is is stored is what is returned by ASCII().
For 27 what is being return from char is nchar and for 2 it is not even the correct nchar. They should all be mapped to ? or a an ACSII equivalent.
" and " mapped to " (but will take ?) ‘ and ’ mapped to '
– (en dash) and — (em dash) mapped to -
...

我知道你不相信我.
将 'Œ' 插入字符列并选择它 - 它将返回 'Œ'.
你甚至可以搜索它 - char = 'Œ' 返回 true.
选择 ASCII('Œ') 返回 140，这就是实际存储的内容(检查二进制).
140/8C 的 UNICODE 定义是 Partial Line Backward.
我检查了那个字符的二进制值，它是 8C (140).
返回的是 unicode 'Œ' Int16 338.
看起来 SQL 正在做一些输入输出映射并且出错了.

I know you don't believe me.
Insert 'Œ' into a char column and select it - it will return 'Œ'.
And you can even search on it - char = 'Œ' returns true.
Select ASCII('Œ') returns 140 and that is what is actually stored (check the binary).
The UNICODE definition of 140 / 8C is Partial Line Backward.
I checked the binary value for that char and it is 8C (140).
What is returned is the unicode 'Œ' Int16 338.
It appears SQL is doing some input output mapping and getting it wrong.

ASCII 函数对于 575 个未映射到 ? 的 unicode 字符是正确的.
char 值与 ACSII 匹配，它们都有意义.
例如.u 的 12 种不同形式都映射到 u.
32163 个字符，除了 ?被映射到?(63).

ASCII function is correct for 575 unicode characters that are not mapped to ?.
The char value matches the ACSII and they all make sense.
E.G. 12 different forms of u are all mapped to u.
32163 characters other than ? are mapped to ? (63).

以下是返回错误值的 29 个字符.
列顺序:
字符
nchar
ASCII(字符)
UNICODE(nchar)

Below are the 29 char that return the wrong value.
Column order:
char
nchar
ASCII(char)
UNICODE(nchar)

     sqlCharASCIIbackToString did not match  Œ Œ 140 338
     sqlCharASCIIbackToString did not match  œ œ 156 339
     sqlCharASCIIbackToString did not match  Š Š 138 352
     sqlCharASCIIbackToString did not match  š š 154 353
     sqlCharASCIIbackToString did not match  Ÿ Ÿ 159 376
     sqlCharASCIIbackToString did not match  Ž Ž 142 381
     sqlCharASCIIbackToString did not match  ž ž 158 382
     sqlCharASCIIbackToString did not match  ƒ Ƒ 131 401
     sqlCharASCIIbackToString did not match  ƒ ƒ 131 402
     sqlCharASCIIbackToString did not match  ˆ ˆ 136 710
     sqlCharASCIIbackToString did not match  ˜ ˜ 152 732
     sqlCharASCIIbackToString did not match  – – 150 8211
     sqlCharASCIIbackToString did not match  — — 151 8212
     sqlCharASCIIbackToString did not match  ‘ ‘ 145 8216
     sqlCharASCIIbackToString did not match  ’ ’ 146 8217
     sqlCharASCIIbackToString did not match  ‚ ‚ 130 8218
     sqlCharASCIIbackToString did not match  " " 147 8220
     sqlCharASCIIbackToString did not match  " " 148 8221
     sqlCharASCIIbackToString did not match  „ „ 132 8222
     sqlCharASCIIbackToString did not match  † † 134 8224
     sqlCharASCIIbackToString did not match  ‡ ‡ 135 8225
     sqlCharASCIIbackToString did not match  • • 149 8226
     sqlCharASCIIbackToString did not match 
     … … 133 8230
     sqlCharASCIIbackToString did not match  ‰ ‰ 137 8240
     sqlCharASCIIbackToString did not match  ‹ ‹ 139 8249
     sqlCharASCIIbackToString did not match  › › 155 8250
     sqlCharASCIIbackToString did not match  € € 128 8364
     sqlCharASCIIbackToString did not match  ™ ™ 153 8482
     sqlCharASCIIbackToString did not match  ˜ ≈ 152 8776
     count63 =  32163 countMis =  29 countCorrect =  575

运行以下 .NET 以查看 SQL 返回哪个 'Œ'

Ran the following .NET to see which 'Œ' if being returned by SQL

char char338 = (char)338;
System.Diagnostics.Debug.WriteLine(char338);
sqlCmd.CommandText = "select [char] from [charNchar] where [char] = @char;";
sqlCmd.Parameters.Add("@char", SqlDbType.Char).Value = char338;
string string338= sqlCmd.ExecuteScalar().ToString();
char338 = string338.ToCharArray()[0];
System.Diagnostics.Debug.WriteLine(char338 + " " + ((Int16)char338).ToString());

以上代码返回 Œ 338.
SQL 返回一个大于字节的值，数据类型应该存储为字节.
如果我搜索 (char)140 那么?63 返回.

The above code returns Œ 338.
SQL is returning a value larger than byte to datatype is supposedly stored as byte.
If I search on (char)140 then ? 63 is returned.

有趣的是在 char 上搜索 'Œ' 和 N'Œ' 会产生不同的结果.
即在左侧搜索 (140) Œ.
在右侧搜索 (338) Œ 字符搜索未找到任何内容.
Nchar 使用任一输入查找两个结果.

What is interesting is a search on 'Œ' versus N'Œ' on char produce different results.
That is searching on the left (140) Œ.
Search on the right (338) Œ the char search finds nothing.
Nchar finds both results with either input.

  SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
                ,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
  FROM [test].[dbo].[charNchar]
  where [char] = 'Œ'
  SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
                ,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
  FROM [test].[dbo].[charNchar]
  where [char] = N'Œ'
  SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
                ,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
  FROM [test].[dbo].[charNchar]
  where [nchar] = 'Œ'
  SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
                ,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
  FROM [test].[dbo].[charNchar]
  where [nchar] = N'Œ'


int16  char                                               ASCII       nchar                                              UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338    Œ                                                  140         Œ                                                  338

int16  char                                               ASCII       nchar                                              UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338    Œ                                                  140         Œ                                                  338
339    œ                                                  156         œ                                                  339

int16  char                                               ASCII       nchar                                              UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338    Œ                                                  140         Œ                                                  338
339    œ                                                  156         œ                                                  339

int16  char                                               ASCII       nchar                                              UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338    Œ                                                  140         Œ                                                  338
339    œ                                                  156         œ                                                  339

≈ search 对四个查询中的任何一个都没有找到任何内容.检查图表，这是 8776 的正确字符，并且数学几乎等于.

≈ search finds nothing with any of the four queries. Checked the charts and that is the correct char for 8776 and is math almost equal to.

~ 是粘贴到 SSMS 中的零宽度，但它好像是粘贴到从蓝色变为黑色的东西.

˜ is zero width pasted into SSMS but it is something as if it is pasted into the FROM the blue turn to black.

我是否遗漏了什么 - 这对我来说似乎是一个错误.
这不仅仅是他错误的值，而是一个无效的值.
返回一个 Int16.
假设我想使用字节来存储字符以节省空间 - 它会在 SQL 字符上中断，因为 29 个字符不是作为字节返回的.

Am I missing something - this seems like a bug to me.
It is not just he wrong value it is an invalid value.
An Int16 is returned.
Lets say I wanted to use byte to store character to save space - it would break on SQL char as 29 chars are not returned as byte.

这是我使用的代码:

public void SQLchar()
{

    SqlConnection sqlCon = new SqlConnection(connString);  
    try
    {         
        sqlCon.Open();
        SqlCommand sqlCmd = sqlCon.CreateCommand();
        SqlDataReader rdr;
        sqlCmd.CommandText = "delete charNchar";
        sqlCmd.ExecuteNonQuery();
        for(Int16 i = 0; i < Int16.MaxValue; i ++)
        {
            sqlCmd.CommandText = "insert into charNchar (int16,char,nchar) values (@int16, @char, @nchar);";
            sqlCmd.CommandType = System.Data.CommandType.Text;
            sqlCmd.Parameters.Clear();
            sqlCmd.Parameters.Add("@int16", SqlDbType.Int).Value = i;
            sqlCmd.Parameters.Add("@char", SqlDbType.Char).Value = (char)i;
            sqlCmd.Parameters.Add("@nchar", SqlDbType.NChar).Value = (char)i;
            sqlCmd.ExecuteNonQuery();
        }
        string sqlChar;
        string sqlNChar;
        Int16 sqlCharASCII;
        Int16 sqlNCharUnicode;
        string sqlCharASCIIbackToString;
        sqlCmd.CommandText = "select char,nchar,ASCII(char),UNICODE(nchar) from charNchar order by int16;";
        rdr = sqlCmd.ExecuteReader();
        Int16 count63 = 0;
        Int16 countMis = 0;
        Int16 countCorrect = 0;
        while (rdr.Read())
        {
            sqlChar = rdr.IsDBNull(0) ? "dbNull" : rdr.GetString(0);
            sqlNChar = rdr.IsDBNull(1) ? "dbNull" : rdr.GetString(1);
            sqlCharASCII = rdr.IsDBNull(2) ? Int16.Parse("-1") : (Int16)rdr.GetInt32(2);
            sqlNCharUnicode = rdr.IsDBNull(3) ? Int16.Parse("-1") : (Int16)rdr.GetInt32(3);
            if(sqlCharASCII == 63 && sqlNCharUnicode != 63)
            {
                count63 ++;
                continue;  // ?
            }
            if (sqlCharASCII < 0)
            {
                System.Diagnostics.Debug.WriteLine("ASCII(char) null for " + sqlChar + " " + sqlNChar);
            }
            else
            {
                sqlCharASCIIbackToString = ((char)sqlCharASCII).ToString();
                if (string.CompareOrdinal(sqlChar, sqlCharASCIIbackToString) != 0)
                {
                    countMis++;
                    System.Diagnostics.Debug.WriteLine(" sqlCharASCIIbackToString did not match " + sqlCharASCIIbackToString + " " + sqlChar + " " + sqlNChar + " " + sqlCharASCII + " " + sqlNCharUnicode);
                }
                else
                {
                    countCorrect++;
                }
            }
        }
        rdr.Close();
        System.Diagnostics.Debug.WriteLine("count63 =  " + count63.ToString() + " countMis =  " + countMis.ToString() + " countCorrect =  " + countCorrect.ToString());
    }
    catch (Exception Ex)
    {
        System.Diagnostics.Debug.WriteLine(Ex.Message);
    }
    finally 
    {
        sqlCon.Close();
    }
}

至于为什么.
解析 .NET 中的字符串数据，该数据是 FK.
而不是往返 SQL 来获取 FK 的 ID，而是使用 .NET 字典来提高速度.
字典是反向查找以从值中获取键.
解析器具有已被解析器使用的字符的 Int16.
因此，如果字符的 ASCII 错误，则反向查找将失败.
我想我可以对不正确的 ASCII 结果进行硬编码修复.
但我想在走上一条从补丁开始的道路之前了解这里发生了什么.
char 是否存在一些根本性缺陷?
可以只使用 nchar，但我们更喜欢 char.
应用程序的本质是我们想要匹配.
u 的 6 个变音符号全部匹配 ascii u 是一件好事.

As for why.
Parse string data in .NET and that data is a FK.
Rather than round trip to SQL to get the ID for the FK use a .NET Dictionary for speed.
The Dictionary is a reverse lookup to to get the key from the value.
The parser has the Int16 of the char as that is already used by the parser.
So if the ASCII of the char is wrong then the reverse lookup fails.
I think I could hard code fixes for the incorrect ASCII results.
But I want to understand what is going on here before I go down a path that starts with patches.
Does char have some fundamental flaws?
Could just use nchar but we prefer char.
The nature of the application is we want matches.
6 diacritics of u to all match ascii u is a good thing.

Char 返回 29 个 unicode 字符的错误值 - 需要 .NET 将 nchar 转换为 char [英] Char returns the wrong value for 29 unicode characters - Need .NET cast / convert of nchar to char

问题描述

推荐答案

相关文章

C#/.NET最新文章

热门教程

热门工具

登录关闭

Char 返回 29 个 unicode 字符的错误值 - 需要 .NET 将 nchar 转换为 char [英] Char returns the wrong value for 29 unicode characters - Need .NET cast / convert of nchar to char

问题描述

推荐答案

相关文章

C#/.NET最新文章

热门教程

热门工具

登录 关闭

登录关闭