How to store UTF-8 bytes from a C# String in a SQL Server 2000 TEXT column

Question

I have an existing SQL Server 2000 database that stores UTF-8 representations of text in a TEXT column. I don't have the option of modifying the type of the column, and must be able to store non-ASCII Unicode data from a C# program into that column.

Here's the code:

sqlcmd.CommandText =
    "INSERT INTO Notes " +
    "(UserID, LocationID, Note) " +
    "VALUES (" +
        Note.UserId.ToString() + ", " +
        Note.LocationID.ToString() + ", " +
        "@note); " +
    "SELECT CAST(SCOPE_IDENTITY() AS BIGINT) ";

SqlParameter noteparam = new SqlParameter( "@note", System.Data.SqlDbType.Text, int.MaxValue );

At this point I've tried a few different ways to get my UTF-8 data into the parameter. For example:

// METHOD ONE
byte[] bytes = (byte[]) Encoding.UTF8.GetBytes( Note.Note );
char[] characters = bytes.Select( b => (char) b ).ToArray();
noteparam.Value = new String( characters );

I've also tried:

// METHOD TWO
noteparam.Value = Note.Note;

And also:

// METHOD THREE
byte[] bytes = (byte[]) Encoding.UTF8.GetBytes( Note.Note );
noteparam.Value = bytes;

Continuing, here's the rest of the code:

sqlcmd.Parameters.Add( noteparam );
sqlcmd.Prepare();

try
    {
    Note.RecordId = (Int64) sqlcmd.ExecuteScalar();
    }
catch
    {
    return false;
    }

Method one (get UTF8 bytes into a string) does something strange -- I think it is UTF-8 encoding the string a second time.

Method two stores garbage.

Method three throws an exception in ExecuteScalar() claiming it can't convert the parameter to a String.

Things I already know, so no need telling me:

  • SQL Server 2000 is past, or approaching, end of life
  • TEXT columns are not meant for Unicode text
  • Seriously, SQL Server 2000 is old. You need to upgrade.

Any suggestions?

Recommended answer

If your database collation is SQL_Latin1_General_CP1 (the default for the U.S. edition of SQL Server 2000), then you can use the following trick to store Unicode text as UTF-8 in a char, varchar, or text column:

byte[] bytes = Encoding.UTF8.GetBytes(Note.Note);
noteparam.Value = Encoding.GetEncoding(1252).GetString(bytes);

Later, when you want to read back the text, reverse the process:

SqlDataReader reader;
// ...
byte[] bytes = Encoding.GetEncoding(1252).GetBytes((string)reader["Note"]);
string note = Encoding.UTF8.GetString(bytes);

If your database collation is not SQL_Latin1_General_CP1, then you will need to replace 1252 with the correct code page.
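
To keep the code page in one place rather than hard-coding 1252 at every call site, the two conversions can be wrapped in a small helper. This is only a sketch: Utf8InTextColumn, ToColumnString, and FromColumnString are hypothetical names, not part of the original answer, and the right code page value still has to be looked up for your collation. The RegisterProvider call is an assumption about the runtime: it is needed on .NET Core and .NET 5+ to make Windows code pages available, and can be dropped on .NET Framework, where they are built in.

```csharp
using System;
using System.Text;

// Hypothetical helper (not from the original answer): keeps the
// "UTF-8 bytes disguised as code-page text" trick in one place.
public static class Utf8InTextColumn
{
    static Utf8InTextColumn()
    {
        // Makes code pages such as 1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encodes 'text' as UTF-8, then reinterprets those bytes as characters
    // of the column's code page so SQL Server converts them back unchanged.
    public static string ToColumnString(string text, int codePage)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding(codePage).GetString(utf8);
    }

    // Reverses ToColumnString: recovers the raw bytes, then decodes as UTF-8.
    public static string FromColumnString(string stored, int codePage)
    {
        byte[] raw = Encoding.GetEncoding(codePage).GetBytes(stored);
        return Encoding.UTF8.GetString(raw);
    }
}

public static class RoundTripDemo
{
    public static void Main()
    {
        string original = "café ’ 日本語";
        string stored = Utf8InTextColumn.ToColumnString(original, 1252);
        string restored = Utf8InTextColumn.FromColumnString(stored, 1252);

        // True when the round trip through the code page is lossless.
        Console.WriteLine(restored == original);
    }
}
```

With this in place, the parameter assignment would become noteparam.Value = Utf8InTextColumn.ToColumnString(Note.Note, 1252), and the read side would pass the same code page to FromColumnString.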

Note: If you look at the stored text in Enterprise Manager or Query Analyzer, you'll see strange characters in place of non-ASCII text, just as if you opened a UTF-8 document in a text editor that didn't support Unicode.

How it works: When storing Unicode text in a non-Unicode column, SQL Server automatically converts the text from Unicode to the code page specified by the database collation. Any Unicode characters that don't exist in the target code page will be irreversibly mangled, which is why your first two methods didn't work.

But you were on the right track with method one. The missing step is to "protect" the raw UTF-8 bytes by converting them to Unicode using the Windows-1252 code page. Now, when SQL Server performs the automatic conversion from Unicode to Windows-1252, it gets back the original UTF-8 bytes untouched.
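
The difference between method one and the Windows-1252 conversion shows up only for bytes in the range 0x80 to 0x9F. Casting a byte to char keeps its numeric value (effectively ISO-8859-1), which yields C1 control characters there, while Windows-1252 assigns printable characters to most of that range. The sketch below compares the two for the right single quotation mark (U+2019), whose UTF-8 bytes are E2 80 99. The class and method names are hypothetical, and the RegisterProvider call is only needed on .NET Core and .NET 5+.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical comparison class, for illustration only.
public static class MethodOneComparison
{
    // Method one from the question: each byte becomes the char with the
    // same numeric value, which is ISO-8859-1 rather than Windows-1252.
    public static string CastBytes(byte[] bytes)
    {
        return new string(bytes.Select(b => (char)b).ToArray());
    }

    // The answer's approach: decode the bytes as Windows-1252.
    public static string DecodeAs1252(byte[] bytes)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetString(bytes);
    }

    // Formats a string as a list of UTF-16 code units, e.g. "U+00E2 U+0080".
    public static string CodePoints(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        byte[] utf8 = Encoding.UTF8.GetBytes("\u2019"); // bytes E2 80 99

        Console.WriteLine(CodePoints(CastBytes(utf8)));    // U+00E2 U+0080 U+0099
        Console.WriteLine(CodePoints(DecodeAs1252(utf8))); // U+00E2 U+20AC U+2122
    }
}
```

U+0080 and U+0099 are control characters with no Windows-1252 counterpart at those byte positions, so they are the kind of character the automatic conversion is likely to mangle. U+20AC and U+2122 map straight back to the bytes 0x80 and 0x99.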
