Insert UTF8 data into a SQL Server 2008

Views: 132 Published: 2017/8/17 0:33:32 Tags: c# encoding


Problem description

I have an issue with encoding. I want to put data from a UTF-8-encoded file into a SQL Server 2008 database. SQL Server only features UCS-2 encoding, so I decided to explicitly convert the retrieved data.

// connect to page file
_fsPage = new FileStream(mySettings.filePage, FileMode.Open, FileAccess.Read);
_streamPage = new StreamReader(_fsPage, System.Text.Encoding.UTF8);

Here's the conversion routine for the data:

private string ConvertTitle(string title)
{
  string utf8_String = Regex.Replace(Regex.Replace(title, @"\\.", _myEvaluator), @"(?<=[^\\])_", " ");
  byte[] utf8_bytes = System.Text.Encoding.UTF8.GetBytes(utf8_String);
  byte[] ucs2_bytes = System.Text.Encoding.Convert(System.Text.Encoding.UTF8, System.Text.Encoding.Unicode, utf8_bytes);
  string ucs2_String = System.Text.Encoding.Unicode.GetString(ucs2_bytes);

  return ucs2_String;
}

When stepping through the code for critical titles, the variable watch shows the correct characters for both the UTF-8 and the UCS-2 string. But in the database it's partially wrong: some special chars are saved correctly, others are not.

Wrong: ń becomes an n
Right: É or é are for example inserted correctly.

Any idea where the problem might be and how to solve it?

Thanks in advance, Frank

Solution

I think you have a misunderstanding of what encodings are. An encoding is used to convert a bunch of bytes into a character string. A String does not itself have an encoding associated with it.

Internally, Strings are stored in memory as UTF-16LE bytes (which is why Windows persists in confusing everyone by calling the UTF-16LE encoding just "Unicode"). But you don't need to know that — to you, they're just strings of characters.

What your function does is:

1. Takes a string and converts it to UTF-8 bytes.
2. Takes those UTF-8 bytes and converts them to UTF-16LE bytes. (You could have just encoded straight to UTF-16LE instead of UTF-8 in step one.)
3. Takes those UTF-16LE bytes and converts them back to a string. This gives you the exact same String you had in the first place!

So this function is redundant; you can actually just pass a normal String to SQL Server from .NET and not worry about it.
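
To make the redundancy concrete, here is a minimal sketch of the three steps in isolation. The names `RoundTripDemo` and `RoundTrip` are made up for illustration, and the regex part of `ConvertTitle` is left out because it is application-specific.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Repeats the three conversion steps from ConvertTitle, without the regex part.
    public static string RoundTrip(string input)
    {
        // Step 1: string -> UTF-8 bytes
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(input);
        // Step 2: UTF-8 bytes -> UTF-16LE bytes (.NET calls this encoding "Unicode")
        byte[] utf16Bytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);
        // Step 3: UTF-16LE bytes -> string
        return Encoding.Unicode.GetString(utf16Bytes);
    }
}
```

For a well-formed string such as `"\u0144 \u00C9 \u00E9"` (ń É é), `RoundTripDemo.RoundTrip(title) == title` should evaluate to true, which is why the conversion cannot be what turns ń into n.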

The bit with the backslashes does do something, presumably application-specific; I don't understand what it's for. But nothing in that function will cause Windows to flatten characters like ń to n.

What /will/ cause that kind of flattening is trying to store characters that aren't in the database's own encoding. Presumably é is OK because that character is in your default encoding of cp1252 Western European, but ń is not, so it gets mangled.
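
The same effect can be reproduced outside the database with a small check, sketched here under one assumption: ISO-8859-1 is used as a stand-in for cp1252, because it is available in .NET without extra setup and agrees with cp1252 on these two characters (é is present, ń is not). `CodePageDemo`, `SurvivesEncoding` and `WesternEuropean` are hypothetical names.

```csharp
using System;
using System.Text;

public static class CodePageDemo
{
    // True if encoding the text and decoding it again gives back the same string,
    // i.e. every character is representable in the given encoding.
    public static bool SurvivesEncoding(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes) == text;
    }

    // Assumption: ISO-8859-1 stands in for the cp1252 default mentioned above.
    public static Encoding WesternEuropean()
    {
        return Encoding.GetEncoding("iso-8859-1");
    }
}
```

With this, `SurvivesEncoding("\u00E9", WesternEuropean())` (é) holds, while `SurvivesEncoding("\u0144", WesternEuropean())` (ń) does not, since ń has no slot in a Western European single-byte code page and is replaced by something else on the way in.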

SQL Server does use 'UCS2' (really UTF-16LE again) to store Unicode strings, but you have to tell it to, typically by using a NATIONAL CHARACTER (NCHAR/NVARCHAR) column type instead of plain CHAR.
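
On the .NET side, one way to keep the value Unicode all the way to the server is a parameterized command whose parameter is declared as `NVarChar`. This is only a sketch: the table `Pages`, the column `Title`, its length of 200 and the helper name `BuildInsert` are assumptions, not taken from the question, and the column itself still has to be NCHAR/NVARCHAR.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class TitleInsert
{
    // Hypothetical schema: CREATE TABLE Pages (Title NVARCHAR(200))
    public static SqlCommand BuildInsert(SqlConnection connection, string title)
    {
        var command = new SqlCommand("INSERT INTO Pages (Title) VALUES (@title)", connection);
        // Declaring the parameter as NVarChar sends the .NET string as Unicode
        // instead of converting it to the database's code page first.
        command.Parameters.Add("@title", SqlDbType.NVarChar, 200).Value = title;
        return command;
    }
}
```

If the SQL text carries the value as a literal instead of a parameter, the literal needs the `N` prefix (`N'ń'`); a plain `'ń'` literal is treated as a non-Unicode string.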
