TSQL md5 hash different to C# .NET md5


Question

I've generated an md5 hash as below:

DECLARE @varchar varchar(400);
SET @varchar = 'è';
SELECT CONVERT(VARCHAR(2000), HASHBYTES('MD5', @varchar), 2);

It outputs:

785D512BE4316D578E6650613B45E934

However, generating an MD5 hash in C# .NET from the bytes returned by:

System.Text.Encoding.UTF8.GetBytes("è")

produces:

0a35e149dbbb2d10d744bf675c7744b1

The encoding in the C# .NET method is set to UTF8, and I had assumed that varchar was also UTF8. Any ideas on what I'm doing wrong?

Answer

If you are dealing with NVARCHAR / NCHAR data (which is stored as UTF-16 Little Endian), then you would use the Unicode encoding, not BigEndianUnicode. In .NET, UTF-16 is called Unicode while other Unicode encodings are referred to by their actual names: UTF7, UTF8, and UTF32. Hence, Unicode by itself is Little Endian as opposed to BigEndianUnicode. UPDATE: Please see the section at the end regarding UCS-2 and Supplementary Characters.

On the database side:

SELECT HASHBYTES('MD5', N'è') AS [HashBytesNVARCHAR]
-- FAC02CD988801F0495D35611223782CF

On the .NET side (the comment under each call is the MD5 hash of the bytes it returns):

System.Text.Encoding.ASCII.GetBytes("è")
// D1457B72C3FB323A2671125AEF3EAB5D

System.Text.Encoding.UTF7.GetBytes("è")
// F63A0999FE759C5054613DDE20346193

System.Text.Encoding.UTF8.GetBytes("è")
// 0A35E149DBBB2D10D744BF675C7744B1

System.Text.Encoding.UTF32.GetBytes("è")
// 86D29922AC56CF022B639187828137F8

System.Text.Encoding.BigEndianUnicode.GetBytes("è")
// 407256AC97E4C5AEBCA825DEB3D2E89C

System.Text.Encoding.Unicode.GetBytes("è")  // this one matches HASHBYTES('MD5', N'è')
// FAC02CD988801F0495D35611223782CF
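
As a cross-check outside .NET, the byte sequences behind those hashes can be reproduced with a short Python sketch. Python and the codec names below are chosen here for illustration and are not part of the original answer. The assumption is that each Python codec mirrors the corresponding .NET Encoding: no byte order mark, and for ASCII a "?" substituted for any character the encoding lacks, which is what Encoding.ASCII does by default. UTF7 is left out because it is not certain that the two implementations agree on it.

```python
import hashlib

TEXT = "\u00e8"  # "è" as a single precomposed code point

# Assumed Python counterparts of the .NET encodings listed above.
CODECS = {
    "ASCII": "ascii",
    "UTF8": "utf-8",
    "UTF32": "utf-32-le",
    "BigEndianUnicode": "utf-16-be",
    "Unicode": "utf-16-le",
}

def encoded_bytes(text, codec):
    # errors="replace" substitutes "?" for characters the codec cannot represent.
    return text.encode(codec, errors="replace")

def md5_hex(data):
    # Uppercase hex, to compare directly with the HASHBYTES output.
    return hashlib.md5(data).hexdigest().upper()

for name, codec in CODECS.items():
    raw = encoded_bytes(TEXT, codec)
    print(f"{name:<17} {raw.hex().upper():<10} {md5_hex(raw)}")
```

Each row shows the raw bytes next to their hash, which makes it visible that the hashes differ only because the bytes differ: the same character is one byte, two bytes, or four bytes depending on the encoding.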

However, this question pertains to VARCHAR / CHAR data, which uses an 8-bit, Code Page-based encoding (often loosely called "ASCII"), and so things are a bit more complicated.

On the database side:

SELECT HASHBYTES('MD5', 'è') AS [HashBytesVARCHAR]
-- 785D512BE4316D578E6650613B45E934

We already saw the .NET side above. Those hashed values raise two questions:

  • Why don't any of them match the HASHBYTES value?
  • Why does the "sqlteam.com" article linked in @Eric J.'s answer show that three of them (ASCII, UTF7, and UTF8) all match the HASHBYTES value?

There is one answer that covers both questions: Code Pages. The test done in the "sqlteam" article used "safe" ASCII characters that are in the 0 - 127 range (in terms of the int / decimal value) that do not vary between Code Pages. But the 128 - 255 range -- where we find the "è" character -- is the Extended set that does vary by Code Page.
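
The Code Page effect can be illustrated with a minimal Python sketch. Python's "cp1252" and "cp1255" codecs are assumed here to match the Windows Code Pages of the same numbers; the sample string and variable names are made up for the illustration.

```python
ASCII_SAMPLE = "Hello, MD5!"      # every character is in the 0 - 127 range
CODE_PAGES = ["cp1252", "cp1255"]

# Characters below 128 encode to the same bytes in every Code Page,
# so hashing them gives the same result regardless of collation.
ascii_encodings = {cp: ASCII_SAMPLE.encode(cp) for cp in CODE_PAGES}
distinct_ascii = len(set(ascii_encodings.values()))
print(distinct_ascii)

# The byte 0xE8 (232) is in the extended range, so the character it
# stands for depends on which Code Page is used to interpret it.
extended = {cp: b"\xe8".decode(cp) for cp in CODE_PAGES}
for cp, ch in extended.items():
    print(cp, f"U+{ord(ch):04X}")
```

A test that only uses characters like those in ASCII_SAMPLE cannot tell the Code Pages apart, which would explain why the "sqlteam" test saw matching hashes.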

Now try:

SELECT HASHBYTES('MD5', 'è' COLLATE SQL_Latin1_General_CP1255_CI_AS) AS [HashBytes]
-- D1457B72C3FB323A2671125AEF3EAB5D

That matches the ASCII hashed value (and again, because the "sqlteam" article / test used values in the 0 - 127 range, they did not see any changes when using COLLATE). Great, now we finally found a way to match VARCHAR / CHAR data. All good?

Well, not really. Let's take a look-see at what we were actually hashing:

SELECT 'è' AS [TheChar],
       ASCII('è') AS [TheASCIIvalue],
       'è' COLLATE SQL_Latin1_General_CP1255_CI_AS AS [CharCP1255],
       ASCII('è' COLLATE SQL_Latin1_General_CP1255_CI_AS) AS [TheASCIIvalueCP1255];

Returns:

TheChar  TheASCIIvalue  CharCP1255  TheASCIIvalueCP1255
è        232            ?           63

A "?"? Just to verify, run:

SELECT CHAR(63) AS [WhatIs63?];
-- ?

Ah, so Code Page 1255 doesn't have the è character, so it gets translated as everyone's favorite "?". But then why did that match the MD5 hashed value in .NET when using the ASCII encoding? Could it be that we weren't actually matching the hashed value of è, but were instead matching the hashed value of "?":

SELECT HASHBYTES('MD5', '?') AS [HashBytesVARCHAR]
-- 0xD1457B72C3FB323A2671125AEF3EAB5D

Yup. The true ASCII character set is just the first 128 characters (values 0 - 127). And as we just saw, the è is 232. So, using the ASCII encoding in .NET is not that helpful. Nor was using COLLATE on the T-SQL side.
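
The "?" substitution can be reproduced with a small Python sketch. This is an illustration under assumptions, not what SQL Server or .NET run internally: errors="replace" is used to stand in for the replacement behavior seen above, and "cp1255" is assumed to match Code Page 1255.

```python
import hashlib

E_GRAVE = "\u00e8"  # "è", value 232, outside the true ASCII range

def md5_hex(data):
    return hashlib.md5(data).hexdigest().upper()

# Neither ASCII nor Code Page 1255 contains "è", so both encodings
# fall back to the replacement character "?" (value 63).
ascii_bytes = E_GRAVE.encode("ascii", errors="replace")
cp1255_bytes = E_GRAVE.encode("cp1255", errors="replace")
print(ascii_bytes, cp1255_bytes)

# Both "matches" are therefore the hash of "?", not the hash of "è".
same_as_question_mark = (
    md5_hex(ascii_bytes) == md5_hex(cp1255_bytes) == md5_hex(b"?")
)
print(same_as_question_mark)
```

Any other character missing from both encodings would produce the same hash, so a match obtained this way says nothing about the original data.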

Is it possible to get a better encoding on the .NET side? Yes, by using Encoding.GetEncoding(Int32). According to this MSDN page on Windows Collation Name, "Latin1_General or French: both use code page 1252". Since our default collation is "SQL_Latin1_General_CP1_CI_AS", let's try Code Page 1252.

System.Text.Encoding.GetEncoding(1252).GetBytes("è") // Matches HASHBYTES('MD5', 'è')
// 785D512BE4316D578E6650613B45E934

Woo hoo! We have a match for VARCHAR data that keeps our default SQL Server collation :). Of course, if the data is coming from a database or field set to a different collation, then GetEncoding(1252) won't work and you will have to find the actual matching Code Page (list of Code Pages is at the start of the "Remarks" section).
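
The same idea, sketched in Python for comparison: encode with the Code Page that matches the collation, then hash. The helper name varchar_md5 and its code_page parameter are hypothetical, and "cp1252" is assumed to correspond to Code Page 1252. Encoding is strict on purpose, so a character missing from the Code Page raises an error instead of being silently hashed as "?".

```python
import hashlib

def varchar_md5(text, code_page="cp1252"):
    """Return the MD5 of text encoded in the given Code Page, as uppercase hex.

    Raises UnicodeEncodeError if the Code Page lacks a character,
    rather than hashing a "?" substitute.
    """
    raw = text.encode(code_page)  # strict error handling is the default
    return hashlib.md5(raw).hexdigest().upper()

# Code Page 1252 stores "è" as the single byte 0xE8.
print(varchar_md5("\u00e8"))
```

Failing loudly on an unencodable character is a design choice for this sketch; it avoids the false match seen earlier with Code Page 1255.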

Additional info related to what is actually stored in NVARCHAR / NCHAR fields:

Any UTF-16 character (2 or 4 bytes) can be stored, though the default behavior of the built-in functions assumes that all characters are UCS-2 (2 bytes each), which is a subset of UTF-16. Starting in SQL Server 2012, it is possible to access a set of Windows collations that support the 4 byte characters known as Supplementary Characters. Using one of these Windows collations ending in _SC, either specified for a column or directly in a query, will allow the built-in functions to properly handle the 4 byte characters.
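
To make the 2 byte versus 4 byte distinction concrete, here is a minimal Python sketch that counts UTF-16 code units. The helper name utf16_units and the sample code point U+1F600 are chosen for illustration only.

```python
def utf16_units(text):
    """Number of 2-byte UTF-16 code units needed to store text."""
    return len(text.encode("utf-16-le")) // 2

bmp_char = "\u00e8"           # è, inside the UCS-2 range
supplementary = "\U0001F600"  # a code point above U+FFFF

print(utf16_units(bmp_char))       # 1 code unit, 2 bytes
print(utf16_units(supplementary))  # 2 code units, 4 bytes (a surrogate pair)
```

A function that assumes UCS-2 effectively works on code units, so it would treat the supplementary character as two separate characters; that is the behavior the _SC collations are described as correcting.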
