SHA256 different values for same String

Question

I am generating the SHA256 of the following string:

{
    "billerid": "MAHA00000MUM01",
    "authenticators": 
    [
        {
            "parameter_name": "CA Number",
            "value": "210000336768"
        }
    ],
    "customer": 
    {
        "firstname": "ABC",
        "lastname": "XYZ",
        "mobile": "9344895862",
        "mobile_alt": "9859585525",
        "email": "abc@billdesk.com",
        "email_alt": "abc2@billdesk.com",
        "pan": "BZABC1234L",
        "aadhaar": "123123123123"
    },
    "metadata": 
    {
        "agent": 
        {
            "agentid": "DC01DC31MOB528199558"
        },
        "device": 
        {
            "init_channel": "Mobile",
            "ip": "124.124.1.1",
            "imei": "490154203237518",
            "os": "Android",
            "app": "AGENTAPP"
        }
    },
    "risk":
    [
        {
          "score_provider": "DC31",
          "score_value": "030",
          "score_type": "TXNRISK"
        },
        {
          "score_provider": "BBPS",
          "score_value": "030",
          "score_type": "TXNRISK"
        }
    ]
}

I am getting different SHA256 output from different sources. This website: https://www.freeformatter.com/sha256-generator.html#ad-output calculates the SHA256 of the above string: 053353867b8171a8949065500d7313c69fe7517c9d69eaff11164c35fcb14457

This website (https://emn178.github.io/online-tools/sha256.html) gives the SHA256 as eae5c26759881d48a194a6b82a9d542485d6b6ce96297275c136b1fa6712f253

I am using the CryptoJS library in JavaScript to calculate the SHA256, and it also gives eae5c26759881d48a194a6b82a9d542485d6b6ce96297275c136b1fa6712f253.

I want the SHA256 calculated to be: 053353867b8171a8949065500d7313c69fe7517c9d69eaff11164c35fcb14457

Why is there a difference in the SHA256 calculation between these sources?

Recommended answer

The problem that you are experiencing is due to encoding differences. There are several reasons why encoding of the same string may produce different results:

  • different line endings (CR/LF for Windows, LF for Linux, CR for classic Mac OS);
  • other differences in whitespace (tabs or spaces, trailing whitespace at the end of lines);
  • different character encodings (Windows-1252, UTF-8 and UTF-16, or the internal character representation within language implementations);
  • the presence of meta information (such as a Byte Order Mark);
  • different ways of representing special characters within an encoding (a base character followed by a combining tilde versus the single precomposed character with a tilde, see Unicode equivalence).
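
To make the list above concrete, here is a minimal Python sketch (the sample text and the variant labels are made up for illustration; they are not the asker's data). It encodes one and the same visible text in several ways and hashes the resulting bytes:

```python
import hashlib
import unicodedata


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample text; "\u00e3" is the precomposed character "a with tilde".
text = '{\n    "name": "S\u00e3o Paulo"\n}'

# The same visible text, turned into bytes in different ways.
variants = {
    "UTF-8, LF": text.encode("utf-8"),
    "UTF-8, CR/LF": text.replace("\n", "\r\n").encode("utf-8"),
    "UTF-8, tabs": text.replace("    ", "\t").encode("utf-8"),
    "UTF-8 with BOM": text.encode("utf-8-sig"),
    "UTF-16 (LE)": text.encode("utf-16-le"),
    "Windows-1252": text.encode("cp1252"),
    "UTF-8, NFD": unicodedata.normalize("NFD", text).encode("utf-8"),
}

for label, data in variants.items():
    print(f"{label:16} {len(data):3} bytes  {sha256_hex(data)}")
```

Every variant prints a different digest, because SHA-256 operates on bytes and never sees the "string" as a human reads it.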

There are also possible invisible errors that may produce different results:

  • the presence of unprintable characters / control codes (a null value, 0x00, at the end of the string is probably the best example);
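
One way to hunt for such invisible characters is to list every control or format character in the input. The helper below is a hypothetical sketch (the name suspicious_characters and the sample string are assumptions for illustration), using only the Python standard library:

```python
import unicodedata


def suspicious_characters(text: str) -> list:
    """Return (index, code point, name) for characters that do not print visibly."""
    findings = []
    for index, char in enumerate(text):
        category = unicodedata.category(char)
        # Cc = control codes (NUL, CR, ...), Cf = format characters (BOM, zero-width space, ...)
        if category in ("Cc", "Cf") and char not in "\n\t":
            name = unicodedata.name(char, "<unnamed control>")
            findings.append((index, f"U+{ord(char):04X}", name))
    return findings


# Made-up sample: a BOM in front, a CR/LF line ending and a trailing NUL.
sample = '\ufeff{"a": 1}\r\n\x00'
for finding in suspicious_characters(sample):
    print(finding)
```

LF and tab are deliberately not reported here, since they are ordinary in pretty-printed JSON; CR is reported because it points at a Windows line ending.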

Besides all of these differences, which may be present in any (structured) text, JSON data structures can also contain equivalent values with different spellings. Probably the best example is a number: strict JSON does not allow a leading + character, but an explicit + in an exponent (1e+2 versus 1e2) or a trailing zero (1.0 versus 1.00) is entirely spurious. It produces a different textual representation, yet the value of the number is identical.
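
A small Python sketch of this effect, with two made-up JSON texts that differ only in how the numbers are spelled and in the order of the keys:

```python
import hashlib
import json

# Two hypothetical documents with the same logical content.
a = '{"amount": 1e2, "rate": 1.0}'
b = '{"rate": 1.00, "amount": 1E+2}'

# Parsed values compare equal...
same_value = json.loads(a) == json.loads(b)

# ...but the hashes of the raw texts do not.
hash_a = hashlib.sha256(a.encode("utf-8")).hexdigest()
hash_b = hashlib.sha256(b.encode("utf-8")).hexdigest()
same_hash = hash_a == hash_b

print(same_value, same_hash)  # True False
```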

If the encoding of the string differs then the binary input of the hash algorithm differs, and you will get results that differ by about 50% of the bits for a common cryptographic hash. The way to produce the same input is called canonicalization (or C14N, as there are 14 characters between the C and N of canonicalization).
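
The "about 50% of the bits" claim can be checked with a short Python sketch. The two inputs below are illustrative (a fragment of the question's JSON with LF and with CR/LF line endings), and differing_bits is a hypothetical helper name:

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Number of bit positions in which the SHA-256 digests of a and b differ."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")


lf = b'{"billerid": "MAHA00000MUM01"}\n'
crlf = b'{"billerid": "MAHA00000MUM01"}\r\n'

print(differing_bits(lf, crlf), "of 256 bits differ")
```

The count should land somewhere near 128, even though the inputs differ by a single byte. This is why a digest gives no hint about where, or how much, two inputs differ.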

For XML, a canonical form was defined long ago. For JSON this is not the case, even though canonicalizing JSON would be much easier; JSON has a much less convoluted set of rules, after all. There are attempts to canonicalize JSON; see e.g. this draft RFC, which explicitly mentions cryptographic hashes:

For example when a cryptographic hash is applied over a JSON document, a single physical representation allows the hash to represent the logical content of the document by removing variation in how that content is encoded in JSON.

This draft RFC looks a bit more thorough, by the way.

For now you could keep to one of the draft RFCs. If you want to keep the newlines, you could serialize the JSON using these well-defined rules and use that serialization as input to the hash function, while leaving the JSON itself untouched. That way, differently formatted JSON would still generate the same hash.

[Input JSON] -> (parse) -> (canonicalize & serialize) -> (hash) -> [hash value]
[Input JSON'] -> (parse) -> (canonicalize & serialize) -> (hash) -> [hash value']

Here the hash output would be identical if the Input JSON and Input JSON' are structurally / semantically the same, as the canonicalization would smooth out the differences.
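
The pipeline can be sketched in Python as follows. Note the assumptions: this is not an implementation of either draft RFC. It uses a simplified canonical form (sorted keys, no insignificant whitespace, UTF-8 output) that only both sides of an exchange would have to agree on, and it does not apply the number and string serialization rules a real canonicalization scheme specifies. The function name canonical_sha256 and the sample documents are made up for illustration.

```python
import hashlib
import json


def canonical_sha256(json_text: str) -> str:
    """Parse, re-serialize in a fixed simplified form, then hash the UTF-8 bytes."""
    value = json.loads(json_text)                      # (parse)
    canonical = json.dumps(                            # (canonicalize & serialize)
        value,
        sort_keys=True,          # fixed key order
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # keep non-ASCII characters instead of \u escapes
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()  # (hash)


# Same logical content: compact, versus tab-indented with CR/LF and reordered keys.
compact = '{"billerid":"MAHA00000MUM01","customer":{"firstname":"ABC","lastname":"XYZ"}}'
pretty = (
    '{\r\n\t"customer": {\r\n\t\t"lastname": "XYZ",\r\n\t\t"firstname": "ABC"\r\n\t},'
    '\r\n\t"billerid": "MAHA00000MUM01"\r\n}\r\n'
)

print(canonical_sha256(compact) == canonical_sha256(pretty))  # True
```

The original JSON text is never modified; only the hash input is normalized. The two parties must still use the same canonicalization rules, otherwise the mismatch simply moves from the formatting to the serializer.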

Note that JSON Web Signatures (JWS) sidestep this issue, even though signatures use a hash internally. The signature is computed over an included payload, and that payload's encoding is used exactly as it is. This is fine as long as no intermediate system re-encodes the JSON. Signatures do not have to be identical; they just need to verify the data.

Unfortunately, that's not the case for hashes. However, in practice, you could define the JSON as a file and use the same reasoning. The drawback is of course that if you get a difference you will have to perform a binary compare to find the differences and then trace back where the change was introduced. Working systems may break the hash while the semantics are still the same (e.g. when replacing or updating a JSON library).
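
If you treat the JSON as an opaque file, a mismatch has to be tracked down at the byte level. A minimal Python sketch of such a binary compare, where first_difference and context are hypothetical helper names and the two inputs are made up:

```python
def first_difference(a: bytes, b: bytes):
    """Offset of the first differing byte, or None if the inputs are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other.
        return min(len(a), len(b))
    return None


def context(data: bytes, offset: int, width: int = 8) -> str:
    """Hex dump of the bytes around an offset, for eyeballing the difference."""
    return data[max(0, offset - width): offset + width].hex(" ")


sent = b'{\n  "a": 1\n}'
received = b'{\r\n  "a": 1\r\n}'

offset = first_difference(sent, received)
print("first difference at byte", offset)
print("sent:    ", context(sent, offset))
print("received:", context(received, offset))
```

Here the first difference is at offset 1, where one side has 0a (LF) and the other 0d (CR), which points at a line-ending conversion somewhere between the two systems.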
