人类可读订单代码的完美散列函数 [英] Perfect Hash Function for Human Readable Order Codes

查看：228 发布时间：2017/8/16 21:26:27 php encoding hash

本文介绍了人类可读订单代码的完美散列函数的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在尝试生成从（let say）一个从1开始的无符号32位内部ID，并为每个新订单自动递增的非顺序人类可读订单代码。

在我下面的示例代码中，每个 $ hash 是唯一的吗？（我打算对 $ hash 进行base34编码，使其可读取。）

 <？php 
函数int_hash（$ key）{
 $ key =（$ key ^ 0x47cb8a8c）^（$ key<< 12）; 
 $ key =（$ key ^ 0x61a988bc）^（$ key>> 19）; 
 $ key =（$ key ^ 0x78d2a3c8）^（$ key<< 5）; 
 $ key =（$ key ^ 0x5972b1be）^（$ key<< 9）; 
 $ key =（$ key ^ 0x2ea72dfe）^（$ key<< 3）; 
 $ key =（$ key ^ 0x5ff1057d）^（$ key>> 16）; 
 return $ key; 
} 
 
（$ order_id = 1; $ order_id <= PHP_INT_MAX; ++ $ order_id）{
 $ hash = int_hash（$ order_id）; 
} 
？>

如果没有，有没有关于如何替换 int_hash ？

结果是，base34编码一个 md5（$ order_id）太长了我的喜好

解决方案

在下面的示例代码中，每个 $哈希是唯一的？

几乎（我猜，不，但在某种程度上很容易修复。）你的功能包括一系列独立的步骤;当且仅当这些步骤中的每一个步骤都是时，整体功能是双重的（可逆的）。（你知道为什么吗？）

现在，每一步都有以下一种形式：

  $ key =（$ key ^ CONSTANT）^（$ key>> NUM_BITS）; 
 $ key =（$ key ^ CONSTANT）^（$ key<< NUM_BITS）;

与 NUM_BITS！= 0 p>

我们可以将它们视为单一形式的变体，通过将前者视为几乎相当于此：

  $ key = invert_order_of_bits（$ key）; ＃清晰的双体体
 $ constant = invert_order_of_bits（CONSTANT）; 
 $ key =（$ key ^ $ constant）^（$ key<< NUM_BITS）; 
 $ key = invert_order_of_bits（$ key）; ＃清楚的双体

所以我们需要的是显示：

  $ key =（$ key ^ CONSTANT）^（$ key<< NUM_BITS）;

是双射的。现在，XOR是交换和关联的，所以上面相当于这个：

  $ key = $ key ^（$ key< ;< NUM_BITS）; 
 $ key = $ key ^ CONSTANT;

和（x ^ y）^ y == x ^（y ^ y）== x ^ 0 == x ，所以清楚XOR-ing与常量值是可逆的（通过重新使用相同的值）;所以我们要显示的是这是双射：

  $ key = $ key ^（$ key<< NUM_BITS ）;

每当 NUM_BITS！= 0 p>

现在，我没有写出严格的证明，所以我只是给出一个单个的例子，说明如何扭转这个。假设 $ key ^（$ key< <9）是

  0010 1010 1101 1110 0010 0101 0000 1100

我们如何获得 $键？那么，我们知道最后九位的 $ key<< 9 全部为零，所以我们知道 $ key ^（$ key< 9）的最后9位与$ code> $ key 的最后九位。所以 $ key 看起来像

  bbbb bbbb bbbb bbbb bbbb bbb1 0000 1100

so $ key<< 9 看起来像

  bbbb bbbb bbbb bb10 0001 1000 0000 0000

so $ key 看起来像

  bbbb bbbb bbbb bb00 0011 1101 0000 1100

（通过XOR-ing $ key ^（$ key< <9）与 $ key<< 9 ），所以 $ key<< 9 看起来像

  bbbb b000 0111 1010 0001 1000 0000 0000

so $ key 看起来像

  bbbb b010 1010 0100 0011 1101 0000 1100

所以 $ key<< 9 看起来像

  0101 1000 0111 1010 0001 1000 0000 0000

so $ key 看起来像

  0111 0010 1010 0100 0011 1101 0000 1100

所以。。为什么我说几乎而不是是？为什么你的哈希函数不是完美的双目标？这是因为在PHP中，逐位移动运算符>> 和< 对称，而 $ key = $ key ^（$ key<< NUM_BITS）是完全可逆的， $ key = $键^（$ key>> NUM_BITS）不是。（以上，当我写道两种类型的步骤是几乎相同时，我真的意味着几乎，它有所作为！）你看，而< 像任何其他位一样对待符号位，并将其转移出来（在右侧引入零位），>> 特别对待符号位，并扩展：它在左边带来的位等于符号位。（NB你的问题提到未签名的32位值，但PHP实际上并不支持;它的按位操作总是在签名整数。）

由于此符号扩展名，如果 $ key 以 0 开头，则 $键>> NUM_BITS 以 0 开头，如果 $ key 以$ $ c $开头c> 1 ，然后 $ key>> NUM_BITS 也以 1 开头。在这两种情况下， $ key ^（$ key>> NUM_BITS）将以 0 开头。你已经失去了一点熵。如果你给我 $ key ^（$ key>> 9），不要告诉我是否 $ key 是否定的，那么我可以做的最好的是计算两个可能的值：$ code> $ key ：一个负数，一个正数或零。

您执行两个步骤，使用右移而不是左移，因此您丢失了两位熵。（我手里挥了挥手，我实际证明的是，你至少输了一个两位，但我有信心由于这些右移步骤之间的步骤的性质，您实际上会丢失两个完整位。）对于任何给定的输出值，只有四个不同的输入值可以产生。所以它不是独一无二的，但它几乎是独一无二的;并且可以通过以下方式轻松修复：

更改两个右移步骤以改用左移;或者

在任何左移步骤之前将两个右移步骤移动到函数的开始，并且说输出对于0和2之间的输入是唯一的− 1而不是0和2之间的输入³²− 1。

I am trying to generate non-sequential human readable order codes derived from (lets say) a unsigned 32bit internal id that starts at 1 and is auto incremented for each new order.

In my example code below, will every $hash be unique? (I plan to base34 encode the $hash to make it human readable.)

<?php
function int_hash($key) {
  $key = ($key^0x47cb8a8c) ^ ($key<<12);
  $key = ($key^0x61a988bc) ^ ($key>>19);
  $key = ($key^0x78d2a3c8) ^ ($key<<5);
  $key = ($key^0x5972b1be) ^ ($key<<9);
  $key = ($key^0x2ea72dfe) ^ ($key<<3);
  $key = ($key^0x5ff1057d) ^ ($key>>16);
  return $key;
}

for($order_id = 1; $order_id <= PHP_INT_MAX; ++$order_id) {
  $hash = int_hash($order_id);
}
?>

If not, are there any suggestions on how to replace int_hash?

The result of say, base34 encoding a md5($order_id) is too long for my liking.

解决方案

In my example code below, will every $hash be unique?

Almost. (Which, I guess, means "no, but in a way that's easily fixed".) Your function consists of a sequence of independent steps; the overall function is bijective (reversible) if and only if every single one of those steps is. (Do you see why?)

Now, each step has one of the following forms:

  $key = ($key ^ CONSTANT) ^ ($key >> NUM_BITS);
  $key = ($key ^ CONSTANT) ^ ($key << NUM_BITS);

with NUM_BITS != 0.

We can actually treat these as variants of a single form, by viewing the former as almost equivalent to this:

  $key = invert_order_of_bits($key); # clearly bijective
  $constant = invert_order_of_bits(CONSTANT);
  $key = ($key ^ $constant) ^ ($key << NUM_BITS);
  $key = invert_order_of_bits($key); # clearly bijective

So all we need is to show that this:

  $key = ($key ^ CONSTANT) ^ ($key << NUM_BITS);

is bijective. Now, XOR is commutative and associative, so the above is equivalent to this:

  $key = $key ^ ($key << NUM_BITS);
  $key = $key ^ CONSTANT;

and (x ^ y) ^ y == x ^ (y ^ y) == x ^ 0 == x, so clearly XOR-ing with a constant value is reversible (by re-XOR-ing with the same value); so all we have to show is that this is bijective:

  $key = $key ^ ($key << NUM_BITS);

whenever NUM_BITS != 0.

Now, I'm not writing a rigorous proof, so I'll just give a single reasoned-out example of how to reverse this. Suppose that $key ^ ($key << 9) is

0010 1010 1101 1110 0010 0101 0000 1100

How do we obtain $key? Well, we know that the last nine bits of $key << 9 are all zeroes, so we know that the last nine bits of $key ^ ($key << 9) are the same as the last nine bits of $key. So $key looks like

bbbb bbbb bbbb bbbb bbbb bbb1 0000 1100

so $key << 9 looks like

bbbb bbbb bbbb bb10 0001 1000 0000 0000

so $key looks like

bbbb bbbb bbbb bb00 0011 1101 0000 1100

(by XOR-ing $key ^ ($key << 9) with $key << 9), so $key << 9 looks like

bbbb b000 0111 1010 0001 1000 0000 0000

so $key looks like

bbbb b010 1010 0100 0011 1101 0000 1100

so $key << 9 looks like

0101 1000 0111 1010 0001 1000 0000 0000

so $key looks like

0111 0010 1010 0100 0011 1101 0000 1100

So . . . why do I say "almost" rather than "yes"? Why is your hash-function not perfectly bijective? It's because in PHP, the bitwise shift operators >> and << are not quite symmetric, and while $key = $key ^ ($key << NUM_BITS) is completely reversible, $key = $key ^ ($key >> NUM_BITS) is not. (Above, when I wrote that the two types of steps were "almost equivalent", I really meant that "almost". It makes a difference!) You see, whereas << treats the sign bit just like any other bit, and shifts it out of existence (bringing in a zero-bit on the right), >> treats the sign bit specially, and "extends" it: the bit that it brings in on the left is equal to the sign bit. (N.B. Your question mentions "unsigned 32bit" values, but PHP doesn't actually support that; its bitwise operations are always on signed integers.)

Due to this sign extension, if $key starts with a 0, then $key >> NUM_BITS starts with a 0, and if $key starts with a 1, then $key >> NUM_BITS also starts with a 1. In either case, $key ^ ($key >> NUM_BITS) will start with a 0. You've lost exactly one bit of entropy. If you give me $key ^ ($key >> 9), and don't tell me whether $key is negative, then the best I can do is compute two possible values for $key: one negative, one positive-or-zero.

You perform two steps that use right-shift instead of left-shift, so you lose two bits of entropy. (I'm hand-waving slightly — all I've actually demonstrated is that you lose at least one bit and at most two bits — but I'm confident that, due to the nature of the steps between those right-shift steps, you do actually lose two full bits.) For any given output value, there are exactly four distinct input-values that could yield it. So it's not unique, but it's almost unique; and it's easily fixed, by either:

changing the two right-shift steps to use left-shifts instead; or
moving both of the right-shift steps to the start of the function, before any left-shift steps, and saying that outputs are unique for inputs between 0 and 2³¹−1 rather than inputs between 0 and 2³²−1.

这篇关于人类可读订单代码的完美散列函数的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

人类可读订单代码的完美散列函数 [英] Perfect Hash Function for Human Readable Order Codes

问题描述

相关文章

PHP最新文章

热门教程

热门工具

登录关闭

人类可读订单代码的完美散列函数 [英] Perfect Hash Function for Human Readable Order Codes

问题描述

相关文章

PHP最新文章

热门教程

热门工具

登录 关闭

登录关闭