Perfect Hash Function for URLs
Question
Does anyone know of a perfect hashing function for URLs with 64-bit integers that would perform well for most URLs?
Solution
I found this, described as a "Base52 url shortener perfect hash function in C", at http://lambdajones.com/b52:
const char *b52idx[52] = {
  "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
  "B", "C", "D", "F", "G", "H", "J", "K", "L", "M",
  "N", "P", "Q", "R", "S", "T", "V", "W", "X", "Y",
  "Z", "b", "c", "d", "f", "g", "h", "j", "k", "l",
  "m", "n", "p", "q", "r", "s", "t", "v", "w", "x",
  "y", "z"
};

#define X 0xff  // marks characters outside the base-52 alphabet
const int b52map[128] = {
  // 0x00-0x2f: control characters and punctuation, all invalid
   X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,
   X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,
   X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,  X,
  // 0x30-0x3f: '0'-'9' map to 0-9
   0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  X,  X,  X,  X,  X,  X,
  // 0x40-0x4f: B C D F G H J K L M N map to 10-20 (vowels are invalid)
   X,  X, 10, 11, 12,  X, 13, 14, 15,  X, 16, 17, 18, 19, 20,  X,
  // 0x50-0x5f: P Q R S T V W X Y Z map to 21-30
  21, 22, 23, 24, 25,  X, 26, 27, 28, 29, 30,  X,  X,  X,  X,  X,
  // 0x60-0x6f: b c d f g h j k l m n map to 31-41
   X,  X, 31, 32, 33,  X, 34, 35, 36,  X, 37, 38, 39, 40, 41,  X,
  // 0x70-0x7f: p q r s t v w x y z map to 42-51
  42, 43, 44, 45, 46,  X, 47, 48, 49, 50, 51,  X,  X,  X,  X,  X
};

#ifdef __GNUC__
#define likely(x) __builtin_expect((x),1)
#else
#define likely(x) (x)
#endif

/*
  valid from 00000 -> zzzzz, good for 380204032 (52^5) urls
  returns the integral short url id, or 380204033 if the result is out of range
  assumes c points to at least five 7-bit ASCII characters
*/
unsigned long long b52(const char *c) {
  unsigned long long x = 0;
  unsigned long long y = 0;
  unsigned long long z = 0;
  // pack the five digits six bits apiece, i.e. as a base-64 number
  x |= b52map[c[0]] << 24 | b52map[c[1]] << 18 | \
       b52map[c[2]] << 12 | b52map[c[3]] << 6 | b52map[c[4]];
  // y accumulates the gap between the base-64 packing and the base-52 value
  y += (x/64) * 12;
  if (x > 4095) y += 624 * (x/4096);
  if (x > 262143) y += 32448 * (x/262144);
  if (x > 16777215) y += 1687296 * (x/16777216);  // 2^24, matching the << 24 shift
  if (likely((z = x - y) < 380204033)) return z;
  else return 380204033;
}

void b52inc(char *id) {
  int x[5] = {
    b52map[id[0]], b52map[id[1]], b52map[id[2]], b52map[id[3]], b52map[id[4]]
  };
  int y = 5;
  // search for the first character we can increment (51 == 'z')
  while (y--) if (x[y] < 51) break;
  // every digit is 'z': leave the id unchanged (one possible policy)
  if (y < 0) return;
  // increment from the b52idx table and update id
  id[y] = *b52idx[++x[y]];
  // if we passed over id's 'z's above, roll them over
  while (++y < 5) if (x[y] == 51) id[y] = '0';
}
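
For comparison, the same id-to-integer mapping can probably be written without the 128-entry table and the correction constants. The sketch below is not from the linked page. It assumes the same 52-character alphabet and five-character ids, and the names B52_ALPHABET, B52_LEN, B52_INVALID, b52_decode and b52_encode are hypothetical. It decodes with Horner's rule and encodes by repeated division by 52. Like b52 above, it converts short ids to integers and back; it does not hash arbitrary URLs.

```c
#include <stdint.h>
#include <string.h>

/* Same 52-symbol alphabet as b52idx: digits plus consonants, no vowels. */
static const char B52_ALPHABET[] =
    "0123456789BCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz";

#define B52_LEN 5
#define B52_INVALID UINT64_MAX  /* hypothetical sentinel for bad input */

/* Decode a 5-character id with Horner's rule: value = value * 52 + digit.
   Returns B52_INVALID if the id is too short or has an unknown character. */
uint64_t b52_decode(const char *id) {
    uint64_t value = 0;
    for (int i = 0; i < B52_LEN; i++) {
        /* strchr would match the terminating NUL, so rule it out first */
        const char *p = id[i] ? strchr(B52_ALPHABET, id[i]) : NULL;
        if (p == NULL) return B52_INVALID;
        value = value * 52 + (uint64_t)(p - B52_ALPHABET);
    }
    return value;
}

/* Encode value (0 .. 52^5 - 1) into out[0..4] plus a terminating NUL.
   Returns 0 on success, -1 if the value does not fit in five digits. */
int b52_encode(uint64_t value, char out[B52_LEN + 1]) {
    if (value >= 380204032ULL) return -1;  /* 52^5 */
    for (int i = B52_LEN - 1; i >= 0; i--) {
        out[i] = B52_ALPHABET[value % 52];
        value /= 52;
    }
    out[B52_LEN] = '\0';
    return 0;
}
```

Under these assumptions, b52_decode("00000") is 0 and b52_decode("zzzzz") is 380204031, and incrementing an id amounts to b52_encode(b52_decode(id) + 1, id). The linear strchr scan trades some speed for clarity; the lookup table in the original avoids that scan.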