postgres如何处理位数据类型? [英] how does postgres handle the bit data type?

查看:103
本文介绍了postgres如何处理位数据类型?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个表,该表的列 vector 的类型为 bit(2000)。数据库引擎如何处理此值上的 AND OR 运算?它是否简单地分为32位块(或分别为64位),然后分别比较每个块,最后将结果简单地组合在一起?还是仅仅作为两个字符串处理?

i have a table with a column vector of type bit(2000). how does the db engine handle operations AND and OR over this values? does it simply divide into 32bit chunks (or 64, respectively) and then compares each chunk separately and in the end simply concats the results together? or does it handle simply as two strings?

我的意思是要预测,哪种用例会更快。我得到了键值数据(用户项)。

my point is to predict, which use case would be faster. i got a key-value data (user-item).

userID | itemID
U1     | I1
U1     | Ix
Un     | Ij

我想为每个用户计算n个最近邻居的列表(使用 jaccard索引

for each user i want to calculate a list of n nearest neighbors (using jaccard index, for example).

select my_jaccard(select itemID from table where userID=U1,select itemID from table where userID=U2)

我的解决方案-我将输入数据解析到用户向量表中,其中向量的类型为bit(2000),在表示特定项目的位置上带有1。

my solution - i parsed the input data into a table of user-vector, where the vector is of type bit(2000) with 1's on the position representing the particular item.

userID | vector
U1     | 00.......01
U1     | 0..1.....00
Un     | 00..1..1..0

在这张桌子上我只是做

select vector1&vector2



<关键是每个用户最多只有10条记录的所有项目,即向量最多有10个活动位。我认为,解析整个位向量以找到活动位需要更多的计算资源,而不是简单地将user1的10个值与user2的10个值相互比较。

the point is that each user has at most only 10 records for all items, i.e. the vector has maximum of 10 active bits. i think, that parsing the whole bitvector just to find the active bits needs more computational resources, than simply comparing those 10 values of user1 with 10 values of user2 each with another.

使用具有很少位设置为1的长位向量会更快吗?还是将原始值作为一组使用并将两个组进行比较比较好? (一组最多包含10个项目)

is it faster to use long bit-vectors which have very few bits set to 1 or is it better to use to original values as a sets and compare two sets together? (a set has maximum of 10 items)

我同时使用psql v8.2和v9.x

i use both psql v8.2 and v9.x

推荐答案

对位类型的位操作在内部作为er操作处理。例如,这是和代码的作用:

Bit operations on the bit types are handled internally as, er, bit operations. Here is what the "and" code does, for example:

p1 = VARBITS(arg1);
p2 = VARBITS(arg2);
r = VARBITS(result);
for (i = 0; i < VARBITBYTES(arg1); i++)
    *r++ = *p1++ & *p2++;

(因此实际上是8位块。)

(So it's actually 8-bit chunks.)

所以我认为这应该很快。

So I think this should be pretty fast.

这篇关于postgres如何处理位数据类型?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆