Search for the existence of many objects in a database


Problem description


Say I have a database with 5 million users, with the columns

id (unsigned int, auto-increment), facebook_id (unsigned int), and name (varchar)

In a program, I have a list of a variable number of users taken from a person's Facebook friend list (generally 500 to 1,200 distinct Facebook ids).

What's the most efficient way to send a query to my database that returns the facebook_ids of all users from that list whose facebook_id already exists in the database?

Pseudo-code:

$friends = array(12345, 22345, 32345, 42345, 52345, ... ~1000 more);
$q = mysql_query("SELECT * FROM users ...");
$friendsAlreadyUsingApp = parseQuery($q);

Solution

This is the topic of an almost endless number of articles, blogs, Q&As and so on, and the essence of the problem is that it looks really simple but isn't.

The heart of the problem is that the parameter looks like it should work with WHERE field IN (), but it does not, because the parameter is a single string that just happens to have lots of commas in it.

So, when that parameter is passed to SQL, the single string has to be split into parts so that the field can be compared to each part. This is where it gets a little complex, as not all database systems offer the same features for this. MySQL, for example, does not have the table variables that MS SQL Server provides.
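If the ids are still available as a list in the application, one alternative to splitting a string inside SQL is to bind one placeholder per id. The following is only a sketch of that idea, written in Python with the standard-library sqlite3 module standing in for MySQL so that it runs without a server; the helper name find_existing and the sample rows are made up for illustration, and this is not the method the rest of this answer describes.

```python
import sqlite3


def find_existing(conn, friend_ids):
    """Return the facebook_ids from friend_ids that exist in users (hypothetical helper)."""
    if not friend_ids:
        return []  # "IN ()" is not valid SQL, so skip the query entirely
    # One "?" per id: the ids travel as bound values, never as SQL text.
    placeholders = ",".join("?" for _ in friend_ids)
    sql = (
        "SELECT facebook_id FROM users "
        f"WHERE facebook_id IN ({placeholders}) "
        "ORDER BY facebook_id"
    )
    return [row[0] for row in conn.execute(sql, list(friend_ids))]


# Tiny in-memory stand-in for the users table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, facebook_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO users (facebook_id, name) VALUES (?, ?)",
    [(105, "Ann"), (125, "Bob"), (999, "Cy")],
)

existing = find_existing(conn, [105, 110, 125])
print(existing)
```

Drivers and servers may cap the number of bound parameters per statement, so a list of 1,000+ ids might need to be sent in chunks.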

A simple method for MySQL is this:

SET @param := '105,110,125,135,145,155,165,175,185,195,205';

SELECT
*
FROM users
WHERE FIND_IN_SET(facebook_id, @param) > 0
;

FIND_IN_SET returns the 1-based position of the first argument within the comma-separated list given as the second argument, or 0 if it is not in the list.
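To make that return value concrete, here is a rough Python emulation of FIND_IN_SET for the simple case used above. SQLite has no FIND_IN_SET, so the sketch registers the Python function under that name purely to run the same WHERE clause locally; the sample rows are invented, and edge cases of the real MySQL function are only approximated.

```python
import sqlite3


def find_in_set(needle, csv):
    """Approximate MySQL FIND_IN_SET: 1-based position of needle in csv, 0 if absent."""
    if needle is None or csv is None:
        return None  # MySQL returns NULL when either argument is NULL
    if csv == "":
        return 0  # an empty list never matches
    items = csv.split(",")
    needle = str(needle)  # the column value arrives as an int
    return items.index(needle) + 1 if needle in items else 0


conn = sqlite3.connect(":memory:")
# Register the emulation so the query text can stay the same as in MySQL.
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, facebook_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO users (facebook_id, name) VALUES (?, ?)",
    [(105, "Ann"), (125, "Bob"), (999, "Cy")],
)

param = "105,110,125"
matched = [
    row[0]
    for row in conn.execute(
        "SELECT facebook_id FROM users "
        "WHERE FIND_IN_SET(facebook_id, ?) > 0 ORDER BY facebook_id",
        (param,),
    )
]
print(matched)
```

Note that the function is called once per row of users, which is the reason the scaling concern arises.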

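I cannot tell how well this scales in your database. Because FIND_IN_SET has to be evaluated for every row, it generally cannot use an index on facebook_id, so it might not be acceptable for parameters containing 1000+ ids.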

So if text processing like FIND_IN_SET is too slow, then each id needs to be broken out from the parameter and inserted into a table. That way the resulting table can be used through an INNER JOIN to filter the users; but this requires a table and inserts which take time, and there may be concurrency issues if more than one user is attempting to use that table at the same time.
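The insert-then-join idea can be sketched as follows. This is an illustration only, in Python against SQLite rather than MySQL; the table name friend_ids and the helper name are hypothetical. It uses a temporary table, which is private to the connection that created it (MySQL's TEMPORARY tables are likewise per session), so separate connections do not interfere with each other's id lists.

```python
import sqlite3


def find_existing_via_temp_table(conn, friend_ids):
    """Load the ids into a per-connection temp table, then INNER JOIN against users."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS friend_ids (facebook_id INTEGER PRIMARY KEY)"
    )
    conn.execute("DELETE FROM friend_ids")  # clear ids left over from a previous call
    # INSERT OR IGNORE is SQLite syntax (MySQL spells it INSERT IGNORE);
    # it silently drops duplicate ids.
    conn.executemany(
        "INSERT OR IGNORE INTO friend_ids (facebook_id) VALUES (?)",
        [(fid,) for fid in friend_ids],
    )
    rows = conn.execute(
        "SELECT u.facebook_id FROM users AS u "
        "INNER JOIN friend_ids AS f ON f.facebook_id = u.facebook_id "
        "ORDER BY u.facebook_id"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, facebook_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO users (facebook_id, name) VALUES (?, ?)",
    [(105, "Ann"), (125, "Bob"), (999, "Cy")],
)

joined = find_existing_via_temp_table(conn, [105, 110, 125, 105])
print(joined)
```

The cost of the extra inserts is still there; the sketch only shows one way to sidestep the shared-table concurrency issue.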


The following sets up a table of 10,000 integers (1 to 10,000):

/* Create a table called Numbers */
CREATE TABLE `Numbers`
(
    `Number` int PRIMARY KEY
);

/* use cross joins to create 10,000 integers from 1 & store into table */
INSERT INTO Numbers (Number)
select 1 + (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a))  as N
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
;
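The arithmetic behind the cross join can be checked outside the database. The short Python sketch below is not part of the SQL; it just mirrors the four digit tables a, b, c and d and confirms that 1 + a + 10*b + 100*c + 1000*d yields every integer from 1 to 10,000 exactly once.

```python
from itertools import product

digits = range(10)  # each derived table holds the digits 0 through 9

# product(..., repeat=4) plays the role of the three CROSS JOINs:
# every combination of (a, b, c, d) appears exactly once.
numbers = sorted(
    1 + a + 10 * b + 100 * c + 1000 * d
    for a, b, c, d in product(digits, repeat=4)
)

print(len(numbers), numbers[0], numbers[-1])
```

Each of a, b, c and d acts as one decimal digit, so the sum counts from 0 to 9,999 and the leading 1 shifts the range to start at 1.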

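This "utility table" can then be used to split a comma-separated parameter into a derived table of individual integers, and an INNER JOIN from that derived table to your users table gives the wanted result. Note that with 10,000 rows in Numbers, only about the first 10,000 characters of the parameter are examined, so a longer parameter needs a larger Numbers table.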

SET @param := '105,110,125,135,145,155,165,175,185,195,205';
SET @delimit := ',';

SELECT
      users.id
    , users.facebook_id
    , users.name
FROM users
      INNER JOIN (
                  SELECT
                        CAST(SUBSTRING(iq.param, n.number + 1, LOCATE(@delimit, iq.param, n.number + 1) - n.number - 1) AS UNSIGNED INTEGER) AS itemID
                  FROM (
                              SELECT
                                    concat(@delimit, @param, @delimit) AS param
                        ) AS iq
                        INNER JOIN Numbers n
                                    ON n.Number < LENGTH(iq.param)
                  WHERE SUBSTRING(iq.param, n.number, 1) = @delimit
            ) AS derived
                  ON users.facebook_id = derived.itemID
;
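The position arithmetic in the derived table is easier to follow outside SQL. The Python sketch below mirrors it step by step under the same assumptions (a non-empty, well-formed list of integers); split_ids is a made-up name, and the comments map each line back to the SQL expression it imitates. SQL string positions are 1-based, while Python's are 0-based, hence the small offsets.

```python
def split_ids(param, delimit=","):
    """Mimic the Numbers-table split: one integer per delimiter-led segment."""
    wrapped = delimit + param + delimit  # concat(@delimit, @param, @delimit)
    items = []
    # n stands in for Numbers.Number, restricted by n.Number < LENGTH(iq.param)
    for n in range(1, len(wrapped)):
        # WHERE SUBSTRING(iq.param, n, 1) = @delimit
        if wrapped[n - 1] != delimit:
            continue
        # LOCATE(@delimit, iq.param, n + 1), converted to a 1-based position
        next_delim = wrapped.index(delimit, n) + 1
        # SUBSTRING(iq.param, n + 1, next_delim - n - 1), then CAST to integer
        items.append(int(wrapped[n:next_delim - 1]))
    return items


ids = split_ids("105,110,125,135")
print(ids)
```

Wrapping the parameter in delimiters is what makes every item, including the first and last, start right after a delimiter and end right before one; the final delimiter itself is skipped because n stays strictly below the string length.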

This query can be used as the basis for a stored procedure, which might be easier for you to call from PHP.

See this SQLFiddle demo
