Google BigQuery queries are slow

Question

I am using Google BigQuery and executing some simple queries from PHP (e.g. SELECT * FROM emails WHERE email='mail@test.com'). I am just checking whether the email exists in the table.

The table "emails" is empty for now, but the PHP script still takes around 4 minutes to check 175 emails against it. In the future the table will be filled with about 500,000 emails, so I expect the request time to grow.

Is that normal? Or are there any ideas or solutions to improve the checking time?

(P.S.: The table "emails" contains only 8 columns, all of type STRING.)

Thank you!

Solution

If you are just checking whether a value exists, consider using SELECT COUNT(*) FROM emails WHERE email='mail@test.com' instead. BigQuery stores data by column, so this query only needs to read the email column rather than all eight. It will therefore cost less and be marginally faster on large tables.
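The COUNT(*) existence check can be sketched in Python. This is a minimal sketch with several assumptions:

- An in-memory SQLite database stands in for BigQuery, so the example shows the query shape and its result, not BigQuery's column-based cost or latency.
- The helper name email_exists and the column names c2 to c8 are hypothetical.
- The address is passed as a bound parameter rather than spliced into the SQL string, which avoids quoting problems.

```python
import sqlite3


def email_exists(conn: sqlite3.Connection, email: str) -> bool:
    """Return True if at least one row in `emails` has this address.

    Hypothetical helper; SQLite is only a local stand-in for BigQuery.
    """
    # In BigQuery, which stores data by column, COUNT(*) with a WHERE on
    # `email` reads only that column instead of all 8 (unlike SELECT *).
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM emails WHERE email = ?", (email,)
    ).fetchone()
    return count > 0


conn = sqlite3.connect(":memory:")
# Eight string columns, mirroring the question's table (names c2..c8 assumed).
conn.execute(
    "CREATE TABLE emails (email TEXT, c2 TEXT, c3 TEXT, c4 TEXT, "
    "c5 TEXT, c6 TEXT, c7 TEXT, c8 TEXT)"
)
print(email_exists(conn, "mail@test.com"))  # False: the table is empty
conn.execute("INSERT INTO emails (email) VALUES (?)", ("mail@test.com",))
print(email_exists(conn, "mail@test.com"))  # True
```

Each call still runs one query per address, which is what the batching suggestions that follow try to avoid.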

And as Pentium10 suggested, consider batching multiple lookups into a single query. For example:

SELECT SUM(IF(email = 'mail1@test.com', 1, 0)) AS m1,
       SUM(IF(email = 'mail2@test.com', 1, 0)) AS m2,
       SUM(IF(email = 'mail3@test.com', 1, 0)) AS m3,
       ...
FROM emails

You're going to be limited to something like 64k of these in a single query, but it should be very fast to compute, since it only requires a single pass over a single column.
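The numbers in the question support batching. About 4 minutes for 175 lookups is roughly 240 / 175 ≈ 1.4 seconds per query, and that is on a table with no rows. This suggests the time is dominated by fixed per-query overhead rather than by scanning data, and a single batched query pays that overhead only once.

The Python sketch below generates the SUM(IF(...)) query from a list of addresses. It rests on two assumptions:

- build_sum_query is a hypothetical helper.
- The quoting assumes string literals accept backslash escapes, as BigQuery's GoogleSQL does. Query parameters would be a sturdier choice where the dialect supports them.

```python
def build_sum_query(emails: list[str], table: str = "emails") -> str:
    """Build one query that checks many addresses in a single scan.

    Hypothetical helper: column m<i> counts the rows matching emails[i-1].
    """

    def quote(s: str) -> str:
        # Assumption: backslash escapes (\\ and \') are valid in string
        # literals, as in BigQuery's GoogleSQL dialect.
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"

    sums = [
        f"SUM(IF(email = {quote(e)}, 1, 0)) AS m{i}"
        for i, e in enumerate(emails, start=1)
    ]
    return "SELECT " + ",\n       ".join(sums) + f"\nFROM {table}"


# Per-query cost implied by the question: ~4 minutes for 175 lookups.
seconds_per_query = 4 * 60 / 175
print(f"{seconds_per_query:.2f} s per query")  # 1.37 s per query

print(build_sum_query(["mail1@test.com", "mail2@test.com"]))
```

A non-zero m<i> in the single result row means the i-th address exists.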

Alternatively, if you want the matching emails returned one per row, you could do something a little fancier, like:

SELECT email FROM emails WHERE email IN
('mail1@test.com', 'mail2@test.com', 'mail3@test.com'...)
GROUP BY email
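The IN ... GROUP BY variant returns only the addresses that are present, so the caller can find the missing ones by set difference. Here is a minimal Python sketch of that idea. In-memory SQLite again stands in for BigQuery, and find_existing is a hypothetical helper. One "?" placeholder is generated per address so that each value is bound as a parameter.

```python
import sqlite3


def find_existing(conn: sqlite3.Connection, candidates: list[str]) -> set[str]:
    """Return the subset of `candidates` present in `emails` (hypothetical helper)."""
    if not candidates:
        return set()
    placeholders = ", ".join("?" for _ in candidates)
    rows = conn.execute(
        f"SELECT email FROM emails WHERE email IN ({placeholders}) GROUP BY email",
        candidates,
    ).fetchall()
    # GROUP BY collapses duplicate rows, so each found address appears once.
    return {email for (email,) in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (email TEXT)")
conn.executemany(
    "INSERT INTO emails VALUES (?)",
    [("mail1@test.com",), ("mail2@test.com",), ("mail2@test.com",)],
)
wanted = ["mail1@test.com", "mail2@test.com", "mail3@test.com"]
found = find_existing(conn, wanted)
missing = [e for e in wanted if e not in found]
print(sorted(found))  # ['mail1@test.com', 'mail2@test.com']
print(missing)        # ['mail3@test.com']
```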

As a further optimization, you could do it as a LEFT JOIN:

SELECT t1.email as email, IF(t2.email is not null, true, false) as found 
FROM [interesting_emails] t1  
LEFT OUTER JOIN [emails] t2 ON t1.email = t2.email

Suppose the interesting_emails table holds the list of emails you want to check:

mail1@test.com
mail2@test.com
mail3@test.com

If the emails table contained only mail1@ and mail2@, then you'd get back these results:

email            found
______________   _____
mail1@test.com   true
mail2@test.com   true
mail3@test.com   false

The advantage of this approach is that it scales to billions of emails if needed (when the number gets large, you might consider using JOIN EACH instead of JOIN).
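The LEFT OUTER JOIN version can be tried locally the same way. This sketch makes three substitutions:

- In-memory SQLite stands in for BigQuery.
- The bracketed legacy table names become plain names.
- IF(t2.email IS NOT NULL, true, false) is written as the equivalent CASE expression.

The emails table holds mail1@ and mail2@. An ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interesting_emails (email TEXT)")
conn.execute("CREATE TABLE emails (email TEXT)")
conn.executemany(
    "INSERT INTO interesting_emails VALUES (?)",
    [("mail1@test.com",), ("mail2@test.com",), ("mail3@test.com",)],
)
conn.executemany(
    "INSERT INTO emails VALUES (?)",
    [("mail1@test.com",), ("mail2@test.com",)],
)

# LEFT OUTER JOIN keeps every address to check; t2.email is NULL when no
# matching row exists in `emails`.
rows = conn.execute(
    """
    SELECT t1.email AS email,
           CASE WHEN t2.email IS NOT NULL THEN 1 ELSE 0 END AS found
    FROM interesting_emails t1
    LEFT OUTER JOIN emails t2 ON t1.email = t2.email
    ORDER BY t1.email
    """
).fetchall()

results = {email: bool(found) for email, found in rows}
for email, found in rows:
    print(email, bool(found))
# mail1@test.com True
# mail2@test.com True
# mail3@test.com False
```

If the emails table can hold the same address more than once, the join returns one row per match. Joining against SELECT DISTINCT email FROM emails instead would keep one row per checked address.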
