Google BigQuery queries are slow

Views: 236 | Published: 2018/5/7 17:37:57 | Tags: php sql google-app-engine bigdata google-bigquery


Problem description

I am using Google BigQuery and executing some simple queries from PHP (e.g. SELECT * FROM emails WHERE email='mail@test.com'). I am just checking whether the email exists in the table.

The table "emails" is empty for now, but the PHP script still takes around 4 minutes to check 175 emails against the empty table. I expect the table to eventually hold 500,000 emails, so I assume the request time will grow even longer.

Is that normal? Or are there any ideas/solutions to improve the checking time?

(P.S.: The table "emails" contains only 8 columns, all of string type.)

Thank you!

Solution

If you are just checking whether a value exists, consider using SELECT COUNT(*) FROM emails WHERE email='mail@test.com' instead. This only requires reading a single field, so it will cost less and be marginally faster on large tables.

And as Pentium10 suggested, consider using multiple lookups in a single query. You could do this like:

    SELECT
      SUM(IF(email = 'mail1@test.com', 1, 0)) AS m1,
      SUM(IF(email = 'mail2@test.com', 1, 0)) AS m2,
      SUM(IF(email = 'mail3@test.com', 1, 0)) AS m3,
      ...
    FROM emails

You're going to be limited to something like 64k of these in a single query, but it should be very fast to compute since it only requires a scan of a single column in one pass.
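
In the question, 175 separate queries took about 4 minutes, i.e. roughly 240 / 175 ≈ 1.4 seconds each even on an empty table, which suggests that per-query overhead dominates. The batched query above can be generated rather than typed by hand. The following is a minimal sketch in Python (the same idea carries over to PHP); the helper name build_batched_count_query, the table default, and the m1..mN aliases are illustrative assumptions, not part of any BigQuery client library. It rejects addresses containing quotes or backslashes instead of assuming an escaping rule.

```python
def build_batched_count_query(emails, table="emails"):
    """Build one query that counts matches for every address in `emails`.

    Hypothetical helper: the name, the `table` default and the m1..mN
    aliases only mirror the hand-written query in the answer.
    """
    if not emails:
        raise ValueError("at least one email is required")
    columns = []
    for i, email in enumerate(emails, start=1):
        # Refuse anything that could break out of the string literal
        # rather than guessing at escaping rules.
        if "'" in email or "\\" in email:
            raise ValueError(f"unsupported character in {email!r}")
        columns.append(f"SUM(IF(email = '{email}', 1, 0)) AS m{i}")
    return "SELECT " + ", ".join(columns) + f" FROM {table}"


query = build_batched_count_query(["mail1@test.com", "mail2@test.com"])
print(query)
```

The resulting string would be sent as a single query, so the fixed per-query cost is paid once instead of 175 times.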

Alternatively, if you wanted the e-mails as one per row, you could do something a little bit fancier like:

    SELECT email
    FROM emails
    WHERE email IN ('mail1@test.com', 'mail2@test.com', 'mail3@test.com'...)
    GROUP BY email

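
Note that the IN query returns only the addresses that exist; addresses that are missing simply do not appear in the result. A small client-side step can turn that into an explicit found/not-found answer per address. This is a sketch in Python under the assumption that the query result has already been read into a plain list of strings; existence_map is a hypothetical helper name.

```python
def existence_map(wanted, found_rows):
    """Map every requested address to True (present) or False (absent).

    `found_rows` is assumed to be the list of email values returned
    by the IN query; how it is fetched is outside this sketch.
    """
    found = set(found_rows)  # set lookup keeps each check O(1)
    return {email: email in found for email in wanted}


wanted = ["mail1@test.com", "mail2@test.com", "mail3@test.com"]
result = existence_map(wanted, ["mail1@test.com", "mail3@test.com"])
print(result)
```
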
As a further optimization, you could do it as a LEFT JOIN:

    SELECT t1.email AS email,
           IF(t2.email IS NOT NULL, true, false) AS found
    FROM [interesting_emails] t1
    LEFT OUTER JOIN [emails] t2
    ON t1.email = t2.email

Here interesting_emails would hold the list of emails you wanted to check, like:

    mail1@test.com
    mail2@test.com
    mail3@test.com

If the emails table contained only mail1@ and mail3@, then you'd get back as results:

    email            found
    ______________   _____
    mail1@test.com   true
    mail2@test.com   false
    mail3@test.com   true

The advantage of doing it this way is that it will scale up to billions of e-mails if needed (when the number gets large, you might consider using a JOIN EACH instead of a JOIN).
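
To see what the LEFT JOIN produces without touching BigQuery, the same logic can be tried locally. The sketch below uses Python's built-in sqlite3 module purely as a stand-in: SQLite is not BigQuery, so the square-bracket table names and the IF() function are replaced with plain names and an IS NOT NULL expression (which SQLite returns as 1 or 0). The sample data assumes the emails table holds mail1@ and mail3@, matching the result table above.

```python
import sqlite3

# In-memory SQLite database used only to illustrate the join semantics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interesting_emails (email TEXT);
    CREATE TABLE emails (email TEXT);
    INSERT INTO interesting_emails VALUES
        ('mail1@test.com'), ('mail2@test.com'), ('mail3@test.com');
    INSERT INTO emails VALUES ('mail1@test.com'), ('mail3@test.com');
""")

# Every address to check appears once; found is 1 when a match exists.
rows = conn.execute("""
    SELECT t1.email AS email, t2.email IS NOT NULL AS found
    FROM interesting_emails t1
    LEFT OUTER JOIN emails t2 ON t1.email = t2.email
    ORDER BY t1.email
""").fetchall()
conn.close()

for email, found in rows:
    print(email, bool(found))
```

One design point worth keeping in mind: if the emails table can contain the same address more than once, the join returns one row per match, so deduplicating (for example with GROUP BY) may be needed.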
