Structuring BigQuery on trigrams with multiple inputs


Problem description


Thanks to help from the answerer of this question, I can now query a single word and get a list of its most popular follow-on words. For example, using the word "great", I can get a list of up to 10 words in the following format:

SELECT second, SUM(cell.page_count) total 
FROM [publicdata:samples.trigrams] 
WHERE first = "great"
group by 1
order by 2 desc
limit 10

With the output:

second     total     
------------------
deal       3048832   
and        1689911   
,          1576341   
a          1019511   
number     984993    
many       875974    
importance 805215    
part       739409    
.          700694    
as         628978

What I am having trouble figuring out is how to run this query for multiple words automatically (as opposed to calling a query on a separate word each time), so that I could get output such as:

"great"     total     "new_word_1"           new_total_1 ... "new_word_N"     new_total_N
-----------------------------------------------------------------------------------------
deal       3048832    "new_follow_on_word1"  123456      ... "follow_on_N1"   234567
and        1689911    "new_follow_on_word2"  12345       ... "follow_on_N2"   123456

Essentially, I would like to pass N words in a single query (for example, new_word_1 is a totally different word like "baseball", with no relation to "great") and get the total counts related to each word in a separate column.

Additionally, after learning about BigQuery's pricing, I am also having trouble figuring out how to limit the total data queried as much as possible. I can think of using only the latest data (say, 2010 onwards) and 2 alphanumeric outputs per word, but I may be missing more obvious ways to limit it.

Any help on this is much appreciated - thanks!

Solution

You can put multiple first words in the same query, but the top 10 following words have to be computed separately for each first word, and the results then joined together on their row number. Here is an example for "great" and "baseball":

SELECT word1, total1, word2, total2 FROM
  (SELECT ROW_NUMBER() OVER(ORDER BY total1 DESC) rowid1, word1, total1 FROM (
    SELECT second AS word1, SUM(cell.page_count) total1
    FROM [publicdata:samples.trigrams]
    WHERE first = "great"
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 10)) a1
JOIN
  (SELECT ROW_NUMBER() OVER(ORDER BY total2 DESC) rowid2, word2, total2 FROM (
    SELECT second AS word2, SUM(cell.page_count) total2
    FROM [publicdata:samples.trigrams]
    WHERE first = "baseball"
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 10)) a2
ON a1.rowid1 = a2.rowid2
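
For N words, the same pattern can be generated rather than written by hand. The following is only a sketch: `build_followon_query` is a hypothetical helper, not part of any BigQuery client library. It only assembles the legacy SQL text and does not run it. The `ORDER BY` inside `OVER()` is an assumption added here so that row numbers explicitly follow the ranking by total.

```python
def build_followon_query(words, limit=10, table="[publicdata:samples.trigrams]"):
    """Build a legacy SQL query with one (wordN, totalN) column pair per input word.

    Hypothetical helper for illustration; it returns the query text only.
    """
    if not words:
        raise ValueError("need at least one word")
    parts = []
    for i, word in enumerate(words, start=1):
        # Keep quoting trivial: refuse characters that would need escaping.
        if '"' in word or "\\" in word:
            raise ValueError(f"unsupported character in {word!r}")
        # One ranked subquery per first word, numbered by descending total.
        parts.append(
            f"(SELECT ROW_NUMBER() OVER(ORDER BY total{i} DESC) rowid{i}, "
            f"word{i}, total{i} FROM (\n"
            f"  SELECT second AS word{i}, SUM(cell.page_count) total{i}\n"
            f"  FROM {table}\n"
            f'  WHERE first = "{word}"\n'
            f"  GROUP BY 1 ORDER BY 2 DESC LIMIT {limit})) a{i}"
        )
    columns = ", ".join(f"word{i}, total{i}" for i in range(1, len(words) + 1))
    query = f"SELECT {columns} FROM\n{parts[0]}"
    # Chain every further subquery onto the first one by row number.
    for i in range(2, len(words) + 1):
        query += f"\nJOIN\n{parts[i - 1]}\nON a1.rowid1 = a{i}.rowid{i}"
    return query


if __name__ == "__main__":
    print(build_followon_query(["great", "baseball"]))
```

Because the subqueries are combined with an inner join on the row number, the result has only as many rows as the shortest list: a word with fewer than `limit` follow-on words will cut off the rows of the others.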
