Structuring BigQuery on trigrams with multiple inputs
Question

Presently, thanks to help from the answerer of this question, I am able to successfully query a word and get a list of the most popular follow-on words. For example, using the word "great", I can get a list of up to 10 words with the following query:

SELECT second, SUM(cell.page_count) total
FROM [publicdata:samples.trigrams]
WHERE first = "great"
group by 1
order by 2 desc
limit 10

With the output:

second      total
--------------------
deal        3048832
and         1689911
,           1576341
a           1019511
number      984993
many        875974
importance  805215
part        739409
.           700694
as          628978
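
What I am currently having trouble figuring out is how to run this query for multiple words automatically (as opposed to running a separate query for each word), so that I could get output such as:

"great"  total    "new_word_1"           new_total_1  ...  "new_word_N"    new_total_N
-----------------------------------------------------------------------------------------
deal     3048832  "new_follow_on_word1"  123456       ...  "follow_on_N1"  234567
and      1689911  "new_follow_on_word2"  12345        ...  "follow_on_N2"  123456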
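
Essentially, I would like to pass N words in a single query (for example, new_word_1 is a totally different word like "baseball", with no relation to "great") and get the total counts for each word in a separate column.

Additionally, after learning about BigQuery's pricing, I am also having trouble figuring out how to limit the total data queried as much as possible. I can think of using only the latest data (say, from 2010 onwards) and 2 alphanumeric outputs per word, but I may be missing more obvious limiters.

Any help on this is much appreciated. Thanks!

Answer

You can put multiple first words in the same query, but it will need to compute the top 10 following words separately for each word and then join the results together. Here is an example for "great" and "baseball":

SELECT word1, total1, word2, total2 FROM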
(SELECT ROW_NUMBER() OVER() rowid1, word1, total1 FROM (
SELECT second as word1, SUM(cell.page_count) total1
FROM [publicdata:samples.trigrams]
WHERE first = "great"
group by 1
order by 2 desc
limit 10)) a1
JOIN
(SELECT ROW_NUMBER() OVER() rowid2, word2, total2 FROM (
SELECT second as word2, SUM(cell.page_count) total2
FROM [publicdata:samples.trigrams]
WHERE first = "baseball"
group by 1
order by 2 desc
limit 10)) a2
ON a1.rowid1 = a2.rowid2
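
The two-word query above follows a regular pattern: one ranked subquery per word, each joined back to the first on its row number. A minimal sketch of generating that query text for N words follows. The helper name build_trigram_query and its parameters are hypothetical and not part of any BigQuery client library. The generated SQL for three or more words assumes that legacy SQL accepts chained JOIN clauses in this form, and it has not been run against BigQuery here.

```python
def build_trigram_query(words, limit=10, table="[publicdata:samples.trigrams]"):
    """Return a legacy-SQL query string with one (wordN, totalN) column pair per input word.

    Hypothetical helper: it only builds text in the shape of the two-word
    query above and does not contact BigQuery.
    """
    if not words:
        raise ValueError("need at least one word")
    limit = int(limit)

    columns = []
    subqueries = []
    for i, word in enumerate(words, start=1):
        # Assumption: rather than escaping, refuse characters that would
        # break out of the double-quoted string literal.
        if '"' in word or "\\" in word:
            raise ValueError(f"unsupported character in {word!r}")
        columns.append(f"word{i}, total{i}")
        subqueries.append(
            f"(SELECT ROW_NUMBER() OVER() rowid{i}, word{i}, total{i} FROM (\n"
            f"  SELECT second AS word{i}, SUM(cell.page_count) total{i}\n"
            f"  FROM {table}\n"
            f'  WHERE first = "{word}"\n'
            f"  GROUP BY 1\n"
            f"  ORDER BY 2 DESC\n"
            f"  LIMIT {limit})) a{i}"
        )

    # Join every later subquery back to the first one on its row number.
    parts = [subqueries[0]]
    for i in range(2, len(words) + 1):
        parts.append(f"JOIN\n{subqueries[i - 1]}\nON a1.rowid1 = a{i}.rowid{i}")

    return "SELECT " + ", ".join(columns) + " FROM\n" + "\n".join(parts)


query = build_trigram_query(["great", "baseball"])
print(query)
```

Because the subqueries are combined with an inner join, a word with fewer than 10 follow-on words would shorten the whole result to its row count. Each added word also adds another scan of the table, so this approach does not by itself reduce the amount of data queried.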