How to search for rows containing specific words, then return a count of each word?
I have 150,000 rows of data that I'm trying to query in Google BigQuery. The column Text contains text of various lengths, which I want to search for particular keywords.
I've gotten as far as the query below which returns all rows containing a particular keyword (e.g. facebook):
SELECT Text
FROM Data.Set_1
WHERE Text CONTAINS 'facebook'
Questions:
1) How do I improve the query so that it returns, in a new column, the total count of all occurrences of the keyword 'facebook' across Text?
2) How do I scale this to multiple keywords (facebook, cnn, bbc, twitter) and return the total count of each keyword in the data (e.g. facebook 42, cnn 54, bbc 88, twitter 49)?
For BigQuery Legacy SQL:
SELECT
  keyword,
  COUNT(1) AS rows,
  SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurrences
FROM YourTable
CROSS JOIN keywords
WHERE Text CONTAINS keyword
GROUP BY keyword
Example to play with
SELECT
  keyword,
  COUNT(1) AS rows,
  SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurrences
FROM (
  SELECT Text FROM
    (SELECT 'facebookfacebookcnnbbccnn' AS Text),
    (SELECT 'facebook' AS Text),
    (SELECT 'cnn' AS Text)
) AS words
CROSS JOIN (
  SELECT keyword FROM
    (SELECT 'facebook' AS keyword),
    (SELECT 'cnn' AS keyword),
    (SELECT 'bbc' AS keyword)
) AS keywords
WHERE Text CONTAINS keyword
GROUP BY keyword
For BigQuery Standard SQL (see Enabling Standard SQL)
SELECT
  keyword,
  COUNT(1) AS `rows`,
  -- DIV keeps the result an INT64; the / operator would return FLOAT64
  SUM(DIV(LENGTH(Text) - LENGTH(REPLACE(Text, keyword, '')), LENGTH(keyword))) AS occurrences
FROM YourTable
JOIN keywords
  ON STRPOS(Text, keyword) > 0
GROUP BY keyword
Example to play with
WITH keywords AS (
  SELECT 'facebook' AS keyword UNION ALL
  SELECT 'cnn' AS keyword UNION ALL
  SELECT 'bbc' AS keyword
),
words AS (
  SELECT 'facebookfacebookcnnbbccnn' AS Text UNION ALL
  SELECT 'facebook' AS Text UNION ALL
  SELECT 'cnn' AS Text
)
SELECT
  keyword,
  COUNT(1) AS `rows`,
  -- DIV keeps the result an INT64; the / operator would return FLOAT64
  SUM(DIV(LENGTH(Text) - LENGTH(REPLACE(Text, keyword, '')), LENGTH(keyword))) AS occurrences
FROM words
JOIN keywords
  ON STRPOS(Text, keyword) > 0
GROUP BY keyword
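The queries above count occurrences by removing every copy of the keyword from the text and dividing the number of characters that disappeared by the keyword's length. As a rough way to sanity-check that arithmetic outside BigQuery, here is a minimal Python sketch run on the same sample data. The function names `count_occurrences` and `keyword_counts` are hypothetical, chosen for illustration only; this mirrors the formula and is not BigQuery code.

```python
def count_occurrences(text: str, keyword: str) -> int:
    """Same arithmetic as the SQL: (LENGTH(Text) - LENGTH(REPLACE(...))) / LENGTH(keyword)."""
    removed_chars = len(text) - len(text.replace(keyword, ""))
    return removed_chars // len(keyword)


def keyword_counts(texts, keywords):
    """Return {keyword: (rows, occurrences)} for keywords that match at least one text."""
    results = {}
    for keyword in keywords:
        # Equivalent of the join condition: keep only texts containing the keyword.
        matching = [t for t in texts if keyword in t]
        if matching:
            total = sum(count_occurrences(t, keyword) for t in matching)
            results[keyword] = (len(matching), total)
    return results


# Sample data copied from the example queries.
texts = ["facebookfacebookcnnbbccnn", "facebook", "cnn"]
keywords = ["facebook", "cnn", "bbc"]

counts = keyword_counts(texts, keywords)
for keyword, (rows, occurrences) in counts.items():
    print(keyword, rows, occurrences)
# facebook 2 3
# cnn 2 3
# bbc 1 1
```

Two properties of this approach are worth keeping in mind: matching is case-sensitive (so 'Facebook' is not counted as 'facebook' unless both sides are lowercased first), and matches are counted without overlap, so a keyword 'aa' is found once in 'aaa'.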