How to search for rows containing specific words then return count of each word?


Problem description




I have 150,000 rows of data which I'm attempting to query in Google BigQuery.

Column Text contains various lengths of text, from which I want to query for particular keywords.

I've gotten as far as the query below, which returns all rows containing a particular keyword (e.g. facebook):

SELECT Text FROM Data.Set_1 
WHERE Text CONTAINS 'facebook'

Questions:

1) How do I improve the query so that it returns a total count of all occurrences of the keyword 'facebook' across 'Text' in a new column?

2) How do I scale this up to multiple keywords (facebook, cnn, bbc, twitter) and return a total count of each keyword present in the data (e.g. facebook 42, cnn 54, bbc 88, twitter 49)?

Solution

For BigQuery Legacy SQL

SELECT 
  keyword, 
  COUNT(1) AS rows, 
  SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurrences 
FROM YourTable 
CROSS JOIN keywords
WHERE Text CONTAINS keyword
GROUP BY keyword
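
The counting expression in the query above relies on a length-difference trick: removing every occurrence of the keyword shortens the text by (number of occurrences) × (keyword length), so dividing the lost length by the keyword length gives the count. Here is a minimal Python sketch of that arithmetic, outside BigQuery. The function name `count_occurrences` is made up for illustration and is not part of the original answer.

```python
def count_occurrences(text: str, keyword: str) -> int:
    """Count non-overlapping occurrences of keyword in text.

    Mirrors (LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword).
    """
    if not keyword:
        # An empty keyword would cause a division by zero.
        raise ValueError("keyword must be non-empty")
    # Characters lost when every occurrence of the keyword is removed.
    removed = len(text) - len(text.replace(keyword, ""))
    return removed // len(keyword)


if __name__ == "__main__":
    sample = "facebookfacebookcnnbbccnn"
    for kw in ("facebook", "cnn", "bbc", "twitter"):
        print(kw, count_occurrences(sample, kw))
```

Because the replacement works left to right, overlapping matches are counted once: for example, 'aa' is found once in 'aaa'. The match is also case-sensitive in this sketch.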

Example to play with

SELECT 
  keyword, 
  COUNT(1) AS rows, 
  SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurrences 
FROM (
  SELECT Text FROM
    (SELECT 'facebookfacebookcnnbbccnn' AS Text),
    (SELECT 'facebook' AS Text), 
    (SELECT 'cnn' AS Text)
) AS words 
CROSS JOIN (
  SELECT keyword FROM 
    (SELECT 'facebook' AS keyword),
    (SELECT 'cnn' AS keyword), 
    (SELECT 'bbc' AS keyword)
) AS keywords
WHERE Text CONTAINS keyword
GROUP BY keyword

For BigQuery Standard SQL (see Enabling Standard SQL)

SELECT 
  keyword, 
  COUNT(1) AS `rows`, 
  -- DIV returns INT64; the "/" operator would return FLOAT64 in Standard SQL
  SUM(DIV(LENGTH(Text) - LENGTH(REPLACE(Text, keyword, '')), LENGTH(keyword))) AS occurrences  
FROM YourTable 
JOIN keywords
ON STRPOS(Text, keyword) > 0
GROUP BY keyword

Example to play with

WITH keywords AS (
  SELECT 'facebook' AS keyword UNION ALL
  SELECT 'cnn' AS keyword UNION ALL
  SELECT 'bbc' AS keyword 
),
words AS (
  SELECT 'facebookfacebookcnnbbccnn' AS Text UNION ALL
  SELECT 'facebook' AS Text UNION ALL
  SELECT 'cnn' AS Text 
)
SELECT 
  keyword, 
  COUNT(1) AS `rows`, 
  -- DIV returns INT64; the "/" operator would return FLOAT64 in Standard SQL
  SUM(DIV(LENGTH(Text) - LENGTH(REPLACE(Text, keyword, '')), LENGTH(keyword))) AS occurrences  
FROM words 
JOIN keywords
ON STRPOS(Text, keyword) > 0
GROUP BY keyword
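
To sanity-check the shape of the result without a BigQuery project, the same sample data can be run through an in-memory SQLite database from Python. This is only a rough sketch under a few assumptions: SQLite's `INSTR` stands in for BigQuery's `STRPOS`, the `rows` column is renamed to `row_count`, an `ORDER BY` is added for stable output, and the division is SQLite's integer division. It is not BigQuery syntax.

```python
import sqlite3

# SQLite adaptation of the example query (INSTR replaces STRPOS).
QUERY = """
WITH keywords AS (
  SELECT 'facebook' AS keyword UNION ALL
  SELECT 'cnn' UNION ALL
  SELECT 'bbc'
),
words AS (
  SELECT 'facebookfacebookcnnbbccnn' AS Text UNION ALL
  SELECT 'facebook' UNION ALL
  SELECT 'cnn'
)
SELECT
  keyword,
  COUNT(1) AS row_count,
  SUM((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword)) AS occurrences
FROM words
JOIN keywords ON INSTR(Text, keyword) > 0
GROUP BY keyword
ORDER BY keyword
"""


def run_example():
    """Run the sample query in an in-memory SQLite database and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for keyword, row_count, occurrences in run_example():
        print(keyword, row_count, occurrences)
```

For this sample data, each keyword gets one output row: `row_count` is the number of texts containing the keyword, and `occurrences` is the total number of matches across those texts.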
