Find sentences with two words adjacent to each other in Pg


Problem description

I need help crafting an advanced Postgres query. I am trying to find sentences with two words adjacent to each other, using Postgres directly, not some command language extension. My tables are:

TABLE word (spelling text, wordid serial)
TABLE sentence (sentenceid serial)
TABLE item (sentenceid integer, position smallint, wordid integer)
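To try the queries in this thread without a PostgreSQL server, the three tables can be mocked up in an in-memory database. This is only a sketch under stated assumptions: it uses Python's built-in `sqlite3` module instead of Postgres (SQLite has no `serial` type, so `INTEGER PRIMARY KEY` plays that role), and the `SENTENCES` sample data and `build_db` helper are hypothetical, not from the original question. Sentence 2 deliberately skips position 2 to exercise the "gaps" case the answers discuss.

```python
import sqlite3

# Hypothetical sample data (not from the original question).
# Each sentence id maps to (position, spelling) pairs.
SENTENCES = {
    1: [(1, "word1"), (2, "word2")],                # adjacent, contiguous positions
    2: [(1, "word1"), (3, "word2")],                # adjacent, but position 2 is missing
    3: [(1, "word1"), (2, "other"), (3, "word2")],  # another word in between
    4: [(1, "word2"), (2, "word1")],                # reversed order
}

def build_db() -> sqlite3.Connection:
    """Load SENTENCES into in-memory SQLite tables shaped like the question's."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE word (spelling TEXT, wordid INTEGER PRIMARY KEY);
        CREATE TABLE sentence (sentenceid INTEGER PRIMARY KEY);
        CREATE TABLE item (sentenceid INTEGER, position SMALLINT, wordid INTEGER);
    """)
    wordids = {}  # spelling -> wordid, so each spelling is stored once
    for sid, tokens in SENTENCES.items():
        conn.execute("INSERT INTO sentence (sentenceid) VALUES (?)", (sid,))
        for pos, spelling in tokens:
            if spelling not in wordids:
                cur = conn.execute("INSERT INTO word (spelling) VALUES (?)", (spelling,))
                wordids[spelling] = cur.lastrowid  # auto-assigned wordid
            conn.execute("INSERT INTO item VALUES (?, ?, ?)", (sid, pos, wordids[spelling]))
    return conn

conn = build_db()
for table in ("word", "sentence", "item"):
    print(table, conn.execute(f"SELECT count(*) FROM {table}").fetchone()[0])
# prints: word 3, sentence 4, item 9 (one per line)
```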

I have a simple query to find sentences with a single word:

SELECT DISTINCT sentence.sentenceid 
FROM item,word,sentence 
WHERE word.spelling = 'word1' 
  AND item.wordid = word.wordid 
  AND sentence.sentenceid = item.sentenceid 

I want to filter the results of that query further by a second word (word2): a sentence should match only if it also has an item for word2 whose item.sentenceid equals the current result's sentenceid and whose item.position equals the current result's item.position + 1. How can I refine my query to achieve this, and do so efficiently?

Solution

A simpler solution, but it only returns results when there are no gaps in the item.position values:

SELECT DISTINCT sentence.sentenceid 
  FROM sentence 
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position = item.position + 1
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
 WHERE word.spelling = 'word1'
   AND next_word.spelling = 'word2'
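A minimal check of this query, assuming a hypothetical SQLite mock-up of the three tables (the `SENTENCES` data and `build_db` helper are illustrative, not part of the original thread, and SQLite stands in for Postgres). The literals are replaced with SQLite's `?` placeholders. Sentence 2 has word1 at position 1 and word2 at position 3 with nothing stored in between, so the `position + 1` join misses it:

```python
import sqlite3

# Hypothetical sample data: sentence id -> (position, spelling) pairs.
SENTENCES = {
    1: [(1, "word1"), (2, "word2")],                # adjacent, contiguous positions
    2: [(1, "word1"), (3, "word2")],                # adjacent, but position 2 is missing
    3: [(1, "word1"), (2, "other"), (3, "word2")],  # another word in between
    4: [(1, "word2"), (2, "word1")],                # reversed order
}

def build_db() -> sqlite3.Connection:
    """Load SENTENCES into in-memory SQLite tables shaped like the question's."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE word (spelling TEXT, wordid INTEGER PRIMARY KEY);
        CREATE TABLE sentence (sentenceid INTEGER PRIMARY KEY);
        CREATE TABLE item (sentenceid INTEGER, position SMALLINT, wordid INTEGER);
    """)
    wordids = {}
    for sid, tokens in SENTENCES.items():
        conn.execute("INSERT INTO sentence (sentenceid) VALUES (?)", (sid,))
        for pos, spelling in tokens:
            if spelling not in wordids:
                cur = conn.execute("INSERT INTO word (spelling) VALUES (?)", (spelling,))
                wordids[spelling] = cur.lastrowid
            conn.execute("INSERT INTO item VALUES (?, ?, ?)", (sid, pos, wordids[spelling]))
    return conn

# The simpler join: word2 must sit at exactly word1's position + 1.
SIMPLE = """
SELECT DISTINCT sentence.sentenceid
  FROM sentence
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position = item.position + 1
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
 WHERE word.spelling = ? AND next_word.spelling = ?
 ORDER BY sentence.sentenceid
"""

conn = build_db()
rows = [sid for (sid,) in conn.execute(SIMPLE, ("word1", "word2"))]
print(rows)  # [1]: sentence 2 is missed because of the gap at position 2
```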

A more general solution (gaps allowed), using window functions:

SELECT DISTINCT sentenceid
FROM (SELECT sentence.sentenceid,
             word.spelling,
             lead(word.spelling) OVER (PARTITION BY sentence.sentenceid
                                           ORDER BY item.position) AS next_spelling
        FROM sentence 
        JOIN item ON sentence.sentenceid = item.sentenceid
        JOIN word ON item.wordid = word.wordid) AS pairs
 WHERE spelling = 'word1'
   AND next_spelling = 'word2'
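The window-function query can be checked against the same kind of hypothetical SQLite mock-up (the `SENTENCES` data and `build_db` helper are illustrative; SQLite 3.25 or later, which added window functions, stands in for Postgres). Here the `lead` column is given an explicit alias, `next_spelling`. Because `lead()` steps to the next row in `ORDER BY item.position` order instead of computing `position + 1`, the gapped sentence 2 is now found:

```python
import sqlite3

# Hypothetical sample data: sentence id -> (position, spelling) pairs.
SENTENCES = {
    1: [(1, "word1"), (2, "word2")],                # adjacent, contiguous positions
    2: [(1, "word1"), (3, "word2")],                # adjacent, but position 2 is missing
    3: [(1, "word1"), (2, "other"), (3, "word2")],  # another word in between
    4: [(1, "word2"), (2, "word1")],                # reversed order
}

def build_db() -> sqlite3.Connection:
    """Load SENTENCES into in-memory SQLite tables shaped like the question's."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE word (spelling TEXT, wordid INTEGER PRIMARY KEY);
        CREATE TABLE sentence (sentenceid INTEGER PRIMARY KEY);
        CREATE TABLE item (sentenceid INTEGER, position SMALLINT, wordid INTEGER);
    """)
    wordids = {}
    for sid, tokens in SENTENCES.items():
        conn.execute("INSERT INTO sentence (sentenceid) VALUES (?)", (sid,))
        for pos, spelling in tokens:
            if spelling not in wordids:
                cur = conn.execute("INSERT INTO word (spelling) VALUES (?)", (spelling,))
                wordids[spelling] = cur.lastrowid
            conn.execute("INSERT INTO item VALUES (?, ?, ?)", (sid, pos, wordids[spelling]))
    return conn

# lead() pairs each word with the next word of the same sentence, by position order.
WINDOWED = """
SELECT DISTINCT sentenceid
  FROM (SELECT sentence.sentenceid AS sentenceid,
               word.spelling AS spelling,
               lead(word.spelling) OVER (PARTITION BY sentence.sentenceid
                                         ORDER BY item.position) AS next_spelling
          FROM sentence
          JOIN item ON sentence.sentenceid = item.sentenceid
          JOIN word ON item.wordid = word.wordid) AS pairs
 WHERE spelling = ? AND next_spelling = ?
 ORDER BY sentenceid
"""

conn = build_db()
rows = [sid for (sid,) in conn.execute(WINDOWED, ("word1", "word2"))]
print(rows)  # [1, 2]
```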

Edit: another general solution (gaps allowed), using joins only:

SELECT DISTINCT sentence.sentenceid
  FROM sentence 
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position > item.position
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
  LEFT JOIN item AS mediate_word ON sentence.sentenceid = mediate_word.sentenceid
                                AND mediate_word.position > item.position
                                AND mediate_word.position < next_item.position
 WHERE mediate_word.wordid IS NULL
   AND word.spelling = 'word1'
   AND next_word.spelling = 'word2'
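A sketch of the joins-only version against the same kind of hypothetical SQLite data (`SENTENCES` and `build_db` are illustrative; SQLite stands in for Postgres), with a few indexes added because the question also asks about performance. The index names and column choices are assumptions, not something the original answer specifies; whether they help depends on the data and the planner, so in Postgres it is worth checking with `EXPLAIN ANALYZE`. The indexes do not change the result:

```python
import sqlite3

# Hypothetical sample data: sentence id -> (position, spelling) pairs.
SENTENCES = {
    1: [(1, "word1"), (2, "word2")],                # adjacent, contiguous positions
    2: [(1, "word1"), (3, "word2")],                # adjacent, but position 2 is missing
    3: [(1, "word1"), (2, "other"), (3, "word2")],  # another word in between
    4: [(1, "word2"), (2, "word1")],                # reversed order
}

def build_db() -> sqlite3.Connection:
    """Load SENTENCES into in-memory SQLite tables shaped like the question's."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE word (spelling TEXT, wordid INTEGER PRIMARY KEY);
        CREATE TABLE sentence (sentenceid INTEGER PRIMARY KEY);
        CREATE TABLE item (sentenceid INTEGER, position SMALLINT, wordid INTEGER);
    """)
    wordids = {}
    for sid, tokens in SENTENCES.items():
        conn.execute("INSERT INTO sentence (sentenceid) VALUES (?)", (sid,))
        for pos, spelling in tokens:
            if spelling not in wordids:
                cur = conn.execute("INSERT INTO word (spelling) VALUES (?)", (spelling,))
                wordids[spelling] = cur.lastrowid
            conn.execute("INSERT INTO item VALUES (?, ?, ?)", (sid, pos, wordids[spelling]))
    return conn

# Hypothetical indexes: look up a spelling, find its items, then probe neighbours.
INDEXES = """
CREATE INDEX word_spelling ON word (spelling);
CREATE INDEX item_wordid ON item (wordid);
CREATE INDEX item_sentence_position ON item (sentenceid, position);
"""

# Anti-join: keep a (word1, word2) pair only if no item lies strictly between them.
ANTI_JOIN = """
SELECT DISTINCT sentence.sentenceid
  FROM sentence
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position > item.position
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
  LEFT JOIN item AS mediate_word ON sentence.sentenceid = mediate_word.sentenceid
                                AND mediate_word.position > item.position
                                AND mediate_word.position < next_item.position
 WHERE mediate_word.wordid IS NULL
   AND word.spelling = ? AND next_word.spelling = ?
 ORDER BY sentence.sentenceid
"""

conn = build_db()
conn.executescript(INDEXES)
rows = [sid for (sid,) in conn.execute(ANTI_JOIN, ("word1", "word2"))]
print(rows)  # [1, 2]
```

Sentence 3 is excluded because `other` sits between word1 and word2, so the `LEFT JOIN` finds an intermediate item and `mediate_word.wordid IS NULL` fails.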
