根据其他表中的匹配选择分区 [英] Select partitions based on matches in other table

查看:46
本文介绍了根据其他表中的匹配选择分区的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

具有下表(个对话):

 id | record_id  |  is_response  |         text         |
 ---+------------+---------------+----------------------+
 1  |     1      |      false    | in text 1            |
 2  |     1      |      true     | response text 3      |
 3  |     1      |      false    | in text 2            |
 4  |     1      |      true     | response text 2      |
 5  |     1      |      true     | response text 3      |
 6  |     2      |      false    | in text 1            |
 7  |     2      |      true     | response text 1      |
 8  |     2      |      false    | in text 2            |
 9  |     2      |      true     | response text 3      |
 10 |     2      |      true     | response text 4      |

另一个帮助表(响应):

 id |         text         |
 ---+----------------------+
 1  | response text 1      |
 2  | response text 2      |
 3  | response text 4      |

我正在寻找一个SQL查询来输出以下内容:

I'm looking for an SQL query to output the following:

  record_id |       context
  ----------+-----------------------+---------------------
       1    | in text 1 response text 3 in text 2 response text 2
  ----------+-----------------------+---------------------
       2    | in text 1 response text 1
  ----------+-----------------------+---------------------
       2    | in text 2 response text 3 response text 4

因此每次 is_response true 文本响应表中,<到目前为止,strong>聚集会话上下文,而忽略没有以池中的响应结尾的会话部分。

So each time is_response is true and the text is in the responses table, aggregate the conversation context up to this point, ignoring conversation part that does not end with a response in the pool.

在上面的示例中,响应文本3 record_id 1中。

In the example above living response text 3 in record_id 1.

我尝试了以下复杂的SQL,但有时会错误地将文本汇总:

I've tried the following complex SQL but it breaks sometimes aggregating the text wrong:

with context as(
    with answers as (

       SELECT record_id, is_response, id as ans_id
        , max(id)
          OVER (PARTITION BY record_id ORDER BY id
          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS previous_ans_id
       FROM (select * from conversations where text in (select text from responses)) ans
       ),
     lines as (
      select answers.record_id, con.id, COALESCE(previous_ans_id || ',' || ans_id, '0') as block, con.text as text from answers, conversations con where con.engagement_id = answers.record_id and ((previous_ans_id is null and con.id <= ans_id) OR (con.id > previous_ans_id and con.id <= ans_id)) order by engagement_id, id asc
      )

      select record_id, block,replace(trim(both ' ' from string_agg(text, E' ')) ,'  ',' ') ctx from lines group by record_id, block order by record_id,block
      )

select * from context

我确信还有更好的方法。

I'm sure there is a better way.

推荐答案

这是我的看法:

SELECT
    record_id,
    string_agg(text, ' ' ORDER BY id) AS context
FROM (
    SELECT
        *,
        coalesce(sum(incl::integer) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS grp
    FROM (
        SELECT *, is_response AND text IN (SELECT text FROM responses) as incl
        FROM conversations
         ) c
     ) c1
GROUP BY record_id, grp
HAVING bool_or(incl)
ORDER BY max(id);

这将扫描表个会话一次,但我不确定它的性能是否会比您的解决方案好。基本思想是使用窗口函数来计算同一记录中的前几行如何结束对话。然后,我们可以使用该数字和 record_id 进行分组,并丢弃不完整的对话。

This will scan the table conversations once, but I am not sure if it will perform better than your solution. The basic idea is to use a window function to count how maybe preceding rows within the same record, end the conversation. Then we can group by with that number and the record_id and discard incomplete conversations.

这篇关于根据其他表中的匹配选择分区的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆