使用合并合并匹配记录 [英] Using merge to combine matching records

查看:120
本文介绍了使用合并合并匹配记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试将来自一个表的匹配记录合并为另一个表的单个记录.我知道这可以通过group by以及sum(),max()等来完成,但我的困难在于,不属于group by的列是我需要串联的varchars.

I'm trying to combine matching records from a table into a single record of another table. I know this can be done with group by, and sum(), max(), etc..., My difficulty is that the columns that are not part of the group by are varchars that i need to concatenate.

我正在使用Sybase ASE 15,因此没有像MySQL的group_concat之类的功能.

I'm using Sybase ASE 15, so I do not have a function like MySQL's group_concat or similar.

我尝试使用没有运气的合并,目标表的结尾与源表的记录数相同.

I tried using merge without luck, the target table ended with the same number of records of source table.

create table #source_t(account varchar(10), event varchar(10))

Insert into #source_t(account, event) values ('account1','event 1')
Insert into #source_t(account, event) values ('account1','event 2')
Insert into #source_t(account, event) values ('account1','event 3')

create table #target(account varchar(10), event_list varchar(2048))

merge into #target as t
    using #source_t as s
    on t.account = s.account
    when     matched then update set event_list = t.event_list + ' | ' + s.event
    when not matched then insert(account, event_list) values (s.account, s.event)

select * from #target

drop table #target

drop table #source_t

考虑到上表,我希望每个帐户有一个记录,该帐户的所有事件都在第二列中进行连接.

Considering the above tables, I wanted to have one record per account, with all the events of the account concatenated in the second column.

account, event_list
'account1', 'event 1 | event 2 | event 3'

但是,我得到的只是与#source相同的记录.

However, all I've got is the same records as #source.

在我看来,在语句执行开始时尝试对表的状态"进行合并匹配,因此匹配时从不执行.有没有办法告诉DBMS与更新后的目标表进行匹配?

It seems to me that the match in merge is attempted against the "state" of the table at the beginning of statement execution, so the when matched never executes. Is there a way of telling the DBMS to match against the updated target table?

我设法通过使用游标获得了所需的结果,所以执行了merge语句n次,n是#source中的记录数,因此合并实际上在匹配时执行 部分.

I managed to obtain the results I needed by using a cursor, so the merge statement is executed n times, n being the number of records in #source, thus the merge actually executes the when matched part.

问题在于性能,以这种方式删除重复项大约需要5分钟才能将63K记录合并为42K.

The problem with it is the performance, removing duplicates this way takes about 5 minutes to combine 63K records into 42K.

是否有更快的方法来实现这一目标?

Is there a faster way of achieving this?

推荐答案

在使用UPDATE语句更新@variable时,UPDATE语句有一个鲜为人知的部分(文献不充分?),该变量允许您在@中累加/连接值.变量作为基于集合的UPDATE操作的一部分.

There's a little known (poorly documented?) aspect of the UPDATE statement when using it to update a @variable which allows you to accumulate/concatenate values in the @variable as part of a set-based UPDATE operation.

通过一个示例更容易解释":

This is easier to 'explain' with an example:

create table source
(account  varchar(10)
,event    varchar(10)
)
go

insert source values ('account1','event 1')
insert source values ('account1','event 2')
insert source values ('account1','event 3')

insert source values ('account2','event 1')

insert source values ('account3','event 1')
insert source values ('account3','event 2')
go

declare @account      varchar(10),
        @event_list   varchar(40)   -- increase the size to your expected max length 

select  @account = 'account1'

-- allow our UPDATE statement to cycle through the events for 'account1',
-- appending each successive event to @event_list

update  source
set     @event_list = @event_list + 
                      case when @event_list is not NULL then ' | ' end + 
                      event
from    source
where   account = @account

-- we'll display as a single-row result set; we could also use a 'print' statement ... 
-- just depends on what format the calling process is looking for

select  @account     as account,
        @event_list  as event_list
go

 account    event_list
 ---------- ----------------------------------------
 account1   event 1 | event 2 | event 3

PRO:

  • 单个UPDATE语句处理单个帐户值

CON:

  • 仍然需要一个游标来处理一系列帐户值
  • 如果所需的最终输出是单个结果集,则需要将中间结果(例如,@ account和@update)存储在(临时)表中,然后针对此(临时)表运行最终SELECT以产生所需的结果集
  • 当您实际上没有更新物理表时,如果您无权访问更新"表,则可能会遇到问题

注意:您可以将cursor/UPDATE逻辑放入存储的过程中,通过代理表调用该过程,这将允许一系列"select @ account,@ update"语句的输出返回到调用过程作为单个结果集...但这是(某种)复杂卷积编码方法的另一个主题.

NOTE: You could put the cursor/UPDATE logic in a stored proc, call the proc through a proxy table, and this would allow the output from a series of 'select @account,@update' statements to be returned to the calling process as a single result set ... but that's a whole 'nother topic on a (somewhat) convoluted coding method.

对于您的流程,您将需要一个游标来遍历唯一的一组帐户值,但是您将能够消除用于遍历给定帐户的事件列表的游标开销.最终结果是,您应该在运行过程上看到一些时间上的改善.

For your process you'll need a cursor to loop through your unique set of account values, but you'll be able to eliminate the cursor overhead for looping through the list of events for a given account. Net result is that you should see some improvement in the time it takes to run your process.

这篇关于使用合并合并匹配记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆