将列更新为不同的聚合值 [英] Update column to be different aggregate values

查看:23
本文介绍了将列更新为不同的聚合值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在创建一个用于合并"和删除表中重复行的脚本.该表包含地址信息,并使用整数字段将有关电子邮件的信息存储为位标志(列名 lngValue).例如,lngValue &1 == 1 表示它的主地址.

I am creating a script that for "merging" and deleting duplicate rows from a table. The table contains address information, and uses an integer field for storing information about the email as bit flags (column name lngValue). For example, lngValue & 1 == 1 means its the primary address.

同一电子邮件被输入两次的情况,但有时使用不同的 lngValues.为了解决这个问题,我需要从所有重复项中获取 lngValue 并将它们分配给一个幸存的记录并删除其余记录.

There are instances of the same email being entered twice, but sometimes with different lngValues. To resolve this, I need to take the lngValue from all duplicates and assign them to one surviving record and delete the rest.

到目前为止,我最头疼的是记录的合并".我想要做的是按位或重复记录的所有 lngValues 一起.这是我到目前为止所拥有的,它只找到所有 lngValues 的值按位或一起.

My biggest headache so far as been with the "merging" of the records. What I want to do is bitwise or all lngValues of duplicate records together. Here is what I have so far, which only finds the value of all lngValues bitwise or'ed together.

警告:前面有乱码

declare @duplicates table
(
lngInternetPK int,
lngContactFK int,
lngValue int
)

insert into @duplicates (lngInternetPK, lngContactFK, lngValue) 
(
select  tblminternet.lngInternetPK, tblminternet.lngContactFK, tblminternet.lngValue   from tblminternet  inner join 
(select strAddress, lngcontactfk, count(*) as count from tblminternet where lngValue & 256 <> 256 group by strAddress, lngcontactfk) secondemail
On tblminternet.strAddress = secondemail.strAddress and
tblminternet.lngcontactfk = secondemail.lngcontactfk 
where count > 1 and tblminternet.strAddress is not null and tblminternet.lngValue & 256 <> 256 --order by lngContactFK, strAddress
)

update @duplicates set lngValue = t.val

from 
                (select (sum(dupes.lngValue) & 65535) as val from 
                    (select  here.lngInternetPK,                     here.lngContactFK, here.lngValue from tblminternet here  inner join 
                    (select strAddress, lngcontactfk, count(*) as count from tblminternet where lngValue & 256 <> 256 group by strAddress, lngcontactfk) secondemail
                    On here.strAddress = secondemail.strAddress     and
                    here.lngcontactfk = secondemail.lngcontactfk 
                    where count > 1 and here.strAddress is not      null and here.lngValue & 256 <> 256) dupes, tblminternet this

                where this.lngContactFK = dupes.lngContactFK
                ) t
where lngInternetPK in (select lngInternetPK from @duplicates)    


这里要求的是一些示例数据:


As requested here is some sample data:

表名:tblminternet
列名:
lngInternetPK
lngContactFK
lng值
字符串地址

Table Name: tblminternet
Column Names:
lngInternetPK
lngContactFK
lngValue
strAddress

示例第 1 行:
lngInternetPK:1
lngContactFK: 1
lngValue: 33
strAddress: "me@myaddress.com"

Example row 1:
lngInternetPK: 1
lngContactFK: 1
lngValue: 33
strAddress: "me@myaddress.com"

示例第 2 行:
lngInternetPK:2
lngContactFK: 1
lng值:40
strAddress: "me@myaddress.com"

Example row 2:
lngInternetPK: 2
lngContactFK: 1
lngValue: 40
strAddress: "me@myaddress.com"

如果这两个在这里合并是期望的结果:
lngInternetPK:1
lngContactFK: 1
lng值:41
strAddress: "me@myaddress.com"

If these two were merged here is the desired result:
lngInternetPK: 1
lngContactFK: 1
lngValue: 41
strAddress: "me@myaddress.com"

其他必要的规则:
每个联系人可以有多个电子邮件,但每个电子邮件行必须是不同的(每封电子邮件只能显示为一行).

Other necessary rules:
Each contact can have multiple emails, but each email row must be distinct ( each email can only appear as one row).

推荐答案

SQL Server 缺少原生的按位聚合,这就是我们需要模拟它们的原因.

SQL Server lacks native bitwise aggregates, that's why we need to emulate them.

这里的主要思想是生成一组从015的位,对每个位应用位掩码到值并选择MAX(将为给定位提供 OR),然后选择 SUM(将合并位掩码).

The main idea here is to generate a set of bits from 0 to 15, for each bit apply the bitmask to the value and select MAX (which will give us an OR for a given bit), then select the SUM (which will merge the bit masks).

我们只是用 lngValue 的新值更新任何给定 (lngContactFK, strValue) 的第一个 lngInternetPK,并删除所有重复项.

The we just update the first lngInternetPK for any given (lngContactFK, strValue) with the new value of lngValue, and delete all duplicates.

;WITH   bits AS
        (
        SELECT  0 AS b
        UNION ALL
        SELECT  b + 1
        FROM    bits
        WHERE   b < 15
        ),
        v AS
        (
        SELECT  i.*,
                (
                SELECT  SUM(value)
                FROM    (
                        SELECT  MAX(lngValue & POWER(2, b)) AS value
                        FROM    tblmInternet ii
                        CROSS JOIN
                                bits
                        WHERE   ii.lngContactFK = i.lngContactFK
                                AND ii.strAddress = i.strAddress
                        GROUP BY
                                b
                        ) q
                ) AS lngNewValue
        FROM    (
                SELECT  ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
                FROM    tblmInternet ii
                ) i
        WHERE   rn = 1
        )
UPDATE  v
SET     lngValue = lngNewValue;

;WITH    v AS
        (
        SELECT  ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
        FROM    tblmInternet ii
        )
DELETE  v
WHERE   rn > 1

查看我博客中的这篇文章以获得更详细的解释:

See this article in my blog for more detailed explanations:

这篇关于将列更新为不同的聚合值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆