将列更新为不同的聚合值 [英] Update column to be different aggregate values
问题描述
我正在创建一个用于合并"和删除表中重复行的脚本.该表包含地址信息,并使用整数字段将有关电子邮件的信息存储为位标志(列名 lngValue).例如,lngValue &1 == 1 表示它的主地址.
I am creating a script that for "merging" and deleting duplicate rows from a table. The table contains address information, and uses an integer field for storing information about the email as bit flags (column name lngValue). For example, lngValue & 1 == 1 means its the primary address.
同一电子邮件被输入两次的情况,但有时使用不同的 lngValues.为了解决这个问题,我需要从所有重复项中获取 lngValue 并将它们分配给一个幸存的记录并删除其余记录.
There are instances of the same email being entered twice, but sometimes with different lngValues. To resolve this, I need to take the lngValue from all duplicates and assign them to one surviving record and delete the rest.
到目前为止,我最头疼的是记录的合并".我想要做的是按位或重复记录的所有 lngValues 一起.这是我到目前为止所拥有的,它只找到所有 lngValues 的值按位或一起.
My biggest headache so far as been with the "merging" of the records. What I want to do is bitwise or all lngValues of duplicate records together. Here is what I have so far, which only finds the value of all lngValues bitwise or'ed together.
警告:前面有乱码
declare @duplicates table
(
lngInternetPK int,
lngContactFK int,
lngValue int
)
insert into @duplicates (lngInternetPK, lngContactFK, lngValue)
(
select tblminternet.lngInternetPK, tblminternet.lngContactFK, tblminternet.lngValue from tblminternet inner join
(select strAddress, lngcontactfk, count(*) as count from tblminternet where lngValue & 256 <> 256 group by strAddress, lngcontactfk) secondemail
On tblminternet.strAddress = secondemail.strAddress and
tblminternet.lngcontactfk = secondemail.lngcontactfk
where count > 1 and tblminternet.strAddress is not null and tblminternet.lngValue & 256 <> 256 --order by lngContactFK, strAddress
)
update @duplicates set lngValue = t.val
from
(select (sum(dupes.lngValue) & 65535) as val from
(select here.lngInternetPK, here.lngContactFK, here.lngValue from tblminternet here inner join
(select strAddress, lngcontactfk, count(*) as count from tblminternet where lngValue & 256 <> 256 group by strAddress, lngcontactfk) secondemail
On here.strAddress = secondemail.strAddress and
here.lngcontactfk = secondemail.lngcontactfk
where count > 1 and here.strAddress is not null and here.lngValue & 256 <> 256) dupes, tblminternet this
where this.lngContactFK = dupes.lngContactFK
) t
where lngInternetPK in (select lngInternetPK from @duplicates)
这里要求的是一些示例数据:
As requested here is some sample data:
表名:tblminternet
列名:
lngInternetPK
lngContactFK
lng值
字符串地址
Table Name: tblminternet
Column Names:
lngInternetPK
lngContactFK
lngValue
strAddress
示例第 1 行:
lngInternetPK:1
lngContactFK: 1
lngValue: 33
strAddress: "me@myaddress.com"
Example row 1:
lngInternetPK: 1
lngContactFK: 1
lngValue: 33
strAddress: "me@myaddress.com"
示例第 2 行:
lngInternetPK:2
lngContactFK: 1
lng值:40
strAddress: "me@myaddress.com"
Example row 2:
lngInternetPK: 2
lngContactFK: 1
lngValue: 40
strAddress: "me@myaddress.com"
如果这两个在这里合并是期望的结果:
lngInternetPK:1
lngContactFK: 1
lng值:41
strAddress: "me@myaddress.com"
If these two were merged here is the desired result:
lngInternetPK: 1
lngContactFK: 1
lngValue: 41
strAddress: "me@myaddress.com"
其他必要的规则:
每个联系人可以有多个电子邮件,但每个电子邮件行必须是不同的(每封电子邮件只能显示为一行).
Other necessary rules:
Each contact can have multiple emails, but each email row must be distinct ( each email can only appear as one row).
推荐答案
SQL Server
缺少原生的按位聚合,这就是我们需要模拟它们的原因.
SQL Server
lacks native bitwise aggregates, that's why we need to emulate them.
这里的主要思想是生成一组从0
到15
的位,对每个位应用位掩码到值并选择MAX
(将为给定位提供 OR
),然后选择 SUM
(将合并位掩码).
The main idea here is to generate a set of bits from 0
to 15
, for each bit apply the bitmask to the value and select MAX
(which will give us an OR
for a given bit), then select the SUM
(which will merge the bit masks).
我们只是用 lngValue
的新值更新任何给定 (lngContactFK, strValue)
的第一个 lngInternetPK
,并删除所有重复项.
The we just update the first lngInternetPK
for any given (lngContactFK, strValue)
with the new value of lngValue
, and delete all duplicates.
;WITH bits AS
(
SELECT 0 AS b
UNION ALL
SELECT b + 1
FROM bits
WHERE b < 15
),
v AS
(
SELECT i.*,
(
SELECT SUM(value)
FROM (
SELECT MAX(lngValue & POWER(2, b)) AS value
FROM tblmInternet ii
CROSS JOIN
bits
WHERE ii.lngContactFK = i.lngContactFK
AND ii.strAddress = i.strAddress
GROUP BY
b
) q
) AS lngNewValue
FROM (
SELECT ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
FROM tblmInternet ii
) i
WHERE rn = 1
)
UPDATE v
SET lngValue = lngNewValue;
;WITH v AS
(
SELECT ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
FROM tblmInternet ii
)
DELETE v
WHERE rn > 1
查看我博客中的这篇文章以获得更详细的解释:
See this article in my blog for more detailed explanations:
这篇关于将列更新为不同的聚合值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!