更新列为不同的聚合值 [英] Update column to be different aggregate values
问题描述
我正在创建一个用于合并并从表中删除重复行的脚本。该表包含地址信息,并使用整数字段来存储有关电子邮件的信息作为位标志(列名称lngValue)。例如,lngValue& 1 == 1表示其主地址。
有两次输入相同电子邮件的实例,但有时使用不同的lngValues。要解决这个问题,我需要从所有重复的文件中获取lngValue,并将它们分配给一个幸存的记录,然后删除其余的记录。
到目前为止,我最大的头痛是记录的合并。我想做的是将重复记录按位或所有lngValues在一起。这是我到目前为止,它只发现所有lngValues的值按位或一起。
警告:凌乱的代码
declare @duplicates table
(
lngInternetPK int,
lngContactFK int,
lngValue int
)
插入到@duplicates(lngInternetPK,lngContactFK,lngValue)
(
选择tblminternet.lngInternetPK,tblminternet.lngContactFK,tblminternet.lng从tblminternet内部连接的$ val $ b(选择strAddress,lngcontactfk,count(*)作为从tblminternet计数,其中lngValue& 256 256 group by strAddress,lngcontactfk)secondemail
在tblminternet.strAddress = secondemail.strAddress和
tblminternet.lngcontactfk = secondemail.lngcontactfk
其中count> 1和tblminternet.strAddress不为空和tblminternet。 lngValue&256 256 - order by lngContactFK,strAddress
)
update @duplicates set lngValue = t.val
from
(select(sum(dupes.lngValue)& 65535)as val from
(select h ere.lngInternetPK,here.lngContactFK,here.lng从tblminternet这里的值加入
(select strAddress,lngcontactfk,count(*)as count from tblminternet where lngValue& 256 256组由strAddress,lngcontactfk)secondemail
在here.strAddress = secondemail.strAddress和
here.lngcontactfk = secondemail.lngcontactfk
其中count> 1和here.strAddress不为null,here.lngValue& 256 256)dupes,tblminternet这个
其中this.lngContactFK = dupes.lngContactFK
)t
其中lngInternetPK in(select lngInternetPK from @duplicates)
编辑:
根据要求,这里有一些示例数据:
表名称:tblminternet
列名称:
lngInternetPK
lngContactFK
lngValue
strAddress
示例行1:
lngInternetPK:1
lngContactFK:1
lngValue:33
strAddress:me@myaddress.com
示例行2:
lngInternetPK:2
lngContactFK:1
lngValue:40
strAddress:me@myaddress.com
如果这两个合并在这里是所需的结果:
lngInternetPK:1
lngContactFK:1
lngValue:41
strAddress:me@myaddress.com
其他必要的规则:
每个联系人都可以有多个iple电子邮件,但每个电子邮件行必须是不同的(每个电子邮件只能显示为一行)。
SQL Server
缺少原生的按位聚合,这就是为什么我们需要模拟它们。
这里的主要思想是生成一组位从 0
到 15
,每个位应用位掩码值并选择 MAX
(这将为给定位给我们一个 OR
),然后选择 SUM
这将合并位掩码)。
我们只是为任何给定的$ $ c $更新了第一个 lngInternetPK
c>(lngContactFK,strValue),新值 lngValue
,并删除所有重复项。
$ b $ (
UNION ALL
SELECT b + 1
FROM bits
WHERE b< 15
),
v AS
(
SELECT i。*,
(
SELECT SUM(value)
FROM(
SELECT MAX(lngValue& POWER(2,b))AS值
FROM tblmInternet ii
CROSS JOIN
位
WHERE ii.lngContactFK = i.lngContactFK
AND ii.strAddress = i.strAddress
GROUP BY
b
)q
)AS lngNewValue
FROM(
SELECT ii。*,ROW_NUMBER()OVER分区由lngContactFK,strAddress ORDER BY lngInternetPK)AS rn
FROM tblmInternet ii
)i
WHERE rn = 1
)
更新v
SET lngValue = lngNewValue;
; WITH v AS
(
SELECT ii。*,ROW_NUMBER()OVER(PARTITION BY lngContactFK,strAddress ORDER BY lngInternetPK)AS rn
FROM tblmInternet ii
)
DELETE v
WHERE rn> 1
有关详细说明,请参阅我的博客中的这篇文章:
I am creating a script that for "merging" and deleting duplicate rows from a table. The table contains address information, and uses an integer field for storing information about the email as bit flags (column name lngValue). For example, lngValue & 1 == 1 means its the primary address.
There are instances of the same email being entered twice, but sometimes with different lngValues. To resolve this, I need to take the lngValue from all duplicates and assign them to one surviving record and delete the rest.
My biggest headache so far as been with the "merging" of the records. What I want to do is bitwise or all lngValues of duplicate records together. Here is what I have so far, which only finds the value of all lngValues bitwise or'ed together.
Warning: messy code ahead
declare @duplicates table
(
lngInternetPK int,
lngContactFK int,
lngValue int
)
insert into @duplicates (lngInternetPK, lngContactFK, lngValue)
(
select tblminternet.lngInternetPK, tblminternet.lngContactFK, tblminternet.lngValue from tblminternet inner join
(select strAddress, lngcontactfk, count(*) as count from tblminternet where lngValue & 256 <> 256 group by strAddress, lngcontactfk) secondemail
On tblminternet.strAddress = secondemail.strAddress and
tblminternet.lngcontactfk = secondemail.lngcontactfk
where count > 1 and tblminternet.strAddress is not null and tblminternet.lngValue & 256 <> 256 --order by lngContactFK, strAddress
)
update @duplicates set lngValue = t.val
from
(select (sum(dupes.lngValue) & 65535) as val from
(select here.lngInternetPK, here.lngContactFK, here.lngValue from tblminternet here inner join
(select strAddress, lngcontactfk, count(*) as count from tblminternet where lngValue & 256 <> 256 group by strAddress, lngcontactfk) secondemail
On here.strAddress = secondemail.strAddress and
here.lngcontactfk = secondemail.lngcontactfk
where count > 1 and here.strAddress is not null and here.lngValue & 256 <> 256) dupes, tblminternet this
where this.lngContactFK = dupes.lngContactFK
) t
where lngInternetPK in (select lngInternetPK from @duplicates)
Edit:
As requested here is some sample data:
Table Name: tblminternet
Column Names:
lngInternetPK
lngContactFK
lngValue
strAddress
Example row 1:
lngInternetPK: 1
lngContactFK: 1
lngValue: 33
strAddress: "me@myaddress.com"
Example row 2:
lngInternetPK: 2
lngContactFK: 1
lngValue: 40
strAddress: "me@myaddress.com"
If these two were merged here is the desired result:
lngInternetPK: 1
lngContactFK: 1
lngValue: 41
strAddress: "me@myaddress.com"
Other necessary rules:
Each contact can have multiple emails, but each email row must be distinct ( each email can only appear as one row).
SQL Server
lacks native bitwise aggregates, that's why we need to emulate them.
The main idea here is to generate a set of bits from 0
to 15
, for each bit apply the bitmask to the value and select MAX
(which will give us an OR
for a given bit), then select the SUM
(which will merge the bit masks).
The we just update the first lngInternetPK
for any given (lngContactFK, strValue)
with the new value of lngValue
, and delete all duplicates.
;WITH bits AS
(
SELECT 0 AS b
UNION ALL
SELECT b + 1
FROM bits
WHERE b < 15
),
v AS
(
SELECT i.*,
(
SELECT SUM(value)
FROM (
SELECT MAX(lngValue & POWER(2, b)) AS value
FROM tblmInternet ii
CROSS JOIN
bits
WHERE ii.lngContactFK = i.lngContactFK
AND ii.strAddress = i.strAddress
GROUP BY
b
) q
) AS lngNewValue
FROM (
SELECT ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
FROM tblmInternet ii
) i
WHERE rn = 1
)
UPDATE v
SET lngValue = lngNewValue;
;WITH v AS
(
SELECT ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
FROM tblmInternet ii
)
DELETE v
WHERE rn > 1
See this article in my blog for more detailed explanations:
这篇关于更新列为不同的聚合值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!