Select duplicate and keep the oldest (not based on ID)
Thanks in advance for your help; I'm stuck on this problem.
Let me explain. I have a table like this:
| domain | creationdate | value 1 | value 2 |
|--------|---------------------|---------|---------|
| abc | 2013-05-28 15:35:01 | value 1 | value 2 |
| abc | 2013-04-30 12:10:10 | value 1 | value 2 |
| aaa | 2011-04-02 13:10:10 | value 1 | value 2 |
| bbb | 2012-02-12 10:48:10 | value 1 | value 2 |
| bbb | 2013-04-15 07:15:23 | value 1 | value 2 |
And I want to select this (using subqueries):
| domain | creationdate | value 1 | value 2 |
|--------|---------------------|---------|---------|
| abc | 2013-04-30 12:10:10 | value 1 | value 2 |
| aaa | 2011-04-02 13:10:10 | value 1 | value 2 |
| bbb | 2012-02-12 10:48:10 | value 1 | value 2 |
I tried combinations of subqueries with IN/NOT IN in the WHERE clause and GROUP BY/HAVING, but I'm not able to get the right result.
I also have another question: if someone has already faced this kind of problem, I would be glad to hear how they solved it.
The records in the first table above are frequently (every ten minutes) deleted and re-inserted. My aim is to make a copy (or maybe a view) of the result (without the duplicate entries) that will be used 24/7 by a Postfix mail server. I heard that big views (with many subqueries) hurt performance, which means a table would be the preferable option. The thing is, if I have to build a new table every ten minutes, there will be a little downtime during which Postfix cannot read the table.
Looking forward to your advice, and thanks in advance.
EDIT:
Based on @Ed Gibbs' answer, here is a better sample:
Source table:
| domain | creationdate | value 1 | value 2 |
|------------|---------------------|---------|---------|
| google.com | 2013-05-28 15:35:01 | john | mary |
| google.com | 2013-04-30 12:10:10 | patrick | edward |
| yahoo.fr | 2011-04-02 13:10:10 | britney | garry |
| ebay.com | 2012-02-12 10:48:10 | harry | mickael |
| ebay.com | 2013-04-15 07:15:23 | bill | alice |
With your query, the result is the same as the source table.
Desired result:
| domain | value 1 | value 2 |
|------------|---------|---------|
| google.com | patrick | edward |
| yahoo.fr | britney | garry |
| ebay.com | harry | mickael |
I want to keep the oldest row for each domain (the one with the minimum creation date), along with its own value 1 and value 2.
New question!
I made a view of the desired result based on your answer.
The result looks like this:
| domain | value 1 | foreign_key |
|------------|---------|-------------|
| google.com | patrick | X |
| yahoo.fr | britney | Y |
| ebay.com | harry | Z |
I also have a table with entries like these:
| email | value 1 | foreign_key |
|--------------------|---------|-------------|
| john@google.com | patrick | X |
| john@google.com | britney | Y |
| harry@google.com | mary | X |
| mickael@google.com | jack | X |
| david@ebay.com | walter | Z |
| alice@yahoo.com | brian | Y |
Assume that (in this sample) the %@google.com emails with foreign_key Y aren't good records: only the %@google.com emails with foreign_key X are good, because that is the domain/foreign_key pair I selected by creationdate. How can I select only the emails whose domain/foreign_key pair is referenced in my new view?
Desired result:
| email | value 1 | foreign_key |
|--------------------|---------|-------------|
| john@google.com | patrick | X |
| harry@google.com | mary | X |
| mickael@google.com | jack | X |
| david@ebay.com | walter | Z |
| alice@yahoo.com | brian | Y |
I tried a join on CONCAT('%','@',domain) and foreign_key = foreign_key, but it doesn't give me what I want.
Based on your sample data and results, a `GROUP BY` will give you the results you're after:

    SELECT
      domain,
      MIN(creationdate) AS creationdate,
      value1,
      value2
    FROM mytable
    GROUP BY domain, value1, value2
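
If you want to try this without touching your real database, here is a minimal sketch that runs the same `GROUP BY` against the first sample table from the question. It assumes an in-memory SQLite database (via Python's `sqlite3`) as a stand-in for your actual server, and it assumes `creationdate` is stored as `YYYY-MM-DD HH:MM:SS` text, so `MIN` compares the strings chronologically. The column types are my guess, not something stated in the question.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the real `mytable`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (domain TEXT, creationdate TEXT, value1 TEXT, value2 TEXT)"
)

# First sample from the question: value1/value2 are identical on every row.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("abc", "2013-05-28 15:35:01", "value 1", "value 2"),
        ("abc", "2013-04-30 12:10:10", "value 1", "value 2"),
        ("aaa", "2011-04-02 13:10:10", "value 1", "value 2"),
        ("bbb", "2012-02-12 10:48:10", "value 1", "value 2"),
        ("bbb", "2013-04-15 07:15:23", "value 1", "value 2"),
    ],
)

# Same query as above; ORDER BY is added only to make the output stable.
group_by_rows = conn.execute(
    """
    SELECT domain, MIN(creationdate) AS creationdate, value1, value2
    FROM mytable
    GROUP BY domain, value1, value2
    ORDER BY domain
    """
).fetchall()

for row in group_by_rows:
    print(row)
```

Because `value1` and `value2` are the same on every row here, each domain forms a single group, and the query returns one row per domain with its earliest `creationdate`.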
Addendum: @Arka provided updated sample data where the `value 1` and `value 2` columns have different values (in the original they were the same). That changes the query to this:

    SELECT domain, creationdate, value1, value2
    FROM mytable
    WHERE (domain, creationdate) IN (
      SELECT domain, MIN(creationdate)
      FROM mytable
      GROUP BY domain)
The subquery gets a list of the earliest `creationdate` for each `domain`, and the outer query only selects rows where the `domain` and `creationdate` match the subquery values.
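
As a quick check, the sketch below runs the subquery version against the updated sample data. It again assumes an in-memory SQLite database through Python's `sqlite3` rather than your real server, a SQLite build that accepts the `(a, b) IN (SELECT ...)` row-value syntax (3.15 or newer, as far as I know), and `creationdate` stored as sortable `YYYY-MM-DD HH:MM:SS` text. It also runs the earlier `GROUP BY` query for comparison, to show why that one hands back the whole source table once the values differ.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the real `mytable`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (domain TEXT, creationdate TEXT, value1 TEXT, value2 TEXT)"
)

# Updated sample from the question: value1/value2 now differ per row.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("google.com", "2013-05-28 15:35:01", "john", "mary"),
        ("google.com", "2013-04-30 12:10:10", "patrick", "edward"),
        ("yahoo.fr", "2011-04-02 13:10:10", "britney", "garry"),
        ("ebay.com", "2012-02-12 10:48:10", "harry", "mickael"),
        ("ebay.com", "2013-04-15 07:15:23", "bill", "alice"),
    ],
)

# Grouping by the value columns too: every row is its own group now.
grouped_rows = conn.execute(
    """
    SELECT domain, MIN(creationdate), value1, value2
    FROM mytable
    GROUP BY domain, value1, value2
    """
).fetchall()

# Subquery version: keep only rows matching the per-domain minimum date.
# ORDER BY is added only to make the output stable.
oldest_rows = conn.execute(
    """
    SELECT domain, creationdate, value1, value2
    FROM mytable
    WHERE (domain, creationdate) IN (
        SELECT domain, MIN(creationdate)
        FROM mytable
        GROUP BY domain)
    ORDER BY domain
    """
).fetchall()

print(len(grouped_rows))
for row in oldest_rows:
    print(row)
```

One thing to keep in mind: if two rows for the same domain share the exact same earliest `creationdate`, the subquery version returns both of them, so you may still need a tie-breaker in that case.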