Select duplicate and keep the oldest (not based on ID)

Thanks for your help, I'm stuck on this problem.

Let me explain: I have this kind of table:

| domain |     creationdate    | value 1 | value 2 |
|--------|---------------------|---------|---------|
| abc    | 2013-05-28 15:35:01 | value 1 | value 2 |
| abc    | 2013-04-30 12:10:10 | value 1 | value 2 |
| aaa    | 2011-04-02 13:10:10 | value 1 | value 2 |
| bbb    | 2012-02-12 10:48:10 | value 1 | value 2 |
| bbb    | 2013-04-15 07:15:23 | value 1 | value 2 |

And I want to select this (with subqueries):

| domain |     creationdate    | value 1 | value 2 |
|--------|---------------------|---------|---------|
| abc    | 2013-04-30 12:10:10 | value 1 | value 2 |
| aaa    | 2011-04-02 13:10:10 | value 1 | value 2 |
| bbb    | 2012-02-12 10:48:10 | value 1 | value 2 |

I tried a combination of subqueries with IN/NOT IN in the WHERE clause and GROUP BY/HAVING, but I'm not able to obtain the proper result.

I also have another question: if someone has already faced this kind of problem, I would be glad to hear how they solved it.

The records in the first table above are deleted and reinserted frequently (every ten minutes). My aim is to make a copy (or maybe a view) of the result without the duplicate entries, which a Postfix mail server will use 24/7. I heard that big views (with many subqueries) decrease performance, so a table would be preferable. The problem is that if I have to create a new table every ten minutes, there will be a short downtime during which Postfix cannot read the table.

Waiting for your advice, thanks in advance.
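The answer below covers only the deduplication query. For the Postfix copy, one common approach is to rebuild the copy inside a single transaction. Readers then keep seeing the previous committed rows until the new ones are committed, and they never see an empty table. In MySQL with InnoDB, plain SELECTs are consistent non-locking reads, so `DELETE` plus `INSERT ... SELECT` in one transaction behaves this way. An alternative is to fill a staging table and swap it in with `RENAME TABLE live TO old, staging TO live`, which MySQL performs atomically.

The following sketch demonstrates the single-transaction approach. It uses Python's built-in `sqlite3` in WAL mode as a stand-in for MySQL. The table name `domains_live` and the use of the updated sample rows are assumptions for illustration.

```python
import os
import sqlite3
import tempfile

# Sample source rows (the updated sample from the question).
SOURCE_ROWS = [
    ("google.com", "2013-05-28 15:35:01", "john", "mary"),
    ("google.com", "2013-04-30 12:10:10", "patrick", "edward"),
    ("yahoo.fr", "2011-04-02 13:10:10", "britney", "garry"),
    ("ebay.com", "2012-02-12 10:48:10", "harry", "mickael"),
    ("ebay.com", "2013-04-15 07:15:23", "bill", "alice"),
]

# Oldest row per domain, copied into the hypothetical domains_live table
# that the mail server would read.
FILL_LIVE = """
INSERT INTO domains_live (domain, creationdate, value1, value2)
SELECT domain, creationdate, value1, value2
FROM mytable
WHERE (domain, creationdate) IN (
  SELECT domain, MIN(creationdate) FROM mytable GROUP BY domain)
"""


def count_live(conn):
    """Run a complete read (fetchall) so no read snapshot stays open."""
    return conn.execute("SELECT COUNT(*) FROM domains_live").fetchall()[0][0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    # isolation_level=None: autocommit mode, transactions are issued explicitly.
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL")  # readers see the last committed snapshot
    writer.execute("CREATE TABLE mytable (domain TEXT, creationdate TEXT, value1 TEXT, value2 TEXT)")
    writer.execute("CREATE TABLE domains_live (domain TEXT, creationdate TEXT, value1 TEXT, value2 TEXT)")
    writer.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", SOURCE_ROWS)
    writer.execute(FILL_LIVE)  # initial fill, committed immediately

    reader = sqlite3.connect(path, isolation_level=None)  # stands in for Postfix

    # Periodic refresh: empty and refill the copy inside one transaction.
    writer.execute("BEGIN IMMEDIATE")
    writer.execute("DELETE FROM domains_live")
    seen_during_refresh = count_live(reader)  # uncommitted DELETE is invisible here
    writer.execute(FILL_LIVE)
    writer.execute("COMMIT")
    seen_after_refresh = count_live(reader)

    reader.close()
    writer.close()
```

While the refresh is in progress, the reader still counts the three committed rows rather than zero. Whether this is fast enough for a ten-minute cycle on the real data is something to measure.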

EDIT:

Based on @Ed Gibbs's answer, here is a better sample:

Source table:

| domain     |     creationdate    | value 1 | value 2 |
|------------|---------------------|---------|---------|
| google.com | 2013-05-28 15:35:01 | john    | mary    |
| google.com | 2013-04-30 12:10:10 | patrick | edward  |
| yahoo.fr   | 2011-04-02 13:10:10 | britney | garry   |
| ebay.com   | 2012-02-12 10:48:10 | harry   | mickael |
| ebay.com   | 2013-04-15 07:15:23 | bill    | alice   |

With your query, the result is the whole source table.

Desired result:

| domain     | value 1 | value 2 |
|------------|---------|---------|
| google.com | patrick | edward  |
| yahoo.fr   | britney | garry   |
| ebay.com   | harry   | mickael |

For each domain, I want to keep the oldest row (the one with the minimum creationdate), together with its own value1 and value2.


New question!

I made a view of the desired result based on your answer.

The result looks like this:

| domain     | value 1 | foreign_key |
|------------|---------|-------------|
| google.com | patrick | X           |
| yahoo.fr   | britney | Y           |
| ebay.com   | harry   | Z           |

I also have a table with entries like these:

| email              | value 1 | foreign_key |
|--------------------|---------|-------------|
| john@google.com    | patrick | X           |
| john@google.com    | britney | Y           |
| harry@google.com   | mary    | X           |
| mickael@google.com | jack    | X           |
| david@ebay.com     | walter  | Z           |
| alice@yahoo.com    | brian   | Y           |

Assume that (in this sample) the %@google.com emails with foreign_key Y aren't good records. Only the google.com ones with foreign_key X are good, because google.com with foreign_key X is the row I picked through the creationdate selection. How can I select only the emails whose domain/foreign_key pair is referenced in my new view?

Desired result:

| email              | value 1 | foreign_key |
|--------------------|---------|-------------|
| john@google.com    | patrick | X           |
| harry@google.com   | mary    | X           |
| mickael@google.com | jack    | X           |
| david@ebay.com     | walter  | Z           |
| alice@yahoo.com    | brian   | Y           |

I tried joining on CONCAT('%','@',domain) and foreign_key = foreign_key, but it doesn't give me what I want.
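There are two possible readings of this question, sketched below with Python's built-in `sqlite3` so the queries can be run as-is. The names `domain_view` and `emails` are hypothetical stand-ins for the asker's view and email table.

The domain part of each address is extracted with SQLite's `substr`/`instr` and compared with `=` instead of a `LIKE` pattern. In MySQL, `SUBSTRING_INDEX(email, '@', -1)` extracts the same part.

- **Reading 1** keeps only emails whose domain and foreign_key both match a row of the view. With the sample as written, it does not return alice@yahoo.com, because the view lists yahoo.fr, not yahoo.com.
- **Reading 2** drops only emails whose domain appears in the view with a different foreign_key. On this sample, it reproduces the desired result exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables mirroring the question's view and email table.
conn.execute("CREATE TABLE domain_view (domain TEXT, value1 TEXT, foreign_key TEXT)")
conn.execute("CREATE TABLE emails (email TEXT, value1 TEXT, foreign_key TEXT)")
conn.executemany("INSERT INTO domain_view VALUES (?, ?, ?)", [
    ("google.com", "patrick", "X"),
    ("yahoo.fr", "britney", "Y"),
    ("ebay.com", "harry", "Z"),
])
conn.executemany("INSERT INTO emails VALUES (?, ?, ?)", [
    ("john@google.com", "patrick", "X"),
    ("john@google.com", "britney", "Y"),
    ("harry@google.com", "mary", "X"),
    ("mickael@google.com", "jack", "X"),
    ("david@ebay.com", "walter", "Z"),
    ("alice@yahoo.com", "brian", "Y"),
])

# Reading 1: keep emails whose (domain, foreign_key) pair appears in the view.
# substr(email, instr(email, '@') + 1) is the part after the '@'.
matching = conn.execute("""
    SELECT e.email, e.value1, e.foreign_key
    FROM emails AS e
    JOIN domain_view AS v
      ON substr(e.email, instr(e.email, '@') + 1) = v.domain
     AND e.foreign_key = v.foreign_key
    ORDER BY e.rowid
""").fetchall()

# Reading 2: drop only emails whose domain is in the view under another foreign_key.
not_conflicting = conn.execute("""
    SELECT e.email, e.value1, e.foreign_key
    FROM emails AS e
    WHERE NOT EXISTS (
      SELECT 1 FROM domain_view AS v
      WHERE v.domain = substr(e.email, instr(e.email, '@') + 1)
        AND v.foreign_key <> e.foreign_key)
    ORDER BY e.rowid
""").fetchall()
conn.close()
```

Which reading is right depends on whether an email whose domain is missing from the view should count as valid.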

Solution

Based on your sample data and results, a GROUP BY will give you the results you're after:

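```sql
-- One row per distinct (domain, value1, value2) combination,
-- keeping the earliest creationdate of each group
SELECT
  domain,
  MIN(creationdate) AS creationdate,
  value1,
  value2
FROM mytable
GROUP BY domain, value1, value2
```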

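This check shows why the query fits the original sample but returns the whole table for the updated one. `GROUP BY domain, value1, value2` makes one group per distinct combination, so it collapses duplicates only when the duplicated rows also share value1 and value2.

The check uses Python's built-in `sqlite3` and assumes the names `mytable`, `value1` and `value2` from the query. It stores creationdate as ISO-8601 text, which sorts chronologically; in MySQL it would be a DATETIME. An `ORDER BY` is added only to make the output deterministic.

```python
import sqlite3

# The question's two samples (columns: domain, creationdate, value1, value2).
ORIGINAL = [
    ("abc", "2013-05-28 15:35:01", "value 1", "value 2"),
    ("abc", "2013-04-30 12:10:10", "value 1", "value 2"),
    ("aaa", "2011-04-02 13:10:10", "value 1", "value 2"),
    ("bbb", "2012-02-12 10:48:10", "value 1", "value 2"),
    ("bbb", "2013-04-15 07:15:23", "value 1", "value 2"),
]
UPDATED = [
    ("google.com", "2013-05-28 15:35:01", "john", "mary"),
    ("google.com", "2013-04-30 12:10:10", "patrick", "edward"),
    ("yahoo.fr", "2011-04-02 13:10:10", "britney", "garry"),
    ("ebay.com", "2012-02-12 10:48:10", "harry", "mickael"),
    ("ebay.com", "2013-04-15 07:15:23", "bill", "alice"),
]

GROUP_BY_QUERY = """
SELECT domain, MIN(creationdate) AS creationdate, value1, value2
FROM mytable
GROUP BY domain, value1, value2
ORDER BY domain
"""


def run_group_by(rows):
    """Load rows into a fresh in-memory mytable and run the GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (domain TEXT, creationdate TEXT, value1 TEXT, value2 TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(GROUP_BY_QUERY).fetchall()
    conn.close()
    return result


original_result = run_group_by(ORIGINAL)  # duplicates share value1/value2: 3 groups
updated_result = run_group_by(UPDATED)    # every combination is distinct: 5 groups
```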

Addendum: @Arka provided updated sample data where the value 1 and value 2 columns have different values (in the original they were the same). That changes the query to this:

```sql
-- Keep only the row(s) whose creationdate is the earliest for their domain
SELECT domain, creationdate, value1, value2
FROM mytable
WHERE (domain, creationdate) IN (
  SELECT domain, MIN(creationdate)
  FROM mytable
  GROUP BY domain)
```

The subquery gets a list of the earliest creationdate for each domain, and the outer query only selects rows where the domain and creationdate match the subquery values.
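This is a runnable check of the addendum query against the updated sample. It uses Python's built-in `sqlite3` as a stand-in for MySQL; row-value comparisons such as `(domain, creationdate) IN (...)` need SQLite 3.15 or later.

The check also runs an equivalent formulation that joins to a derived table of per-domain minimums, and both return the same rows. One caveat applies to both: if two rows of the same domain share the minimum creationdate, both rows are kept.

```python
import sqlite3

# Updated sample from the question; creationdate as ISO-8601 text sorts chronologically.
ROWS = [
    ("google.com", "2013-05-28 15:35:01", "john", "mary"),
    ("google.com", "2013-04-30 12:10:10", "patrick", "edward"),
    ("yahoo.fr", "2011-04-02 13:10:10", "britney", "garry"),
    ("ebay.com", "2012-02-12 10:48:10", "harry", "mickael"),
    ("ebay.com", "2013-04-15 07:15:23", "bill", "alice"),
]

# The answer's query, with ORDER BY added for a deterministic result.
IN_SUBQUERY = """
SELECT domain, creationdate, value1, value2
FROM mytable
WHERE (domain, creationdate) IN (
  SELECT domain, MIN(creationdate)
  FROM mytable
  GROUP BY domain)
ORDER BY domain
"""

# Same result, expressed as a join against the per-domain minimums.
JOIN_DERIVED = """
SELECT t.domain, t.creationdate, t.value1, t.value2
FROM mytable AS t
JOIN (SELECT domain, MIN(creationdate) AS oldest
      FROM mytable
      GROUP BY domain) AS m
  ON t.domain = m.domain AND t.creationdate = m.oldest
ORDER BY t.domain
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (domain TEXT, creationdate TEXT, value1 TEXT, value2 TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)
via_in = conn.execute(IN_SUBQUERY).fetchall()
via_join = conn.execute(JOIN_DERIVED).fetchall()
conn.close()
```

Both queries return google.com/patrick/edward, yahoo.fr/britney/garry and ebay.com/harry/mickael, which matches the desired result.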
