Django - remove duplicate objects where there is more than one field to compare

Question

I have a model that has four fields. How do I remove duplicate objects from my database?

Daniel Roseman's answer to this question seems appropriate, but I'm not sure how to extend it to the situation where there are four fields to compare per object.

Thanks,

W.

Solution

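from django.db import models

# MyModel and field_1 … field_n stand for the model and fields from the question.
unique_fields = ['field_1', …, 'field_n']

# One row per combination of unique_fields that occurs more than once,
# carrying the highest id (max_id) and the group size (count_id).
duplicates = (MyModel.objects.values(*unique_fields)
                             .order_by()
                             .annotate(max_id=models.Max('id'),
                                       count_id=models.Count('id'))
                             .filter(count_id__gt=1))

# Delete every row of each group except the one with the highest id.
for duplicate in duplicates:
    (MyModel.objects.filter(**{x: duplicate[x] for x in unique_fields})
                    .exclude(id=duplicate['max_id'])
                    .delete())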
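To make the grouping rule concrete without a database, here is a minimal plain-Python sketch of the same keep-the-highest-id logic. It works on a list of dicts in memory rather than on Django model instances; the function name dedupe_keep_max_id, the four field names and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical field names standing in for the model's four compared fields.
UNIQUE_FIELDS = ["field_1", "field_2", "field_3", "field_4"]


def dedupe_keep_max_id(rows, unique_fields=UNIQUE_FIELDS):
    """Return (kept_rows, deleted_ids), keeping the highest id per group."""
    # Like values(*unique_fields) + GROUP BY: bucket ids by their key tuple.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in unique_fields)
        groups[key].append(row["id"])

    deleted_ids = set()
    for ids in groups.values():
        if len(ids) > 1:  # like HAVING COUNT(id) > 1
            max_id = max(ids)  # like MAX(id) AS max_id
            # like .exclude(id=max_id).delete()
            deleted_ids.update(i for i in ids if i != max_id)

    kept = [row for row in rows if row["id"] not in deleted_ids]
    return kept, sorted(deleted_ids)


rows = [
    {"id": 1, "field_1": "a", "field_2": 1, "field_3": "x", "field_4": True},
    {"id": 2, "field_1": "a", "field_2": 1, "field_3": "x", "field_4": True},
    {"id": 3, "field_1": "a", "field_2": 2, "field_3": "x", "field_4": True},
    {"id": 4, "field_1": "a", "field_2": 1, "field_3": "x", "field_4": True},
]
kept, deleted = dedupe_keep_max_id(rows)
print([r["id"] for r in kept], deleted)  # [3, 4] [1, 2]
```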

You shouldn't need to do this often. Use a unique_together constraint on the database instead.
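unique_together makes Django's migrations create a composite unique constraint over the listed columns. As a sketch of what that constraint does at the database level, the following uses Python's built-in sqlite3 module and a hypothetical four-field app_mymodel table; it is not the exact DDL Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring a four-field model; UNIQUE (...) is the kind of
# composite constraint unique_together asks the database to enforce.
conn.execute(
    """
    CREATE TABLE app_mymodel (
        id INTEGER PRIMARY KEY,
        field_1 TEXT NOT NULL,
        field_2 INTEGER NOT NULL,
        field_3 TEXT NOT NULL,
        field_4 INTEGER NOT NULL,
        UNIQUE (field_1, field_2, field_3, field_4)
    )
    """
)
insert = ("INSERT INTO app_mymodel (field_1, field_2, field_3, field_4) "
          "VALUES (?, ?, ?, ?)")
conn.execute(insert, ("a", 1, "x", 1))
conn.execute(insert, ("a", 2, "x", 1))  # differs in field_2: allowed

try:
    conn.execute(insert, ("a", 1, "x", 1))  # same four values: rejected
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM app_mymodel").fetchone()[0]
print(rejected, row_count)  # True 2
```

The columns are declared NOT NULL on purpose: SQLite (and PostgreSQL by default) treats NULLs as distinct in a UNIQUE constraint, so rows with NULL in one of the fields would not be rejected as duplicates.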

Underlying SQL code

When annotating, the Django ORM adds a GROUP BY clause over all model fields selected in the query. That is why .values() is used: it limits the selected, and therefore grouped, fields to unique_fields. GROUP BY puts all records with identical values in those fields into one group. The HAVING clause generated by .filter() on the annotated QuerySet then keeps only the duplicated groups (more than one id per combination of unique_fields).

SELECT
    field_1,
    …
    field_n,
    MAX(id) as max_id,
    COUNT(id) as count_id
FROM
    app_mymodel
GROUP BY
    field_1,
    …
    field_n
HAVING
    COUNT(id) > 1
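The query can be tried outside Django with Python's sqlite3 module. This sketch assumes a hypothetical app_mymodel table with four compared fields and some made-up rows. It writes the condition as HAVING COUNT(id) > 1, which is portable across databases, and adds ORDER BY max_id only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for app_mymodel with four compared fields.
conn.execute("CREATE TABLE app_mymodel (id INTEGER PRIMARY KEY, field_1 TEXT, "
             "field_2 INTEGER, field_3 TEXT, field_4 INTEGER)")
conn.executemany(
    "INSERT INTO app_mymodel VALUES (?, ?, ?, ?, ?)",
    [(1, "a", 1, "x", 1), (2, "a", 1, "x", 1), (3, "a", 2, "x", 1),
     (4, "a", 1, "x", 1), (5, "b", 1, "y", 0), (6, "b", 1, "y", 0)],
)

duplicates = conn.execute(
    """
    SELECT field_1, field_2, field_3, field_4,
           MAX(id) AS max_id,
           COUNT(id) AS count_id
    FROM app_mymodel
    GROUP BY field_1, field_2, field_3, field_4
    HAVING COUNT(id) > 1
    ORDER BY max_id
    """
).fetchall()

for row in duplicates:
    print(row)
# ('a', 1, 'x', 1, 4, 3)
# ('b', 1, 'y', 0, 6, 2)
```

Row 3 does not appear because its combination of values occurs only once.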

The duplicated records are then deleted in the for loop, except for the record with the highest id (max_id) in each group.
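At the SQL level, the loop amounts to one DELETE per duplicated group that matches the group's values and spares max_id. The sqlite3 sketch below mirrors that shape; the table, the FIELDS tuple and the data are hypothetical, and it is not the exact SQL Django emits for .delete().

```python
import sqlite3

FIELDS = ("field_1", "field_2", "field_3", "field_4")  # hypothetical names

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_mymodel (id INTEGER PRIMARY KEY, field_1 TEXT, "
             "field_2 INTEGER, field_3 TEXT, field_4 INTEGER)")
conn.executemany(
    "INSERT INTO app_mymodel (id, field_1, field_2, field_3, field_4) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, "a", 1, "x", 1), (2, "a", 1, "x", 1), (3, "a", 2, "x", 1),
     (4, "a", 1, "x", 1), (5, "b", 1, "y", 0), (6, "b", 1, "y", 0)],
)

cols = ", ".join(FIELDS)
# fetchall() materializes the groups before any row is deleted.
duplicates = conn.execute(
    f"SELECT {cols}, MAX(id) FROM app_mymodel "
    f"GROUP BY {cols} HAVING COUNT(id) > 1"
).fetchall()

# Same shape as the ORM loop: match the group's values, spare max_id.
where = " AND ".join(f"{f} = ?" for f in FIELDS)
for *values, max_id in duplicates:
    conn.execute(f"DELETE FROM app_mymodel WHERE {where} AND id <> ?",
                 (*values, max_id))

remaining = [r[0] for r in conn.execute("SELECT id FROM app_mymodel ORDER BY id")]
print(remaining)  # [3, 4, 6]
```

The column names are interpolated from a fixed tuple, never from user input. Unlike the ORM, where filter(field=None) becomes IS NULL, a plain = ? never matches NULL, so this sketch assumes the compared fields are non-null.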

Empty .order_by()

Just to be sure, it's always wise to add an empty .order_by() call before aggregating a QuerySet.

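The fields used for ordering the QuerySet are also included in the GROUP BY clause. An empty .order_by() clears the default ordering declared in the model's Meta, so those columns are not included in the SQL query. Otherwise, a default ordering by a date field, for example, would split duplicates that differ only in their date into separate groups and ruin the results. (Since Django 3.1, Meta.ordering is no longer applied to GROUP BY queries, but an ordering set with an explicit .order_by() earlier in the chain still is.)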

You might not need to override it at the moment, but someone might add a default ordering later and break your precious delete-duplicates code without even knowing it. Yes, I'm sure you have 100% test coverage…

Just add empty .order_by() to be safe. ;-)
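A small sqlite3 sketch of the GROUP BY problem described above: two rows share all four compared fields but have different values in a hypothetical created column, which stands in for a default ordering column that leaks into GROUP BY. The table and data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: rows 1 and 2 are duplicates on the four compared
# fields but were created on different dates.
conn.execute("CREATE TABLE app_mymodel (id INTEGER PRIMARY KEY, field_1 TEXT, "
             "field_2 INTEGER, field_3 TEXT, field_4 INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO app_mymodel VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "a", 1, "x", 1, "2016-01-01"), (2, "a", 1, "x", 1, "2016-02-01")],
)


def duplicate_groups(group_by):
    """Count groups with more than one row for the given GROUP BY columns."""
    sql = (f"SELECT COUNT(id) FROM app_mymodel GROUP BY {group_by} "
           "HAVING COUNT(id) > 1")
    return len(conn.execute(sql).fetchall())


fields = "field_1, field_2, field_3, field_4"
with_fields = duplicate_groups(fields)                 # duplicate is found
with_created = duplicate_groups(fields + ", created")  # ordering column hides it
print(with_fields, with_created)  # 1 0
```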

Transaction

Of course you should consider doing it all in a single transaction.

https://docs.djangoproject.com/en/1.10/topics/db/transactions/#django.db.transaction.atomic
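In Django the loop would be wrapped in transaction.atomic(), as described at the link above. The same all-or-nothing behaviour can be sketched with the standard sqlite3 module, whose connection context manager commits on success and rolls back when an exception escapes. The table, the data and the fail_after_first flag (used only to simulate a crash halfway through) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
with conn:  # commit the setup so the rollback below cannot undo it
    conn.execute("CREATE TABLE app_mymodel (id INTEGER PRIMARY KEY, "
                 "field_1 TEXT, field_2 INTEGER, field_3 TEXT, field_4 INTEGER)")
    conn.executemany(
        "INSERT INTO app_mymodel VALUES (?, ?, ?, ?, ?)",
        [(1, "a", 1, "x", 1), (2, "a", 1, "x", 1),
         (3, "b", 2, "y", 0), (4, "b", 2, "y", 0)],
    )


def delete_duplicates(conn, fail_after_first=False):
    """Delete all but the highest id per group inside one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        groups = conn.execute(
            "SELECT field_1, field_2, field_3, field_4, MAX(id) FROM app_mymodel "
            "GROUP BY field_1, field_2, field_3, field_4 HAVING COUNT(id) > 1"
        ).fetchall()
        for n, (f1, f2, f3, f4, max_id) in enumerate(groups):
            if fail_after_first and n == 1:
                raise RuntimeError("simulated failure halfway through")
            conn.execute(
                "DELETE FROM app_mymodel WHERE field_1 = ? AND field_2 = ? "
                "AND field_3 = ? AND field_4 = ? AND id <> ?",
                (f1, f2, f3, f4, max_id),
            )


def count_rows():
    return conn.execute("SELECT COUNT(*) FROM app_mymodel").fetchone()[0]


try:
    delete_duplicates(conn, fail_after_first=True)
except RuntimeError:
    pass
after_failure = count_rows()  # first group's DELETE was rolled back

delete_duplicates(conn)
after_success = count_rows()
print(after_failure, after_success)  # 4 2
```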
