Django - remove duplicate objects where there is more than one field to compare
I have a model that has four fields. How do I remove duplicate objects from my database?
Daniel Roseman's answer to this question seems appropriate, but I'm not sure how to extend it to a situation where there are four fields to compare per object.
Thanks,
W.
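from django.db import models

# Placeholder: list the fields that together define a duplicate.
unique_fields = ['field_1', …, 'field_n']

# One row per combination of values that occurs more than once.
duplicates = (MyModel.objects.values(*unique_fields)
                             .order_by()
                             .annotate(max_id=models.Max('id'),
                                       count_id=models.Count('id'))
                             .filter(count_id__gt=1))

# In each group, delete every row except the one with the highest id.
for duplicate in duplicates:
    (MyModel.objects.filter(**{x: duplicate[x] for x in unique_fields})
                    .exclude(id=duplicate['max_id'])
                    .delete())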
You shouldn't need to do this often. Add a unique_together constraint to the model instead, so that the database rejects duplicates in the first place.
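To illustrate what such a constraint does at the database level, here is a minimal sketch using the standard-library sqlite3 module rather than Django itself. The table and its two columns are hypothetical stand-ins for the model; the multi-column UNIQUE clause plays the role that unique_together asks the database to enforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: UNIQUE over both columns is the database-side
# counterpart of a unique_together declaration (an assumption made
# for illustration, not the schema Django would generate verbatim).
conn.execute(
    "CREATE TABLE app_mymodel ("
    " id INTEGER PRIMARY KEY,"
    " field_1 TEXT,"
    " field_2 TEXT,"
    " UNIQUE (field_1, field_2))"
)
conn.execute("INSERT INTO app_mymodel (field_1, field_2) VALUES ('a', 'b')")

# A second row with the same combination of values is refused.
try:
    conn.execute("INSERT INTO app_mymodel (field_1, field_2) VALUES ('a', 'b')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A row that differs in one of the columns is still accepted.
conn.execute("INSERT INTO app_mymodel (field_1, field_2) VALUES ('a', 'c')")
row_count = conn.execute("SELECT COUNT(*) FROM app_mymodel").fetchone()[0]
```

With the constraint in place the duplicate never reaches the table, so there is nothing to clean up afterwards.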
Underlying SQL code
When annotating, the Django ORM puts every model field used in the query into the GROUP BY clause; that is why .values() is used here. GROUP BY groups all records that have identical values in those fields. The duplicated groups (more than one id for the same combination of unique_fields) are then selected by the HAVING clause that .filter() generates on the annotated QuerySet.
SELECT
    field_1,
    …
    field_n,
    MAX(id) AS max_id,
    COUNT(id) AS count_id
FROM
    app_mymodel
GROUP BY
    field_1,
    …
    field_n
HAVING
    COUNT(id) > 1
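To see what this query returns, the sketch below runs an equivalent statement against an in-memory SQLite table through the standard-library sqlite3 module. The two columns and the sample rows are made up for illustration; only the shape of the query follows the SQL above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_mymodel (id INTEGER PRIMARY KEY, field_1 TEXT, field_2 TEXT)"
)
# Hypothetical sample data: ('a', 'b') appears three times, ('x', 'y') twice.
conn.executemany(
    "INSERT INTO app_mymodel (id, field_1, field_2) VALUES (?, ?, ?)",
    [
        (1, "a", "b"),
        (2, "a", "b"),
        (3, "a", "c"),
        (4, "a", "b"),
        (5, "x", "y"),
        (6, "x", "y"),
    ],
)

# One result row per combination of values that occurs more than once.
duplicates = conn.execute(
    """
    SELECT field_1, field_2, MAX(id) AS max_id, COUNT(id) AS count_id
    FROM app_mymodel
    GROUP BY field_1, field_2
    HAVING COUNT(id) > 1
    ORDER BY max_id
    """
).fetchall()
```

Here duplicates holds two rows, one for ('a', 'b') with max_id 4 and count_id 3, and one for ('x', 'y') with max_id 6 and count_id 2. The unique row ('a', 'c') does not appear.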
The duplicated records are then deleted in the for loop, except for the record with the highest id (max_id) in each group, which is kept.
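The effect of that loop can be sketched outside Django with the standard-library sqlite3 module. The table, columns and rows below are hypothetical; the loop mirrors the filter/exclude/delete chain by deleting every row of a group whose id differs from the group's highest id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_mymodel (id INTEGER PRIMARY KEY, field_1 TEXT, field_2 TEXT)"
)
# Hypothetical sample data: ids 1, 2 and 4 share the same values.
conn.executemany(
    "INSERT INTO app_mymodel (id, field_1, field_2) VALUES (?, ?, ?)",
    [(1, "a", "b"), (2, "a", "b"), (3, "a", "c"), (4, "a", "b"), (5, "x", "y")],
)

# Fetch the duplicate groups first, so the table is not modified while
# the grouping query is still being read.
groups = conn.execute(
    "SELECT field_1, field_2, MAX(id) FROM app_mymodel "
    "GROUP BY field_1, field_2 HAVING COUNT(id) > 1"
).fetchall()

deleted = 0
for field_1, field_2, max_id in groups:
    # Same values as the group, but not the row being kept.
    cursor = conn.execute(
        "DELETE FROM app_mymodel WHERE field_1 = ? AND field_2 = ? AND id != ?",
        (field_1, field_2, max_id),
    )
    deleted += cursor.rowcount

surviving_ids = [
    row[0] for row in conn.execute("SELECT id FROM app_mymodel ORDER BY id")
]
```

Rows 1 and 2 are removed and row 4 survives as the representative of its group; rows 3 and 5 were never duplicates and are left alone.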
Empty .order_by()
Just to be sure, it's always wise to add an empty .order_by() call before aggregating a QuerySet.
The fields used for ordering the QuerySet are also included in the GROUP BY clause. An empty .order_by() overrides the ordering declared in the model's Meta, so those columns are left out of the SQL query (a default ordering by date, for example, can ruin the results).
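Why an extra column in GROUP BY ruins the result can be shown with a small sqlite3 sketch. The created column stands in for a hypothetical default ordering field; it is not part of the original model description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_mymodel ("
    " id INTEGER PRIMARY KEY, field_1 TEXT, field_2 TEXT, created TEXT)"
)
# Two rows that are duplicates on (field_1, field_2) but were created
# on different days (hypothetical data).
conn.executemany(
    "INSERT INTO app_mymodel (id, field_1, field_2, created) VALUES (?, ?, ?, ?)",
    [(1, "a", "b", "2017-01-01"), (2, "a", "b", "2017-01-02")],
)


def count_duplicate_groups(group_by):
    """Return how many groups contain more than one row.

    group_by is a trusted, hard-coded column list, so interpolating it
    into the statement is acceptable in this sketch.
    """
    sql = (
        "SELECT COUNT(id) FROM app_mymodel "
        f"GROUP BY {group_by} HAVING COUNT(id) > 1"
    )
    return len(conn.execute(sql).fetchall())


# Grouping on the intended fields finds the duplicate pair.
without_ordering = count_duplicate_groups("field_1, field_2")
# With the ordering column added, every row forms its own group.
with_ordering = count_duplicate_groups("field_1, field_2, created")
```

Once created takes part in the grouping, no group has more than one row, so the duplicates go undetected and nothing would be deleted.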
You might not need to override it right now, but someone might add a default ordering later and break your precious delete-duplicates code without even knowing it. Yes, I'm sure you have 100% test coverage…
Just add an empty .order_by() to be safe. ;-)
Transaction
Of course, you should consider doing it all in a single transaction, for example inside transaction.atomic():
https://docs.djangoproject.com/en/1.10/topics/db/transactions/#django.db.transaction.atomic
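The point of the transaction is that a failure halfway through the loop leaves the table exactly as it was. The sketch below demonstrates that all-or-nothing behaviour with the standard-library sqlite3 connection used as a context manager, which is a stand-in for Django's transaction.atomic and not the same API. The table, the sample rows and the fail_midway switch are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_mymodel (id INTEGER PRIMARY KEY, field_1 TEXT, field_2 TEXT)"
)
conn.executemany(
    "INSERT INTO app_mymodel (id, field_1, field_2) VALUES (?, ?, ?)",
    [(1, "a", "b"), (2, "a", "b"), (3, "x", "y"), (4, "x", "y")],
)
conn.commit()  # make the sample data permanent before experimenting


def delete_duplicates(connection, fail_midway=False):
    """Delete all but the highest id per group, as one transaction."""
    groups = connection.execute(
        "SELECT field_1, field_2, MAX(id) FROM app_mymodel "
        "GROUP BY field_1, field_2 HAVING COUNT(id) > 1 ORDER BY MAX(id)"
    ).fetchall()
    # The connection commits on success and rolls back if an
    # exception escapes the block.
    with connection:
        for index, (field_1, field_2, max_id) in enumerate(groups):
            if fail_midway and index == 1:
                raise RuntimeError("simulated failure after the first group")
            connection.execute(
                "DELETE FROM app_mymodel "
                "WHERE field_1 = ? AND field_2 = ? AND id != ?",
                (field_1, field_2, max_id),
            )


def row_count(connection):
    return connection.execute("SELECT COUNT(*) FROM app_mymodel").fetchone()[0]


try:
    delete_duplicates(conn, fail_midway=True)
except RuntimeError:
    pass
count_after_failure = row_count(conn)  # first group's delete was rolled back

delete_duplicates(conn)
count_after_success = row_count(conn)
```

After the simulated failure all four rows are still present, even though the first group had already been processed; the second, uninterrupted run leaves one row per group.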