Remove duplicate records based on multiple columns?
Question
I'm using Heroku to host my Ruby on Rails application and for one reason or another, I may have some duplicate rows.
Is there a way to delete duplicate records based on 2 or more criteria but keep just 1 record of that duplicate collection?
In my use case, I have a Make and Model relationship for cars in my database.
Make      Model
----      -----
Name      Name
          Year
          Trim
          MakeId
I'd like to delete all Model records that have the same Name, Year and Trim, keeping exactly one record from each set of duplicates. I'm using the Heroku console, so I can easily run Active Record queries.
Any suggestions?
Recommended answer
# Add this class method to the existing Model class
class Model
  def self.dedupe
    # Load all models and group them by the columns that should be unique together
    grouped = all.group_by { |model| [model.name, model.year, model.trim, model.make_id] }
    grouped.values.each do |duplicates|
      # Keep the first record of each group (use pop to keep the last one instead)
      duplicates.shift
      # Whatever is left in the group is a duplicate, so destroy it
      duplicates.each(&:destroy)
    end
  end
end

Model.dedupe
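- Find all records
- Group them on the keys that need to be unique
- Loop over the values of the grouped hash
- Remove the first value, because you want to keep one copy
- Delete the rest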
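To try the grouping logic before destroying anything in the console, the steps above can be sketched in plain Ruby on in-memory data. This is only an illustration, not part of the original answer: CarModel is a hypothetical Struct standing in for the Active Record model, split_duplicates is a made-up helper name, and the sample rows are invented.

```ruby
# Hypothetical stand-in for the Active Record model (not from the original answer).
CarModel = Struct.new(:id, :name, :year, :trim, :make_id)

# Returns [kept, duplicates]: the first record of each group is kept,
# and every later record sharing the same key is reported as a duplicate.
def split_duplicates(records)
  kept = []
  duplicates = []
  records.group_by { |r| [r.name, r.year, r.trim, r.make_id] }.each_value do |group|
    first, *rest = group
    kept << first
    duplicates.concat(rest)
  end
  [kept, duplicates]
end

# Invented sample data for illustration only.
records = [
  CarModel.new(1, "Civic", 2012, "LX", 10),
  CarModel.new(2, "Civic", 2012, "LX", 10), # same key as id 1
  CarModel.new(3, "Civic", 2012, "EX", 10), # different trim, so not a duplicate
  CarModel.new(4, "Civic", 2012, "LX", 10)  # same key as id 1
]

kept, duplicates = split_duplicates(records)
puts "keep:   #{kept.map(&:id).inspect}"       # keep:   [1, 3]
puts "delete: #{duplicates.map(&:id).inspect}" # delete: [2, 4]
```

Because group_by preserves the order in which records are seen, "first" here means the first record loaded. If it matters which copy survives, sorting the records (for example by id) before grouping would make that choice explicit.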