What is the best way to remove duplicates from a datatable?
Question
I have checked the whole site and searched online, but was unable to find a simple solution to this problem.
I have a datatable which has about 20 columns and 10K rows. I need to remove the duplicate rows in this datatable based on 4 key columns. Doesn't .NET have a function which does this? The function closest to what I am looking for was datatable.DefaultView.ToTable(true, array of columns to display), but this function does a distinct on all the columns it returns, so I cannot de-duplicate on the 4 key columns and keep the remaining columns at the same time.
It would be great if someone could help me with this.
EDIT: I am sorry for not being clear on this. This datatable is being created by reading a CSV file and not from a DB. So using an SQL query is not an option.
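For reference, DataView.ToTable(bool distinct, params string[] columnNames) returns a new table that contains only the listed columns and collapses rows whose values in those columns repeat. The minimal sketch below uses a hypothetical three-column table (Id, Name, City) to show why this does not fit the question: de-duplicating on the key columns also discards every non-key column. DistinctKeysDemo and its methods are illustrative names only.

```csharp
using System.Data;

public static class DistinctKeysDemo
{
    // Builds a small hypothetical table in which two rows share the
    // same (Id, City) key but differ in Name.
    public static DataTable BuildSample()
    {
        var table = new DataTable("People");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("City", typeof(string));
        table.Rows.Add(1, "Ann", "Oslo");
        table.Rows.Add(1, "Anna", "Oslo");
        table.Rows.Add(2, "Bob", "Rome");
        return table;
    }

    // ToTable(true, ...) keeps only the listed columns and removes rows
    // whose values in those columns are identical; other columns are dropped.
    public static DataTable DistinctOnKeys(DataTable source, params string[] keyColumns)
    {
        return source.DefaultView.ToTable(true, keyColumns);
    }
}
```

Calling DistinctOnKeys(BuildSample(), "Id", "City") yields two rows with only the Id and City columns; the Name column is gone.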
Answer
You can use LINQ to DataSet. Check this. Something like this:
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using System.Linq;

// Fill the DataSet. FillDataSet is a helper (not shown here) that loads the Contact table into ds.
DataSet ds = new DataSet();
ds.Locale = CultureInfo.InvariantCulture;
FillDataSet(ds);

List<DataRow> rows = new List<DataRow>();
DataTable contact = ds.Tables["Contact"];

// Get 100 rows from the Contact table.
IEnumerable<DataRow> query = (from c in contact.AsEnumerable()
                              select c).Take(100);
DataTable contactsTableWith100Rows = query.CopyToDataTable();

// Add 100 rows to the list.
foreach (DataRow row in contactsTableWith100Rows.Rows)
    rows.Add(row);

// Create duplicate rows by adding the same 100 rows to the list.
foreach (DataRow row in contactsTableWith100Rows.Rows)
    rows.Add(row);

DataTable table = rows.CopyToDataTable();

// Find the unique contacts in the table. DataRowComparer.Default compares every
// column value, so two rows count as duplicates only when all of their columns match.
IEnumerable<DataRow> uniqueContacts =
    table.AsEnumerable().Distinct(DataRowComparer.Default);

Console.WriteLine("Unique contacts:");
foreach (DataRow uniqueContact in uniqueContacts)
{
    Console.WriteLine(uniqueContact.Field<int>("ContactID"));
}
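The sample above uses DataRowComparer.Default, which compares every column, so it only removes rows that are identical across all 20 columns. For the question as asked (duplicates defined by 4 key columns), one option is a custom IEqualityComparer<DataRow> that looks only at the key columns, combined with a HashSet to keep the first occurrence of each key. This is a minimal sketch, assuming the key columns are identified by name and that the first row for each key should survive; KeyColumnsComparer and DataTableDedup.RemoveDuplicateRows are hypothetical names, not part of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Treats two DataRows as equal when the values in the given key columns match.
public sealed class KeyColumnsComparer : IEqualityComparer<DataRow>
{
    private readonly string[] _keyColumns;

    public KeyColumnsComparer(params string[] keyColumns)
    {
        _keyColumns = keyColumns ?? throw new ArgumentNullException(nameof(keyColumns));
    }

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        // object.Equals compares boxed values; strings compare ordinally (case-sensitive).
        return _keyColumns.All(col => object.Equals(x[col], y[col]));
    }

    public int GetHashCode(DataRow row)
    {
        unchecked
        {
            // Combine the hash codes of the key-column values only.
            int hash = 17;
            foreach (string col in _keyColumns)
                hash = hash * 31 + (row[col]?.GetHashCode() ?? 0);
            return hash;
        }
    }
}

public static class DataTableDedup
{
    // Removes, in place, every row whose key-column values repeat those of an
    // earlier row; the first occurrence of each key combination is kept.
    // Returns the number of rows removed.
    public static int RemoveDuplicateRows(DataTable table, params string[] keyColumns)
    {
        var seen = new HashSet<DataRow>(new KeyColumnsComparer(keyColumns));
        var duplicates = new List<DataRow>();

        foreach (DataRow row in table.Rows)
        {
            if (!seen.Add(row))
                duplicates.Add(row);
        }

        // Rows cannot be removed while enumerating table.Rows, so remove them afterwards.
        foreach (DataRow row in duplicates)
            table.Rows.Remove(row);

        return duplicates.Count;
    }
}
```

Usage would look like DataTableDedup.RemoveDuplicateRows(table, "Key1", "Key2", "Key3", "Key4"), where the four names are placeholders for your own key columns. Because values are compared with object.Equals, string keys read from a CSV file are matched case-sensitively and without trimming; normalize them first if that is not what you want.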