What is the best way to remove duplicates from a DataTable?

Question

I have checked the whole site and googled on the net but was unable to find a simple solution to this problem.

I have a DataTable with about 20 columns and 10K rows. I need to remove the duplicate rows in this DataTable based on 4 key columns. Doesn't .NET have a function that does this? The closest function I found was datatable.DefaultView.ToTable(true, array of columns to display), but this function does a distinct on all the columns.

It would be great if someone could help me with this.

I am sorry for not being clear on this. This DataTable is created by reading a CSV file, not from a database, so using an SQL query is not an option.

Recommended answer

You can use LINQ to DataSet. Something like this:

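using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using System.Linq;

// Fill the DataSet. FillDataSet is a helper from the original sample
// (not shown here) that is expected to load the "Contact" table.
DataSet ds = new DataSet();
ds.Locale = CultureInfo.InvariantCulture;
FillDataSet(ds);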

List<DataRow> rows = new List<DataRow>();

DataTable contact = ds.Tables["Contact"];

// Get 100 rows from the Contact table.
IEnumerable<DataRow> query = (from c in contact.AsEnumerable()
                              select c).Take(100);

DataTable contactsTableWith100Rows = query.CopyToDataTable();

// Add 100 rows to the list.
foreach (DataRow row in contactsTableWith100Rows.Rows)
    rows.Add(row);

// Create duplicate rows by adding the same 100 rows to the list.
foreach (DataRow row in contactsTableWith100Rows.Rows)
    rows.Add(row);

DataTable table =
    System.Data.DataTableExtensions.CopyToDataTable<DataRow>(rows);

// Find the unique contacts in the table.
// DataRowComparer.Default treats two rows as equal only when every column value matches.
IEnumerable<DataRow> uniqueContacts =
    table.AsEnumerable().Distinct(DataRowComparer.Default);

Console.WriteLine("Unique contacts:");
foreach (DataRow uniqueContact in uniqueContacts)
{
    Console.WriteLine(uniqueContact.Field<Int32>("ContactID"));
}
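
Note that DataRowComparer.Default compares all columns, while the question asks for duplicates defined by 4 key columns only. The sketch below is one possible way to handle that case, not part of the original answer: the names KeyColumnComparer, DataTableDedup and RemoveDuplicates are hypothetical, and it assumes that the first row seen for each key combination is the one to keep. Values are compared with object.Equals, so string keys read from a CSV file are matched case-sensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical comparer: two rows are equal when all key columns hold equal values.
public sealed class KeyColumnComparer : IEqualityComparer<DataRow>
{
    private readonly string[] _keyColumns;

    public KeyColumnComparer(params string[] keyColumns)
    {
        _keyColumns = keyColumns;
    }

    public bool Equals(DataRow x, DataRow y)
    {
        foreach (string key in _keyColumns)
        {
            // row[key] returns a boxed value (or DBNull.Value), so use object.Equals.
            if (!object.Equals(x[key], y[key]))
                return false;
        }
        return true;
    }

    public int GetHashCode(DataRow row)
    {
        int hash = 17;
        foreach (string key in _keyColumns)
            hash = unchecked(hash * 31 + row[key].GetHashCode());
        return hash;
    }
}

public static class DataTableDedup
{
    // Returns a new table with the same schema, keeping only the first row
    // encountered for each distinct combination of key column values.
    public static DataTable RemoveDuplicates(DataTable source, params string[] keyColumns)
    {
        DataTable result = source.Clone(); // copies the schema, not the rows
        var seen = new HashSet<DataRow>(new KeyColumnComparer(keyColumns));

        foreach (DataRow row in source.Rows)
        {
            if (seen.Add(row))
                result.ImportRow(row);
        }
        return result;
    }
}
```

A call would look like DataTableDedup.RemoveDuplicates(table, "Key1", "Key2", "Key3", "Key4"), where the column names are placeholders for the actual key columns. The source table is left unchanged, and an empty input simply yields an empty table.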