Compare two DataTables to determine rows in one but not the other


Problem description


I have two DataTables, A and B, produced from CSV files. I need to be able to check which rows exist in B that do not exist in A.

Is there a way to do some sort of query to show the different rows or would I have to iterate through each row on each DataTable to check if they are the same? The latter option seems to be very intensive if the tables become large.

Solution

"Would I have to iterate through each row on each DataTable to check if they are the same?"

Seeing as you've loaded the data from CSV files, the tables won't have any indexes, so at some point something is going to have to iterate through every row, whether it be your code or a library.

Anyway, this is an algorithms question, which is not my specialty, but my naive approach would be as follows:

1: Can you exploit any properties of the data? Are all the rows in each table unique, and can you sort them both by the same criteria? If so, you can do this:

  • Sort both tables by their ID (using something like a quicksort). If they're already sorted then you win big.
  • Step through both tables at once, skipping over any gaps in IDs in either table. Matched IDs mean the row exists in both tables; an ID found only in B is a row that B has and A lacks.

This lets you do it in (sort time * 2) + one pass, so if my big-O notation is correct, it'd be (whatever the sort time is, typically O(m log m + n log n) for a comparison sort) + O(m+n) for the pass, which is pretty good.
(Revision: this is the approach that ΤΖΩΤΖΙΟΥ describes.)
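
The sort-and-step approach above could be sketched roughly as follows. This is a language-neutral illustration in Python, not DataTable code: each row is modeled as a plain dict, and the function name `rows_only_in_b_sorted` and the `"id"` column are assumptions made for the example. It also assumes the key is unique within each table.

```python
from operator import itemgetter


def rows_only_in_b_sorted(table_a, table_b, key="id"):
    """Return rows of table_b whose key does not appear in table_a.

    Hypothetical helper: rows are dicts, and `key` is assumed to be
    unique within each table.
    """
    a_sorted = sorted(table_a, key=itemgetter(key))  # sort A by ID
    b_sorted = sorted(table_b, key=itemgetter(key))  # sort B by ID

    only_in_b = []
    i = 0  # cursor into a_sorted; it only ever moves forward
    for row_b in b_sorted:
        # Skip IDs in A that are smaller than the current B ID.
        while i < len(a_sorted) and a_sorted[i][key] < row_b[key]:
            i += 1
        # A is exhausted, or its current ID differs: no match for this B row.
        if i == len(a_sorted) or a_sorted[i][key] != row_b[key]:
            only_in_b.append(row_b)
    return only_in_b


table_a = [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}]
table_b = [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}, {"id": 4, "name": "d"}]
print(rows_only_in_b_sorted(table_a, table_b))
```

Because the cursor into A never moves backwards, the stepping phase touches each row of both tables at most once, which is where the O(m+n) term comes from.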

2: An alternative approach, which may be more or less efficient depending on how big your data is:

  • Run through table 1, and for each row, stick its ID (or computed hashcode, or some other unique ID for that row) into a dictionary (or hashtable if you prefer to call it that).
  • Run through table 2, and for each row, see if the ID (or hashcode etc.) is present in the dictionary; any row whose ID is missing exists only in table 2. You're exploiting the fact that dictionaries have really fast lookup, O(1) on average. This step will be really fast, but you'll have paid the price of doing all those dictionary inserts.
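
A minimal sketch of the dictionary approach, again in Python with rows modeled as dicts rather than real DataTable rows. The names `row_key` and `rows_only_in_b_hashed` are hypothetical, and a set stands in for the dictionary since only membership is needed. When there is no single ID column, the tuple of the compared column values serves as the row's identity.

```python
def row_key(row, columns):
    """Build a hashable identity for a row from the chosen columns."""
    return tuple(row[c] for c in columns)


def rows_only_in_b_hashed(table_a, table_b, columns):
    """Return rows of table_b whose key is absent from table_a.

    Hypothetical helper: one pass over A to fill the set, one pass
    over B to probe it.
    """
    seen_in_a = {row_key(row, columns) for row in table_a}
    return [row for row in table_b if row_key(row, columns) not in seen_in_a]


table_a = [{"id": 1, "name": "a"}, {"id": 3, "name": "c"}]
table_b = [
    {"id": 3, "name": "c"},
    {"id": 3, "name": "x"},
    {"id": 4, "name": "d"},
]

# Compare on the ID column only.
print(rows_only_in_b_hashed(table_a, table_b, ["id"]))
# Compare on every column, so a changed name counts as a different row.
print(rows_only_in_b_hashed(table_a, table_b, ["id", "name"]))
```

Unlike the sorted approach, this needs no ordering on the keys, only that they are hashable, at the cost of holding every key from table 1 in memory.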

I'd be really interested to see what people with better knowledge of algorithms than myself come up with for this one :-)
