Compare two DataTables to determine rows in one but not the other
Problem description
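I have two DataTables, A and B, produced from CSV files. I need to be able to check which rows exist in B that do not exist in A.
Is there a way to do some sort of query to show the different rows, or would I have to iterate through each row on each DataTable to check if they are the same? The latter option seems to be very intensive if the tables become large.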
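"would I have to iterate through each row on each DataTable to check if they are the same"
Seeing as you've loaded the data from CSV files, you're not going to have any indexes or anything, so at some point something is going to have to iterate through every row, whether it be your code, or a library, or whatever.
Anyway, this is an algorithms question, which is not my specialty, but my naive approach would be as follows:
1: Can you exploit any properties of the data? Are all the rows in each table unique, and can you sort them both by the same criteria? If so, you can do this:
- Sort both tables by their ID (using something useful like a quicksort). If they're already sorted, then you win.
- Step through both tables at once, skipping over any gaps in IDs in either table. Matched IDs mean duplicated records; IDs that appear only in B are the rows you are looking for.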
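The two steps above can be sketched roughly as follows. This is only an illustration in Python rather than .NET: each table is modeled as a list of dicts standing in for DataTable rows, and the function name `rows_only_in_b`, the `key` parameter, and the `id` column are assumptions made for the example, not anything from the question. It also assumes the IDs are unique within each table and can be compared with `<`.

```python
def rows_only_in_b(table_a, table_b, key="id"):
    """Return the rows of table_b whose key has no match in table_a."""
    # Step 1: sort both tables by the same key.
    a_sorted = sorted(table_a, key=lambda row: row[key])
    b_sorted = sorted(table_b, key=lambda row: row[key])

    only_in_b = []
    i = 0  # current position in a_sorted
    # Step 2: walk both sorted tables in a single pass.
    for row_b in b_sorted:
        # Skip over IDs in A that are smaller than the current B ID.
        while i < len(a_sorted) and a_sorted[i][key] < row_b[key]:
            i += 1
        # A matching ID means the row exists in both tables.
        if i < len(a_sorted) and a_sorted[i][key] == row_b[key]:
            continue
        only_in_b.append(row_b)
    return only_in_b


# Hypothetical sample data standing in for the two CSV-backed tables.
table_a = [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}]
table_b = [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}, {"id": 4, "name": "d"}]
print(rows_only_in_b(table_a, table_b))
# prints [{'id': 2, 'name': 'b'}, {'id': 4, 'name': 'd'}]
```

The index `i` only ever moves forward, so after sorting the comparison itself touches each row of either table at most once.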
This allows you to do it in (sort time * 2) + one pass, so if my big-O notation is correct, it'd be (whatever-sort-time) + O(m+n), where m and n are the row counts of the two tables, which is pretty good.
2: An alternative approach, which may be more or less efficient depending on how big your data is:
(Revision: this is the approach that ΤΖΩΤΖΙΟΥ describes)
I'd be really interested to see what people with better knowledge of algorithms than myself come up with for this one :-)