Compare two DataTables to determine rows in one but not the other


Question

I have two DataTables, A and B, produced from CSV files. I need to be able to check which rows exist in B that do not exist in A.

Is there a way to do some sort of query to show the different rows or would I have to iterate through each row on each DataTable to check if they are the same? The latter option seems to be very intensive if the tables become large.

Answer

"Would I have to iterate through each row on each DataTable to check if they are the same?"

Seeing as you've loaded the data from a CSV file, you're not going to have any indexes or anything, so at some point, something is going to have to iterate through every row, whether it be your code, or a library, or whatever.

Anyway, this is an algorithms question, which is not my specialty, but my naive approach would be as follows:

1: Can you exploit any properties of the data? Are all the rows in each table unique, and can you sort them both by the same criteria? If so, you can do this:

  • Sort both tables by their ID (using something like a quicksort). If they're already sorted then you win big.
  • Step through both tables at once, skipping over any gaps in IDs in either table. Matched IDs mean the record is in both tables; an ID that shows up only in B is a row that doesn't exist in A.

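This lets you do it in (sort time * 2) + one pass, so if my big-O notation is correct, it'd be (whatever the sort time is) + O(m+n), which is pretty good.
(Revision: this is the approach that ΤΖΩΤΖΙΟΥ describes: http://stackoverflow.com/questions/164144/c-how-to-compare-two-datatables-a-b-how-to-show-rows-which-are-in-b-but-not-in-a#164213)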
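
To make approach 1 concrete, here is a minimal sketch in Python rather than against the .NET DataTable API. It assumes each row is a plain dict and that a column named "id" uniquely identifies a row; the function name and the sample data are made up for illustration.

```python
def rows_only_in_b_sorted(table_a, table_b, key="id"):
    """Return the rows of table_b whose key never occurs in table_a (sort + merge)."""
    # Sort A's keys and B's rows by the same criterion.
    a_keys = sorted(row[key] for row in table_a)
    b_sorted = sorted(table_b, key=lambda row: row[key])

    missing = []
    i = 0  # cursor into a_keys; it only ever moves forward
    for row in b_sorted:
        # Skip every key in A that is smaller than the current B key.
        while i < len(a_keys) and a_keys[i] < row[key]:
            i += 1
        # A is exhausted, or its current key is larger: this B row is not in A.
        if i == len(a_keys) or a_keys[i] != row[key]:
            missing.append(row)
    return missing


# Hypothetical sample data standing in for the two tables.
table_a = [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}]
table_b = [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}, {"id": 4, "name": "d"}]
only_in_b = rows_only_in_b_sorted(table_a, table_b)
```

After the two sorts, the merge loop touches each key of A and each row of B at most once, which is where the O(m+n) pass comes from.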

2: An alternative approach, which may be more or less efficient depending on how big your data is:

  • Run through table 1, and for each row, stick its ID (or computed hashcode, or some other unique ID for that row) into a dictionary (or hashtable if you prefer to call it that).
  • Run through table 2, and for each row, see if the ID (or hashcode etc.) is present in the dictionary. You're exploiting the fact that dictionaries have really fast lookup (O(1) on average, I think). This step will be really fast, but you'll have paid the price of doing all those dictionary inserts.
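
Approach 2 might look something like the following sketch, again in Python and not the DataTable API. It assumes rows are dicts as produced by `csv.DictReader`, and that the columns listed in `key_columns` together identify a row; the helper names and the CSV text are hypothetical.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per row (all values are strings)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def rows_only_in_b_hashed(table_a, table_b, key_columns):
    """Return the rows of table_b whose key tuple is absent from table_a."""
    # Pass 1: put every key from A into a hash set.
    seen = {tuple(row[col] for col in key_columns) for row in table_a}
    # Pass 2: keep the B rows whose key is not in the set.
    return [
        row for row in table_b
        if tuple(row[col] for col in key_columns) not in seen
    ]


# Hypothetical CSV contents standing in for the two files.
a = load_rows("id,name\n1,alice\n2,bob\n")
b = load_rows("id,name\n2,bob\n3,carol\n")
only_in_b = rows_only_in_b_hashed(a, b, key_columns=["id", "name"])
```

Using every column as the key compares whole rows, which is handy when the CSV has no ID column; no sorting is needed, at the cost of holding A's keys in memory.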

I'd be really interested to see what people with better knowledge of algorithms than myself come up with for this one :-)
