Compare two DataTables to determine rows in one but not the other


Question

I have two DataTables, A and B, produced from CSV files. I need to be able to check which rows exist in B that do not exist in A.

Is there a way to do some sort of query to show the different rows or would I have to iterate through each row on each DataTable to check if they are the same? The latter option seems to be very intensive if the tables become large.
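For reference, here is a minimal Python sketch of the brute-force approach the question is worried about. It assumes each CSV has already been read into a list of tuples (one tuple of field strings per row) rather than working against the .NET `DataTable` API, and the function name `rows_only_in_b_naive` is hypothetical. For every row of B it scans all of A, so it does up to |A| × |B| row comparisons, which is why it gets expensive as the tables grow.

```python
from typing import List, Tuple

# Assumption: a row is modeled as a tuple of CSV field values.
Row = Tuple[str, ...]


def rows_only_in_b_naive(a: List[Row], b: List[Row]) -> List[Row]:
    """Return the rows of b that have no equal row in a, in b's original order."""
    # For each row of B, scan all of A: O(len(a) * len(b)) comparisons overall.
    return [row_b for row_b in b if not any(row_a == row_b for row_a in a)]
```

For example, with `a = [("1", "alice"), ("2", "bob")]` and `b = [("2", "bob"), ("3", "carol")]`, the call returns `[("3", "carol")]`.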

Answer

"Would I have to iterate through each row on each DataTable to check if they are the same?"

Seeing as you've loaded the data from a CSV file, you're not going to have any indexes or anything, so at some point, something is going to have to iterate through every row, whether it be your code, or a library, or whatever.

Anyway, this is an algorithms question, which is not my specialty, but my naive approach would be as follows:

1: Can you exploit any properties of the data? Are all the rows in each table unique, and can you sort them both by the same criteria? If so, you can do this:

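  • Sort both tables by ID (using something efficient, such as quicksort). If they're already sorted, then you win big.
  • Step through both tables at once, skipping over any gaps in the IDs in either table. Matched IDs mean duplicated records; IDs in table B with no match in table A are the rows you're after.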

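This lets you do it in two sorts plus one pass, so if my big-O notation is correct, it'd be (whatever the sort time is) + O(m+n), where m and n are the row counts of the two tables, which is pretty good. With an O(n log n) sort such as quicksort (average case), that works out to O(m log m + n log n) overall.
(Revision: this is the approach that ΤΖΩΤΖΙΟΥ describes.)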
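The sort-and-step approach above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a .NET `DataTable` implementation: each table is a list of tuples, and column `key` (a hypothetical parameter, default 0) holds an ID that is unique within each table. The function name `rows_only_in_b_sorted` is also hypothetical.

```python
from typing import List, Tuple

# Assumption: a row is a tuple of CSV field values; row[key] is its unique ID.
Row = Tuple[str, ...]


def rows_only_in_b_sorted(a: List[Row], b: List[Row], key: int = 0) -> List[Row]:
    """Return rows of b whose ID (column `key`) does not appear in a, in ID order."""
    # Sort both tables by ID: O(m log m + n log n).
    a_sorted = sorted(a, key=lambda r: r[key])
    b_sorted = sorted(b, key=lambda r: r[key])

    only_in_b: List[Row] = []
    i = j = 0
    # One merge-style pass over both sorted tables: O(m + n).
    while j < len(b_sorted):
        id_b = b_sorted[j][key]
        if i < len(a_sorted) and a_sorted[i][key] < id_b:
            i += 1  # ID exists only in A: skip the gap
        elif i < len(a_sorted) and a_sorted[i][key] == id_b:
            i += 1  # matched ID: the record is in both tables
            j += 1
        else:
            # A is exhausted, or A's current ID is larger: this ID is only in B.
            only_in_b.append(b_sorted[j])
            j += 1
    return only_in_b
```

Because both lists are sorted with the same key, the merge only needs a consistent ordering, so string IDs such as `"10"` and `"2"` work even though they compare lexicographically. To compare whole rows instead of IDs, sort and compare on the full tuple rather than `r[key]`.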

2: An alternative approach, which may be more or less efficient depending on how big your data is:

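  • Iterate through table 1, and for each row, stick its ID (or a computed hash code, or some other unique ID for the row) into a dictionary (or hash table, if you prefer to call it that).
  • Iterate through table 2, and for each row, check whether its ID (or hash code, etc.) is present in the dictionary. Rows whose ID isn't found are the ones that exist in table 2 but not in table 1. You're exploiting the fact that dictionary lookups are very fast (O(1) on average). This step will be really fast, but you've already paid the price of all those dictionary inserts.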

I'd be really interested to see what people with better knowledge of algorithms than myself come up with for this one :-)
