Match common IDs between two huge CSV files

Views: 130 | Posted: 2016/7/28 15:09:23 | Tags: bash csv replace awk


Question

I need to compare two huge CSV files with thousands of entries, like the one below:

id;val
1;a
2;b
3;c

The second file has the following structure:

id1;entry
1;002
2;x90
5;d07

The desired result is to match and combine the lines that have the same value for id/id1, creating a third CSV file that contains only the matched entries, as shown below:

idR;valR;entryR
1;a;002
2;b;x90

To accomplish this, I can load each file into a separate database table and run a select like this one to retrieve all matching values:

select tb1.id, tb1.val, tb2.entry
  from tb1, tb2
 where tb1.id = tb2.id1;

With this approach I can retrieve all the desired values in one go.

But suppose these files can be sorted. Is it then possible to use awk to print the entries that have the same value for id and id1? Is the best I can do to create an associative array for each file and perform a binary search using awk and sed/cut?
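For the awk part of the question, sorting and binary search are not strictly needed. A common awk idiom is a hash join: load the first file into an associative array keyed by id, then stream the second file and print only the lines whose id1 is already in the array. The sketch below assumes both files are `;`-separated with exactly one header line; the function name `match_ids` and the file names are hypothetical. Memory use grows with the size of the first file only.

```shell
# Hash join in awk: match lines of two ';'-separated files on their first column.
# Assumes each file has exactly one header line (id;val and id1;entry).
match_ids() {
  awk -F';' -v OFS=';' '
    BEGIN { print "idR", "valR", "entryR" }  # header of the combined output
    { sub(/[ \t\r]+$/, "") }                  # strip trailing blanks/CR; awk re-splits the fields
    FNR == 1 { next }                         # skip the header line of each input file
    NR == FNR { val[$1] = $2; next }          # first file: remember val by id
    ($1 in val) { print $1, val[$1], $2 }     # second file: print only ids seen in the first file
  ' "$1" "$2"
}

# Demo with the sample data from the question (temporary, hypothetical file names).
tmp=$(mktemp -d)
printf 'id;val\n1;a\n2;b \n3;c\n' > "$tmp/file1.csv"
printf 'id1;entry\n1;002\n2;x90 \n5;d07\n' > "$tmp/file2.csv"
match_ids "$tmp/file1.csv" "$tmp/file2.csv"
# prints:
# idR;valR;entryR
# 1;a;002
# 2;b;x90
rm -rf "$tmp"
```

Output lines follow the order of the second file, and if an id repeats in the first file only its last val is kept.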

Is it possible to load these two files and combine them in one pass to produce a final CSV file with the results?

Or can I do this in Perl using only the standard library?

Recommended answer

This can be done with the standard join utility.

file1.txt

1 a
2 b
3 c

file2.txt

1 002
2 x90
5 d07

join example:

join -1 1 -2 1 -o 1.1,1.2,2.2 file1.txt file2.txt

Here join matches field 1 of file1 against field 1 of file2 (-1 1 -2 1) and outputs the fields specified with the -o flag (1.1, 1.2, 2.2: id and val from file1, entry from file2). join expects both inputs to be sorted on the join field.

Output:

1 a 002
2 b x90
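The answer's example uses space-separated copies of the data. A sketch of the same join applied to the original `;`-separated files, assuming one header line per file: `-t ';'` sets the field separator, the headers are stripped with `tail`, and both files are sorted first because join only matches correctly on input sorted on the join field. `LC_ALL=C` keeps the sort order and join's comparisons consistent. The function name `match_ids_join` and the temporary files are illustrative, not part of the original answer.

```shell
# Sketch: the answer's join command applied to the original ';'-separated files.
match_ids_join() {
  s1=$(mktemp) s2=$(mktemp)
  # Drop each header line and sort on field 1: join needs input sorted on the join field.
  tail -n +2 "$1" | LC_ALL=C sort -t ';' -k1,1 > "$s1"
  tail -n +2 "$2" | LC_ALL=C sort -t ';' -k1,1 > "$s2"
  printf 'idR;valR;entryR\n'
  # -t ';'          fields are separated by ';' on input and output
  # -1 1 -2 1       join field 1 of the first file with field 1 of the second
  # -o 1.1,1.2,2.2  print id and val from the first file, entry from the second
  LC_ALL=C join -t ';' -1 1 -2 1 -o 1.1,1.2,2.2 "$s1" "$s2"
  rm -f "$s1" "$s2"
}

# Demo with the sample data (unsorted on purpose; file names are hypothetical).
tmp=$(mktemp -d)
printf 'id;val\n3;c\n1;a\n2;b\n' > "$tmp/file1.csv"
printf 'id1;entry\n5;d07\n2;x90\n1;002\n' > "$tmp/file2.csv"
match_ids_join "$tmp/file1.csv" "$tmp/file2.csv"
# prints:
# idR;valR;entryR
# 1;a;002
# 2;b;x90
rm -rf "$tmp"
```

Unlike the awk hash join, this keeps almost nothing in memory (sort spills to disk for large inputs), but the output comes out in lexicographic id order, and an id repeated in both files yields one line per pairing.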
