PHP: Compare two CSV files, look for duplicates and remove matching rows from one of the files


Problem description

I'm trying my best to learn PHP and hack things out myself. But this part has me stuck.

I have two CSV files with hundreds of rows each.

CSV 1 looks like this:

name, email, interest

CSV 2 looks like this:

email only

I'm trying to write a script to compare the two files looking for duplicates. I only want to keep the duplicates. But as you can see, CSV 2 only contains an email. If an email in CSV 1 DOES NOT EXIST in CSV 2, then the row containing that email in CSV 1 should be deleted.

The end result can either overwrite CSV 1 or create a fresh new file called "final.csv"... whatever is easiest.

I would be grateful for the help.

I tried something along these lines with no luck:

egrep -v $(cat csv2.csv | tr '\n' '|' | sed 's/.$//') csv1.csv

and

grep -v -f csv22.csv csv1.csv >output-file

cheers,

marc

Solution

Here is a script that loops through both files and writes a 3rd file containing the rows of file1 whose email address is found in file2.

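// Open the output file first; "w" overwrites file3.csv if it already exists.
if (($file3 = fopen("file3.csv", "w")) !== FALSE) {
  if (($file1 = fopen("file1.csv", "r")) !== FALSE) {
    // For every row of file1, scan file2 from the top looking for its email.
    while (($file1Row = fgetcsv($file1)) !== FALSE) {
      if (($file2 = fopen("file2.csv", "r")) !== FALSE) {
        while (($file2Row = fgetcsv($file2)) !== FALSE) {
          // Compare case-insensitively, ignoring surrounding whitespace.
          if ( strtolower(trim($file2Row[0])) == strtolower(trim($file1Row[1])) ) {
            fputcsv($file3, $file1Row);
            break; // stop at the first match so a row is never written twice
          }
        }
        fclose($file2);
      }
    }
    fclose($file1);
  }
  fclose($file3);
}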

A couple of notes:

  • You may need to pass additional arguments to fgetcsv, depending on how your CSV is structured (e.g. delimiter, quotes).
  • Based on how you listed the contents of each file, this code reads the 2nd column of file1 and the 1st column of file2. If that is not how the columns are actually positioned, change the index in the brackets of $file1Row[1] and $file2Row[0]. Column numbering starts at 0.
  • The script is currently set to overwrite file3.csv if it exists. To append instead of overwrite, change the 2nd argument of the $file3 fopen call to "a" instead of "w".
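
As an illustration of passing extra arguments to fgetcsv, here is a minimal sketch. It assumes a hypothetical semicolon-delimited file whose fields are wrapped in single quotes; the sample data and the in-memory php://temp stream are made up for the demo and are not part of the original question.

```php
<?php
// Hypothetical sample: semicolon-delimited data with single-quoted fields,
// written to an in-memory stream so the sketch needs no files on disk.
$handle = fopen('php://temp', 'w+');
fwrite($handle, "name;email;interest\n'Smith; Mary';mary@blah.com;something\n");
rewind($handle);

$rows = [];
// Arguments: stream, max line length (0 = no limit), delimiter, enclosure, escape.
while (($row = fgetcsv($handle, 0, ';', "'", '\\')) !== false) {
    $rows[] = $row;
}
fclose($handle);

// The quoted first field keeps its embedded semicolon instead of being split.
print_r($rows[1]);
```

With the default arguments the same line would be treated as a single comma-separated field, so the column indexes used in the script would no longer line up.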

Example:

file1.csv:

john,john@foobar.com,blah
mary,mary@blah.com,something
jane,jan@something.com,blarg
bob,bob@test.com,asdfsfd

file2.csv:

mary@blah.com
bob@test.com

file3.csv (generated):

mary,mary@blah.com,something
bob,bob@test.com,asdfsfd
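
The nested loop in the script reopens and rescans file2.csv for every row of file1.csv. For a few hundred rows that is fine, but a variation is to read the email-only file once into a lookup array and then make a single pass over the other file. The following is only a sketch of that idea, not the answer's original method: the function name filterRowsByEmail and its parameters are hypothetical, and the column positions are assumed to match the layout in the question.

```php
<?php
// Hypothetical helper: copy to $outPath only the rows of $rowsPath whose email
// column appears in $emailsPath. Returns the number of rows written.
function filterRowsByEmail(string $rowsPath, string $emailsPath, string $outPath,
                           int $rowEmailCol = 1, int $listEmailCol = 0): int
{
    // Pass 1: build a lookup set of normalized emails from the email-only file.
    $wanted = [];
    $list = fopen($emailsPath, 'r');
    if ($list === false) {
        throw new RuntimeException("Cannot open $emailsPath");
    }
    while (($row = fgetcsv($list, 0, ',', '"', '\\')) !== false) {
        if (isset($row[$listEmailCol])) { // blank lines have no usable column
            $wanted[strtolower(trim($row[$listEmailCol]))] = true;
        }
    }
    fclose($list);

    // Pass 2: stream the full file once and copy the rows whose email is wanted.
    $in = fopen($rowsPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $rowsPath");
    }
    $out = fopen($outPath, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $outPath");
    }
    $kept = 0;
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if (!isset($row[$rowEmailCol])) {
            continue; // skip blank or short rows
        }
        if (isset($wanted[strtolower(trim($row[$rowEmailCol]))])) {
            fputcsv($out, $row, ',', '"', '\\');
            $kept++;
        }
    }
    fclose($in);
    fclose($out);
    return $kept;
}
```

Assuming the files from the example, calling filterRowsByEmail("file1.csv", "file2.csv", "final.csv") would write the two matching rows to final.csv. Writing to a separate output file avoids truncating file1.csv while it is still being read; if you would rather overwrite it, one option is to rename() the output over it afterwards.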
