PHP - Compare two CSV files, look for duplicates and remove matching rows from one of the files
Problem description
I'm trying my best to learn PHP and hack things out myself. But this part has me stuck.
I have two CSV files with hundreds of rows each.
CSV 1 looks like this:
name, email, interest
CSV 2 looks like this:
email only
I'm trying to write a script to compare the two files looking for duplicates. I only want to keep the duplicates. But as you can see, CSV 2 only contains an email. If an email in CSV 1 DOES NOT EXIST in CSV 2, then the row containing that email in CSV 1 should be deleted.
The end result can either overwrite CSV 1 or create a fresh new file called "final.csv"... whatever is easiest.
I would be grateful for the help.
I tried something along these lines with no luck:
egrep -v $(cat csv2.csv | tr '\n' '|' | sed 's/.$//') csv1.csv
and
grep -v -f csv22.csv csv1.csv >output-file
cheers,
marc
Here is a script that loops through both files and writes a third file containing the rows of file1 whose email address is also found in file2.
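<?php
// Write each row of file1.csv whose email (2nd column) also appears in
// file2.csv (1st column) to file3.csv, ignoring case and surrounding spaces.
if (($file3 = fopen("file3.csv", "w")) !== FALSE) {
    if (($file1 = fopen("file1.csv", "r")) !== FALSE) {
        while (($file1Row = fgetcsv($file1)) !== FALSE) {
            // Skip blank lines and rows without an email column.
            if (!isset($file1Row[1])) {
                continue;
            }
            $email1 = strtolower(trim($file1Row[1]));
            // file2.csv is reopened and scanned from the top for every row of file1.csv.
            if (($file2 = fopen("file2.csv", "r")) !== FALSE) {
                while (($file2Row = fgetcsv($file2)) !== FALSE) {
                    if (isset($file2Row[0]) && strtolower(trim($file2Row[0])) == $email1) {
                        fputcsv($file3, $file1Row);
                        break; // stop at the first match so the row is written only once
                    }
                }
                fclose($file2);
            }
        }
        fclose($file1);
    }
    fclose($file3);
}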
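The loop above reopens and rescans file2.csv once for every row of file1.csv, so the work grows with rows(file1) × rows(file2). That is fine for a few hundred rows each, but a minimal alternative sketch, assuming the same column layout (email in column 1 of file1, column 0 of file2), reads file2 once into an associative array keyed by the normalized email and then makes a single pass over file1. The function name `filterRowsByEmail` and its parameters are hypothetical, not part of the original answer.

```php
<?php
// Sketch: keep only the rows of $rowsPath whose email column also appears in
// column 0 of $emailsPath. Hypothetical helper, not part of the original answer.
function filterRowsByEmail(
    string $rowsPath,
    string $emailsPath,
    string $outPath,
    int $emailCol = 1
): int {
    // 1. Read the email file once; each normalized email becomes an array key.
    $emails = [];
    $fh = fopen($emailsPath, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $emailsPath");
    }
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if (isset($row[0]) && trim($row[0]) !== '') {
            $emails[strtolower(trim($row[0]))] = true;
        }
    }
    fclose($fh);

    // 2. Stream the data file once and copy rows whose email is a known key.
    $in = fopen($rowsPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $rowsPath");
    }
    $out = fopen($outPath, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $outPath");
    }
    $kept = 0;
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if (isset($row[$emailCol]) && isset($emails[strtolower(trim($row[$emailCol]))])) {
            fputcsv($out, $row, ',', '"', '\\');
            $kept++;
        }
    }
    fclose($in);
    fclose($out);
    return $kept; // number of rows written
}

// Example call with the file names from the question:
// filterRowsByEmail('csv1.csv', 'csv2.csv', 'final.csv');
```

Checking `isset($emails[$key])` is a hash lookup, whereas searching a plain list of emails (for example with `in_array()`) would scan the whole list for every row.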
A couple of notes:
- You may need to pass some additional arguments to fgetcsv, depending on how your CSV is structured (e.g. delimiter, quote character).
- Based on how you listed the contents of each file, this code reads the 2nd column of file1 and the 1st column of file2. If that's not how they are actually positioned, change the number in the brackets of $file1Row[1] and $file2Row[0]. Column numbers start at 0.
- The script is currently set to overwrite file3.csv if it exists. If you want it to append instead, change the 2nd argument of the $file3 fopen call from "w" to "a".
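The first two notes come down to two things: the delimiter and quote arguments passed to `fgetcsv`, and which column index holds the email. A minimal sketch, assuming a semicolon-delimited file whose first row is a header such as `name;email;interest` (an assumption; the question's files may have no header row), passes the CSV arguments explicitly and looks the column up by header name instead of hard-coding an index. `readColumn` is a hypothetical helper.

```php
<?php
// Sketch: read one column, chosen by header name, from a CSV that may use a
// non-default delimiter. Hypothetical helper, not part of the original answer.
function readColumn(string $path, string $header, string $separator = ','): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Arguments: handle, max line length (0 = unlimited), separator, enclosure, escape.
    $headers = fgetcsv($fh, 0, $separator, '"', '\\');
    if ($headers === false) {
        fclose($fh);
        return []; // empty file
    }
    // Find the 0-based index of the matching header, ignoring surrounding spaces.
    $col = array_search($header, array_map(fn($h) => trim((string) $h), $headers), true);
    if ($col === false) {
        fclose($fh);
        throw new RuntimeException("No '$header' column in $path");
    }
    $values = [];
    while (($row = fgetcsv($fh, 0, $separator, '"', '\\')) !== false) {
        if (isset($row[$col])) {
            $values[] = strtolower(trim($row[$col]));
        }
    }
    fclose($fh);
    return $values; // normalized (trimmed, lowercased) values of that column
}
```

For comma-separated files, the default `','` matches what `fgetcsv` uses when the separator argument is omitted.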
Example:
file1.csv:
john,john@foobar.com,blah
mary,mary@blah.com,something
jane,jan@something.com,blarg
bob,bob@test.com,asdfsfd
file2.csv:
mary@blah.com
bob@test.com
file3.csv (generated):
mary,mary@blah.com,something
bob,bob@test.com,asdfsfd
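The question also allows overwriting CSV 1 instead of creating a new file. A minimal sketch of that option, assuming the same comma-separated layout, writes the kept rows to a temporary file in the same directory and only then renames it over the original, so an error while reading or writing leaves the original file untouched. `rewriteCsvInPlace` and its `$keep` callback are hypothetical names, not from the answer.

```php
<?php
// Sketch: rewrite a CSV file in place, keeping only rows for which $keep
// returns true. Hypothetical helper, not part of the original answer.
function rewriteCsvInPlace(string $path, callable $keep): int
{
    // Create the temp file next to the original so rename() stays on one filesystem.
    $tmp = tempnam(dirname($path), 'csv');
    if ($tmp === false) {
        throw new RuntimeException("Cannot create a temp file next to $path");
    }
    $in = fopen($path, 'r');
    $out = fopen($tmp, 'w');
    if ($in === false || $out === false) {
        // Close whichever handle did open, then remove the temp file.
        if ($in !== false) {
            fclose($in);
        }
        if ($out !== false) {
            fclose($out);
        }
        unlink($tmp);
        throw new RuntimeException("Cannot open $path or $tmp");
    }
    $kept = 0;
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($keep($row)) {
            fputcsv($out, $row, ',', '"', '\\');
            $kept++;
        }
    }
    fclose($in);
    fclose($out);
    // tempnam() may create the file with restrictive permissions; copy the original's.
    $perms = fileperms($path);
    if ($perms !== false) {
        chmod($tmp, $perms & 0777);
    }
    // Replace the original only after the new content is fully written.
    if (!rename($tmp, $path)) {
        unlink($tmp);
        throw new RuntimeException("Cannot replace $path");
    }
    return $kept; // number of rows left in the file
}

// Example call (keep rows whose 2nd column is in a set of emails):
// $emails = ['mary@blah.com' => true, 'bob@test.com' => true];
// rewriteCsvInPlace('csv1.csv', fn(array $row) => isset($row[1], $emails[strtolower(trim($row[1]))]));
```

With the example data, building `$emails` from file2.csv and using the callback in the comment leaves file1.csv containing the same two rows shown for file3.csv.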