How to delete rows from a csv file based on a list of values from another file?

Question

I have two files:

candidates.csv:

id,value
1,123
4,1
2,5
50,5

blacklist.csv:

1
2
5
3
10

I'd like to remove all rows from candidates.csv in which the first column (id) has a value contained in blacklist.csv. id is always numeric. In this case I'd like my output to look like this:

id,value
4,1
50,5

So far, my script for identifying the blacklisted ids looks like this:

cat candidates.csv | cut -d \, -f 1 | grep -f blacklist.csv -w

This gives me the output:

1
2

Now I somehow need to pipe this information back into sed/awk/gawk/... to delete the matching rows, but I don't know how. Any ideas how I can continue from here? Or is there a better solution altogether? My only restriction is that it has to run in bash.

Answer

How about the following:

awk -F, '(NR==FNR){a[$1];next}!($1 in a)' blacklist.csv candidates.csv
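
To try the one-liner on the sample data from the question, a minimal sketch like the following recreates both files in a temporary directory and runs the command unchanged. The mktemp directory and the variable name result are illustrative choices, not part of the answer:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory so the sample files don't overwrite anything real.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Sample data copied from the question.
printf '%s\n' 'id,value' '1,123' '4,1' '2,5' '50,5' > candidates.csv
printf '%s\n' 1 2 5 3 10 > blacklist.csv

# The answer's command, unchanged: remember the ids listed in blacklist.csv,
# then print each line of candidates.csv whose first field is not among them.
result=$(awk -F, '(NR==FNR){a[$1];next}!($1 in a)' blacklist.csv candidates.csv)
printf '%s\n' "$result"
```

It should print the header line followed by 4,1 and 50,5. The header is kept because id does not appear in blacklist.csv.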

How does this work?

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

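where condition is typically an expression and action is a series of commands. Here, -F, makes awk split each line on commas, so $1 is the id column of candidates.csv and the whole line of blacklist.csv. The two condition-action pairs read: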

  • (NR==FNR){a[$1];next}: if the total record count NR equals the record count of the current file FNR (i.e. we are still reading the first file), add the first field as a key of array a (merely referencing a[$1] creates the entry) and skip to the next record without doing anything else.
  • !($1 in a): if the first field is not a key of array a, perform the default action, which is to print the line. This pair is only reached for lines of the second file, because every line of the first file ends at the next in the first pair.
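
Putting the two pairs back together, the same program can be spread over several lines with an awk comment per pair. This is only the one-liner reformatted for reading; the temporary directory and sample files are an illustrative setup recreated from the question:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Sample data copied from the question.
printf '%s\n' 'id,value' '1,123' '4,1' '2,5' '50,5' > candidates.csv
printf '%s\n' 1 2 5 3 10 > blacklist.csv

# Same program as the one-liner, one condition { action } pair per line.
result=$(awk -F, '
    # Pair 1. Condition: NR == FNR, true only while the first file is read.
    # Action: create the key $1 in array a, then skip to the next record.
    NR == FNR { a[$1]; next }

    # Pair 2. Condition only: a pair without an action defaults to { print }.
    !($1 in a)
' blacklist.csv candidates.csv)
printf '%s\n' "$result"
```

The output is identical to the one-liner's, since awk treats newlines between pairs the same way as the original juxtaposition.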

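One caveat: NR==FNR identifies the first file only if that file has at least one line. If blacklist.csv is empty, NR equals FNR for every line of candidates.csv as well, so every row is stored in a and nothing is printed. A common workaround, shown here as an alternative to the answer's test rather than part of it, compares the standard awk variables FILENAME and ARGV[1] (the first file operand). It assumes the blacklist is passed first, as a plain file name rather than a var=value assignment, and that the two file names differ. The empty_blacklist.csv file is a hypothetical edge-case input:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Sample data from the question, plus an empty blacklist for the edge case.
printf '%s\n' 'id,value' '1,123' '4,1' '2,5' '50,5' > candidates.csv
printf '%s\n' 1 2 5 3 10 > blacklist.csv
: > empty_blacklist.csv

# NR==FNR test: with an empty first file, NR==FNR also holds for every line of
# candidates.csv, so all rows get stored in a and nothing is printed.
nr_empty=$(awk -F, '(NR==FNR){a[$1];next}!($1 in a)' empty_blacklist.csv candidates.csv)

# FILENAME test: asks which file is being read instead of comparing counters.
fn_empty=$(awk -F, 'FILENAME == ARGV[1] {a[$1]; next} !($1 in a)' empty_blacklist.csv candidates.csv)
fn_normal=$(awk -F, 'FILENAME == ARGV[1] {a[$1]; next} !($1 in a)' blacklist.csv candidates.csv)

printf 'NR==FNR, empty blacklist: [%s]\n' "$nr_empty"
printf 'FILENAME, empty blacklist:\n%s\n' "$fn_empty"
printf 'FILENAME, real blacklist:\n%s\n' "$fn_normal"
```

With the empty blacklist, the NR==FNR version prints nothing while the FILENAME version keeps every row. With the real blacklist, both give the same result.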