根据第二个文本文件从文本文件中删除重复项 [英] Remove duplicates from text file based on second text file

查看:31
本文介绍了根据第二个文本文件从文本文件中删除重复项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何通过检查第二个文本文件 (removthese.txt) 从文本文件 (main.txt) 中删除所有行.如果文件大于 10-100mb,什么是有效方法.[使用mac]

How can I remove all lines from a text file (main.txt) by checking a second textfile (removethese.txt). What is an efficient approach if files are greater than 10-100mb. [Using mac]

main.txt
3
1
2
5

删除这些行

removethese.txt
3
2
9

输出:

output.txt
1
5

示例行(这些是我正在使用的实际行 - 顺序无关紧要):

Example Lines (these are the actual lines I'm working with - order does not matter):

ChIJW3p7Xz8YyIkRBD_TjKGJRS0
ChIJ08x-0kMayIkR5CcrF-xT6ZA
ChIJIxbjOykFyIkRzugZZ6tio1U
ChIJiaF4aOoEyIkR2c9WYapWDxM
ChIJ39HoPKDix4kRcfdIrxIVrqs
ChIJk5nEV8cHyIkRIhmxieR5ak8
ChIJs9INbrcfyIkRf0zLkA1NJEg
ChIJRycysg0cyIkRArqaCTwZ-E8
ChIJC8haxlUDyIkRfSfJOqwe698
ChIJxRVp80zpcEARAVmzvlCwA24
ChIJw8_LAaEEyIkR68nb8cpalSU
ChIJs35yqObit4kR05F4CXSHd_8
ChIJoRmgSdwGyIkRvLbhOE7xAHQ
ChIJaTtWBAWyVogRcpPDYK42-Nc
ChIJTUjGAqunVogR90Kc8hriW8c
ChIJN7P2NF8eVIgRwXdZeCjL5EQ
ChIJizGc0lsbVIgRDlIs85M5dBs
ChIJc8h6ZqccVIgR7u5aefJxjjc
ChIJ6YMOvOeYVogRjjCMCL6oQco
ChIJ54HcCsaeVogRIy9___RGZ6o
ChIJif92qn2YVogR87n0-9R5tLA
ChIJ0T5e1YaYVogRifrl7S_oeM8
ChIJwWGce4eYVogRcrfC5pvzNd4

推荐答案

有两种标准方法可以做到这一点:

There are two standard ways to do this:

使用grep:

grep -vxFf removethese main

这使用:

  • -v 反转匹配.
  • -x 匹配整行,例如,防止 he 匹配 hellohighway to hell.
  • -F 使用固定字符串,以便参数按原样使用,而不是解释为正则表达式.
  • -f 从另一个文件中获取模式.在这种情况下,从 removethese.
  • -v to invert the match.
  • -x match whole line, to prevent, for example, he to match lines like hello or highway to hell.
  • -F to use fixed strings, so that the parameter is taken as it is, not interpreted as a regular expression.
  • -f to get the patterns from another file. In this case, from removethese.

使用awk:

$ awk 'FNR==NR {a[$0];next} !($0 in a)' removethese main
1
5

像这样,我们将 removethese 中的每一行存储在一个数组 a[] 中.然后,我们读取 main 文件并只打印数组中不存在的那些行.

Like this we store every line in removethese in an array a[]. Then, we read the main file and just print those lines that are not present in the array.

这篇关于根据第二个文本文件从文本文件中删除重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆