bash, Linux: Set difference between two text files
Question
I have two files, A (nodes_to_delete) and B (nodes_to_keep). Each file has many lines with numeric ids.
I want to have the list of numeric ids that are in nodes_to_delete but NOT in nodes_to_keep.
Doing it within a PostgreSQL database is unreasonably slow. Any neat way to do it in bash using Linux CLI tools?
UPDATE: This would seem to be a Pythonic job, but the files are really, really large. I have solved some similar problems using uniq, sort and some set theory techniques. This was about two or three orders of magnitude faster than the database equivalents.
Recommended answer
This can be done with the comm command.
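A minimal sketch of how comm could be applied here, not necessarily the exact command the answer had in mind. comm expects both inputs to be sorted, and its -23 option suppresses column 2 (lines only in the second file) and column 3 (lines in both), which leaves the lines found only in the first file. The helper name set_difference, the temporary directory, and the sample ids are made up for illustration; only the two file names come from the question.

```shell
# Print the lines of file $1 that do not appear in file $2.
# set_difference is a hypothetical helper name, not a standard tool.
# The body runs in a subshell so the temporary variable stays local.
set_difference() (
    tmp=$(mktemp -d)
    # comm requires sorted input. LC_ALL=C makes sort and comm agree on
    # a byte-wise order, and sort -u drops duplicate ids within a file.
    LC_ALL=C sort -u "$1" > "$tmp/a"
    LC_ALL=C sort -u "$2" > "$tmp/b"
    # -2 hides lines unique to the second file, -3 hides common lines.
    LC_ALL=C comm -23 "$tmp/a" "$tmp/b"
    rm -rf "$tmp"
)

# Tiny made-up sample data standing in for the real, much larger files.
workdir=$(mktemp -d)
printf '%s\n' 10 3 7 42 > "$workdir/nodes_to_delete"
printf '%s\n' 7 5 10 > "$workdir/nodes_to_keep"

set_difference "$workdir/nodes_to_delete" "$workdir/nodes_to_keep"
```

The result comes out in byte-wise (not numeric) order because that is the order comm was given. Since sort spills to temporary files when its input does not fit in memory, this approach should remain workable for very large files, though that has not been measured here.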