Bash Directory Sorting Issue - Removing Duplicate Lines?
Question
I'm using this command to merge multiple identical directories and to remove duplicate lines from each of the corresponding files:
for f in app1/*; do
  bn="$(basename "$f")"
  sort -u "$f" "app2/$bn" > "app/$bn"
done
Is there a way to edit this so that it checks the lines of all the files and removes all the duplicates as well? I do need to keep the existing file structure with individual files.
The end result is a directory of 300 text files that is no larger than 30 MB.
Example:
**Directory app1**
*1.txt*
a
b
c
*2.txt*
d
e
f
**Directory app2**
*1.txt*
a
b
c
g
*2.txt*
a
b
c
d
e
f
**Results in Directory app**
*1.txt*
a
b
c
g
*2.txt*
a
b
c
d
e
f
**Desired Result in Directory app**
*1.txt*
a
b
c
g
*2.txt*
d
e
f
As you can see, it does not remove the duplicate lines "a", "b", and "c" from 2.txt even though they also appear in 1.txt. Every line should be unique across all files, and all duplicates should be removed.
Recommended answer
This should probably be done with perl -i:
perl -i -n -e 'print unless $h{$_};++$h{$_}' app1/*
This seems to create .bak files in app1 (despite the man page saying it won't), which you may want to eliminate with rm app1/*.bak after verifying the result.
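For anyone who would rather keep the original sort-based loop and write into app instead of editing app1 in place, here is a minimal sketch of the same "seen lines" idea using awk. The function name merge_dedup and the temporary .dedup suffix are made up for illustration, and the sketch assumes every file in the first directory has a counterpart in the second, as in the question. A line is kept in the first file (in shell glob order) where it appears and dropped from all later files.

```shell
#!/usr/bin/env bash
# merge_dedup SRC1 SRC2 DEST  (hypothetical helper, not from the original post)
# Pass 1: merge matching files from SRC1 and SRC2 into DEST with sort -u.
# Pass 2: drop any line already seen in an earlier file of DEST.
merge_dedup() {
  local src1=$1 src2=$2 dest=$3 f bn
  mkdir -p "$dest"

  # Same per-file merge as the loop in the question.
  for f in "$src1"/*; do
    bn=$(basename "$f")
    sort -u "$f" "$src2/$bn" > "$dest/$bn"
  done

  # One awk process keeps the seen[] table across all files.
  # Each input file gets a sibling "<name>.dedup" output, closed before
  # the next one is opened so 300 files do not exhaust file descriptors.
  awk '
    FNR == 1 { if (out) close(out); out = FILENAME ".dedup"; printf "" > out }
    !seen[$0]++ { print > out }
  ' "$dest"/*

  # Replace each merged file with its deduplicated version.
  for f in "$dest"/*.dedup; do
    [ -e "$f" ] || continue
    mv "$f" "${f%.dedup}"
  done
}
```

Calling merge_dedup app1 app2 app on the example directories should leave a, b, c, g in app/1.txt and d, e, f in app/2.txt. Note that glob order is lexicographic, so 10.txt is processed before 2.txt.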