Bash Directory Sorting Issue - Removing Duplicate Lines?

Problem Description

I'm using this command to merge multiple identical directories and to remove duplicate lines from each of the corresponding files:

for f in app1/*; do 
   bn="$(basename "$f")"
   sort -u "$f" "app2/$bn" > "app/$bn"
done

Is there a way to edit this so that it also compares lines across all the files and removes every duplicate, not just duplicates within each pair of corresponding files? I do need to keep the existing file structure with individual files.

The end result is a directory of 300 text files totaling no more than 30 MB.

Example:

**Directory app1**
*1.txt*       
a
b
c

*2.txt*
d
e
f

**Directory app2**
*1.txt*
a
b
c
g

*2.txt*
a
b
c
d
e
f

**Results in Directory app**
*1.txt*
a
b
c
g

*2.txt*
a
b
c
d
e
f

**Desired Result in Directory app**
*1.txt*
a
b
c
g

*2.txt*
d
e
f

As you can see, it's not removing the duplicate lines a, b and c from 2.txt even though they also appear in 1.txt. Every line should appear only once across all the files, and all duplicates should be removed.

Recommended Answer

This should probably be done with perl -i:

perl -i -n -e 'print unless $h{$_};++$h{$_}' app1/*

This seems to create .bak files in app1 (despite the man page saying it won't); after verifying the result, you may want to remove them with rm app1/*.bak.
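
The perl one-liner above edits the files in place. As an alternative, here is a minimal sketch that writes the merged, globally de-duplicated files into a separate output directory and leaves the sources untouched. It is not part of the original answer: the function name merge_dedupe is hypothetical, and the sketch assumes that the first source directory lists every file name of interest and that all lines fit in memory (reasonable for roughly 30 MB). Unlike sort -u, it keeps lines in first-seen order rather than sorting them.

```shell
#!/usr/bin/env bash
# merge_dedupe is a hypothetical helper name, not from the original answer.
# Usage: merge_dedupe OUT_DIR SRC_DIR [SRC_DIR...]
merge_dedupe() {
    local out=$1; shift
    local first=$1
    local -a inputs=()
    local f bn src

    mkdir -p "$out" || return 1

    # Queue same-named files from every source directory back to back,
    # so each output file is written in one contiguous run.
    for f in "$first"/*; do
        [ -f "$f" ] || continue
        bn=$(basename "$f")
        for src in "$@"; do
            [ -f "$src/$bn" ] && inputs+=("$src/$bn")
        done
    done

    [ "${#inputs[@]}" -gt 0 ] || return 0

    awk -v out="$out" '
        # At the start of each input file, work out which output file it maps to.
        FNR == 1 {
            n = split(FILENAME, parts, "/")
            next_target = out "/" parts[n]
            if (next_target != target) {
                if (target != "") close(target)   # keep only one output file open
                target = next_target
                printf "" > target                # create (or truncate) the output
            }
        }
        # Print a line only the first time it is seen in any file.
        !seen[$0]++ { print > target }
    ' "${inputs[@]}"
}

# Example call (assumes app1/ and app2/ exist as in the question):
#   merge_dedupe app app1 app2
```

When a line occurs in several files, the file that comes first in shell glob order keeps it (note that 10.txt sorts before 2.txt). One way to verify the result is to run sort app/* | uniq -d, which prints nothing if no line is repeated anywhere in the output directory.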
