Linux shell scripting: How can I remove final numbers in a word list file?

Question

I have this example word list in a text file (one word per line):

John
J0hn
John45
Smith
Sm1th
Jane
Jane333
Doe555

I would like to obtain:

John
J0hn
Smith
Sm1th
Jane
Doe

That is, I would like to remove the digits at the end of each word (note that digits inside a word are allowed) and then remove duplicates.
I have some programming experience, so I could write a loop to strip those digits and another loop to remove the duplicate words, but I suspect the Linux shell has some simple commands or parameter expansions that could solve this for me.

Losing the original order of the file is acceptable, but a method that preserves it would be preferable.

Possible use:

  • Isolating the words used in a password database (John, 45John, 12345John) to obtain diversity statistics.

Ideas are welcome. Thank you.

EDIT-1: Whitespace is not expected in this kind of "dictionary" text file (thank you anyway, @rottweilers_anonymous).

EDIT-2: Added an example of a possible ambiguity, a "word" that consists only of digits: it must be left as it is (I know, I know, that is not strictly a "word" ;-) ). Example original file:

John
J0hn
John45
Smith
Sm1th
Jane
Jane333
Doe555
12345

Since a line like 12345 (digits with no word) does not really contain a number at the end of a word, I would like to keep it, so the result must be:

John
J0hn
Smith
Sm1th
Jane
Doe
12345

Recommended answer

A simple way would be with sed and uniq:

sed "s/\([^0-9]\)[0-9]*\s*$/\1/" file | uniq

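This assumes that duplicate names end up on adjacent lines (for example, because the list is already in order), since uniq only removes adjacent duplicates. If they do not, you can use sort, which reorders the output: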

sed "s/\([^0-9]\)[0-9]*\s*$/\1/" file | sort -u
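The question mentions that keeping the original order would be preferable, and sort -u does not keep it. A minimal sketch of an order-preserving variant, assuming an awk is available: the awk expression !seen[$0]++ prints a line only the first time it is seen. The function name strip_and_dedupe and the file name wordlist.txt are hypothetical, and [[:space:]] is used here in place of \s as a more portable spelling; this is an illustration, not part of the original answer.

```shell
# Hypothetical helper: strip trailing digits (and trailing whitespace)
# from every line of the file given as $1, then drop repeated lines
# while keeping the first occurrence of each in its original position.
strip_and_dedupe() {
    sed 's/\([^0-9]\)[0-9]*[[:space:]]*$/\1/' "$1" |
        awk '!seen[$0]++'   # true only the first time a given line appears
}

# Example call (wordlist.txt is an assumed file name):
#   strip_and_dedupe wordlist.txt
```

Because awk keeps every distinct line in memory, this trades some memory for not having to sort the list.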

Per @rottweilers_anonymous's suggestion, added a check for whitespace at the end of the line.

Per the OP's modification of the question, digits are not removed from a line that consists only of digits (the pattern requires a non-digit character, captured by \([^0-9]\), before the trailing digits).
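The question also asks whether parameter expansion could do the job. A rough sketch using only POSIX shell features, offered as an alternative illustration and not as the answer's method: the function name strip_trailing_digits is hypothetical, and unlike the sed command it does not handle trailing whitespace.

```shell
# Hypothetical helper: read words from stdin and strip trailing digits
# using only POSIX parameter expansion. Lines made up solely of digits
# pass through unchanged.
strip_trailing_digits() {
    while IFS= read -r word || [ -n "$word" ]; do
        # Remove one trailing digit at a time, but only while the word
        # still has a non-digit character somewhere before a final digit.
        while case $word in *[!0-9]*[0-9]) true ;; *) false ;; esac; do
            word=${word%[0-9]}
        done
        printf '%s\n' "$word"
    done
}

# Example call (file is the word list from the question):
#   strip_trailing_digits < file | sort -u
```

For large lists the sed pipeline is likely to be faster, since this loop runs entirely in the shell.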
