Renumbering duplicate lines with a counter in awk

Problem description

I have duplicate words in a CSV file, and I need to number the repeats. The input looks like this:

jsmith
jsmith
kgonzales
shouston
dgenesy
kgonzales
jsmith

and it should become this:

jsmith@email.com
jsmith1@email.com
kgonzales@email.com
shouston@email.com
dgenesy@email.com
kgonzales1@email.com
jsmith2@email.com

I have something similar, but it doesn't work properly for me, or I am not using it correctly.

Recommended answer

A simple way to do this is to maintain an array indexed by username and increment the user's entry each time you read that user, e.g.

awk '{ print (($1 in a) ? $1 a[$1] : $1) "@email.com"; a[$1]++ }' file

The ternary (($1 in a) ? $1 a[$1] : $1) checks whether the user is already in a[]. If so, it uses the name followed by the value stored in the array, $1 a[$1]; if the user is not yet in the array, it uses just the name, $1. The result of the ternary is concatenated with "@email.com" to complete the output.

Lastly, the array element for the user is incremented with a[$1]++.
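For readers who find the one-liner dense, the same logic can be sketched in a longer form. This is only an illustrative expansion, not part of the original answer: the function name renumber and the variable names seen, user, and addr are hypothetical, and it assumes the username is the first whitespace-separated field of each line.

```shell
# Expanded sketch of the one-liner; renumber, seen, user, addr are illustrative names.
renumber() {
    awk '
    {
        user = $1                  # assumes the username is the first field
        if (user in seen)          # seen before: append the count so far
            addr = user seen[user]
        else                       # first occurrence: no numeric suffix
            addr = user
        print addr "@email.com"
        seen[user]++               # creates the entry on first sight, then counts up
    }' "$@"
}

# Feed it the sample names from the question.
printf '%s\n' jsmith jsmith kgonzales shouston dgenesy kgonzales jsmith | renumber
```

The test (user in seen) does not create an array entry by itself, which is why the first occurrence of a name falls through to the branch without a suffix.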

Example Use/Output

With your names in a file called users, you would get:

$ awk '{ print (($1 in a) ? $1 a[$1] : $1) "@email.com"; a[$1]++ }' users
jsmith@email.com
jsmith1@email.com
kgonzales@email.com
shouston@email.com
dgenesy@email.com
kgonzales1@email.com
jsmith2@email.com


Preserving E-mails Already in the Input File

If a line of your input already contains an e-mail address, you simply want to output that record unchanged and skip to the next record, e.g.

awk '$1~/@/{print; next} { print (($1 in a) ? $1 a[$1] : $1) "@email.com"; a[$1]++ }' users

That will preserve e.meeks@example.org from your comment.

Example Input

jsmith
jsmith
kgonzales
shouston
e.meeks@example.org
dgenesy
kgonzales
jsmith

Example Output

jsmith@email.com
jsmith1@email.com
kgonzales@email.com
shouston@email.com
e.meeks@example.org
dgenesy@email.com
kgonzales1@email.com
jsmith2@email.com
