How to put sequential numbers at the end of repeated data in a line?

Problem description

I have a file with some repeated information. The lines are numbered, followed by a colon, followed by the information. I want to put a sequential number only at the end of the repeated information.

Example:

Input:

1:Jose da Silva
2:Jose da Silva
3:Fulano de Tal
4:Jose da Silva
5:Sicrano Pereira
6:Ze Ruela
7:Sicrano Pereira
8:Jose da Silva

Output:

1:Jose da Silva #1
2:Jose da Silva #2
3:Fulano de Tal
4:Jose da Silva #3
5:Sicrano Pereira #1
6:Ze Ruela
7:Sicrano Pereira #2
8:Jose da Silva #4

[This question differs from this one (http://stackoverflow.com/questions/37315483/how-to-put-sequential-numbers-only-at-end-of-the-repeated-lines) because here the lines are always different (every line has a different number). My input/output examples may look very similar, but in the real application they are not.]

Recommended answer

From my previous answer:

awk -F: 'FNR==NR {count[$2]++; next}
         count[$2]>1 {$0=$0 OFS "#"++times[$2]}
         1' file file

That is: on the first pass, count how many times each second field occurs. On the second pass, append an incrementing number to the lines whose second field appears more than once. So instead of comparing the whole line, it compares only the second field, which is the text after the first colon (here, the name).
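To see why comparing only the second field matters, here is a quick check with standard tools. It is a sketch that assumes the question's sample input is saved as file (the name the answer's command uses): comparing whole lines reports no duplicates at all, while comparing only the part after the colon does.

```shell
# The question's sample input, saved as "file" (assumed name, as in the answer's command).
cat > file <<'EOF'
1:Jose da Silva
2:Jose da Silva
3:Fulano de Tal
4:Jose da Silva
5:Sicrano Pereira
6:Ze Ruela
7:Sicrano Pereira
8:Jose da Silva
EOF

# Whole lines: the number prefix makes every line unique, so nothing is reported.
sort file | uniq -d
# (no output)

# Only the second ':'-separated field: the repeated names show up.
cut -d: -f2 file | sort | uniq -d
# Jose da Silva
# Sicrano Pereira
```

Because sort reorders the lines, this only diagnoses which names repeat; the awk answer keeps the original line order while numbering them.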

Further explanation:

    • FNR==NR {actions; next} {more_actions} file1 file2 runs actions while reading the first file and more_actions while reading the second one. This works because FNR is the line number within the current file while NR is the line number across all input, so the two are equal only while the first file is being read. This comes in very handy when you want to compare files, as we do here. But wait, here we only have one file, right? Yes, but passing the same file twice also lets us compare the lines of that file with each other. More info about this in Idiomatic awk.
    • So FNR==NR {count[$2]++; next} stores in the array count how many times every 2nd field appears. This way, Jose da Silva is counted 4 times, etc. Note that we use $2 as the index of the array: this is the second field based on the delimiter : that we set with -F:. That is, the first field is everything up to the first :, the second field is everything from the first : up to the second one, and so on.
    • count[$2]>1 {$0=$0 OFS "#"++times[$2]} runs while reading the file for the second time. Here it checks whether the count for the current line's second field is greater than 1, that is, whether that name occurs more than once. If so, it appends some content to the original line $0. This content is OFS "#"++times[$2]:
      • OFS is the output field separator, that is, the field separator used when printing data. Since we did not set it before running the program, it defaults to a space.
      • "#" is just some text we want to add before the counter.
      • ++times[$2] is just a counter that keeps track of how many times this name has been numbered so far. Since we have different 2nd fields, we need an array times[] to keep track of each one of them.
    • The trailing 1 is a condition that is always true and has no action, so awk performs its default action and prints every line, modified or not.
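A minimal sketch to watch the first two points in action. It assumes a throwaway three-line file named demo.txt (a hypothetical name) and recreates the question's sample as file. The first command shows that NR keeps growing across both copies of the same file while FNR restarts at 1; the second prints what count holds after one pass. The order of a for (k in count) loop is unspecified in awk, so its output is piped through sort.

```shell
# Hypothetical three-line file, only used to watch the counters.
printf 'a\nb\nc\n' > demo.txt

# Same file passed twice: NR keeps counting, FNR restarts for the second copy,
# so FNR == NR is true only during the first pass.
awk '{ print FILENAME, NR, FNR, (FNR == NR ? "first pass" : "second pass") }' demo.txt demo.txt
# demo.txt 1 1 first pass
# demo.txt 2 2 first pass
# demo.txt 3 3 first pass
# demo.txt 4 1 second pass
# demo.txt 5 2 second pass
# demo.txt 6 3 second pass

# The question's sample input, saved as "file".
cat > file <<'EOF'
1:Jose da Silva
2:Jose da Silva
3:Fulano de Tal
4:Jose da Silva
5:Sicrano Pereira
6:Ze Ruela
7:Sicrano Pereira
8:Jose da Silva
EOF

# What count[] holds after the first pass, keyed by the second field.
awk -F: '{ count[$2]++ } END { for (k in count) print k ": " count[k] }' file | sort
# Fulano de Tal: 1
# Jose da Silva: 4
# Sicrano Pereira: 2
# Ze Ruela: 1
```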

Output:

$ awk -F: 'FNR==NR {count[$2]++; next} count[$2]>1 {$0=$0 OFS "#"++times[$2]}1' file file
1:Jose da Silva #1
2:Jose da Silva #2
3:Fulano de Tal
4:Jose da Silva #3
5:Sicrano Pereira #1
6:Ze Ruela
7:Sicrano Pereira #2
8:Jose da Silva #4
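For readability, the same program can also live in its own script file with comments. This is a sketch: the script name number_repeats.awk is hypothetical, the sample input is recreated as file, and the parentheses around ++times[$2] only make the string concatenation explicit; the logic is the same as the one-liner.

```shell
# The question's sample input, saved as "file".
cat > file <<'EOF'
1:Jose da Silva
2:Jose da Silva
3:Fulano de Tal
4:Jose da Silva
5:Sicrano Pereira
6:Ze Ruela
7:Sicrano Pereira
8:Jose da Silva
EOF

# Hypothetical script file holding the answer's awk program, with comments.
cat > number_repeats.awk <<'EOF'
# Pass 1: FNR == NR holds only while reading the first copy of the file.
# Count how often each second field (the name) occurs, then skip to the next line.
FNR == NR { count[$2]++; next }

# Pass 2: for names that occur more than once, append " #n" to the line,
# where n is how many times that name has been numbered so far.
count[$2] > 1 { $0 = $0 OFS "#" (++times[$2]) }

# Always-true pattern with no action: print every line, modified or not.
1
EOF

# -F: sets the field separator; the file is passed twice, once per pass.
awk -F: -f number_repeats.awk file file
```

Running it prints the same eight lines as the one-liner above.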
      
