awk: preserve row order and remove duplicate strings (mirrors) when generating data

Views: 60 Published: 2021/4/15 18:54:34 Tags: awk comparison batch-processing

Problem description

I have two text files:

g1.txt

 alfa beta;www.google.com
 Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;

g2.txt

Jack to ride.zip;http://alfa.org;
JKr.rui.rar;http://gamma.org;
Nofj ogk.png;http://gamma.org;

I use this command to run my awk script

awk -f ./join2.sh g1.txt g2.txt > "g3.txt"

I get this output:

Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;;Jack to ride.zip;http://alfa.org;JKr.rui.rar;http://gamma.org;Nofj ogk.png;http://gamma.org;
alfa beta;www.google.com;

What are the problems?

1. Row order is not preserved. For example, in the output file g3.txt the line alfa beta;www.google.com; comes after the line Light.... when it should come first, as it does in g1.txt.
2. The Light.. line contains many mirror (duplicate) strings, as you can see in g3.txt:

http://alfa.org
http://gamma.org
http://gamma.org

These strings are repeated within the same row.

What output do I want? Something like this:

alfa beta;www.google.com
Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;Jack to ride.zip;JKr.rui.rar;Nofj ogk.png;

First: I am trying to implement a function that checks whether a row contains identical strings. For example, in the output row Light Dweller - CR, Technical Metal... you can see identical strings such as http://alfa.org and http://gamma.org. I don't want this. I want each string enclosed between ; delimiters to appear once and only once in each row.
This rule should apply only to the output file, g3.txt.

Second: I want the original order of the rows in g1.txt to be maintained in the g3.txt output file. For example, in g1.txt I have

alfa beta ... 
Light Dweller ...

but my script returns a different order

Light Dweller ...
alfa beta ...

I want to prevent the rows from being reordered.

My join2.sh script is this:

#! /usr/bin/awk  -f

BEGIN {
  OFS=FS=";"
  C=0;
}
{
  if (ARGIND == 1) {
     X = $NF
     T0[$NF] = C++
     $NF = ""
     if (T1[X]) {
        T1[X] = T1[X] $0
     } else {
        T1[X] = $0
     }
  } else {
     X = $NF
     T0[$NF] = C++
     $NF = ""
     if (T2[X]) {
        T2[X] = T2[X] $0
     } else {
        T2[X] = $0
     }
  }
}

END {
  for (X in T0) {
    # concatenate T1[X] and X, since T1[X] ends with ";"
    print T1[X]  X, T2[X]
  }
}
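
A note on the reordering: the END block walks T0 with for (X in T0), and awk does not guarantee any particular traversal order for that kind of loop. A common workaround is to record each key under an increasing integer index and loop over the integers instead. The following is a minimal sketch on toy data, not on g1.txt or g2.txt; the names seen, order, vals and n are illustrative and do not appear in the original script.

```shell
# Toy input: key;value pairs whose keys first appear in the order b, a.
ordered=$(printf '%s\n' 'b;1' 'a;2' 'b;3' |
awk -F';' '
  # Record each key once, under the next integer index.
  !($1 in seen) { seen[$1] = 1; order[++n] = $1 }
  # Accumulate the values that belong to each key.
  { vals[$1] = vals[$1] $2 ";" }
  END {
    # Loop over 1..n instead of "for (k in vals)" so input order is kept.
    for (i = 1; i <= n; i++) print order[i] ";" vals[order[i]]
  }')
printf '%s\n' "$ordered"
```

On this input the sketch prints b;1;3; followed by a;2; so the keys come out in the order in which they were first read.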

Solution:
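
One possible approach, sketched under stated assumptions and not necessarily what the original answer proposed. It assumes that each g2.txt line has the form name;URL; and that a name should be appended to a g1.txt row whenever its URL is one of that row's fields. It stores the g1.txt rows by line number so they print in their original order, drops repeated fields within a row, and keeps a trailing ; only where the input row had one or where names were appended. The scratch directory and the names line, name, url, have and added are illustrative. FNR == NR is used in place of the gawk-only ARGIND and assumes g1.txt is not empty.

```shell
# Recreate the sample inputs from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/g1.txt" <<'EOF'
 alfa beta;www.google.com
 Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;
EOF
cat > "$workdir/g2.txt" <<'EOF'
Jack to ride.zip;http://alfa.org;
JKr.rui.rar;http://gamma.org;
Nofj ogk.png;http://gamma.org;
EOF

awk -F';' '
  # First file (g1.txt): strip leading blanks and store rows by line number.
  FNR == NR { sub(/^[ \t]+/, ""); line[++n] = $0; next }
  # Second file (g2.txt): field 1 is a name, field 2 the URL it belongs to.
  { name[++m] = $1; url[m] = $2 }
  END {
    for (i = 1; i <= n; i++) {
      out = ""; added = 0
      split("", have)                      # empty the per-row lookup table
      k = split(line[i], f, ";")
      # Rebuild the row from its non-empty fields, keeping each one once.
      for (j = 1; j <= k; j++)
        if (f[j] != "" && !(f[j] in have)) {
          have[f[j]] = 1
          out = out (out == "" ? "" : ";") f[j]
        }
      # Append every g2 name whose URL occurs in this row, once each.
      for (j = 1; j <= m; j++)
        if ((url[j] in have) && !(name[j] in have)) {
          have[name[j]] = 1
          out = out ";" name[j]
          added = 1
        }
      # Keep the trailing delimiter if the row had one or names were added.
      if (added || line[i] ~ /;$/) out = out ";"
      print out
    }
  }' "$workdir/g1.txt" "$workdir/g2.txt" > "$workdir/g3.txt"

cat "$workdir/g3.txt"
```

On the sample files this writes the alfa beta;www.google.com row first, followed by the Light Dweller row with Jack to ride.zip;JKr.rui.rar;Nofj ogk.png; appended, which matches the output requested in the question.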
