for loop in bash simply prints the command's output n times instead of reiterating


Question

I have an input.txt file with over 6000 lines.

If a line has more than 10 words, I want it to be split, not at the 10th word but where the first comma appears. If the resulting new line also has more than 10 words, it should be split as well, repeating this process up to 7 times.

End product: no line has both more than 10 words and a comma, because every such line has been split.

Example:

Input:

Line 1: This is me, and my sample test line that I like to get working, and I want to be able to kick some ass while doing it

Expected output:

Line 1: This is me, 
Line 2: and my sample test line that I like to get working,
Line 3: and I want to be able to kick some ass while doing it

I am using the following code:

#! /bin/bash

for run in {1..7}
do

awk 'NF >= 10 {
sub (", ", ",\n")

}1' input.txt

done

This code does not give the desired result. Instead, I get the following output 7 times:

line 1: This is me,

line 2: and my sample test line that I like to get working, and I want to be able to kick some ass while doing it.

I am leaning toward sed, but I'm not clear on something. I see three approaches:

1) The code reads a line (say line 7), and if it has more than 10 words, breaks it at the comma, without checking whether the newly broken line is also over 10 words, and moves on to the next line. At the end of the file, it repeats this process (say 7 times) to ensure that the newly broken lines are also under 10 words. Then it takes the output of this process and does the same thing with a new condition (e.g. the word "and "). Then it takes the output of that, and so on. I can add endless conditions. This is the approach I prefer, and I think it is also the easiest to code.

2) The code reads a line, and if it has more than 10 words, breaks it at the comma; if the remainder is still over 10 words, it breaks that further at the next comma, and so on until it is under 10 words. Only then does it move on to the next line. I think this is what Ghoti's code does, but it makes adding further conditions complicated.

3) The code breaks a line of more than 10 words at the comma, then breaks the remainder at "and ", and so on. At the end, the whole process is repeated a few times. IMHO this is not the best way to do it either.

Can anyone help?

Thanks in advance!

Recommended answer

I think I see what you're after. There are a few problems with your approach:

  • awk doesn't process files in place. Your sub() makes a change and the 1 prints the result to stdout, but your input file never changes, so every iteration of the for loop reads the same original input.txt.
  • When you call sub(), you don't insert a new record into the input stream that awk is processing. Your command merely adds a newline to the current record, so the text after the break is not examined again during that pass.
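Both points can be checked with a small experiment. The sketch below is illustrative only: the helper name split_once and the temporary demo file are assumptions made for this example, not part of the original question or answer. It runs the same awk program once, confirms that the file on disk keeps its original line count, and shows that a single pass makes only one split even though the sample line contains two commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper for this demonstration: one awk pass over FILE.
# The result goes to stdout only; FILE itself is not modified.
split_once() {
  awk 'NF >= 10 { sub(/, /, ",\n") } 1' "$1"
}

demo=$(mktemp)
printf '%s\n' 'This is me, and my sample test line that I like to get working, and I want more' > "$demo"

# Run awk and throw the output away: the file on disk is unchanged.
split_once "$demo" > /dev/null
before=$(awk 'END { print NR }' "$demo")

# Capture the output instead: only the first comma was split in this pass,
# because the text after the inserted newline is not re-read as a new record.
split_once "$demo" > "$demo.new"
after=$(awk 'END { print NR }' "$demo.new")

echo "lines in input file after awk ran: $before"
echo "lines in captured output:          $after"
rm -f "$demo" "$demo.new"
```

A fully split version of the sample line would have three lines, so the captured output of one pass has to be fed back in as the next pass's input to get there.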

Given these, you could get away with processing the input multiple times, as you've suggested, as long as each pass reads the output of the previous one. But rather than arbitrarily assuming that a line holds at most seven 10-word phrases, it might be better to actually detect whether you need to continue. Something like this:

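#!/usr/bin/env bash

input=input.txt
# Work file next to the input; the trap removes it when the script exits.
temp=$(mktemp "${input}.XXXX") || exit 1
trap 'rm -f "$temp"' EXIT

# Each pass splits every qualifying line once. awk exits 0 if it made at
# least one substitution (run another pass) and 1 if it made none (stop).
while awk '
  BEGIN { retval=1 }
  NF >= 10 && /, / {
    sub(/, /, ","ORS)
    retval=0
  }
  1
  END { exit retval }
' "$input" > "$temp"; do
  mv -v "$temp" "$input"
done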

This uses awk's exit value to determine whether the bash loop needs to run another iteration. When awk makes a substitution it exits with 0, so the loop body replaces the input file with the new output and runs again. When awk detects that no substitutions were required, it exits with 1 and the loop stops.
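The same exit-status loop can be wrapped in a function so that further split conditions, such as the "and " case mentioned in the question, run one after another. This is only a sketch under stated assumptions: the function name split_at, its FIND/HEAD/TAIL parameters, and the second sample line are all invented for illustration. It uses index() and substr() instead of sub() so that the search text is treated literally, and it assumes the replacement never recreates the search text (otherwise the loop would not terminate).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the answer above.
# split_at FILE FIND HEAD TAIL
#   For each line of 10+ words containing FIND, replace the first FIND with
#   HEAD + newline + TAIL, and repeat whole-file passes until nothing changes.
split_at() {
  local file=$1 tmp
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  # awk exits 0 if it changed something (run another pass), 1 if not (stop).
  while awk -v find="$2" -v head="$3" -v tail="$4" '
    BEGIN { retval = 1 }
    NF >= 10 && (p = index($0, find)) {
      $0 = substr($0, 1, p - 1) head ORS tail substr($0, p + length(find))
      retval = 0
    }
    1
    END { exit retval }
  ' "$file" > "$tmp"; do
    mv "$tmp" "$file"
  done
  rm -f "$tmp"
}

demo=$(mktemp)
printf '%s\n' \
  'This is me, and my sample test line that I like to get working, and I want to be able to kick some ass while doing it' \
  'one two three four five six and seven eight nine ten eleven' > "$demo"

split_at "$demo" ', ' ',' ''        # first condition: break after each comma
split_at "$demo" ' and ' '' 'and '  # second condition: break before " and "
result=$(cat "$demo")
rm -f "$demo"
printf '%s\n' "$result"
```

Matching " and " with its leading space means a line that already starts with "and " is not split again, which is what lets the second loop finish. Lines that are still long but contain none of the search strings are left as they are.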
