Sort a text file by line length including spaces

Problem description

I have a CSV file that looks like this:

AS2345,ASDF1232, Mr. Plain Example, 110 Binary ave.,Atlantis,RI,12345,(999)123-5555,1.56
AS2345,ASDF1232, Mrs. Plain Example, 1121110 Ternary st.                                        110 Binary ave..,Atlantis,RI,12345,(999)123-5555,1.56
AS2345,ASDF1232, Mr. Plain Example, 110 Binary ave.,Liberty City,RI,12345,(999)123-5555,1.56
AS2345,ASDF1232, Mr. Plain Example, 110 Ternary ave.,Some City,RI,12345,(999)123-5555,1.56

I need to sort it by line length, including spaces. The following command does not preserve the spaces within each line. Is there a way to modify it so that it works for me?

cat $@ | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}'

Solution

cat testfile | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2-

Or, to do your original (perhaps unintentional) sub-sorting of any equal-length lines:

cat testfile | awk '{ print length, $0 }' | sort -n | cut -d" " -f2-

In both cases, we have solved your stated problem by moving away from awk for your final cut.
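
As a minimal sketch, the first pipeline can be wrapped in a small shell function that reads from standard input. The name sort_by_length is made up for illustration, and -s is assumed to be available (it is in GNU and BSD sort, but it is not required by POSIX):

```shell
# sort_by_length: hypothetical helper, reads lines on stdin, prints shortest first.
sort_by_length() {
    # 1. awk prefixes every line with its length and a single space.
    # 2. sort -n orders by that numeric prefix; -s keeps ties in input order.
    # 3. cut removes the prefix without touching the spaces in the rest of the line.
    awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2-
}

# Example: the runs of spaces in the first line survive the round trip.
printf '%s\n' 'a   b   c' 'xy' 'one two' | sort_by_length
```

Reading from stdin also avoids the extra cat: sort_by_length < testfile does the same job.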

Lines of matching length - what to do in the case of a tie:

The question did not specify whether or not further sorting was wanted for lines of matching length. I've assumed that this is unwanted and suggested the use of -s (--stable) to prevent such lines being sorted against each other, and keep them in the relative order in which they occur in the input.

(Those who want more control of sorting these ties might look at sort's --key option.)
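
For instance, here is one way the --key option (-k) could be used to order ties explicitly. This is only a sketch: the function name is hypothetical, and the choice of reverse alphabetical order for ties is arbitrary, picked to show that the tie-break is under your control. LC_ALL=C is set so that the text comparison is plain byte order:

```shell
# sort_by_length_then_text: hypothetical helper.
# Primary key: field 1 (the length), compared numerically.
# Secondary key: everything from field 2 to end of line, in reverse order.
sort_by_length_then_text() {
    awk '{ print length, $0 }' |
        LC_ALL=C sort -t' ' -k1,1n -k2r |
        cut -d" " -f2-
}

# The three-character lines tie on length and come out in reverse order.
printf '%s\n' 'ccb' 'ccz' 'cca' 'g' | sort_by_length_then_text
```

The -t' ' option makes every single space a field separator, so the length prefix is always exactly field 1, even when the original line starts with spaces.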

Why the question's attempted solution fails (awk line-rebuilding):

It is interesting to note the difference between:

echo "hello   awk   world" | awk '{print}'
echo "hello   awk   world" | awk '{$1="hello"; print}'

They yield respectively

hello   awk   world
hello awk world
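
The same rebuilding is what trips up the question's pipeline, where $1="" is an assignment to a field. If you would rather stay in awk for the last step, one possible workaround (not part of the answer above, just a sketch) is to edit $0 directly with sub(), which changes the record without re-joining its fields. The function name is hypothetical:

```shell
# sort_by_length_awk_strip: hypothetical helper, awk-only removal of the length prefix.
sort_by_length_awk_strip() {
    awk '{ print length, $0 }' | sort -n -s |
        # sub() with no target operates on $0 itself: it deletes the leading
        # "<digits><space>" and leaves every other character, spaces included.
        awk '{ sub(/^[0-9]+ /, ""); print }'
}

printf '%s\n' 'hello   awk   world' 'hi' | sort_by_length_awk_strip

# For comparison, the question's final step rebuilds the record:
# the runs of spaces collapse and a leading space is left behind.
printf '%s\n' 'a   b' | awk '{ print length, $0 }' | awk '{ $1=""; print $0 }'
```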

The relevant section of the gawk manual mentions only as an aside that awk rebuilds the whole of $0 (joining the fields with OFS, a single space by default) whenever you assign to a field. That is why the question's $1="" step collapses the runs of spaces. I guess it's not crazy behaviour. The manual has this:

"Finally, there are times when it is convenient to force awk to rebuild the entire record, using the current value of the fields and OFS. To do this, use the seemingly innocuous assignment:"

 $1 = $1   # force record to be reconstituted
 print $0  # or whatever else with $0

"This forces awk to rebuild the record."

Test input including some lines of equal length:

aa A line   with     MORE    spaces
bb The very longest line in the file
ccb
9   dd equal len.  Orig pos = 1
500 dd equal len.  Orig pos = 2
ccz
cca
ee A line with  some       spaces
1   dd equal len.  Orig pos = 3
ff
5   dd equal len.  Orig pos = 4
g
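
To see the tie handling in isolation, the four equal-length lines from the test input can be run through both pipelines. This is a small sketch: the variable name ties is made up, and LC_ALL=C is set on the second sort so that its fallback comparison is plain byte order:

```shell
# The four equal-length lines, in their original input order.
ties='9   dd equal len.  Orig pos = 1
500 dd equal len.  Orig pos = 2
1   dd equal len.  Orig pos = 3
5   dd equal len.  Orig pos = 4'

# With -s the ties stay in input order: pos 1, 2, 3, 4.
printf '%s\n' "$ties" | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2-

# Without -s, sort breaks ties by comparing the whole lines,
# so the equal-length lines are reordered by their text.
printf '%s\n' "$ties" | awk '{ print length, $0 }' | LC_ALL=C sort -n | cut -d" " -f2-
```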
