Sort a text file by line length including spaces

Problem description

I have a CSV file that looks like this:

AS2345,ASDF1232, Mr. Plain Example, 110 Binary ave.,Atlantis,RI,12345,(999)123-5555,1.56
AS2345,ASDF1232, Mrs. Plain Example, 1121110 Ternary st.                                        110 Binary ave..,Atlantis,RI,12345,(999)123-5555,1.56
AS2345,ASDF1232, Mr. Plain Example, 110 Binary ave.,Liberty City,RI,12345,(999)123-5555,1.56
AS2345,ASDF1232, Mr. Plain Example, 110 Ternary ave.,Some City,RI,12345,(999)123-5555,1.56

I need to sort it by line length, including spaces. The following command doesn't include spaces; is there a way to modify it so that it works for me?

cat $@ | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}'

Solution

Answer

cat testfile | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2-

Or, to do your original (perhaps unintentional) sub-sorting of any equal-length lines:

cat testfile | awk '{ print length, $0 }' | sort -n | cut -d" " -f2-

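In both cases, we have solved your stated problem by moving away from awk for the final step: cut prints everything after the first space exactly as it was, so the spacing inside each line survives.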
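If you use this often, the pipeline can be wrapped in a small shell function. This is a minimal sketch: the name sort_by_length is hypothetical, and it drops the cat by letting awk read the file arguments (or standard input when there are none) directly. Like the answer, it relies on sort's -s option, which GNU and BSD sort provide.

```shell
# Hypothetical helper: print the lines of the given files (or stdin)
# ordered by length, where length counts every character, spaces included.
sort_by_length() {
    # 1. Decorate: prefix each line with its length and a single space.
    # 2. Sort numerically on that prefix; -s keeps equal-length lines
    #    in their original input order.
    # 3. Undecorate: cut keeps field 2 onward with the delimiters intact,
    #    so the original spacing of each line is untouched.
    awk '{ print length, $0 }' "$@" | sort -n -s | cut -d' ' -f2-
}

# Usage: reads from stdin here; pass file names to read files instead.
printf '%s\n' 'a   b' 'abc' 'a b c d' 'x' | sort_by_length
```

Because cut only splits on the first space that awk added, lines with leading spaces or several spaces in a row come out unchanged.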

Lines of matching length - what to do in the case of a tie:

The question did not specify whether or not further sorting was wanted for lines of matching length. I've assumed that this is unwanted and suggested the use of -s (--stable) to prevent such lines from being sorted against each other, keeping them in the relative order in which they occur in the input. (Without -s, sort breaks ties by comparing the entire lines, which is where the sub-sorting comes from.)
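To see the difference on ties, here is a small sketch that runs the pipeline with and without -s on three lines of equal length. It sets LC_ALL=C so that the fallback whole-line comparison is plain byte order; in other locales the tie order without -s may differ.

```shell
# Three lines of equal length (5 characters), deliberately out of
# alphabetical order.
input=$(printf '%s\n' 'b 222' 'a 111' 'c 333')

# With -s: tied lengths keep their input order.
stable=$(printf '%s\n' "$input" | awk '{ print length, $0 }' |
    LC_ALL=C sort -n -s | cut -d' ' -f2-)

# Without -s: tied lengths fall back to comparing the whole decorated
# lines, which under LC_ALL=C is byte order.
default=$(printf '%s\n' "$input" | awk '{ print length, $0 }' |
    LC_ALL=C sort -n | cut -d' ' -f2-)

printf 'with -s:\n%s\nwithout -s:\n%s\n' "$stable" "$default"
```

With -s the lines stay as b, a, c; without it they come out as a, b, c.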

(Those who want more control of sorting these ties might look at sort's --key option.)
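For example, one could keep the length as the primary key and order ties by the line text in reverse. This is only a sketch: the reverse tie order is an arbitrary illustration, and it assumes the decorated lines have exactly one space after the length (as awk's default OFS produces), so that with -t ' ' field 2 onward is the original line.

```shell
input=$(printf '%s\n' 'ccb' 'ff' 'ccz' 'cca' 'g')

# -t ' '  : split the decorated lines on single spaces
# -k1,1n  : primary key is field 1 only (the length), compared numerically
# -k2r    : secondary key runs from field 2 to the end of the line
#           (the original text), compared in reverse order
result=$(printf '%s\n' "$input" |
    awk '{ print length, $0 }' |
    LC_ALL=C sort -t ' ' -k1,1n -k2r |
    cut -d' ' -f2-)

printf '%s\n' "$result"
```

Here the three-character lines come out as ccz, ccb, cca, after g and ff.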

Why the question's attempted solution fails (awk line-rebuilding):

It is interesting to note the difference between:

echo "hello   awk   world" | awk '{print}'
echo "hello   awk   world" | awk '{$1="hello"; print}'

They yield, respectively:

hello   awk   world
hello awk world
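The same rebuilding explains what goes wrong in the question's own last step. This sketch runs that step (without the sort) on one sample line and brackets the result so the stray leading space is visible.

```shell
line='hello   awk   world'

# Decorate with the length, then do what the question's final awk does:
# blank out field 1 and print $0. Assigning to $1 makes awk rebuild $0
# from its fields joined by OFS (a single space by default), so the runs
# of spaces collapse and the now-empty first field leaves a leading space.
rebuilt=$(printf '%s\n' "$line" | awk '{ print length, $0 }' |
    awk '{$1=""; print $0}')

# Brackets make the leading space visible.
printf '[%s]\n' "$rebuilt"
```

This prints [ hello awk world], so the attempt both loses the internal spacing and adds a space at the start of every line.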

The relevant section of the gawk manual mentions only as an aside that awk rebuilds the whole of $0 (joining the fields with the output field separator, OFS) when you change one field. I guess it's not crazy behaviour. It has this:

"Finally, there are times when it is convenient to force awk to rebuild the entire record, using the current value of the fields and OFS. To do this, use the seemingly innocuous assignment:"

 $1 = $1   # force record to be reconstituted
 print $0  # or whatever else with $0

"This forces awk to rebuild the record."
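If you would rather keep awk for the final step, one option (a sketch, not part of the original answer) is to strip the length prefix with sub() applied to $0 itself. Changing $0 makes awk re-split it into fields, but it never rebuilds $0 from those fields, so the original spacing survives. The function name sort_by_length_awk is hypothetical.

```shell
# Hypothetical helper: same decorate/sort steps as the answer, but the
# undecorate step stays in awk without assigning to any field.
sort_by_length_awk() {
    awk '{ print length, $0 }' "$@" |
        sort -n -s |
        # sub() edits $0 directly, removing only the first match: the
        # leading digits and the single space awk added after them.
        awk '{ sub(/^[0-9]+ /, ""); print }'
}

printf '%s\n' 'a   b   c' 'xy' | sort_by_length_awk
```

Because the regex is anchored with ^ and sub() replaces only the first match, a line whose own text starts with digits and a space is left intact.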

Test input including some lines of equal length:

aa A line   with     MORE    spaces
bb The very longest line in the file
ccb
9   dd equal len.  Orig pos = 1
500 dd equal len.  Orig pos = 2
ccz
cca
ee A line with  some       spaces
1   dd equal len.  Orig pos = 3
ff
5   dd equal len.  Orig pos = 4
g
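As a quick check, this sketch feeds the test input above through the answer's stable pipeline. It passes the lines via printf instead of reading a file named testfile, quoting each one so its runs of spaces are kept exactly.

```shell
# The test input, one quoted line per printf argument.
input=$(printf '%s\n' \
    'aa A line   with     MORE    spaces' \
    'bb The very longest line in the file' \
    'ccb' \
    '9   dd equal len.  Orig pos = 1' \
    '500 dd equal len.  Orig pos = 2' \
    'ccz' \
    'cca' \
    'ee A line with  some       spaces' \
    '1   dd equal len.  Orig pos = 3' \
    'ff' \
    '5   dd equal len.  Orig pos = 4' \
    'g')

# The answer's stable pipeline: equal-length lines keep input order.
sorted=$(printf '%s\n' "$input" | awk '{ print length, $0 }' |
    sort -n -s | cut -d' ' -f2-)

printf '%s\n' "$sorted"
```

The output should start with g, ff, ccb, ccz, cca (the three-character lines in their input order), and the four "Orig pos" lines, which all have the same length, should come out in the order 1, 2, 3, 4.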
