Merge 2 lines into one


Problem description

I have a text file in which each line starts with a 9-digit college code and ends with a 5-digit course code.

512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,
Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017

Some entries contain a line break, as in the third entry above. I need to merge the 3rd and 4th lines into one, just like the 1st and 2nd lines, so that I can easily use commands like grep, awk, etc.

Update:

Kevin's answer does not seem to work.

cat todel.txt
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531

cat todel.txt | perl -ne 'chomp; if (/^\d{9}/) { print "\n$_" } else { print "$_\n" }' 
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531ege of,

Solution

Regarding split lines: this sed script assumes that there is at least one space after the leading number (on the first line of a split record) and one space before the trailing number (on the last line of a split record), and that each split record is broken only once, i.e. it spans exactly two lines.

Modified to accept input with either Windows CRLF newlines or *nix LF newlines. Note that the output always uses the *nix \n.
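To check whether a file actually carries CRLF line endings (which would be consistent with the overwritten-looking Perl output in the question), a small sketch like the following may help. The helper name count_cr_lines is made up for this illustration, and the sample input is fabricated to keep the block self-contained.

```shell
# count_cr_lines is a hypothetical helper, not part of the original answer.
# It prints the number of lines that contain a carriage return,
# reading the named files or, with no arguments, standard input.
count_cr_lines() {
    cr=$(printf '\r')      # a literal CR character, built portably
    grep -c "$cr" "$@"
}

# Two sample lines: the first ends in CRLF, the second in LF only.
printf 'a\r\nb\n' | count_cr_lines
```

A non-zero count suggests the file came from a Windows tool. Note that grep -c exits with status 1 when the count is zero, which matters in scripts that run under set -e.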

sed -nr 's/\r?$// # allow for CRLF newlines
         /^([0-9]{9}) .* ([0-9]{5})$/{p;b}
         /^([0-9]{9}) /{h;b}
         / ([0-9]{5})$/{x;G; s/\n//; p}'
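As a hedged illustration of how the script above behaves, the sketch below wraps the same commands in a shell function and feeds it three CRLF-terminated lines taken from the question. The function name merge_split_lines is made up here, and GNU sed is assumed, as in the answer.

```shell
# merge_split_lines is a hypothetical wrapper name; the sed commands are
# the ones from the answer, with comments moved onto their own lines.
merge_split_lines() {
    sed -nr '
        # strip an optional trailing CR so CRLF input is accepted
        s/\r?$//
        # complete record (9-digit start, 5-digit end): print it, next line
        /^([0-9]{9}) .* ([0-9]{5})$/{p;b}
        # first half of a split record: save it in the hold space
        /^([0-9]{9}) /{h;b}
        # second half: swap in the saved half, append this one, join, print
        / ([0-9]{5})$/{x;G;s/\n//;p}
    ' "$@"
}

# One complete record followed by one record split across two lines.
printf '%s\r\n' \
    '517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555' \
    '617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,' \
    'Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017' |
    merge_split_lines
```

With GNU sed this should print two LF-terminated lines: the complete record unchanged, and the two halves joined with no separator at the seam. Passing a file name instead of piping (merge_split_lines todel.txt) works the same way, since the arguments are handed straight to sed.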

or, shorter, but perhaps less readable:

sed -nr 's/\r?$//; /^([0-9]{9}) /{/ ([0-9]{5})$/{p;b};h;b};/ ([0-9]{5})$/{x;G; s/\n//; p}' 

I expect the first one to be faster, because the most frequent case (a complete line) is decided by a single regex test, whereas the second (shorter) script needs two regex tests for that same case.

This is the output I get, using GNU sed 4.2.1:

512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
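The joined records above have no space at the seam ("Engineering,Pune"). If a space is wanted there, one possible variation, which is not part of the original answer, replaces the embedded newline with a space instead of deleting it. The function name merge_split_lines_spaced is hypothetical, and GNU sed is again assumed.

```shell
# merge_split_lines_spaced is a hypothetical name for this variation.
merge_split_lines_spaced() {
    sed -nr '
        # strip an optional trailing CR
        s/\r?$//
        # complete record: print as-is
        /^([0-9]{9}) .* ([0-9]{5})$/{p;b}
        # first half of a split record: keep it in the hold space
        /^([0-9]{9}) /{h;b}
        # second half: the only change is the space in the replacement
        / ([0-9]{5})$/{x;G;s/\n/ /;p}
    ' "$@"
}

printf '%s\n' \
    '617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,' \
    'Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017' |
    merge_split_lines_spaced
```

For this input the result should read "... College of Engineering, Pune 61220 ..." on a single line. Whether the space belongs there depends on how the original records were broken, so treat this as an option rather than a correction.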
