Should I use sed, awk, or Perl for altering text spanning multiple lines and selecting only the info needed?

Question

I'm working on a project for class where we take a file full of entries describing classes, like the one below:

CSC 1010 - COMPUTERS & APPLICATIONS
Computers and Applications. Prerequisite: high school Algebra II. History of computers, >hardware components, operating systems, applications software, data communication.
3.000 Credit hours

and turn it into:

CSC1010,COMPUTERS & APPLICATIONS,3

I used:

sed -n 's/^CSC /CSC/p' courses.txt > practice.txt

which outputs:

CSC1010 - COMPUTERS & APPLICATIONS
CSC1310 - INTRO COMP PROGRAMMING NON-MAJ
CSC2010 - INTRO TO COMPUTER SCIENCE
CSC2310 - PRIN OF COMPUTER PROGRAMMING
CSC2320 - FUND OF WEBSITE DEVELOPMENT
CSC2510 - THEOR FOUNDATIONS OF COMP SCI
CSC3010 - HISTORY OF COMPUTING
CSC3210 - COMPUTER ORG & PROGRAMMING
CSC3320 - SYSTEM-LEVEL PROGRAMMING
CSC3330 - C++ PROGRAMMING
CSC3410 - DATA STRUCTURES-CTW
CSC4110 - EMBEDDED SYSTEMS
CSC4120 - INTRODUCTION TO ROBOTICS

I also used:

sed -n 's/\.000 Credit hours//p' courses.txt > courses10.txt

which outputs:

3
3
3
3
3
3
3
3
3
3
3
4
4
4
4
4
4
4

My problem is deciding whether sed, awk, or Perl would be the better tool. So far, I've used sed to eliminate the lines that contain neither the course title nor the number of credit hours, as shown above. I was hoping to use a regular expression to go through the file and keep each line that starts with "CSC" or contains ".000 Credit hours". I figured that once I had that output, I could use a sed command to remove the newline from the end of each line starting with CSC and replace it with a comma. After that, I would replace the " - " separator with a comma. However, I think that would require an extended regular expression, so sed would probably be out. The regular expression I was considering is (^CSC |[0-9]\.000). So, should I be doing this in sed, awk, or Perl? Please include your reasoning as to why the method you suggest would be more efficient.

Recommended answer

In Perl:

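while (<>) {
  chomp;  # strip the newline so the credit count can be appended to the title line
  # Title line: "CSC 1010 - TITLE" becomes "CSC1010,TITLE" (printed without a newline)
  print if s/^CSC\s+/CSC/ and s/\s+-\s+/,/;
  # Credit line: "3.000 Credit hours" becomes ",3" and ends the output line
  printf ",%.0f\n", $1 if /^([\d.]+)\s+Credit hours/;
}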
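
For comparison, the same pairing logic could probably be written in awk as well. The following is only a sketch and not part of the original answer: the function name courses_to_csv is hypothetical, and it assumes that every title line has the form "CSC <number> - <title>" and is followed, before the next title, by a line beginning with the credit count, as in the sample above.

```shell
# Hypothetical helper: print "CODE,TITLE,CREDITS" for each course entry.
# Reads the files given as arguments, or standard input if there are none.
courses_to_csv() {
  awk '
    # Title line, e.g. "CSC 1010 - COMPUTERS & APPLICATIONS"
    /^CSC[ \t]+[0-9]+[ \t]+-[ \t]+/ {
      sub(/^CSC[ \t]+/, "CSC")      # "CSC 1010" -> "CSC1010"
      sub(/[ \t]+-[ \t]+/, ",")     # first " - " separator -> ","
      title = $0                    # remember it until the credit line arrives
      next
    }
    # Credit line, e.g. "3.000 Credit hours"
    /^[0-9.]+[ \t]+Credit hours/ && title != "" {
      printf "%s,%.0f\n", title, $1
      title = ""                    # avoid pairing one title with two credit lines
    }
  ' "$@"
}
```

It might be used as courses_to_csv courses.txt > courses.csv. Like the Perl version, it handles both kinds of line in a single pass, so no separate step is needed to join the title line with its credit count.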
