How to parse contents of a file using sed/awk?


Problem description


My input file has its content in the following format, where each column is separated by a single space:

string1<space>string2<space>string3<space>YYYY-mm-dd<space>hh:mm:ss.SSS<space>string4<space>10:1234567890<space>0e:Apple 1.2.3.4<space><space>string5<space>HEX  

There are 2 spaces after "0e:Apple 1.2.3.4" because the 14th character of this field/column is itself a space. The entire "0e:Apple 1.2.3.4<space>" is treated as a single value of that column.

In the 7th column, 10: represents the count of characters in the following string.

In the 8th column, 0e: is the hex value of 14. So the hex value gives the count of characters in the string that follows.

Like:

"0e:Apple 1.2.3.4 "--> this is the actual value in 8th column without " "  
    (I've added the quotes only to show that the 14th character is a space)

It's counted as  
0e:A  p  p  l  e     1  .  2  .  3  .  4
   |  |  |  |  |  |  |  |  |  |  |  |  |  |
   1  2  3  4  5  6  7  8  9  10 11 12 13 14
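
To make the counting rule concrete, here is a small Python sketch that splits a counted value into its declared length and its payload. The helper name `split_counted` is made up for illustration, and the choice of base per column (decimal for `10:`, hex for `0e:`) is an assumption taken from the description above.

```python
def split_counted(value, base):
    """Split 'NN:payload' into (declared_length, payload).

    base is the number base of the count prefix (assumed: 10 or 16).
    """
    count_text, payload = value.split(":", 1)
    return int(count_text, base), payload


# The 8th column: the hex prefix 0e declares 14 characters,
# and the payload (including its trailing space) is 14 characters long.
declared, payload = split_counted("0e:Apple 1.2.3.4 ", base=16)
print(declared, len(payload))
```

If the declared length and the payload length disagree, the row does not follow the format described here.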

Let's consider first row from the input file as:

string1 string2 string3 yyyy-mm-dd 23:50:45.999 string4 10:1234567890 0e:Apple 1.2.3.4  string5 001e  

where:

  • string1 is the value in 1st column
  • string2 is the value in 2nd column
  • string3 is the value in 3rd column
  • yyyy-mm-dd in 4th
  • 23:50:45.999 in 5th
  • string4 in 6th
  • 10:1234567890 in 7th //there is no space at the end because it has 10 digits
  • 0e:Apple 1.2.3.4 in 8th //space at the end
  • string5 in 9th
  • 001e in 10th

Expected output:

string1,string2,string3,yyyy-mm-dd,23:50:45.999,string4,1234567890,Apple_1.2.3.4,string5,30

Requirements:

  1. Eliminate the counts from 7th and 8th column (10: & 0e:)
  2. The space between Apple and 1.2.3.4 should be replaced by "_"
  3. Hex value in the last column should be converted to decimal value.
  4. Replace the "space" between columns with ","
  5. I've used hex value only in 10th column here. What if it's in several columns? Any way to convert it specific to certain columns?
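
For requirement 5, one possible approach is to keep a set of column numbers that hold hex values and convert only those. The sketch below is in Python rather than sed/awk, and the function name `hex_columns_to_decimal` and the sample row are hypothetical, used only to illustrate the idea.

```python
def hex_columns_to_decimal(fields, hex_columns):
    """Return a copy of fields with the given 1-based columns converted from hex to decimal."""
    converted = list(fields)
    for column in hex_columns:
        # int(text, 16) accepts hex digits with or without leading zeros.
        converted[column - 1] = str(int(converted[column - 1], 16))
    return converted


# Hypothetical three-column row where columns 2 and 3 are hex.
row = ["string5", "001e", "ff"]
print(hex_columns_to_decimal(row, hex_columns={2, 3}))
```

Adding or removing a column number in `hex_columns` is the only change needed when the set of hex columns differs.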

I've tried using this:

$ cat input.txt |sed 's/[a-z0-9].*://g'  

which gives output as:

string1,string2,string3,yyyy-mm-dd,45.999,string4,1234567890,Apple,1.2.3.4,,string5,001e  

Solution

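This will do what you want on your example input (the empty field $10 produced by the double space is left out of the printf so that the output has no empty column):

awk -F "[ ]" '{sub(/.*:/, "", $7); sub(/.*:/, "", $8); printf "%s,%s,%s,%s,%s,%s,%s,%s_%s,%s,%d\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $11, "0x"$12}' input.txt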

Explanation of parts:

awk printf allows you to specify an output format, so you can manually specify which fields you want to delimit with , and which you want to delimit with _.

-F "[ ]" forces the field separator to be a single space so that it knows there is an empty field between two single spaces. The default behavior would be to allow multiple spaces to be a single delimiter, which is not what you want according to the question.
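
The same distinction exists in Python, which makes it easy to see the effect. This is only an analogy to awk's field splitting, not awk itself: `str.split()` with no argument merges runs of whitespace, while `str.split(" ")` splits on every single space and keeps the empty field.

```python
# Tail of the example row, with two spaces after "1.2.3.4".
line = "0e:Apple 1.2.3.4  string5 001e"

merged = line.split()     # runs of whitespace act as one delimiter
single = line.split(" ")  # every single space is a delimiter

print(merged)  # ['0e:Apple', '1.2.3.4', 'string5', '001e']
print(single)  # ['0e:Apple', '1.2.3.4', '', 'string5', '001e']
```

The empty string in the second result corresponds to the empty field that awk sees between the two spaces when the separator is forced to a single space.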

The sub function allows you to do regular expression replacement, in this case removing the ..: prefix in fields 7 and 8.

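For field 12, we tell printf to output as a number (%d) and give as input the string prefixed by 0x so that it interprets it as hexadecimal. Note that this depends on the awk implementation: GNU awk, for example, does not convert a 0x-prefixed string as hexadecimal by default and provides strtonum() for that purpose.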

Note: If it's not always the case that you want the output to be $8_$9, then you actually need to parse the hexadecimal prefix and count off characters in order to determine where the field ends. If that's the case, I would personally prefer to write the whole thing in something else, e.g. Python.
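
As a rough sketch of what that could look like in Python: the function below walks the line column by column, reads the count prefix for the counted columns, and takes exactly that many characters. The name `parse_row`, its parameters, and the defaults are assumptions made for this example (column 7 with a decimal count, column 8 with a hex count, column 10 holding a hex value, as in the question), not an established solution.

```python
def parse_row(line, counted=None, hex_columns=(10,)):
    """Parse one space-separated row that contains length-prefixed columns.

    counted maps a 1-based column number to the base of its 'NN:' count prefix.
    hex_columns lists the 1-based columns whose value is converted hex -> decimal.
    """
    if counted is None:
        # Assumption from the question: column 7 uses a decimal count,
        # column 8 uses a hex count.
        counted = {7: 10, 8: 16}
    fields, pos, column = [], 0, 1
    while pos < len(line):
        if column in counted:
            colon = line.index(":", pos)
            length = int(line[pos:colon], counted[column])
            value = line[colon + 1:colon + 1 + length]
            pos = colon + 1 + length
            # Drop padding spaces, then join the remaining words with "_".
            value = value.strip().replace(" ", "_")
        else:
            end = line.find(" ", pos)
            if end == -1:
                end = len(line)
            value = line[pos:end]
            pos = end
        if column in hex_columns:
            value = str(int(value, 16))
        fields.append(value)
        pos += 1  # skip the single separator space
        column += 1
    return ",".join(fields)


row = ("string1 string2 string3 yyyy-mm-dd 23:50:45.999 string4 "
       "10:1234567890 0e:Apple 1.2.3.4  string5 001e")
print(parse_row(row))
```

When reading from a file, strip the trailing newline first, e.g. `parse_row(line.rstrip("\n"))`. Because the counted columns are read by length rather than by searching for the next space, values with any number of embedded spaces are handled, which the fixed `$8_$9` format in the awk one-liner cannot do.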
