Ignoring comma in field of CSV file with awk

Problem description

I'm trying to get a number from the second field of the last row of a CSV file. So far, I have this:

awk -F"," 'END {print $2}' /file/path/fileName.csv

This works, unless the first field in the last row has a comma in it. So for a row that looks like this,

"Company Name, LLC", 12345, Type1, SubType3

...where "Company Name, LLC" is actually the first field, the awk command will return LLC.

How do I ignore the commas in the first field so that I can obtain the information in the second?

Recommended answer

I think your requirement is the perfect use case for FPAT in GNU Awk. Quoting the GNU Awk manual:

Normally, when using FS, gawk defines the fields as the parts of the record that occur in between each field separator. In other words, FS defines what a field is not, instead of what a field is. However, there are times when you really want to define the fields by what they are, and not by what they are not.

The most notorious such case is so-called comma-separated values (CSV) data. If commas only separated the data, there wouldn’t be an issue. The problem comes when one of the fields contains an embedded comma. In such cases, most programs embed the field in double quotes.

In the case of CSV data as presented here, each field is either "anything that is not a comma," or "a double quote, anything that is not a double quote, and a closing double quote." If written as a regular expression constant (see Regexp), we would have /([^,]+)|("[^"]+")/. Writing this as a string requires us to escape the double quotes, leading to:

FPAT = "([^,]+)|(\"[^\"]+\")"

Using that on your input file,

awk 'BEGIN{FPAT = "([^,]+)|(\"[^\"]+\")"}{print $1}' file
"Company Name, LLC"
