Parse a CSV using awk and ignoring commas inside a field

Question

I have a csv file where each row defines a room in a given building. Along with room, each row has a floor field. What I want to extract is all floors in all buildings.

My file looks like this...

"u_floor","u_room","name"
0,"00BDF","AIRPORT TEST            "
0,0,"BRICKER HALL, JOHN W    "
0,3,"BRICKER HALL, JOHN W    "
0,5,"BRICKER HALL, JOHN W    "
0,6,"BRICKER HALL, JOHN W    "
0,7,"BRICKER HALL, JOHN W    "
0,8,"BRICKER HALL, JOHN W    "
0,9,"BRICKER HALL, JOHN W    "
0,19,"BRICKER HALL, JOHN W    "
0,20,"BRICKER HALL, JOHN W    "
0,21,"BRICKER HALL, JOHN W    "
0,25,"BRICKER HALL, JOHN W    "
0,27,"BRICKER HALL, JOHN W    "
0,29,"BRICKER HALL, JOHN W    "
0,35,"BRICKER HALL, JOHN W    "
0,45,"BRICKER HALL, JOHN W    "
0,59,"BRICKER HALL, JOHN W    "
0,60,"BRICKER HALL, JOHN W    "
0,61,"BRICKER HALL, JOHN W    "
0,63,"BRICKER HALL, JOHN W    "
0,"0006M","BRICKER HALL, JOHN W    "
0,"0008A","BRICKER HALL, JOHN W    "
0,"0008B","BRICKER HALL, JOHN W    "
0,"0008C","BRICKER HALL, JOHN W    "
0,"0008D","BRICKER HALL, JOHN W    "
0,"0008E","BRICKER HALL, JOHN W    "
0,"0008F","BRICKER HALL, JOHN W    "
0,"0008G","BRICKER HALL, JOHN W    "
0,"0008H","BRICKER HALL, JOHN W    "

What I want is all floors in all buildings.

I am using cat, awk, sort and uniq to obtain this list, but the "," in the building name field (such as "BRICKER HALL, JOHN W") is throwing off my entire CSV generation.

cat Buildings.csv | awk -F, '{print $1","$2}' | sort | uniq > Floors.csv 

How can I get awk to use the comma but ignore a comma in between "" of a field? Alternatively, does someone have a better solution?

cat Buildings.csv | awk -f csv.awk | awk -F" -> 2|"  '{print $2}' | awk -F"|" '{print $2","$3}' | sort | uniq > floors.csv 

Here we use the csv awk program, and from there I split on " -> 2|", which is formatting produced by the csv awk program. The print $2 there prints only the CSV-parsed contents, because the program prints the original line followed by " -> #", where # is the number of fields parsed from the CSV (i.e. the columns). From there I can split this awk CSV result on the "|", which is what it replaces the commas with. Then sort, uniq, pipe out to a file, and done!

Thanks for the help.

Recommended answer

The extra output you're getting from csv.awk is from demo code. It's intended that you use the functions within the script to do the parsing and then output it how you want.

At the end of csv.awk is the { ... } loop which demonstrates one of the functions. It's that code that's outputting the -> 2|.

Instead of most of that, just call the parsing function and do print csv[1], csv[2].

That part of the code would then look like:

{
    num_fields = parse_csv($0, csv, ",", "\"", "\"", "\\n", 1);
    if (num_fields < 0) {
        printf "ERROR: %s (%d) -> %s\n", csverr, num_fields, $0;
    } else {
#        printf "%s -> ", $0;
#        printf "%s", num_fields;
#        for (i = 0;i < num_fields;i++) {
#            printf "|%s", csv[i];
#        }
#        printf "|\n";
        print csv[1], csv[2]
    }
}
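
For readers who do not have csv.awk at hand, the same idea can be sketched in plain awk. This is only a minimal sketch under stated assumptions, not the csv.awk implementation: `floors` and `split_csv` are hypothetical names, fields are assumed never to span multiple lines, and a doubled quote (`""`) inside a quoted field is treated as an escaped quote. Unlike the `csv` array in the demo loop above, the sketch fills its array starting at index 1.

```shell
#!/bin/sh
# floors FILE: print the unique "floor,room" pairs found in FILE.
# Hypothetical helper for illustration; it is not part of csv.awk.
floors() {
    awk '
    # Split line s into out[1..n], treating commas inside double quotes as data.
    # Surrounding quotes are stripped. Returns the number of fields.
    function split_csv(s, out,    n, i, c, inq, field) {
        n = 0; inq = 0; field = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (inq) {
                if (c == "\"") {
                    if (substr(s, i + 1, 1) == "\"") {
                        field = field "\""      # doubled quote: literal quote
                        i++
                    } else {
                        inq = 0                 # closing quote
                    }
                } else {
                    field = field c
                }
            } else if (c == "\"") {
                inq = 1                         # opening quote
            } else if (c == ",") {
                out[++n] = field                # unquoted comma ends the field
                field = ""
            } else {
                field = field c
            }
        }
        out[++n] = field
        return n
    }
    NR > 1 {                                    # skip the header row
        if (split_csv($0, f) >= 2) print f[1] "," f[2]
    }' "$1" | sort -u
}
```

With this sketch, `floors Buildings.csv > Floors.csv` would write one line per distinct floor and room pair, and a comma inside a building name no longer shifts the columns.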

Save it as your_script (for example).

Do chmod +x your_script.

And cat is unnecessary. Also, you can do sort -u instead of sort | uniq.
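
As a quick illustration of that last point, the two pipelines below print the same thing; the sample lines are made up for the example.

```shell
# Both commands print each distinct input line exactly once, in sorted order.
printf '%s\n' '0,3' '0,5' '0,3' | sort | uniq
printf '%s\n' '0,3' '0,5' '0,3' | sort -u
```

The `sort -u` form saves one process and one pipe, which is the only difference that matters here.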

Your command would then look like:

./your_script Buildings.csv | sort -u > floors.csv
