Parse a CSV using awk and ignore commas inside a field


Question

I have a CSV file in which each row defines a room in a given building. Along with the room, each row has a floor field. What I want to extract is all floors in all buildings.

My file looks like this:

"u_floor","u_room","name"
0,"00BDF","AIRPORT TEST            "
0,0,"BRICKER HALL, JOHN W    "
0,3,"BRICKER HALL, JOHN W    "
0,5,"BRICKER HALL, JOHN W    "
0,6,"BRICKER HALL, JOHN W    "
0,7,"BRICKER HALL, JOHN W    "
0,8,"BRICKER HALL, JOHN W    "
0,9,"BRICKER HALL, JOHN W    "
0,19,"BRICKER HALL, JOHN W    "
0,20,"BRICKER HALL, JOHN W    "
0,21,"BRICKER HALL, JOHN W    "
0,25,"BRICKER HALL, JOHN W    "
0,27,"BRICKER HALL, JOHN W    "
0,29,"BRICKER HALL, JOHN W    "
0,35,"BRICKER HALL, JOHN W    "
0,45,"BRICKER HALL, JOHN W    "
0,59,"BRICKER HALL, JOHN W    "
0,60,"BRICKER HALL, JOHN W    "
0,61,"BRICKER HALL, JOHN W    "
0,63,"BRICKER HALL, JOHN W    "
0,"0006M","BRICKER HALL, JOHN W    "
0,"0008A","BRICKER HALL, JOHN W    "
0,"0008B","BRICKER HALL, JOHN W    "
0,"0008C","BRICKER HALL, JOHN W    "
0,"0008D","BRICKER HALL, JOHN W    "
0,"0008E","BRICKER HALL, JOHN W    "
0,"0008F","BRICKER HALL, JOHN W    "
0,"0008G","BRICKER HALL, JOHN W    "
0,"0008H","BRICKER HALL, JOHN W    "

What I want is all floors in all buildings.

I am using cat, awk, sort and uniq to obtain this list, but the "," inside the building name field (for example "BRICKER HALL, JOHN W") is throwing off my entire CSV generation.

cat Buildings.csv | awk -F, '{print $1","$2}' | sort | uniq > Floors.csv 

How can I get awk to split on commas but ignore a comma that sits between the double quotes of a field? Alternatively, does someone have a better solution?

Based on the answer suggesting an awk CSV parser, I was able to get a solution:

cat Buildings.csv | awk -f csv.awk | awk -F" -> 2|"  '{print $2}' | awk -F"|" '{print $2","$3}' | sort | uniq > floors.csv 

Here I run the csv.awk program first and then split on " -> 2|", which is part of the output format csv.awk produces. The print $2 there prints only the parsed CSV contents, because the program prints the original line followed by " -> #", where # is the number of fields (i.e. columns) parsed from the line. From there I split the result on "|", which csv.awk emits in place of the commas. Then sort, uniq, redirect to a file, and done!

Thanks for the help.

Recommended answer

The extra output you're getting from csv.awk comes from its demo code. The intent is that you use the functions within the script to do the parsing and then output the result however you want.

At the end of csv.awk is a { ... } block that demonstrates one of the functions. That is the code that outputs the -> 2|.

Instead of most of that, just call the parsing function and do print csv[1], csv[2].

That part of the code would look like:

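{
    num_fields = parse_csv($0, csv, ",", "\"", "\"", "\\n", 1);
    if (num_fields < 0) {
        printf "ERROR: %s (%d) -> %s\n", csverr, num_fields, $0;
    } else {
#        printf "%s -> ", $0;
#        printf "%s", num_fields;
#        for (i = 0;i < num_fields;i++) {
#            printf "|%s", csv[i];
#        }
#        printf "|\n";
        # The demo loop above indexes csv from 0; if the array is zero-based,
        # the first two fields are csv[0] and csv[1] rather than csv[1] and csv[2].
        print csv[1], csv[2]
    }
}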

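Save it as your_script (for example).

Do chmod +x your_script. To be run directly, the file needs an awk shebang line at the top (for example #!/usr/bin/awk -f); otherwise invoke it as awk -f your_script.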

Also, cat is unnecessary, since awk can read the file directly. And you can use sort -u instead of sort | uniq.
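As a quick check of that last point, the sketch below runs both pipelines on a small made-up sample and compares the results. It assumes a POSIX sort and uniq; the sample values are illustrative only and are not taken from Buildings.csv.

```shell
# Sample floor,room pairs with duplicates (illustrative values only).
sample='0,3
0,0
0,3
0,0'

# Two processes: sort, then drop adjacent duplicate lines.
with_uniq=$(printf '%s\n' "$sample" | sort | uniq)

# One process: sort and de-duplicate in a single step.
with_u=$(printf '%s\n' "$sample" | sort -u)

# Both variables now hold the same two unique lines.
[ "$with_uniq" = "$with_u" ] && echo "same output"
```

The single-process form is simply shorter; it matters only when no uniq option such as -c is needed.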

Your command would then look like:

./your_script Buildings.csv | sort -u > floors.csv
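For readers who do not have csv.awk at hand, the quote-aware splitting can be sketched in plain awk. This is not the code from csv.awk: split_csv and floors_and_rooms are hypothetical names introduced here, and the sketch assumes one record per line, fields optionally wrapped in double quotes, and "" inside a quoted field standing for a literal quote. Embedded newlines are not handled.

```shell
# floors_and_rooms: hypothetical helper, not part of csv.awk.
# Reads CSV from the named file (or stdin), skips the header row, and
# prints the first two fields of every record as floor,room.
floors_and_rooms() {
    awk '
    # Split one CSV line into out[1..n], honouring double-quoted fields.
    function split_csv(line, out,    n, i, c, inq, cur) {
        split("", out)                 # clear any previous contents
        n = 0; inq = 0; cur = ""
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            if (inq) {
                if (c == "\"") {
                    # A doubled quote inside quotes is a literal quote.
                    if (substr(line, i + 1, 1) == "\"") { cur = cur "\""; i++ }
                    else inq = 0
                } else cur = cur c     # commas are kept while inside quotes
            } else if (c == "\"") inq = 1
            else if (c == ",") { out[++n] = cur; cur = "" }
            else cur = cur c
        }
        out[++n] = cur
        return n
    }
    NR > 1 {
        split_csv($0, f)
        print f[1] "," f[2]
    }' "$@"
}

# Example run on three rows from the question, fed through stdin.
floors_and_rooms <<'EOF' | sort -u
"u_floor","u_room","name"
0,0,"BRICKER HALL, JOHN W    "
0,3,"BRICKER HALL, JOHN W    "
0,"0008A","BRICKER HALL, JOHN W    "
EOF
```

Walking the line character by character is slower than field splitting with -F, but it keeps the sketch portable across awk implementations. A dedicated parser such as csv.awk remains the better choice for real data.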
