Can I use grep to extract a single column of a CSV file?

Question

I'm trying to solve a problem that I have to finish as soon as possible. I have a CSV file with fields separated by ;. I've been asked to write a shell command that uses grep with a regex to list only the third column. I can't use cut. It is an exercise.

My file looks like this:

1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008
2;Wayne;Watkins;22;Lanme Place;Cotoiwi;NC;86578
3;Danny;Vega;25;Fofci Center;Momahbih;MS;21027
4;Larry;Robinson;23;Bammek Boulevard;Gaizatoh;NE;27517
5;Myrtie;Black;20;Savon Square;Gokubpat;PA;92219
6;Nellie;Greene;23;Utebu Plaza;Rotvezri;VA;17526
7;Clyde;Reynolds;19;Lupow Ridge;Kedkuha;WI;29749
8;Calvin;Reyes;47;Paad Loop;Beejdij;KS;29247
9;Douglas;Graves;43;Gouk Square;Sekolim;NY;13226
10;Josephine;Estrada;48;Ocgig Pike;Beheho;WI;87305
11;Eugene;Matthews;26;Daew Drive;Riftemij;ME;93302
12;Stanley;Tucker;54;Cure View;Woocabu;OH;45475
13;Lina;Holloway;41;Sajric River;Furutwe;ME;62184
14;Hettie;Carlson;57;Zuheho Pike;Gokrobo;PA;89098
15;Maud;Phelps;57;Lafni Drive;Gokemu;MD;87066
16;Della;Roberson;53;Zafe Glen;Celoshuv;WV;56749
17;Cory;Roberson;56;Riltav Manor;Uwsupep;LA;07983
18;Stella;Hayes;30;Omki Square;Figjitu;GA;35813
19;Robert;Griffin;22;Kiroc Road;Wiregu;OH;39594
20;Clyde;Reynolds;19;Lupow Ridge;Kedkuha;WI;29749
21;Calvin;Reyes;47;Paad Loop;Beejdij;KS;29247
22;Douglas;Graves;43;Gouk Square;Sekolim;NY;13226
23;Josephine;Estrada;48;Ocgig Pike;Beheho;WI;87305
24;Eugene;Matthews;26;Daew Drive;Riftemij;ME;93302

I think I should use something like: cat < test.csv | grep 'regex'.

Thanks.

Recommended answer

Right Tools For The Job: Using awk or cut

Assuming you want to match the third column against a specific field:

awk -F';' '$3 ~ /Foo/ { print $0 }' file.txt

...will print any line where the third field contains Foo. (Changing print $0 to print $3 would print only that third field).
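As a rough sketch of what "contains" means here, compared with an exact comparison on the third field (the sample rows and the function names are made up for illustration, they are not part of the question's file):

```shell
# Hypothetical sample rows; only the third field matters here.
sample() {
  printf '%s\n' '1;Evan;Bell;39' '2;Tim;Bellamy;22' '3;Danny;Vega;25'
}

# Regex match: prints every third field that contains "Bell".
contains_bell() { sample | awk -F';' '$3 ~ /Bell/ { print $3 }'; }

# Exact match: prints the third field only when it equals "Bell".
equals_bell() { sample | awk -F';' '$3 == "Bell" { print $3 }'; }
```

Calling contains_bell prints both Bell and Bellamy, while equals_bell prints only Bell.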

If you just want to print the third column regardless, use cut: cut -d';' -f3 <file.txt
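For comparison, here is a small sketch that wraps the cut call and an equivalent awk call so they can be run against the same input. The helper names are hypothetical, and the two rows are taken from the question's sample data:

```shell
# Hypothetical helper names; both read semicolon-separated rows on stdin.
third_with_cut() { cut -d';' -f3; }
third_with_awk() { awk -F';' '{ print $3 }'; }

# Feed two rows from the question through the cut variant.
printf '%s\n' \
  '1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008' \
  '2;Wayne;Watkins;22;Lanme Place;Cotoiwi;NC;86578' \
  | third_with_cut
```

On rows like these, both helpers should print the same third field for each line.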

On a system where grep has the -o option, you can chain two instances together -- one to trim everything after the fourth column (and remove lines with fewer than four columns), another to take only the last remaining column (thus, the fourth):

str='foo;bar;baz;qux;meh;whatever'
grep -Eo '^[^;]*[;][^;]*[;][^;]*[;][^;]*' <<<"$str" \
  | grep -Eo '[^;]+$'

To explain how this works:

  • ^, outside of square brackets, matches only at the beginning of a line.
  • [^;]* matches any character except ; zero-or-more times.
  • [;] matches only the character ;.

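...thus, each [^;]*[;] in the regex matches a single field along with its trailing separator, whether or not that field contains text. The first stage chains three of those followed by a final [^;]*, so it matches exactly the first four fields, and grep -o tells grep to emit only the content it was able to match. For the third column, as the question asks, use one fewer [^;]*[;] group in the first stage.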
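Adapting the same two-stage idea to the third column the question asks about might look like the sketch below. The function name third_column is made up for illustration, and it assumes a grep that supports -E and -o (GNU and BSD grep do):

```shell
# Hypothetical helper: reads semicolon-separated rows on stdin and
# prints the third field of each row.
third_column() {
  # Stage 1: keep only the first three fields; rows with fewer are dropped.
  # Stage 2: keep only the text after the last remaining semicolon.
  grep -Eo '^[^;]*;[^;]*;[^;]*' | grep -Eo '[^;]+$'
}

# Two rows taken from the question's sample data.
printf '%s\n' \
  '1;Evan;Bell;39;Obigod Manor;Ekjipih;TN;25008' \
  '2;Wayne;Watkins;22;Lanme Place;Cotoiwi;NC;86578' \
  | third_column
```

Because the second stage uses [^;]+, a row whose third field is empty produces no output at all, which differs from cut -f3, which would print an empty line for it.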
