How to restrict a find and replace to only one column within a CSV?


Question


I have a 4-column CSV file, e.g.:

    0001 @ fish @ animal @ eats worms

I use sed to do a find and replace on the file, but I need to limit this find and replace to only the text found inside column 3.

How can I have a find and replace only occur on this one column?

Answer

Are you sure you want to use sed? What about csvfix? Is your CSV nice and simple, with no quotes, embedded commas, or other nasties that make regexes a less than satisfactory way of dealing with a general CSV file? I'm assuming that the @ is the 'comma' in your format.

Consider using awk instead of sed:

    awk -F@ '$3 ~ /pattern/ { OFS = "@"; $3 = "replace" } { print }'

The trailing { print } block prints every line, whether or not the third field was changed. Arguably, you should have a BEGIN block that sets OFS once. For one line of input, it didn't make any odds (and you'd probably be hard-pressed to measure a difference on a million lines of input, too):

    $ echo "pattern @ pattern @ pattern @ pattern" |
    > awk -F@ '$3 ~ /pattern/ { OFS = "@"; $3 = "replace" } { print }'
    pattern @ pattern @replace@ pattern
    $
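The BEGIN-block variant mentioned above could look something like the following. This is only a sketch: the function name replace_col3 is made up for illustration, and it assumes a simple @-separated file with no quoting.

```shell
# Hypothetical helper name; reads "@"-separated lines from stdin or from files.
replace_col3() {
    awk -F@ '
        BEGIN { OFS = "@" }                # set the output separator once
        $3 ~ /pattern/ { $3 = "replace" }  # touch only the third field
        { print }                          # print every line, changed or not
    ' "$@"
}

echo "pattern @ pattern @ pattern @ pattern" | replace_col3
echo "pattern @ pattern @ other @ pattern"   | replace_col3
```

Lines whose third field does not match are printed untouched, because awk only rebuilds the record when a field is assigned.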

If sed still seems appealing, then:

    sed '/^\([^@]*@[^@]*\)@pattern@\(.*\)/ s//\1@replace@\2/'

For example (note that the input here has no spaces around the @ signs, so the input and output differ slightly from the awk example; you can fix it to behave the same as the awk version quite easily if need be):

    $ echo "pattern@pattern@pattern@pattern" |
    > sed '/^\([^@]*@[^@]*\)@pattern@\(.*\)/ s//\1@replace@\2/'
    pattern@pattern@replace@pattern
    $

The first regex looks for the start of a line, a field of non-at-signs, an at-sign, and another field of non-at-signs, and remembers the lot; it then looks for an at-sign, the pattern (which must be in the third field, since the first two fields have been matched already), another at-sign, and then the residue of the line. The empty regex in the s// command reuses that same regex. When the line matches, the line is replaced with the first two fields (unchanged, as required), then the replacement third field, and then the residue of the line (unchanged, as required).
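One way the sed command might be adjusted to behave like the awk version, that is, to replace the whole third field whenever it contains the pattern, spaces included, is sketched below. The function name sed_replace_col3 is invented for the example, and like the original it assumes there is a fourth field after the third.

```shell
# Hypothetical helper name; reads "@"-separated lines from stdin or from files.
sed_replace_col3() {
    # \1 captures the first two fields plus their trailing "@".
    # [^@]*pattern[^@]* then matches a whole third field containing "pattern".
    sed 's/^\([^@]*@[^@]*@\)[^@]*pattern[^@]*@/\1replace@/' "$@"
}

echo "pattern @ pattern @ pattern @ pattern" | sed_replace_col3
echo "a@b@c@pattern"                         | sed_replace_col3
```

Because [^@]* cannot cross a separator, a match in any other column leaves the line alone.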

If you need to edit rather than simply replace the third field, then think about using awk or Perl or Python. If you are still constrained to sed, you could explore using the hold space to hold part of the line while you manipulate the other part in the pattern space, and then reassemble your desired output line from the hold space and pattern space before printing it. That's nearly as messy as it sounds; actually, possibly even messier than it sounds. I'd go with Perl (because I learned it long ago and it does this sort of thing quite easily), but you can use whichever non-sed tool you like.
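For the awk route, editing text inside the third field (rather than replacing the field wholesale) might look like this sketch. The name edit_col3 is hypothetical, and it assumes every line has at least three @-separated fields and no quoting.

```shell
# Hypothetical helper name; reads "@"-separated lines from stdin or from files.
edit_col3() {
    awk -F@ '
        BEGIN { OFS = "@" }                         # keep "@" as the separator on output
        { gsub(/pattern/, "replace", $3); print }   # substitute inside field 3 only
    ' "$@"
}

echo "0001 @ fish @ a pattern here @ pattern stays" | edit_col3
```

Since the output separator equals the input separator, lines are reproduced exactly apart from the edits made inside column 3.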


Perl editing the third field. Note that the default output is $_, which has to be reassembled from the auto-split fields in the array @F.

    $ echo "pattern@pattern@pattern@pattern" | sh -x xxx.pl
    > perl -pa -F@ -e '$F[2] =~ s/\s*pat(\w\w)rn\s*/ prefix-$1-suffix /; $_ = join "@", @F; ' "$@"
    pattern@pattern@ prefix-te-suffix @pattern
    $

An explanation. The -p means 'loop, reading lines into $_ and printing $_ at the end of each iteration'. The -a means 'auto-split $_ into the array @F'. The -F@ means the field separator is @. The -e is followed by the Perl program.

Arrays are indexed from 0 in Perl, so the third field is split into $F[2] (the sigil, the @ or $, changes depending on whether you're working with a single value from the array or the array as a whole). The =~ is the binding operator; it applies the regex operation on the RHS to the value on the LHS. The substitute pattern recognizes zero or more spaces (\s*), followed by pat, then two 'word' characters which are remembered into $1, then rn and zero or more spaces again; maybe there should be a ^ and $ in there to bind to the start and end of the field. The replacement is a space, 'prefix-', the remembered pair of letters, '-suffix', and a space. The $_ = join "@", @F; reassembles the input line $_ from the possibly modified separate fields, and then the -p prints that out.

Not quite as tidy as I'd like (so there's probably a better way to do it), but it works. And you can do arbitrary transforms on arbitrary fields in Perl without much difficulty. Perl also has a module Text::CSV (and a high-speed C version, Text::CSV_XS) which can handle really complex CSV files.
