Using sed on text files with a csv
Problem description
I've been trying to do bulk find and replace on two text files using a csv. I've seen the questions that SO suggests, and none seem to answer my question.
I've created two variables for the two text files I want to modify. The csv has two columns and hundreds of rows. The first column contains strings (none have whitespaces) already in the text file that need to be replaced with the corresponding strings in same row in the second column.
As a test, I tried this script:
#!/bin/bash
test1='long_file_name.txt'
find='string1'
replace='string2'
sed -e "s/$find/$replace/g" $test1 > $test1.tmp && mv $test1.tmp $test1
This was successful, except that I need to do it once for every row in the csv, using the values given by the csv in each row. My hunch is that my while loop was used wrongly, but I can't find the error. When I execute the script below, I get the command line prompt, which makes me think that something has happened. When I check the text files, nothing's changed.
The two text files, this script, and the csv are all in the same folder (it's also been my working directory when I do this).
#!/bin/bash
textfile1='long_file_name1.txt'
textfile2='long_file_name2.txt'
while IFS=, read f1 f2
do
sed -e "s/$f1/$f2/g" $textfile1 > $textfile1.tmp && \
mv $textfile1.tmp $textfile1
sed -e "s/$f1/$f2/g" $textfile2 > $textfile2.tmp && \
mv $textfile2.tmp $textfile2
done <'findreplace.csv'
It seems to me that this code should do what I want it to do (but doesn't); perhaps I'm misunderstanding something fundamental (I'm new to bash scripting)?
The csv looks like this, but with hundreds of rows. All a_i's should be replaced with their counterpart b_i in the next column over.
a_1 b_1
a_2 b_2
a_3 b_3
Something to note: All the strings actually contain underscores, just in case this affects something. I've tried wrapping the variable name in braces a la ${var}, but it still doesn't work.
I appreciate the solutions, but I'm also curious to know why the above doesn't work. (Also, I would vote everyone up, but I lack the reputation to do so. However, know that I appreciate and am learning a lot from your answers!)
Recommended answer
If you are going to process a lot of data and your patterns can contain special characters, I would consider using Perl, especially if you are going to have a lot of pairs in findreplace.csv. You can use the following script as a filter or for in-place modification of many files. As a side effect, it loads the replacements and builds the Aho-Corasick automaton only once per invocation, which makes this solution pretty efficient (O(M+N) instead of O(M*N) in your solution).
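#!/usr/bin/perl
use strict;
use warnings;
use autodie;

# An optional first argument -i[EXT] turns on in-place editing, like perl -i.
# A '*' in EXT is replaced by the original file name.
my $in_place = ( @ARGV and $ARGV[0] =~ /^-i(.*)/ )
    ? do {
        shift;
        my $backup_extension = $1;
        my $backup_name = $backup_extension =~ /\*/
            ? sub { ( my $fn = $backup_extension ) =~ s/\*/$_[0]/; $fn }
            : sub { shift . $backup_extension };
        my $oldargv = '-';
        sub {
            # Called for every line; acts only when <> moves on to a new file.
            if ( $ARGV ne $oldargv ) {
                if ( length $backup_extension ) {
                    rename( $ARGV, $backup_name->($ARGV) );
                }
                else {
                    # No backup requested: remove the name so the output file
                    # below is a new file, not the one still being read.
                    unlink($ARGV);
                }
                open( ARGVOUT, '>', $ARGV );
                select(ARGVOUT);
                $oldargv = $ARGV;
            }
        };
    }
    : sub { };

die "$0: File with replacements required." unless @ARGV;

# Load the find,replace pairs once and build a single alternation regex
# from the (quoted) search strings.
my ( $re, %replace );
do {
    my $filename = shift;
    open my $fh, '<', $filename;
    %replace = map { chomp; split ',', $_, 2 } <$fh>;
    close $fh;
    $re = join '|', map quotemeta, keys %replace;
    $re = qr/($re)/;
};

# Pass every input line through the combined regex in one substitution.
while (<>) {
    $in_place->();
    s/$re/$replace{$1}/g;
}
continue {print}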
Usage:
./replace.pl replace.csv <file.in >file.out
and
./replace.pl replace.csv file.in >file.out
or in place
./replace.pl -i replace.csv file1.csv file2.csv file3.csv
or with a backup
./replace.pl -i.orig replace.csv file1.csv file2.csv file3.csv
or with a backup name built from a placeholder
./replace.pl -ithere.is.\*.original replace.csv file1.csv file2.csv file3.csv
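For comparison with the loop in the question, here is a minimal shell-only sketch of the same "read the CSV once" idea: each CSV row is turned into one sed command, and the resulting script is applied with a single sed run per file. It is not the Perl approach above. The sample data, the temporary directory and the name replace.sed are made up for illustration. The sketch assumes that neither column contains `/`, `&`, `\`, a comma or regex metacharacters, and it strips carriage returns in case the CSV was saved with Windows line endings.

```shell
#!/bin/bash
# Hypothetical sample data in a temp dir, so the sketch is self-contained.
workdir=$(mktemp -d)
printf 'a_1,b_1\na_2,b_2\n' > "$workdir/findreplace.csv"
printf 'x a_1 y\na_2 a_1\n' > "$workdir/long_file_name1.txt"

# Convert every "find,replace" row into a sed command "s/find/replace/g".
# tr removes carriage returns that a CSV with CRLF line endings would carry.
tr -d '\r' < "$workdir/findreplace.csv" |
  sed 's|^\([^,]*\),\(.*\)$|s/\1/\2/g|' > "$workdir/replace.sed"

# Apply all substitutions in one sed run, then replace the original file.
sed -f "$workdir/replace.sed" "$workdir/long_file_name1.txt" \
  > "$workdir/long_file_name1.txt.tmp" &&
  mv "$workdir/long_file_name1.txt.tmp" "$workdir/long_file_name1.txt"

cat "$workdir/long_file_name1.txt"
```

Unlike the Perl script, sed still applies the commands one after another to each line, so a replacement that itself matches a later search string would be replaced again, and a short key such as a_1 also matches inside a_10. The Perl version's quotemeta call is what makes it safe for special characters; this sketch has no equivalent.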