Using sed on text files with a csv

Question

I've been trying to do bulk find and replace on two text files using a csv. I've seen the questions that SO suggests, and none seem to answer my question.

I've created two variables for the two text files I want to modify. The csv has two columns and hundreds of rows. The first column contains strings (none contain whitespace) that are already in the text files and need to be replaced with the corresponding string from the same row of the second column.

As a test, I tried this script:

#!/bin/bash

test1='long_file_name.txt'
find='string1'
replace='string2'

sed -e "s/$find/$replace/g" $test1 > $test1.tmp && mv $test1.tmp $test1

This was successful, except that I need to do it once for every row in the csv, using the values given by the csv in each row. My hunch is that my while loop was used wrongly, but I can't find the error. When I execute the script below, I get the command line prompt, which makes me think that something has happened. When I check the text files, nothing's changed.

The two text files, this script, and the csv are all in the same folder (it's also been my working directory when I do this).

#!/bin/bash

textfile1='long_file_name1.txt'
textfile2='long_file_name2.txt'

while IFS=, read f1 f2
do
    sed -e "s/$f1/$f2/g" $textfile1 > $textfile1.tmp && \
         mv $textfile1.tmp $textfile1
    sed -e "s/$f1/$f2/g" $textfile2 > $textfile2.tmp && \
         mv $textfile2.tmp $textfile2
done <'findreplace.csv'


It seems to me that this code should do what I want it to do (but doesn't); perhaps I'm misunderstanding something fundamental (I'm new to bash scripting)?

The csv looks like this, but with hundreds of rows. All a_i's should be replaced with their counterpart b_i in the next column over.

a_1 b_1
a_2 b_2
a_3 b_3

Something to note: All the strings actually contain underscores, just in case this affects something. I've tried wrapping the variable name in braces a la ${var}, but it still doesn't work.

I appreciate the solutions, but I'm also curious to know why the above doesn't work. (Also, I would vote everyone up, but I lack the reputation to do so. However, know that I appreciate and am learning a lot from your answers!)

Recommended answer
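
On the "why doesn't it work" part of the question: the loop looks structurally fine, so a reasonable first step is to check what `read` actually puts into `f1` and `f2`. The sketch below is a way to investigate, not a confirmed diagnosis. `show_pairs` is a hypothetical helper name, and it assumes the csv is the `findreplace.csv` from the question. It prints every parsed pair through `sed -n l`, which makes carriage returns and other invisible characters visible and marks the end of each line with `$`.

```bash
#!/bin/bash
# show_pairs is a hypothetical helper, not part of the original script.
# It prints each find/replace pair exactly as `read` parses it.
# Usage: show_pairs findreplace.csv
show_pairs() {
    # The "|| [ -n ... ]" part keeps a final row that lacks a trailing newline.
    while IFS=, read -r f1 f2 || [ -n "$f1" ]; do
        printf 'find=[%s] replace=[%s]\n' "$f1" "$f2"
    done < "$1" | sed -n l   # "l" prints \r and similar characters visibly
}
```

If a row comes out as `find=[a_1 b_1] replace=[]$`, the file is not actually comma-separated, so `sed` is handed a pattern that most likely never occurs in the text files and exits successfully without changing anything. A `\r` just before the closing bracket points to Windows line endings in the csv.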

If you are going to process a lot of data and your patterns can contain special characters, I would consider using Perl, especially if you are going to have a lot of pairs in findreplace.csv. You can use the following script as a filter or for in-place modification of many files. As a side effect, it loads the replacements and builds the Aho-Corasick automaton only once per invocation, which makes this solution pretty efficient (O(M+N) instead of O(M*N) in your solution).

#!/usr/bin/perl
use strict;
use warnings;
use autodie;

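# With -i[EXT] as the first argument, edit files in place (similar to perl -i).
# $in_place is a callback that redirects output whenever <> moves to a new file.
my $in_place = ( @ARGV and $ARGV[0] =~ /^-i(.*)/ )
    ? do {
        shift;
        my $backup_extension = $1;
        my $backup_name      = $backup_extension =~ /\*/
            ? sub { ( my $fn = $backup_extension ) =~ s/\*/$_[0]/; $fn }
            : sub { shift . $backup_extension };
        my $oldargv = '-';
        sub {
            if ( $ARGV ne $oldargv ) {
                if ( length $backup_extension ) {
                    rename( $ARGV, $backup_name->($ARGV) );
                }
                else {
                    # No backup requested: unlink the name so that reopening
                    # it for writing does not truncate the file being read.
                    unlink($ARGV);
                }
                open( ARGVOUT, '>', $ARGV );
                select(ARGVOUT);
                $oldargv = $ARGV;
            }
        };
    }
    : sub { };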

die "$0: File with replacements required." unless @ARGV;
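my ( $re, %replace );
do {
    my $filename = shift;
    open my $fh, '<', $filename;
    # Each "find,replace" row becomes one key/value pair.
    %replace = map { chomp; split ',', $_, 2 } <$fh>;
    close $fh;
    # Longest keys first, so that e.g. a_10 is tried before a_1.
    $re = join '|', map quotemeta,
        sort { length($b) <=> length($a) } keys %replace;
    $re = qr/($re)/;
};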

while (<>) {
    $in_place->();
    s/$re/$replace{$1}/g;
}
continue {print}

Usage:

./replace.pl replace.csv <file.in >file.out

or

./replace.pl replace.csv file.in >file.out

or in place

./replace.pl -i replace.csv file1.csv file2.csv file3.csv

or with a backup

./replace.pl -i.orig replace.csv file1.csv file2.csv file3.csv

or with a backup name that uses a placeholder

./replace.pl -ithere.is.\*.original replace.csv file1.csv file2.csv file3.csv
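
For readers who would rather stay with `sed`, the idea of loading the pairs once and then making a single pass per file can be sketched in shell by turning the csv into a sed script and running it with `sed -f`. This is a sketch under assumptions, not the answerer's method: `make_sed_script` and `apply_pairs` are hypothetical names, the csv is assumed to have exactly two comma-separated columns, and the strings are assumed to contain no regex metacharacters, `|`, `&`, or backslashes. sed still tries every rule on every line, so this avoids re-reading and rewriting each file once per pair but does not give the O(M+N) behaviour described above.

```bash
#!/bin/bash
# Hypothetical helpers for illustration; the names are not from the original post.
# Usage: apply_pairs findreplace.csv long_file_name1.txt long_file_name2.txt

# Turn each "find,replace" row into a sed command of the form s|find|replace|g.
# Carriage returns are stripped first; rows without a comma are skipped.
make_sed_script() {
    tr -d '\r' < "$1" | sed -n 's/^\([^,][^,]*\),\(.*\)$/s|\1|\2|g/p'
}

# Build the sed script once, then rewrite every listed file in one pass each.
apply_pairs() {
    local csv script file
    csv=$1
    shift
    script=$(mktemp) || return 1
    make_sed_script "$csv" > "$script"
    for file in "$@"; do
        sed -f "$script" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
    rm -f "$script"
}
```

As in the original loop, the rules are applied in csv order, so if one search string is a prefix of another (`a_1` and `a_10`), the longer one should be listed first.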
