Transliteration script for Linux shell

Problem description

I have multiple .txt files containing text in one alphabet, and I want to transliterate that text into another alphabet. Some characters of alphabet1 map 1:1 to characters of alphabet2 (e.g. a becomes e), whereas others map 1:2 (e.g. x becomes ch).

I would like to do this using a simple script for the Linux shell.

With tr or sed I can convert 1:1 characters:

sed 'y/abcdefghijklmnopqrstuvwxyz/nopqrstuvwxyzabcdefghijklm/'

a will become n, b will become o, et cetera (a Caesar cipher with a shift of 13, i.e. ROT13).
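For comparison, the same 1:1 mapping can be written with tr using character ranges. This is a minimal sketch assuming lowercase ASCII input; the function names rot13 and rot13_sed are hypothetical. Both tr and sed's y command replace exactly one character with exactly one character, which is why neither of them can express a 1:2 mapping such as x becoming ch.

```shell
#!/bin/sh
# rot13: hypothetical helper; shifts each lowercase letter 13 places,
# one character in, one character out.
rot13() {
    tr 'a-z' 'n-za-m'
}

# rot13_sed: hypothetical helper; same mapping with sed's y command,
# which also requires both character lists to have the same length.
rot13_sed() {
    sed 'y/abcdefghijklmnopqrstuvwxyz/nopqrstuvwxyzabcdefghijklm/'
}

printf 'hello\n' | rot13        # prints: uryyb
printf 'hello\n' | rot13_sed    # prints: uryyb
```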

But how can I deal with 1:2 characters?

Recommended answer

Not an answer, just to show a briefer, idiomatic way to populate the table[] array from @konsolebox's answer as discussed in the related comments:

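BEGIN {
    # the extra blank in "a  e b" only lines the old chars up with their
    # replacements; split() with the default FS ignores runs of blanks
    split("a  e b", old)
    split("x ch o", new)
    for (i in old)
        table[old[i]] = new[i]
    FS = OFS = ""    # empty FS: one field per character (common extension, not POSIX)
}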

So the mapping of old to new characters is shown clearly: each character in the first split() maps to the character(s) directly below it in the second. For any other mapping, you only need to change the string(s) in the split() calls, rather than edit 26-ish explicit assignments to table[].
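To see the positional pairing concretely, the sketch below prints each old/new pair that the two split() calls produce. The wrapper function show_pairs is hypothetical; it uses only POSIX awk split() with the default field separator, so the doubled blank in "a  e b" makes no difference to the result.

```shell
#!/bin/sh
# show_pairs OLD NEW: hypothetical helper that prints the pairs built
# by split(): the i-th item of OLD maps to the i-th item of NEW.
show_pairs() {
    awk -v o="$1" -v n="$2" 'BEGIN {
        num = split(o, old)
        split(n, new)
        for (i = 1; i <= num; i++)    # numeric loop keeps the input order
            print old[i] " -> " new[i]
    }'
}

show_pairs "a  e b" "x ch o"
# prints:
# a -> x
# e -> ch
# b -> o
```

The numeric loop is used here because for (i in old) visits indices in an unspecified order; that is harmless when filling table[], but it would scramble a printed listing.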

You can even create a general script to do mappings and just pass in the old and new strings as variables:

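BEGIN {
    split(o, old)            # o and n are set with awk -v
    split(n, new)
    for (i in old)
        table[old[i]] = new[i]
    FS = OFS = ""
}
{
    for (i = 1; i <= NF; i++)    # replace each mapped character
        if ($i in table)
            $i = table[$i]
    print
}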

Then, in the shell, something like this:

old="a  e b"
new="x ch o"
awk -v o="$old" -v n="$new" -f script.awk file
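If you would rather not keep a separate script.awk, the same program can be embedded in a shell function. This is a sketch: the function name translit and its argument order (old string, new string, then optional files, else stdin) are hypothetical. Because FS is empty, each character of a line is its own field, and assigning a two-character string such as ch to $i is what makes 1:2 mappings work; awk then rebuilds $0 with the empty OFS.

```shell
#!/bin/sh
# translit OLD NEW [FILE...]: hypothetical helper; OLD and NEW are
# blank-separated lists where the i-th item of OLD maps to the i-th of NEW.
translit() {
    o=$1 n=$2
    shift 2
    awk -v o="$o" -v n="$n" '
    BEGIN {
        split(o, old)
        split(n, new)
        for (i in old)
            table[old[i]] = new[i]
        FS = OFS = ""    # empty FS: one field per character (not POSIX)
    }
    {
        for (i = 1; i <= NF; i++)
            if ($i in table)
                $i = table[$i]
        print
    }' "$@"
}

# The mappings from the question: a becomes e, x becomes ch
printf 'axe\n' | translit "a x" "e ch"    # prints: eche
```

One caveat: awk processes backslash escapes in -v assignments, so if the mapping strings ever contain backslashes, passing them through the environment and reading them with ENVIRON avoids that.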

And you can protect yourself from your own mistakes when populating the strings, e.g.:

BEGIN {
    numOld = split(o, old)
    numNew = split(n, new)

    if (numOld != numNew) {
        printf "ERROR: #old vals (%d) != #new vals (%d)\n", numOld, numNew | "cat>&2"
        exit 1
    }

    for (i=1; i <= numOld; i++) {
        if (old[i] in table) {
            printf "ERROR: \"%s\" duplicated at position %d in old string\n", old[i], i | "cat>&2"
            exit 1
        }
        if (newvals[new[i]]++) {
            printf "WARNING: \"%s\" duplicated at position %d in new string\n", new[i], i | "cat>&2"
        }
        table[old[i]] = new[i]
    }
}

Wouldn't it be good to know if you wrote that b maps to x and then later mistakenly wrote that b maps to y? The above really is the best way to do this, but it's your call, of course.
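To try those checks without any input file, the validation logic can be run on its own. In this sketch the wrapper name check_map is hypothetical; the awk body is the BEGIN block above, with the length-mismatch message sent to stderr like the other two, and the function's exit status is awk's.

```shell
#!/bin/sh
# check_map OLD NEW: hypothetical helper that runs only the validation;
# exits 1 on a length mismatch or a duplicate in OLD, warns on a duplicate in NEW.
check_map() {
    awk -v o="$1" -v n="$2" 'BEGIN {
        numOld = split(o, old)
        numNew = split(n, new)
        if (numOld != numNew) {
            printf "ERROR: #old vals (%d) != #new vals (%d)\n", numOld, numNew | "cat>&2"
            exit 1
        }
        for (i = 1; i <= numOld; i++) {
            if (old[i] in table) {
                printf "ERROR: \"%s\" duplicated at position %d in old string\n", old[i], i | "cat>&2"
                exit 1
            }
            if (newvals[new[i]]++)
                printf "WARNING: \"%s\" duplicated at position %d in new string\n", new[i], i | "cat>&2"
            table[old[i]] = new[i]
        }
    }'
}

check_map "a b" "x o"     || echo "rejected"    # valid mapping: status 0
check_map "b a b" "x o y" || echo "rejected"    # b mapped twice: status 1
```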

Here's one complete solution, as discussed in the comments below:

BEGIN {
    numOld = split("a  e b", old)
    numNew = split("x ch o", new)

    if (numOld != numNew) {
        printf "ERROR: #old vals (%d) != #new vals (%d)\n", numOld, numNew | "cat>&2"
        exit 1
    }

    for (i=1; i <= numOld; i++) {
        if (old[i] in map) {
            printf "ERROR: \"%s\" duplicated at position %d in old string\n", old[i], i | "cat>&2"
            exit 1
        }
        if (newvals[new[i]]++) {
            printf "WARNING: \"%s\" duplicated at position %d in new string\n", new[i], i | "cat>&2"
        }
        map[old[i]] = new[i]
    }

    FS = OFS = ""
}
{
    for (i = 1; i <= NF; ++i) {
        if ($i in map) {
            $i = map[$i]
        }
    }
    print
}

I renamed the table array to map just because, IMHO, that name better represents the array's purpose.

Save the above in a file script.awk and run it as awk -f script.awk inputfile.
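Since the question mentions multiple .txt files, one way to apply the script to each of them is a plain shell loop. This is a sketch: the function name translit_files and the .out suffix for results are hypothetical choices, and it assumes script.awk holds the complete solution above. The original .txt files are left untouched.

```shell
#!/bin/sh
# translit_files SCRIPT FILE...: hypothetical helper that runs the awk
# program in SCRIPT on each FILE and writes the result to NAME.out.
translit_files() {
    script=$1
    shift
    for f in "$@"; do
        # "${f%.txt}" strips a trailing .txt, so notes.txt -> notes.out
        awk -f "$script" "$f" > "${f%.txt}.out" || return 1
    done
}

# Usage: translit_files script.awk *.txt
```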
