用换行符/段落标记替换所有空格以制作单词列表 [英] Replace all whitespace with a line break/paragraph mark to make a word list

查看:82
本文介绍了用换行符/段落标记替换所有空格以制作单词列表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试为我们在课堂上翻译的希腊语文本列出词汇表.我想用段落标记替换每个空格或制表符,以便每个单词都出现在自己的行上.谁能给我 sed 命令,并解释我在做什么?我仍在努力弄清楚 sed.

I am trying to vocab list for a Greek text we are translating in class. I want to replace every space or tab character with a paragraph mark so that every word appears on its own line. Can anyone give me the sed command, and explain what it is that I'm doing? I’m still trying to figure sed out.

推荐答案

对于相当现代的 sed 版本,编辑标准输入以生成标准输出

For reasonably modern versions of sed, edit the standard input to yield the standard output with

$ echo 'τέχνη βιβλίο γη κήπος' | sed -E -e 's/[[:blank:]]+/\n/g'
τέχνη
βιβλίο
γη
κήπος

如果你的词汇单词在名为 lesson1lesson2 的文件中,将 sed 的标准输出重定向到文件 all-vocab

If your vocabulary words are in files named lesson1 and lesson2, redirect sed’s standard output to the file all-vocab with

sed -E -e 's/[[:blank:]]+/\n/g' lesson1 lesson2 > all-vocab

含义:

  • 字符类 [[:blank:]] 匹配单个空格字符或单个制表符.
    • 改用 [[:space:]] 来匹配任何单个空白字符(通常是空格、制表符、换行符、回车、换页和垂直制表符).
    • + 量词表示匹配一个或多个前面的模式.
    • 所以 [[:blank:]]+ 是一个或多个全是空格或制表符的字符的序列.
    • The character class [[:blank:]] matches either a single space character or a single tab character.
      • Use [[:space:]] instead to match any single whitespace character (commonly space, tab, newline, carriage return, form-feed, and vertical tab).
      • The + quantifier means match one or more of the previous pattern.
      • So [[:blank:]]+ is a sequence of one or more characters that are all space or tab.

      对于那些熟悉 Perl 兼容的正则表达式和支持 PCRE 的 sed 的人,使用 \s+ 来匹配至少一个空白字符的运行,如

      For those familiar with Perl-compatible regexes and a PCRE-capable sed, use \s+ to match runs of at least one whitespace character, as in

      sed -E -e 's/\s+/\n/g' old > new
      

      sed -e 's/\s\+/\n/g' old > new
      

      这些命令从文件 old 读取输入并将结果写入当前目录中名为 new 的文件.

      These commands read input from the file old and write the result to a file named new in the current directory.

      第 7 版 Unix 以来,几乎可以使用任何版本的 sed,命令调用是一个多一点巴洛克风格.

      Going back to almost any version of sed since Version 7 Unix, the command invocation is a bit more baroque.

      $ echo 'τέχνη βιβλίο γη κήπος' | sed -e 's/[ \t][ \t]*/\
      /g'
      τέχνη
      βιβλίο
      γη
      κήπος
      

      注意事项:

      • 在这里,我们甚至不假设存在不起眼的 + 量词,而是用单个空格或制表符 ([ \t]) 后跟零来模拟它或更多 ([ \t]*).
      • 同样,假设 sed 不理解 \n 换行符,我们必须将其逐字包含在命令行中.
        • \ 和命令第一行的结尾是一个继续标记,用于转义紧随其后的换行符,命令的其余部分在下一行.
          • 注意:在转义的换行符之前不能有空格.也就是说,第一行的结尾必须完全反斜杠后跟行尾.
          • Here we do not even assume the existence of the humble + quantifier and simulate it with a single space-or-tab ([ \t]) followed by zero or more of them ([ \t]*).
          • Similarly, assuming sed does not understand \n for newline, we have to include it on the command line verbatim.
            • The \ and the end of the first line of the command is a continuation marker that escapes the immediately following newline, and the remainder of the command is on the next line.
              • Note: There must be no whitespace preceding the escaped newline. That is, the end of the first line must be exactly backslash followed by end-of-line.

              以上所有命令都使用单引号 ('') 而不是双引号 ("").考虑:

              The commands above all used single quotes ('') rather than double quotes (""). Consider:

              $ echo '\\\\' "\\\\"
              \\\\ \\
              

              也就是说,与双引号字符串相比,shell 对单引号字符串应用不同的转义规则.您通常希望使用引号保护正则表达式中常见的所有反斜杠.

              That is, the shell applies different escaping rules to single-quoted strings as compared with double-quoted strings. You typically want to protect all the backslashes common in regexes with single quotes.

              这篇关于用换行符/段落标记替换所有空格以制作单词列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆