JavaScript Regular Expression Replace Multiple Letters

Question

I have the following code to replace a string of DNA with its complement, where A <=> T and C <=> G. I do this with very basic knowledge of regular expressions. How can I refactor the following using regular expressions to capture the letter and replace it with its complement?

function DNA(strand) {
    // Match any one of the four bases and return its complement
    return strand.replace(/A|T|C|G/g, x => {
        return (x=="A") ? "T" : (x=="T") ? "A" : (x=="C") ? "G" : "C";
    });
}


Answer

This is rather inelegant, IMO, but it is a one- (two-?) step replacement algorithm that uses JavaScript regex capabilities. If you're interested, I can explain what the heck it's doing.

function DNA(strand) {
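    return strand
        // Append a tail so every letter can find its complement further along
        .concat("||TACG")
        // Swap each letter for the complement captured by its lookahead; the last alternative deletes the tail
        .replace(/A(?=.*?(T))|T(?=.*?(A))|C(?=.*?(G))|G(?=.*?(C))|\|\|....$/gi, "$1$2$3$4");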
}

See this fiddle (now updated a bit for testability) to play around with it.

This might seem like a simple example for which to build a regex, but it's not really (if you want it all to be in the regex, that is). It would be far more efficient to use a simple mapping table (hashtable): split the characters, remap/translate them, and join them back together (as @Jared Smith did), since the regex does much more work per letter (every match starts a lookahead that scans forward until it finds the complement). If this is solely for personal interest and learning regex, then please feel free to ask for any required explanation.
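For comparison, here is a minimal sketch of the mapping-table approach described above (not necessarily @Jared Smith's exact code). The names `COMPLEMENT` and `complementStrand` are hypothetical, and the sketch assumes uppercase bases; any other character passes through unchanged.

```javascript
// Lookup table of base-pair complements (uppercase bases only).
const COMPLEMENT = { A: "T", T: "A", C: "G", G: "C" };

// Hypothetical helper: split the strand into characters, translate each
// one through the table, and join the results back together.
function complementStrand(strand) {
  return strand
    .split("")
    .map((base) => COMPLEMENT[base] ?? base) // unknown characters pass through unchanged
    .join("");
}

console.log(complementStrand("GATC")); // CTAG
```

Each character costs one table lookup, with no scanning ahead.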

Edit for jwco:

As I stated, this is rather inelegant (or at least inefficient) as a production-level solution, but perhaps rather elegant as an art piece(?). It uses only JavaScript regex (RegExp) capabilities, so no "regular expression conditionals" or "lookbehind", and if JavaScript supported "free-spacing" mode, you could actually write the regex as shown below.

This is a relatively common way of breaking down the components of a regex to explain what each part matches, looks for, and captures:

  A         #  Match an A, literally
  (?=       #  Look ahead, and
    .*?     #    Match any number of any character lazily (as necessary)
    (T)     #    Match and capture a T, literally (into group #1)
  )         #  End look-ahead
|           #-OR-
  T         #  Match a T, literally
  (?=       #  Look ahead, and
    .*?     #    Match any number of any character lazily (as necessary)
    (A)     #    Match and capture an A, literally (into group #2)
  )         #  End look-ahead
|           #-OR-
  C         #  Match a C, literally
  (?=       #  Look ahead, and
    .*?     #    Match any number of any character lazily (as necessary)
    (G)     #    Match and capture a G, literally (into group #3)
  )         #  End look-ahead
|           #-OR-
  G         #  Match a G, literally
  (?=       #  Look ahead, and
    .*?     #    Match any number of any character lazily (as necessary)
    (C)     #    Match and capture a C, literally (into group #4)
  )         #  End look-ahead
|           #-OR-
  \|\|....$ #  Match two literal pipes (|), followed by any four characters and the end of the string
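Standard JavaScript RegExp has no widely supported free-spacing flag, but a similar effect can be had by keeping each alternative in its own commented string and joining them. This is a sketch; the names `parts` and `dnaRegex` are hypothetical. It rebuilds the same pattern the answer uses:

```javascript
// Each alternative lives in its own commented string; String.raw keeps the
// backslashes exactly as written.
const parts = [
  String.raw`A(?=.*?(T))`, // A, then look ahead for a T (group 1)
  String.raw`T(?=.*?(A))`, // T, then look ahead for an A (group 2)
  String.raw`C(?=.*?(G))`, // C, then look ahead for a G (group 3)
  String.raw`G(?=.*?(C))`, // G, then look ahead for a C (group 4)
  String.raw`\|\|....$`,   // cleanup: two pipes, any four characters, end of string
];

// Joining with "|" recreates the top-level alternation.
const dnaRegex = new RegExp(parts.join("|"), "gi");

console.log(dnaRegex.source);
// A(?=.*?(T))|T(?=.*?(A))|C(?=.*?(G))|G(?=.*?(C))|\|\|....$
console.log("GATC".concat("||TACG").replace(dnaRegex, "$1$2$3$4")); // CTAG
```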

Anything matched by this expression (which should be every part of the entire string) will be replaced by the replacement expression $1$2$3$4. The "global" flag (the g in the /gi) will make it keep trying to match as long as there is more of the string to test.
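Two JavaScript details this relies on can be seen with a toy pattern (hypothetical, not from the answer): a `$n` whose group did not take part in the match is replaced with an empty string, and without the `g` flag only the first match is replaced.

```javascript
// Two alternatives, each with its own capture group.
const once = /(a)|(b)/;
const everywhere = /(a)|(b)/g;

// A group that did not participate in a match is substituted as "".
console.log("ab".replace(once, "[$1$2]"));       // [a]b   (first match only)
console.log("ab".replace(everywhere, "[$1$2]")); // [a][b] (every match, thanks to g)
```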

The expression is made up of five possible options (one for each possible letter switch and then a "cleanup" match). The first four options are identical except for the particular letters matched. Each of these matches and consumes a particular desired letter, then "looks ahead" in the string to find its "translation" or "complement", captures it without consuming anything else, then completes as a successful alternative, thus satisfying the expression as a whole.
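To see that an alternative consumes only its letter while the lookahead captures the complement, here is a single alternative run on its own (the sample string `AGGT` is arbitrary):

```javascript
const aThenT = /A(?=.*?(T))/g;
const m = aThenT.exec("AGGT");

console.log(m[0]);             // A  (only the A is part of the match)
console.log(m[1]);             // T  (captured inside the lookahead)
console.log(aThenT.lastIndex); // 1  (the next search resumes right after the A)
```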

Since only one of the capture groups (1-4) can have matched for any successfully tested letter, only one of the backreferences ($1 through $4 in $1$2$3$4) can contain a captured value; JavaScript substitutes an empty string for the others. In the case of the fifth option (\|\|....$), there is no capture, so none of the capture groups contains a value, and the match is replaced with nothing.
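Tracing each match of the full regex against an already-extended strand shows one captured group per letter and none for the cleanup match. `traceMatches` is a hypothetical helper; `matchAll` requires the `g` flag, which the pattern already has.

```javascript
const dna = /A(?=.*?(T))|T(?=.*?(A))|C(?=.*?(G))|G(?=.*?(C))|\|\|....$/gi;

// Hypothetical helper: for each match, list the capture groups that took part.
function traceMatches(input) {
  return [...input.matchAll(dna)].map((m) => ({
    matched: m[0],
    captured: m.slice(1).filter((g) => g !== undefined),
  }));
}

for (const { matched, captured } of traceMatches("GATC||TACG")) {
  console.log(JSON.stringify(matched), "->", JSON.stringify(captured));
}
// "G" -> ["C"]
// "A" -> ["T"]
// "T" -> ["A"]
// "C" -> ["G"]
// "||TACG" -> []
```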

Before being fed into the regex engine, the string ||TACG is appended to the source, kind of like a telomere (sorta...). This provides a replacement source if the source string does not contain the appropriate "complement" letter at a later position (or at all?!). The last option in the regex effectively removes this extraneous information by matching it and replacing it with nothing.
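The appended tail matters whenever a letter's complement does not appear after it, since the lookahead only searches forward. A sketch with the arbitrary input `TA`:

```javascript
const dna = /A(?=.*?(T))|T(?=.*?(A))|C(?=.*?(G))|G(?=.*?(C))|\|\|....$/gi;

// Without the tail, the final A has no T after it, so no alternative
// matches there and the A is left untouched.
console.log("TA".replace(dna, "$1$2$3$4")); // AA  (wrong: should be AT)

// With the tail, the A finds the T inside "||TACG", and the cleanup
// alternative then deletes the tail itself.
console.log("TA".concat("||TACG").replace(dna, "$1$2$3$4")); // AT
```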

This could be done for any set of replacements, but it gets less and less efficient as more changes are appended. Maintainability for such a regex would also be, as indicated by a certain commenter's (I hope jovial) threat, ummm... a challenge. Enjoy!
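As a sketch of "any set of replacements", the hypothetical `makeTranslator` below builds the same kind of lookahead-plus-tail regex from a lookup table. Assumptions: every key and value is a single character other than `|`, the input contains no line breaks (`.` does not match them), and the answer's `i` flag is dropped so case-sensitive tables behave as written.

```javascript
// Hypothetical helper: builds "key(?=.*?(value))" alternatives plus a
// "||..." cleanup branch for an arbitrary single-character table.
function makeTranslator(table) {
  const escapeChar = (ch) => ch.replace(/[\\^$.*+?()[\]{}|\/-]/g, "\\$&");
  const keys = Object.keys(table);
  const values = keys.map((k) => table[k]);

  // One alternative per entry, in table order (group i+1 belongs to entry i).
  const alternatives = keys.map((k, i) => `${escapeChar(k)}(?=.*?(${escapeChar(values[i])}))`);

  // Cleanup alternative: "||" followed by exactly one character per entry.
  const cleanup = `\\|\\|.{${keys.length}}$`;
  const regex = new RegExp([...alternatives, cleanup].join("|"), "g");

  // "$1$2...$n": at most one group is defined per match; the rest become "".
  const replacement = values.map((_, i) => `$${i + 1}`).join("");
  const tail = "||" + values.join("");

  return (input) => input.concat(tail).replace(regex, replacement);
}

const complement = makeTranslator({ A: "T", T: "A", C: "G", G: "C" });
console.log(complement("GATC")); // CTAG
```

Each extra table entry adds another alternative and another forward scan, which is where the growing cost comes from.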
