Haskell replace multiple substrings with regex in a single pass

Question

Suppose I have a string s and a list of tuples of strings ts, where the first element in the tuple is the substring of s I'd like to replace with the corresponding second element. In my case, the string s will always be a space-separated list of distinct words; moreover, I wish to replace each word in s in its entirety with the corresponding value in ts (I hope my intent is clear here). A first attempt at this might be:

import qualified Text.Regex as R -- from regex-compat

-- Apply each (key, value) substitution in turn, one full pass per pair.
replaceAllIn :: String -> [(String, String)] -> String
replaceAllIn = foldl (\acc (k, v) -> R.subRegex (R.mkRegex k) acc v)

This, of course, doesn't work when one key is a substring of another:

λ> s = "blah blahblee"
λ> ts = [("blah", "asdf"), ("blahblee", ";lkj")]
λ> replaceAllIn s ts
"asdf asdfblee"

because the first key replaces both occurrences of "blah" upon the first pass, leaving a string that no longer has anything matching "blahblee" for the second pass.

Is there a way to achieve what I want in one pass through the string? Or is there a built-in way (in some library somewhere) to replace multiple patterns at once?

Edit: Immediately after posting, I realized I don't know why I'm using regex here. But the question remains valid if I replace the regex substitution with something like replace from MissingH's Data.String.Utils.

Recommended answer

Expanding on my comment from earlier, you can just process the longest string to be replaced first:

λ> s = "blah blahblee"
λ> ts = [("blah", "asdf"), ("blahblee", ";lkj")]
λ> import qualified Data.List as L
λ> replaceAllIn s (L.sortOn (negate . length . fst) ts)
"asdf ;lkj"
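
Sorting the keys still makes one pass over the string per key, and a replacement value that happens to contain a shorter key would be rewritten again by a later pass. Since the question says s is always a space-separated list of words and each word is replaced in its entirety, a true single pass can be sketched without regex at all. The following is only an illustration and not part of the original answer: replaceWords is a hypothetical name, and its argument order is flipped relative to replaceAllIn so that it composes more easily.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical helper (not from the original post): replace every whole
-- word of the input with its mapped value in a single traversal.
-- Assumes the input is whitespace-separated words, as in the question.
replaceWords :: [(String, String)] -> String -> String
replaceWords ts = unwords . map substitute . words
  where
    -- Look up the entire word; keep it unchanged if no key matches.
    substitute w = fromMaybe w (lookup w ts)
```

λ> replaceWords ts s
"asdf ;lkj"

Because each word is looked up exactly once, key order no longer matters and replaced text is never rescanned. Note that words and unwords normalize whitespace, so runs of spaces collapse to a single space; for a long list of pairs, a Data.Map lookup would likely be a better fit than the association list.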
