Fuzzy string matching with grep

Question

I am trying to match rows in a file that contain a string, say ACTGGGTAAACTA. If I do

grep "ACTGGGTAAACTA" file 

It gives me the rows that contain exact matches. Is there a way to allow for a certain number of mismatches (substitutions, insertions or deletions)? For example, I am looking for sequences with

  1. Up to 3 allowed substitutions, like "AGTGGGTAACCAA" etc.

  2. Insertions/deletions (having a partial match like "ACTGGGAAAATAAACTA" or "ACTAAACTA")

Recommended answer

There used to be a tool called agrep for fuzzy regex matching, but it was abandoned.

http://en.wikipedia.org/wiki/Agrep has a bit of history and links to related tools.

https://github.com/Wikinaut/agrep looks like a revived open source release, but I have not tested it.

Failing that, see if you can find tre-agrep for your distro.
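
If none of these tools is available, the same kind of search can be approximated with a short script. The following is only a minimal sketch in Python, not how agrep or tre-agrep work internally: it uses the textbook dynamic-programming method for approximate substring matching, where a substitution, an insertion, and a deletion each count as one edit. The function names `min_edits` and `fuzzy_grep` are made up for this illustration.

```python
def min_edits(pattern, text):
    """Smallest edit distance between `pattern` and any substring of `text`.

    Substitutions, insertions and deletions each cost 1.
    """
    # prev[i] = cheapest way to align pattern[:i] with some substring of the
    # text read so far that ends at the current position. Before reading any
    # text, pattern[:i] can only be aligned by dropping all i characters.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        # A match may start at any position in the text, so aligning the
        # empty pattern prefix always costs 0.
        cur = [0]
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # match (0) or substitution (1)
                prev[i] + 1,              # extra character in the text
                cur[i - 1] + 1,           # pattern character missing from the text
            ))
        prev = cur
        best = min(best, prev[-1])
    return best


def fuzzy_grep(pattern, lines, max_edits):
    """Yield every line that contains `pattern` with at most `max_edits` edits."""
    for line in lines:
        if min_edits(pattern, line.rstrip("\n")) <= max_edits:
            yield line


# Example use (assumes the input is a text file named "file"):
#
#     with open("file") as handle:
#         for line in fuzzy_grep("ACTGGGTAAACTA", handle, 3):
#             print(line, end="")
```

With a limit of 3 edits, this accepts "AGTGGGTAACCAA" from the question, which differs from the pattern by three substitutions. "ACTAAACTA" is four characters shorter than the pattern, so it needs at least 4 deletions and only matches once the limit is raised to 4. The running time grows with pattern length times line length, which is fine for short sequences but slow on very large files compared with a dedicated tool.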