Whole word matching with unexpected insertion in data


Problem description

Consider the string:

my $string = 'String need to be evaluated';

In $string I am searching for "evaluated" or any other word. The problem is that tags may be inserted into the string unexpectedly, e.g. Str<data>ing need to be eval<data>ua<data>ted. In that case, how can I search for the words?

Here is the code I tried:

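use strict;
use warnings;

my $string  = 'Text to be evaluated';
my $string2 = "Te<data>xt need to be eval<data2>ua<data>ted";

# pattern to match
my $pattern = "evaluated";

my @b = split('', $pattern);

# allow an optional literal <data> tag after each character
for my $i (@b) {
    $i = $i . "(?:<data>)?";
    print "$i#\n";
}
$pattern = join('', @b);

print "\n$pattern\n";

# note: only a literal <data> is tolerated, so the <data2> in $string2 prevents a match
if ($string2 =~ /$pattern/) {
    print "$pattern found\n";
}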

Can you suggest any other method or module that makes this easy? I don't know what kind of data will get inserted.

Recommended answer

Not sure if this is what you need, but how about:

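# replaces the loop in the question's code; $pattern and @b come from there
@b = split('', $pattern);

# allow any run of characters after each pattern character
for my $i (@b) {
    $i = $i . ".*";
    print "$i \n";
}
$pattern = join('', @b);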

That should match any string that contained the pattern before it got random insertions, as long as the characters of the pattern are still there and in the correct order. It does find evaluated in the string esouhgvw8vwrg355#*asrgl/\u[\w]atet(45)<data>efdvd, which is about as noisy as it gets. But of course, if it is impossible to distinguish between an insertion and the original text, you will get false positives. For example, if the string used to be evaluted and it becomes something like evalu<hereisyourmissinga>ted, you will get a match, because the missing "a" is supplied by the insertion. If you know that insertions are always inside tags while the text is not, the other user's answer is much safer.
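The difference between the two approaches can be sketched in a few lines. This is only an illustration, assuming that every insertion is a tag of the form <...>; the helper names loose_regex and tag_tolerant_regex are hypothetical and do not come from either answer.

```perl
use strict;
use warnings;

# Hypothetical helper: the ".*" approach from this answer.
# Any characters at all may appear after each pattern character.
sub loose_regex {
    my ($word) = @_;
    my $body = join '', map { quotemeta($_) . '.*' } split //, $word;
    return qr/$body/;
}

# Hypothetical helper: a stricter variant that only tolerates
# <...> tags between the characters of the word.
# Assumption: insertions are always tags, never bare text.
sub tag_tolerant_regex {
    my ($word) = @_;
    my $gap  = '(?:<[^>]*>)*';    # zero or more inserted tags
    my $body = join $gap, map { quotemeta($_) } split //, $word;
    return qr/\b$body\b/;
}

my $tagged   = 'Te<data>xt need to be eval<data2>ua<data>ted';
my $misspelt = 'evalu<hereisyourmissinga>ted';

for my $text ($tagged, $misspelt) {
    my $loose  = $text =~ loose_regex('evaluated')        ? 'match' : 'no match';
    my $strict = $text =~ tag_tolerant_regex('evaluated') ? 'match' : 'no match';
    print "$text\n  loose: $loose, tag-tolerant: $strict\n";
}
```

On the tagged string both patterns match. On the misspelt string only the loose pattern matches, because it borrows the "a" from inside the tag. One caveat with the stricter sketch: \b treats the edge of a tag as a word boundary, so a fragment such as "ing" would still match inside Str<data>ing.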

As long as you single-quote your input string, characters like [\w], (45) and so on should not cause problems either. I cannot see why they would be interpolated at any point.
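A small sketch of that point: the single-quoted input is treated as plain data, so its metacharacters are harmless. Only the pattern side is interpreted as a regex, which is why this sketch adds quotemeta for each character of the search word. The quotemeta call and the sample word 'a.b' are additions for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Single quotes keep \u, [\w] and (45) as literal characters in the data.
my $noisy = 'esouhgvw8vwrg355#*asrgl/\u[\w]atet(45)<data>efdvd';

# Same ".*" construction as above, with each character escaped.
my $pattern = join '', map { quotemeta($_) . '.*' } split //, 'evaluated';
my $found   = $noisy =~ /$pattern/ ? 1 : 0;
print "evaluated found in noisy string: $found\n";

# Metacharacters in the search word are a different matter.
# Illustrative word containing a dot:
my $word    = 'a.b';
my $raw     = join '.*', split //, $word;                          # a.*..*b
my $escaped = join '.*', map { quotemeta($_) } split //, $word;    # a.*\..*b

my $raw_hit     = 'axb' =~ /^$raw$/     ? 1 : 0;
my $escaped_hit = 'axb' =~ /^$escaped$/ ? 1 : 0;
print "unescaped pattern matches 'axb': $raw_hit\n";
print "escaped pattern matches 'axb':   $escaped_hit\n";
```

Without escaping, the dot in the search word matches any character, so 'axb' is accepted; with quotemeta it is not.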
