Parse search string for phrases and keywords

Question

I need to parse a search string for keywords and phrases in PHP. For example:

String 1: value of "measured response" detect goal "method valuation" study

should yield: value, of, measured response, detect, goal, method valuation, study

I also need it to work if the string has:

  1. no phrases in quotes,
  2. any number of phrases in quotes plus any number of keywords outside the quotes,
  3. only phrases in quotes,
  4. only space-separated keywords.

I'm leaning towards using preg_match with the pattern '/(\".*\")/' to get the phrases into an array, then removing the phrases from the string, and finally working the keywords into the array. I just can't pull everything together!
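A minimal sketch of that two-pass idea, assuming double quotes are always balanced. It departs from the plan above in two places: `preg_match` stops after the first match, so `preg_match_all` is used instead, and the greedy `.*` in `'/(\".*\")/'` would run from the first quote to the last one (capturing `"measured response" detect goal "method valuation"` as a single piece), so the sketch uses `[^"]*`. The function name `parseTwoPass` is hypothetical.

```php
<?php
// Hypothetical helper illustrating the two-pass approach from the question.
function parseTwoPass(string $subject): array
{
    // Pass 1: collect every quoted phrase; [^"]* stops at the next quote,
    // unlike the greedy .* which would span from the first quote to the last.
    preg_match_all('/"([^"]*)"/', $subject, $m);
    $phrases = $m[1];

    // Pass 2: blank out the quoted phrases, then split the rest on whitespace.
    $rest = preg_replace('/"[^"]*"/', ' ', $subject);
    $keywords = preg_split('/\s+/', $rest, -1, PREG_SPLIT_NO_EMPTY);

    // Phrases are returned first; the query's left-to-right order is not kept.
    return array_merge($phrases, $keywords);
}

print_r(parseTwoPass('value of "measured response" detect goal "method valuation" study'));
```

The trade-off of this design is ordering: phrases come out before keywords, so if the position of each term matters, a single-pass pattern is a better fit.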

I'm also thinking of replacing spaces outside quotes with commas, then exploding the result into an array. If that's a better option, how do I do that with preg_replace?
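A sketch of the comma idea with `preg_replace`, assuming quotes are balanced and that no keyword or phrase contains a comma of its own. The pattern treats a run of whitespace as "outside quotes" when an even number of `"` characters follows it; the variable names are illustrative.

```php
<?php
$subject = 'value of "measured response" detect goal "method valuation" study';

// Whitespace is outside quotes when the rest of the string holds an even number of quotes.
$outsideQuotes = '/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/';

// Step 1: value,of,"measured response",detect,goal,"method valuation",study
$commaList = preg_replace($outsideQuotes, ',', trim($subject));

// Step 2: explode on the commas and strip the surrounding quotes from phrases.
$terms = array_map(fn($t) => trim($t, '"'), explode(',', $commaList));

print_r($terms);
```

The comma step is optional: `preg_split($outsideQuotes, trim($subject))` splits on the same whitespace directly, and the quotes can then be trimmed the same way. The lookahead rescans the remainder of the string at every space, which is fine for short search queries.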

Is there a better way to go about this? Help! Thanks much, everyone.
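One alternative that avoids hand-written patterns, offered as an option rather than as the thread's answer: PHP's built-in `str_getcsv()` already understands quoted fields, so treating the query as a single record with a space delimiter and a `"` enclosure splits it in one call. Passing `''` as the escape argument (allowed since PHP 7.4) disables the escape character; the helper name `parseWithCsv` is hypothetical.

```php
<?php
// Hypothetical helper: parse the query as one space-delimited CSV record.
function parseWithCsv(string $subject): array
{
    // Delimiter ' ', enclosure '"', and '' to disable the escape character.
    $fields = str_getcsv(trim($subject), ' ', '"', '');

    // Consecutive spaces yield empty fields, and empty input can yield a null field; drop both.
    return array_values(array_filter($fields, fn($f) => $f !== null && $f !== ''));
}

print_r(parseWithCsv('value of "measured response" detect goal "method valuation" study'));
```

Unlike the regex approaches, this keeps the original term order and tolerates repeated spaces, but it only recognizes a phrase when the opening quote starts a field (i.e. follows a space or the start of the string).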

Recommended answer

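$subject = 'value of "measured response" detect goal "method valuation" study';
# Alternative 1: a whole word not preceded by a quote; alternative 2: the text right after an opening quote
preg_match_all('/(?<!")\b\w+\b|(?<=")\b[^"]+/', $subject, $result, PREG_PATTERN_ORDER);
foreach ($result[0] as $match) {
    echo $match, "\n"; # $result[0] lists every full match: each keyword or phrase, in order
}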

This should yield the results you are looking for.
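To check the pattern against the situations listed in the question, it can be wrapped in a small helper. The name `parseSearch` and the sample inputs are illustrative, not part of the original answer.

```php
<?php
// Hypothetical wrapper around the answer's pattern.
function parseSearch(string $subject): array
{
    preg_match_all('/(?<!")\b\w+\b|(?<=")\b[^"]+/', $subject, $result, PREG_PATTERN_ORDER);
    return $result[0]; // full matches: keywords and phrases in query order
}

// One sample input per case from the question (cases 1 and 4 behave the same way).
$cases = [
    'no quoted phrases'    => 'value of detect goal',
    'phrases and keywords' => 'value of "measured response" detect goal "method valuation" study',
    'only phrases'         => '"measured response" "method valuation"',
    'only keywords'        => 'detect goal study',
];

foreach ($cases as $label => $subject) {
    echo $label, ': ', implode(' | ', parseSearch($subject)), "\n";
}
```

Two limitations worth knowing: a space just inside the closing quote stays in the phrase (`"measured response "` yields `measured response ` with the trailing space), and because `\w` covers only letters, digits and underscore, a hyphenated keyword such as `cost-benefit` comes back as two words.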

Explanation:

# (?<!")\b\w+\b|(?<=")\b[^"]+
# 
# Match either the regular expression below (attempting the next alternative only if this one fails) «(?<!")\b\w+\b»
#    Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!")»
#       Match the character """ literally «"»
#    Assert position at a word boundary «\b»
#    Match a single character that is a "word character" (letters, digits, etc.) «\w+»
#       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
#    Assert position at a word boundary «\b»
# Or match regular expression number 2 below (the entire match attempt fails if this one fails to match) «(?<=")\b[^"]+»
#    Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=")»
#       Match the character """ literally «"»
#    Assert position at a word boundary «\b»
#    Match any character that is NOT a """ «[^"]+»
#       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
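The breakdown above can be seen in action by running each alternative on its own; the helper `matchesFor` in this sketch is hypothetical. The first alternative alone also returns `response` and `valuation`, because those words are preceded by a space rather than a quote. In the combined pattern, the scan reaches the position right after an opening quote first: the first alternative fails there because of its lookbehind, the second one consumes the whole phrase, and the words inside it are never tried separately.

```php
<?php
// Hypothetical helper: all full matches of $pattern in $subject, left to right.
function matchesFor(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m);
    return $m[0];
}

$subject = 'value of "measured response" detect goal "method valuation" study';

$patterns = [
    'words'    => '/(?<!")\b\w+\b/',               // alternative 1 only
    'phrases'  => '/(?<=")\b[^"]+/',               // alternative 2 only
    'combined' => '/(?<!")\b\w+\b|(?<=")\b[^"]+/', // the answer's full pattern
];

foreach ($patterns as $label => $pattern) {
    echo str_pad($label . ':', 10), implode(' | ', matchesFor($pattern, $subject)), "\n";
}
```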
