Parse search string for phrases and keywords
Problem description
I need to parse a search string for keywords and phrases in PHP. For example:
String 1: value of "measured response" detect goal "method valuation" study
would produce: value,of,measured response,detect,goal,method valuation,study
I also need it to work if the string has:
- no phrases in quotes,
- any number of phrases in quotes plus any number of keywords outside the quotes,
- only phrases in quotes,
- only space-separated keywords.
I'm leaning towards using preg_match with the pattern '/(\".*\")/' to get the phrases into an array, then removing the phrases from the string, and finally working the keywords into the array. I just can't pull everything together!
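A minimal sketch of that two-step plan, assuming PHP 7.4+ (for the fn arrow function); the function name splitPhrasesThenKeywords is hypothetical. Two details matter: preg_match stops after the first match, so collecting every phrase needs preg_match_all, and the .* in '/(\".*\")/' is greedy, so with two quoted phrases it matches everything from the first quote to the last one. The sketch uses "([^"]*)" instead, which stops at the nearest closing quote.

```php
<?php
// Sketch of the "extract phrases, remove them, split the rest" plan.
// splitPhrasesThenKeywords is a hypothetical name, not part of any library.
function splitPhrasesThenKeywords(string $subject): array
{
    // 1. Collect quoted phrases. [^"]* stops at the nearest closing quote,
    //    unlike the greedy .* in '/(\".*\")/'.
    preg_match_all('/"([^"]*)"/', $subject, $m);
    // Trim each phrase and drop empty ones such as "".
    $phrases = array_values(array_filter(array_map('trim', $m[1]), fn ($p) => $p !== ''));

    // 2. Remove the phrases, quotes included, from the string.
    $rest = preg_replace('/"[^"]*"/', ' ', $subject);

    // 3. Split the remainder on whitespace; PREG_SPLIT_NO_EMPTY drops blank pieces.
    $keywords = preg_split('/\s+/', $rest, -1, PREG_SPLIT_NO_EMPTY);

    // Keywords first, then phrases: the original order is not preserved.
    return array_merge($keywords, $phrases);
}

print_r(splitPhrasesThenKeywords('value of "measured response" detect goal "method valuation" study'));
```

The cost of removing the phrases before splitting is ordering: the keywords come back first and the phrases last, so this returns value, of, detect, goal, study, measured response, method valuation rather than the order in the original string.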
I'm also thinking of replacing the spaces outside quotes with commas and then exploding the result into an array. If that's a better option, how do I do that with preg_replace?
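A sketch of the comma idea, assuming quotes are always balanced and that no search term contains a comma (both are assumptions, not stated in the question); splitViaCommas is a hypothetical name. The lookahead in the preg_replace pattern treats a run of whitespace as outside quotes when an even number of double quotes follows it up to the end of the string.

```php
<?php
// Sketch: turn whitespace outside quotes into commas, then explode.
// Assumes balanced quotes and no commas inside terms; the name is hypothetical.
function splitViaCommas(string $subject): array
{
    // A whitespace run is outside quotes when an even number of '"'
    // characters follows it up to the end of the string.
    $commaSeparated = preg_replace('/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/', ',', trim($subject));

    $terms = [];
    foreach (explode(',', $commaSeparated) as $term) {
        // Strip the surrounding quotes (and any stray spaces) from each piece.
        $term = trim($term, '" ');
        if ($term !== '') {
            $terms[] = $term;
        }
    }
    return $terms;
}

print_r(splitViaCommas('value of "measured response" detect goal "method valuation" study'));
```

Because the lookahead rescans the rest of the string for every whitespace run, the work grows roughly quadratically with the string length, which is harmless for typical search strings. The same pattern could also be passed to preg_split directly, skipping the comma round trip.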
Is there a better way to go about this? Help! Thanks much, everyone.
Recommended answer
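$subject = 'value of "measured response" detect goal "method valuation" study';
preg_match_all('/(?<!")\b\w+\b|(?<=")\b[^"]+/', $subject, $result, PREG_PATTERN_ORDER);
foreach ($result[0] as $term) {
	# $result[0] holds every full match: bare keywords and the text inside quotes
	echo $term, "\n";
}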
This should yield the results you are looking for.
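To check that claim against the four input shapes listed in the question, the pattern can be wrapped in a small function and run over one sample per case. The wrapper name parseSearchString and the sample strings are made up for illustration.

```php
<?php
// Hypothetical wrapper around the answer's pattern.
function parseSearchString(string $subject): array
{
    preg_match_all('/(?<!")\b\w+\b|(?<=")\b[^"]+/', $subject, $result, PREG_PATTERN_ORDER);
    return $result[0]; // every full match, in the order found
}

// One made-up sample per case listed in the question.
$cases = [
    'no quoted phrases'    => 'value of detect goal',
    'phrases and keywords' => 'value of "measured response" detect goal "method valuation" study',
    'only quoted phrases'  => '"measured response" "method valuation"',
    'only keywords'        => 'value of study',
];

foreach ($cases as $label => $subject) {
    echo $label, ': ', implode(' | ', parseSearchString($subject)), "\n";
}
```

Two behaviours are worth knowing before relying on it. \w+ splits a hyphenated keyword such as e-mail into e and mail. And the pattern assumes a closing quote is followed by a space or the end of the string: in "a b"c d the word boundary after the closing quote succeeds, so the second alternative returns c d as if it were a phrase.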
Explanation:
# (?<!")\b\w+\b|(?<=")\b[^"]+
#
# Match either the regular expression below (attempting the next alternative only if this one fails) «(?<!")\b\w+\b»
#    Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!")»
#       Match the character '"' literally «"»
#    Assert position at a word boundary «\b»
#    Match a single character that is a "word character" (letters, digits, and underscore) «\w+»
#       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
#    Assert position at a word boundary «\b»
# Or match regular expression number 2 below (the entire match attempt fails if this one fails to match) «(?<=")\b[^"]+»
#    Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=")»
#       Match the character '"' literally «"»
#    Assert position at a word boundary «\b»
#    Match any character that is NOT a '"' «[^"]+»
#       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
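If the lookbehind logic is hard to follow, a different and common technique is to match either a complete quoted phrase or a run of characters that are neither whitespace nor quotes, then use whichever capture group took part in the match. This is a swapped-in alternative, not the answer's method; tokenizeSearchString is a hypothetical name.

```php
<?php
// Alternative tokenizer (not the answer's pattern): one left-to-right pass
// that matches either "quoted phrase" (group 1) or a bare keyword (group 2).
function tokenizeSearchString(string $subject): array
{
    preg_match_all('/"([^"]*)"|([^\s"]+)/', $subject, $sets, PREG_SET_ORDER);

    $terms = [];
    foreach ($sets as $set) {
        // With PREG_SET_ORDER a trailing unmatched group is omitted, so
        // $set[2] exists only for bare keywords; otherwise use the phrase.
        $term = $set[2] ?? trim($set[1]);
        if ($term !== '') {   // skip empty phrases such as ""
            $terms[] = $term;
        }
    }
    return $terms;
}

print_r(tokenizeSearchString('value of "measured response" detect goal "method valuation" study'));
```

Like the answer's pattern it keeps terms in their original order. Unlike it, it keeps e-mail as one keyword, splits "a b"c into the phrase a b and the keyword c, and treats the words after an unclosed quote as plain keywords. The trade-off is that punctuation next to a keyword (a trailing comma, for instance) stays attached to it.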