Regular expressions for Google operators
Question
Using PHP, I'm trying to improve the search on my site by supporting Google-like operators, e.g.
- keyword = natural/default
- "keyword" or "search phrase" = exact match
- keyword* = partial match
For this to work I need to split the string into two arrays: the exact words (without the double quotes) go into $Array1, and everything else (natural and partial keywords) goes into $Array2.
What regular expressions would achieve this for the following string?
Example string ($string)
today i'm "trying" out a* "google search" "test"
Desired result
$Array1 = array(
[0]=>trying
[1]=>google search
[2]=>test
);
$Array2 = array(
[0]=>today
[1]=>i'm
[2]=>out
[3]=>a*
);
1) Exact: I've tried the following regexp for the exact matches, but it returns two arrays, one with the double quotes and one without. I could just use $result[1], but there may be a trick that I'm missing here.
preg_match_all(
'/"([^"]+)"/iu',
'today i\'m "trying" \'out\' a* "google search" "test"',
$result
);
2) Natural/Partial: The following rule returns the correct keywords, but along with several blank values. Is this regexp sloppy, or should I just run the array through array_filter()?
preg_split(
'/"([^"]+)"|(\s)/iu',
'today i\'m "trying" \'out\' a* "google search" "test"'
);
Recommended answer
You can use strtok to tokenize the string.
See for example this tokenizeQuoted function, derived from the tokenizedQuoted function in the comments on the strtok manual page:
// Split a string into space-delimited tokens, treating double-quoted and
// single-quoted strings as single tokens. Returns array(quoted, unquoted).
function tokenizeQuoted($string, $quotationMarks='"\'') {
    $tokens = array(array(), array());
    for ($nextToken = strtok($string, ' '); $nextToken !== false; $nextToken = strtok(' ')) {
        if (strpos($quotationMarks, $nextToken[0]) !== false) {
            // Token starts with a quotation mark.
            if (strpos($quotationMarks, $nextToken[strlen($nextToken)-1]) !== false) {
                // It also ends with one: a single quoted word.
                $tokens[0][] = substr($nextToken, 1, -1);
            } else {
                // Quoted phrase containing spaces: read up to the matching closing quote.
                $tokens[0][] = substr($nextToken, 1) . ' ' . strtok($nextToken[0]);
            }
        } else {
            $tokens[1][] = $nextToken;
        }
    }
    return $tokens;
}
Here is a usage example:
$string = 'today i\'m "trying" out a* "google search" "test"';
var_dump(tokenizeQuoted($string));
Output:
array(2) {
[0]=>
array(3) {
[0]=>
string(6) "trying"
[1]=>
string(13) "google search"
[2]=>
string(4) "test"
}
[1]=>
array(4) {
[0]=>
string(5) "today"
[1]=>
string(3) "i'm"
[2]=>
string(3) "out"
[3]=>
string(2) "a*"
}
}
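For readers who would rather stay with regular expressions, as the question asks, the following is a minimal sketch and not part of the original answer. It assumes only double quotes mark exact phrases and that quotes are balanced. The helper name splitSearchTerms is hypothetical. For the exact matches, reading capture group 1 from preg_match_all is the usual approach, since index 0 always holds the full match. For the remaining keywords, the PREG_SPLIT_NO_EMPTY flag of preg_split drops the blank values, so array_filter() is not needed.

```php
<?php
// Hypothetical helper (not from the original answer): split a search string
// into exact phrases (double-quoted) and all remaining keywords.
function splitSearchTerms(string $string): array
{
    // Group 1 holds each phrase without its surrounding double quotes;
    // $matches[0] would hold the same phrases including the quotes.
    preg_match_all('/"([^"]+)"/u', $string, $matches);
    $exact = $matches[1];

    // Split on quoted phrases or runs of whitespace. PREG_SPLIT_NO_EMPTY
    // discards the empty strings produced between adjacent delimiters.
    $other = preg_split('/"[^"]+"|\s+/u', $string, -1, PREG_SPLIT_NO_EMPTY);

    return [$exact, $other];
}

[$exact, $other] = splitSearchTerms('today i\'m "trying" out a* "google search" "test"');
print_r($exact); // trying, google search, test
print_r($other); // today, i'm, out, a*
```

An unbalanced double quote is not treated as a delimiter here, so it would stay attached to a keyword in the second array; the strtok approach above handles that case differently.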