Regular expressions for Google operators


Question

Using PHP, I'm trying to improve the search on my site by supporting Google-like operators, e.g.:

  • keyword = natural/default
  • "keyword" or "search phrase" = exact match
  • keyword* = partial match

For this to work I need to split the string into two arrays: one for the exact words and phrases (without the double quotes) in $Array1, and one for everything else (natural and partial keywords) in $Array2.

What regular expressions would achieve this for the following string?

Example string ($string)

today i'm "trying" out a* "google search" "test"

Desired result

$Array1 = array(
  0 => 'trying',
  1 => 'google search',
  2 => 'test',
);

$Array2 = array(
  0 => 'today',
  1 => "i'm",
  2 => 'out',
  3 => 'a*',
);


1) Exact: I've tried the following regexp for the exact matches, but it returns two arrays, one with the double quotes and one without. I could just use $result[1], but there may be a trick that I'm missing here.

preg_match_all(
    '/"([^"]+)"/iu', 
    'today i\'m "trying" \'out\' a* "google search" "test"', 
    $result
);
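
For reference, a minimal sketch of what that call returns, assuming the plain sample string from above (without the single-quoted 'out'). The two arrays are the normal behaviour of preg_match_all: index 0 holds the full matches and index 1 holds the first capture group, so reading $result[1] is the usual way to get the phrases without quotes. The variable name $exact is only illustrative.

```php
<?php
$string = 'today i\'m "trying" out a* "google search" "test"';

// $result[0] = full matches (quotes included),
// $result[1] = capture group 1 (text between the quotes).
preg_match_all('/"([^"]+)"/u', $string, $result);

// Illustrative name: the exact-match phrases without their quotes.
$exact = $result[1];

print_r($exact);
```

The i modifier from the original pattern is dropped here because the pattern contains no letters, so case-insensitivity has no effect.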

2) Natural/Partial: The following rule returns the correct keywords, but along with several blank values. Is this regexp rule sloppy, or should I just run the array through array_filter()?

preg_split(
    '/"([^"]+)"|(\s)/iu', 
    'today i\'m "trying" \'out\' a* "google search" "test"'
);
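
One possible way to avoid the blank values without array_filter() is the PREG_SPLIT_NO_EMPTY flag of preg_split. The sketch below assumes the plain sample string (without the single-quoted 'out') and drops the capture groups, since nothing is captured when splitting; $other is an illustrative name.

```php
<?php
$string = 'today i\'m "trying" out a* "google search" "test"';

// Treat quoted phrases and runs of whitespace as delimiters;
// PREG_SPLIT_NO_EMPTY discards the empty pieces left between adjacent delimiters.
$other = preg_split('/"[^"]+"|\s+/u', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($other);
```

The blank values in the original call come from delimiters that sit next to each other, such as a space followed by a quoted phrase, which leave an empty string between them.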

Recommended answer

You can use strtok to tokenize the string.

See, for example, this tokenizeQuoted function, derived from the tokenizedQuoted function in the comments on the strtok manual page:

// Split a string into space-delimited tokens, taking double-quoted and single-quoted strings into account.
// Returns two arrays: [0] holds the quoted tokens (quotes stripped), [1] holds everything else.
function tokenizeQuoted($string, $quotationMarks='"\'') {
    $tokens = array(array(),array());
    for ($nextToken=strtok($string, ' '); $nextToken!==false; $nextToken=strtok(' ')) {
        if (strpos($quotationMarks, $nextToken[0]) !== false) {
            // the token starts with a quotation mark
            if (strpos($quotationMarks, $nextToken[strlen($nextToken)-1]) !== false) {
                // it also ends with one: a single quoted word
                $tokens[0][] = substr($nextToken, 1, -1);
            } else {
                // quoted phrase with spaces: read on up to the matching closing quote
                $tokens[0][] = substr($nextToken, 1) . ' ' . strtok($nextToken[0]);
            }
        } else {
            // unquoted token (natural or partial keyword)
            $tokens[1][] = $nextToken;
        }
    }
    return $tokens;
}

Here is a usage example:

$string = 'today i\'m "trying" out a* "google search" "test"';
var_dump(tokenizeQuoted($string));

Output:

array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(6) "trying"
    [1]=>
    string(13) "google search"
    [2]=>
    string(4) "test"
  }
  [1]=>
  array(4) {
    [0]=>
    string(5) "today"
    [1]=>
    string(3) "i'm"
    [2]=>
    string(3) "out"
    [3]=>
    string(2) "a*"
  }
}
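
Since the question asks for regular expressions, here is a hedged regex-only sketch that fills both arrays in a single pass. It is not part of the strtok approach above; the function name splitSearchQuery is hypothetical, and the sketch only handles double quotes and assumes they are balanced.

```php
<?php
// Hypothetical helper: returns [exact phrases, other keywords].
function splitSearchQuery(string $query): array
{
    $exact = [];
    $other = [];

    // Alternative 1 captures the text inside double quotes (group 1);
    // alternative 2 captures any other run of non-whitespace (group 2).
    if (!preg_match_all('/"([^"]+)"|(\S+)/u', $query, $matches, PREG_SET_ORDER)) {
        return [$exact, $other];
    }

    foreach ($matches as $m) {
        if ($m[1] !== '') {
            // group 1 is non-empty only when the quoted alternative matched
            $exact[] = $m[1];
        } else {
            $other[] = $m[2];
        }
    }

    return [$exact, $other];
}

list($Array1, $Array2) = splitSearchQuery('today i\'m "trying" out a* "google search" "test"');
print_r($Array1);
print_r($Array2);
```

With an unbalanced quote, the quoted alternative fails and the words fall through to the second array with the stray quote still attached, so real input may need extra cleanup.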
