Google Analytics Regex - Alternative to no negative lookahead


Question

Google Analytics no longer allows negative lookahead in its filters. This makes it very difficult to create a custom report that includes only the links I want it to include.

The regex that would work if negative lookahead were still allowed is:

test.com(\/\??index\_(.*)\.php\??(.*)|\/\?(.*)|\/|)+(\s)*(?!.)

This matches:

test.com
test.com/
test.com/index_fb2.php
test.com/index_fb2.php?ref=23
test.com/index_fb2.php?ref=23&e=35
test.com/?ref=23 
test.com/?ref=23&e=35

and, as intended, does not match:

test.com/ambassadors
test.com/admin/?signup=true 
test.com/randomtext/

I am trying to find out how to adapt my regex so that it keeps the same matches without using negative lookahead.

Thanks!

Recommended answer

Google Analytics doesn't seem to support single-line and multiline modes, which makes sense to me. URLs can't contain newlines, so it doesn't matter whether the dot matches them, and there's never any need for ^ and $ to match anywhere but the beginning and end of the whole string.

That means the (?!.) in your regex is exactly equivalent to $, which matches only at the very end of the string (like \z, in flavors that support it). Since that's the only lookahead in your regex, you should never have had this problem; you should have been using $ all along.
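The equivalence can be checked with a small sketch. It uses Python's `re` module, which is an assumption made for illustration only: it is not the engine Google Analytics uses, but it supports both constructs, so the two endings can be compared side by side. The shortened pattern and the names `with_lookahead`, `with_anchor` and `agree` are hypothetical.

```python
import re

# Sample URLs taken from the question; none of them contains a newline.
urls = [
    "test.com",
    "test.com/",
    "test.com/index_fb2.php?ref=23",
    "test.com/ambassadors",
]

# Hypothetical shortened pattern, used only to compare the two endings.
with_lookahead = re.compile(r"test\.com/?(?!.)")
with_anchor = re.compile(r"test\.com/?$")


def agree(url: str) -> bool:
    """Return True if both patterns give the same yes/no answer for url."""
    return bool(with_lookahead.search(url)) == bool(with_anchor.search(url))


# Map each URL to whether the two endings agreed on it.
results = {url: agree(url) for url in urls}
print(results)
```

For input without newlines, the two patterns should agree on every string, which is the point made above.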

However, your regex has other problems, mostly owing to over-reliance on (.*). For example, it matches these strings:

test.com/?^#(%)!*%supercalifragilisticexpialidocious
test.com/index_ecky-ecky-ecky-ecky-PTANG!-vroop-boing_rowr.php (ni! shh!)

...which I'm pretty sure you don't want. :P
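A quick way to see the over-matching is to run the question's pattern against those two strings. This is a sketch in Python's `re` module (an assumption; Google Analytics uses its own engine), with the trailing (?!.) swapped for $ as suggested above. The names `original`, `unwanted` and `over_matched` are illustrative.

```python
import re

# The question's pattern, with the trailing (?!.) replaced by $.
original = re.compile(
    r"test.com(\/\??index\_(.*)\.php\??(.*)|\/\?(.*)|\/|)+(\s)*$"
)

# The two strings from the answer that should arguably be rejected.
unwanted = [
    "test.com/?^#(%)!*%supercalifragilisticexpialidocious",
    "test.com/index_ecky-ecky-ecky-ecky-PTANG!-vroop-boing_rowr.php (ni! shh!)",
]

# Collect every unwanted string that the pattern nevertheless accepts.
over_matched = [s for s in unwanted if original.search(s)]
print(over_matched)
```

Both strings get through because each (.*) accepts any run of characters, so anything after `/?` or between `index_` and `.php` is allowed.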

Try this regex:

test\.com(?:/(?:index_\w+\.php)?(?:\?ref=\d+(?:&e=\d+)?)?)?\s*$

Or, more readably:

test\.com
(?:
  /
  (?:index_\w+\.php)?
  (?:
    \?ref=\d+
    (?:
      &e=\d+
    )?
  )?
)?
\s*$
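As a sanity check, both forms of the suggested regex can be run against the sample URLs from the question. The sketch below uses Python's `re` module, where `re.VERBOSE` makes the engine ignore the layout whitespace. That is an assumption for testing purposes: the answer does not say whether Google Analytics accepts the spaced-out form, so the single-line version is the safer one to paste into a filter. The names `compact`, `readable`, `should_match` and `should_not_match` are illustrative.

```python
import re

# Single-line form of the suggested regex.
compact = re.compile(
    r"test\.com(?:/(?:index_\w+\.php)?(?:\?ref=\d+(?:&e=\d+)?)?)?\s*$"
)

# Same pattern laid out for reading; re.VERBOSE ignores the whitespace.
readable = re.compile(
    r"""
    test\.com
    (?:
      /
      (?:index_\w+\.php)?
      (?:
        \?ref=\d+
        (?:
          &e=\d+
        )?
      )?
    )?
    \s*$
    """,
    re.VERBOSE,
)

# URLs the question wants included (one has a trailing space).
should_match = [
    "test.com",
    "test.com/",
    "test.com/index_fb2.php",
    "test.com/index_fb2.php?ref=23",
    "test.com/index_fb2.php?ref=23&e=35",
    "test.com/?ref=23 ",
    "test.com/?ref=23&e=35",
]

# URLs the question wants excluded, plus one of the junk strings.
should_not_match = [
    "test.com/ambassadors",
    "test.com/admin/?signup=true ",
    "test.com/randomtext/",
    "test.com/?^#(%)!*%supercalifragilisticexpialidocious",
]

for pattern in (compact, readable):
    accepted = [u for u in should_match if pattern.search(u)]
    rejected = [u for u in should_not_match if not pattern.search(u)]
    print(len(accepted), "accepted;", len(rejected), "rejected")
```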

For illustration purposes I'm making a lot of simplifying assumptions about (e.g.) what parameters can be present, what order they'll appear in, and what their values can be. I'm also wondering if it's really necessary to match the domain (test.com). I have no experience with Google Analytics, but shouldn't the match start (and be anchored) right after the domain? And do you really have to allow for whitespace at the end? It seems to me the regex should be more like this:

^/(?:index_\w+\.php)?(?:\?ref=\d+(?:&e=\d+)?)?$
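The path-only version can be tried the same way. This sketch assumes, as the answer guesses, that the filter sees only the part of the URL after the domain, and that a bare `test.com` arrives as `/`; neither is confirmed by the source. It again uses Python's `re` module purely for testing, and the names `path_only` and `paths` are hypothetical.

```python
import re

# Anchored at both ends, no domain, no trailing whitespace allowed.
path_only = re.compile(r"^/(?:index_\w+\.php)?(?:\?ref=\d+(?:&e=\d+)?)?$")

# Hypothetical request paths mapped to the expected outcome.
paths = {
    "/": True,
    "/index_fb2.php": True,
    "/index_fb2.php?ref=23&e=35": True,
    "/?ref=23": True,
    "/?ref=23 ": False,  # trailing space is no longer tolerated
    "/ambassadors": False,
    "/admin/?signup=true": False,
    "/randomtext/": False,
}

for path, expected in paths.items():
    matched = path_only.search(path) is not None
    print(repr(path), matched, "(expected", str(expected) + ")")
```

If trailing whitespace does turn up in real data, adding `\s*` before the final `$` restores the earlier behaviour.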
