Extracting URLs from @font-face by searching within @font-face for replacement

Question

I have a web service that rewrites URLs in CSS files so that they can be served via a CDN.

The CSS files can contain URLs to images or fonts.

I currently have the following regex to match ALL URLs within the CSS file:

(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))

However, I now want to introduce support for custom fonts and need to target the URLs within @font-face:

@font-face {
  font-family: 'FontAwesome';
  src: url("fonts/fontawesome-webfont.eot?v=4.0.3");
  src: url("fonts/fontawesome-webfont.eot?#iefix&v=4.0.3") format("embedded-opentype"), url("fonts/fontawesome-webfont.woff?v=4.0.3") format("woff"), url("fonts/fontawesome-webfont.ttf?v=4.0.3") format("truetype"), url("fonts/fontawesome-webfont.svg?v=4.0.3#fontawesomeregular") format("svg");
  font-weight: normal;
  font-style: normal;
}

I then came up with the following:

@font-face\s*\{.*(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))\s*\}

The problem is that this matches everything, not just the URLs inside. I thought I could use a lookbehind like so:

(?<=@font-face\s*\{.*)(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))(?<=-\s*\})

Unfortunately, PCRE (which PHP uses) does not support variable-length repetition within a lookbehind, so I am stuck.

I do not wish to identify fonts by their extension, because some fonts use the .svg extension, which can conflict with images that also use .svg.

In addition, I would also like to modify my original regex to match all other URLs that are NOT within an @font-face:

.someclass {
  background: url('images/someimage.png') no-repeat;
}

Since I am unable to use lookbehinds, how can I extract the URLs that are within an @font-face separately from those that are not?

Recommended answer

You can use this:

$pattern = <<<'LOD'
~
(?(DEFINE)
    (?<quoted_content>
        (["']) (?>[^"'\\]++ | \\{2} | \\. | (?!\g{-1})["'] )*+ \g{-1}
    )
    (?<comment> /\* .*? \*/ )
    (?<url_skip> (?: https?: | data: ) [^"'\s)}]*+ )
    (?<other_content>
        (?> [^u}/"']++ | \g<quoted_content> | \g<comment>
          | \Bu | u(?!rl\s*+\() | /(?!\*) 
          | \g<url_start> \g<url_skip> ["']?+
        )++
    )
    (?<anchor> \G(?<!^) ["']?+ | @font-face \s*+ { )
    (?<url_start> url\( \s*+ ["']?+ )
)

\g<comment> (*SKIP)(*FAIL) |

\g<anchor> \g<other_content>?+ \g<url_start> \K [./]*+ 

( [^"'\s)}]*+ )    # url
~xs
LOD;

$result = preg_replace($pattern, 'http://cdn.test.com/fonts/$8', $data);
print_r($result);

Test string:

$data = <<<'LOD'
@font-face {
  font-family: 'FontAwesome';
  src: url("fonts/fontawesome-webfont.eot?v=4.0.3");
  src: url(fonts/fontawesome-webfont.eot?#iefix&v=4.0.3) format("embedded-opentype"),
     /*url("fonts/fontawesome-webfont.woff?v=4.0.3") format("woff"),*/
       url("http://domain.com/fonts/fontawesome-webfont.ttf?v=4.0.3") format("truetype"),
       url('fonts/fontawesome-webfont.svg?v=4.0.3#fontawesomeregular') format("svg");
  font-weight: normal;
  font-style: normal;
}
/*
@font-face {
  font-family: 'Font1';
  src: url("fonts/font1.eot");
} */
@font-face {
  font-family: 'Fon\'t2';
  src: url("fonts/font2.eot");
}
@font-face {
  font-family: 'Font3';
  src: url("../fonts/font3.eot");
}
LOD;
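
To check which URLs the pattern actually selects, it can help to run it with preg_match_all on a very small stylesheet first. The sketch below repeats the pattern from the answer unchanged; the two-rule stylesheet in $css is a hypothetical input made up for illustration, not part of the original test string.

```php
<?php
// Hypothetical minimal stylesheet: one @font-face rule and one ordinary rule.
$css = <<<'CSS'
@font-face { src: url("fonts/a.eot"); }
.b { background: url(img/x.png); }
CSS;

// Pattern copied from the answer above.
$pattern = <<<'LOD'
~
(?(DEFINE)
    (?<quoted_content>
        (["']) (?>[^"'\\]++ | \\{2} | \\. | (?!\g{-1})["'] )*+ \g{-1}
    )
    (?<comment> /\* .*? \*/ )
    (?<url_skip> (?: https?: | data: ) [^"'\s)}]*+ )
    (?<other_content>
        (?> [^u}/"']++ | \g<quoted_content> | \g<comment>
          | \Bu | u(?!rl\s*+\() | /(?!\*)
          | \g<url_start> \g<url_skip> ["']?+
        )++
    )
    (?<anchor> \G(?<!^) ["']?+ | @font-face \s*+ { )
    (?<url_start> url\( \s*+ ["']?+ )
)

\g<comment> (*SKIP)(*FAIL) |

\g<anchor> \g<other_content>?+ \g<url_start> \K [./]*+

( [^"'\s)}]*+ )    # url
~xs
LOD;

// $matches[0] holds the whole matches, i.e. what preg_replace would replace.
preg_match_all($pattern, $css, $matches);
print_r($matches[0]);
```

Only the URL inside the @font-face rule should be listed; the background image in the second rule is left alone because nothing outside an @font-face block can satisfy <anchor>.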

Main idea:

For better readability, the pattern is divided into named subpatterns. The (?(DEFINE)...) group doesn't match anything; it is only a definition section.

The main trick of this pattern is the \G anchor, which means: at the start of the string, or contiguous to the previous match. I added a negative lookbehind (?<!^) to rule out the first of these two cases.
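
The behaviour of \G(?<!^) is easier to see on a toy input. The following is a minimal sketch, not taken from the answer: the "list:" prefix and the sample string are invented for illustration. Numbers are matched only when they directly follow "list:" or directly follow a previous match plus a comma.

```php
<?php
// Hypothetical input: only the numbers chained after "list:" should match.
$subject = '1,2 list:3,4,5 x,6';

// Branch 1: contiguous to the previous match (never at the string start), then a comma.
// Branch 2: the literal entry point "list:".
// \K drops the prefix from the reported match.
$pattern = '~(?:\G(?<!^),|list:)\K\d+~';

preg_match_all($pattern, $subject, $matches);
print_r($matches[0]); // the matches are "3", "4" and "5"
```

The leading "1,2" and the trailing ",6" are ignored because neither follows "list:" nor sits right after a previous match, which is the same mechanism that keeps the main pattern inside an @font-face block.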

The <anchor> named subpattern is the most important one, because it allows a match only if @font-face { is found or if the position is immediately after the end of a URL (this is why it contains a ["']?+, which consumes the possible closing quote).

<other_content> represents everything that is not a url section, but it also matches the url sections that must be skipped (URLs that begin with "http:", "https:" or "data:"). The important detail of this subpattern is that it can't match the closing curly bracket of @font-face.

The only job of <url_start> is to match url( together with the optional whitespace and opening quote that follow it.

\K removes from the match result everything that has been matched before it.
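
As a small illustration of \K in a replacement (a sketch with a made-up rule and a made-up "cdn/" prefix, not the answer's pattern): the "url(" part has to be present for the match to succeed, but it is not part of what gets replaced.

```php
<?php
// Hypothetical one-rule stylesheet.
$css = 'a{b:url(img.png)}';

// "url(" must match, but \K drops it from the result, so $0 is only "img.png".
$result = preg_replace('~url\(\K[^)]+~', 'cdn/$0', $css);

echo $result, "\n"; // a{b:url(cdn/img.png)}
```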

([^"'\s)}]*+) matches the URL itself. Together with any leading ./ or ../, it is the only thing that stays in the match result.

Since neither <other_content> nor the url subpattern can match a } (unless it is inside a quoted or commented part), you are sure never to match anything outside of the @font-face definition. A second consequence is that the pattern always fails after the last URL. Thus, at the next attempt, the "contiguous branch" fails, and it keeps failing until the next @font-face.

The main pattern begins with \g<comment> (*SKIP)(*FAIL) | to skip all content inside comments /*....*/. \g<comment> refers to the basic subpattern that describes what a comment looks like. (*SKIP) forbids retrying the substring that has been matched before it (on its left, by \g<comment>) if the pattern fails on its right. (*FAIL) forces the pattern to fail. With this trick, comments are skipped and never appear as a match result (since the pattern fails).
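
The comment-skipping trick can be tried in isolation. This is a minimal sketch with an invented stylesheet and a deliberately naive url(...) branch (it is not the answer's full pattern): anything inside /* ... */ is consumed by the first branch and then discarded.

```php
<?php
// Hypothetical stylesheet with one commented-out url(...).
$css = 'a{b:url(x.png)} /* url(y.png) */ c{d:url(z.png)}';

// Branch 1 matches a whole comment, then (*SKIP)(*FAIL) rejects it and makes
// the engine resume after the comment. Branch 2 is a simplified url(...) matcher.
$pattern = '~/\*.*?\*/(*SKIP)(*FAIL)|url\([^)]*\)~s';

preg_match_all($pattern, $css, $matches);
print_r($matches[0]); // url(x.png) and url(z.png); url(y.png) is skipped
```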

quoted_content: it is used in <other_content> to avoid matching a url( or /* that is inside quotes.

(["'])              # capture group: the opening quote
(?>                 # atomic group: all possible content between quotes
    [^"'\\]++       # all that is not a quote or a backslash
  |                 # OR
    \\{2}           # two backslashes: (two \ doesn't escape anything)
  |                 # OR
    \\.             # any escaped character
  |                 # OR
    (?!\g{-1})["']  # the other quote (this one that is not in the capture group)
)*+                 # repeat zero or more time the atomic group
\g{-1}              # backreference to the last capturing group
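
To see how quoted_content treats escaped quotes and the "other" quote, the subpattern can be called on its own. The sketch below wraps the definition from the answer in a tiny pattern; the sample declaration in $css is an assumption made for illustration.

```php
<?php
// Hypothetical declaration with an escaped quote and a quote of the other kind.
$css = <<<'CSS'
font-family: 'Fon\'t2'; src: url("a'b.eot");
CSS;

// quoted_content exactly as defined in the answer, called once as the main pattern.
$pattern = <<<'LOD'
~
(?(DEFINE)
    (?<quoted_content>
        (["']) (?>[^"'\\]++ | \\{2} | \\. | (?!\g{-1})["'] )*+ \g{-1}
    )
)
\g<quoted_content>
~x
LOD;

preg_match_all($pattern, $css, $matches);
print_r($matches[0]); // 'Fon\'t2' and "a'b.eot", each including its quotes
```

The escaped \' does not end the first string, and the single quote inside the double-quoted string is accepted because it differs from the opening quote.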

other_content: everything that is not the closing curly bracket and not a URL lacking an http: or data: prefix

(?>                     # open an atomic group
    [^u}/"']++          # all character that are not problematic!
  |
    \g<quoted_content>  # string inside quotes
  |
    \g<comment>         # string inside comments
  |
    \Bu                 # "u" not preceded by a word boundary
  |
    u(?!rl\s*+\()       # "u" not followed by "rl("  (not the start of an url definition)
  |                   
    /(?!\*)             # "/" not followed by "*" (not the start of a comment)
  |
    \g<url_start>       # match the url that begins with "http:"
    \g<url_skip> ["']?+ # until the possible quote
)++                     # repeat the atomic group one or more times

anchor:

\G(?<!^) ["']?+    # contiguous to a precedent match with a possible closing quote
|                  # OR
@font-face \s*+ {  # start of the @font-face definition

Note:

You can improve the main pattern:

After the last URL of an @font-face, the regex engine attempts to match with the "contiguous branch" of <anchor> and matches all characters up to the }, which makes the pattern fail. Then, at each of these same characters, the regex engine must try the two branches of <anchor> again (and they will always fail) until it gets past the }.

To avoid these useless tries, you can change the main pattern to:

\g<comment> (*SKIP)(*FAIL) |

\g<anchor> \g<other_content>?+
(?>
    \g<url_start> \K [./]*+  ([^"'\s)}]*+)
  | 
    } (*SKIP)(*FAIL)
)

With this new scenario, the first character after the last URL is matched by the "contiguous branch", \g<other_content> matches everything up to the }, \g<url_start> fails immediately, the } is matched, and (*SKIP)(*FAIL) makes the pattern fail and forbids retrying these characters.
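
The answer above covers the URLs inside @font-face. For the second part of the question (URLs that are NOT inside an @font-face), one possible approach is to reuse the (*SKIP)(*FAIL) trick on whole @font-face blocks. The following is only a rough sketch under simplifying assumptions and is not part of the original answer: it treats the first } as the end of the block, and it does not handle comments, quoted braces, or absolute and data: URLs. The input string is invented.

```php
<?php
// Hypothetical stylesheet mixing ordinary rules and an @font-face block.
$css = ".a{x:url(a.png)} @font-face{src:url(f.eot)} .c{background:url('i.png')}";

// Branch 1: consume a whole @font-face block (naively, up to the first "}")
//           and discard it with (*SKIP)(*FAIL).
// Branch 2: match url( plus an optional quote, then keep only the URL via \K.
$pattern = <<<'LOD'
~@font-face\s*+\{[^}]*+\}(*SKIP)(*FAIL)|url\(\s*+["']?+\K[^"'\s)]++~
LOD;

preg_match_all($pattern, $css, $matches);
print_r($matches[0]); // a.png and i.png; f.eot is inside @font-face and is skipped
```

For real stylesheets, the block-skipping branch would probably need the same care with comments and quoted strings that <other_content> takes in the answer's pattern.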
