Extracting urls from @font-face by searching within @font-face for replacement

Posted: 2020/7/3 1:34:18. Tags: php css regex

Problem description

I have a web service that rewrites URLs in CSS files so that they can be served via a CDN.

The CSS files can contain URLs to images or fonts.

I currently use the following regex to match ALL URLs within a CSS file:

(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))
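For context, here is a minimal sketch of how a pattern like this might be applied in PHP with preg_replace_callback. The CDN base URL, the sample CSS and the variable names are assumptions made for illustration; they are not taken from the actual service.

```php
<?php
// Hypothetical CDN prefix, used only for illustration.
$cdn = 'https://cdn.example.com/assets/';

// The regex above, wrapped in ~ delimiters. A nowdoc keeps the
// backslashes and quotes exactly as written.
$pattern = <<<'REGEX'
~(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))~
REGEX;

$css = ".someclass {\n  background: url('images/someimage.png') no-repeat;\n}";

// Group 1 is "url(" plus the opening quote, group 3 is the relative URL,
// group 5 is the closing quote plus ")".
$rewritten = preg_replace_callback(
    $pattern,
    function (array $m) use ($cdn) {
        return $m[1] . $cdn . $m[3] . $m[5];
    },
    $css
);

echo $rewritten, "\n";
```

Absolute URLs (http:, https:, data:) and URLs that start with / or ../ are left alone, because the negative lookahead rejects them.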

However, I now want to introduce support for custom fonts and need to target the URLs within @font-face:

@font-face {
  font-family: 'FontAwesome';
  src: url("fonts/fontawesome-webfont.eot?v=4.0.3");
  src: url("fonts/fontawesome-webfont.eot?#iefix&v=4.0.3") format("embedded-opentype"), url("fonts/fontawesome-webfont.woff?v=4.0.3") format("woff"), url("fonts/fontawesome-webfont.ttf?v=4.0.3") format("truetype"), url("fonts/fontawesome-webfont.svg?v=4.0.3#fontawesomeregular") format("svg");
  font-weight: normal;
  font-style: normal;
}

I then came up with the following:

@font-face\s*\{.*(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))\s*\}

The problem is that this matches everything, not just the URLs inside. I thought I could use a lookbehind like so:

(?<=@font-face\s*\{.*)(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))(?<=-\s*\})

Unfortunately, PCRE (which PHP uses) does not support variable-length repetition within a lookbehind, so I am stuck.

I do not wish to detect fonts by their extension, as some fonts have the .svg extension, which can conflict with images that have the .svg extension.

In addition, I would also like to modify my original regex to match all other URLs that are NOT within an @font-face:

.someclass {
  background: url('images/someimage.png') no-repeat;
}

Since I am unable to use lookbehinds, how can I extract the URLs that are within an @font-face and those that are not?

Recommended answer

Main idea:

For readability, the pattern is divided into named subpatterns. The (?(DEFINE)...) group doesn't match anything; it is only a definition section.

The main trick of this pattern is the \G anchor, which means: at the start of the string, or contiguous to the previous match. I added a negative lookbehind, (?<!^), to exclude the first of these two cases (the start of the string).

The <anchor> named subpattern is the most important one, because it allows a match only if @font-face { is found or immediately after the end of a URL (this is why you can see a ["']?+).

<other_content> represents everything that is not a URL section, but it also matches the URL sections that must be skipped (URLs that begin with "http:" or "data:"). The important detail of this subpattern is that it can't match the closing curly bracket of @font-face.

The only job of <url_start> is to match url(".

\K removes from the match result everything that has been matched before it.

([^"'\s)}]*+) matches the URL. Together with any leading ./ or ../, it is the only thing that stays in the match result.

Since <other_content> and the URL subpattern can't match a } (outside quoted or commented parts), you are sure never to match anything outside the @font-face definition. A second consequence is that the pattern always fails after the last URL. Thus, at the next attempt, the "contiguous branch" keeps failing until the next @font-face.
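The contiguous-match behaviour of \G combined with \K is easier to see in isolation on a much smaller input. The sketch below is illustrative only: the sample string and the "list:" marker are made up, with "list:" playing the role of @font-face { and the numbers playing the role of the URLs.

```php
<?php
// Made-up input: only the numbers that directly follow "list:" are wanted.
$text = 'skip 1,2 list:3,4,5 and 6';

// Branch 1: \G(?<!^),  continues right after the previous match (never at offset 0).
// Branch 2: list:      opens a new run of matches.
// \K drops the "," or "list:" from the reported match.
$pattern = '~(?:\G(?<!^),|list:)\K\d+~';

preg_match_all($pattern, $text, $m);

print_r($m[0]);
```

Once the character after "5" is not a comma, the contiguous branch fails, and it cannot succeed again until "list:" reappears. That is why "1", "2" and "6" are never reported.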

The main pattern begins with \g<comment> (*SKIP)(*FAIL) | to skip all content inside comments /*....*/. \g<comment> refers to the basic subpattern that describes what a comment looks like. (*SKIP) forbids retrying the substring matched before it (on its left, by \g<comment>) if the pattern fails on its right. (*FAIL) forces the pattern to fail. With this trick, comments are skipped and never appear as a match result (since the pattern fails).
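As a small standalone illustration of the (*SKIP)(*FAIL) trick, the sketch below skips CSS comments while collecting url(...) values. The comment subpattern is written inline here, and both the sample CSS and the simplified URL part are assumptions for the example, not the answer's full pattern.

```php
<?php
// Made-up CSS: one URL inside a comment, one outside.
$css = '/* url(old.png) */ a { background: url(new.png); }';

// Left branch: match a whole comment, then force a failure and forbid
// retrying inside it. Right branch: report what follows "url(".
$pattern = '~/\*.*?\*/(*SKIP)(*FAIL)|url\(\s*\K[^)\s]+~s';

preg_match_all($pattern, $css, $m);
$urls = $m[0];

print_r($urls);
```

Without the left branch, old.png would be reported as well, since the engine would happily find url( inside the comment.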

quoted_content: it is used in <other_content> to avoid matching a url( or /* that is inside quotes.

(["'])              # capture group: the opening quote
(?>                 # atomic group: all possible content between quotes
    [^"'\\]++       # all that is not a quote or a backslash
  |                 # OR
    \\{2}           # two backslashes: (two \ doesn't escape anything)
  |                 # OR
    \\.             # any escaped character
  |                 # OR
    (?!\g{-1})["']  # the other quote (this one that is not in the capture group)
)*+                 # repeat zero or more time the atomic group
\g{-1}              # backreference to the last capturing group
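To try the quoted_content subpattern on its own, it can be compacted onto one line and run against a made-up declaration that contains escaped quotes and the "other" quote character. The sample CSS is an assumption for illustration.

```php
<?php
// The quoted_content subpattern, without the free-spacing comments.
$pattern = <<<'REGEX'
~(["'])(?>[^"'\\]++|\\{2}|\\.|(?!\g{-1})["'])*+\g{-1}~
REGEX;

// Made-up CSS: escaped double quotes and an apostrophe inside a
// double-quoted string, followed by a single-quoted string.
$css = <<<'CSS'
a { content: "say \"hi\" it's"; } b { content: 'x'; }
CSS;

preg_match_all($pattern, $css, $m);

print_r($m[0]);
```

The first string is matched as a whole: the escaped \" are consumed by the \\. branch, and the apostrophe is accepted by the last branch because it is not the quote that opened the string.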

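other_content: everything that is not the closing curly bracket and not a URL to extract (that is, a URL that does not begin with http: or data:)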

(?>                     # open an atomic group
    [^u}/"']++          # all character that are not problematic!
  |
    \g<quoted_content>  # string inside quotes
  |
    \g<comment>         # string inside comments
  |
    \Bu                 # "u" not preceded by a word boundary
  |
    u(?!rl\s*+\()       # "u" not followed by "rl("  (not the start of an url definition)
  |                   
    /(?!\*)             # "/" not followed by "*" (not the start of a comment)
  |
    \g<url_start>       # match the url that begins with "http:"
    \g<url_skip> ["']?+ # until the possible quote
)++                     # repeat the atomic group one or more times

anchor:

\G(?<!^) ["']?+    # contiguous to a precedent match with a possible closing quote
|                  # OR
@font-face \s*+ {  # start of the @font-face definition

Notice:

You can improve the main pattern:

After the last URL of @font-face, the regex engine tries the "contiguous branch" of <anchor> and matches all characters up to the }, which makes the pattern fail. Then, at each of these same characters, the regex engine must try both branches of <anchor> (which always fail until the }).

To avoid these useless tries, you can change the main pattern to:

\g<comment> (*SKIP)(*FAIL) |

\g<anchor> \g<other_content>?+
(?>
    \g<url_start> \K [./]*+  ([^"'\s)}]*+)
  | 
    } (*SKIP)(*FAIL)
)

With this new scenario, the first character after the last URL is matched by the "contiguous branch", \g<other_content> matches everything up to the }, \g<url_start> fails immediately, the } is matched, and (*SKIP)(*FAIL) makes the pattern fail and forbids retrying these characters.
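Putting the pieces together, the whole thing might look like the sketch below. The quoted_content, other_content and anchor definitions and the main pattern follow the fragments shown above; the definitions of <comment>, <url_start> and <url_skip> are not shown on this page, so the ones used here are assumptions, as is the sample CSS. The literal braces are escaped only as a precaution.

```php
<?php
$pattern = <<<'REGEX'
~
(?(DEFINE)
    (?<quoted_content>
        (["']) (?> [^"'\\]++ | \\{2} | \\. | (?!\g{-1})["'] )*+ \g{-1}
    )
    (?<comment>   /\* (?> [^*]++ | \*(?!/) )*+ \*/ )    # assumed definition
    (?<url_start> url\( \s*+ ["']?+ )                   # assumed definition
    (?<url_skip>  (?: https?: | data: ) [^"'\s)}]*+ )   # assumed definition
    (?<other_content>
        (?>
            [^u}/"']++
          | \g<quoted_content>
          | \g<comment>
          | \Bu
          | u(?!rl\s*+\()
          | /(?!\*)
          | \g<url_start> \g<url_skip> ["']?+
        )++
    )
    (?<anchor> \G(?<!^) ["']?+ | @font-face \s*+ \{ )
)

\g<comment> (*SKIP)(*FAIL) |

\g<anchor> \g<other_content>?+
(?>
    \g<url_start> \K [./]*+ ([^"'\s)}]*+)
  |
    \} (*SKIP)(*FAIL)
)
~x
REGEX;

// Made-up stylesheet: a commented-out URL, an absolute URL to skip,
// and a rule outside @font-face that must not be touched.
$css = <<<'CSS'
@font-face {
  font-family: 'Demo';
  /* url("fonts/old.eot") */
  src: url("fonts/demo.woff?v=1") format("woff"),
       url(https://example.com/demo.ttf) format("truetype"),
       url('../fonts/demo.svg#demo') format("svg");
}
.someclass { background: url('images/someimage.png') no-repeat; }
CSS;

preg_match_all($pattern, $css, $m);
$urls = $m[0];

print_r($urls);
```

Because of \K, each whole match is just the URL (with its leading ../ if any), so the same pattern could in principle be passed to preg_replace_callback to rewrite only the font URLs in place.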
