Extracting urls from @font-face by searching within @font-face for replacement
Question
I have a web service that rewrites urls in css files so that they can be served via a CDN.
The css files can contain urls to images or fonts.
I currently have the following regex to match ALL urls within the css file:
(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))
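A minimal sketch of how the regex above behaves in PHP, assuming a small hypothetical stylesheet: preg_match_all() collects the relative urls in group 3, while the negative lookahead skips http(s):, data:, ../ and root-relative urls. The pattern is pasted verbatim into a nowdoc, so PHP applies no string escaping. Note that it cannot tell a font url from an image url, which is the limitation the rest of the question is about.

```php
<?php
// The question's pattern, pasted verbatim. A nowdoc does no escaping,
// and PCRE treats \' and \" as literal quote characters.
$pattern = <<<'RE'
~(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))~
RE;

// Hypothetical sample stylesheet.
$css = <<<'CSS'
.a { background: url('images/a.png') no-repeat; }
.b { background: url("http://example.com/b.png"); }
.c { background: url(img/c.gif); }
CSS;

preg_match_all($pattern, $css, $m);

// Group 3 holds the url; the absolute url in .b is skipped by the lookahead.
$urls = $m[3];
print_r($urls); // images/a.png, img/c.gif
```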
However, I now want to introduce support for custom fonts and need to target the urls within @font-face:
@font-face {
font-family: 'FontAwesome';
src: url("fonts/fontawesome-webfont.eot?v=4.0.3");
src: url("fonts/fontawesome-webfont.eot?#iefix&v=4.0.3") format("embedded-opentype"), url("fonts/fontawesome-webfont.woff?v=4.0.3") format("woff"), url("fonts/fontawesome-webfont.ttf?v=4.0.3") format("truetype"), url("fonts/fontawesome-webfont.svg?v=4.0.3#fontawesomeregular") format("svg");
font-weight: normal;
font-style: normal;
}
I then came up with the following:
@font-face\s*\{.*(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))\s*\}
The problem is that this matches the whole block, not just the urls inside it. I thought I could use a lookbehind like so:
(?<=@font-face\s*\{.*)(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))(?<=-\s*\})
Unfortunately, PCRE (which PHP uses) does not support variable repetition within a lookbehind, so I am stuck.
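A simplified sketch of both problems, using a hypothetical one-line rule and a reduced url subpattern rather than the full regex: when one match spans the block, the greedy .* lets only the last url be captured, and a lookbehind of unbounded length does not compile in PCRE at all.

```php
<?php
// Hypothetical one-line @font-face rule with two urls.
$css = '@font-face { src: url(a.eot), url(b.woff); }';

// 1) One match spanning the block: the greedy .* runs to the end of the
//    subject and backtracks only to the LAST "url(", so a.eot is swallowed.
preg_match('~@font-face\s*\{.*url\(([^)]*)\)~', $css, $m);
$captured = $m[1]; // "b.woff"

// 2) A lookbehind of unbounded length is a compile error in PCRE:
//    preg_match() raises a warning (silenced with @) and returns false.
$result = @preg_match('~(?<=@font-face\s*\{.*)url~', $css);

var_dump($captured, $result);
```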
I do not wish to check for fonts by their extension, as some fonts have the .svg extension, which can conflict with images that also use the .svg extension.
In addition, I would also like to modify my original regex to match all other urls that are NOT within an @font-face:
.someclass {
background: url('images/someimage.png') no-repeat;
}
Since I am unable to use lookbehinds, how can I extract the urls that are within an @font-face separately from those that are not within one?
Answer
You can use this pattern:
$pattern = <<<'LOD'
~
(?(DEFINE)
(?<quoted_content>
(["']) (?>[^"'\\]++ | \\{2} | \\. | (?!\g{-1})["'] )*+ \g{-1}
)
(?<comment> /\* .*? \*/ )
(?<url_skip> (?: https?: | data: ) [^"'\s)}]*+ )
(?<other_content>
(?> [^u}/"']++ | \g<quoted_content> | \g<comment>
| \Bu | u(?!rl\s*+\() | /(?!\*)
| \g<url_start> \g<url_skip> ["']?+
)++
)
(?<anchor> \G(?<!^) ["']?+ | @font-face \s*+ { )
(?<url_start> url\( \s*+ ["']?+ )
)
\g<comment> (*SKIP)(*FAIL) |
\g<anchor> \g<other_content>?+ \g<url_start> \K [./]*+
( [^"'\s)}]*+ ) # url
~xs
LOD;
$result = preg_replace($pattern, 'http://cdn.test.com/fonts/$8', $data);
print_r($result);
Test string:
$data = <<<'LOD'
@font-face {
font-family: 'FontAwesome';
src: url("fonts/fontawesome-webfont.eot?v=4.0.3");
src: url(fonts/fontawesome-webfont.eot?#iefix&v=4.0.3) format("embedded-opentype"),
/*url("fonts/fontawesome-webfont.woff?v=4.0.3") format("woff"),*/
url("http://domain.com/fonts/fontawesome-webfont.ttf?v=4.0.3") format("truetype"),
url('fonts/fontawesome-webfont.svg?v=4.0.3#fontawesomeregular') format("svg");
font-weight: normal;
font-style: normal;
}
/*
@font-face {
font-family: 'Font1';
src: url("fonts/font1.eot");
} */
@font-face {
font-family: 'Fon\'t2';
src: url("fonts/font2.eot");
}
@font-face {
font-family: 'Font3';
src: url("../fonts/font3.eot");
}
LOD;
Main idea:
For readability, the pattern is divided into named subpatterns. The (?(DEFINE)...) group doesn't match anything; it is only a definition section.
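A minimal illustration of the define-then-call mechanism, using a hypothetical subpattern named num: the definition consumes nothing by itself, and each \g<num> reuses it like a function call.

```php
<?php
// (?(DEFINE)...) only declares the named subpattern "num" (a hypothetical
// name); it consumes nothing. Each \g<num> then calls it like a function.
$pattern = '~(?(DEFINE)(?<num>\d+))\g<num>-\g<num>~';

preg_match($pattern, 'id 12-345 end', $m);
$matched = $m[0];
echo $matched, "\n"; // 12-345
```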
The main trick of this pattern is the \G anchor, which means: the start of the string, or the position immediately after the previous match. I added a negative lookbehind (?<!^) to exclude the first part of this definition.
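The \G(?<!^) idea can be tried on a toy subject. In this sketch a hypothetical marker start: plays the role of @font-face {: digits are matched right after it, and then only while each one begins exactly where the previous match ended. Running the same pattern without (?<!^) shows why the lookbehind is needed.

```php
<?php
// "start:" stands in for "@font-face {"; \K drops the marker and spaces
// from the reported match, so only the digit is returned.
$subject = '1 2 start: 3 4 5 x 6';

// With (?<!^): \G cannot succeed at offset 0, so "1 2" is ignored.
preg_match_all('~(?:start:|\G(?<!^))\s*\K\d~', $subject, $with);

// Without it: \G is also true at the very start of the subject.
preg_match_all('~(?:start:|\G)\s*\K\d~', $subject, $without);

print_r($with[0]);    // 3, 4, 5 ("x" breaks the chain, so "6" is not matched)
print_r($without[0]); // 1, 2, 3, 4, 5
```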
The <anchor> named subpattern is the most important one, because it allows a match only right after @font-face { or immediately after the end of a url (this is why you can see ["']?+ in it: it consumes the closing quote of the previous url).
<other_content> represents everything that is not a url section, but it also matches the url sections that must be skipped (urls that begin with "http:" or "data:"). The important detail of this subpattern is that it can't match the closing curly bracket of @font-face.
The only task of <url_start> is to match url(" (the whitespace and the quote are optional).
\K removes everything matched before it from the match result.
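([^"'\s)}]*+) matches the url. Together with the leading ./ or ../ consumed by [./]*+, it is the only thing that stays in the match result, so replacing the match with a string built from $8 drops that leading ./ or ../.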
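A reduced sketch of the \K part, without the @font-face logic: \K drops url( and the optional quote from the match, so preg_replace() rewrites only what follows, and because [./]*+ is matched but not captured, the replacement with $1 also drops a leading ./ or ../. The input is hypothetical; the CDN host is the answer's placeholder.

```php
<?php
// \K: everything matched before it (url( and the optional quote) is dropped
// from the match, so preg_replace() rewrites only what follows.
// [./]*+ is part of the match but not of group 1, so "$1" drops ./ or ../.
$pattern = '~url\(\s*["\']?\K[./]*+([^"\'\s)}]*+)~';

// Hypothetical input.
$css = "a { background: url('../img/a.png'); }\n"
     . "b { background: url(./img/b.png); }";

$out = preg_replace($pattern, 'http://cdn.test.com/$1', $css);
echo $out, "\n";
```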
Since <other_content> and the url subpattern can't match a } (outside quoted or comment parts), the pattern can never match anything outside the @font-face definition. A second consequence is that the pattern always fails after the last url, so on the next attempts the "contiguous branch" fails until the next @font-face.
The main pattern begins with \g<comment> (*SKIP)(*FAIL) | to skip all content inside comments /*....*/. \g<comment> calls the subpattern that describes what a comment looks like. If the pattern fails to its right, (*SKIP) forbids retrying the substring matched before it (to its left, by \g<comment>). (*FAIL) forces the pattern to fail. With this trick, comments are skipped and never become a match result (since the pattern fails).
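The (*SKIP)(*FAIL) idiom can be checked in isolation. In this sketch the hypothetical word foo stands in for a url: the occurrence inside the comment is consumed by the left branch and then skipped, so only the two occurrences outside it are reported.

```php
<?php
// Left branch: consume a whole /* ... */ comment, then (*SKIP)(*FAIL) fails the
// attempt and forbids restarting anywhere inside that comment.
// Right branch: what we actually want (the hypothetical word "foo").
$pattern = '~/\*.*?\*/(*SKIP)(*FAIL)|\bfoo\b~s';

preg_match_all($pattern, 'foo /* foo */ foo', $m, PREG_OFFSET_CAPTURE);

// Byte offsets of the reported matches.
$offsets = array_column($m[0], 1);
print_r($offsets); // 0 and 14; the "foo" at offset 7 (inside the comment) is skipped
```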
quoted_content:
It's used in <other_content> to avoid matching url( or /* when they are inside quotes.
(["']) # capture group: the opening quote
(?> # atomic group: all possible content between quotes
[^"'\\]++ # all that is not a quote or a backslash
| # OR
\\{2} # two backslashes: (two \ doesn't escape anything)
| # OR
\\. # any escaped character
| # OR
(?!\g{-1})["'] # the other kind of quote (not the one in the capture group)
)*+ # repeat the atomic group zero or more times
\g{-1} # backreference to the last capturing group
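The <quoted_content> subpattern listed above can also be tried on its own. This sketch runs it on a hypothetical declaration to show that an escaped quote, a quote of the other kind, and a url( inside quotes all stay within a single quoted match.

```php
<?php
// The answer's <quoted_content> subpattern by itself. The nowdoc passes
// the backslashes to PCRE untouched.
$pattern = <<<'RE'
~(["'])(?>[^"'\\]++|\\{2}|\\.|(?!\g{-1})["'])*+\g{-1}~
RE;

// Hypothetical input: an escaped quote, and a url( hidden inside a string.
$css = <<<'CSS'
font-family: 'Fon\'t2'; content: "a'b url(";
CSS;

preg_match_all($pattern, $css, $m);
print_r($m[0]); // each quoted string is matched whole
```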
other_content: everything except a closing curly bracket and urls that do not begin with http: or data:
(?> # open an atomic group
[^u}/"']++ # all characters that are not problematic
|
\g<quoted_content> # string inside quotes
|
\g<comment> # string inside comments
|
\Bu # "u" that is not at a word boundary (preceded by a word character)
|
u(?!rl\s*+\() # "u" not followed by "rl(" (not the start of an url definition)
|
/(?!\*) # "/" not followed by "*" (not the start of a comment)
|
\g<url_start> # a url that begins with "http:", "https:" or "data:"
\g<url_skip> ["']?+ # up to and including the optional closing quote
)++ # repeat the atomic group one or more times
anchor:
\G(?<!^) ["']?+ # contiguous to a precedent match with a possible closing quote
| # OR
@font-face \s*+ { # start of the @font-face definition
Note:
You can further improve the main pattern.
After the last url of an @font-face, the regex engine matches the "contiguous branch" of <anchor> and then all characters up to the }, where the pattern fails. Then, at each of those same characters, the regex engine must try both branches of <anchor>, which will always fail until the }.
To avoid these useless tries, you can change the main pattern to:
\g<comment> (*SKIP)(*FAIL) |
\g<anchor> \g<other_content>?+
(?>
\g<url_start> \K [./]*+ ([^"'\s)}]*+)
|
} (*SKIP)(*FAIL)
)
With this new scenario, the first character after the last url is matched by the "contiguous branch", \g<other_content> matches everything up to the }, \g<url_start> fails immediately, the } is matched, and (*SKIP)(*FAIL) makes the pattern fail and forbids retrying these characters.
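The improved main pattern above is given without its definitions. As a sketch, here it is assembled with the answer's (?(DEFINE)...) section and applied to a small hypothetical stylesheet; since the definitions still contain groups 1 to 7, the url remains group 8. Only the relative urls inside the live @font-face should be rewritten, while the commented-out rule and the ordinary rule are left alone.

```php
<?php
// The answer's (?(DEFINE)...) section followed by the improved main pattern.
$pattern = <<<'LOD'
~
(?(DEFINE)
  (?<quoted_content>
    (["']) (?>[^"'\\]++ | \\{2} | \\. | (?!\g{-1})["'] )*+ \g{-1}
  )
  (?<comment> /\* .*? \*/ )
  (?<url_skip> (?: https?: | data: ) [^"'\s)}]*+ )
  (?<other_content>
    (?> [^u}/"']++ | \g<quoted_content> | \g<comment>
      | \Bu | u(?!rl\s*+\() | /(?!\*)
      | \g<url_start> \g<url_skip> ["']?+
    )++
  )
  (?<anchor> \G(?<!^) ["']?+ | @font-face \s*+ { )
  (?<url_start> url\( \s*+ ["']?+ )
)
\g<comment> (*SKIP)(*FAIL) |
\g<anchor> \g<other_content>?+
(?>
  \g<url_start> \K [./]*+ ([^"'\s)}]*+)
  |
  } (*SKIP)(*FAIL)
)
~xs
LOD;

// Hypothetical input: one @font-face, one commented-out rule, one ordinary rule.
$css = <<<'CSS'
@font-face { src: url("a.eot"), url('../b.woff'); }
/* @font-face { src: url("c.eot"); } */
.x { background: url('d.png'); }
CSS;

// Group 8 is the url (leading ./ or ../ already excluded from it).
$out = preg_replace($pattern, 'http://cdn.test.com/fonts/$8', $css);
echo $out, "\n";
```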