(Java) RegEx to get the URLs from CSS?

Question

I'm parsing CSS to get the URLs out of linked style sheets. This is a Java app. (I tried using CSSParser (http://cssparser.sourceforge.net/), but it silently drops many of the rules when it parses.)

So I'm just using a regex instead. I'd like a regex that gets me just the URLs and is robust enough to deal with real CSS from the wild:

background-image: url('test/test.gif');
background: url("test2/test2.gif");
background-image: url(test3/test3.gif);
background: url   ( test4/ test4.gif );
background: url( " test5/test5.gif"   );

You get the idea. This is in Java's regex implementation (not my favorite).

Recommended answer

The problem with regexes is that they are sometimes stricter than you need. If you had shown us your current, not-quite-working regex, I would have been able to help you more.

First comment: browsers tend to tolerate the majority of HTML/CSS mistakes (NOT JavaScript, which is a programming and not a markup language).

You could start with the background(-image)? token to lock the first part. How to proceed? Very difficult...

You always have a colon, so you can add it to the constant part of the token. Then, judging from your examples (not from the CSS spec), comes a variable number of whitespace characters followed by the url token. A variable number of whitespace characters is [\s]* (not [\w]*, which matches word characters), and this becomes part of our regex.
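
A quick illustrative check of the two character classes in Java, since they are easy to confuse: `\s` matches whitespace, while `\w` matches word characters ([a-zA-Z_0-9]), so only `\s*` accepts a run of spaces like the one in `url   (`. The class and field names are hypothetical.

```java
import java.util.regex.Pattern;

public class WhitespaceClassDemo {
    // \s matches whitespace characters (space, tab, newline, ...)
    static final Pattern SPACES = Pattern.compile("\\s*");
    // \w matches word characters [a-zA-Z_0-9], NOT whitespace
    static final Pattern WORD_CHARS = Pattern.compile("\\w*");

    public static void main(String[] args) {
        String gap = "   "; // e.g. the gap in "url   ("
        System.out.println(SPACES.matcher(gap).matches());     // true
        System.out.println(WORD_CHARS.matcher(gap).matches()); // false
    }
}
```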

I tried this with RegexBuddy

background(-image)?: url[\s]*\([\s]*(?<url>[^\)]*)\);
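
A minimal sketch of running this first regex with `java.util.regex`, assuming the five sample declarations from the question as input. The class and method names are illustrative. Named groups such as `(?<url>...)` are supported in Java 7 and later. The raw group is printed between brackets so that leftover quotes and spaces are visible, because the pattern does no cleanup of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstRegexDemo {
    // The answer's first regex, escaped for a Java string literal.
    static final Pattern FIRST = Pattern.compile(
            "background(-image)?: url[\\s]*\\([\\s]*(?<url>[^\\)]*)\\);");

    // Returns the raw "url" group of every match, without any cleanup.
    static List<String> rawUrls(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = FIRST.matcher(css);
        while (m.find()) {
            urls.add(m.group("url"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String css = String.join("\n",
                "background-image: url('test/test.gif');",
                "background: url(\"test2/test2.gif\");",
                "background-image: url(test3/test3.gif);",
                "background: url   ( test4/ test4.gif );",
                "background: url( \" test5/test5.gif\"   );");
        // Brackets make leftover quotes and spaces visible.
        for (String u : rawUrls(css)) {
            System.out.println("[" + u + "]");
        }
    }
}
```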

Unfortunately, it captures whitespace inside the url group:

Matched text: background-image: url('test/test.gif');
Match offset: 0
Match length: 39
Backreference 1: -image
Backreference 1 offset: 10
Backreference 1 length: 6
Backreference 2: 'test/test.gif'
Backreference 2 offset: 22
Backreference 2 length: 15

Matched text: background: url   ( test4/ test4.gif );
Match offset: 119
Match length: 39
Backreference 1: 
Backreference 1 offset: -1
Backreference 1 length: 0
Backreference 2:  test4/ test4.gif 
Backreference 2 offset: 138
Backreference 2 length: 18

So when you get the URL this way, you must trim the string. I couldn't exclude whitespace from the url group because of example 4: the regex should still match a URL that contains a space, even though that particular example is only correct if you actually have a file named %20test4.gif.
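
The trimming step could look like the following sketch. `cleanUrl` is a hypothetical helper, not part of any library: it trims, removes one pair of matching single or double quotes (which the raw group also keeps), and trims again. Trimming inside the quotes follows the question's examples rather than the CSS spec, and internal spaces such as the one in example 4 are left alone.

```java
public class UrlCleanup {
    // Hypothetical helper: trim, strip one pair of matching quotes, trim again.
    static String cleanUrl(String raw) {
        String s = raw.trim();
        if (s.length() >= 2) {
            char first = s.charAt(0);
            char last = s.charAt(s.length() - 1);
            if ((first == '\'' || first == '"') && first == last) {
                s = s.substring(1, s.length() - 1).trim();
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // Raw groups as captured from the question's five examples.
        String[] raw = { "'test/test.gif'", "\"test2/test2.gif\"", "test3/test3.gif",
                " test4/ test4.gif ", "\" test5/test5.gif\"   " };
        for (String r : raw) {
            System.out.println("[" + cleanUrl(r) + "]");
        }
    }
}
```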

I prefer the following version of the regex

background(-image)?: url[\s]*\([\s]*(?<url>[^\)]*)[\s]*\)[\s]*;

It tolerates more whitespace.
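
A sketch of wiring this preferred regex into a small extraction method, with the trim and quote removal applied to each match. The class and method names are hypothetical. Like the regex itself, it only covers `background` and `background-image` declarations, is case-sensitive, and expects exactly one space after the colon, following the question's samples rather than the CSS spec.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssUrlExtractor {
    // The answer's preferred regex, escaped for Java. Compared with the first
    // version it also allows whitespace before ')' and between ')' and ';'.
    static final Pattern BACKGROUND_URL = Pattern.compile(
            "background(-image)?: url[\\s]*\\([\\s]*(?<url>[^\\)]*)[\\s]*\\)[\\s]*;");

    static List<String> extractUrls(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = BACKGROUND_URL.matcher(css);
        while (m.find()) {
            // The group may still hold quotes and spaces, so clean it up.
            String s = m.group("url").trim();
            if (s.length() >= 2
                    && (s.charAt(0) == '\'' || s.charAt(0) == '"')
                    && s.charAt(s.length() - 1) == s.charAt(0)) {
                s = s.substring(1, s.length() - 1).trim();
            }
            urls.add(s);
        }
        return urls;
    }

    public static void main(String[] args) {
        String css = "body { background: url( \" test5/test5.gif\"   )  ; }\n"
                + ".x { background-image: url('test/test.gif'); }";
        System.out.println(extractUrls(css)); // [test5/test5.gif, test/test.gif]
    }
}
```

The space between `)` and `;` in the first declaration would make the first regex fail, which is the extra tolerance this version buys.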