Find any URL in a text string, exactly like Twitter does
Problem description
There are many similar questions, but none of them cover URLs that lack www., http://, and so on. What I'm looking to do is check whether a string contains a URL of any form. Twitter does this when you submit a Tweet.
Acceptable URLs would include, but not be limited to:
- google.com
- images.google.com
- http://google.com
- http://www.google.com
- http://www.google.com/anyquerystring
Two regular expressions I've tried, one from Daring Fireball and one from this question:
var regex = /\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\"\\.,<>?\u00AB\u00BB\u201C\u201D\u2018\u2019]))/i;
var regex = /(?:<\w+.*?>|[^=!:'"\/]|^)((?:https?:\/\/|www\.)[-\w]+(?:\.[-\w]+)*(?::\d+)?(?:\/(?:(?:[~\w\+%-]|(?:[,.;@:][^\s$]))+)?)*(?:\?[\w\+%&=.;:-]+)?(?:\#[\w\-\.]*)?)(?:\p{P}|\s|<|$)/;
Here is an example of the testing I'm doing: http://jsfiddle.net/3Wn26/5/
Recommended answer
I don't think there's a good way to do this reliably (over time). Now that the new gTLDs are coming, it's going to be hard to keep up. Anyway, I gave it a shot.
/
(
\b
(?:(https?|ftp):\/\/)?
(
(?:www\d{0,3}\.)?
(
[a-z0-9.-]+\.
(?:[a-z]{2,4}|museum|travel)
(?:\/[^\/\s]+)*
)
)
\b
)
/ix
Capture groups:
- Group 1: the entire URL, e.g. http://www.google.com/anyquerystringSAY/Rfy/srA/yh
- Group 2: the protocol, e.g. http
- Group 3: the URL including the www. prefix, e.g. www.google.com/swrua8rua8rUWRWAURHAJSrjuhFAhjT/Rtgfsbdh
- Group 4: the URL excluding the www. prefix, e.g. google.com/sarwar8wa8r/R/A(R8 or images.google.com/w9r89w9ar8a9sjfriJRIUS(RY/(YUr
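As a rough sketch of how the pattern and its capture groups might be used from JavaScript (the language of the question's snippets): JavaScript regex literals have no free-spacing x flag, so the pattern is collapsed onto one line here. The g flag, the helper name findUrls, and the property names are assumptions added for illustration and are not part of the answer.

```javascript
// The answer's pattern on a single line, because JavaScript has no x flag.
// Assumption: the g flag is added so matchAll can walk every match.
const urlPattern = /(\b(?:(https?|ftp):\/\/)?((?:www\d{0,3}\.)?([a-z0-9.-]+\.(?:[a-z]{2,4}|museum|travel)(?:\/[^\/\s]+)*))\b)/gi;

// Hypothetical helper: returns one object per URL-like match in `text`.
function findUrls(text) {
  return Array.from(text.matchAll(urlPattern), (m) => ({
    url: m[1],        // group 1: the entire URL
    protocol: m[2],   // group 2: the protocol, undefined when absent
    withWww: m[3],    // group 3: host and path, including any www. prefix
    withoutWww: m[4], // group 4: host and path, excluding the www. prefix
  }));
}

const sample = "Visit http://www.google.com/anyquerystring or images.google.com today";
console.log(findUrls(sample));
```

Because the TLD part accepts any two to four letters, this sketch will also flag things like file names with short extensions, so treat its matches as candidates rather than confirmed URLs.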
Optionally, you can replace the (?:[a-z]{2,4}|museum|travel) bit with an alternation of all the TLDs listed here, but that list is never going to stop growing, so I doubt it's worth it. (You can see I added two exceptions, museum and travel, which are longer than four letters.)
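If you do want to try the explicit-list route, a minimal sketch might build the alternation from an array. The tlds array below is a hypothetical, deliberately tiny sample, and buildUrlPattern and escapeRegex are illustrative names, not part of the answer.

```javascript
// Hypothetical sample list; a real TLD list is far longer and keeps growing.
const tlds = ["com", "org", "net", "io", "museum", "travel"];

// Escape regex metacharacters in case a list entry contains any.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Rebuild the answer's pattern with an explicit TLD alternation in place of
// (?:[a-z]{2,4}|museum|travel). Capture groups keep the same numbering.
function buildUrlPattern(tldList) {
  const alternation = tldList.map(escapeRegex).join("|");
  return new RegExp(
    "(\\b(?:(https?|ftp)://)?((?:www\\d{0,3}\\.)?([a-z0-9.-]+\\.(?:" +
      alternation +
      ")(?:/[^/\\s]+)*))\\b)",
    "i"
  );
}

const listPattern = buildUrlPattern(tlds);
console.log(listPattern.test("images.google.com")); // known TLD
console.log(listPattern.test("file.txt"));          // txt is not in the list
```

The trade-off is the one described above: fewer false positives for unlisted suffixes, at the cost of maintaining the list.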
Also notice that I added ftp to the protocol group; feel free to remove it if you don't need it.
Hope this helps.