Detect URLs in text with JavaScript
Question
Does anyone have suggestions for detecting URLs in a set of strings?
arrayOfStrings.forEach(function(string){
  // detect URLs in strings and do something swell,
  // like creating elements with links.
});
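One way to fill in that loop body is to collect the matches from each string first and decide what to do with them afterwards. The sketch below is only an illustration: `extractUrls`, the sample strings, and the deliberately naive pattern are assumptions, not part of the original question.

```javascript
// Hypothetical helper: returns every http(s) URL-looking substring in `text`.
// The pattern is intentionally naive and is an assumption for illustration.
function extractUrls(text) {
  var naiveUrlRegex = /https?:\/\/[^\s]+/g;
  // With the g flag, match() returns all matches, or null when there are none.
  return text.match(naiveUrlRegex) || [];
}

// Sample input, made up for this sketch.
var arrayOfStrings = [
  "Docs live at http://www.example.com/docs today",
  "No links in this one",
  "Two here: https://a.example/x and http://b.example/y"
];

// One array of detected URLs per input string.
var found = arrayOfStrings.map(extractUrls);
```

Keeping detection separate from rendering means the same list can later feed link elements, validation, or logging.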
Update: I wound up using this regex for link detection… Apparently several years later.
kLINK_DETECTION_REGEX = /(([a-z]+:\/\/)?(([a-z0-9\-]+\.)+([a-z]{2}|aero|arpa|biz|com|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|local|internal))(:[0-9]{1,5})?(\/[a-z0-9_\-\.~]+)*(\/([a-z0-9_\-\.]*)(\?[a-z0-9+_\-\.%=&]*)?)?(#[a-zA-Z0-9!$&'()*+.=\-_~:@/?]*)?)(\s+|$)/gi
The full helper (with optional Handlebars support) is at gist #1654670.
Recommended answer
First you need a good regex that matches URLs. This is hard to do. See here, here and here:
...almost anything is a valid URL. There are some punctuation rules for splitting it up. Absent any punctuation, you still have a valid URL.
Check the RFC carefully and see if you can construct an "invalid" URL. The rules are very flexible.
For example ":::::" is a valid URL. The path is ":::::". A pretty stupid filename, but a valid filename.
Also, "/////" is a valid URL. The netloc ("hostname") is "". The path is "///". Again, stupid. Also valid. This URL normalizes to "///" which is the equivalent.
Something like "bad://///worse/////" is perfectly valid. Dumb but valid.
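To get a feel for how permissive URL parsing is, one can hand such strings to the built-in `URL` constructor. This is only a rough sketch: `tryParseUrl` is a hypothetical helper, the `http://example.com/` base is made up, and `URL` follows the WHATWG URL Standard rather than the RFC the quote refers to, so the two may disagree on details.

```javascript
// Hypothetical helper: returns a URL object, or null if the parser rejects the input.
function tryParseUrl(input, base) {
  try {
    return new URL(input, base);
  } catch (e) {
    return null;
  }
}

// An absolute string with an unusual scheme and an empty host.
var odd = tryParseUrl("bad://///worse/////");

// No scheme and no base: the constructor has nothing to resolve against.
var bare = tryParseUrl(":::::");

// The same string treated as a relative reference against an assumed base.
var relative = tryParseUrl(":::::", "http://example.com/");
```

Here `odd` and `relative` come back as `URL` objects while `bare` is `null`, which suggests that "is this a URL?" depends heavily on context and is a poor filter for free text.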
Anyway, this answer is not meant to give you the best regex but rather a proof of how to do the string wrapping inside the text, with JavaScript.
OK, so let's just use this one: /(https?:\/\/[^\s]+)/g
Again, this is a bad regex. It will have many false positives. However, it's good enough for this example.
function urlify(text) {
  var urlRegex = /(https?:\/\/[^\s]+)/g;
  return text.replace(urlRegex, function(url) {
    return '<a href="' + url + '">' + url + '</a>';
  });
  // or alternatively
  // return text.replace(urlRegex, '<a href="$1">$1</a>')
}
var text = "Find me at http://www.example.com and also at http://stackoverflow.com";
var html = urlify(text);
// html now looks like:
// "Find me at <a href="http://www.example.com">http://www.example.com</a> and also at <a href="http://stackoverflow.com">http://stackoverflow.com</a>"
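Two of the false positives mentioned above are easy to hit: sentence punctuation glued to the end of a URL, and HTML-special characters in the surrounding text. A minimal sketch of one way to soften both follows, assuming the input is plain text rather than existing HTML. `escapeHtml` and `urlifySafely` are hypothetical names, and the set of trailing characters to strip is an assumption.

```javascript
// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical variant of urlify for plain-text input.
function urlifySafely(text) {
  var urlRegex = /https?:\/\/[^\s]+/g;
  var out = "";
  var last = 0;
  var m;
  while ((m = urlRegex.exec(text)) !== null) {
    // Assumption: trailing punctuation belongs to the sentence, not the URL.
    var url = m[0].replace(/[.,;:!?)\]'"]+$/, "");
    // Escape the plain text before the match, then emit the link.
    out += escapeHtml(text.slice(last, m.index));
    out += '<a href="' + escapeHtml(url) + '">' + escapeHtml(url) + '</a>';
    // Anything stripped from the match is picked up as ordinary text next time.
    last = m.index + url.length;
  }
  out += escapeHtml(text.slice(last));
  return out;
}

var safeHtml = urlifySafely("See http://www.example.com. <b>");
```

Stripping a trailing ")" is a trade-off: it helps with URLs wrapped in parentheses but truncates URLs that legitimately end with one.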
So in sum, try:
$$('#pad dl dd').each(function(element) {
  element.innerHTML = urlify(element.innerHTML);
});