JavaScript: extract URLs from string (inc. querystring) and return array
Question
I know this has been asked a thousand times before (apologies), but after searching SO, Google etc. I have yet to find a conclusive answer.
Basically, I need a JS function which, when passed a string, identifies and extracts all URLs based on a regex and returns an array of everything found, e.g.:
function findUrls(searchText){
    var regex = ???; // placeholder: the pattern being asked for
    var result = searchText.match(regex);
    if (result) { return result; } else { return false; }
}
The function should be able to detect and return any potential URLs. I am aware of the inherent difficulties/issues with this (closing parentheses etc.), so I have a feeling the process needs to be:
Split the string (searchText) into distinct sections, each starting/ending with either nothing, a space or a carriage return on either side of it, resulting in distinct content chunks, e.g. do a split.
For each content chunk that results from the split, see whether it fits the logic for a URL of any construction, namely: does it contain a period immediately followed by text (the one constant rule for qualifying a potential URL)?
The regex should check whether the period is immediately followed by other text of the type allowable for a TLD, directory structure and query string, and preceded by text of the allowable type for a URL.
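The three steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive pattern: the name findUrlCandidates and the regex itself are assumptions made for this example, it only handles ASCII host names (no IDN or IPv6), and it deliberately accepts false positives such as will.i.am. Unlike the stub above, it returns an empty array rather than false when nothing is found.

```javascript
// Hypothetical helper: the name and the pattern are illustrative assumptions.
function findUrlCandidates(searchText) {
    // Optional scheme, then at least two dot-separated labels, an optional
    // port, and an optional path/query/fragment running to the chunk's end.
    var candidate = /^(?:[a-z][a-z0-9+.-]*:\/\/)?[a-z0-9-]+(?:\.[a-z0-9-]+)+(?::\d+)?(?:[\/?#]\S*)?$/i;

    var found = [];
    // Step 1: split on whitespace (spaces, tabs, carriage returns, newlines).
    var chunks = searchText.split(/\s+/);
    for (var i = 0; i < chunks.length; i++) {
        // Strip punctuation that commonly wraps or trails a URL in prose.
        var chunk = chunks[i]
            .replace(/^[("'<\[]+/, "")
            .replace(/[)"'>\],.;:!?]+$/, "");
        // Steps 2 and 3: keep the chunk if it looks like "text.text[...]".
        if (candidate.test(chunk)) {
            found.push(chunk);
        }
    }
    return found;
}

var sample = "Visit http://www.google.com, google.com or " +
    "www.google.com/search?q=a&b=c today. will.i.am sings (see ftp.google.com).";
console.log(findUrlCandidates(sample));
// → ["http://www.google.com", "google.com", "www.google.com/search?q=a&b=c",
//    "will.i.am", "ftp.google.com"]
```

Stripping trailing punctuation is a trade-off: it removes the comma or closing parenthesis that prose adds, but it also truncates the rare URL that legitimately ends in one of those characters.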
I am aware false positives may result; however, any returned values will then be checked with a call to the URL itself, so this can be ignored. The other functions I have found often don't return the URL's query string, if present.
From a block of text, the function should thus be able to return any type of URL, even if it means identifying will.i.am as a valid one!
e.g. http://www.google.com, google.com, www.google.com, http://google.com, ftp.google.com, https:// etc., and any derivation thereof with a query string, should be returned.
Many thanks, and apologies again if this exists elsewhere on SO, but my searches haven't returned it.
Recommended answer
I just use URI.js; it makes this easy.
var source = "Hello www.example.com,\n"
+ "http://google.com is a search engine, like http://www.bing.com\n"
+ "http://exämple.org/foo.html?baz=la#bumm is an IDN URL,\n"
+ "http://123.123.123.123/foo.html is IPv4 and "
+ "http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html is IPv6.\n"
+ "links can also be in parens (http://example.org) "
+ "or quotes »http://example.org«.";
var result = URI.withinString(source, function(url) {
return "<a>" + url + "</a>";
});
/* result is:
Hello <a>www.example.com</a>,
<a>http://google.com</a> is a search engine, like <a>http://www.bing.com</a>
<a>http://exämple.org/foo.html?baz=la#bumm</a> is an IDN URL,
<a>http://123.123.123.123/foo.html</a> is IPv4 and <a>http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html</a> is IPv6.
links can also be in parens (<a>http://example.org</a>) or quotes »<a>http://example.org</a>«.
*/
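The example above rewrites the text, whereas the question asks for an array. One possible way to get that, sketched here under assumptions, is to collect each URL inside the callback and return it unchanged. The wrapper name findUrls is taken from the question; passing the URI.js object in as a second parameter is an assumption of this sketch so that it does not depend on how the library is loaded.

```javascript
// Hypothetical wrapper around URI.withinString.
// `URI` is assumed to be the URI.js object (e.g. the global from the
// library's script tag), passed in explicitly for this sketch.
function findUrls(searchText, URI) {
    var found = [];
    URI.withinString(searchText, function (url) {
        found.push(url);
        // Return the URL as-is so the surrounding text is left untouched.
        return url;
    });
    return found;
}

// Usage, assuming URI.js is loaded: findUrls(source, URI)
```

With this approach the returned array is empty, not false, when no URL is found.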
- https://github.com/medialize/URI.js
- http://medialize.github.io/URI.js/