Extract links in a string and return an array of objects

Problem description

I receive a string from a server, and this string contains text and links (mostly starting with http://, https:// or www.; other forms are very rare and can be ignored).

Example:

"simple text simple text simple text domain.ext/subdir again text text text youbank.com/transfertomealltheirmoney/witharegex text text text and again text"

I need a JS function that does the following:

- finds all the links (including duplicates);
- returns an array of objects, each representing a link, with keys giving where the link starts in the text and where it ends, something like:

[{link:"http://www.dom.ext/dir",startsAt:25,endsAt:47},
{link:"https://www.dom2.ext/dir/subdir",startsAt:57,endsAt:88},
{link:"www.dom.ext/dir",startsAt:176,endsAt:192}]

Is this possible? How?

@Touffy: I tried this, but I could only get the starting index of each match, not its length. Moreover, it does not detect www:

    var str = "string with many links (SO does not let me post them)";
    var regex = /(\b(https?|ftp|file|www):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
    var result, indices = [];
    while ( (result = regex.exec(str)) ) {
        indices.push({startsAt:result.index});
    };
    console.log(indices[0].link);
    console.log(indices[1].link);

Recommended answer

One way to approach this is with regular expressions. Whatever the input, you can do something like:

 var expression = /(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/gi;
 var matches = input.match(expression);

Then you can use indexOf to find where each match starts and ends:

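var results = [];
var searchFrom = 0;
// match() returns null when nothing is found, so fall back to an empty array.
(matches || []).forEach(function (link) {
    // Search from the end of the previous match so that a duplicate
    // link is reported at its own position, not at the first occurrence.
    var startsAt = input.indexOf(link, searchFrom);
    searchFrom = startsAt + link.length;
    results.push({ link: link, startsAt: startsAt, endsAt: searchFrom });
});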
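
As an alternative to looking each match up again with indexOf, the positions can be read directly from the match objects that RegExp.prototype.exec returns. The sketch below is only one possible arrangement: the function name extractLinks and the sample string are assumptions made for illustration, and the pattern is the one from the answer, unchanged. Here endsAt is taken to be the index just past the last character of the link.

```javascript
// Hypothetical helper: the name extractLinks is an assumption, not part of the answer.
function extractLinks(input) {
  // Same pattern as in the answer. The g flag is required so that each
  // exec() call resumes after the previous match instead of looping forever.
  var expression = /(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/gi;
  var results = [];
  var match;
  while ((match = expression.exec(input)) !== null) {
    results.push({
      link: match[0],
      startsAt: match.index,                 // index of the first character
      endsAt: match.index + match[0].length  // index just past the last character
    });
  }
  return results;
}

// Illustrative input containing the same link twice.
var sample = "see http://www.dom.ext/dir and www.dom.ext/dir and http://www.dom.ext/dir";
console.log(extractLinks(sample));
```

Because exec reports the index of every match, duplicates each get their own position, and input.slice(startsAt, endsAt) gives back the link text.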

Of course, you may have to tinker with the regular expression itself to suit your specific needs.
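
As one example of such tinkering: the pattern ends with [^\s]{2,}, so a link followed directly by a full stop, comma or closing quote will include that character in the match. A minimal sketch of a post-processing step is shown below. The function name trimTrailingPunctuation and the set of characters it strips are assumptions for illustration, and the step would wrongly shorten a link that really ends in one of those characters.

```javascript
// Hypothetical post-processing step; the characters stripped here are an assumption.
function trimTrailingPunctuation(entry) {
  // Remove punctuation that commonly follows a link in running text.
  var trimmed = entry.link.replace(/[.,;:!?"')\]]+$/, "");
  return {
    link: trimmed,
    startsAt: entry.startsAt,
    // Recompute the end so it still points just past the last kept character.
    endsAt: entry.startsAt + trimmed.length
  };
}

// Illustrative entry, as it might come out of the matching step.
var raw = { link: "www.dom.ext/dir).", startsAt: 10, endsAt: 27 };
console.log(trimTrailingPunctuation(raw));
```

Applying the function to each entry of the result array (for instance with Array.prototype.map) leaves startsAt untouched and only shortens the end of the link.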

You can see it in this fiddle
