Find any URL in a text string, exactly like Twitter does

Question

There are many similar questions, but none of them handle URLs that lack www., http://, etc. What I want is to check whether a string contains a URL of any kind. Twitter does this when you submit a Tweet.

Acceptable URLs would include, but not be limited to:

Two regular expressions I've tried, one from Daring Fireball and one from this question:

var regex = /\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\"\\.,<>?\u00AB\u00BB\u201C\u201D\u2018\u2019]))/i;

var regex = /(?:<\w+.*?>|[^=!:'"\/]|^)((?:https?:\/\/|www\.)[-\w]+(?:\.[-\w]+)*(?::\d+)?(?:\/(?:(?:[~\w\+%-]|(?:[,.;@:][^\s$]))+)?)*(?:\?[\w\+%&=.;:-]+)?(?:\#[\w\-\.]*)?)(?:\p{P}|\s|<|$)/;

Here is an example of the testing I'm doing: http://jsfiddle.net/3Wn26/5/

Recommended answer

I don't think there's a good way to do this reliably (over time). Now that the new gTLDs are coming, it's going to be hard to keep up. Anyway, I gave it a shot.

/
  (
    \b
      (?:(https?|ftp):\/\/)?
      (
        (?:www\d{0,3}\.)?
        (
          [a-z0-9.-]+\.
          (?:[a-z]{2,4}|museum|travel)
          (?:\/[^\/\s]+)*
        )
      )
    \b
  )
/ix
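The pattern above is written in free-spacing mode (the x flag), which JavaScript regular expressions do not support. A minimal sketch of the same pattern collapsed onto one line so it can be used from JavaScript; the names urlPattern and containsUrl are assumptions made for illustration, not part of the original answer.

```javascript
// Same pattern as above with the layout whitespace removed,
// because JavaScript has no `x` (free-spacing) flag.
var urlPattern = /(\b(?:(https?|ftp):\/\/)?((?:www\d{0,3}\.)?([a-z0-9.-]+\.(?:[a-z]{2,4}|museum|travel)(?:\/[^\/\s]+)*))\b)/i;

// Hypothetical helper: true when the text contains something
// the pattern treats as a URL.
function containsUrl(text) {
  return urlPattern.test(text);
}

console.log(containsUrl("check google.com today"));         // true
console.log(containsUrl("see http://www.example.org/a/b")); // true
console.log(containsUrl("no links here"));                  // false
```

The regex is deliberately not given the g flag, so repeated calls to test() do not carry lastIndex state between inputs.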

Capture groups

  1. The entire URL, ex: http://www.google.com/anyquerystringSAY/Rfy/srA/yh
  2. The protocol, ex: http
  3. URL including www., ex: www.google.com/swrua8rua8rUWRWAURHAJSrjuhFAhjT/Rtgfsbdh
  4. URL excluding www., ex: google.com/sarwar8wa8r/R/A(R8 or images.google.com/w9r89w9ar8a9sjfriJRIUS(RY/(YUr
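To make the four groups concrete, here is a small sketch that runs the pattern with exec() and labels each group. The single-line pattern is the answer's regex with whitespace removed; the helper name parseUrl and the property names are hypothetical.

```javascript
// Single-line form of the answer's pattern (JavaScript has no `x` flag).
var groupedUrlPattern = /(\b(?:(https?|ftp):\/\/)?((?:www\d{0,3}\.)?([a-z0-9.-]+\.(?:[a-z]{2,4}|museum|travel)(?:\/[^\/\s]+)*))\b)/i;

// Hypothetical helper: returns the four capture groups under readable names,
// or null when the text contains no match.
function parseUrl(text) {
  var m = groupedUrlPattern.exec(text);
  if (!m) return null;
  return {
    full: m[1],       // group 1: the entire URL
    protocol: m[2],   // group 2: undefined when no protocol is present
    withWww: m[3],    // group 3: URL including www.
    withoutWww: m[4]  // group 4: URL excluding www.
  };
}

var parts = parseUrl("Visit http://www.google.com/maps now");
console.log(parts.full);       // http://www.google.com/maps
console.log(parts.protocol);   // http
console.log(parts.withWww);    // www.google.com/maps
console.log(parts.withoutWww); // google.com/maps
```

When the input has neither a protocol nor www., groups 1, 3, and 4 all hold the same text and group 2 is undefined.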

Optionally, you can replace the (?:[a-z]{2,4}|museum|travel) bit with an alternation of all the TLDs listed here, but that list is never going to stop growing, so I doubt it's worth it. (You can see I added two exceptions, museum and travel, which are longer than four letters.)
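If you do want an explicit TLD list, one way is to build the alternation from an array instead of typing it into the regex by hand. This is only a sketch: the tlds array is a short, made-up sample rather than a real registry list, buildUrlPattern is a hypothetical helper, and it assumes every entry is plain letters that need no regex escaping.

```javascript
// Hypothetical, deliberately short sample; a real TLD list is far longer
// and keeps growing.
var tlds = ["com", "org", "net", "museum", "travel", "photography"];

// Builds the answer's pattern with the TLD part replaced by an
// alternation of the given list. Assumes entries contain only letters.
function buildUrlPattern(tldList) {
  var alternation = tldList.join("|");
  return new RegExp(
    "(\\b(?:(https?|ftp):\\/\\/)?((?:www\\d{0,3}\\.)?([a-z0-9.-]+\\.(?:" +
      alternation +
      ")(?:\\/[^\\/\\s]+)*))\\b)",
    "i"
  );
}

var listPattern = buildUrlPattern(tlds);
console.log(listPattern.test("example.photography")); // true
console.log(listPattern.test("example.xyz"));         // false: xyz is not in the sample list
```

The trade-off is the one described above: an explicit list rejects unknown endings, but it has to be updated whenever new TLDs appear.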

Also notice I added ftp, feel free to remove that if you don't need it.

Hope this helps.
