Domain regex split


Question

I have some domains I want to split, but I can't figure out the regex.

I have:

  • http://www.google.com/tomato
  • http://int.google.com
  • http://google.co.uk

Given any of these, I'm trying to extract only google. Any ideas?

Recommended answer

You can do this on a best-guess basis. The last part of the host name is always the TLD (optionally followed by the root dot), so you are basically looking for the nearest preceding label that is longer than 2 characters:

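$url = "http://www.google.co.uk./search?q=..";

// "~" is the delimiter here: with "#" as the delimiter, the first "#" comment
// inside the pattern would end it early and PHP would report an unknown modifier.
preg_match("~http://
            (?:[^/]+\.)*       # cut off any preceding www*
            ([\w-]{3,})        # main domain name
            (\.\w\w)?          # two-letter second level domain .co
            \.\w+\.?           # TLD
            (/|:|$)            # end regex with / or : or string end
            ~x",
      $url, $match);

// $match[1] is now "google"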
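
To check the pattern against the three URLs from the question, it can be wrapped in a small function. This is only a sketch: the name extract_domain_name is hypothetical, and the ^ anchor, the https? alternative and the non-capturing groups are my additions rather than part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer) wrapping the heuristic pattern.
function extract_domain_name(string $url): ?string
{
    // "~" is the delimiter so that "#" can start comments under the x flag.
    $pattern = '~^https?://
        (?:[^/]+\.)*     # any leading subdomains such as www. or int.
        ([\w-]{3,})      # main domain name (3+ characters)
        (?:\.\w\w)?      # optional two-letter second-level label, e.g. .co
        \.\w+\.?         # TLD, with optional trailing root dot
        (?:/|:|$)        # stop at a path, a port, or the end of the string
        ~x';

    // Return the first capture group on a match, null otherwise.
    return preg_match($pattern, $url, $match) === 1 ? $match[1] : null;
}

$urls = [
    'http://www.google.com/tomato',
    'http://int.google.com',
    'http://google.co.uk',
];

foreach ($urls as $url) {
    echo $url, ' => ', extract_domain_name($url) ?? '(no match)', "\n";
}
```

All three URLs yield google. The heuristic still has blind spots: http://www.example.com.au yields com, because com is three characters long and .au is then taken as the TLD, and a host such as t.co does not match at all because its name is shorter than three characters.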

If you expect longer second-level domains (.com, perhaps, as in .com.au), add another \w to that group. But this is not very generic; to do it properly you would need a list of the TLDs under which such second-level domains are allowed.
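
The list-based approach could look roughly like the sketch below. It assumes a hand-written list of two-label suffixes; MULTI_LABEL_SUFFIXES and registrable_label are hypothetical names, and the four entries are only an illustrative subset (a maintained source such as the Public Suffix List is far longer).

```php
<?php
// Illustrative subset only (assumption); a real suffix list has thousands of entries.
const MULTI_LABEL_SUFFIXES = ['co.uk', 'org.uk', 'com.au', 'co.jp'];

// Hypothetical helper: returns the label directly to the left of the suffix.
function registrable_label(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;                              // no host component found
    }

    $host   = strtolower(rtrim($host, '.'));      // drop the optional root dot
    $labels = explode('.', $host);

    // The suffix spans two labels if the last two are listed, otherwise one.
    $lastTwo   = implode('.', array_slice($labels, -2));
    $suffixLen = in_array($lastTwo, MULTI_LABEL_SUFFIXES, true) ? 2 : 1;

    if (count($labels) <= $suffixLen) {
        return null;                              // nothing left of the suffix
    }

    return $labels[count($labels) - $suffixLen - 1];
}

echo registrable_label('http://google.co.uk'), "\n";
echo registrable_label('http://www.example.com.au'), "\n";
```

Unlike the regex heuristic, this does not depend on label length, so short names and three-letter second-level labels are handled as long as the suffix appears in the list.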
