Going where PHP parse_url() doesn't - Parsing only the domain


Question


PHP's parse_url() has a host field, which includes the full host. I'm looking for the most reliable (and least costly) way to only return the domain and TLD.

Given the examples:

I am looking for only google.com or google.co.uk. I have contemplated a table of valid TLDs/suffixes and only allowing those plus one word. Would you do it any other way? Does anyone know of a pre-canned, valid regex for this sort of thing?

Solution

How about something like this?

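function getDomain($url) {
  // Let parse_url() isolate the host part of the URL.
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  // Match one label followed by a suffix of 2 to 6 letters/dots at the end of the host.
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  // No host, or the host did not match the pattern.
  return false;
}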

This extracts the host with the classic parse_url() and then looks for a valid domain without any subdomain (www being a subdomain). It won't work on hosts like 'localhost', and it returns false if nothing matches.

// Edit:

Try it out with:

echo getDomain('http://www.google.com/test.html') . '<br/>';
echo getDomain('https://news.google.co.uk/?id=12345') . '<br/>';
echo getDomain('http://my.subdomain.google.com/directory1/page.php?id=abc') . '<br/>';
echo getDomain('https://testing.multiple.subdomain.google.co.uk/') . '<br/>';
echo getDomain('http://nothingelsethan.com') . '<br/>';

And it should return:

google.com
google.co.uk
google.com
google.co.uk
nothingelsethan.com

Of course, it returns false if parse_url() can't extract a host, so make sure it's a well-formed URL.
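
As a rough illustration of that caveat, the sketch below shows what happens with a URL that has no scheme: parse_url() puts the whole string in the path component and reports no host, so getDomain() returns false. The getDomainLenient() helper is hypothetical (it is not part of the original answer) and simply assumes http:// when no scheme is present. getDomain() is repeated only so the snippet runs on its own.

```php
<?php
// Same function as in the answer, repeated so this snippet is self-contained.
function getDomain($url) {
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}

// Hypothetical helper (an assumption, not from the answer):
// prepend "http://" when the input has no scheme so parse_url() can see a host.
function getDomainLenient($url) {
  if (!preg_match('#^[a-z][a-z0-9+.\-]*://#i', $url)) {
    $url = 'http://' . $url;
  }
  return getDomain($url);
}

var_dump(getDomain('www.google.com/test.html'));        // bool(false)
var_dump(getDomainLenient('www.google.com/test.html')); // string(10) "google.com"
```

Whether guessing a scheme is acceptable depends on where the input comes from; rejecting scheme-less input outright may be the safer choice.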

// Addendum:

Alnitak is right. The solution presented above works in most cases but not necessarily all, and it needs to be maintained to make sure, for example, that there aren't new TLDs longer than six characters (the limit set by {2,6} in the pattern). The only reliable way of extracting the domain is to use a maintained list such as http://publicsuffix.org/. It's more painful at first but easier and more robust in the long term. Make sure you understand the pros and cons of each method and how it fits your project.
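
To make the list-based approach concrete, here is a minimal sketch. It assumes a tiny hand-picked suffix array (SAMPLE_SUFFIXES, a hypothetical stand-in for the real list from publicsuffix.org) and a hypothetical function name, getRegistrableDomain(). It also ignores the wildcard and exception rules that the real list contains, so treat it as an outline of the idea and not a complete implementation.

```php
<?php
// Hypothetical, hand-picked subset of suffixes for illustration only.
// A real implementation would load the maintained list from publicsuffix.org.
const SAMPLE_SUFFIXES = ['com', 'org', 'uk', 'co.uk'];

function getRegistrableDomain($url, array $suffixes = SAMPLE_SUFFIXES) {
  $host = parse_url($url, PHP_URL_HOST);
  if (!is_string($host) || $host === '') {
    return false; // parse_url() found no host
  }

  $host = strtolower($host);
  if (in_array($host, $suffixes, true)) {
    return false; // the host is itself a suffix, e.g. "co.uk"
  }

  $labels = explode('.', $host);
  $count = count($labels);

  // Try the longest candidate suffix first, so "co.uk" wins over "uk".
  for ($i = 1; $i < $count; $i++) {
    $suffix = implode('.', array_slice($labels, $i));
    if (in_array($suffix, $suffixes, true)) {
      // Registrable domain = the label just left of the suffix + the suffix.
      return $labels[$i - 1] . '.' . $suffix;
    }
  }

  return false; // no known suffix (e.g. "localhost")
}

echo getRegistrableDomain('https://testing.multiple.subdomain.google.co.uk/'), "\n";
echo getRegistrableDomain('http://www.google.com/test.html'), "\n";
```

Because the lookup is driven by data and not by a length limit in a pattern, supporting a new TLD only means updating the list.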
