MySQL query to extract domains from URLs


Question

Sorry for my English.

I have this query to extract the domain from URLs:

SELECT SUBSTRING(LEFT(url, LOCATE('/', url, 8) - 1), 8) AS domain...

It works only when the URL looks like www.google.com/something.

It does not work for URLs like:

www.google.it (without trailing slash)

www.google.it/abc/xzy/ (permalink)

Do you know how to fix this?

Recommended answer

I had to combine some of the previous answers, plus a little more hackery for my data set. This is what works for me; it returns the domain and any subdomains:

SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(target_url, '/', 3), '://', -1), '/', 1), '?', 1) AS domain

Explanation (because non-trivial SQL rarely makes sense):

SUBSTRING_INDEX(target_url, '/', 3) - strips any path if the URL has a protocol
SUBSTRING_INDEX(THAT, '://', -1) - strips any protocol from THAT
SUBSTRING_INDEX(THAT, '/', 1) - strips any path from THAT (if there was no protocol)
SUBSTRING_INDEX(THAT, '?', 1) - strips the query string from THAT (if there was no path or trailing /)
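
To make the four steps above easier to follow, here is a small sketch that exposes each intermediate result as its own column. The sample URLs, the column name u, and the aliases step1 through step4 are illustrative assumptions, not part of the original answer.

```sql
-- Each stepN column wraps the previous one in one more SUBSTRING_INDEX call.
SELECT
    u,
    -- keep everything before the 3rd '/' (drops the path when a protocol is present)
    SUBSTRING_INDEX(u, '/', 3) AS step1,
    -- keep everything after the last '://' (drops the protocol, if any)
    SUBSTRING_INDEX(SUBSTRING_INDEX(u, '/', 3), '://', -1) AS step2,
    -- keep everything before the 1st '/' (drops the path when there was no protocol)
    SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(u, '/', 3), '://', -1), '/', 1) AS step3,
    -- keep everything before the 1st '?' (drops a query string attached directly to the host)
    SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(u, '/', 3), '://', -1), '/', 1), '?', 1) AS step4
FROM (
    SELECT           'http://test.com/one?x=1' AS u
    UNION ALL SELECT 'test.com/one/two/three'
    UNION ALL SELECT 'http://test.com?http://ouch.foo'
) AS t;
```

For 'test.com/one/two/three', step1 is still 'test.com/one/two' because there is no protocol to absorb the first two slashes; step3 is what reduces it to 'test.com'. For 'http://test.com?http://ouch.foo', step3 still carries the remainder 'test.com?http:', which is why the final '?' step is needed.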

Test cases:

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(target_url, '/', 3), '://', -1), '/', 1), '?', 1) AS domain
FROM ( 
    SELECT       'http://test.com' as target_url 
    UNION SELECT 'https://test.com' 
    UNION SELECT 'http://test.com/one' 
    UNION SELECT 'http://test.com/?huh' 
    UNION SELECT 'http://test.com?http://ouch.foo' 
    UNION SELECT 'test.com' 
    UNION SELECT 'test.com/one'
    UNION SELECT 'test.com/one/two'
    UNION SELECT 'test.com/one/two/three'
    UNION SELECT 'test.com/one/two/three?u=http://maaaaannn'
    UNION SELECT 'http://one.test.com'
    UNION SELECT 'one.test.com/one'
    UNION SELECT 'two.one.test.com/one' ) AS Test; 

Results:

'test.com'
'test.com'
'test.com'
'test.com'
'test.com'
'test.com'
'test.com'
'test.com'
'test.com'
'test.com'
'one.test.com'
'one.test.com'
'two.one.test.com'
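
The test cases above do not include ports, user info, or fragments. If a data set contains those, one possible extension (an assumption on my part, not part of the original answer) is to add three more SUBSTRING_INDEX layers in the same style. The sample URLs and the alias host are hypothetical.

```sql
-- Sketch only: extends the answer's expression with '#', '@' and ':' stripping.
-- The '@' step must come before the ':' step, since user info may itself contain ':'.
SELECT
    url,
    SUBSTRING_INDEX(                  -- drop ':port'
      SUBSTRING_INDEX(                -- drop 'user:password@'
        SUBSTRING_INDEX(              -- drop '#fragment'
          SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 3), '://', -1), '/', 1), '?', 1),
          '#', 1),
        '@', -1),
      ':', 1) AS host
FROM (
    SELECT           'http://test.com:8080/one' AS url
    UNION ALL SELECT 'https://user@test.com/#top'
    UNION ALL SELECT 'http://user:pw@test.com:8080/x'
    UNION ALL SELECT 'test.com#frag'
) AS t;
```

All four rows should reduce to 'test.com'. This sketch would mangle IPv6 literals such as [::1], because it treats every ':' after the user info as a port separator.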
