Extracting root domains from a URL string in Google Sheets
Hi, I am trying to extract the root domain from a URL string in Google Sheets. I know how to get the domain, and I have a formula that removes www., but I now realize it does not strip subdomain prefixes: in mysite.site.com, for example, mysite is not stripped from the domain name.
Question: How can I retrieve the root domain (domain.com), where the domain string consists of alphanumeric characters, then one dot, then alphanumeric characters (and nothing more)?
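To make the rule in the question concrete, here is a minimal Python sketch (not a Sheets formula) that keeps only the last two dot-separated labels of a bare host name. The function name root_domain and the tiny TWO_LEVEL_SUFFIXES set are assumptions made for illustration; the set exists because a plain "last two labels" rule would turn domain.co.uk into co.uk. A complete solution would likely need a full suffix list such as the Public Suffix List.

```python
# Hypothetical, hand-picked two-level suffixes (illustration only, far from complete).
TWO_LEVEL_SUFFIXES = {"co.uk", "co.au"}


def root_domain(host):
    """Return the last two labels of a bare host name (three for known
    two-level suffixes), or None if the host contains no dot."""
    labels = host.lower().strip(".").split(".")
    if len(labels) < 2:
        return None
    keep = 2
    # Keep one extra label when the host ends in a known two-level suffix.
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        keep = 3
    return ".".join(labels[-keep:])
```

The input is assumed to be a host name with the scheme, path and query already removed, for example the output of the formulas in this post.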
Formula so far in Google Sheets:
=REGEXREPLACE(REGEXREPLACE(D3923;"(http(s)?://)?(www\.)?";"");"/.*";"")
Maybe this can be simplified.
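For readers who want to step through what this formula does, the two nested REGEXREPLACE calls can be mirrored roughly in Python as below. This is only a sketch: the function name is made up, and it assumes Python's re module treats these particular patterns the same way as the Sheets regex engine.

```python
import re


def strip_scheme_www_path(url):
    """Rough Python mirror of the two nested REGEXREPLACE calls."""
    # Inner call: drop an optional http:// or https:// and an optional "www."
    without_prefix = re.sub(r"(http(s)?://)?(www\.)?", "", url)
    # Outer call: drop everything from the first "/" onward
    return re.sub(r"/.*", "", without_prefix)
```

Two things stand out when tracing it: a subdomain such as mysite. is left in place, and a query string that follows the host without a slash (http://www.domain.nl?par=1) is not removed, because the second pattern only looks for "/".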
Test cases
https://www.domain.com/ => domain.com
https://domain.com/ => domain.com
http://www.domain.nl/ => domain.com
http://domain.de/ => domain.com
http://www.domain.co.uk/ => domain.co.uk
http://domain.co.au/ => domain.co.au
sub.domain.org/ => sub.domain.com
sub.domain.org => sub.domain.com
domain.com => domain.com
http://www.domain.nl?par=1 => domain.com
https://www.domain.nl/test/?par=1 => domain.com
http2://sub2.startpagina.nl/test/?par=1 => domain.com
Currently using:
=trim(REGEXEXTRACT(REGEXREPLACE(REGEXREPLACE(A2;"https?://";"");"^(w{3}\.)?";"")&"/";"([^/?]+)"))
Seems to work fine
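The same steps can be sketched in Python to check the formula against the test cases. The function name current_formula is hypothetical, and the sketch assumes the regex behaviour matches Sheets for these patterns. Where REGEXEXTRACT would return an error on no match, the sketch returns None.

```python
import re


def current_formula(url):
    """Rough Python mirror of the 'currently using' Sheets formula."""
    # Innermost REGEXREPLACE: remove http:// or https://
    s = re.sub(r"https?://", "", url)
    # Next REGEXREPLACE: remove a leading "www."
    s = re.sub(r"^(w{3}\.)?", "", s)
    # REGEXEXTRACT on the text plus "/": take everything up to the first "/" or "?"
    match = re.search(r"([^/?]+)", s + "/")
    # TRIM also collapses repeated inner spaces; strip() only trims the ends
    return match.group(1).strip() if match else None
```

Traced this way, the formula handles the query-string cases, but it still returns sub.domain.org unchanged, and a nonstandard scheme such as http2:// is not stripped, so that test case yields http2: rather than a domain.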
Updated: 7-7-2016
(Thanks for all the help!)