Extract URL domain root in Google Sheets


Problem description

In a table, I have a list of full URLs like:

https://www.example.com/page-1/product-x?utm-source=google

Objective: I want to extract only the domain part of the URL:

https://www.example.com/

I am using the following formula:

=REGEXEXTRACT(A1;"^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]+)")

The regex works fine when I test it:

https://www.example.com/

However, in Google Sheets it displays as:

example.com

  • Why does the same regex give a different result?
  • How can I correct it in Google Sheets?

Recommended answer

    You can fix the pattern either by removing the capturing group (here, ([^:\/\n?]+) => [^:\/\n?]+) or by converting the capturing group to a non-capturing one (i.e. ([^:\/\n?]+) => (?:[^:\/\n?]+)):

    =REGEXEXTRACT(A1;"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?[^:/\n?]+")
    =REGEXEXTRACT(A1;"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?(?:[^:/\n?]+)")
    

    Note:

    • If the regex contains capturing group(s), REGEXEXTRACT returns only the captured value(s).
    • If there are no capturing groups in the regex, the function returns the whole match value.
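The two rules above can be sketched outside of Sheets. The following is only an illustration in Python, whose `re` module is not RE2 but treats these particular patterns the same way; the helper `regexextract_like` is a hypothetical name that mimics the rule for a single capturing group and is not part of any Google API.

```python
import re

# Sample URL from the question (assumed to contain no stray spaces).
URL = "https://www.example.com/page-1/product-x?utm-source=google"

# Original pattern: the last part is a capturing group.
WITH_GROUP = r"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?([^:/\n?]+)"
# Fixed pattern: same regex, but with no capturing group.
WITHOUT_GROUP = r"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?[^:/\n?]+"


def regexextract_like(text, pattern):
    """Hypothetical helper: return group 1 if the pattern has a
    capturing group, otherwise return the whole match."""
    match = re.search(pattern, text)
    if match is None:
        return None
    if match.re.groups:          # number of capturing groups in the pattern
        return match.group(1)    # only the captured text
    return match.group(0)        # the whole match


print(regexextract_like(URL, WITH_GROUP))     # example.com
print(regexextract_like(URL, WITHOUT_GROUP))  # https://www.example.com
```

Note that the whole match stops before the first / after the host, so the result carries no trailing slash. The helper returns only the first group, which is a simplification for patterns with a single capturing group.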

    Note that you do not need to escape / (forward slashes) in RE2 regexps here: in Google Sheets the pattern is defined as a string literal, so / is not a regex delimiter.

    The pattern may be reduced to ^(?:https?://)?[^:/\n?]+, which optionally matches http:// or https:// and then matches one or more characters other than :, /, newline, or ?.
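As a rough check of the reduced pattern, here is a small Python sketch. It assumes Python's `re` behaves like RE2 for this pattern; `domain_root` and the sample URLs are hypothetical and used only for illustration.

```python
import re

# Reduced pattern from the answer: optional scheme, then everything
# up to the first ':', '/', newline, or '?'.
SIMPLIFIED = re.compile(r"^(?:https?://)?[^:/\n?]+")


def domain_root(url):
    """Hypothetical helper: return the whole match of the reduced pattern."""
    match = SIMPLIFIED.search(url)
    return match.group(0) if match else None


samples = [
    "https://www.example.com/page-1/product-x?utm-source=google",
    "http://example.org:8080/index.html",   # port is cut off at ':'
    "www.example.net?x=1",                  # no scheme at all
]

for url in samples:
    print(domain_root(url))
# https://www.example.com
# http://example.org
# www.example.net
```

Because the pattern has no capturing group, the scheme and any www. prefix stay in the result, which is what the question asks for.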

    See the RE2 regex demo.
