BigQuery DOMAIN Function Case Sensitivity Discrepancy


Problem description

When querying data that contains URLs in BigQuery, we noticed that the DOMAIN function behaves differently depending on the letter case of the URL.

This can be demonstrated with this simple query:

SELECT
    domain('WWW.FOO.COM.AU'),
    domain(LOWER('http://WWW.FOO.COM.AU/')),
    domain('http://WWW.FOO.COM.AU/')

The result for the fully uppercase URL does not seem to be right, and the documentation does not mention anything regarding case in URLs.

Solution

DOMAIN (and the other URL-handling functions in legacy SQL) have a number of limitations, unfortunately. While we don't have an equivalent yet in standard SQL (uncheck the "Use Legacy SQL" box under Options), you can make up your own that works in more cases using a regular expression. There are a number of StackOverflow questions about domain extraction, and we can put one of the answers to use as:

CREATE TEMPORARY FUNCTION GetDomain(url STRING) AS (
  REGEXP_EXTRACT(url, r'^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)'));

WITH T AS (
  SELECT url
  FROM UNNEST(['WWW.FOO.COM.AU:8080', 'google.com',
               'www.abc.xyz', 'http://example.com']) AS url)
SELECT
  url,
  GetDomain(url) AS domain
FROM T;

+---------------------+----------------+
|         url         |     domain     |
+---------------------+----------------+
| www.abc.xyz         | abc.xyz        |
| WWW.FOO.COM.AU:8080 | WWW.FOO.COM.AU |
| google.com          | google.com     |
| http://example.com  | example.com    |
+---------------------+----------------+
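
Note that the regular expression above matches the "www." prefix case-sensitively, which is why WWW.FOO.COM.AU keeps its prefix in the output. Since host names are case-insensitive, one possible way to get consistent results is to lowercase the URL before extracting. The following is only a sketch under that assumption; GetDomainLower is a hypothetical name, not a built-in function, and the unnecessary escaping of "/" has been dropped from the pattern:

```sql
-- Hypothetical helper (not a BigQuery built-in): lowercases the input
-- first, so 'WWW.' and 'www.' are handled the same way by the pattern.
CREATE TEMPORARY FUNCTION GetDomainLower(url STRING) AS (
  REGEXP_EXTRACT(
    LOWER(url),
    -- optional scheme, optional user-info, optional 'www.', then the host
    r'^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?([^:/\n]+)'));

WITH T AS (
  SELECT url
  FROM UNNEST(['WWW.FOO.COM.AU:8080', 'http://WWW.FOO.COM.AU/',
               'www.abc.xyz']) AS url)
SELECT
  url,
  GetDomainLower(url) AS domain
FROM T;
```

With this variant both uppercase inputs should yield foo.com.au. The trade-off is that the returned domain is always lowercase, so the original casing of the host is lost.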
