BigQuery DOMAIN Function Case Sensitivity Discrepancy
Problem description
When running BigQuery queries over data containing URLs, we noticed that the DOMAIN function behaves differently depending on the case of the URL.
This can be demonstrated with this simple query:
SELECT
  domain('WWW.FOO.COM.AU'),
  domain(LOWER('http://WWW.FOO.COM.AU/')),
  domain('http://WWW.FOO.COM.AU/')
The result for the fully uppercase URL does not seem right, and the documentation does not mention anything regarding case in URLs.
Unfortunately, DOMAIN (and the other URL-handling functions in legacy SQL) has a number of limitations. While we don't have an equivalent yet in standard SQL (uncheck the "Use Legacy SQL" box under Options), you can write your own function that works in more cases using a regular expression. There are a number of StackOverflow questions about domain extraction, and we can put one of the answers to use as:
CREATE TEMPORARY FUNCTION GetDomain(url STRING) AS (
  REGEXP_EXTRACT(url, r'^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)'));

WITH T AS (
  SELECT url
  FROM UNNEST(['WWW.FOO.COM.AU:8080', 'google.com',
               'www.abc.xyz', 'http://example.com']) AS url)
SELECT
  url,
  GetDomain(url) AS domain
FROM T;
+---------------------+----------------+
| url                 | domain         |
+---------------------+----------------+
| www.abc.xyz         | abc.xyz        |
| WWW.FOO.COM.AU:8080 | WWW.FOO.COM.AU |
| google.com          | google.com     |
| http://example.com  | example.com    |
+---------------------+----------------+
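Note that in the output above the uppercase WWW. prefix is not stripped, because the (?:www\.)? group in the pattern is case-sensitive. As a rough local sketch (not part of the original answer), the same pattern can be tried in Python's re module to see the effect of lowercasing the input first. The helper name get_domain and its normalize_case parameter are hypothetical, and the sketch assumes Python's regex engine treats this particular pattern the same way BigQuery's does.

```python
import re

# Same pattern as in the GetDomain function above, copied unchanged.
DOMAIN_PATTERN = re.compile(r'^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)')


def get_domain(url, normalize_case=False):
    """Return the host part of url, or None if the pattern does not match.

    normalize_case is a hypothetical option for this sketch: when True, the
    URL is lowercased first so that an uppercase 'WWW.' prefix is stripped too.
    """
    if normalize_case:
        url = url.lower()
    match = DOMAIN_PATTERN.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    urls = ['WWW.FOO.COM.AU:8080', 'google.com', 'www.abc.xyz', 'http://example.com']
    for url in urls:
        # Print the raw result next to the case-normalized one for comparison.
        print(url, get_domain(url), get_domain(url, normalize_case=True))
```

In BigQuery itself, the corresponding change would presumably be to call GetDomain(LOWER(url)), at the cost of returning the domain in lowercase.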