BigQuery如何处理语义版本数据类型 [英] How to deal with semantic version data type in BigQuery
问题描述
我知道BigQuery中没有数据类型.您希望在BigQuery中处理什么语义版本?
I know that there is no data type in BigQuery. What would you prefer to deal with semantic versions in BigQuery?
我有以下架构:
software:string,
software_version:string
software_version列为 string
,但我存储在其中的数据为semver格式:`MAJOR.MINOR.PATCH-prerelease
software_version column is string
but the data I store there is in semver format: `MAJOR.MINOR.PATCH-prerelease
我特别想执行运算符<
>
=
.
I especially want to perform operators <
>
=
.
select '4.0.0' < '4.0.0-beta'
这将返回 true
,但是根据semver 定义,它是错误的.因为char -
用于预发布.
This returns true
, but according to the semver definition it's false. Because the char -
is used for prerelease.
推荐答案
下面是BigQuery标准SQL
Below is for BigQuery Standard SQL
您可以使用 compareSemanticVersion
UDF比较两个语义版本
和/或使用 normaizedSemanticVersion
UDF通过ORDER BY子句对输出进行排序.
参见下面的示例,其中两个用例(按比较和按顺序排列)
You can use compareSemanticVersion
UDF to compare two semantic versions
and/or use normaizedSemanticVersion
UDF to sort output via ORDER BY clause.
See example below with both (compare and order by) use cases in one
#standardSQL
CREATE TEMP FUNCTION normaizedSemanticVersion(semanticVersion STRING)
AS ((
SELECT STRING_AGG(
IF(isDigit, REPEAT('0', 8 - LENGTH(chars)) || chars, chars), '' ORDER BY grp
) || 'zzzzzzzzzzzzzz'
FROM (
SELECT grp, isDigit, STRING_AGG(char, '' ORDER BY OFFSET) chars,
FROM (
SELECT OFFSET, char, isDigit,
COUNTIF(NOT isDigit) OVER(ORDER BY OFFSET) AS grp
FROM UNNEST(SPLIT(semanticVersion, '')) AS char WITH OFFSET,
UNNEST([char IN ('1','2','3','4','5','6','7','8','9','0')]) isDigit
)
GROUP BY grp, isDigit
)));
CREATE TEMP FUNCTION compareSemanticVersions(
normSemanticVersion1 STRING,
normSemanticVersion2 STRING)
AS ((
SELECT CASE
WHEN v1 < v2 THEN 'v2 newer than v1'
WHEN v1 > v2 THEN 'v1 newer than v2'
ELSE 'same versions'
END
FROM UNNEST([STRUCT(
normaizedSemanticVersion(normSemanticVersion1) AS v1,
normaizedSemanticVersion(normSemanticVersion2) AS v2
)])
));
WITH test AS (
SELECT '1.10.0-alpha' AS v1 , '1.0.0-alpha.1' AS v2 UNION ALL
SELECT '4.0.0', '4.0.0-beta' UNION ALL
SELECT '1.0.0-alpha.1' , '1.0.0-alpha.beta' UNION ALL
SELECT '1.0.0-alpha.beta' , '1.0.0-beta' UNION ALL
SELECT '1.0.0-beta' , '1.0.0-beta.2' UNION ALL
SELECT '1.0.0-beta.2' , '1.0.0-beta.11' UNION ALL
SELECT '1.0.0-beta.11' , '1.0.0-rc.1' UNION ALL
SELECT '1.0.0-rc.1' , '1.0.0' UNION ALL
SELECT '1.0.0-alpha-1.1+build1234-a', '1.0.0-alpha-1.1+build1234-a'
)
SELECT v1, v2, compareSemanticVersions(v1, v2) result
FROM test
ORDER BY normaizedSemanticVersion(v1)
有输出
Row v1 v2 result
1 1.0.0-alpha-1.1+build1234-a 1.0.0-alpha-1.1+build1234-a same versions
2 1.0.0-alpha.1 1.0.0-alpha.beta v2 newer than v1
3 1.0.0-alpha.beta 1.0.0-beta v2 newer than v1
4 1.0.0-beta.2 1.0.0-beta.11 v2 newer than v1
5 1.0.0-beta.11 1.0.0-rc.1 v2 newer than v1
6 1.0.0-beta 1.0.0-beta.2 v1 newer than v2
7 1.0.0-rc.1 1.0.0 v2 newer than v1
8 1.10.0-alpha 1.0.0-alpha.1 v1 newer than v2
9 4.0.0 4.0.0-beta v1 newer than v2
注意:在阅读您提供的参考资料后,我根据对语义版本控制的理解来编写以上UDF.潜在的一些极端情况仍然需要解决.但是绝对可以在简单的情况下使用,我希望您能够采用这些UDF并根据您的特定需求调整输出,甚至可以优化我最终在这里使用的
Note: I wrote above UDFs based on how I understood Semantic Versioning after reading reference you provided. There are potentially some edge cases that still needs to be addressed. But definitely should work for simple cases and I hope you will be able to simply adopt those UDFs and adjust output for your particular needs and maybe even to optimize the I ended up using here
还有一个供参考:在 normaizedSemanticVersion
UDF中,我正在使用 zzzzzzzzzz
来解决一些极端情况.我尝试的另一种选择是 .. zzzzzzzzzz
(请注意两个额外的点)-我认为这对于更复杂的情况会提供更好的结果-但我真的没有时间完成测试.请尝试
One more as FYI: in the normaizedSemanticVersion
UDF I am using zzzzzzzzzz
just to address some edge-cases. Another option I tried was ..zzzzzzzzzz
(note two extra dots) - I think this gives better result for more complex cases - but I was really out of time to complete testing. Please try
例如,在语义版本控制"页面中,有一个示例:1.0.0-alpha<1.0.0-alpha.1<1.0.0-alpha.beta<1.0.0-beta<1.0.0-beta.2<1.0.0-beta.11<1.0.0-rc.1<1.0.0.
For example, in Semantic Versioning page there is an example: 1.0.0-alpha < 1.0.0-alpha.1 < 1.0.0-alpha.beta < 1.0.0-beta < 1.0.0-beta.2 < 1.0.0-beta.11 < 1.0.0-rc.1 < 1.0.0.
要与该示例中的顺序相同-应该使用 .. zzzzzzzzzz
-参见下文
To make this same order as in that example - ..zzzzzzzzzz
should be use - see below
#standardSQL
CREATE TEMP FUNCTION normaizedSemanticVersion(semanticVersion STRING)
AS ((
SELECT STRING_AGG(
IF(isDigit, REPEAT('0', 8 - LENGTH(chars)) || chars, chars), '' ORDER BY grp
) || '..zzzzzzzzzzzzzz'
FROM (
SELECT grp, isDigit, STRING_AGG(char, '' ORDER BY OFFSET) chars,
FROM (
SELECT OFFSET, char, isDigit,
COUNTIF(NOT isDigit) OVER(ORDER BY OFFSET) AS grp
FROM UNNEST(SPLIT(semanticVersion, '')) AS char WITH OFFSET,
UNNEST([char IN ('1','2','3','4','5','6','7','8','9','0')]) isDigit
)
GROUP BY grp, isDigit
)));
CREATE TEMP FUNCTION compareSemanticVersions(
normSemanticVersion1 STRING,
normSemanticVersion2 STRING)
AS ((
SELECT
CASE
WHEN v1 < v2 THEN 'v2 newer than v1'
WHEN v1 > v2 THEN 'v1 newer than v2'
ELSE 'same versions'
END
FROM UNNEST([STRUCT(
normaizedSemanticVersion(normSemanticVersion1) AS v1,
normaizedSemanticVersion(normSemanticVersion2) AS v2
)])
));
WITH test AS (
SELECT 1 `order`, '1.0.0-alpha' version UNION ALL
SELECT 2, '1.0.0-alpha.1' UNION ALL
SELECT 3, '1.0.0-alpha.beta' UNION ALL
SELECT 4, '1.0.0-beta' UNION ALL
SELECT 5, '1.0.0-beta.2' UNION ALL
SELECT 6, '1.0.0-beta.11' UNION ALL
SELECT 7, '1.0.0-rc.1' UNION ALL
SELECT 8, '1.0.0.'
)
SELECT *
FROM test
ORDER BY normaizedSemanticVersion(version)
具有与语义版本控制规范匹配的输出
with output that matches Semantic Versioning specification
Row order version
1 1 1.0.0-alpha
2 2 1.0.0-alpha.1
3 3 1.0.0-alpha.beta
4 4 1.0.0-beta
5 5 1.0.0-beta.2
6 6 1.0.0-beta.11
7 7 1.0.0-rc.1
8 8 1.0.0.
这篇关于BigQuery如何处理语义版本数据类型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!