如何在 BigQuery 中处理语义版本数据类型 [英] How to deal with semantic version data type in BigQuery

查看:17
本文介绍了如何在 BigQuery 中处理语义版本数据类型的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我知道 BigQuery 中没有数据类型.您更喜欢在 BigQuery 中处理哪些语义版本?

I know that there is no data type in BigQuery. What would you prefer to deal with semantic versions in BigQuery?

我有以下架构:

software:string,
software_version:string

software_version 列是 string 但我在那里存储的数据是 semver 格式:`MAJOR.MINOR.PATCH-prerelease

software_version column is string but the data I store there is in semver format: `MAJOR.MINOR.PATCH-prerelease

我特别想执行操作符< > =.

I especially want to perform operators < > =.

select '4.0.0' < '4.0.0-beta' 

这将返回 true,但根据 semver 定义,这是错误的.因为字符 - 用于预发布.

This returns true, but according to the semver definition it's false. Because the char - is used for prerelease.

推荐答案

以下是 BigQuery Standard SQL

Below is for BigQuery Standard SQL

您可以使用 compareSemanticVersion UDF 来比较两个语义版本
和/或使用 normaizedSemanticVersion UDF 通过 ORDER BY 子句对输出进行排序.
请参阅下面的示例,其中包含两个(比较和排序依据)用例

You can use compareSemanticVersion UDF to compare two semantic versions
and/or use normaizedSemanticVersion UDF to sort output via ORDER BY clause.
See example below with both (compare and order by) use cases in one

#standardSQL
CREATE TEMP FUNCTION normaizedSemanticVersion(semanticVersion STRING) 
AS ((
  SELECT STRING_AGG(
      IF(isDigit, REPEAT('0', 8 - LENGTH(chars)) || chars, chars), '' ORDER BY grp 
    ) || 'zzzzzzzzzzzzzz' 
  FROM (
    SELECT grp, isDigit, STRING_AGG(char, '' ORDER BY OFFSET) chars,
    FROM (
      SELECT OFFSET, char, isDigit,
        COUNTIF(NOT isDigit) OVER(ORDER BY OFFSET) AS grp
      FROM UNNEST(SPLIT(semanticVersion, '')) AS char WITH OFFSET, 
      UNNEST([char IN ('1','2','3','4','5','6','7','8','9','0')]) isDigit
    )
    GROUP BY grp, isDigit
)));
CREATE TEMP FUNCTION compareSemanticVersions(
  normSemanticVersion1 STRING, 
  normSemanticVersion2 STRING) 
AS ((
  SELECT CASE 
      WHEN v1 < v2 THEN 'v2 newer than v1'
      WHEN v1 > v2 THEN 'v1 newer than v2'
      ELSE 'same versions'
    END
  FROM UNNEST([STRUCT(
    normaizedSemanticVersion(normSemanticVersion1) AS v1, 
    normaizedSemanticVersion(normSemanticVersion2) AS v2
  )])
));
WITH test AS (
  SELECT '1.10.0-alpha' AS v1 , '1.0.0-alpha.1' AS v2 UNION ALL
  SELECT '4.0.0', '4.0.0-beta' UNION ALL
  SELECT '1.0.0-alpha.1'     , '1.0.0-alpha.beta' UNION ALL
  SELECT '1.0.0-alpha.beta'  , '1.0.0-beta' UNION ALL
  SELECT '1.0.0-beta'        , '1.0.0-beta.2' UNION ALL
  SELECT '1.0.0-beta.2'      , '1.0.0-beta.11' UNION ALL
  SELECT '1.0.0-beta.11'     , '1.0.0-rc.1' UNION ALL
  SELECT '1.0.0-rc.1'        , '1.0.0' UNION ALL
  SELECT '1.0.0-alpha-1.1+build1234-a', '1.0.0-alpha-1.1+build1234-a'
)
SELECT v1, v2, compareSemanticVersions(v1, v2) result
FROM test 
ORDER BY normaizedSemanticVersion(v1)   

带输出

Row v1                              v2                              result   
1   1.0.0-alpha-1.1+build1234-a     1.0.0-alpha-1.1+build1234-a     same versions    
2   1.0.0-alpha.1                   1.0.0-alpha.beta                v2 newer than v1     
3   1.0.0-alpha.beta                1.0.0-beta                      v2 newer than v1     
4   1.0.0-beta.2                    1.0.0-beta.11                   v2 newer than v1     
5   1.0.0-beta.11                   1.0.0-rc.1                      v2 newer than v1     
6   1.0.0-beta                      1.0.0-beta.2                    v1 newer than v2     
7   1.0.0-rc.1                      1.0.0                           v2 newer than v1     
8   1.10.0-alpha                    1.0.0-alpha.1                   v1 newer than v2     
9   4.0.0                           4.0.0-beta                      v1 newer than v2    

注意:我在阅读您提供的参考资料后,根据我对语义版本控制的理解写了上面的 UDF.可能还有一些边缘情况需要解决.但绝对应该适用于简单的情况,我希望您能够简单地采用这些 UDF 并根据您的特定需求调整输出,甚至可以优化我最终在这里使用的

Note: I wrote above UDFs based on how I understood Semantic Versioning after reading reference you provided. There are potentially some edge cases that still needs to be addressed. But definitely should work for simple cases and I hope you will be able to simply adopt those UDFs and adjust output for your particular needs and maybe even to optimize the I ended up using here

另外一个仅供参考:在 normaizedSemanticVersion UDF 中,我使用 zzzzzzzzzz 只是为了解决一些边缘情况.我尝试的另一个选项是 ..zzzzzzzzzz(注意两个额外的点)——我认为这对于更复杂的情况提供了更好的结果——但我真的没有时间完成测试.请试试

One more as FYI: in the normaizedSemanticVersion UDF I am using zzzzzzzzzz just to address some edge-cases. Another option I tried was ..zzzzzzzzzz (note two extra dots) - I think this gives better result for more complex cases - but I was really out of time to complete testing. Please try

例如,在语义版本控制页面中有一个例子:1.0.0-alpha <1.0.0-alpha.1 <1.0.0-alpha.beta <1.0.0-β<1.0.0-beta.2 <1.0.0-beta.11 <1.0.0-rc.1 <1.0.0.

For example, in Semantic Versioning page there is an example: 1.0.0-alpha < 1.0.0-alpha.1 < 1.0.0-alpha.beta < 1.0.0-beta < 1.0.0-beta.2 < 1.0.0-beta.11 < 1.0.0-rc.1 < 1.0.0.

要制作与该示例中相同的顺序 - 应使用 ..zzzzzzzzzz - 见下文

To make this same order as in that example - ..zzzzzzzzzz should be use - see below

#standardSQL
CREATE TEMP FUNCTION normaizedSemanticVersion(semanticVersion STRING) 
AS ((
  SELECT STRING_AGG(
      IF(isDigit, REPEAT('0', 8 - LENGTH(chars)) || chars, chars), '' ORDER BY grp 
    ) || '..zzzzzzzzzzzzzz' 
  FROM (
    SELECT grp, isDigit, STRING_AGG(char, '' ORDER BY OFFSET) chars,
    FROM (
      SELECT OFFSET, char, isDigit,
        COUNTIF(NOT isDigit) OVER(ORDER BY OFFSET) AS grp
      FROM UNNEST(SPLIT(semanticVersion, '')) AS char WITH OFFSET, 
      UNNEST([char IN ('1','2','3','4','5','6','7','8','9','0')]) isDigit
    )
    GROUP BY grp, isDigit
)));
CREATE TEMP FUNCTION compareSemanticVersions(
  normSemanticVersion1 STRING, 
  normSemanticVersion2 STRING) 
AS ((
  SELECT
    CASE 
      WHEN v1 < v2 THEN 'v2 newer than v1'
      WHEN v1 > v2 THEN 'v1 newer than v2'
      ELSE 'same versions'
    END
  FROM UNNEST([STRUCT(
    normaizedSemanticVersion(normSemanticVersion1) AS v1, 
    normaizedSemanticVersion(normSemanticVersion2) AS v2
  )])
));
WITH test AS (
  SELECT 1 `order`, '1.0.0-alpha' version UNION ALL
  SELECT 2, '1.0.0-alpha.1' UNION ALL
  SELECT 3, '1.0.0-alpha.beta' UNION ALL
  SELECT 4, '1.0.0-beta' UNION ALL
  SELECT 5, '1.0.0-beta.2' UNION ALL
  SELECT 6, '1.0.0-beta.11' UNION ALL
  SELECT 7, '1.0.0-rc.1' UNION ALL
  SELECT 8, '1.0.0.' 
)
SELECT *
FROM test
ORDER BY normaizedSemanticVersion(version)   

具有与语义版本控制规范匹配的输出

with output that matches Semantic Versioning specification

Row order   version  
1   1   1.0.0-alpha  
2   2   1.0.0-alpha.1    
3   3   1.0.0-alpha.beta     
4   4   1.0.0-beta   
5   5   1.0.0-beta.2     
6   6   1.0.0-beta.11    
7   7   1.0.0-rc.1   
8   8   1.0.0.       

这篇关于如何在 BigQuery 中处理语义版本数据类型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆