Bitwise binary operator for BYTES requires equal length of the inputs. Got 16 bytes on the left hand side and 4 bytes on the right hand side

Problem description

I'm trying to match users to their geolocations using their IP addresses.

The IPs are a mix of IPv4, IPv6, and some rows with invalid entries.

I followed the instructions in this article by Felipe Hoffa, https://towardsdatascience.com/geolocation-with-bigquery-de-identify-76-million-ip-addresses-in-20-seconds-e9e652480bd2, to match my IPs with the IPs in the blocks file.

The problem is that I get an error when evaluating the expression NET.SAFE_IP_FROM_STRING(ip_address) & NET.IP_NET_MASK(4, mask). The error is:

Bitwise binary operator for BYTES requires equal length of the inputs. Got 16 bytes on the left hand side and 4 bytes on the right hand side.

So I tried to make sure the IPs are valid before passing them to the function, using regular expressions. This works for IPv4. I have now worked out a regex that matches IPv6 as well, and from my checks it seems accurate. However, I still get the error. I can't figure out why, or how to fix my query to get the correct result.

See the whole query below:

(
    SELECT
      *,
      NET.SAFE_IP_FROM_STRING(ip_address) & NET.IP_NET_MASK(4,
        mask) network_bin
    FROM (
      SELECT
        * EXCEPT (is_valid)
      FROM (
        SELECT
          *,
          CASE
            WHEN (REGEXP_CONTAINS(ip_address, r'\A[a-f0-9]{1,4}(:[a-f0-9]{1,4}){7}\z') OR (NOT REGEXP_CONTAINS(ip_address, r'\A(.*?[a-f0-9](:|\z)){8}') AND REGEXP_CONTAINS(ip_address, r'\A([a-f0-9]{1,4}(:[a-f0-9]{1,4}){0,6})?::([a-f0-9]{1,4}(:[a-f0-9]{1,4}){0,6})?\z')) OR REGEXP_CONTAINS(ip_address, r'\A[a-f0-9]{1,4}(:[a-f0-9]{1,4}){5}:(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}\z') OR (NOT REGEXP_CONTAINS(ip_address, r'\A(.*?[a-f0-9]:){6}') AND REGEXP_CONTAINS(ip_address, r'\A([a-f0-9]{1,4}(:[a-f0-9]{1,4}){0,4})?::([a-f0-9]{1,4}:){0,5}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}\z')) ) THEN TRUE
            WHEN REGEXP_CONTAINS(ip_address, r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$") THEN TRUE
          ELSE
          FALSE
        END
          AS is_valid
        FROM (
          SELECT
            user,
            ip_address,
            date
          FROM 
            `project.dataset.table`))  -- closes the source subquery and the is_valid subquery
      WHERE
        is_valid IS TRUE),
      UNNEST(GENERATE_ARRAY(9,32)) mask)

Recommended answer

The likely issue is that NET.SAFE_IP_FROM_STRING returns 4 bytes for IPv4 and 16 bytes for IPv6. So & NET.IP_NET_MASK(4, mask) is fine for IPv4, but for IPv6 you need & NET.IP_NET_MASK(16, mask). One option is to introduce something like is_ipv4 and use it like this:

IF(
  is_ipv4,
  NET.SAFE_IP_FROM_STRING(ip_address) & NET.IP_NET_MASK(4, mask),
  NET.SAFE_IP_FROM_STRING(ip_address) & NET.IP_NET_MASK(16, mask)
) network_bin
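
To see why the two operands end up with different lengths, here is a small Python sketch that models the behaviour described above. It is not BigQuery code: safe_ip_from_string, ip_net_mask and network_bin are hypothetical helper names that only approximate NET.SAFE_IP_FROM_STRING and NET.IP_NET_MASK using the standard ipaddress module. The point it illustrates is that the mask should be built with the same byte length as the parsed address.

```python
import ipaddress


def safe_ip_from_string(text):
    """Rough analogue of NET.SAFE_IP_FROM_STRING: packed bytes, or None if invalid."""
    try:
        return ipaddress.ip_address(text).packed  # 4 bytes for IPv4, 16 for IPv6
    except ValueError:
        return None


def ip_net_mask(num_bytes, prefix_len):
    """Rough analogue of NET.IP_NET_MASK: num_bytes long, top prefix_len bits set."""
    total_bits = num_bytes * 8
    value = ((1 << prefix_len) - 1) << (total_bits - prefix_len)
    return value.to_bytes(num_bytes, "big")


def network_bin(text, prefix_len):
    """Mask an address with a mask of its own byte length."""
    ip = safe_ip_from_string(text)
    # Invalid input, or a prefix longer than the address, yields None here (an assumption).
    if ip is None or prefix_len > len(ip) * 8:
        return None
    mask = ip_net_mask(len(ip), prefix_len)
    # Byte-wise AND only makes sense when both sides have the same length.
    return bytes(a & b for a, b in zip(ip, mask))


print(len(safe_ip_from_string("10.1.2.3")))     # IPv4 address length in bytes
print(len(safe_ip_from_string("2001:db8::1")))  # IPv6 address length in bytes
print(network_bin("10.1.2.3", 16).hex())
print(network_bin("2001:db8::1", 32).hex())
```

In the BigQuery query itself, the same idea could presumably be expressed by deriving is_ipv4 from BYTE_LENGTH(NET.SAFE_IP_FROM_STRING(ip_address)) = 4, or by passing that byte length directly as the first argument of NET.IP_NET_MASK. That is a suggestion, not something tested against the asker's table.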
