如何在BigQuery标准SQL中将IP地址转换为地理位置? [英] How to transform IP addresses into geolocation in BigQuery standard SQL?

查看:208
本文介绍了如何在BigQuery标准SQL中将IP地址转换为地理位置?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

因此,我已阅读 https://cloudplatform .googleblog.com/2014/03/geoip-geolocation-with-google-bigquery.html

但是我想知道是否有#standardSQL的方式来做到这一点.到目前为止,由于迁移文档中建议的更改存在局限性,因此转换PARSE_IP和NTH()仍然有很多挑战.

But I was wondering if there was a #standardSQL way of doing it. So far, I have a lot of challenge converting PARSE_IP and NTH() since the suggested changes in the migration docs have limitations.

PARSE_IP(contributor_ip)NET.IPV4_TO_INT64(NET.SAFE_IP_FROM_STRING(contributor_ip))不适用于IPv6 IP地址.

Going from PARSE_IP(contributor_ip) to NET.IPV4_TO_INT64(NET.SAFE_IP_FROM_STRING(contributor_ip)) does not work for IPv6 IP addresses.

NTH(1, latitude) lat转到latitude[SAFE_ORDINAL(1)]无效,因为纬度被认为是字符串.

Going from NTH(1, latitude) lat to latitude[SAFE_ORDINAL(1)] does not work since latitude is considered a string.

可能还有更多的迁移问题,我还没有遇到.有谁知道如何将IP地址转换为BigQuery标准SQL中的地理位置?

And there might be more migration problems that I have yet to encounter. Does anyone know how to transform IP addresses into geolocation in BigQuery standard SQL?

P.S.我将如何从地理位置定位到确定时区?

P.S. How would I go from geolocation to determining timezone?

那么这有什么区别

#legacySQL
SELECT
  COUNT(*) c,
  city,
  countryLabel,
  NTH(1, latitude) lat,
  NTH(1, longitude) lng
FROM (
  SELECT
    INTEGER(PARSE_IP(contributor_ip)) AS clientIpNum,
    INTEGER(PARSE_IP(contributor_ip)/(256*256)) AS classB
  FROM
    [publicdata:samples.wikipedia]
  WHERE
    contributor_ip IS NOT NULL ) AS a
JOIN EACH
  [fh-bigquery:geocode.geolite_city_bq_b2b] AS b
ON
  a.classB = b.classB
WHERE
  a.clientIpNum BETWEEN b.startIpNum
  AND b.endIpNum
  AND city != ''
GROUP BY
  city,
  countryLabel
ORDER BY
  1 DESC

SELECT
  COUNT(*) c,
  city,
  countryLabel,
  ANY_VALUE(latitude) lat,
  ANY_VALUE(longitude) lng
FROM (
  SELECT
    CASE
      WHEN BYTE_LENGTH(contributor_ip) < 16 THEN SAFE_CAST(NET.IPV4_TO_INT64(NET.SAFE_IP_FROM_STRING(contributor_ip)) AS INT64)
      ELSE NULL
    END AS clientIpNum,
    CASE
      WHEN BYTE_LENGTH(contributor_ip) < 16 THEN SAFE_CAST(NET.IPV4_TO_INT64(NET.SAFE_IP_FROM_STRING(contributor_ip)) / (256*256) AS INT64) 
      ELSE NULL
    END AS classB
  FROM
    `publicdata.samples.wikipedia`
  WHERE
    contributor_ip IS NOT NULL ) AS a
JOIN
  `fh-bigquery.geocode.geolite_city_bq_b2b` AS b
ON
  a.classB = b.classB
WHERE
  a.clientIpNum BETWEEN b.startIpNum
  AND b.endIpNum
  AND city != ''
GROUP BY
  city,
  countryLabel
ORDER BY
  1 DESC

edit2:好像我设法通过不正确地转换浮点数来解决问题.现在,标准SQL从旧版SQL中返回41815行,而不是56347行,这可能是由于缺乏标准SQL从IPv6到int的转换,但可能是由于其他原因所致.此外,旧版SQL查询的性能要好得多,运行时间大约为10秒,而不是标准SQL的一整分钟.

edit2: Seems like I manage to figure out the problem via not casting a float correctly. Right now, the standard SQL returns 41815 rows instead the 56347 rows from the legacy SQL which may be due to the lack of conversion from IPv6 to int for standard SQL, but it might be due to something else. Also the legacy SQL query performs much better, running at about 10 seconds instead of the full minute from the standard SQL.

推荐答案

根据 https://gist .github.com/matsukaz/a145c2553a0faa59e32ad7c25e6a92f7

#standardSQL
SELECT
  id,
  IFNULL(city, 'Other') AS city,
  IFNULL(countryLabel, 'Other') AS countryLabel,
  latitude,
  longitude
FROM (
  SELECT
    id,
    NET.IPV4_TO_INT64(NET.IP_FROM_STRING(ip)) AS clientIpNum,
    TRUNC(NET.IPV4_TO_INT64(NET.IP_FROM_STRING(ip))/(256*256)) AS classB
  FROM
    `<project>.<dataset>.log` ) AS a
LEFT OUTER JOIN
  `fh-bigquery.geocode.geolite_city_bq_b2b` AS b
ON
  a.classB = b.classB
  AND a.clientIpNum BETWEEN b.startIpNum AND b.endIpNum
ORDER BY
  id ASC

这篇关于如何在BigQuery标准SQL中将IP地址转换为地理位置?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆