How to get the historical weather for any city with BigQuery?

Question

BigQuery has NOAA's gsod data loaded as a public dataset - starting in 1929: https://www.reddit.com/r/bigquery/comments/2ts9wo/noaa_gsod_weather_data_loaded_into_bigquery/

How can I retrieve the historical data for any city?

Recommended answer

Update 2019: For convenience

SELECT * 
FROM `fh-bigquery.weather_gsod.all`
WHERE name='SAN FRANCISCO INTERNATIONAL A'
ORDER BY date DESC

Updated daily - or report here if it isn't

For example, to get the hottest days for San Francisco stations since 1980:

SELECT name, state, ARRAY_AGG(STRUCT(date,temp) ORDER BY temp DESC LIMIT 5) top_hot, MAX(date) active_until
FROM `fh-bigquery.weather_gsod.all` 
WHERE name LIKE 'SAN FRANC%'
AND date > '1980-01-01'
GROUP BY 1,2
ORDER BY active_until DESC

Note that this query processed only 28MB thanks to a clustered table.
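
The ARRAY_AGG(... ORDER BY temp DESC LIMIT 5) expression in the query above keeps the five hottest days for each (name, state) group. As a rough illustration of that grouping, here is a minimal Python sketch that works on rows already exported from BigQuery. The function name top_hot_days and the sample rows are made up for illustration; they are not part of the dataset or of any library.

```python
import heapq
from collections import defaultdict


def top_hot_days(rows, n=5):
    """Group rows by (name, state) and keep the n hottest days of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["name"], row["state"])].append((row["date"], row["temp"]))
    # nlargest returns the days sorted from hottest to coolest
    return {
        key: heapq.nlargest(n, days, key=lambda day: day[1])
        for key, days in groups.items()
    }


# Hypothetical sample rows, shaped like the columns used in the query
sample_rows = [
    {"name": "STATION A", "state": "CA", "date": "1988-07-17", "temp": 85.0},
    {"name": "STATION A", "state": "CA", "date": "2000-06-14", "temp": 82.1},
    {"name": "STATION A", "state": "CA", "date": "1995-01-02", "temp": 50.3},
    {"name": "STATION B", "state": "CA", "date": "1999-09-01", "temp": 77.7},
]

hottest = top_hot_days(sample_rows, n=2)
```

Doing this inside BigQuery, as the answer does, is still preferable for the full table, since only the aggregated result leaves the service.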

A similar query, but instead of filtering by station name it uses a location and a table clustered by location:

WITH city AS (SELECT ST_GEOGPOINT(-122.465, 37.807))

SELECT name, state, ARRAY_AGG(STRUCT(date,temp) ORDER BY temp DESC LIMIT 5) top_hot, MAX(date) station_until
FROM `fh-bigquery.weather_gsod.all_geoclustered`  
WHERE EXTRACT(YEAR FROM date) > 1980
AND ST_DISTANCE(point_gis, (SELECT * FROM city)) < 40000
GROUP BY name, state
HAVING EXTRACT(YEAR FROM station_until)>2018
ORDER BY ST_DISTANCE(ANY_VALUE(point_gis), (SELECT * FROM city)) 
LIMIT 5
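
To make the 40000 in the ST_DISTANCE filter concrete: the distance is measured in meters, so the query keeps stations within roughly 40 km of the point, nearest first. The sketch below approximates the same filter in plain Python with the haversine formula on a spherical Earth. The helper names and the two sample stations are hypothetical, and the result should only be expected to be close to what BigQuery computes, not identical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical assumption)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stations_within(stations, city_lon, city_lat, radius_m=40000):
    """Return names of stations closer than radius_m, nearest first."""
    measured = [
        (haversine_m(city_lon, city_lat, s["lon"], s["lat"]), s["name"])
        for s in stations
    ]
    return [name for dist, name in sorted(measured) if dist < radius_m]


# Hypothetical stations; note the (longitude, latitude) order, as in ST_GEOGPOINT
sample_stations = [
    {"name": "DISTANT STATION", "lon": -121.49, "lat": 38.58},
    {"name": "NEARBY STATION", "lon": -122.375, "lat": 37.619},
]

nearby = stations_within(sample_stations, -122.465, 37.807)
```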

Update 2017: Standard SQL and up-to-date tables:

SELECT TIMESTAMP(CONCAT(year,'-',mo,'-',da)) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp
FROM `bigquery-public-data.noaa_gsod.gsod2016`
WHERE stn='722540' AND wban='13904'
GROUP BY 1
ORDER BY day


Additional example, to show the coldest days in Chicago in this decade:

#standardSQL
SELECT year, FORMAT('%s%s',mo,da) day ,min
FROM `fh-bigquery.weather_gsod.stations` a
JOIN `bigquery-public-data.noaa_gsod.gsod201*` b
ON a.usaf=b.stn AND a.wban=b.wban
WHERE name='CHICAGO/O HARE ARPT'
AND min!=9999.9
AND mo<'03'
ORDER BY 1,2


To retrieve the historical weather for any city, we first need to find which stations report in that city. The table [fh-bigquery:weather_gsod.stations] contains the names of known stations, their state (if in the US), country, and other details.

So to find all the stations in Austin, TX, we would use a query like this:

SELECT state, name, lat, lon
FROM [fh-bigquery:weather_gsod.stations] 
WHERE country='US' AND state='TX' AND name CONTAINS 'AUST'
LIMIT 10
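
The query above is written in legacy SQL (bracketed table name, CONTAINS). A rough standard SQL equivalent, wrapped in a small Python helper so the filter values can be passed as named query parameters, might look like the sketch below. The helper name stations_query is hypothetical, the sketch assumes the stations table exposes the same columns under standard SQL, and actually running the query with a BigQuery client is left out.

```python
def stations_query(country, state, name_fragment):
    """Build a standard SQL station lookup plus its named parameters."""
    sql = """
SELECT state, name, lat, lon
FROM `fh-bigquery.weather_gsod.stations`
WHERE country = @country
  AND state = @state
  AND name LIKE CONCAT('%', @fragment, '%')  -- replaces legacy CONTAINS
LIMIT 10
""".strip()
    # Parameter names match the @placeholders in the SQL text
    params = {"country": country, "state": state, "fragment": name_fragment}
    return sql, params


austin_sql, austin_params = stations_query("US", "TX", "AUST")
```

Named parameters keep user-supplied city fragments out of the SQL text itself, which avoids quoting problems when the lookup is reused for other cities.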

This approach has 2 problems that need to be solved:

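  • Not every known station is present in that table - I need to get an updated version of this file. So don't give up if you can't find the station you are looking for here.
  • Not every station found in this file has been operating every year - so we need to find stations that have data for the year we are interested in.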

To solve the second problem, we need to join the stations table with the actual data we are looking for. The following query looks for stations around Austin, and the column c looks at how many days during 2015 have actual data:

SELECT state, name, FIRST(a.wban) wban, FIRST(a.stn) stn, COUNT(*) c, INTEGER(SUM(IF(prcp=99.99,0,prcp))) rain, FIRST(lat) lat, FIRST(lon) long
FROM [fh-bigquery:weather_gsod.gsod2015] a
JOIN [fh-bigquery:weather_gsod.stations] b 
ON a.wban=b.wban
AND a.stn=b.usaf
WHERE country='US' AND state='TX' AND name CONTAINS 'AUST'
GROUP BY 1,2
LIMIT 10

That's good! We found 4 stations with data for Austin during 2015.

Note that we had to treat "rain" in a special way: when a station doesn't monitor rain, it records 99.99 instead of null. Our query counts those values as 0 so that they don't inflate the total.
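
The join and the 99.99 handling can be sketched outside BigQuery as well. The Python below is an illustrative approximation of the query's logic on in-memory rows: it matches daily rows to stations on (stn, wban) = (usaf, wban), counts the days with data, and sums rainfall while treating 99.99 as 0. The function name, the station name, and the sample values are all hypothetical, and the INTEGER() cast from the query is omitted.

```python
from collections import defaultdict

NO_PRCP = 99.99  # marker used when a station does not report precipitation


def days_with_data(daily_rows, stations):
    """Count days and total rain per (state, name), mimicking the inner join."""
    by_id = {(s["usaf"], s["wban"]): s for s in stations}
    summary = defaultdict(lambda: {"days": 0, "rain": 0.0})
    for row in daily_rows:
        station = by_id.get((row["stn"], row["wban"]))
        if station is None:
            continue  # inner join: rows without a known station are dropped
        entry = summary[(station["state"], station["name"])]
        entry["days"] += 1
        entry["rain"] += 0.0 if row["prcp"] == NO_PRCP else row["prcp"]
    return dict(summary)


# Hypothetical data shaped like the columns used in the query
sample_stations = [
    {"usaf": "722540", "wban": "13904", "state": "TX", "name": "EXAMPLE STATION"},
]
sample_days = [
    {"stn": "722540", "wban": "13904", "prcp": 0.5},
    {"stn": "722540", "wban": "13904", "prcp": 99.99},
    {"stn": "722540", "wban": "13904", "prcp": 0.25},
    {"stn": "000000", "wban": "99999", "prcp": 1.0},  # no matching station
]

station_summary = days_with_data(sample_days, sample_stations)
```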

Now that we know the stn and wban numbers for these stations, we can pick any of them and visualize the results:

SELECT TIMESTAMP('2015'+mo+da) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp
FROM [fh-bigquery:weather_gsod.gsod2015]
WHERE stn='722540' AND wban='13904'
GROUP BY 1
ORDER BY day
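
For readers who export the raw rows and want to reproduce this aggregation locally before plotting, a minimal Python sketch follows. It builds a date from the separate year, mo and da strings, averages min, max and prcp per day, and treats a prcp of 99.99 as 0, as the query does. It assumes each row carries a year field (the query above hard-codes '2015' instead); the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_averages(rows):
    """Return (day, avg_min, avg_max, avg_prcp) tuples ordered by day."""
    buckets = defaultdict(list)
    for row in rows:
        day = date(int(row["year"]), int(row["mo"]), int(row["da"]))
        prcp = 0.0 if row["prcp"] == 99.99 else row["prcp"]
        buckets[day].append((row["min"], row["max"], prcp))
    result = []
    for day in sorted(buckets):
        values = buckets[day]
        count = len(values)
        result.append((
            day,
            sum(v[0] for v in values) / count,
            sum(v[1] for v in values) / count,
            sum(v[2] for v in values) / count,
        ))
    return result


# Hypothetical rows; two readings share the same day to show the averaging
sample_rows = [
    {"year": "2015", "mo": "01", "da": "02", "min": 30.0, "max": 50.0, "prcp": 99.99},
    {"year": "2015", "mo": "01", "da": "01", "min": 40.0, "max": 60.0, "prcp": 0.5},
    {"year": "2015", "mo": "01", "da": "01", "min": 42.0, "max": 62.0, "prcp": 0.5},
]

averages = daily_averages(sample_rows)
```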
