在两列上加入同一个表 [英] JOIN the same table on two columns

查看：107 发布时间：2018/5/7 17:47:25 sql join google-bigquery

本文介绍了在两列上加入同一个表的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我使用JOIN将导入和导出数据中的国家和产品ID替换为存储在单独表格中的实际国家和产品名称。在数据源表（ data ）中，有两列包含国家/地区ID，用于出发地和目的地，我将用国家/地区名替换这两个列。

我所提出的代码两次引用了 country_names 表 - 作为 country_names 和 country_names2 ， - 它没有似乎很优雅。我希望能够只用一个名字来引用表格一次。我会很感激，如果有人指出我更优雅，也许更有效的方式来实现相同的结果。

  SELECT 
 country_names.name AS起源$ b $ country_names2.name AS dest $ b $ product_names.name AS product $ b $ SUM（data.export_val）AS export_val $ b $ SUM（data.import_val ）AS import_val 
 FROM 
 OEC.year_origin_destination_hs92_6 AS数据
 JOIN 
 OEC.products_hs_92 AS product_names 
 ON 
 data.hs92 = product_names.hs92 
 JOIN 
 OEC.country_names AS country_names 
 ON 
 data.origin = country_names.id_3char 
 JOIN 
 OEC.country_names AS country_names2 
 ON 
 data.dest = country_names2.id_3char 
 WHERE 
 data.year> 2012 
 AND data.export_val> 1E8 
 GROUP BY 
来源，
 dest，
产品

将产品ID转换为产品名称的表格有6K +行。这是一个小样本：

pre code $ id $ hs92
63215 3215油墨
2130110 130110 Lac
21002 1002 Rye
2100200 100200 Rye
52706 2706 Tar
20902 902 Tea
42203 2203 Beer
42302 2302 Bran
178703 8703 Cars

将国家/地区转换为国家/地区名称（这是我必须加入两次的表格）世界上所有的国家。（ id_3char 是使用的列。）下面是一个示例：

  id id_3char name 
 euchi chi海峡群岛
 askhm khm柬埔寨
 eublx blx比利时 - 卢森堡
 eublr blr白俄罗斯
 e mnene mne黑山
 euhun hun匈牙利
蒙古蒙古
 nabhs bhs巴哈马
 afsen sen塞内加尔

以下是来自包含和 dest 两列的总计205M行的导入和导出数据表，我正在进行连接：

  year origin dest hs92 export_val import_val 
 2009 can is isr 300410 2152838.47 3199.24 
 1995 chn jpn 590190 275748.65 554154.24 
 2000 deu gmb 100610 1573508.44 1327.0 
 2008 deu jpn 540822 10000.0 202 062.43 
 2010 deu ukr 950390 1626012.04 159423.38 
 2006 esp prt 080530 2470699.19 125291.33 
 2006 grc ind 844859 8667.0 3182.0 
 2000 ltu deu 630399 6018.12 5061.96 
 2005 usa zaf 290219 2126216.52 34561.61 
 1997 ven ecu 281122 155347.73 1010.0

解决方案

我认为你已经这样做了，它可以被视为足够好，只要按照原样使用即可：o）

同时，如果由于某种原因你真的真的要避免在该国家/地区表上进行两次连接 - 您可以做的是在select语句下面实现，比如说'OEC.origin_destination_pairs`表

  SELECT 
 o.id_3char o_id_3char，
 o.name o_name，
 d.id_3char d_id_3char，
 d.name d_name 
 FROM`OEC.country_names` o 
 CROSS JOIN`OEC.country_names` d 
  
 $ b $ p 
 $ b $ p 
 
 
 
 pre 
 
 
 
 $ p $  SELECT 
 country_names.o_name AS起源$ b $ country_names.d_name AS dest，
 product_names.name AS产品
 SUM（data.export_val）AS export_val， 
 SUM（data.import_val）AS import_val 
 FROM OEC.year_origin_destination_hs92_6 AS data 
 JOIN OEC.products_hs_92 AS product_names 
 ON data.hs92 = product_names.hs92 
 JOIN OEC .origin_destination_pairs AS country_names 
 ON data.origin = country_names.o_id_3char 
 AND data.dest = country_names2.d_id_3char 
 WHERE data.year> 2012 
 AND data.export_val> 1E8 
 GROUP BY 
来源，
 dest，
产品

上面的动机是在你的特定情况下存储和查询的代价你的`OEC.country_names`表大小只有大约10KB 每次你查询它时支付就好像它是10MB（费用四舍五入到最接近的MB，每个查询引用每个表至少处理10 MB数据，并且每个查询处理至少10 MB数据。）

因此，如果您将实现上述表格 - 它仍然会小于10MB，因此查询费用没有区别存储该表格的情况与此类似 - 没有可见的费用变化p>

您可以查看更多关于定价的信息 here

 
I use JOINs to replace country and product IDs in import and export data with actual country and products names stored in separate tables. In the data source table (data), there are two columns with country IDs, for origin and destination, both of which I am replacing with country names.

The code I have come up with refers to the country_names table twice – as country_names, and country_names2, – which doesn’t seem to be very elegant. I expected to be able to refer to the table just once, by a single name. I would be grateful if someone pointed me to a more elegant and maybe more efficient way to achieve the same result.
SELECT
  country_names.name AS origin,
  country_names2.name AS dest,
  product_names.name AS product,
  SUM(data.export_val) AS export_val,
  SUM(data.import_val) AS import_val
FROM
  OEC.year_origin_destination_hs92_6 AS data
JOIN
  OEC.products_hs_92 AS product_names
ON
  data.hs92 = product_names.hs92
JOIN
  OEC.country_names AS country_names
ON
  data.origin = country_names.id_3char
JOIN
  OEC.country_names AS country_names2
ON
  data.dest = country_names2.id_3char
WHERE
  data.year > 2012
  AND data.export_val > 1E8
GROUP BY
  origin,
  dest,
  product
The table to convert product IDs to product names has 6K+ rows. Here is a small sample:
id      hs92    name     
63215   3215    Ink  
2130110 130110  Lac  
21002   1002    Rye  
2100200 100200  Rye  
52706   2706    Tar  
20902   902     Tea  
42203   2203    Beer     
42302   2302    Bran     
178703  8703    Cars
The table to convert country IDs to country names (which is the table I have to JOIN on twice) has 264 rows for all countries in the world. (id_3char is the column used.) Here is a sample:
id      id_3char    name     
euchi   chi         Channel Islands  
askhm   khm         Cambodia     
eublx   blx         Belgium-Luxembourg   
eublr   blr         Belarus  
eumne   mne         Montenegro   
euhun   hun         Hungary  
asmng   mng         Mongolia     
nabhs   bhs         Bahamas  
afsen   sen         Senegal
And here is a sample of data from the import and export data table with a total of 205M rows that has the two columns origin and dest that I am making a join on:
year    origin  dest    hs92    export_val  import_val   
2009    can     isr     300410  2152838.47  3199.24  
1995    chn     jpn     590190  275748.65   554154.24    
2000    deu     gmb     100610  1573508.44  1327.0   
2008    deu     jpn     540822  10000.0     202062.43    
2010    deu     ukr     950390  1626012.04  159423.38    
2006    esp     prt     080530  2470699.19  125291.33    
2006    grc     ind     844859  8667.0      3182.0   
2000    ltu     deu     630399  6018.12     5061.96  
2005    usa     zaf     290219  2126216.52  34561.61     
1997    ven     ecu     281122  155347.73   1010.0

 解决方案 
I think you already have it done such that it can be considered good enough to just use as is :o)  

Meantime, If for some reason you really-really want to avoid two joins on that country table - what you can do is to materialize below select statement into let's say `OEC.origin_destination_pairs` table 
SELECT 
  o.id_3char o_id_3char,
  o.name o_name,
  d.id_3char d_id_3char,
  d.name d_name
FROM `OEC.country_names` o
CROSS JOIN `OEC.country_names` d   
Then you can just join on that new table as below   
SELECT
  country_names.o_name AS origin,
  country_names.d_name AS dest,
  product_names.name AS product,
  SUM(data.export_val) AS export_val,
  SUM(data.import_val) AS import_val
FROM OEC.year_origin_destination_hs92_6 AS data
JOIN OEC.products_hs_92 AS product_names
   ON data.hs92 = product_names.hs92
JOIN OEC.origin_destination_pairs AS country_names
   ON data.origin = country_names.o_id_3char
  AND data.dest = country_names2.d_id_3char
WHERE data.year > 2012
  AND data.export_val > 1E8
GROUP BY
    origin,
    dest,
    product
The motivation behind above is cost of storing and querying in your particular case

Your `OEC.country_names` table is just about 10KB in size

Each time you query it you pay as if it is 10MB (Charges are rounded to the nearest MB, with a minimum 10 MB data processed per table referenced by the query, and with a minimum 10 MB data processed per query.)  

So, if you will materialize above mentioned table - it will still be less than 10MB so no difference in querying charges

Similar situation with storing that table - no visible changes in charges   

You can check more about pricing here

                        这篇关于在两列上加入同一个表的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

在两列上加入同一个表 [英] JOIN the same table on two columns

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

在两列上加入同一个表 [英] JOIN the same table on two columns

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭