在两列上加入同一个表 [英] JOIN the same table on two columns

查看:107
本文介绍了在两列上加入同一个表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用JOIN将导入和导出数据中的国家和产品ID替换为存储在单独表格中的实际国家和产品名称。在数据源表( data )中,有两列包含国家/地区ID,用于出发地和目的地,我将用国家/地区名替换这两个列。



我所提出的代码两次引用了 country_names 表 - 作为 country_names country_names2 , - 它没有似乎很优雅。我希望能够只用一个名字来引用表格一次。我会很感激,如果有人指出我更优雅,也许更有效的方式来实现相同的结果。

  SELECT 
country_names.name AS起源$ b $ country_names2.name AS dest $ b $ product_names.name AS product $ b $ SUM(data.export_val)AS export_val $ b $ SUM(data.import_val )AS import_val
FROM
OEC.year_origin_destination_hs92_6 AS数据
JOIN
OEC.products_hs_92 AS product_names
ON
data.hs92 = product_names.hs92
JOIN
OEC.country_names AS country_names
ON
data.origin = country_names.id_3char
JOIN
OEC.country_names AS country_names2
ON
data.dest = country_names2.id_3char
WHERE
data.year> 2012
AND data.export_val> 1E8
GROUP BY
来源,
dest,
产品

将产品ID转换为产品名称的表格有6K +行。这是一个小样本:

pre code $ id $ hs92
63215 3215油墨
2130110 130110 Lac
21002 1002 Rye
2100200 100200 Rye
52706 2706 Tar
20902 902 Tea
42203 2203 Beer
42302 2302 Bran
178703 8703 Cars

将国家/地区转换为国家/地区名称(这是我必须加入两次的表格)世界上所有的国家。 ( id_3char 是使用的列。)下面是一个示例:

  id id_3char name 
euchi chi海峡群岛
askhm khm柬埔寨
eublx blx比利时 - 卢森堡
eublr blr白俄罗斯
e mnene mne黑山
euhun hun匈牙利
蒙古蒙古
nabhs bhs巴哈马
afsen sen塞内加尔

以下是来自包含 dest 两列的总计205M行的导入和导出数据表,我正在进行连接:

  year origin dest hs92 export_val import_val 
2009 can is isr 300410 2152838.47 3199.24
1995 chn jpn 590190 275748.65 554154.24
2000 deu gmb 100610 1573508.44 1327.0
2008 deu jpn 540822 10000.0 202 062.43
2010 deu ukr 950390 1626012.04 159423.38
2006 esp prt 080530 2470699.19 125291.33
2006 grc ind 844859 8667.0 3182.0
2000 ltu deu 630399 6018.12 5061.96
2005 usa zaf 290219 2126216.52 34561.61
1997 ven ecu 281122 155347.73 1010.0


解决方案

我认为你已经这样做了,它可以被视为足够好,只要按照原样使用即可:o)

同时,如果由于某种原因你真的真的要避免在该国家/地区表上进行两次连接 - 您可以做的是在select语句下面实现,比如说'OEC.origin_destination_pairs`表

  SELECT 
o.id_3char o_id_3char,
o.name o_name,
d.id_3char d_id_3char,
d.name d_name
FROM`OEC.country_names` o
CROSS JOIN`OEC.country_names` d

$ b $ p
$ b $ p



pre



$ p $ SELECT
country_names.o_name AS起源$ b $ country_names.d_name AS dest,
product_names.name AS产品
SUM(data.export_val)AS export_val,
SUM(data.import_val)AS import_val
FROM OEC.year_origin_destination_hs92_6 AS data
JOIN OEC.products_hs_92 AS product_names
ON data.hs92 = product_names.hs92
JOIN OEC .origin_destination_pairs AS country_names
ON data.origin = country_names.o_id_3char
AND data.dest = country_names2.d_id_3char
WHERE data.year> 2012
AND data.export_val> 1E8
GROUP BY
来源,
dest,
产品



上面的动机是在你的特定情况下存储和查询的代价

你的`OEC.country_names`表大小只有大约10KB

每次你查询它时支付就好像它是10MB(费用四舍五入到最接近的MB,每个查询引用每个表至少处理10 MB数据,并且每个查询处理至少10 MB数据。)

因此,如果您将实现上述表格 - 它仍然会小于10MB,因此查询费用没有区别

存储该表格的情况与此类似 - 没有可见的费用变化
p>

您可以查看更多关于定价的信息 here


I use JOINs to replace country and product IDs in import and export data with actual country and products names stored in separate tables. In the data source table (data), there are two columns with country IDs, for origin and destination, both of which I am replacing with country names.

The code I have come up with refers to the country_names table twice – as country_names, and country_names2, – which doesn’t seem to be very elegant. I expected to be able to refer to the table just once, by a single name. I would be grateful if someone pointed me to a more elegant and maybe more efficient way to achieve the same result.

SELECT
  country_names.name AS origin,
  country_names2.name AS dest,
  product_names.name AS product,
  SUM(data.export_val) AS export_val,
  SUM(data.import_val) AS import_val
FROM
  OEC.year_origin_destination_hs92_6 AS data
JOIN
  OEC.products_hs_92 AS product_names
ON
  data.hs92 = product_names.hs92
JOIN
  OEC.country_names AS country_names
ON
  data.origin = country_names.id_3char
JOIN
  OEC.country_names AS country_names2
ON
  data.dest = country_names2.id_3char
WHERE
  data.year > 2012
  AND data.export_val > 1E8
GROUP BY
  origin,
  dest,
  product

The table to convert product IDs to product names has 6K+ rows. Here is a small sample:

id      hs92    name     
63215   3215    Ink  
2130110 130110  Lac  
21002   1002    Rye  
2100200 100200  Rye  
52706   2706    Tar  
20902   902     Tea  
42203   2203    Beer     
42302   2302    Bran     
178703  8703    Cars

The table to convert country IDs to country names (which is the table I have to JOIN on twice) has 264 rows for all countries in the world. (id_3char is the column used.) Here is a sample:

id      id_3char    name     
euchi   chi         Channel Islands  
askhm   khm         Cambodia     
eublx   blx         Belgium-Luxembourg   
eublr   blr         Belarus  
eumne   mne         Montenegro   
euhun   hun         Hungary  
asmng   mng         Mongolia     
nabhs   bhs         Bahamas  
afsen   sen         Senegal

And here is a sample of data from the import and export data table with a total of 205M rows that has the two columns origin and dest that I am making a join on:

year    origin  dest    hs92    export_val  import_val   
2009    can     isr     300410  2152838.47  3199.24  
1995    chn     jpn     590190  275748.65   554154.24    
2000    deu     gmb     100610  1573508.44  1327.0   
2008    deu     jpn     540822  10000.0     202062.43    
2010    deu     ukr     950390  1626012.04  159423.38    
2006    esp     prt     080530  2470699.19  125291.33    
2006    grc     ind     844859  8667.0      3182.0   
2000    ltu     deu     630399  6018.12     5061.96  
2005    usa     zaf     290219  2126216.52  34561.61     
1997    ven     ecu     281122  155347.73   1010.0

解决方案

I think you already have it done such that it can be considered good enough to just use as is :o)

Meantime, If for some reason you really-really want to avoid two joins on that country table - what you can do is to materialize below select statement into let's say `OEC.origin_destination_pairs` table

SELECT 
  o.id_3char o_id_3char,
  o.name o_name,
  d.id_3char d_id_3char,
  d.name d_name
FROM `OEC.country_names` o
CROSS JOIN `OEC.country_names` d   

Then you can just join on that new table as below

SELECT
  country_names.o_name AS origin,
  country_names.d_name AS dest,
  product_names.name AS product,
  SUM(data.export_val) AS export_val,
  SUM(data.import_val) AS import_val
FROM OEC.year_origin_destination_hs92_6 AS data
JOIN OEC.products_hs_92 AS product_names
   ON data.hs92 = product_names.hs92
JOIN OEC.origin_destination_pairs AS country_names
   ON data.origin = country_names.o_id_3char
  AND data.dest = country_names2.d_id_3char
WHERE data.year > 2012
  AND data.export_val > 1E8
GROUP BY
    origin,
    dest,
    product

The motivation behind above is cost of storing and querying in your particular case
Your `OEC.country_names` table is just about 10KB in size
Each time you query it you pay as if it is 10MB (Charges are rounded to the nearest MB, with a minimum 10 MB data processed per table referenced by the query, and with a minimum 10 MB data processed per query.)

So, if you will materialize above mentioned table - it will still be less than 10MB so no difference in querying charges
Similar situation with storing that table - no visible changes in charges

You can check more about pricing here

这篇关于在两列上加入同一个表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆