在两列上加入同一个表 [英] JOIN the same table on two columns
问题描述
我使用JOIN将导入和导出数据中的国家和产品ID替换为存储在单独表格中的实际国家和产品名称。在数据源表( data )中,有两列包含国家/地区ID,用于出发地和目的地,我将用国家/地区名替换这两个列。
我所提出的代码两次引用了 country_names 表 - 作为 country_names 和 country_names2 , - 它没有似乎很优雅。我希望能够只用一个名字来引用表格一次。我会很感激,如果有人指出我更优雅,也许更有效的方式来实现相同的结果。
SELECT
country_names.name AS起源$ b $ country_names2.name AS dest $ b $ product_names.name AS product $ b $ SUM(data.export_val)AS export_val $ b $ SUM(data.import_val )AS import_val
FROM
OEC.year_origin_destination_hs92_6 AS数据
JOIN
OEC.products_hs_92 AS product_names
ON
data.hs92 = product_names.hs92
JOIN
OEC.country_names AS country_names
ON
data.origin = country_names.id_3char
JOIN
OEC.country_names AS country_names2
ON
data.dest = country_names2.id_3char
WHERE
data.year> 2012
AND data.export_val> 1E8
GROUP BY
来源,
dest,
产品
将产品ID转换为产品名称的表格有6K +行。这是一个小样本:
pre code $ id $ hs92
63215 3215油墨
2130110 130110 Lac
21002 1002 Rye
2100200 100200 Rye
52706 2706 Tar
20902 902 Tea
42203 2203 Beer
42302 2302 Bran
178703 8703 Cars
将国家/地区转换为国家/地区名称(这是我必须加入两次的表格)世界上所有的国家。 ( id_3char 是使用的列。)下面是一个示例:
id id_3char name
euchi chi海峡群岛
askhm khm柬埔寨
eublx blx比利时 - 卢森堡
eublr blr白俄罗斯
e mnene mne黑山
euhun hun匈牙利
蒙古蒙古
nabhs bhs巴哈马
afsen sen塞内加尔
以下是来自包含 和 dest 两列的总计205M行的导入和导出数据表,我正在进行连接:
year origin dest hs92 export_val import_val
2009 can is isr 300410 2152838.47 3199.24
1995 chn jpn 590190 275748.65 554154.24
2000 deu gmb 100610 1573508.44 1327.0
2008 deu jpn 540822 10000.0 202 062.43
2010 deu ukr 950390 1626012.04 159423.38
2006 esp prt 080530 2470699.19 125291.33
2006 grc ind 844859 8667.0 3182.0
2000 ltu deu 630399 6018.12 5061.96
2005 usa zaf 290219 2126216.52 34561.61
1997 ven ecu 281122 155347.73 1010.0
我认为你已经这样做了,它可以被视为足够好,只要按照原样使用即可:o)
同时,如果由于某种原因你真的真的
要避免在该国家/地区表上进行两次连接 - 您可以做的是在select语句下面实现,比如说'OEC.origin_destination_pairs`表
SELECT
o.id_3char o_id_3char,
o.name o_name,
d.id_3char d_id_3char,
d.name d_name
FROM`OEC.country_names` o
CROSS JOIN`OEC.country_names` d
$ c
$ b $ p
$ b $ p
pre
$ p $ SELECT
country_names.o_name AS起源$ b $ country_names.d_name AS dest,
product_names.name AS产品
SUM(data.export_val)AS export_val,
SUM(data.import_val)AS import_val
FROM OEC.year_origin_destination_hs92_6 AS data
JOIN OEC.products_hs_92 AS product_names
ON data.hs92 = product_names.hs92
JOIN OEC .origin_destination_pairs AS country_names
ON data.origin = country_names.o_id_3char
AND data.dest = country_names2.d_id_3char
WHERE data.year> 2012
AND data.export_val> 1E8
GROUP BY
来源,
dest,
产品
上面的动机是在你的特定情况下存储和查询的代价
你的`OEC.country_names`表大小只有大约10KB
每次你查询它时支付就好像它是10MB(费用四舍五入到最接近的MB,每个查询引用每个表至少处理10 MB数据,并且每个查询处理至少10 MB数据。)
因此,如果您将实现上述表格 - 它仍然会小于10MB,因此查询费用没有区别
存储该表格的情况与此类似 - 没有可见的费用变化 p>
您可以查看更多关于定价的信息 here
I use JOINs to replace country and product IDs in import and export data with actual country and products names stored in separate tables. In the data source table (data), there are two columns with country IDs, for origin and destination, both of which I am replacing with country names.
The code I have come up with refers to the country_names table twice – as country_names, and country_names2, – which doesn’t seem to be very elegant. I expected to be able to refer to the table just once, by a single name. I would be grateful if someone pointed me to a more elegant and maybe more efficient way to achieve the same result.
SELECT
country_names.name AS origin,
country_names2.name AS dest,
product_names.name AS product,
SUM(data.export_val) AS export_val,
SUM(data.import_val) AS import_val
FROM
OEC.year_origin_destination_hs92_6 AS data
JOIN
OEC.products_hs_92 AS product_names
ON
data.hs92 = product_names.hs92
JOIN
OEC.country_names AS country_names
ON
data.origin = country_names.id_3char
JOIN
OEC.country_names AS country_names2
ON
data.dest = country_names2.id_3char
WHERE
data.year > 2012
AND data.export_val > 1E8
GROUP BY
origin,
dest,
product
The table to convert product IDs to product names has 6K+ rows. Here is a small sample:
id hs92 name
63215 3215 Ink
2130110 130110 Lac
21002 1002 Rye
2100200 100200 Rye
52706 2706 Tar
20902 902 Tea
42203 2203 Beer
42302 2302 Bran
178703 8703 Cars
The table to convert country IDs to country names (which is the table I have to JOIN on twice) has 264 rows for all countries in the world. (id_3char is the column used.) Here is a sample:
id id_3char name
euchi chi Channel Islands
askhm khm Cambodia
eublx blx Belgium-Luxembourg
eublr blr Belarus
eumne mne Montenegro
euhun hun Hungary
asmng mng Mongolia
nabhs bhs Bahamas
afsen sen Senegal
And here is a sample of data from the import and export data table with a total of 205M rows that has the two columns origin and dest that I am making a join on:
year origin dest hs92 export_val import_val
2009 can isr 300410 2152838.47 3199.24
1995 chn jpn 590190 275748.65 554154.24
2000 deu gmb 100610 1573508.44 1327.0
2008 deu jpn 540822 10000.0 202062.43
2010 deu ukr 950390 1626012.04 159423.38
2006 esp prt 080530 2470699.19 125291.33
2006 grc ind 844859 8667.0 3182.0
2000 ltu deu 630399 6018.12 5061.96
2005 usa zaf 290219 2126216.52 34561.61
1997 ven ecu 281122 155347.73 1010.0
I think you already have it done such that it can be considered good enough to just use as is :o)
Meantime, If for some reason you really-really
want to avoid two joins on that country table - what you can do is to materialize below select statement into let's say `OEC.origin_destination_pairs` table
SELECT
o.id_3char o_id_3char,
o.name o_name,
d.id_3char d_id_3char,
d.name d_name
FROM `OEC.country_names` o
CROSS JOIN `OEC.country_names` d
Then you can just join on that new table as below
SELECT
country_names.o_name AS origin,
country_names.d_name AS dest,
product_names.name AS product,
SUM(data.export_val) AS export_val,
SUM(data.import_val) AS import_val
FROM OEC.year_origin_destination_hs92_6 AS data
JOIN OEC.products_hs_92 AS product_names
ON data.hs92 = product_names.hs92
JOIN OEC.origin_destination_pairs AS country_names
ON data.origin = country_names.o_id_3char
AND data.dest = country_names2.d_id_3char
WHERE data.year > 2012
AND data.export_val > 1E8
GROUP BY
origin,
dest,
product
The motivation behind above is cost of storing and querying in your particular case
Your `OEC.country_names` table is just about 10KB in size
Each time you query it you pay as if it is 10MB (Charges are rounded to the nearest MB, with a minimum 10 MB data processed per table referenced by the query, and with a minimum 10 MB data processed per query.)
So, if you will materialize above mentioned table - it will still be less than 10MB so no difference in querying charges
Similar situation with storing that table - no visible changes in charges
You can check more about pricing here
这篇关于在两列上加入同一个表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!