Hive - LIKE Operator
Problem description
I cannot figure out how to deal with this problem.
This is my data:
Table1:
BRAND
Sony
Apple
Google
IBM
etc.

Table2:
PRODUCT      SOLD
Sony ABCD    1233
Sony adv     1233
Sony aaaa    1233
Apple 123    1233
Apple 345    1233
IBM 13123    1233
Is it possible to write a query that returns a table with each brand and its total sold? My idea is:
Select table1.brand, sum(table2.sold) from table1
join table2
on (table1.brand LIKE '%table2.product%')
group by table1.brand
That was my idea, but I always get an error.
Is the LIKE operator the main problem, or is there another solution?
I see two issues. First, JOINs in Hive only work with equality conditions, so that LIKE isn't going to work there.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job.
Instead, that condition belongs in a WHERE clause.
Second, there is a problem with the LIKE expression itself: '%table2.product%' is interpreted literally as the string '%table2.product%', not as the value of the column. Even if it did work as intended, it would look for table2.product inside brand, when you seem to want it the other way around. To get the evaluation you intended, the wildcards need to go around the contents of table1.brand, which means concatenating them into the expression:
table2.product LIKE concat('%', table1.brand, '%')
By doing this, your LIKE will be evaluated against the patterns '%Sony%', '%Apple%', etc. instead of '%table2.product%'.
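The difference between a quoted pattern and a concatenated one can be checked outside Hive. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in, which is an assumption: SQLite is not Hive, it builds the pattern with the `||` operator rather than concat(), and its LIKE ignores ASCII case by default. The sample value 'Sony ABCD' comes from the question's data.

```python
import sqlite3

# Assumption: SQLite stands in for Hive here; only the LIKE semantics are illustrated.
conn = sqlite3.connect(":memory:")

# A quoted pattern is a plain string literal: the column name inside it is never substituted.
literal_match = conn.execute(
    "SELECT 'Sony ABCD' LIKE '%table2.product%'"
).fetchone()[0]

# Concatenating the wildcards around a value builds the pattern '%Sony%'.
built_match = conn.execute(
    "SELECT 'Sony ABCD' LIKE ('%' || ? || '%')", ("Sony",)
).fetchone()[0]

print(literal_match, built_match)
```

SQLite returns 0 for false and 1 for true, so the first comparison fails and the second succeeds.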
What you want is Brandon Bell's query, which I've merged into this answer:
SELECT table1.brand, SUM(table2.sold)
FROM table1, table2
WHERE table2.product LIKE concat('%', table1.brand, '%')
GROUP BY table1.brand;
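To see what this query computes on the sample data from the question, here is a small sketch that runs the same statement in SQLite through Python's sqlite3 module. This is an assumption made for the sake of a runnable example, not a Hive session: the pattern is built with `||` instead of concat(), the column types are guesses, and the "etc." row is left out because it is not a brand.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the Hive tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (brand TEXT);
    CREATE TABLE table2 (product TEXT, sold INTEGER);
    INSERT INTO table1 VALUES ('Sony'), ('Apple'), ('Google'), ('IBM');
    INSERT INTO table2 VALUES
        ('Sony ABCD', 1233), ('Sony adv', 1233), ('Sony aaaa', 1233),
        ('Apple 123', 1233), ('Apple 345', 1233), ('IBM 13123', 1233);
""")

# Same shape as the merged query: cross join, filter with LIKE, then aggregate per brand.
rows = conn.execute("""
    SELECT table1.brand, SUM(table2.sold)
    FROM table1, table2
    WHERE table2.product LIKE ('%' || table1.brand || '%')
    GROUP BY table1.brand
""").fetchall()

totals = dict(rows)
print(totals)
```

Brands with no matching product, such as Google in this data, do not appear in the result. Because every brand is compared against every product, the work grows with the product of the two table sizes.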