Error using reduceByKey: int object is unsubscriptable
Problem description
I'm getting the error "int object is unsubscriptable" while executing the following script:
element.reduceByKey( lambda x , y : x[1]+y[1])
where element is a key-value RDD whose values are tuples. Example input:
(A, (toto , 10))
(A, (titi , 30))
(5, (tata, 10))
(A, (toto, 10))
I understand that the reduceByKey function takes (K, V) tuples and applies a function to all the values of each key to get the final result of the reduce.
Like the example given in ReduceByKey Apache.
Any help please?
Here is an example that will illustrate what's going on.
Let's consider what happens when you call reduce on a list with some function f:
reduce(f, [a,b,c]) = f(f(a,b),c)
If we take your example, f = lambda u, v: u[1] + v[1], then the above expression breaks down into:
reduce(f, [a,b,c]) = f(f(a,b),c) = f(a[1]+b[1],c)
But a[1] + b[1] is an integer, and an integer has no __getitem__ method, so it cannot be indexed with [1] on the next call. Hence your error.
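The failure can be reproduced without a Spark cluster by using Python's built-in functools.reduce. This is only a local sketch: the list `values` is an assumption standing in for the values Spark groups under key 'A', and reduceByKey may combine them in a different order across partitions.

```python
from functools import reduce

# Values that share the key 'A' in the question's data.
values = [('toto', 10), ('titi', 30), ('toto', 10)]

f = lambda u, v: u[1] + v[1]

# Two values work: f is called once, with two tuples.
two = reduce(f, values[:2])

# Three values fail: the second call receives the int from the first call as u.
try:
    reduce(f, values)
    error = None
except TypeError as exc:
    error = exc

print(two)
print(type(error).__name__)
```

The first call returns an int, so the second call tries to index an int and raises TypeError.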
In general, the better approach (as shown below) is to use map() to first extract the data in the format that you want, and then apply reduceByKey().
An MCVE with your data
element = sc.parallelize(
    [
        ('A', ('toto', 10)),
        ('A', ('titi', 30)),
        ('5', ('tata', 10)),
        ('A', ('toto', 10))
    ]
)
You can almost get your desired output with a more sophisticated reduce function:
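def add_tuple_values(a, b):
    # Each argument is either an original (name, count) tuple
    # or an int produced by an earlier reduction step.
    try:
        u = a[1]
    except TypeError:
        u = a
    try:
        v = b[1]
    except TypeError:
        v = b
    return u + v

print(element.reduceByKey(add_tuple_values).collect())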
Except that this results in:
[('A', 50), ('5', ('tata', 10))]
Why? Because there's only one value for the key '5', so there is nothing to reduce and the function is never called for that key.
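The same behavior shows up with functools.reduce, used here as a local stand-in for what happens within a single key. In this sketch, add_tuple_values is rewritten with isinstance checks and a `calls` list (both assumptions added for illustration) to record whether the function is ever invoked.

```python
from functools import reduce

calls = []  # records every invocation of the reduce function

def add_tuple_values(a, b):
    calls.append((a, b))
    u = a[1] if isinstance(a, tuple) else a
    v = b[1] if isinstance(b, tuple) else b
    return u + v

# Only one value exists for key '5', so reduce returns it untouched.
single = reduce(add_tuple_values, [('tata', 10)])

print(single)
print(calls)
```

Because the function is never called, the original tuple passes through unchanged, which is why the output for '5' keeps its ('tata', 10) shape.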
For these reasons, it's best to first call map. To get your desired output, you could do:
>>> print(element.map(lambda x: (x[0], x[1][1])).reduceByKey(lambda u, v: u+v).collect())
[('A', 50), ('5', 10)]
Update 1
Here's one more approach:
You could create tuples in your reduce function, and then call map to extract the value you want. (Essentially reverse the order of map and reduce.)
print(
    element.reduceByKey(lambda u, v: (0, u[1] + v[1]))
    .map(lambda x: (x[0], x[1][1]))
    .collect()
)
[('A', 50), ('5', 10)]
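This works because the reduce function now returns a value with the same shape as its inputs, so its result can safely be passed back in as an argument. A local check with functools.reduce, again only an assumed stand-in for Spark's per-key combining (the names keep_shape, values_a and values_5 are illustrative):

```python
from functools import reduce

# Returns a (placeholder, sum) tuple, so index [1] is valid on every call.
keep_shape = lambda u, v: (0, u[1] + v[1])

values_a = [('toto', 10), ('titi', 30), ('toto', 10)]  # key 'A'
values_5 = [('tata', 10)]                               # key '5'

total_a = reduce(keep_shape, values_a)
total_5 = reduce(keep_shape, values_5)

# Index [1] extracts the count whether or not a reduction happened.
print(total_a[1], total_5[1])
```

Either way the count sits at index 1, so the trailing map step can extract it uniformly for every key.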
Notes
- Had there been at least 2 records for each key, using add_tuple_values() would have given you the correct output.