Error using reduceByKey: int object is unsubscriptable

Published: 2020/9/4 3:25:19 · Tags: python, apache-spark, pyspark


Question

I'm getting the error "int object is unsubscriptable" while executing the following script:

element.reduceByKey(lambda x, y: x[1] + y[1])

where element is a key-value RDD whose values are tuples. Example input:

(A, (toto, 10))
(A, (titi, 30))
(5, (tata, 10))
(A, (toto, 10))

I understand that the reduceByKey function takes (K, V) tuples and applies a function to all the values of each key to get the final result of the reduce, like the example given in ReduceByKey Apache.

Any help please?

Solution

Here is an example that will illustrate what's going on.

Let's consider what happens when you call reduce on a list with some function f:

reduce(f, [a,b,c]) = f(f(a,b),c)
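As a plain-Python illustration (using `functools.reduce`, not Spark), a reduce function that records its arguments makes this nesting visible. `traced_add` is a hypothetical helper written only for this demonstration:

```python
from functools import reduce

calls = []  # arguments of every call, in the order they happen


def traced_add(u, v):
    # Record the pair so the nesting f(f(a, b), c) becomes visible.
    calls.append((u, v))
    return u + v


total = reduce(traced_add, [1, 2, 3])
print(total)  # 6
print(calls)  # [(1, 2), (3, 3)]
```

The second call receives the result of the first call (3) as its first argument, not an original list element. Spark's `reduceByKey` may group the calls differently (within each partition, then across partitions), which is why it expects an associative and commutative function, but the same rule holds: whatever the function returns is fed back in as one of its inputs.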

If we take your example, f = lambda u, v: u[1] + v[1], then the above expression breaks down into:

reduce(f, [a,b,c]) = f(f(a,b),c) = f(a[1]+b[1],c) = (a[1]+b[1])[1] + c[1]

But a[1] + b[1] is an integer, so it has no __getitem__ method: the outer call tries to index it with [1], hence your error.
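The failure can be reproduced without Spark. This sketch applies the question's lambda (written here as a named function `f`, a name chosen only for the example) with `functools.reduce` to the three values for key 'A':

```python
from functools import reduce

values = [('toto', 10), ('titi', 30), ('toto', 10)]


def f(u, v):
    # The question's reduce function: it assumes both arguments are tuples.
    return u[1] + v[1]


first = f(values[0], values[1])  # both inputs are still tuples, so this works

error = None
try:
    reduce(f, values)  # the second call is f(40, ('toto', 10)) and fails on 40[1]
except TypeError as exc:
    error = exc

print(first)  # 40
print(error)
```

On Python 3 the message reads `'int' object is not subscriptable`; the exact wording differs between Python versions, but it is the same `TypeError`.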

In general, the better approach (as shown below) is to use map() to first extract the data in the format that you want, and then apply reduceByKey().

A minimal, complete, verifiable example (MCVE) with your data:

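from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # reuse the running context (e.g. in the pyspark shell)

element = sc.parallelize(
    [
        ('A', ('toto', 10)),
        ('A', ('titi', 30)),
        ('5', ('tata', 10)),
        ('A', ('toto', 10))
    ]
)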

You can almost get your desired output with a more sophisticated reduce function:

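def add_tuple_values(a, b):
    # A value is either an original (name, count) tuple or an int
    # produced by an earlier call; take the count in either case.
    try:
        u = a[1]
    except TypeError:  # a is already an int
        u = a
    try:
        v = b[1]
    except TypeError:  # b is already an int
        v = b
    return u + v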

print(element.reduceByKey(add_tuple_values).collect())

Except that this results in:

[('A', 50), ('5', ('tata', 10))]

Why? Because there's only one value for the key '5', there is nothing to reduce: the function is never called, and the original tuple is passed through unchanged.
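The result for key '5' can be modelled locally. The sketch below assumes a hypothetical helper, `reduce_by_key_local`, that groups pairs by key and folds each group with `functools.reduce`. It ignores Spark's partitioning, but it shares the property that matters here: a key with a single value is returned without the function ever being called. The reduce function is equivalent to `add_tuple_values` for this data, written with `isinstance` instead of `try`/`except`:

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key_local(pairs, func):
    """Hypothetical single-machine stand-in for RDD.reduceByKey."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # reduce() returns the only item of a one-element list without calling
    # func, just as key '5' keeps its original tuple in Spark.
    return [(key, reduce(func, vals)) for key, vals in groups.items()]


def add_tuple_values(a, b):
    # Use the count whether the value is an original tuple or an int.
    u = a[1] if isinstance(a, tuple) else a
    v = b[1] if isinstance(b, tuple) else b
    return u + v


data = [('A', ('toto', 10)), ('A', ('titi', 30)),
        ('5', ('tata', 10)), ('A', ('toto', 10))]

result = reduce_by_key_local(data, add_tuple_values)
print(result)  # [('A', 50), ('5', ('tata', 10))]
```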

For these reasons, it's best to first call map. To get your desired output, you could do:

>>> print(element.map(lambda x: (x[0], x[1][1])).reduceByKey(lambda u, v: u+v).collect())
[('A', 50), ('5', 10)]
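The same two-stage idea in plain Python (no Spark), as a rough sketch: first reshape every value to the integer you want, then fold per key with an operation whose inputs and output have the same type:

```python
from collections import defaultdict

data = [('A', ('toto', 10)), ('A', ('titi', 30)),
        ('5', ('tata', 10)), ('A', ('toto', 10))]

# Stage 1, the "map": keep the key and only the integer from each value tuple.
extracted = [(key, value[1]) for key, value in data]

# Stage 2, the "reduceByKey": int + int -> int, so results can be combined again.
totals = defaultdict(int)
for key, count in extracted:
    totals[key] += count

print(dict(totals))  # {'A': 50, '5': 10}
```

In PySpark, `element.mapValues(lambda v: v[1])` is an equivalent way to write the first stage: it transforms only the value and leaves the key (and the RDD's partitioning) untouched.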

Update 1

Here's one more approach:

You could create tuples in your reduce function, and then call map to extract the value you want. (Essentially reverse the order of map and reduce.)

print(
    element.reduceByKey(lambda u, v: (0,u[1]+v[1]))
        .map(lambda x: (x[0], x[1][1]))
        .collect()
)
[('A', 50), ('5', 10)]
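Why this works can be checked locally with `functools.reduce` (a sketch, not Spark): the reduce function returns a tuple of the same shape as its inputs, so its output can always be indexed with [1] on the next call, and a lone value passes through untouched. `add_keep_tuple` is a hypothetical name for the lambda above:

```python
from functools import reduce


def add_keep_tuple(u, v):
    # Return a (placeholder, count) tuple so the result has the same shape as the inputs.
    return (0, u[1] + v[1])


combined_a = reduce(add_keep_tuple, [('toto', 10), ('titi', 30), ('toto', 10)])
combined_5 = reduce(add_keep_tuple, [('tata', 10)])  # single value: function never called

print(combined_a)  # (0, 50)
print(combined_5)  # ('tata', 10)
# The follow-up map takes element [1] of either shape of result.
print(combined_a[1], combined_5[1])  # 50 10
```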

Notes

Had there been at least 2 records for each key, using add_tuple_values() would have given you the correct output.
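With several records per key, Spark may call the reduce function on two original tuples, on a partial int result and a tuple (in either order, e.g. when one partition held a single record for the key), or on two partial ints from different partitions. This sketch checks that `add_tuple_values`, copied from the answer, returns the expected sum in each of those cases; the sample values are made up for illustration:

```python
def add_tuple_values(a, b):
    # Copied from the answer: take the count from a tuple, or use an int as-is.
    try:
        u = a[1]
    except TypeError:  # a is already an int
        u = a
    try:
        v = b[1]
    except TypeError:  # b is already an int
        v = b
    return u + v


cases = {
    "tuple + tuple": (('toto', 10), ('titi', 30)),
    "int + tuple": (40, ('toto', 10)),
    "tuple + int": (('toto', 10), 40),
    "int + int": (20, 30),
}

results = {name: add_tuple_values(a, b) for name, (a, b) in cases.items()}
print(results)  # {'tuple + tuple': 40, 'int + tuple': 50, 'tuple + int': 50, 'int + int': 50}
```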
