Count of all elements less than the value in a row
Question
Given a dataframe:
value
-----
0.3
0.2
0.7
0.5
Is there a way to build a column that contains, for each row, the count of values (across the whole column) that are less than or equal to that row's value? Specifically,
value count_less_equal
-------------------------
0.3 2
0.2 1
0.7 4
0.5 3
I could groupBy the value column, but I don't know how to filter all the values that are less than that value.
I was thinking it might be possible to duplicate the first column, then create a filter so that for each value in col1 one finds the count of values in col2 that are less than the col1 value:
col1 col2
-------------------------
0.3 0.3
0.2 0.2
0.7 0.7
0.5 0.5
Answer
You can use a self join on t1.value >= t2.value to get the desired result:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('SO') \
    .getOrCreate()
sc = spark.sparkContext

# Each row is a one-element tuple so that toDF infers a single "value" column.
df = sc.parallelize([
    (0.3,), (0.2,), (0.7,), (0.5,)
]).toDF(["value"])

df.show()
# +-----+
# |value|
# +-----+
# |  0.3|
# |  0.2|
# |  0.7|
# |  0.5|
# +-----+

df.createTempView("table")

spark.sql(
    "select t1.value, count(*) as count "
    "from table t1 join table t2 on t1.value >= t2.value "
    "group by t1.value order by t1.value"
).show()
# +-----+-----+
# |value|count|
# +-----+-----+
# | 0.2| 1|
# | 0.3| 2|
# | 0.5| 3|
# | 0.7| 4|
# +-----+-----+
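The join condition simply counts, for each value, how many values in the column are less than or equal to it, so the expected output can be sanity-checked in plain Python (assuming all values are distinct, as in the example; with duplicated values the grouped self-join counts would differ):

```python
values = [0.3, 0.2, 0.7, 0.5]

# For each value, count how many values in the column are <= it.
count_less_equal = {v: sum(1 for w in values if w <= v) for v in values}

print(count_less_equal)
# {0.3: 2, 0.2: 1, 0.7: 4, 0.5: 3}
```

This matches the result of the SQL self join above.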