Count of all elements less than the value in a row


Question

Given a dataframe:

value
-----
0.3
0.2
0.7
0.5

Is there a way to build a column that contains, for each row, the count of elements in the column that are less than or equal to that row's value? Specifically,

value   count_less_equal
-------------------------
0.3     2
0.2     1
0.7     4
0.5     3

I could groupBy the value column, but I don't know how to filter down to all values that are less than that value.

I was thinking that maybe it's possible to duplicate the first column, then create a filter so that for each value in col1 one finds the count of values in col2 that are less than the col1 value.
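The cross-comparison idea sketched above can be checked in plain Python before bringing Spark into it (a minimal sketch, independent of any DataFrame API):

```python
# For each value, count how many values in the column are <= it.
values = [0.3, 0.2, 0.7, 0.5]
count_less_equal = [sum(1 for other in values if other <= v) for v in values]
print(count_less_equal)  # [2, 1, 4, 3]
```

This is exactly the pairwise comparison that the self join in the answer expresses in SQL.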

col1   col2
-------------------------
0.3     0.3
0.2     0.2
0.7     0.7
0.5     0.5

Answer

You can use a self join on the condition t1.value >= t2.value to get the desired result.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .appName('SO')\
        .getOrCreate()

    sc = spark.sparkContext

    df = sc.parallelize([
        [0.3], [0.2], [0.7], [0.5]
    ]).toDF(["value"])

    df.show()

    # +-----+
    # |value|
    # +-----+
    # |  0.3|
    # |  0.2|
    # |  0.7|
    # |  0.5|
    # +-----+


    df.createTempView("table")

    spark.sql('select t1.value, count(*) as count from table t1 join table t2 on t1.value>=t2.value  group by t1.value order by value').show()

    # +-----+-----+
    # |value|count|
    # +-----+-----+
    # |  0.2|    1|
    # |  0.3|    2|
    # |  0.5|    3|
    # |  0.7|    4|
    # +-----+-----+
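Since the values here are distinct, count_less_equal is simply each value's 1-based rank in ascending order, so the quadratic self join can be avoided. A minimal sketch of that idea in plain Python (in Spark the same idea corresponds to a rank window function over the value column):

```python
values = [0.3, 0.2, 0.7, 0.5]

# Map each distinct value to its 1-based position in sorted order.
rank = {v: i + 1 for i, v in enumerate(sorted(values))}
counts = [rank[v] for v in values]
print(counts)  # [2, 1, 4, 3]
```

Note that if the column contains duplicate values, "count of elements <= value" and "rank" can differ depending on how ties are handled, so the self-join version above is the more literal translation of the question.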

