Efficient custom ordering in Julia DataFrames?

Views: 17 · Posted: 2022/1/23 20:08:03 · Tags: dataframe, julia


Question

Is there a quick way to specify a custom order for sort/sort! on Julia DataFrames?

julia> using DataFrames

julia> srand(1);

julia> df = DataFrame(x = rand(10), y = rand([:high, :med, :low], 10))
10×2 DataFrames.DataFrame
│ Row │ x          │ y    │
├─────┼────────────┼──────┤
│ 1   │ 0.236033   │ med  │
│ 2   │ 0.346517   │ high │
│ 3   │ 0.312707   │ high │
│ 4   │ 0.00790928 │ med  │
│ 5   │ 0.488613   │ med  │
│ 6   │ 0.210968   │ med  │
│ 7   │ 0.951916   │ low  │
│ 8   │ 0.999905   │ low  │
│ 9   │ 0.251662   │ high │
│ 10  │ 0.986666   │ med  │

julia> sort!(df, cols=[:y])
10×2 DataFrames.DataFrame
│ Row │ x          │ y    │
├─────┼────────────┼──────┤
│ 1   │ 0.346517   │ high │
│ 2   │ 0.312707   │ high │
│ 3   │ 0.251662   │ high │
│ 4   │ 0.951916   │ low  │
│ 5   │ 0.999905   │ low  │
│ 6   │ 0.236033   │ med  │
│ 7   │ 0.00790928 │ med  │
│ 8   │ 0.488613   │ med  │
│ 9   │ 0.210968   │ med  │
│ 10  │ 0.986666   │ med  │

I would like the y column ordered with :low first, followed by :med and then :high. What would be the best way of doing this? I know I can do the following:

julia> subdfs = []
0-element Array{Any,1}

julia> for val in [:low, :med, :high]
           push!(subdfs, df[df[:y] .== val, :])
       end

julia> vcat(subdfs...)
10×2 DataFrames.DataFrame
│ Row │ x          │ y    │
├─────┼────────────┼──────┤
│ 1   │ 0.951916   │ low  │
│ 2   │ 0.999905   │ low  │
│ 3   │ 0.236033   │ med  │
│ 4   │ 0.00790928 │ med  │
│ 5   │ 0.488613   │ med  │
│ 6   │ 0.210968   │ med  │
│ 7   │ 0.986666   │ med  │
│ 8   │ 0.346517   │ high │
│ 9   │ 0.312707   │ high │
│ 10  │ 0.251662   │ high │

Is there a way to do this without allocating memory? In my actual use case, df is quite large.

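Recommended answer

You can define a comparison function that returns true when x should come before y in the order :low, :med, :high:
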
lmhlt(x, y) = x == :low && y != :low || x == :med && y == :high

Then pass it to sort! as the lt argument:

sort!(df, lt=lmhlt)
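
A minimal sketch of how this comparison function behaves, run on a plain vector of symbols in base Julia so that it needs no packages. The vector `ys` simply copies the y column printed in the question. With a recent DataFrames.jl, restricting the comparison to the y column would presumably be written `sort!(df, :y, lt=lmhlt)`; treat that call as an assumption and check it against the version you have installed.

```julia
# Strict "less than" for the custom order :low < :med < :high (from the answer).
lmhlt(x, y) = x == :low && y != :low || x == :med && y == :high

# Values of the y column as printed in the question.
ys = [:med, :high, :high, :med, :med, :med, :low, :low, :high, :med]

# Base `sort` takes the comparison through its `lt` keyword.
sorted_ys = sort(ys, lt=lmhlt)

println(sorted_ys)
```

Because `lmhlt` only ever returns true for the pairs low/med, low/high and med/high, equal levels compare as "not less than" in both directions, which is what a sorting routine expects from a strict ordering.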

However, this still allocates memory, although it should allocate less than your current version.
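
One way to see where the remaining allocation comes from is to compute the row order separately. The sketch below is a swapped-in variant, not the answer's method: it uses base Julia's `sortperm` with a hypothetical lookup table `level_rank` passed through `by`, on plain vectors copied from the question's printed DataFrame. The permutation itself is a single vector of integers; applying it (for a DataFrame, `df[perm, :]`) still copies the columns.

```julia
# Hypothetical lookup table: position of each level in the desired order.
level_rank = Dict(:low => 1, :med => 2, :high => 3)

# Columns copied from the DataFrame printed in the question.
x = [0.236033, 0.346517, 0.312707, 0.00790928, 0.488613,
     0.210968, 0.951916, 0.999905, 0.251662, 0.986666]
y = [:med, :high, :high, :med, :med, :med, :low, :low, :high, :med]

# `by` maps each symbol to its rank before comparing.
# sortperm returns row indices and keeps tied rows in their original order.
perm = sortperm(y, by = s -> level_rank[s])

println(perm)
println(y[perm])
println(x[perm])
```

The resulting row order matches the output of the vcat approach in the question, without building one sub-DataFrame per level.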
