Julia | DataFrame | Replacing missing Values

Question

How can we replace missing values with 0.0 for a column in a DataFrame?

Recommended answer

There are a few different approaches to this problem (valid for Julia 1.x):

Probably the easiest approach is to use replace! or replace from base Julia. Here is an example with replace!:

julia> using DataFrames

julia> df = DataFrame(x = [1, missing, 3])
3×1 DataFrame
│ Row │ x       │
│     │ Int64⍰  │
├─────┼─────────┤
│ 1   │ 1       │
│ 2   │ missing │
│ 3   │ 3       │

julia> replace!(df.x, missing => 0);

julia> df
3×1 DataFrame
│ Row │ x      │
│     │ Int64⍰ │
├─────┼────────┤
│ 1   │ 1      │
│ 2   │ 0      │
│ 3   │ 3      │

However, note that at this point the type of column x still allows missing values:

julia> typeof(df.x)
Array{Union{Missing, Int64},1}

This is also indicated by the question mark following Int64 in column x when the data frame is printed out. You can change this by using disallowmissing! (from the DataFrames.jl package):

julia> disallowmissing!(df, :x)
3×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 0     │
│ 3   │ 3     │

Alternatively, if you use replace (without the exclamation mark) as follows, then the output will already disallow missing values:

julia> df = DataFrame(x = [1, missing, 3]);

julia> df.x = replace(df.x, missing => 0);

julia> df
3×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 0     │
│ 3   │ 3     │
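
The difference between the two calls can also be seen on a plain vector, without DataFrames. The following is a minimal sketch using only Base Julia; the variable names `v`, `inplace`, and `copied` are illustrative and do not come from the original answer:

```julia
# A vector whose element type allows missing values
v = [1, missing, 3]

# replace! works in place, so the element type of the vector cannot change
inplace = copy(v)
replace!(inplace, missing => 0)
eltype(inplace)   # still a Union of Missing and Int

# replace allocates a new vector and can narrow the element type
copied = replace(v, missing => 0)
eltype(copied)    # Int
```

This is why the in-place version needs a follow-up call such as disallowmissing!, while the copying version does not.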

Base.ismissing with logical indexing

You can use ismissing with logical indexing to assign a new value to all missing entries of an array:

julia> df = DataFrame(x = [1, missing, 3]);

julia> df.x[ismissing.(df.x)] .= 0;

julia> df
3×1 DataFrame
│ Row │ x      │
│     │ Int64⍰ │
├─────┼────────┤
│ 1   │ 1      │
│ 2   │ 0      │
│ 3   │ 3      │
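
As the output above shows, the column type still allows missing values after this assignment, because the change is made in place. The steps can be sketched on a plain vector with Base Julia only; the names `v`, `mask`, and `w` are illustrative, and the final `convert` call is just one possible way to narrow the type once no missing values remain:

```julia
v = [1, missing, 3]

# Boolean mask marking the missing entries
mask = ismissing.(v)

# Assign 0 to every masked position, in place
v[mask] .= 0

# The element type is unchanged: it still allows missing
eltype(v)

# With no missing values left, the vector can be converted to a plain Int vector
w = convert(Vector{Int}, v)
```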

Base.coalesce

Another approach is to use coalesce:

julia> df = DataFrame(x = [1, missing, 3]);

julia> df.x = coalesce.(df.x, 0);

julia> df
3×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 0     │
│ 3   │ 3     │
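
The question asks for 0.0 as the replacement value, which fits naturally when the column holds floating-point numbers. A minimal sketch on a plain vector, assuming Base Julia only (the names `y` and `filled` are illustrative):

```julia
# Floating-point data with a missing entry
y = [1.5, missing, 3.0]

# coalesce returns its first non-missing argument; the dot broadcasts it over y
filled = coalesce.(y, 0.0)

eltype(filled)   # Float64
```

For an integer column, an integer replacement such as 0 keeps the element type uniform, as in the example above.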

DataFramesMeta

Both replace and coalesce can be used with the @transform macro from the DataFramesMeta.jl package:

julia> using DataFramesMeta

julia> df = DataFrame(x = [1, missing, 3]);

julia> @transform(df, x = replace(:x, missing => 0))
3×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 0     │
│ 3   │ 3     │

julia> df = DataFrame(x = [1, missing, 3]);

julia> @transform(df, x = coalesce.(:x, 0))
3×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 0     │
│ 3   │ 3     │
