Julia DataFrames.jl - filter data with NA's (NAException)
Question
I am not sure how to handle NA within Julia DataFrames.
For example, with the following DataFrame:
> import DataFrames
> a = DataFrames.@data([1, 2, 3, 4, 5]);
> b = DataFrames.@data([3, 4, 5, 6, NA]);
> ndf = DataFrames.DataFrame(a=a, b=b)
I can successfully run the following operation on column :a
> ndf[ndf[:a] .== 4, :]
But if I try the same operation on :b, I get the error NAException("cannot index an array with a DataArray containing NA values"):
> ndf[ndf[:b] .== 4, :]
NAException("cannot index an array with a DataArray containing NA values")
while loading In[108], in expression starting on line 1
in to_index at /Users/abisen/.julia/v0.3/DataArrays/src/indexing.jl:85
in getindex at /Users/abisen/.julia/v0.3/DataArrays/src/indexing.jl:210
in getindex at /Users/abisen/.julia/v0.3/DataFrames/src/dataframe/dataframe.jl:268
This is because of the NA value in :b.
My question is: how should DataFrames with NA typically be handled? I can understand that a > or < operation against NA would be undefined, but == should work (no?).
Recommended answer
What's your desired behavior here? If you want to do selections like this, you can make the condition (not NA) AND (equal to 4). Where the first test fails, the row is excluded whatever the second comparison yields.
using DataFrames
a = @data([1, 2, 3, 4, 5]);
b = @data([3, 4, 5, 6, NA]);
ndf = DataFrame(a=a, b=b)
ndf[(!isna(ndf[:b]))&(ndf[:b].==4),:]
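The snippet above targets the Julia 0.3 era API (DataArrays, `NA`, `isna`). As a rough sketch only, and assuming a later DataFrames.jl release in which `NA` has been replaced by `missing`, the same "not missing AND equal to 4" selection might look like the following. The variable names `mask`, `hits`, and `hits_isequal` are illustrative, not part of the original answer.

```julia
using DataFrames

# Same data as above, with `missing` standing in for the old `NA`.
ndf = DataFrame(a = [1, 2, 3, 4, 5], b = [3, 4, 5, 6, missing])

# `ndf.b .== 4` contains `missing` for the last row, which is not a
# usable Bool index. `coalesce` maps those entries to `false`.
mask = coalesce.(ndf.b .== 4, false)
hits = ndf[mask, :]

# `isequal` never returns `missing`, so its result can index directly.
hits_isequal = ndf[isequal.(ndf.b, 4), :]
```

This also speaks to the question about `==`: under three-valued logic, comparing an unknown value with 4 is itself unknown, so the comparison propagates the missing value instead of returning `false`.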
In some cases you might just want to drop all rows with NAs in certain columns:
ndf = ndf[!isna(ndf[:b]),:]
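For comparison, here is a minimal sketch of the same row-dropping step, again assuming a later DataFrames.jl release that uses `missing` in place of `NA`. The names `clean` and `clean_mask` are hypothetical.

```julia
using DataFrames

ndf = DataFrame(a = [1, 2, 3, 4, 5], b = [3, 4, 5, 6, missing])

# Drop every row whose :b entry is missing; other columns are not checked.
clean = dropmissing(ndf, :b)

# Boolean-mask form, closest to the `!isna(...)` idiom used above.
clean_mask = ndf[.!ismissing.(ndf.b), :]
```

Both forms return a new DataFrame and leave `ndf` unchanged, so reassign the result (as the original line does) if the filtered frame should replace it.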