Julia DataFrames.jl - filter data with NA's (NAException)
Question
I am not sure how to handle NA within Julia DataFrames.
For example, with the following DataFrame:
> import DataFrames
> a = DataFrames.@data([1, 2, 3, 4, 5]);
> b = DataFrames.@data([3, 4, 5, 6, NA]);
> ndf = DataFrames.DataFrame(a=a, b=b)
I can filter on column :a
> ndf[ndf[:a] .== 4, :]
but if I try the same operation on :b, I get the error NAException("cannot index an array with a DataArray containing NA values").
> ndf[ndf[:b] .== 4, :]
NAException("cannot index an array with a DataArray containing NA values")
while loading In[108], in expression starting on line 1
in to_index at /Users/abisen/.julia/v0.3/DataArrays/src/indexing.jl:85
in getindex at /Users/abisen/.julia/v0.3/DataArrays/src/indexing.jl:210
in getindex at /Users/abisen/.julia/v0.3/DataFrames/src/dataframe/dataframe.jl:268
This is because of the NA value in :b.
My question is: how should DataFrames containing NA typically be handled? I can understand that a > or < operation against NA would be undefined, but == should work (no?).
Recommended answer
What's your desired behavior here? If you want to do selections like this, you can make the condition (not NA) AND (equal to 4). Where the first test fails, the row is excluded regardless of the second test.
using DataFrames
a = @data([1, 2, 3, 4, 5]);
b = @data([3, 4, 5, 6, NA]);
ndf = DataFrame(a=a, b=b)
# keep only the rows where :b is not NA and equals 4
ndf[(!isna(ndf[:b])) & (ndf[:b] .== 4), :]
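For readers on current Julia releases, where `missing` has taken the place of `NA`, the same "not missing AND equal to 4" idea can be sketched with Base functions only. This is an illustrative sketch rather than the answer's original code: it assumes Julia 1.x, the plain vectors `a` and `b` stand in for the question's columns, and the names `raw`, `mask` and `selected` are chosen here for illustration.

```julia
# Plain vectors standing in for the columns :a and :b (assumption: Julia 1.x).
a = [1, 2, 3, 4, 5]
b = [3, 4, 5, 6, missing]

# == propagates missing instead of returning false ...
raw = b .== 4                 # the last entry is missing, not false

# ... so turn missing into false before using the result as an index.
mask = coalesce.(raw, false)

selected = a[mask]            # entries of a where b equals 4
println(selected)
```

`isequal.(b, 4)` would also give a plain Boolean mask, since `isequal` never returns `missing`.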
In some cases you might just want to drop all rows with NAs in certain columns:
ndf = ndf[!isna(ndf[:b]),:]
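In current DataFrames.jl, which represents absent values with `missing` rather than `NA`, dropping such rows can be written with `dropmissing`. The sketch below assumes DataFrames.jl 1.x is installed; it is a modern counterpart, not the API of the version used in the question, and the names `clean` and `hits` are chosen here for illustration.

```julia
using DataFrames

ndf = DataFrame(a = [1, 2, 3, 4, 5], b = [3, 4, 5, 6, missing])

# Drop every row whose :b entry is missing.
clean = dropmissing(ndf, :b)

# Select rows where :b equals 4, treating missing comparisons as false.
hits = ndf[coalesce.(ndf.b .== 4, false), :]
```

`dropmissing(ndf)` without a column argument would instead drop rows that have a missing value in any column.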