Creating subsets of rows based upon the corresponding row value of another column?


Problem description

#Let the CSV contain the two columns "Age" and "Gender", where:

  Age = [30, 24, 55, 61, 70, 21]

  Gender = ["Male", "Female", "Male", "Male", "Male", "Female"]

#I want it to show me all the values of Age (and how many there are) that correspond to Gender = "Male", and the same for "Female".

  using CSV, DataFrames

#So this is what I tried

julia> df = CSV.read(raw"Clocation)", DataFrame)
julia> df.Age
6-element Vector{Int64}:
30
24
55
61
70
21

#Adjusted for the example

julia> df.Age, Gender
ERROR: UndefVarError: Gender not defined
Stacktrace:
 [1] top-level scope
   @ REPL[26]:1

#What I want is something like 'df.Age, Gender=Male', but that doesn't work either, and I'm really stuck :( Source: https://testdataframesjl.readthedocs.io/en/readthedocs/subsets/

#Any advice? Thank you in advance!

#Edit: So then I tried

julia> combine(groupby(df, :Age), :Gender=>"Male")
200×2 DataFrame
 Row │ Age    Male
     │ Int64  String7
─────┼────────────────
   1 │    18  Male
   2 │    18  Male
   3 │    18  Male
   4 │    18  Female
   5 │    19  Male
   6 │    19  Male
   7 │    19  Male
   8 │    19  Female
   9 │    19  Male
  10 │    19  Female
  11 │    19  Male
  12 │    19  Male
  13 │    20  Female
  14 │    20  Male
  15 │    20  Female
  16 │    20  Male
  17 │    20  Male
  18 │    21  Male
  19 │    21  Female
  20 │    21  Female
  21 │    21  Female
  22 │    21  Female
  23 │    22  Female
  24 │    22  Male
  25 │    22  Female
  26 │    23  Female
  27 │    23  Female
  28 │    23  Female
  ⋮  │   ⋮       ⋮
 173 │    57  Male
 174 │    57  Female
 175 │    58  Female
 176 │    58  Male
 177 │    59  Male
 178 │    59  Male
 179 │    59  Male
 180 │    59  Male
 181 │    60  Male
 182 │    60  Female
 183 │    60  Female
 184 │    63  Male
 185 │    63  Female
 186 │    64  Male
 187 │    65  Female
 188 │    65  Male
 189 │    66  Female
 190 │    66  Male
 191 │    67  Male
 192 │    67  Female
 193 │    67  Male
 194 │    67  Male
 195 │    68  Female
 196 │    68  Female
 197 │    68  Male
 198 │    69  Male
 199 │    70  Male
 200 │    70  Male
      144 rows omitted

#And now I'm just confused. Source: https://discourse.julialang.org/t/how-to-count-the-number-of-categories-present-in-a-column-of-a-dataframe/33244/3

Recommended answer

Apart from the answer by jling, which is the simplest one, here are some alternatives.

Using groupby you can partition the rows of the data frame by the grouping columns:

julia> gdf = groupby(df, :Gender)
GroupedDataFrame with 2 groups based on key: Gender
First Group (4 rows): Gender = "Male"
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male
⋮
Last Group (2 rows): Gender = "Female"
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    24  Female
   2 │    21  Female

julia> gdf[("Male",)]
4×2 SubDataFrame
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male

julia> gdf[("Female",)]
2×2 SubDataFrame
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    24  Female
   2 │    21  Female
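
The question also asks for the Age values themselves and how many there are per gender. A minimal sketch of how that could be read off the grouped data frame, assuming the six-row example from the question is built in memory rather than loaded from the CSV (the names male_ages, n_male and counts are only illustrative):

```julia
using DataFrames

# Example data from the question, constructed by hand instead of read from a CSV file.
df = DataFrame(Age = [30, 24, 55, 61, 70, 21],
               Gender = ["Male", "Female", "Male", "Male", "Male", "Female"])

gdf = groupby(df, :Gender)

# Age values belonging to one group (a view into the parent column).
male_ages = gdf[("Male",)].Age

# Number of rows in that group.
n_male = nrow(gdf[("Male",)])

# One row per group holding the number of rows in it.
counts = combine(gdf, nrow => :count)
```

Here combine with nrow gives the size of every group at once, while indexing gdf by a key tuple gives access to the values of a single group.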

If you want only one subset, you can also use filter or subset (they do a similar thing but with different syntax):

julia> filter(:Gender => ==("Male"), df)
4×2 DataFrame
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male

julia> subset(df, :Gender => ByRow(==("Male")))
4×2 DataFrame
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male
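
If only the Age values and their count are needed rather than a whole data frame, plain boolean indexing is another option. This is a sketch under the same assumption as the question's example data; it is not taken from the answers above, and the variable names are illustrative:

```julia
using DataFrames

# Example data from the question, constructed by hand instead of read from a CSV file.
df = DataFrame(Age = [30, 24, 55, 61, 70, 21],
               Gender = ["Male", "Female", "Male", "Male", "Male", "Female"])

# Boolean mask marking the rows where Gender equals "Male".
is_male = df.Gender .== "Male"

# Age values for the selected rows, returned as a plain vector.
male_ages = df[is_male, :Age]
female_ages = df[.!is_male, :Age]

# Number of matching rows.
n_male = count(is_male)
n_female = count(.!is_male)

# The same values can be taken from the filtered data frame.
male_ages_via_filter = filter(:Gender => ==("Male"), df).Age
```

Note that filter applies the predicate to one value at a time, whereas subset passes the whole column, which is why subset needs the ByRow wrapper.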

Finally, you can consider using DataFramesMeta.jl, which is probably a bit easier to understand:

julia> using DataFramesMeta

julia> @subset(df, :Gender .== "Male")
4×2 DataFrame
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male

julia> @rsubset(df, :Gender == "Male") # "r" prefix stands for "row" so you do not need to broadcast the operation
4×2 DataFrame
 Row │ Age    Gender
     │ Int64  String
─────┼───────────────
   1 │    30  Male
   2 │    55  Male
   3 │    61  Male
   4 │    70  Male
