Creating subsets of rows based upon the corresponding row value of another column?
Problem description
# Let the CSV contain the two columns "Age" and "Gender", where:
Age = [30, 24, 55, 61, 70, 21]
Gender = [Male, Female, Male, Male, Male, Female]
# I want it to show me all the values of Age (and how many there are) that correspond to Gender = "Male", and the same for "Female"
using CSV, DataFrames
# So this is what I tried
julia> df= CSV.read(raw"Clocation)", DataFrame)
julia> df.Age
6-element Vector{Int64}:
30
24
55
61
70
21
# Adjusted for the example
julia> df.Age, Gender
ERROR: UndefVarError: Gender not defined
Stacktrace:
[1] top-level scope
@ REPL[26]:1
# What I want is 'df.Age, Gender=Male', but this doesn't work either and I'm really stuck :( Source: https://testdataframesjl.readthedocs.io/en/readthedocs/subsets/
# Any advice? Thank you in advance!
# Edit: So then I tried
julia> combine(groupby(df, :Age), :Gender=>"Male")
200×2 DataFrame
Row │ Age Male
│ Int64 String7
─────┼────────────────
1 │ 18 Male
2 │ 18 Male
3 │ 18 Male
4 │ 18 Female
5 │ 19 Male
6 │ 19 Male
7 │ 19 Male
8 │ 19 Female
9 │ 19 Male
10 │ 19 Female
11 │ 19 Male
12 │ 19 Male
13 │ 20 Female
14 │ 20 Male
15 │ 20 Female
16 │ 20 Male
17 │ 20 Male
18 │ 21 Male
19 │ 21 Female
20 │ 21 Female
21 │ 21 Female
22 │ 21 Female
23 │ 22 Female
24 │ 22 Male
25 │ 22 Female
26 │ 23 Female
27 │ 23 Female
28 │ 23 Female
⋮ │ ⋮ ⋮
173 │ 57 Male
174 │ 57 Female
175 │ 58 Female
176 │ 58 Male
177 │ 59 Male
178 │ 59 Male
179 │ 59 Male
180 │ 59 Male
181 │ 60 Male
182 │ 60 Female
183 │ 60 Female
184 │ 63 Male
185 │ 63 Female
186 │ 64 Male
187 │ 65 Female
188 │ 65 Male
189 │ 66 Female
190 │ 66 Male
191 │ 67 Male
192 │ 67 Female
193 │ 67 Male
194 │ 67 Male
195 │ 68 Female
196 │ 68 Female
197 │ 68 Male
198 │ 69 Male
199 │ 70 Male
200 │ 70 Male
144 rows omitted
#And now I'm just confused Source: https://discourse.julialang.org/t/how-to-count-the-number-of-categories-present-in-a-column-of-a-dataframe/33244/3
Recommended answer
Apart from the answer by jling, which is the simplest one, here are some alternatives.
Using groupby, you can split the rows of the data frame into groups by the grouping columns:
julia> gdf = groupby(df, :Gender)
GroupedDataFrame with 2 groups based on key: Gender
First Group (4 rows): Gender = "Male"
Row │ Age Gender
│ Int64 String
─────┼───────────────
1 │ 30 Male
2 │ 55 Male
3 │ 61 Male
4 │ 70 Male
⋮
Last Group (2 rows): Gender = "Female"
Row │ Age Gender
│ Int64 String
─────┼───────────────
1 │ 24 Female
2 │ 21 Female
julia> gdf[("Male",)]
4×2 SubDataFrame
Row │ Age Gender
│ Int64 String
─────┼───────────────
1 │ 30 Male
2 │ 55 Male
3 │ 61 Male
4 │ 70 Male
julia> gdf[("Female",)]
2×2 SubDataFrame
Row │ Age Gender
│ Int64 String
─────┼───────────────
1 │ 24 Female
2 │ 21 Female
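Since the question also asks for the number of values per gender, the grouped data frame can answer that as well. The following is a minimal sketch, assuming the six-row example from the question is built in memory instead of being read from the CSV file; the names male_ages and counts are only illustrative.

```julia
using DataFrames

# Assumption: in-memory copy of the example data from the question
df = DataFrame(Age = [30, 24, 55, 61, 70, 21],
               Gender = ["Male", "Female", "Male", "Male", "Male", "Female"])

gdf = groupby(df, :Gender)

# Ages that belong to the "Male" group (a view into df.Age)
male_ages = gdf[("Male",)].Age

# One row per group with the number of rows in that group
counts = combine(gdf, nrow => :count)
```

Here nrow => :count is the DataFrames.jl shorthand for counting the rows of each group; the resulting column name count is a free choice.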
If you want only one subset, you can also use filter or subset (they do a similar thing but with different syntax):
julia> filter(:Gender => ==("Male"), df)
4×2 DataFrame
Row │ Age Gender
│ Int64 String
─────┼───────────────
1 │ 30 Male
2 │ 55 Male
3 │ 61 Male
4 │ 70 Male
julia> subset(df, :Gender => ByRow(==("Male")))
4×2 DataFrame
Row │ Age Gender
│ Int64 String
─────┼───────────────
1 │ 30 Male
2 │ 55 Male
3 │ 61 Male
4 │ 70 Male
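The difference in syntax comes from what each function hands to the predicate: filter passes one value at a time, while subset passes the whole column, which is why ByRow (or a broadcast comparison) is needed there. A small sketch, again assuming the example data is built in memory; by_filter, by_subset, male_ages and n_male are illustrative names.

```julia
using DataFrames

# Assumption: in-memory copy of the example data from the question
df = DataFrame(Age = [30, 24, 55, 61, 70, 21],
               Gender = ["Male", "Female", "Male", "Male", "Male", "Female"])

# filter: the predicate receives a single Gender value per row
by_filter = filter(:Gender => ==("Male"), df)

# subset: the function receives the whole Gender column,
# so the comparison is broadcast with .== and returns a Bool vector
by_subset = subset(df, :Gender => (g -> g .== "Male"))

# The values asked for in the question, and how many there are
male_ages = by_filter.Age
n_male = nrow(by_filter)
```

Both calls return a new DataFrame and leave df unchanged.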
Finally, you can consider using DataFramesMeta.jl, which is probably a bit easier to understand:
julia> using DataFramesMeta
julia> @subset(df, :Gender .== "Male")
4×2 DataFrame
Row │ Age Gender
│ Int64 String
─────┼───────────────
1 │ 30 Male
2 │ 55 Male
3 │ 61 Male
4 │ 70 Male
julia> @rsubset(df, :Gender == "Male") # "r" prefix stands for "row" so you do not need to broadcast the operation
4×2 DataFrame
Row │ Age Gender
│ Int64 String
─────┼───────────────
1 │ 30 Male
2 │ 55 Male
3 │ 61 Male
4 │ 70 Male
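DataFramesMeta.jl can also produce the count per gender in one step with its @by macro. This is a sketch under the same assumption of in-memory example data; the result name res and the column name n are chosen for illustration.

```julia
using DataFrames, DataFramesMeta

# Assumption: in-memory copy of the example data from the question
df = DataFrame(Age = [30, 24, 55, 61, 70, 21],
               Gender = ["Male", "Female", "Male", "Male", "Male", "Female"])

# Group by Gender and count the Age values in each group
res = @by(df, :Gender, :n = length(:Age))

# Ages for a single gender, using the row-wise macro shown above
male_ages = @rsubset(df, :Gender == "Male").Age
```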