Wide to long data table transformation with variables in columns and rows
Problem description
I have a CSV containing multiple tables, with variables stored in both rows and columns.
About this CSV:
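- I want to go from wide to long
- There are multiple data frames in one CSV
- Each data frame has its own set of variables (x, y, z for nyc; a, b, c for la; x, b, z for sf)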
> df3
V1 V2 V3 V4 V5 V6 V7 V8
1 nyc 123 main st month 1 2 3 4 5
2 nyc 123 main st x 58568 567567 567909 35876 56943
3 nyc 123 main st y 5345 3673 3453 3467 788
4 nyc 123 main st z 53223 563894 564456 32409 56155
5
6 la 63 main st month 1 2 3 4 5
7 la 63 main st a 87035 7467456 3363 863 43673
8 la 63 main st b 345 456 345 678 345
9 la 63 main st c 86690 7467000 3018 185 43328
10
11 sf 953 main st month 1 2 3 4 5
12 sf 953 main st x 457456 3455 345345 56457 3634
13 sf 953 main st b 5345 3673 3453 3467 788
14 sf 953 main st z 452111 -218 341892 52990 2846
> df4
city address month x y z a b c
nyc 123 main st 1 58568 5345 53223 null null null
nyc 123 main st 2 567567 3673 563894 null null null
nyc 123 main st 3 567909 3453 564456 null null null
nyc 123 main st 4 35876 3467 32409 null null null
nyc 123 main st 5 56943 788 56155 null null null
la 63 main st 1 null null null 87035 345 86690
la 63 main st 2 null null null 7467456 456 7467000
la 63 main st 3 null null null 3363 345 3018
la 63 main st 4 null null null 863 678 185
la 63 main st 5 null null null 43673 345 43328
sf 953 main st 1 457456 null 452111 null 5345 null
sf 953 main st 2 3455 null -218 null 3673 null
sf 953 main st 3 345345 null 341892 null 3453 null
sf 953 main st 4 56457 null 52990 null 3467 null
sf 953 main st 5 3634 null 2846 null 788 null
The top (df3) is the data I have; the bottom (df4) is the transformation I want.
I'm most comfortable in R, but I'm practicing Python, so any approach works.
Recommended answer
It helps to give your data frame proper column names first; assign them right after you read in the data.
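A minimal sketch of that step, assuming the file has no header row and follows the df3 layout. Here `text =` stands in for a real file path, and the column names are the ones used in this answer. The `month` header row is dropped so the result matches the `df` used below.

```r
# Assumed raw layout (no header row), copied from the nyc block of df3;
# replace text = csv_text with the path of the real file
csv_text <- "nyc,123 main st,month,1,2,3,4,5
nyc,123 main st,x,58568,567567,567909,35876,56943
nyc,123 main st,y,5345,3673,3453,3467,788
nyc,123 main st,z,53223,563894,564456,32409,56155"

df <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

# Name the label columns; keep X1..X5 for the value columns
names(df) <- c("city", "address", "month", paste0("X", 1:5))

# Keep only the measurement rows (x, y, z), as in the df below
df <- df[df$month != "month", ]
df
```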
I used the dplyr, tidyr (for gather() and spread()) and stringr libraries for this analysis, and renamed the first three columns:
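library(dplyr)    # %>%, mutate(), select()
library(tidyr)    # gather(), spread()
library(stringr)  # str_sub()

# The nyc block only, without its "month" header row;
# the third column ("month") holds the variable names x, y, z
df <- data.frame(stringsAsFactors = FALSE,
  city = c("nyc", "nyc", "nyc"),
  address = c("123 main st", "123 main st", "123 main st"),
  month = c("x", "y", "z"),
  X1 = c(58568L, 5345L, 53223L),
  X2 = c(567567L, 3673L, 563894L),
  X3 = c(567909L, 3453L, 564456L),
  X4 = c(35876L, 3467L, 32409L),
  X5 = c(56943L, 788L, 56155L)
)

df %>%
  gather(Type, Value, -c(city:month)) %>%           # X1..X5 columns -> Type/Value rows
  spread(month, Value) %>%                          # x, y, z become columns
  mutate(month = as.integer(str_sub(Type, 2))) %>%  # "X1" -> 1; also works for "X10", "X12"
  select(city, address, month, x:z)                 # drop Type and order the columns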
city address month x y z
1 nyc 123 main st 1 58568 5345 53223
2 nyc 123 main st 2 567567 3673 563894
3 nyc 123 main st 3 567909 3453 564456
4 nyc 123 main st 4 35876 3467 32409
5 nyc 123 main st 5 56943 788 56155
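The pipeline above reshapes only the nyc block. The following base-R sketch is a swapped-in approach, not the answerer's code. It applies the same transpose to every stacked block and pads the variables a block lacks with NA to reach the df4 layout. It assumes the raw CSV looks like df3: no header, blocks separated by empty rows, and each block starting with a `month` row. `csv_text`, `to_long()` and `pad()` are hypothetical names, and `text =` stands in for a file path.

```r
# Assumed raw file, copied from df3; replace text = csv_text with a real path
csv_text <- "nyc,123 main st,month,1,2,3,4,5
nyc,123 main st,x,58568,567567,567909,35876,56943
nyc,123 main st,y,5345,3673,3453,3467,788
nyc,123 main st,z,53223,563894,564456,32409,56155
,,,,,,,
la,63 main st,month,1,2,3,4,5
la,63 main st,a,87035,7467456,3363,863,43673
la,63 main st,b,345,456,345,678,345
la,63 main st,c,86690,7467000,3018,185,43328
,,,,,,,
sf,953 main st,month,1,2,3,4,5
sf,953 main st,x,457456,3455,345345,56457,3634
sf,953 main st,b,5345,3673,3453,3467,788
sf,953 main st,z,452111,-218,341892,52990,2846"

raw <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)
value_cols <- paste0("X", 1:5)
names(raw) <- c("city", "address", "variable", value_cols)

# Drop the empty separator rows, then number the blocks: each "month" row starts a new one
raw <- raw[!is.na(raw$city) & raw$city != "", ]
blocks <- split(raw, cumsum(raw$variable == "month"))

# One block -> long format: one row per month, one column per variable
to_long <- function(block) {
  months <- as.integer(unlist(block[block$variable == "month", value_cols]))
  vals <- block[block$variable != "month", ]
  m <- t(as.matrix(vals[, value_cols]))  # transpose so months become rows
  colnames(m) <- vals$variable
  rownames(m) <- NULL
  data.frame(city = block$city[1], address = block$address[1],
             month = months, m, stringsAsFactors = FALSE)
}
long_blocks <- lapply(blocks, to_long)

# Blocks carry different variables, so add the missing ones as NA before stacking
all_cols <- unique(unlist(lapply(long_blocks, names)))
pad <- function(d) {
  for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
  d[all_cols]
}
result <- do.call(rbind, lapply(long_blocks, pad))
rownames(result) <- NULL
result
```

Missing values come out as NA rather than the null shown in df4. Padding by column name is what lets blocks with different variables (x/y/z, a/b/c, x/b/z) stack into one table.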