tidyr::pivot_wider replaces values with data types
Problem description
I have a data frame in which both the rows and the columns contain variables, so I am trying to tidy the data with pivot_wider. My data looks like the following:
head(df)
# A tibble: 6 x 4
State Year Var X
<chr> <dbl> <chr> <dbl>
1 ALABAMA 2001 APPALACHIAN REGIONAL COMMISSION (ARC) 3048031
2 ALABAMA 2001 CORPORATION FOR NATIONAL AND COMMUNITY SERVICE (CNCS) 1765835
3 ALABAMA 2001 DEPARTMENT OF AGRICULTURE (USDA) 282530429
4 ALABAMA 2001 DEPARTMENT OF COMMERCE (DOC) 17838084
5 ALABAMA 2001 DEPARTMENT OF DEFENSE (DOD) 21160159
6 ALABAMA 2001 DEPARTMENT OF EDUCATION (ED) 174634348
Here, State is the entity, Year is the time dimension, Var holds the names of the variables I am trying to pivot, and X holds the value of each variable. When I use the following code:
library(tidyverse)
library(magrittr)  # %<>% (the assignment pipe) comes from magrittr, not the core tidyverse
df %<>%
  pivot_wider(names_from = Var, values_from = X)
R returns the following warning message:
Warning message:
Values in `X` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(X = list)` to suppress this warning.
* Use `values_fn = list(X = length)` to identify where the duplicates arise
* Use `values_fn = list(X = summary_fun)` to summarise duplicates
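Following the warning's hint, it can help to confirm which State/Year/Var combinations occur more than once before pivoting. A minimal sketch, assuming dplyr and tibble are installed; the small tibble `toy` is made-up data in the same shape as `df`, not the actual data:

```r
library(dplyr)
library(tibble)

# Made-up data in the same shape as df: DOD appears twice for Alabama 2001
toy <- tibble(
  State = c("ALABAMA", "ALABAMA", "ALABAMA", "ALASKA"),
  Year  = c(2001, 2001, 2001, 2001),
  Var   = c("DEPARTMENT OF DEFENSE (DOD)", "DEPARTMENT OF DEFENSE (DOD)",
            "DEPARTMENT OF EDUCATION (ED)", "DEPARTMENT OF DEFENSE (DOD)"),
  X     = c(100, 50, 200, 300)
)

# Count rows per State/Year/Var; any n > 1 is a duplicate that forces list-cols
dups <- toy %>%
  count(State, Year, Var) %>%
  filter(n > 1)

print(dups)
```

Running the same `count()` / `filter()` pipeline on `df` instead of `toy` would list the combinations responsible for the list-cols.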
and instead of the original values, every cell in my data now shows a list, as shown below.
head(df)
# A tibble: 6 x 35
State Year `APPALACHIAN RE~ `CORPORATION FO~ `DEPARTMENT OF ~ `DEPARTMENT OF ~ `DEPARTMENT OF ~ `DEPARTMENT OF ~ `DEPARTMENT OF ~ `DEPARTMENT OF ~
<chr> <dbl> <list<dbl>> <list<dbl>> <list<dbl>> <list<dbl>> <list<dbl>> <list<dbl>> <list<dbl>> <list<dbl>>
1 ALAB~ 2001 [1] [1] [1] [1] [1] [1] [1] [1]
2 ALAS~ 2001 [0] [1] [1] [1] [1] [1] [1] [1]
3 ARIZ~ 2001 [0] [1] [1] [1] [1] [1] [1] [1]
4 ARKA~ 2001 [0] [1] [1] [1] [1] [1] [1] [1]
5 CALI~ 2001 [0] [1] [1] [1] [1] [1] [1] [1]
6 COLO~ 2001 [0] [1] [1] [1] [1] [1] [1] [1]
# ... with 25 more variables: `DEPARTMENT OF HOUSING AND URBAN DEVELOPMENT (HUD)` <list<dbl>>, `DEPARTMENT OF JUSTICE (DOJ)` <list<dbl>>, `DEPARTMENT OF
# LABOR (DOL)` <list<dbl>>, `DEPARTMENT OF THE INTERIOR (DOI)` <list<dbl>>, `DEPARTMENT OF TRANSPORTATION (DOT)` <list<dbl>>, `ENVIRONMENTAL PROTECTION
# AGENCY (EPA)` <list<dbl>>, `FEDERAL EMERGENCY MANAGEMENT AGENCY (FEMA)` <list<dbl>>, `INSTITUTE OF MUSEUM AND LIBRARY SERVICES (IMLS)` <list<dbl>>,
# `NATIONAL AERONAUTICS AND SPACE ADMINISTRATION (NASA)` <list<dbl>>, `NATIONAL ENDOWMENT FOR THE ARTS (NEA)` <list<dbl>>, `NATIONAL ENDOWMENT FOR THE
# HUMANITIES (NEH)` <list<dbl>>, `NATIONAL SCIENCE FOUNDATION (NSF)` <list<dbl>>, `SMALL BUSINESS ADMINISTRATION (SBA)` <list<dbl>>, `FEDERAL MEDIATION
# AND CONCILIATION SERVICE (FMCS)` <list<dbl>>, `NATIONAL ARCHIVES AND RECORDS ADMINISTRATION (NARA)` <list<dbl>>, `AGENCY FOR INTERNATIONAL DEVELOPMENT
# (USAID)` <list<dbl>>, `JAPAN-UNITED STATES FRIENDSHIP COMMISSION (JUSFC)` <list<dbl>>, `UNITED STATES INSTITUTE OF PEACE (USIP)` <list<dbl>>, `CORPS OF
# ENGINEERS - CIVIL WORKS (USACE)` <list<dbl>>, `DEPARTMENT OF STATE (DOS)` <list<dbl>>, `NATIONAL LABOR RELATIONS BOARD (NLRB)` <list<dbl>>, `NUCLEAR
# REGULATORY COMMISSION (NRC)` <list<dbl>>, `SOCIAL SECURITY ADMINISTRATION (SSA)` <list<dbl>>, `SELECTIVE SERVICE SYSTEM (SSS)` <list<dbl>>,
# `NA` <list<dbl>>
I am wondering why the original values are being erased by the pivot, and what I can do to stop this from happening.
Recommended answer
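The values are not actually erased. Because at least one State/Year/Var combination appears more than once, pivot_wider cannot place a single number in each cell, so it stores each cell's values in a list-column; the [1] and [0] in the printout are the number of values each cell holds.
We may therefore need a sequence column to tell the duplicates apart. Group by 'State', 'Year', and 'Var', create a sequence column with row_number(), and then apply pivot_wider: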
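library(dplyr)
library(tidyr)

df %>%
  # number repeated State/Year/Var combinations 1, 2, 3, ...
  group_by(State, Year, Var) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  # rn becomes part of the row identifier, so each cell receives a single value
  pivot_wider(names_from = Var, values_from = X)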
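To see the effect on a small scale, the sketch below runs the sequence-column approach on made-up data (the tibble `toy` is hypothetical, not the asker's data) and, for comparison, the `values_fn` option suggested in the warning, which collapses duplicates into one number instead of spreading them over extra rows. Using `sum` to combine duplicates is an assumption; whether adding them is meaningful depends on the data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up data: DOD appears twice for Alabama 2001
toy <- tibble(
  State = c("ALABAMA", "ALABAMA", "ALABAMA", "ALASKA"),
  Year  = c(2001, 2001, 2001, 2001),
  Var   = c("DOD", "DOD", "ED", "DOD"),
  X     = c(100, 50, 200, 300)
)

# Option 1: keep every duplicate in its own row, distinguished by rn
wide_rn <- toy %>%
  group_by(State, Year, Var) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = Var, values_from = X)

# Option 2: collapse duplicates into a single value per cell (here, their sum)
wide_sum <- toy %>%
  pivot_wider(names_from = Var, values_from = X, values_fn = list(X = sum))

print(wide_rn)
print(wide_sum)
```

Both results have ordinary numeric columns instead of list-cols. The `rn` version keeps every original number but gives Alabama 2001 two rows; the `values_fn` version keeps one row per State and Year at the cost of merging the duplicates.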