Selecting column sequences and creating variables


Question

I was wondering if there was a way to select specific columns via a sequence and create new variables from this.

So, for example, if I had 8 columns with n observations, how could I create 4 variables that each select 2 columns sequentially? My dataset is much larger than this: I have 1416 variables with 62 observations each (I have pasted a link to the spreadsheet below, in which the first column and first row hold names). I would like to create new data frames from this, named site 1 to site 12. So site 1 = df[,1:117]; site 2 = df[,119:237], etc.

I plan to reuse this code for future datasets with even more variables, so some form of loop or sequence function would be very useful. Can anyone shed light on how to achieve this?

https://www.dropbox.com/s/p1a5cu567lxntmw/MyData.csv?dl=0

Thank you in advance.

James

P.S. @nrussell I have copied and pasted the output of the code you mentioned below; it continues as a series of numbers like those displayed.

 dput(z[ , 1:10])
 structure(list(1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0.0311410340342049,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0.0207444023791158, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0312971643732546,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0376287494579976, 0, 0, 0, 0, 0, 0,
 0),......... 10 = c(0, 0, 0, 0, 0.119280313679916,
 0, 0, 0.301029995663981, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.715681882079494,
 0.136831816210901, 0, 0, 0, 0.0273663632421801, 0, 0, 0, 0.0547327264843602,
 0, 0, 0, 0, 0.0231561535126139, 0, 0, 0.0903089986991944, 0,
 0, 0.0752574989159953, 0.159368821233872, 0.0272640716982664,
 0.0177076468037636, 0, 0, 0.120411998265592, 0, 0, 0, 0, 0.0322532138211408,
 0.0250858329719984, 0, 0, 0, 0.119280313679916, 0, 0.172922500085254,
 0.225772496747986, 0, 0, 0, 0.0954242509439325, 0)), .Names = c("1",
 "2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame", row.names = c(NA,
 -62L))

Solution

We could split the dataset ('df'), which has 1416 columns, into equal-sized blocks of 118 columns each by creating a grouping index with gl:

 lst <- setNames(lapply(split(1:ncol(df), as.numeric(gl(ncol(df), 118,
            ncol(df)))), function(i) df[,i]), paste0('site', 1:12))
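To make the grouping index concrete, here is a minimal sketch on a made-up 8-column data frame split into blocks of 2 columns. The names `toy`, `idx`, and `pieces` are hypothetical and used only for illustration; they are not part of the original answer.

```r
# Hypothetical toy data: 8 columns, 3 rows (not the asker's data)
toy <- as.data.frame(matrix(1:24, ncol = 8))

# gl(n, k, length) builds a factor with n levels, each repeated k times,
# cut off after `length` elements; as.numeric() turns it into block numbers
idx <- as.numeric(gl(ncol(toy), 2, ncol(toy)))
idx
# 1 1 2 2 3 3 4 4

# split() groups the column positions by block number
pieces <- split(seq_len(ncol(toy)), idx)
pieces[["4"]]
# 7 8
```

Each element of `pieces` holds the column positions of one block, which is what the `lapply()` call then uses to subset the data frame.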

Or you can create 'lst' without using split, by stepping through the starting column of each block:

 lst <- setNames(lapply(seq(1, ncol(df), by = 118), 
            function(i) df[i:(i+117)]), paste0('site', 1:12))
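Since the question asks for something reusable on future datasets, the same idea could be wrapped in a small function. This is only a sketch under assumptions: the function name `split_columns` and its arguments `block_size` and `prefix` are hypothetical, and it swaps `gl` for `ceiling(seq_along(df) / block_size)` so that a final, shorter block is kept when the column count is not a multiple of the block size.

```r
# Hypothetical helper: split a data frame into consecutive column blocks
split_columns <- function(df, block_size, prefix = "site") {
  # Block number for every column position: 1, 1, ..., 2, 2, ...
  idx <- ceiling(seq_along(df) / block_size)
  # drop = FALSE keeps single-column blocks as data frames
  blocks <- lapply(split(seq_along(df), idx),
                   function(i) df[, i, drop = FALSE])
  setNames(blocks, paste0(prefix, seq_along(blocks)))
}

# Toy example: 10 columns in blocks of 4 gives blocks of 4, 4 and 2 columns
toy <- as.data.frame(matrix(1:30, ncol = 10))
res <- split_columns(toy, block_size = 4)
sapply(res, ncol)
```

For the dataset in the question, a call such as `split_columns(df, 118)` would be expected to return 12 blocks named site1 to site12, assuming `df` really has 1416 columns.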

If we need to create 12 dataset objects in the global environment, list2env is an option (I would prefer to work within the 'lst' itself)

 list2env(lst, envir=.GlobalEnv)
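As a rough illustration of what working within the list can look like, the sketch below builds a small named list the same way and then operates on every block at once. The objects `toy`, `lst_toy`, and `site_means` are made up for this example.

```r
# Hypothetical toy data: 8 columns split into 4 blocks of 2 columns
toy <- as.data.frame(matrix(1:24, ncol = 8))
lst_toy <- setNames(
  lapply(seq(1, ncol(toy), by = 2), function(i) toy[i:(i + 1)]),
  paste0("site", 1:4)
)

# Apply the same summary to every block without creating global objects
site_means <- lapply(lst_toy, colMeans)

# Pull out a single block by name when needed
lst_toy[["site2"]]
```

Keeping the blocks in one list means a later step (a summary, a model, a plot) can be applied to all sites with a single `lapply()` call rather than repeated 12 times.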

Using a small dataset ('df1') with '8' columns

  lst1 <- setNames(lapply(split(1:ncol(df1), as.numeric(gl(ncol(df1), 
         2, ncol(df1)))), function(i) df1[,i]), paste0('site', 1:4))
  list2env(lst1, envir=.GlobalEnv)

  head(site1,3)
  #  V1 V2
  #1  6 12
  #2  4  7
  #3 14 14

 head(site4,3)
 #  V7 V8
 #1 10  2
 #2  5  4
 #3  5  0
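One way to sanity-check a split like this is to confirm that the blocks, taken in order, cover every column exactly once. The sketch below does that on a deterministic stand-in for 'df1'; the names `toy` and `lst_toy` are hypothetical.

```r
# Hypothetical stand-in for df1: 8 columns, 10 rows, no random sampling
toy <- as.data.frame(matrix(seq_len(80), ncol = 8))

lst_toy <- setNames(
  lapply(split(seq_len(ncol(toy)), as.numeric(gl(ncol(toy), 2, ncol(toy)))),
         function(i) toy[, i]),
  paste0("site", 1:4)
)

# Column names of all blocks, concatenated in order, should match the original
all_names <- unlist(lapply(lst_toy, names), use.names = FALSE)
identical(all_names, names(toy))

# Every block should have the expected width
all(sapply(lst_toy, ncol) == 2)
```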

Data

set.seed(24)
df1 <- as.data.frame(matrix(sample(0:20, 8*10, replace=TRUE), ncol=8))
