Column with multiple values in a data.frame


Problem description

I would like to create a data.frame in R in which some columns hold multiple values per row (the same number of values in every row). For example, here is a data frame with two columns (cars and price); note that the price column has three values in each row.

cars price
F    1000, 2000, 3000
GM   2000, 500, 1000

The second question:

Now I want to apply the same function to each value in the price column. How can I do that? Say I want to create another column holding the doubled values of the price column.

Solution

A data.frame is simply a list of columns, so a column can itself be a list holding one vector per row.

cars <- c("FORD", "GM")
price  <- list( c(1000, 2000, 3000),  c(2000, 500, 1000))
myDF <- data.frame(cars=cars, price=cbind(price))

myDF
#    cars            price
#  1 FORD 1000, 2000, 3000
#  2   GM  2000, 500, 1000


Then, to apply a function to the price values, either across all rows at once or row by row:

# execute on ALL PRICES at once
mean(unlist(myDF$price))
#  [1] 1583.333

# execute on each set of PRICES per row: 
lapply(myDF$price, mean)
#  [[1]]
#  [1] 2000 
#    
#  [[2]]
#  [1] 1166.667
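
For the second question (a new column with doubled prices), the same lapply pattern can build another list column. The following is only a minimal sketch: it assumes the list column is created with I() instead of cbind(), and the column name price_doubled is made up for illustration.

```r
cars  <- c("FORD", "GM")
price <- list(c(1000, 2000, 3000), c(2000, 500, 1000))

# I() asks data.frame() to keep the list as one list column
myDF <- data.frame(cars = cars, price = I(price))

# apply the same function to every value, row by row;
# the result is a list with one numeric vector per row
myDF$price_doubled <- lapply(myDF$price, function(p) p * 2)

myDF$price_doubled[[1]]
#  [1] 2000 4000 6000
```

Any function that takes a numeric vector can replace the anonymous function here.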


That being said, I would recommend against this approach.

It gets cumbersome, and there are usually better ways to accomplish the same goal.

One alternative is to simply use the price list as your dataset and name the elements according to the cars column:

names(price) <- cars
price
#  $FORD
#  [1] 1000 2000 3000
#    
#  $GM
#  [1] 2000  500 1000

In this case, the results of your *apply calls already carry the car names, and it is slightly less typing:

lapply(price, mean)
#  $FORD
#  [1] 2000
#  
#  $GM
#  [1] 1166.667
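
As a possible variation on the named list, sapply can simplify the per-car results into a named numeric vector, and lapply can double every value while keeping the names. This is a sketch only; the variable names avg and doubled are illustrative and not part of the original answer.

```r
cars  <- c("FORD", "GM")
price <- list(c(1000, 2000, 3000), c(2000, 500, 1000))
names(price) <- cars

# sapply() simplifies the result to a numeric vector named by car
avg <- sapply(price, mean)
names(avg)
#  [1] "FORD" "GM"

# doubling every value keeps the car names on the resulting list
doubled <- lapply(price, function(p) p * 2)
doubled$FORD
#  [1] 2000 4000 6000
```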


Another alternative is to use a long data.frame or data.table:

# transforming to long: 
# lengths() returns the number of prices per car as an integer vector
myDF <- data.frame(cars=rep(cars, times=lengths(price)), price=unlist(price, use.names=FALSE))
myDF
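
In the long layout each row holds a single price, so the doubled column from the second question reduces to ordinary vectorised arithmetic. A small sketch, using the hypothetical names long and price_doubled so it does not overwrite myDF:

```r
cars  <- c("FORD", "GM")
price <- list(c(1000, 2000, 3000), c(2000, 500, 1000))

# one row per (car, price) pair
long <- data.frame(cars  = rep(cars, times = lengths(price)),
                   price = unlist(price, use.names = FALSE))

# no lapply needed: the column is a plain numeric vector
long$price_doubled <- long$price * 2

nrow(long)
#  [1] 6
```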

Then you can use the by function to execute a function across all prices in a group:

by(data=myDF$price, INDICES=myDF$cars, FUN=mean)

# or using with:
with(myDF, by(price, cars, mean))

Here is the same approach using data.table, which has by built in:

library(data.table)
myDT <- data.table(myDF, key="cars")
myDT[, mean(price), by=cars]

#     cars       V1
#  1: FORD 2000.000
#  2:   GM 1166.667
