Insert "empty" rows (filling up) in data frame in R

Question

Problem solved; the solution has been added at the bottom of the post!

I'd like to know how to "fill" a data frame by inserting rows in between existing rows (not appending to the end).

My situation is as follows:

  • I have a data set with about 1700 cases and 650 variables
  • Certain variables have possible answer categories from 0 to 100 (the question was "How many percent ...?", so people could enter any value from 0 to 100)
  • Now I want to show the distribution of one of those variables (let's call it var) with geom_area().

Problems:

1) I need an X-axis ranging from 0 to 100

2) Not all possible percentage values in var were chosen. For instance, the answer "20%" was given 30 times, but "19%" was never given. On the x-axis this means the y-value at x-position 19 is 0 and the y-value at x-position 20 is 30.
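To make point 2 concrete, here is a tiny sketch with hypothetical toy data (30 answers of "20%", one of "21%", none of "19%"). `table()` only reports values that actually occur, so 19 gets no row at all rather than a row with count 0:

```r
# Hypothetical toy answers: 30 times 20%, once 21%, never 19%
var <- c(rep(20, 30), 21)

counts <- as.data.frame(table(var))
print(counts)

# 19 is simply absent: table() has no row for values that never occur
"19" %in% counts$var
```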

To prepare my data (this one variable) for plotting with ggplot, I transformed it with the table function:

dummy <- as.data.frame(table(var))

Now I have a column "Var1" with the answer categories and a column "Freq" with the count of each answer category.
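A small sketch of what this produces, using a hypothetical data frame `answers` with a numeric column `var` (both names are assumptions for illustration). Because `table()` is called on `answers$var` rather than on a bare variable, the columns come out as "Var1" and "Freq", and "Var1" is a factor, so it has to go through `as.character()` before `as.numeric()`:

```r
# Hypothetical survey data frame with one percentage column
answers <- data.frame(var = c(0, 1, 1, 15, 15, 17))

dummy <- as.data.frame(table(answers$var))
str(dummy)  # Var1 is a factor, Freq holds the counts

# Convert the factor labels back to the numbers they represent;
# as.numeric(dummy$Var1) alone would return the level codes 1, 2, 3, ...
dummy$Var1 <- as.numeric(as.character(dummy$Var1))
dummy
```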

In total I have 57 rows, which means that 44 of the 101 possible answers (values from 0 to 100 percent) were never given.
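A sketch for listing exactly which values are missing, using `setdiff()` against the full range 0..100. It runs on a hypothetical five-row table, so there 101 - 5 = 96 values are missing (with the real 57-row table it would be 101 - 57 = 44):

```r
# Hypothetical frequency table in the same shape as the example below
dummy <- data.frame(Var1 = factor(c(0, 1, 2, 15, 17)),
                    Freq = c(1, 16, 32, 169, 2))

observed <- as.numeric(as.character(dummy$Var1))

# Every percentage from 0 to 100 that never appears as an answer
missing_values <- setdiff(0:100, observed)
length(missing_values)
head(missing_values)
```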

Example of my data frame; "Var1" contains the given answers and "Freq" the counts:

     Var1 Freq
1     0    1
2     1   16
3     2   32
4     3   44
5     4   14
...
15   14    1
16   15  169 # <-- See next row and look at "Var1"
17   17    2 # <-- "16%" was never given as answer

Now my question is: how can I create a new data frame that inserts a row after row 16 (where "Var1" = 15), in which I can set "Var1" to 16 and "Freq" to 0?

     Var1 Freq
...
15   14    1
16   15  169
17   16    0 # <-- This line I like to insert
18   17    2
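Two base-R ways to get the filled table directly, sketched on hypothetical toy answers. Both are swapped-in techniques, not the approach of the answer below: (1) give `table()` a factor whose levels are all values 0..100, so unchosen answers get count 0; (2) merge the incomplete table onto a complete grid of answers and replace the resulting `NA`s with 0. The names `full`, `all_answers` and `filled` are illustrative:

```r
# Hypothetical raw answers (16 was never chosen)
var <- c(0, 1, 1, 15, 15, 15, 17, 17)

# Option 1: declare every possible answer as a factor level,
# so table() also reports the categories nobody chose (count 0)
full <- as.data.frame(table(factor(var, levels = 0:100)))
full$Var1 <- as.numeric(as.character(full$Var1))
nrow(full)     # one row per percentage 0..100
full[15:19, ]  # the row with Var1 = 16 now exists, with Freq 0

# Option 2: start from the incomplete table and merge it onto
# a complete grid of answers, then turn the NAs into 0
dummy <- as.data.frame(table(var))
dummy$var <- as.numeric(as.character(dummy$var))
all_answers <- data.frame(var = 0:100)
filled <- merge(all_answers, dummy, by = "var", all.x = TRUE)
filled$Freq[is.na(filled$Freq)] <- 0
```

Option 1 is the shorter route if you still have the raw answers; option 2 is useful when only the frequency table is at hand.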

I've already tried something like this:

dummy_x <- NULL
dummy_y <- NULL

for (k in 0:100) {
  pos <- which(dummy$Var1==k)
  if (!is.null(pos)) {
    dummy_x <- rbind(dummy_x, c(k))
    dummy_y <- rbind(dummy_y, dummy$Freq[pos])
  }
  else {
    dummy_x <- rbind(dummy_x, c(k))
    dummy_y <- rbind(dummy_y, 0)
  }
}

newdataframe <- data.frame(cbind(dummy_x), cbind(dummy_y))

which fails with an error because dummy_x has 101 values (0 to 100, correct), but dummy_y only contains 56 rows. Why?
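A likely reason the loop misbehaves: when nothing matches, `which()` returns `integer(0)`, not `NULL`. So `!is.null(pos)` is always `TRUE`, the `else` branch never runs, and `dummy$Freq[pos]` contributes nothing for unanswered values. A minimal corrected sketch, on a hypothetical three-row table, tests `length(pos) > 0` instead:

```r
# Hypothetical incomplete frequency table (Var1 is a factor, as from table())
dummy <- data.frame(Var1 = factor(c(0, 15, 17)), Freq = c(1, 169, 2))

pos16 <- which(dummy$Var1 == 16)
pos16             # integer(0): no match, but not NULL
is.null(pos16)    # FALSE, so `if (!is.null(pos))` always takes the first branch
dummy$Freq[pos16] # numeric(0): nothing gets appended for k = 16

# Checking the length sends unmatched values to the zero branch
dummy_x <- NULL
dummy_y <- NULL
for (k in 0:100) {
  pos <- which(dummy$Var1 == k)
  dummy_x <- c(dummy_x, k)
  dummy_y <- c(dummy_y, if (length(pos) > 0) dummy$Freq[pos] else 0)
}
newdataframe <- data.frame(Var1 = dummy_x, Freq = dummy_y)
```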

The result should be plotted like this:

plot(ggplot(newdataframe, aes(x=Var1, y=Freq)) +
   geom_area(fill=barcolors, alpha=0.3) +
   geom_line() +
   labs(title=fragetitel, x=NULL, y=NULL))

Thanks in advance, Daniel

Solution:

library(ggplot2)

plotFreq <- function(var, ftitle=NULL, fcolor="blue") {
  # create data frame from frequency table of var
  # to get answer categories and counts in separate columns
  dummyf <- as.data.frame(table(var))
  # rename to "x-axis" and "y-axis"
  names(dummyf) <- c("xa", "ya")
  # transform $xa from factor to numeric
  dummyf$xa <- as.numeric(as.character(dummyf$xa))
  # get maximum x-value for graph
  maxval <- max(dummyf$xa)
  # create a vector of zeros, one slot per value 0..maxval
  frq <- rep(0, maxval + 1)
  # R vectors are 1-indexed, so the count for answer xa goes to position xa + 1
  # (frq[0] would be silently dropped); all remaining positions stay 0,
  # which are exactly the rows we wanted to insert
  frq[dummyf$xa + 1] <- dummyf$ya
  # create new data frame
  newdf <- data.frame(var = 0:maxval, frq = frq)
  # print plot
  ggplot(newdf, aes(x=var, y=frq)) +
    # fill area
    geom_area(fill=fcolor, alpha=0.3) +
    # outline
    geom_line() +
    # no additional labels on x- and y-axis
    labs(title=ftitle, x=NULL, y=NULL)
}
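Index 0 deserves care here: R vectors are 1-indexed and a zero subscript is silently dropped during assignment. If the answer categories include 0 (the example table above starts at Var1 = 0), assigning the counts with the categories used directly as positions misplaces them. A small sketch with hypothetical values `xa` and `ya` shows the effect and the shift-by-one fix:

```r
xa <- c(0, 2, 3)  # hypothetical answer categories, starting at 0
ya <- c(5, 7, 9)  # hypothetical counts for those categories

# Using the categories directly as positions
frq <- rep(0, max(xa))
frq[xa] <- ya  # zero index dropped; R warns about the length mismatch
frq            # 0 5 7: every count ends up one category too high

# Shifting by one: position i holds the count for answer (i - 1)
frq0 <- rep(0, max(xa) + 1)
frq0[xa + 1] <- ya
data.frame(var = 0:max(xa), frq = frq0)
```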


Recommended answer

I think this is a much simpler solution; no loop is necessary. The idea is to create a vector the size of the desired result with all values set to zero, and then replace the appropriate entries with the non-zero counts from the frequency table.

> #Let's create sample data
> set.seed(12345)
> var <- sample(100, replace=TRUE)
> 
> 
> #Lets create frequency table
> x <- as.data.frame(table(var))
> x$var <- as.numeric(as.character(x$var))
> head(x)
  var Freq
1   1    3
2   2    1
3   4    1
4   5    2
5   6    1
6   7    2
> #Create a vector of 0s 
> freq <- rep(0, 100)
> #Replace the values in freq for those indices which equal x$var  by x$Freq so that remaining 
> #indices are ones which you intended to insert 
> freq[x$var] <- x$Freq
> head(freq)
[1] 3 1 0 1 2 1
> #cbind data together 
> freqdf <- as.data.frame(cbind(var = 1:100, freq))
> head(freqdf)
  var freq
1   1    3
2   2    1
3   3    0
4   4    1
5   5    2
6   6    1
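The same zero-vector idea adapted to answers from 0 to 100, sketched with base R's `tabulate()` (a swapped-in function, not what the answer uses) on hypothetical toy data. `tabulate()` counts the integers 1..nbins, so the answers are shifted by one; the `table()` result is used only as a cross-check:

```r
# Hypothetical answers in the range 0..100
var <- c(0, 0, 3, 20, 20, 20, 100)

# Count answers 0..100 as bins 1..101; absent answers get 0
freq <- tabulate(var + 1, nbins = 101)
freqdf <- data.frame(var = 0:100, freq = freq)

# Cross-check against the table()-based counts
x <- as.data.frame(table(var))
x$var <- as.numeric(as.character(x$var))
stopifnot(all(freqdf$freq[x$var + 1] == x$Freq))
```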
