In R, merge two data frames and fill down the blanks

Problem description

Say I have these two data frames:

big.table <- data.frame("idx" = 1:100)

small.table <- data.frame("idx" = sample(1:100, 10), "color" = sample(colors(),10))

I want to merge them together like this:

merge(small.table, big.table, by = "idx", all.y=TRUE)

   idx   color
1    1    <NA>
2    2    <NA>
3    3 salmon2
4    4    <NA>
5    5    <NA>
6    6    <NA>
...
20  20    <NA>
21  21    <NA>
22  22   blue4
23  23  grey99
24  24    <NA>
25  25    <NA>
26  26    <NA>
...

Now I need to fill the values in the 'color' column down the table so that all the NAs are set to values that come before in the table.

NOTES: The problem involves a log file generated from a computer program, not in any standard log format. Blocks of lines in this log file belong to a 'process' that is identified in the first line of the block. I've pulled out information in the relevant lines of the log file, most of which belong to a process, and created a data table containing that information (the line number, time stamp, etc.). Now I need to fill into this table the 'process' names that correspond to each line from a small.table which has a line number.

There might not be a 'process' (color in the example above) for the lines at the top of the big.table. Those lines should remain NA.

Once the first 'process' starts, every line between that process start line and the next belongs to the first process. When the second process starts, every line between that process start line and the next process start line belongs to the second process. And so on. The process lines are never the same line number as the other lines that I've collected into my log file data frame.
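
The block rule described above can also be written down directly, without building a full sequence of line numbers. The following is only a sketch under assumptions: the `starts` data frame, the `lines` vector, and the sample values are hypothetical stand-ins for the real log data. It uses base R's `findInterval` to find, for each log line, the most recent process start at or before it.

```r
# Hypothetical data: the line number where each process block starts.
starts <- data.frame(
  idx     = c(3, 22, 23),
  process = c("salmon2", "blue4", "grey99"),
  stringsAsFactors = FALSE
)
# findInterval() requires the breakpoints to be sorted in increasing order.
starts <- starts[order(starts$idx), ]

# Hypothetical line numbers of the other log lines of interest.
lines <- c(1, 2, 4, 20, 21, 24, 30)

# For each line, the position of the last start at or before it;
# 0 means the line comes before any process has started.
block <- findInterval(lines, starts$idx)
block[block == 0] <- NA          # lines before the first process stay NA

process <- starts$process[block]
print(data.frame(idx = lines, process = process))
```

Lines 1 and 2 come before the first start and stay `NA`. Every later line takes the name of the block it falls in.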

My plan is to create the big.table to be a sequence of all log line numbers and merge the small table to it. Then I can "fill down" the process name and merge the big table to the log file keeping only the log file with everything joined to it.

I am open to other approaches.

Recommended answer

It sounds like you need na.locf from the package zoo (stands for last observation carried forward):

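library(zoo)
# Keep every row of big.table; rows with no match in small.table get NA in color
tbl <- merge(small.table, big.table, by = "idx", all.y = TRUE)
# Carry the last non-NA color forward; na.rm = FALSE keeps the leading NAs
tbl$color2 <- na.locf(tbl$color, na.rm = FALSE)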
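
To see what "last observation carried forward" does to the merged column, here is a minimal base R sketch that needs no extra packages. It is an illustration under assumptions, not the zoo implementation: `fill_down` is a hypothetical helper name, and the small table is made-up data shaped like the merge result above.

```r
# fill_down is a hypothetical helper, not part of zoo or base R.
# It replaces each NA with the most recent non-NA value above it and
# leaves any NAs before the first observed value untouched.
fill_down <- function(x) {
  seen <- !is.na(x)
  # cumsum(seen) is 0 until the first non-NA value, then counts observed values
  idx <- cumsum(seen)
  idx[idx == 0] <- NA            # leading blanks have nothing to carry forward
  x[seen][idx]
}

# Made-up data shaped like the merged table in the question.
tbl <- data.frame(
  idx   = 1:8,
  color = c(NA, NA, "salmon2", NA, NA, "blue4", "grey99", NA),
  stringsAsFactors = FALSE
)
tbl$color2 <- fill_down(tbl$color)
print(tbl)
```

The first two rows stay `NA`, which is the behavior that `na.rm = FALSE` asks for in `na.locf`.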
