In R, merge two data frames and fill down the blanks


Question

Say I have these two data frames:

big.table <- data.frame("idx" = 1:100)

small.table <- data.frame("idx" = sample(1:100, 10), "color" = sample(colors(),10))

I want to merge them together like this:

merge(small.table, big.table, by = "idx", all.y=TRUE)

    idx           color
1     1            <NA>
2     2            <NA>
3     3         salmon2
4     4            <NA>
5     5            <NA>
6     6            <NA>
...
20   20            <NA>
21   21            <NA>
22   22           blue4
23   23          grey99
24   24            <NA>
25   25            <NA>
26   26            <NA>
...

Now I need to fill the values in the 'color' column down the table so that all the NAs are set to values that come before in the table.

NOTES: The problem involves a log file generated from a computer program, not in any standard log format. Blocks of lines in this log file belong to a 'process' that is identified in the first line of the block. I've pulled out information in the relevant lines of the log file, most of which belong to a process, and created a data table containing that information (the line number, time stamp, etc.). Now I need to fill into this table the 'process' names that correspond to each line from a small.table which has a line number.

There might not be a 'process' (color in the example above) for the lines at the top of the big.table. Those lines should remain NA.

Once the first 'process' starts, every line between that process start line and the next belongs to the first process. When the second process starts, every line between that process start line and the next process start line belongs to the second process. And so on. The process lines are never the same line number as the other lines that I've collected into my log file data frame.
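The block rule described above can be sketched in base R without any extra package. This is only an illustration: `fill_down` is a hypothetical helper name, and the `color` vector is made-up data shaped like the merged column shown earlier.

```r
# Hypothetical helper (not from the original post): carry the last non-NA
# value forward and leave any leading NAs untouched.
fill_down <- function(x) {
  seen <- cumsum(!is.na(x))        # how many non-NA values have been seen so far
  c(NA, x[!is.na(x)])[seen + 1]    # slot 1 is NA, used for rows before the first value
}

# Made-up column resembling the merged 'color' column.
color <- c(NA, NA, "salmon2", NA, NA, "blue4", "grey99", NA)
filled <- fill_down(color)
print(filled)
```

Rows before the first process start keep `NA`, and every later row takes the name of the most recent start line.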

My plan is to create the big.table to be a sequence of all log line numbers and merge the small table to it. Then I can "fill down" the process name and merge the big table to the log file keeping only the log file with everything joined to it.

I am open to other approaches.
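One possible alternative, offered as a sketch rather than as the approach the answer recommends, is to skip the intermediate table of all line numbers and look up each log line's process directly with base R's `findInterval`. The data frames `process.starts` and `log.lines` below are assumed names with made-up values; they stand in for the small table and the log file data frame.

```r
# Illustrative data (assumed, not from the original post).
process.starts <- data.frame(idx = c(3, 22, 23),
                             color = c("salmon2", "blue4", "grey99"),
                             stringsAsFactors = FALSE)
log.lines <- data.frame(idx = c(1, 4, 21, 24, 100))

# findInterval() requires the start lines to be sorted in increasing order.
process.starts <- process.starts[order(process.starts$idx), ]

# For each log line, the position of the last start line at or before it
# (0 when the log line precedes every start line).
pos <- findInterval(log.lines$idx, process.starts$idx)
pos[pos == 0] <- NA   # lines before the first process stay NA

log.lines$color <- process.starts$color[pos]
print(log.lines)
```

This avoids both merges, at the cost of relying on the start lines being sorted by line number.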

Recommended answer

It sounds like you need na.locf from the package zoo (stands for last observation carried forward):

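library(zoo)
# Keep every idx from big.table; rows with no match in small.table get NA.
tbl <- merge(small.table, big.table, by = "idx", all.y = TRUE)
# Carry the last non-NA color forward; na.rm = FALSE keeps the leading NAs.
tbl$color2 <- na.locf(tbl$color, na.rm = FALSE)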
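Because the tables in the question are built with `sample()`, the result differs from run to run. A small deterministic variant, with made-up values and assuming the zoo package is installed, makes it easier to check that rows before the first color stay `NA`:

```r
library(zoo)

# Fixed, made-up tables so the result is reproducible.
big.table   <- data.frame(idx = 1:8)
small.table <- data.frame(idx = c(3, 6, 7),
                          color = c("salmon2", "blue4", "grey99"),
                          stringsAsFactors = FALSE)

# merge() sorts the result by idx by default.
tbl <- merge(small.table, big.table, by = "idx", all.y = TRUE)

# na.rm = FALSE keeps the result the same length as the input,
# so the two leading NAs (idx 1 and 2) are preserved.
tbl$color2 <- na.locf(tbl$color, na.rm = FALSE)
print(tbl)
```

Here idx 3 to 5 should end up with `salmon2`, idx 6 with `blue4`, and idx 7 and 8 with `grey99`.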
