How do I combine duplicate rows without losing unique data in R or VBA?

Problem description

I have a table in Excel with unique columns but many duplicate rows. Duplicates are identified by the "uniqueID" column, which is an email address stored as a string. Rows can share the same uniqueID but have missing data in other columns, or different data in the same column.

I want to merge these duplicate rows so that, for each uniqueID, identical responses in a column are kept once and differing responses are concatenated into one cell, so that no data is lost. All data are strings.

I've tried the aggregate function in R and dplyr, but with no success, mostly because I'm still unsure how the two work.

Input:

uniqueID, favFruits, favVeggie, State, favColor
john@mail.com, NULL, carrots, CA, Green
jill@mail.com, apples, NULL, FL, NULL
john@mail.com, grapes, beets, CA, Red
jill@mail.com, cherries, beans, FL, Blue
jill@mail.com, pineapple, beans, FL, Blue 
john@mail.com, grapes, beets, CA, Yellow

Output:

uniqueID, favFruits, favVeggie, State, favColor
john@mail.com, grapes, (carrots, beets), CA, (Green, Red, Yellow)
jill@mail.com, (apples, cherries, pineapple), beans, FL, Blue

Note:

"NULL" here just stands for a blank Excel cell; the cell does not contain the text NULL or anything else. The full dataset has ~30 columns and ~20,000 rows. The "()" in the output marks a single cell containing several values; the cells themselves do not contain parentheses.
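Because the blanks are empty cells and not the text NULL, it may help to normalize them to NA before merging. Depending on how the sheet is imported, blank cells can arrive as NA already or as empty strings. The following is a minimal base R sketch under that assumption; the two-row sample and the helper name `blank_to_na` are made up for illustration.

```r
# Hypothetical sample: "" and "  " stand in for blank Excel cells
df <- data.frame(
  uniqueID  = c("john@mail.com", "jill@mail.com"),
  favFruits = c("", "apples"),
  favVeggie = c("carrots", "  "),
  stringsAsFactors = FALSE
)

# Replace empty or whitespace-only strings with NA.
# which() drops NA comparisons, so values that are already NA stay NA.
blank_to_na <- function(col) {
  col[which(trimws(col) == "")] <- NA
  col
}

# df[] keeps the data frame structure while replacing every column
df[] <- lapply(df, blank_to_na)
```

After this step every blank is a real NA, so functions such as `na.omit()` or `is.na()` can drop it during the merge.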

Recommended answer

I would take Dave2e's answer a step further and remove all the NULLs, like this:

library(tidyverse)

input <- tibble::tribble(
          ~uniqueID,  ~favFruits, ~favVeggie, ~State, ~favColor,
    "john@mail.com",      "NULL",  "carrots",   "CA",   "Green",
    "jill@mail.com",    "apples",     "NULL",   "FL",    "NULL",
    "john@mail.com",    "grapes",    "beets",   "CA",     "Red",
    "jill@mail.com",  "cherries",    "beans",   "FL",    "Blue",
    "jill@mail.com", "pineapple",    "beans",   "FL",    "Blue",
    "john@mail.com",    "grapes",    "beets",   "CA",  "Yellow"
    )


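output <- input %>% 
    # Turn the "NULL" placeholder into a real NA (exact match only)
    mutate(across(everything(), ~na_if(.x, "NULL"))) %>% 
    group_by(uniqueID) %>% 
    # Per uniqueID and column: drop NAs, keep distinct values, join with ", "
    summarise(across(everything(), ~toString(unique(na.omit(.x)))))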

output

# A tibble: 2 x 5
  uniqueID      favFruits                   favVeggie      State favColor          
  <chr>         <chr>                       <chr>          <chr> <chr>             
1 jill@mail.com apples, cherries, pineapple beans          FL    Blue              
2 john@mail.com grapes                      carrots, beets CA    Green, Red, Yellow
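Since the question mentions trying `aggregate`, here is a minimal base R sketch of the same idea. It is offered as an alternative, not as part of the original answer. The helper name `collapse_unique` is made up for illustration, and the sample data assumes blank cells arrive as NA or as empty strings.

```r
# Sample data from the question; NA and "" stand in for blank cells
input <- data.frame(
  uniqueID  = c("john@mail.com", "jill@mail.com", "john@mail.com",
                "jill@mail.com", "jill@mail.com", "john@mail.com"),
  favFruits = c(NA, "apples", "grapes", "cherries", "pineapple", "grapes"),
  favVeggie = c("carrots", "", "beets", "beans", "beans", "beets"),
  State     = c("CA", "FL", "CA", "FL", "FL", "CA"),
  favColor  = c("Green", NA, "Red", "Blue", "Blue", "Yellow"),
  stringsAsFactors = FALSE
)

# Drop blanks, keep each distinct value once, join the rest with ", "
collapse_unique <- function(x) {
  keep <- !is.na(x) & x != ""
  toString(unique(x[keep]))
}

# Use the data frame method of aggregate(): every column except
# uniqueID is collapsed within each uniqueID group
output <- aggregate(
  input[setdiff(names(input), "uniqueID")],
  by  = list(uniqueID = input$uniqueID),
  FUN = collapse_unique
)

print(output)
```

The result has one row per uniqueID, sorted by uniqueID. The data frame method is used here because the formula method of `aggregate` drops rows containing NA by default, which would lose data in this case.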
