如何计算列中的值并将它们与特定行匹配? [英] How do I count values in a column and match them with a specific row?

查看:53
本文介绍了如何计算列中的值并将它们与特定行匹配?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个看起来像这样的数据集,其中 ID 和电子邮件对应于一个唯一的人.其余列代表由该人/行命名的人.例如,ID 为 1 且电子邮件地址为 alex@gmail.com 的人在被问到问题时名为 Pete、Jane 和 Tim.

I have the dataset that looks like this, where ID and emails correspond to a unique person. The remaining columns represent people named by that person/row. For example, a person with ID 1 and email address alex@gmail.com named Pete, Jane, and Tim when asked a question.

id email          john_b alex_a pete jane tim
1  alex@gmail.com NA     NA     1    1    1
2  pete@yahoo.com NA     1      1    NA   NA
3  jane@q.com     NA     NA     1    NA   1
4  bea@mail.co    NA     1      1    NA   NA
5  tim@q.com      NA     NA     1    NA   1

我需要新数据集看起来像这样,其中新的列提名表示该人/行在数据集的其余部分中被命名的次数.例如,Pete 被 5 人命名,并在提名列中的相关电子邮件地址行中获得 5 名.Jane 曾被提名过一次(由 alex@gmail.com 命名),并在提名栏中的 Jane 电子邮件地址行获得 1 分.

I need the new dataset to look like this, where a new column nomination represents the number of times that person/row was named in the rest of the dataset. For example, Pete was named by 5 people and gets 5 in the nomination column, on the row with the relevant email address. Jane was named once (by alex@gmail.com) and gets 1 in the nomination column, on the row with Jane's email address.

id email          john_b alex_a pete jane tim nomination
1  alex@gmail.com NA     NA     1    1    1   0 
2  pete@yahoo.com NA     1      1    NA   NA  5
3  jane@q.com     NA     NA     1    NA   1   1
4  bea@mail.co    NA     1      1    NA   NA  0
5  tim@q.com      NA     NA     1    NA   1   3

我有一种感觉,我需要在这里结合使用 case-when 和 grepl,但我无法理解它.

I have a sense that I need a combination of case-when and grepl here, but can't wrap my head around it.

感谢您的帮助!

推荐答案

我终于想出了一个代码,希望能让你达到你所期望的.但是,我想不出任何方法将 bea@mail.cojohn_b 匹配.这肯定需要比我更聪明的头脑,但如果我能想到什么,我会在这里更新我的代码:

Hi I finally came up with a code that I hope to get you to what you expect. However, I could not think of any way to match bea@mail.co to john_b. It takes a mind far more brighter than mine for sure but if I could think of anything, I would update my codes here:

library(dplyr)
library(tidyr)
library(stringr)

df <- tribble(
 ~email,          ~john_b, ~alex_a, ~pete, ~jane, ~tim,
  "alex@gmail.com", NA,     NA,     1,    1,    1,
  "pete@yahoo.com", NA ,    1,      1,    NA,   NA,
  "jane@q.com",     NA  ,   NA,     1,    NA,   1,
  "bea@mail.co",    NA,     1,      1,    NA,   NA,
  "tim@q.com",      NA ,    NA,     1,    NA,   1
)

# First we count the number of times each person is named
nm <- df %>%
  summarise(across(john_b:tim, ~ sum(.x, na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "names", values_to = "nominations")
nm

# A tibble: 5 x 2
  names  nominations
  <chr>        <dbl>
1 john_b           0
2 alex_a           2
3 pete             5
4 jane             1
5 tim              3

然后我们尝试将每个姓名与其对应的电子邮件部分匹配.这里唯一的问题是我之前提到的 john_b.

Then we try to partially match every names with their corresponding emails. Here the only problem is john_b as I mentioned before.

nm2 <- nm %>%
  rowwise() %>%
  mutate(emails = map(names, ~ df$email[str_detect(df$email, str_sub(.x, 1L, 4L))])) %>%
  unnest(cols = c(emails))

nm2

# A tibble: 4 x 3
  names  nominations emails        
  <chr>        <dbl> <chr>         
1 alex_a           2 alex@gmail.com
2 pete             5 pete@yahoo.com
3 jane             1 jane@q.com    
4 tim              3 tim@q.com  

最后我们通过emails连接这两个数据框:

And in the end we join these two data frames by emails:

df %>%
  full_join(nm2, by = c("email" = "emails"))

# A tibble: 5 x 8
  email          john_b alex_a  pete  jane   tim names  nominations
  <chr>          <lgl>   <dbl> <dbl> <dbl> <dbl> <chr>        <dbl>
1 alex@gmail.com NA         NA     1     1     1 alex_a           2
2 pete@yahoo.com NA          1     1    NA    NA pete             5
3 jane@q.com     NA         NA     1    NA     1 jane             1
4 bea@mail.co    NA          1     1    NA    NA NA              NA
5 tim@q.com      NA         NA     1    NA     1 tim              3

如果您愿意,也可以省略 names 列.我只是将其保留下来,以便您可以将它们放在一起进行比较.如果您可以对 john 的电子邮件进行一些修改,它们就会完美匹配.

You can also omit the column names if you like. I just leave it their so that you can compare them together. If you could make some modification on john's email they would have perfectly matched.

这篇关于如何计算列中的值并将它们与特定行匹配?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆