Find and convert to NULL all duplicated values from the same column in a list of dataframes


Question

I have a list, BELGIAN_COAST_list, containing hundreds of data frames (df1, df2, ...) of 15 columns x 1000 rows each. The last column of each data frame is called Chemicals and contains character values such as Sulfate or Ammonia. Many values in the Chemicals column are duplicated within each data frame (due to a technical issue with the measuring device).

I want to convert the duplicated values to NULL so that each value appears only once in the whole column, for each data frame in my list.

I tried to unlist my BELGIAN_COAST_list and then ran:

BELGIAN_COAST$Chemicals[duplicated(BELGIAN_COAST$Chemicals)] <- ""

In this case, each value appears only once in the entire merged data frame. I want each value to appear once in each data frame (df1$Chemicals, df2$Chemicals, ...) of my BELGIAN_COAST_list, so I need to keep my data as a list of data frames.

Does anyone have an idea?

Recommended answer

In base R:

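lapply(BELGIAN_COAST_list, function(x) {
  # TRUE for every repeat after the first occurrence in the last column
  dups <- duplicated(x[, ncol(x)])
  # Replace the repeats with NA, keeping the first occurrence
  x[dups, ncol(x)] <- NA_character_
  x
})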

This is done positionally, using the last column. To refer to the column by name instead, replace ncol(x) with "Chemicals".
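As a rough illustration of the by-name variant, here is a minimal sketch on a two-element toy list. The list toy_list, its Value column, and the values in it are made up for the example and merely stand in for BELGIAN_COAST_list; the result is assigned to a new object, cleaned, because lapply returns a modified copy rather than changing the list in place.

```r
# Toy stand-in for BELGIAN_COAST_list (hypothetical data, for illustration only)
toy_list <- list(
  df1 = data.frame(Value = 1:4,
                   Chemicals = c("Sulfate", "Sulfate", "Ammonia", "Sulfate"),
                   stringsAsFactors = FALSE),
  df2 = data.frame(Value = 5:7,
                   Chemicals = c("Ammonia", "Ammonia", "Sulfate"),
                   stringsAsFactors = FALSE)
)

# Same logic as the answer, but selecting the column by name
cleaned <- lapply(toy_list, function(x) {
  dups <- duplicated(x[["Chemicals"]])   # TRUE for repeats within this data frame only
  x[dups, "Chemicals"] <- NA_character_
  x
})

cleaned$df1$Chemicals
# "Sulfate" NA "Ammonia" NA
cleaned$df2$Chemicals
# "Ammonia" NA "Sulfate"
```

Because duplicated() is evaluated inside the function, each data frame is checked on its own, so "Sulfate" and "Ammonia" survive once in df1 and once in df2.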

Using the tidyverse:

library(tidyverse)

purrr::map(BELGIAN_COAST_list, ~ dplyr::mutate(., across(last_col(), ~ ifelse(duplicated(.), NA_character_, .))))

Again, to refer to the column by name, change last_col() to Chemicals; note that no quotation marks are used here.

In either case, if Chemicals is numeric, use NA instead of NA_character_.
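The reason for that distinction can be sketched with a plain vector. The vector conc below is a hypothetical numeric measurement column, not part of the original data: assigning NA leaves the vector numeric, while assigning NA_character_ coerces the whole vector to character.

```r
conc <- c(0.5, 0.5, 1.2, 0.5)   # hypothetical numeric measurements

# Plain NA is coerced to the vector's own type, so the vector stays numeric
with_na <- conc
with_na[duplicated(conc)] <- NA
class(with_na)
# "numeric"

# NA_character_ forces the whole vector to character
with_na_chr <- conc
with_na_chr[duplicated(conc)] <- NA_character_
class(with_na_chr)
# "character"
```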
