How to remove duplicate comma-separated character values from each cell of a column using R

Problem Description

I have a data frame with two columns, ID and Product, as below:

ID  Product
A   Clothing, Clothing Food, Furniture, Furniture
B   Food,Food,Food, Clothing
C   Food, Clothing, Clothing

I need to keep only the unique products for each ID, for example:

ID  Product
A   Clothing, Food, Furniture
B   Food, Clothing
C   Food, Clothing

How can I do this in R?

Recommended Answer

If the dataset contains more than one kind of delimiter (commas with or without a following space, and bare spaces as in "Clothing Food"), one approach is to split the 'Product' column on all of them, keep the unique values, and paste them back together with toString, grouped by 'ID'. Here we use data.table methods.

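library(data.table)
df1 <- data.frame(ID = c("A", "B", "C"),
  Product = c("Clothing, Clothing Food, Furniture, Furniture",
              "Food,Food,Food, Clothing", "Food, Clothing, Clothing"),
  stringsAsFactors = FALSE)
# Split on a comma (plus optional spaces) or on whitespace; unlist() pools
# every row of an ID before unique(), then toString() rejoins with ", "
setDT(df1)[, list(Product = toString(unique(unlist(strsplit(Product,
            ',\\s*|\\s+'))))), by = ID]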
#   ID                   Product
#1:  A Clothing, Food, Furniture
#2:  B            Food, Clothing
#3:  C            Food, Clothing
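If you would rather not depend on data.table, the same split, unique, and rejoin idea can be sketched in base R. This is a minimal sketch, assuming each ID occupies exactly one row (as in the question's data); it works row by row with strsplit() and vapply() instead of grouping by ID. The trimws() call is an added safeguard, not part of the original answer, so that a leading space in a cell does not produce an empty first token.

```r
# Sample data from the question (Product kept as character, not factor)
df1 <- data.frame(
  ID = c("A", "B", "C"),
  Product = c("Clothing, Clothing Food, Furniture, Furniture",
              "Food,Food,Food, Clothing",
              "Food, Clothing, Clothing"),
  stringsAsFactors = FALSE
)

# Split each cell on a comma followed by optional spaces, or on bare whitespace
parts <- strsplit(trimws(df1$Product), ",\\s*|\\s+")

# Keep the first occurrence of each item and join them with ", " via toString()
df1$Product <- vapply(parts, function(x) toString(unique(x)), character(1))

print(df1)
```

Because the pattern also splits on whitespace, a multi-word product such as "Home Decor" would be broken into two items; if your real data has such names, splitting on ",\\s*" alone is the safer choice, at the cost of treating "Clothing Food" as a single product.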
