Change value of all strings in column based on condition
Question
I'm new-ish to R and have a question about data cleaning.
I have a column that contains each car's drive type: four wheel, all wheel, 2 wheel, etc.
The problem is that there is no standardization, so some rows have 4 WHEEL drive, 4wd, 4WD, Four - Wheel - Drive, etc.
The first step is easy, which is to uppercase everything, but the step I'm having trouble with is changing each value to a standard, like 4WD, without having to recode each unique drive value.
Something like: for each value in the column, if the value LIKE/CONTAINS "FOUR", change it to "4WD".
I've researched recode, stringdist, and mutate, but I can't find a fit. When I type it out, it sounds like I need a loop, but I'm not sure of the exact syntax.
If the solution could work with the tidyverse, that would be great!
Recommended answer
Welcome to StackOverflow! I've answered your question, but in the future, please include a small sample of your data so it's easier for us to solve your problem. Food for thought: How to make a reproducible example
library(dplyr)
# Since you haven't provided a data sample, I'm going to assume your dataframe is named "DF" and your column's name is "Drive"
# Set everything to lowercase to pare down uniqueness
DF <- mutate(DF, Drive = tolower(Drive))
# You'll need one line like this for each exact-match replacement, of the following form:
# <column_name> = replace(<column_name>, <condition>, <new value>)
DF <- mutate(DF, Drive = replace(Drive, Drive == "4 wheel drive", "4WD"))
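The exact-match form needs one line for every spelling that appears in the data. The "if value CONTAINS FOUR, change to 4WD" idea from the question can instead be sketched with pattern matching, using base R's grepl() inside dplyr's case_when(). This is only a sketch: the sample rows, the helper columns Drive_upper and Drive_std, and the patterns themselves are assumptions made for illustration, since the real data was not posted.

```r
library(dplyr)

# Hypothetical sample data; the real DF and its Drive values were not shown
DF <- data.frame(
  Drive = c("4 WHEEL drive", "4wd", "4WD", "Four - Wheel - Drive",
            "all wheel", "AWD", "2 wheel"),
  stringsAsFactors = FALSE
)

DF <- DF %>%
  mutate(
    # Uppercase first so each pattern only has to cover one casing
    Drive_upper = toupper(Drive),
    # Conditions are checked top to bottom; the first match wins
    Drive_std = case_when(
      grepl("FOUR|4", Drive_upper)  ~ "4WD",
      grepl("ALL|AWD", Drive_upper) ~ "AWD",
      grepl("TWO|2", Drive_upper)   ~ "2WD",
      TRUE                          ~ Drive_upper  # leave unmatched values as-is
    )
  )

print(DF[, c("Drive", "Drive_std")])
```

The patterns are deliberately loose, so it is worth checking the result with something like count(DF, Drive, Drive_std) to confirm nothing was matched by accident (for example, a value containing "4" that is not a four-wheel-drive label).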