How to split a string into different variables?

Views: 92 | Published: 2020/10/15 21:33:09 | Tags: r, data-analysis, data-cleaning


Problem description

I'm trying to analyze a large data set of Airbnb listings. The amenities column lists the amenities that each listing has.

For example,

{"Wireless Internet","Air conditioning",Kitchen,Heating,"Fire extinguisher",Essentials,Shampoo,Hangers}

and

{TV,"Wireless Internet","Air conditioning",Kitchen,"Elevator in building",Heating,"Suitable for events","Smoke detector","Carbon monoxide detector","First aid kit",Essentials,Shampoo,"Lock on bedroom door",Hangers,"Hair dryer",Iron,"Laptop friendly workspace","translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"}

There are two problems I would like to solve:

I would like to split the string into different columns, e.g. there will be a column titled TV. If the string contains TV, the entry in the corresponding cell will be 1, and 0 otherwise. How can I do this?

How can I remove the variables named translation missing: .....?

Recommended answer

Here is an approach that also uses dcast() from the data.table package, as another answer does, but additionally handles the tedious but important details of data cleaning.

library(data.table)

# read data file, returning one column
raw <- fread("AirBnB.csv", header = FALSE, sep = "\n", col.names = "amenities")
# add column with row numbers
raw[, rn := seq_len(.N)]
# remove opening and closing curly braces
raw[, amenities := stringr::str_replace_all(amenities, "^\\{|\\}$", "")]

# split amenities, thereby reshaping from wide to long format
long <- raw[, strsplit(amenities, ",", fixed = TRUE), by = rn]
# remove double quotes and leading and trailing whitespace
long[, V1 := stringr::str_trim(stringr::str_replace_all(V1, '["]', ""))]

# reshape from long to wide format, omitting rows which contain "translation missing..."
dcast(long[!V1 %like% "^translation missing"], rn ~ V1, length, value.var = "rn", fill = 0)
#   rn Air conditioning Carbon monoxide detector Elevator in building Essentials
#1:  1                1                        0                    0          1
#2:  2                1                        1                    1          1
#   Fire extinguisher First aid kit Hair dryer Hangers Heating Iron Kitchen
#1:                 1             0          0       1       1    0       1
#2:                 0             1          1       1       1    1       1
#   Laptop friendly workspace Lock on bedroom door Shampoo Smoke detector
#1:                         0                    0       1              0
#2:                         1                    1       1              1
#   Suitable for events TV Wireless Internet
#1:                   0  0                 1
#2:                   1  1                 1
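The same idea (strip the braces, split on commas, clean each item, then build one 0/1 column per amenity) can also be sketched in base R without data.table or stringr. This is only an illustrative alternative, not the method of the answer above: the two input strings are shortened, hypothetical copies of the sample rows, and the names amenities, items, all_items and indicators are made up for this sketch.

```r
# shortened, hypothetical copies of the two sample rows
amenities <- c(
  '{"Wireless Internet","Air conditioning",Kitchen,Heating}',
  '{TV,"Wireless Internet",Kitchen,"translation missing: en.hosting_amenity_49"}'
)

# remove the surrounding curly braces, then split each row on commas
items <- strsplit(gsub("^\\{|\\}$", "", amenities), ",", fixed = TRUE)
# remove double quotes and leading and trailing whitespace
items <- lapply(items, function(x) trimws(gsub('"', "", x, fixed = TRUE)))
# drop the "translation missing..." placeholders
items <- lapply(items, function(x) x[!startsWith(x, "translation missing")])

# one column per distinct amenity: 1 if the row lists it, 0 otherwise
all_items <- sort(unique(unlist(items)))
indicators <- t(vapply(items,
                       function(x) as.integer(all_items %in% x),
                       integer(length(all_items))))
colnames(indicators) <- all_items
indicators
```

Like the data.table version, this splits on every comma, so it assumes that no amenity name contains a comma inside its quotes.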

Data file

The OP provided only two data samples, which were copied into a data file named AirBnB.csv:

{"Wireless Internet","Air conditioning",Kitchen,Heating,"Fire extinguisher",Essentials,Shampoo,Hangers}
{TV,"Wireless Internet","Air conditioning",Kitchen,"Elevator in building",Heating,"Suitable for events","Smoke detector","Carbon monoxide detector","First aid kit",Essentials,Shampoo,"Lock on bedroom door",Hangers,"Hair dryer",Iron,"Laptop friendly workspace","translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"}
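To reproduce the answer without creating the file by hand, the two rows can be written out from R. This is a minimal sketch: the variable names sample_rows and path are made up here, and it assumes the file should go into the current working directory, where the fread() call in the answer looks for it.

```r
# the two sample rows, one string per line of the file
sample_rows <- c(
  '{"Wireless Internet","Air conditioning",Kitchen,Heating,"Fire extinguisher",Essentials,Shampoo,Hangers}',
  '{TV,"Wireless Internet","Air conditioning",Kitchen,"Elevator in building",Heating,"Suitable for events","Smoke detector","Carbon monoxide detector","First aid kit",Essentials,Shampoo,"Lock on bedroom door",Hangers,"Hair dryer",Iron,"Laptop friendly workspace","translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"}'
)

# assumed location: the current working directory
path <- "AirBnB.csv"
writeLines(sample_rows, path)

# read the file back to confirm that it holds one listing per line
readLines(path)
```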
