How to split item names when writing a csv file of scraped data
Question
I want to create a csv or similar Excel-compatible file from data that I scraped from the web using R. So far I have stored the data like this:
library(rvest)  # html_nodes() and html_text() come from rvest
spiegel <- read_html("http://www.spiegel.de/schlagzeilen/")
headlines <- html_nodes(spiegel, ".headline-date")
mydata <- html_text(headlines)
The variable "mydata" now contains the following:
[1] "(Wirtschaft, 00:00)" "(Kultur, 23:42)" "(Sport, 23:38)" "(Politik, 23:16)"
[5] "(Sport, 22:29)" "(Panorama, 21:56)" "(Sport, 21:39)" "(Sport, 21:25)"
[9] "(Sport, 20:23)" "(Politik, 20:21)" "(Politik, 20:09)" "(Wissenschaft, 19:41)"
When I call write.csv, I want to create two columns: the first should contain the category (e.g. "Wirtschaft", "Sport") and the second the time. Can someone tell me how to do this in this specific case?
Recommended answer
Remove the parentheses, convert to a tibble (whose single column will be called value) and use separate to split that into two columns. Finally, write it out. Replace stdout() with your filename.
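Lines <- c("(Wirtschaft, 00:00)", "(Kultur, 23:42)") # test data

library(dplyr)   # provides the %>% pipe
library(tidyr)   # provides separate()
library(tibble)  # provides as_tibble()

Lines %>%
  gsub("[()]", "", .) %>%                  # drop the parentheses
  as_tibble() %>%                          # single column named "value"
  separate(value, into = c("Name", "Time"), sep = ", ") %>%
  write.csv(stdout(), row.names = FALSE)   # replace stdout() with a filename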
giving:
"Name","Time"
"Wirtschaft","00:00"
"Kultur","23:42"
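If you would rather avoid the extra packages, the same idea (strip the parentheses, split on ", ", write two columns) can probably be done in base R as well. The following is only a minimal sketch under some assumptions: the sample vector is copied from the question rather than scraped live, the names cleaned, parts, result and out_file are made up for illustration, and every entry is assumed to contain exactly one ", " separator.

```r
# Sample input copied from the question (an assumption; not scraped live)
mydata <- c("(Wirtschaft, 00:00)", "(Kultur, 23:42)", "(Sport, 23:38)")

# Strip the surrounding parentheses
cleaned <- gsub("[()]", "", mydata)

# Split each entry on the literal separator ", "
parts <- strsplit(cleaned, ", ", fixed = TRUE)

# Build a two-column data frame from the first and second piece of each entry
result <- data.frame(
  Name = vapply(parts, `[`, character(1), 1),
  Time = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

# Write to a temporary file (hypothetical path; use your own filename instead)
out_file <- tempfile(fileext = ".csv")
write.csv(result, out_file, row.names = FALSE)
```

If a category name could itself contain ", ", this split would produce more than two pieces, so the tidyr approach with its extra argument may be the safer choice there.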