How to split item names when writing a csv file of scraped data

Problem description

I would like to create a csv or similar Excel-compatible file from data that I scraped from the web using R. So far I have stored the data like this:

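library(rvest)  # provides read_html(), html_nodes() and html_text()

# Download the headlines page and extract the text of each ".headline-date" node
spiegel <- read_html("http://www.spiegel.de/schlagzeilen/")
headlines <- html_nodes(spiegel, ".headline-date")
mydata <- html_text(headlines)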

The variable "mydata" now contains the following:

[1] "(Wirtschaft, 00:00)"       "(Kultur, 23:42)"           "(Sport, 23:38)"            "(Politik, 23:16)"         
  [5] "(Sport, 22:29)"            "(Panorama, 21:56)"         "(Sport, 21:39)"            "(Sport, 21:25)"           
  [9] "(Sport, 20:23)"            "(Politik, 20:21)"          "(Politik, 20:09)"          "(Wissenschaft, 19:41)"

When I use write.csv, I want to create two columns: the first should contain the category (e.g. "Wirtschaft", "Sport") and the second the time. Can someone tell me how to do this in this specific case?

Recommended answer

Remove the parentheses, convert to a tibble (whose single column will be called value) and use separate to split that into two columns. Finally, write it out. Replace stdout() with your filename.

Lines <- c("(Wirtschaft, 00:00)", "(Kultur, 23:42)") # test data

library(dplyr)
library(tidyr)
library(tibble)

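Lines %>% 
      gsub("[()]", "", .) %>%      # drop the parentheses
      as_tibble() %>%              # character vector -> tibble with one column, "value"
      separate(value, into = c("Name", "Time"), sep = ", ") %>%
      write.csv(stdout(), row.names = FALSE)   # replace stdout() with a file name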

giving:

"Name","Time"
"Wirtschaft","00:00"
"Kultur","23:42"

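If you would rather not load the tidyverse packages, the same idea (strip the parentheses, split on ", ", write two columns) can be sketched in base R. This is only a minimal sketch: it assumes every entry has exactly the "(Category, HH:MM)" shape shown in the question, and the file name headlines.csv is a hypothetical placeholder written to a temporary directory.

```r
# Base R sketch: strip the parentheses, then split on the ", " separator.
lines <- c("(Wirtschaft, 00:00)", "(Kultur, 23:42)")  # same test data as the answer

cleaned <- gsub("[()]", "", lines)               # e.g. "Wirtschaft, 00:00"
parts <- strsplit(cleaned, ", ", fixed = TRUE)   # list of c(category, time) pairs

# Assumption: every entry splits into exactly two pieces.
result <- data.frame(
  Name = vapply(parts, `[`, character(1), 1),
  Time = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

# "headlines.csv" is a hypothetical file name; adjust the path as needed.
out_file <- file.path(tempdir(), "headlines.csv")
write.csv(result, out_file, row.names = FALSE)
```

To apply this to the scraped data, replace the test vector with mydata from the question. A category that itself contains ", " would need a different split rule.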