Converting each list within a dataframe to a normal column

Problem description

I build a data frame from several web sources. Each source is cleaned beforehand, and the cleaned data frames are then selected with

# list every object in the workspace, then keep only the cleaned news data frames
cleans <- ls()
cleans <- cleans[grepl("Clean_News", cleans)]
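
A minimal sketch of the same selection, assuming the cleaned data frames live in the current environment under names containing `Clean_News`: `ls()` accepts a `pattern` argument (a regular expression), and `mget()` fetches all matching objects at once as a list named after the objects, which keeps each data frame's name attached to it. The two toy data frames are hypothetical stand-ins for the scraped data.

```r
# Hypothetical toy inputs standing in for the scraped, cleaned data frames
Clean_News1 <- data.frame(title = c("a", "b"), views = c(10L, 20L))
Clean_News2 <- data.frame(title = "c", views = 30L)

# ls(pattern = ...) does the filtering that grepl() does above
cleans <- ls(pattern = "Clean_News")

# mget() returns a list of the objects, named by their object names
frames <- mget(cleans)
names(frames)
```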

My first attempt to bind them together was inspired by a solution on Stack Overflow:

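# get() each matching object by name; when all frames have the same number of
# columns, mapply() simplifies the results into a matrix of lists
All_News <- mapply(get, grep("Clean_News", ls(), value = TRUE))
All_News <- data.frame(t(All_News))
All_News <- as.data.frame(All_News)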

However, this is a problem for me, because the result is a data frame in which each column is a list of integers or characters. So my main question is how to convert each list within the data frame into a normal column. I tried many hand-written functions from Stack Overflow, but none of them worked for me (probably due to my inexperience). The df has the form

All_News <- data.frame(a=I(list(1,1:2,1:3)), b=I(list(4:6,7:9,10:11)))
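
To confirm which columns are list columns rather than ordinary atomic vectors, `is.list()` can be applied to every column. A short check on the example `All_News` above:

```r
All_News <- data.frame(a = I(list(1, 1:2, 1:3)), b = I(list(4:6, 7:9, 10:11)))

# TRUE for every column stored as a list (here both), FALSE for atomic columns
vapply(All_News, is.list, logical(1))
```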

Alternatively, I tried the following, which works:

All_News <- do.call(rbind, lapply(cleans, get))

However, this has the big drawback that I could not get the names of the data frames into the result, either as row names or as a first column. So my second question is how to attach the name of each individual data frame to every row of the combined df, instead of the numeric id produced by the line of code below.

library(data.table)  # rbindlist() comes from data.table
t2 <- rbindlist(lapply(cleans, get), idcol = "id")  # unnamed list, so "id" is 1, 2, 3, ...
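
One way to get names instead of integer ids, sketched in base R so it needs no extra package: pair each data frame with its name, prepend the name as a column, then row-bind. The toy frames and the column name `source` are hypothetical. With `data.table`, `rbindlist()` documents that when the input list is named, those names become the `idcol` values, so `rbindlist(mget(cleans), idcol = "id")` should give the same labelling.

```r
Clean_News1 <- data.frame(title = c("a", "b"), views = c(10L, 20L))
Clean_News2 <- data.frame(title = "c", views = 30L)
cleans <- c("Clean_News1", "Clean_News2")

frames <- mget(cleans)  # named list: names are the object names

# Prepend each frame's name as a column; data.frame() recycles it to nrow(df)
labelled <- Map(function(df, nm) data.frame(source = nm, df), frames, names(frames))

All_News <- do.call(rbind, labelled)
rownames(All_News) <- NULL  # reset the row names rbind derives from the list names
All_News
```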

This does not help much, because I need the name of each data frame repeated once per row as an identifier. Also, since this is an automated process covering thousands of web pages, I do not know beforehand how many rows each data frame has. The result should look like:

 news1 data1 data2
 news1 data5 data6
 news2 data3 data4
 and so on.

I tried the following:

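nr <- length(cleans)
names <- rep(cleans, nr)  # repeats the whole vector of names nr times
names <- sort(names)      # groups identical names: each appears nr times, whatever its row count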

but without much success.
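
The attempt above repeats every name `length(cleans)` times. What is needed instead is to repeat each name as many times as its own data frame has rows, which `rep()` supports through a vector-valued `times` argument. A sketch with hypothetical toy frames, assuming all frames share the same columns:

```r
Clean_News1 <- data.frame(title = c("a", "b"), views = c(10L, 20L))
Clean_News2 <- data.frame(title = "c", views = 30L)
cleans <- c("Clean_News1", "Clean_News2")

frames <- mget(cleans)

# Number of rows of each data frame, known only at run time
n_rows <- vapply(frames, nrow, integer(1))

# Repeat each name once per row of its own data frame
ids <- rep(cleans, times = n_rows)

# unname() keeps rbind from building row names out of the list names
All_News <- data.frame(source = ids, do.call(rbind, unname(frames)))
```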

Recommended answer

We can do this by looping through the columns of the dataset and unlisting the list columns:

lst <- lapply(All_News, unlist)
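
Applied to the example `All_News` from the question, this step yields plain vectors, but of different lengths, which is why they cannot be passed straight to `data.frame()`. A short check:

```r
All_News <- data.frame(a = I(list(1, 1:2, 1:3)), b = I(list(4:6, 7:9, 10:11)))

# Flatten each list column into one plain vector
lst <- lapply(All_News, unlist)

# The flattened columns have 6 and 8 elements, so data.frame(lst) would stop
# with an "arguments imply differing number of rows" error
lengths(lst)
```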

Then make the list elements the same length by padding the shorter ones with NA at the end, up to the maximum length (max(lengths(lst))), and convert the result to a data.frame:

data.frame(lapply(lst, `length<-`, max(lengths(lst))))
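
Putting both steps together on the example `All_News` from the question gives an 8-row data frame in which column `a` is padded with two trailing `NA`s. This sketch just reproduces the answer's code. Note that padding pairs values by position only, so row i of `a` and row i of `b` are not necessarily related.

```r
All_News <- data.frame(a = I(list(1, 1:2, 1:3)), b = I(list(4:6, 7:9, 10:11)))

lst <- lapply(All_News, unlist)
max_len <- max(lengths(lst))

# `length<-` extends a vector to max_len, filling the new positions with NA
padded <- lapply(lst, `length<-`, max_len)

result <- data.frame(padded)
result
```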
