Separate contents of field

Problem description

I'm sure this is very simple, and I think it's a case of using separate and gather.

I have a single field in a data frame, authorlist, which is an edited export of a PubMed search. It contains the authors of the publications. It can, obviously, contain either a single author or several collaborating authors.

For example, this is just a selection of the options available:

Author
Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P.

What I'd like to do is create a single list of ALL authors, so that I'd have something like:

Author
Drijgers RL
Verhey FR
Leentjens AF
Kahler S
Aalten P

How do I do that? I thought it would be something like:

authSpread <- authorlist %>% separate(Author, sep = ",", extra = "drop")

But it's not working. If I add into = "NA", I get just the first authors listed in a single column. What I'd like to do is replicate the text-to-columns function in Excel, where you specify the character to split at and the contents of the cell are spread into new cells, and then gather those cells back into one column. I don't know the maximum number of authors, so I don't know how many columns to split into (or how to label them) programmatically.

Clarification: I don't know whether I want to make a long data frame wide AND then gather, because I don't know how many fields would be generated. Is this a sensible thing to do? I would think I could write the output of the split on "," to a list and then write the contents of that list as a single data frame. Does that sound more efficient?

Recommended answer

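You are looking for separate_rows. It splits a delimited column into one row per value, so you do not need to know the maximum number of authors in advance.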

Input:

df <- data.frame(authors = c("Drijgers RL, Verhey FR, Leentjens AF, Köhler S, Aalten P."))

                                                     authors
1 Drijgers RL, Verhey FR, Leentjens AF, Köhler S, Aalten P.

Function:

library(tidyverse)

df %>% separate_rows(authors, sep = ", ")

Output:

       authors
1  Drijgers RL
2    Verhey FR
3 Leentjens AF
4    Köhler S
5    Aalten P.
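
Note that the last author still carries the full stop that closes the original string. As a minimal sketch, assuming every author string may end with a full stop and that the whitespace after each comma can vary, the column could be cleaned before splitting. The data frame df2, its second row, and the name authors_long are made up for illustration and are not part of the original question:

```r
library(tidyr)
library(dplyr)

# Hypothetical two-article data frame; the second row is invented for illustration
df2 <- data.frame(
  authors = c(
    "Drijgers RL, Verhey FR, Leentjens AF, Köhler S, Aalten P.",
    "Verhey FR, Aalten P."
  )
)

authors_long <- df2 %>%
  # Drop the full stop (and any trailing whitespace) that closes each author string
  mutate(authors = sub("\\.\\s*$", "", authors)) %>%
  # sep is treated as a regular expression: a comma followed by optional whitespace
  separate_rows(authors, sep = ",\\s*")
```

With this cleanup, "Aalten P." and "Aalten P" are no longer treated as two different authors when the rows are deduplicated later. In tidyr 1.3.0 and later, separate_rows is marked as superseded in favour of separate_longer_delim; separate_rows still works, so the call above should not need changing.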

You can save them in a character vector like this:

authors_list <- df %>% separate_rows(authors, sep = ", ") %>% pull(authors)

Output:

[1] "Drijgers RL"  "Verhey FR"    "Leentjens AF" "Köhler S"    "Aalten P."   

If your data contains the authors of multiple articles and you want each author to appear only once, just add unique() at the end:

authors_list <- df %>% separate_rows(authors, sep = ", ") %>% pull(authors) %>% unique()
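
The idea raised in the question, writing the split output to a list and then turning that list into a single data frame, can also be sketched in base R without tidyr. This is only an illustration under assumptions: author_strings, split_list, all_authors and unique_authors are hypothetical names, and the second author string is invented to show deduplication:

```r
# Hypothetical input: one string of authors per article
author_strings <- c(
  "Drijgers RL, Verhey FR, Leentjens AF, Köhler S, Aalten P.",
  "Verhey FR, Aalten P."
)

# strsplit() returns a list with one character vector per article
split_list <- strsplit(author_strings, ", ", fixed = TRUE)

# Flatten the list into one vector, then strip any trailing full stop
all_authors <- sub("\\.$", "", unlist(split_list))

# One row per distinct author
unique_authors <- data.frame(Author = unique(all_authors))
```

Because strsplit returns a list whose elements can have different lengths, the number of authors per article never has to be known up front, which is the same property that makes separate_rows a good fit here.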
