Splitting Strings and Generating Frequency Tables in R
Question
I have a column of firm names in an R dataframe that goes something like this:
"ABC Industries"
"ABC Enterprises"
"123 and 456 Corporation"
"XYZ Company"
And so on. I'm trying to generate frequency tables of every word that appears in this column, so for example, something like this:
Industries 10
Corporation 31
Enterprise 40
ABC 30
XYZ 40
I'm relatively new to R, so I was wondering about a good way to approach this. Should I split the strings and place every distinct word into a new column? Is there a way to split a multi-word row into multiple rows with one word each?
Recommended answer
If you wanted to, you could do it in a one-liner:
R> text <- c("ABC Industries", "ABC Enterprises",
+ "123 and 456 Corporation", "XYZ Company")
R> table(do.call(c, lapply(text, function(x) unlist(strsplit(x, " ")))))
        123         456         ABC         and     Company
          1           1           2           1           1
Corporation Enterprises  Industries         XYZ
          1           1           1           1
R>
Here I use strsplit() to break each entry into components; this returns a list (within a list), which unlist() flattens into a character vector. I use do.call() to simply concatenate all the resulting vectors into one vector, which table() summarises.
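Since the question starts from a data frame column, the same idea can be sketched directly on that column. This is only a minimal illustration: the data frame `firms` and its column name `firm` are assumptions made for the example, not names from the question. It relies on strsplit() being vectorised over its input, so the lapply()/do.call() step is not strictly needed, and it converts the result into a data frame with one row per word.

```r
# Hypothetical data frame; the names `firms` and `firm` are assumptions
firms <- data.frame(
  firm = c("ABC Industries", "ABC Enterprises",
           "123 and 456 Corporation", "XYZ Company"),
  stringsAsFactors = FALSE
)

# strsplit() accepts the whole character column and returns a list with
# one character vector per row; unlist() flattens it into a single vector
words <- unlist(strsplit(firms$firm, " ", fixed = TRUE))

# Count each distinct word, most frequent first
freq <- sort(table(words), decreasing = TRUE)

# One row per word with its count
freq_df <- as.data.frame(freq, stringsAsFactors = FALSE)
print(freq_df)
```

Splitting on a single space assumes the names contain no repeated spaces or punctuation; real firm names may need a regular expression such as "[[:space:]]+" in place of the fixed " " separator.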