How to abbreviate long names in a dataframe for R?

Problem description

I'm working with a dataframe that has really long names (more than 25 characters). I'm trying to make a bar graph (with plotly) of all of these organizations' names, but the names get cut off because they're so long. I've already tried adjusting the margins like the following:

plot_ly(x = number, y = org_name, type = 'bar') %>% 
layout(margin = list(l = 150))

It works, but the bar graph doesn't look nice, so the alternative I'm trying is to abbreviate any organization name that is longer than 25 characters. However, I'm having a hard time doing so. One way I tried is to create a new column called abbrv, use substring to get the first 25 characters of the organization name, append "...", and put the result in the column. For organization names that aren't longer than 25 characters, I would just put an NA in the abbrv column, like the following:

for(i in dataframe.name$org_name){
 if(nchar(i) > 25){
  dataframe.name$abbrv <- paste0(substring(i, 0, 25), "...")
 }
 else{
  dataframe.name$abbrv <- "NA"
 }
}

The only issue with this approach is that once I have the abbrv column (if it works), how do I make sure that plotly displays the abbrv value when the organization name is longer than 25 characters, and the normal organization name when it isn't?

Anyway, that was one approach I tried, but it doesn't quite work, since the abbrv column ends up with "NA" in ALL of the rows, no matter how long the organization names are. Another approach I tried is to use a replace function, such as:

for(i in dataframe.name$org_name){
 if(nchar(i) > 25){
   dataframe.name[i].replace(
     to_replace=i,
     value= abbreviate(i)
   )
}

But I get errors for that one as well. At this point, I'm not sure what to do or how to abbreviate the long names in my dataframe. If anyone can help me out, that would be great! Thanks.

*******Edit*******

So now I'm using this code:

for(i in 1:nrow(dfname)){
 if(nchar(dfname$orgname[i]) > 25){
   dfname$abbrv.column <- substring(dfname$orgname[i], 0, 25)
 }  
 else{
   dfname$abbrv.column <- dfname$orgname
 }
}

This isn't quite working though, because all of the entries end up with the same organization name.

Solution

dataframe.name$abbrv is a vector holding the abbreviations for every row of the dataframe, not just a single name.

That is why all entries in dataframe.name$abbrv are being set to NA: each pass through the loop assigns to the whole column, and the last name in the dataframe is 25 characters or fewer, so the final iteration assigns NA to every entry.
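To make the overwriting visible, here is a minimal sketch of the same loop pattern on a made-up two-row dataframe. The dataframe `df` and its organization names are hypothetical, and it uses a real `NA` rather than the string "NA"; the point is only that each assignment replaces the entire column.

```r
# Hypothetical toy data; the organization names are made up for illustration.
df <- data.frame(
  org_name = c("International Federation of Example Societies",
               "Short Org"),
  stringsAsFactors = FALSE
)

for (i in df$org_name) {
  if (nchar(i) > 25) {
    # Assigns ONE value to the whole column (recycled to every row).
    df$abbrv <- paste0(substring(i, 1, 25), "...")
  } else {
    # Also overwrites the whole column, so the last iteration wins.
    df$abbrv <- NA
  }
}

# The last name is short, so every row ends up NA.
print(df$abbrv)
```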

@brettljausn has a decent suggestion: just do away with the NAs completely and only truncate where the character count exceeds 25.

Something like this should work a treat:

dataframe.name$abbrv <- substring( dataframe.name$org_name, 0, 25 )
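If the trailing "..." from the question is still wanted on truncated names only, one possible vectorized sketch uses `ifelse` with `nchar`. The dataframe `df` and its contents are hypothetical stand-ins for `dataframe.name`, and the 25-character cutoff is taken from the question.

```r
# Hypothetical toy data standing in for dataframe.name.
df <- data.frame(
  org_name = c("International Federation of Example Societies",
               "Short Org"),
  stringsAsFactors = FALSE
)

# TRUE for names longer than the 25-character limit from the question.
too_long <- nchar(df$org_name) > 25

# Truncate and append "..." only where needed; keep short names unchanged.
df$abbrv <- ifelse(too_long,
                   paste0(substr(df$org_name, 1, 25), "..."),
                   df$org_name)

print(df$abbrv)
```

Because every row gets either its truncated or its original name, the column can be used directly for the axis labels (for example `y = ~abbrv` in `plot_ly`), so no NA handling is needed. One caveat: names that become identical after truncation would be treated as the same category in the plot.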

I would try to use abbreviate first though:

dataframe.name$abbrv <- abbreviate( dataframe.name$org_name )
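Note that `abbreviate` defaults to `minlength = 4`, which is probably shorter than wanted here. A hedged sketch with the target length set to 25 follows; the vector `org_name` is made-up example data, and `named = FALSE` is used so the result is a plain character vector rather than one named by the original strings.

```r
# Hypothetical example names; the first is longer than 25 characters.
org_name <- c("International Federation of Example Societies",
              "Short Org")

# Ask abbreviate() to aim for 25 characters instead of its default of 4.
# With strict = FALSE (the default), results may occasionally be longer
# than minlength if that is needed to keep them unique.
abbrv <- abbreviate(org_name, minlength = 25, named = FALSE)

print(abbrv)
```

Unlike `substring`, `abbreviate` drops characters from inside the words (lower-case vowels first), so the shortened names can stay more recognizable than a hard cut at 25 characters.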
