Can't remove blank lines in a txt file with R

Published: 2020/11/21 19:06:54. Tags: r, text, gsub, blank-line

Question

I am doing text analysis in R and need to convert the first letter of each sentence to lowercase while keeping other capitalized words as they are, so I used the command

     x <- gsub("(\\..*?[A-Z])", '\\L\\1', x, perl=TRUE)
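For reference, here is a minimal sketch of what this substitution does to a single string. The sample sentence is made up for illustration. With perl = TRUE, \L in the replacement lowercases the captured group, which runs from a period up to the next capital letter. The first word of the string is left alone because no period precedes it.

```r
# Made-up sample text for illustration
x <- "Thank you. Vice President of Investor Relations."

# The question's substitution: the group matches a period, the shortest run of
# characters after it, and the next capital letter; "\\L" (perl = TRUE only)
# lowercases that whole group in the replacement
y <- gsub("(\\..*?[A-Z])", "\\L\\1", x, perl = TRUE)

print(y)
```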

It worked, but only partially. For the text analysis I had to convert PDF files to txt format, and the txt files now contain many empty lines (page breaks, possibly carriage returns). As a result, the command does not convert capital letters that appear at the start of a new line. I tried to remove the empty lines using various combinations of \s, \r and \n in gsub, but nothing worked. When I run inspect(x) from the tm package, the output looks like this:

    [346]
    [347]    Thank you.
    [348]
    [349]    Vice President of Investor Relations
    [350]
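The numbered output suggests that each line is a separate element of a character vector. The following small sketch assumes a vector shaped like that output, with made-up contents. It shows why gsub alone cannot help. gsub rewrites each element independently and always returns a vector of the same length. A period at the end of one element is therefore never matched together with a capital letter in the next element, and a blank element can at most become an empty string.

```r
# Assumed shape of the converted text: one vector element per line
lines <- c("Thank you.", "   ", "Vice President of Investor Relations")

# gsub works element by element: "Thank you." has nothing after its period,
# and the next sentence lives in another element, so nothing is converted
converted <- gsub("(\\..*?[A-Z])", "\\L\\1", lines, perl = TRUE)

# Replacing whitespace turns the blank element into "" but does not drop it
blanked <- gsub("^\\s+$", "", lines)

print(converted)
print(length(blanked))  # still 3 elements
```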

I would be grateful if anyone could help me!

Recommended answer

Given your output, the empty lines appear to be separate strings in a character vector. You need to filter them out using grepl:

    # TRUE for elements that are empty or contain only whitespace
    empty_lines = grepl('^\\s*$', x)
    # keep only the non-blank elements
    x = x[! empty_lines]
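Here is a self-contained sketch of this filter on a made-up vector. The second variant, which uses trimws and nzchar, is an alternative added here and is not part of the original answer. It should give the same result as long as the only whitespace involved is spaces, tabs, carriage returns and newlines, because trimws does not strip other characters that \s matches.

```r
# Made-up lines mimicking a PDF-to-txt conversion
raw_lines <- c("Thank you.", "", "   ", "\t", "Vice President of Investor Relations")

# The answer's approach: flag elements that are empty or whitespace-only
empty_lines <- grepl("^\\s*$", raw_lines)
kept <- raw_lines[!empty_lines]

# Alternative (not from the answer): trim whitespace, keep non-empty strings
kept_alt <- raw_lines[nzchar(trimws(raw_lines))]

print(kept)
```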

Then you can perform your subsequent analysis, but you probably still need to concatenate the lines first to get a single character string:

    # join the remaining lines into a single string, separated by newlines
    x = paste(x, collapse = '\n')
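Putting the steps together, here is a minimal end-to-end sketch. It writes a made-up file to a temporary path. In practice you would pass the path of your converted txt file to readLines. There is one caveat to check. With perl = TRUE, "." does not match a newline unless the (?s) flag is set. After joining with '\n', the question's pattern therefore cannot run from a period on one line to a capital letter on the next. This sketch adds (?s) for that reason. Joining with collapse = ' ' instead would be another option.

```r
# Hypothetical input: a temporary file mimicking the PDF-to-txt output
path <- tempfile(fileext = ".txt")
writeLines(c("Thank you.", "", "   ", "Vice President of Investor Relations"), path)

# Read one element per line and drop empty or whitespace-only lines
x <- readLines(path)
x <- x[!grepl("^\\s*$", x)]

# Join the remaining lines into a single string
joined <- paste(x, collapse = "\n")

# (?s) lets "." also match "\n", so a period at the end of one line can be
# matched together with the capital letter that starts the next line
result <- gsub("(?s)(\\..*?[A-Z])", "\\L\\1", joined, perl = TRUE)

writeLines(result)
```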
