Why and where are \n newline characters getting introduced to c()?

Question

Hoping someone can help me understand why errant \n characters are showing up in a vector of strings that I'm creating in R.

I'm trying to import and clean up a very wide data file in fixed-width format (http://www.state.nj.us/education/schools/achievement/2012/njask6/, 'Text file for data runs'). I followed the UCLA tutorial on using read.fwf and this excellent SO question to give the columns names after import.
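For context, the import step described above might look roughly like the sketch below. The sample rows, the field widths, and the three column names are made up for illustration; the real layout of the NJ ASK file is not given here.

```r
# Hypothetical two-row fixed-width file standing in for the real data file.
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("01ATLANTIC  0010",
             "03BERGEN    0020"), fwf_file)

# Assumed field widths: 2-character code, 10-character name, 4-character code.
widths <- c(2, 10, 4)

# colClasses = "character" is passed on to read.table() and keeps
# leading zeros in the code fields.
dat <- read.fwf(fwf_file, widths = widths, colClasses = "character")

# Assign the column names after the import, as described in the question.
names(dat) <- c("County Code", "County Name", "District Code")

str(dat)
```

Because the names are assigned from a character vector, any stray character in that vector ends up in the column names and will break later subsetting by name.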

Because the file is really wide, the column headers are LONG: all together, just under 29,800 characters. I'm passing them in as a simple vector of strings:

column_names <- c(...)

I'll spare you the ugly dump here, but I dropped the whole thing on pastebin.

I was cleaning up and transforming some of the variables for analysis when I noticed that some of my subsets were returning 0 rows. After puzzling over it (did I misspell something?) I realized that somehow a bunch of '\n' newline characters had been introduced into my column headers.

If I loop over the column_names vector that I created

for (i in 1:length(column_names)) {
  print(column_names[i])
}

I see the first newline character in the middle of the 81st line:

SPECIAL\nEDUCATION SCIENCE Number Enrolled Science
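Rather than reading through the printed loop output, the affected elements can be located directly. This is a minimal sketch; the three-element vector is a hypothetical stand-in for the real header vector, with one element containing an embedded newline like the one shown above.

```r
# Hypothetical stand-in for the real header vector.
column_names <- c("County Name",
                  "SPECIAL\nEDUCATION SCIENCE Number Enrolled Science",
                  "School Name")

# Positions of the elements that contain a newline character.
# fixed = TRUE matches the newline literally instead of as a regex.
bad <- grep("\n", column_names, fixed = TRUE)
bad

# print() shows the escaped form, so the \n is visible in the output.
print(column_names[bad])
```

Run against the real vector, the positions in `bad` show where the newlines were inserted.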

Avenues that I tried to resolve this:

1) Is it something about my environment? I'm using the regular script editor in R, and my lines do wrap, but the breaks on my screen don't match the placement of the \n characters, which suggests to me that it's not the R script editor.

2) Is there a GUI setting? I did some searching, but couldn't find anything.

3) Is there a pattern? It seems like the newline characters get inserted about every 4000 characters. I did some reading on R/S primitives to try to figure out whether this had something to do with basic R data structures, but was pretty quickly in over my head.

I tried breaking up the long string into shorter chunks and then combining them, and that seemed to solve the problem.

column_names.1 <- c(...)
column_names.2 <- c(...)
column_names_combined <- c(column_names.1, column_names.2)

So I have an immediate workaround, but would love to know what's actually going on here.

Some of the posts dealing with character vector problems suggested that I run a memory profile:

  memory.profile()
        NULL      symbol    pairlist     closure environment     promise 
           1        9572      220717        4734        1379        5764 
    language     special     builtin        char     logical     integer 
       63932         165        1550       18935       10302       30428 
      double     complex   character         ...         any        list 
        2039           1       60058           0           0       20059 
  expression    bytecode externalptr     weakref         raw          S4 
           1       16553         725         150         151        1162 

I'm running R 2.15.1 (64-bit) on Windows 7 (Enterprise, SP 1, 8 GB RAM). Thanks!

Recommended answer

I doubt this is a bug. Instead, it looks like you're running into a known limitation of the console. As it says in Section 1.8 (R commands, case sensitivity, etc.) of An Introduction to R:

Command lines entered at the console are limited[3] to about 4095 bytes (not characters).

[3] some of the consoles will not allow you to enter more, and amongst those which do some will silently discard the excess and some will use it as the start of the next line.
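To see whether a given command would run into that limit, its length in bytes can be measured before it is entered. The sketch below builds a hypothetical one-line command as a string purely for measurement; the header text and the count of 300 are made up, and 4095 is the approximate figure from the quoted documentation.

```r
# A hypothetical one-line command, built as a string so it can be measured.
cmd <- paste0("column_names <- c(",
              paste0('"', rep("SOME LONG HEADER", 300), '"', collapse = ", "),
              ")")

nchar(cmd, type = "chars")  # length in characters
nchar(cmd, type = "bytes")  # length in bytes, which is what the limit counts

# TRUE when the command is too long to enter as a single console line.
too_long <- nchar(cmd, type = "bytes") > 4095
too_long
```

For plain ASCII text the two counts are the same; they differ only when the command contains multi-byte characters.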

Either put the command in a file and source it, or break the code into multiple lines by inserting your own newlines at appropriate points (between commas). For example:

column_names <-
  c("County Code/DFG/Aggregation Code", "District Code", "School Code",
    "County Name", "District Name", "School Name", "DFG", "Special Needs",
    "TOTAL POPULATION TOTAL POPULATION Number Enrolled LAL", ...)

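The other option, putting the command in a file and sourcing it, could look like the sketch below. The temporary file and the shortened list of names are for illustration only; in practice the script would be an ordinary .R file saved from an editor.

```r
# Write a shortened, hypothetical definition to a temporary script file.
script <- tempfile(fileext = ".R")
writeLines(c(
  'column_names <-',
  '  c("County Code/DFG/Aggregation Code", "District Code", "School Code",',
  '    "County Name", "District Name", "School Name", "DFG", "Special Needs")'
), script)

# source() reads and evaluates the file instead of taking console input.
source(script)

length(column_names)

# Check that no element picked up an embedded newline.
any(grepl("\n", column_names, fixed = TRUE))
```

Keeping each line of the script well below the console limit also means the same code can be pasted into the console without the problem coming back.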