R skips lines from /dev/stdin


Problem description

I have a file containing a list of numbers (you can create it yourself with: for x in $(seq 10000); do echo $x; done > file).

$> R -q -e "x <- read.csv('file', header=F); summary(x);"

> x <- read.csv('file', header=F); summary(x);
       V1       
 Min.   :    1  
 1st Qu.: 2501  
 Median : 5000  
 Mean   : 5000  
 3rd Qu.: 7500  
 Max.   :10000  

Now, one might expect that piping the file through cat and reading it from /dev/stdin would produce the same output, but it does not:

$> cat file | R -q -e "x <- read.csv('/dev/stdin', header=F); summary(x);"
> x <- read.csv('/dev/stdin', header=F); summary(x);
       V1       
 Min.   :    1  
 1st Qu.: 3281  
 Median : 5520  
 Mean   : 5520  
 3rd Qu.: 7760  
 Max.   :10000 

table(x) shows that a large block of lines was skipped:

    1  1042  1043  1044  1045  1046  1047  1048  1049  1050  1051  1052  1053 
    1     1     1     1     1     1     1     1     1     1     1     1     1 
 1054  1055  1056  1057  1058  1059  1060  1061  1062  1063  1064  1065  1066 
    1     1     1     1     1     1     1     1     1     1     1     1     1
 ...

It looks like R is doing something odd with stdin, since this Python one-liner correctly prints every line in the file:

cat file | python3 -c 'with open("/dev/stdin") as f: print(f.read())'


There is a seemingly related question, but it is about skipped lines in a malformed CSV file, whereas my input is just a list of numbers.

Recommended answer

head --bytes=4K file |tail -n 3

produces this:

1039
1040
104

This suggests that R creates a 4 KB input buffer on /dev/stdin and fills it during initialisation. When your R code then reads /dev/stdin, it starts partway through the file, at this point:

   1
1042
1043
...
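
The byte arithmetic behind this can be checked without R. The following is only a sketch: it assumes the buffer is exactly 4096 bytes (inferred from the head/tail output above, not from R's source) and simply splits an in-memory copy of the file at that offset. The names BUFFER_SIZE, consumed and remaining are illustrative.

```python
# Rebuild the test file in memory: the numbers 1..10000, one per line.
data = "".join(f"{n}\n" for n in range(1, 10001)).encode("ascii")

BUFFER_SIZE = 4096  # assumption: 4 KB, inferred from the head/tail output

consumed = data[:BUFFER_SIZE]   # bytes the assumed buffer swallowed
remaining = data[BUFFER_SIZE:]  # bytes a later read of /dev/stdin would see

# Byte budget: each line is its digits plus one newline.
# 1..9 take 2 bytes each, 10..99 take 3, 100..999 take 4.
short_lines = 9 * 2 + 90 * 3 + 900 * 4
# Four-digit lines take 5 bytes each; see how many fit in what is left.
full_4digit, leftover = divmod(BUFFER_SIZE - short_lines, 5)

print(short_lines, full_4digit, leftover)    # 3888 41 3
print(consumed.decode().splitlines()[-3:])   # ['1039', '1040', '104']
print(remaining.decode().splitlines()[:3])   # ['1', '1042', '1043']
```

So 1000 to 1040 fit completely (41 lines), the next 3 bytes are the "104" of 1041, and the first thing left in the stream is the trailing "1" of that line.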

Indeed, if you replace line 1041 of the file with 1043, table(x) shows a "3" instead of a "1":

3  1042  1043  1044  1045  1046  1047  1048  1049  1050  1051  1052  1053 
1     1     1     1     1     1     1     1     1     1     1     1     1 
...

The first 1 in table(x) is actually the last digit of 1041. The first 4 KB of the file have been consumed.
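
As a rough cross-check of this explanation against the numbers in the question, the sketch below drops the first 4096 bytes (again an assumed buffer size) and summarises what is left. The helper surviving_values is hypothetical and only mimics the effect; it does not call R.

```python
import statistics
from collections import Counter


def surviving_values(lines, buffer_size=4096):
    """Return the integers left once the first buffer_size bytes are dropped."""
    data = "".join(f"{line}\n" for line in lines).encode("ascii")
    return [int(token) for token in data[buffer_size:].decode().split()]


original = [str(n) for n in range(1, 10001)]
values = surviving_values(original)

# Compare with the skewed summary(x) from the question (Median and Mean near 5520).
print(len(values), min(values), max(values))   # 8960 1 10000
print(statistics.median(values))               # 5520.5
print(round(statistics.mean(values), 2))       # 5520.38

# Variant from the answer: line 1041 replaced by 1043 (same byte length).
modified = original.copy()
modified[1040] = "1043"
counts = Counter(surviving_values(modified))
print(counts[3], counts[1], counts[1043])      # 1 0 1
```

The median and mean agree with the summary(x) output obtained via /dev/stdin, and the modified file yields a lone 3 in place of the lone 1, as in the table above.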
