R skips lines from /dev/stdin
Question
I have a file with a list of numbers (make it for yourself: for x in $(seq 10000); do echo $x; done > file).
$> R -q -e "x <- read.csv('file', header=F); summary(x);"
> x <- read.csv('file', header=F); summary(x);
V1
Min. : 1
1st Qu.: 2501
Median : 5000
Mean : 5000
3rd Qu.: 7500
Max. :10000
Now, one might expect that cat-ing the file and reading from /dev/stdin would give the same output, but it does not:
$> cat file | R -q -e "x <- read.csv('/dev/stdin', header=F); summary(x);"
> x <- read.csv('/dev/stdin', header=F); summary(x);
V1
Min. : 1
1st Qu.: 3281
Median : 5520
Mean : 5520
3rd Qu.: 7760
Max. :10000
Using table(x) shows that a bunch of lines were skipped:
1 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053
1 1 1 1 1 1 1 1 1 1 1 1 1
1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066
1 1 1 1 1 1 1 1 1 1 1 1 1
...
It looks like R is doing something funny with stdin, as this Python one-liner properly prints all the lines in the file:
cat file | python -c 'with open("/dev/stdin") as f: print(f.read())'
This question seems related, but it is more about skipping lines in a malformed CSV file, whereas my input is just a list of numbers.
Recommended answer
head --bytes=4K file |tail -n 3
produces this:
1039
1040
104
This suggests that R creates a 4KB input buffer on /dev/stdin and fills it during initialisation. When your R code then reads /dev/stdin, it starts reading file at this point:
1
1042
1043
...
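The byte arithmetic behind this can be checked without R. The following is a minimal Python sketch that rebuilds the content of file in memory and splits it at byte 4096. The 4096-byte buffer size is the answer's hypothesis, not something read from R itself, and the names BUFFER_SIZE, consumed and remainder are made up for illustration.

```python
# Rebuild the content of `file`: the numbers 1..10000, one per line.
data = "".join(f"{n}\n" for n in range(1, 10001)).encode("ascii")

# Assumption taken from the answer: 4 KB are consumed before read.csv runs.
BUFFER_SIZE = 4096

consumed, remainder = data[:BUFFER_SIZE], data[BUFFER_SIZE:]

# Equivalent of `head --bytes=4K file | tail -n 3`.
print(consumed.decode("ascii").split("\n")[-3:])

# What a reader that starts after the consumed bytes would parse.
values = [int(line) for line in remainder.decode("ascii").split()]
print(values[:3])
print(len(values), sum(values) / len(values))
```

The consumed prefix ends in the middle of the line 1041, so the remainder begins with a stray 1 followed by 1042. The mean of the remaining values also comes out at about 5520, in line with the summary(x) output in the question.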
Indeed, if you replace the line 1041 in file with 1043, the first entry in table(x) becomes "3" instead of "1":
3 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053
1 1 1 1 1 1 1 1 1 1 1 1 1
...
The first 1 in table(x) is actually the last digit of 1041. The first 4KB of file have been eaten.
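To illustrate how bytes can be "eaten" from a pipe, here is a small Python sketch with two readers on the same pipe. It only models the mechanism the answer proposes and does not inspect R's internals: the first reader stands in for the hypothesised 4KB startup buffer, and the second for the later read of /dev/stdin. The names feed_pipe, eaten and late_reader are illustrative.

```python
import os
import threading

# Same content as `file`: the numbers 1..10000, one per line.
data = "".join(f"{n}\n" for n in range(1, 10001)).encode("ascii")

read_fd, write_fd = os.pipe()

def feed_pipe():
    # Plays the role of `cat file |`: write everything, then close the pipe.
    with os.fdopen(write_fd, "wb") as writer:
        writer.write(data)

feeder = threading.Thread(target=feed_pipe)
feeder.start()

# First reader: take exactly 4096 bytes off the pipe and never hand them on.
# This stands in for the buffer the answer believes R fills at startup.
eaten = b""
while len(eaten) < 4096:
    eaten += os.read(read_fd, 4096 - len(eaten))

# Second reader: a fresh file object on the same pipe. It can only see
# what is still in the pipe, not what the first reader already took.
with os.fdopen(read_fd, "rb") as late_reader:
    remaining = late_reader.read()
feeder.join()

lines = remaining.decode("ascii").split()
print(eaten[-4:], lines[:3])
```

A pipe has a single read position, so whatever one reader takes is gone for every other reader. In this sketch the second reader starts at the trailing 1 of 1041.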