Raku: is there a SUPER fast way to turn an array into a string without the spaces separating the elements?

Question

I need to convert thousands of binary byte strings, each about a megabyte long, into ASCII strings. This is what I have been doing, and it seems too slow:

sub fileToCorrectUTF8Str ($fileName) { # binary file
    my $finalString = "";
    my $fileBuf = slurp($fileName, :bin);    
    for @$fileBuf { $finalString = $finalString ~ $_.chr; };    
    return $finalString;
}

~@b turns @b into a string with all elements separated by spaces, but this is not what I want. If @b = < a b c d >, then ~@b is "a b c d"; but I just want "abcd", and I want to do this REALLY fast.

So, what is the best way? I can't really use hyper for parallelism because the final string is constructed sequentially. Or can I?

Recommended answer

TL;DR On an old rakudo, .decode is about 100 times as fast.

In longer form to match your code:

sub fileToCorrectUTF8Str ($fileName) { # binary file
  slurp($fileName, :bin).decode
}
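
One point worth checking before swapping the loop for .decode: the loop in the question maps each byte to the codepoint with the same number, which corresponds to ISO-8859-1 (Latin-1) decoding, whereas .decode with no argument assumes UTF-8 and may reject arbitrary binary data as malformed. The following is a minimal sketch, assuming byte-for-byte conversion is what is wanted; the sample bytes and the variable names are made up for illustration.

```raku
# Hypothetical sample data: four bytes, one of them (233) above the ASCII range.
my $blob = Blob.new(72, 105, 233, 10);

# The question's approach: one .chr call per byte, then glue the pieces together.
my $slow = $blob.list.map(*.chr).join;

# A single call that names the encoding explicitly.
my $fast = $blob.decode('iso-8859-1');

# Compare the two results.
say $slow eq $fast;
```

If the files are known to contain valid UTF-8 text, the plain .decode shown above is the natural choice instead.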

Performance notes

First, here's what I wrote for testing:

# Create a file one million and one bytes long:
spurt 'foo', "1234\n6789\n" x 1e5 ~ 'Z', :bin;

# (`say` the last character to check work is done)
say .decode.substr(1e6) with slurp 'foo', :bin;

# say fileToCorrectUTF8Str('foo').substr(1e6);

say now - INIT now;

On TIO.run's 2018.12 rakudo, the above .decode weighs in at about 0.05 seconds per million-byte file instead of about 5 seconds for your solution.

You could/should of course test on your system and/or using later versions of rakudo. I would expect the difference to remain in the same order, but for the absolute times to improve markedly as the years roll by.[1]

Why is it 100 times faster?

Well, first, @ on a Buf / Blob explicitly forces raku to view the erstwhile single item (a buffer) as a plural thing (a list of elements, aka multiple items). That means high level iteration which, for a million element buffer, is immediately a million high level iterations/operations instead of just one high level operation.

Second, using .decode not only avoids iteration but also incurs the relatively slow method call overhead only once per file, whereas when iterating there are potentially a million .chr calls per file. Method calls are (at least semantically) late-bound, which is in principle relatively costly compared to, for example, calling a sub instead of a method (subs are generally early-bound).
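
The two points above can be observed with a rough timing sketch like the one below. It is only an illustration under stated assumptions: the buffer is random data of an arbitrary size, 'iso-8859-1' is chosen so that any byte value decodes, and the numbers printed will vary by machine and rakudo version.

```raku
# Hypothetical test data: 100_000 random byte values.
my $blob = Blob.new: (^256).roll(100_000);

# Per-element iteration with one .chr method call per byte.
my $t0 = now;
my $slow = '';
$slow ~= .chr for $blob.list;
my $t1 = now;

# One high level operation on the whole buffer.
my $fast = $blob.decode('iso-8859-1');
my $t2 = now;

say "per-byte .chr: { $t1 - $t0 } seconds";
say "single .decode: { $t2 - $t1 } seconds";
```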

That all said:

  • Remember Caveat Empty[1]. For example, rakudo's standard classes generate method caches, and it's plausible the compiler just inlines the method anyway, so it's possible there is negligible overhead for the method call aspect.

See also the doc's Performance page, especially Use existing high performance code.

Update: see Liz++'s comment.

If you try to use .Str on a Buf or Blob (or equivalent, such as using the ~ prefix on it) you'll get an exception. Currently the message is:

Cannot use a Buf as a string, but you called the Str method on it

The Buf/Blob.Str doc currently says:

In order to convert to a Str you need to use .decode.

It's arguably LTA (less than awesome) that the error message doesn't suggest the same thing.

Then again, before deciding what to do about this, if anything, we need to consider what, and how, folk could learn from anything that goes wrong, including signals about it, such as error messages, and also what and how they do in fact currently learn, and bias our reactions toward building the right culture and infrastructure.

In particular, if folk can easily connect between an error message they see, and online discussion that elaborates on it, that needs to be taken into account and perhaps encouraged and/or made easier.

For example, there's now this SO covering this issue with the error message in it, so a google is likely to get someone here. Leaning on that might well be a more appropriate path forward than changing the error message. Or it might not. The change would be easy...

Please consider commenting below and/or searching existing rakudo issues to see if improvement of the Buf.Str error message is being considered and/or whether you wish to open an issue to propose it be altered. Every rock moved is at least great exercise, and, as our collective effort becomes increasingly wise, improves (our view of) the mountain.

[1] As the well known Latin saying Caveat Empty goes, both absolute and relative performance of any particular raku feature, and more generally any particular code, is always subject to variation due to factors including one's system's capabilities, its load during the time it's running the code, and any optimization done by the compiler. Thus, for example, if your system is "empty", then your code may run faster. Or, as another example, if you wait a year or three for the compiler to get faster, advances in rakudo's performance continue to look promising.
