What are the performance impacts of 'functional' Rust?


Question


I am following the Rust track on Exercism.io. I have a fair amount of C/C++ experience. I like the 'functional' elements of Rust but I'm concerned about the relative performance.

My solution to the run-length encoding problem:

pub fn encode(source: &str) -> String {
    let mut retval = String::new();
    let firstchar = source.chars().next();
    let mut currentchar = match firstchar {
        Some(x) => x,
        None => return retval,
    };
    let mut currentcharcount: u32 = 0;
    for c in source.chars() {
        if c == currentchar {
            currentcharcount += 1;
        } else {
            if currentcharcount > 1 {
                retval.push_str(&currentcharcount.to_string());
            }
            retval.push(currentchar);
            currentchar = c;
            currentcharcount = 1;
        }
    }
    if currentcharcount > 1 {
        retval.push_str(&currentcharcount.to_string());
    }
    retval.push(currentchar);
    retval
}
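
For reference, both versions implement the same convention: only runs longer than one character get a count prefix. A quick check with example strings of my own:

fn main() {
    // Single characters are emitted as-is; longer runs get a count prefix.
    assert_eq!(encode("aaabccdeeee"), "3ab2cd4e");
    assert_eq!(encode(""), "");
}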


I noticed that one of the top-rated answers looked more like this:

extern crate itertools;

use itertools::Itertools;

pub fn encode(data: &str) -> String {
    data.chars()
        .group_by(|&c| c)
        .into_iter()
        .map(|(c, group)| match group.count() {
            1 => c.to_string(),
            n => format!("{}{}", n, c),
        })
        .collect()
}


I love the top rated solution; it is simple, functional, and elegant. This is what they promised me Rust would be all about. Mine on the other hand is gross and full of mutable variables. You can tell I'm used to C++.


My problem is that the functional style has a SIGNIFICANT performance impact. I tested both versions with the same 4MB of random data, encoded 1000 times. My imperative solution took under 10 seconds; the functional solution took about 2 minutes 30 seconds.

  • Why is the functional style so much slower than the imperative style?
  • Is there some problem with the functional implementation which is causing such a huge slowdown?
  • If I want to write high performance code, should I ever use this functional style?

Recommended answer

TL;DR


A functional implementation can be faster than your original procedural implementation, in certain cases.


Why is the functional style so much slower than the imperative style? Is there some problem with the functional implementation which is causing such a huge slowdown?


As Matthieu M. already pointed out, the important thing to note is that the algorithm matters. How that algorithm is expressed (procedural, imperative, object-oriented, functional, declarative) generally doesn't matter.


I see two main issues with the functional code:

  • Allocating lots of strings over and over is inefficient. In the original functional implementation, this is done via to_string and format! (see the sketch just after this list).
  • There's the overhead of using group_by, which exists to give a nested iterator, which you don't need just to get the counts.
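
To isolate the first point, here is a minimal sketch of my own (not from the original answer; encode_group_by_fold is a made-up name) that keeps group_by but appends into a single String instead of allocating a new String per group:

use itertools::Itertools; // 0.8.0, same as the code below
use std::fmt::Write;

pub fn encode_group_by_fold(data: &str) -> String {
    data.chars()
        .group_by(|&c| c)
        .into_iter()
        // Fold into one output String so each group no longer allocates
        // its own String via to_string/format!.
        .fold(String::new(), |mut s, (c, group)| {
            match group.count() {
                1 => s.push(c),
                n => write!(&mut s, "{}{}", n, c).unwrap(),
            }
            s
        })
}

This still pays the group_by overhead (the second point), so it only shows how much of the slowdown comes from the allocations alone.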


Using more of itertools (batching, take_while_ref, format_with) brings the two implementations much closer:

pub fn encode_slim(data: &str) -> String {
    data.chars()
        .batching(|it| {
            it.next()
                .map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
        })
        .format_with("", |(c, count), f| match count {
            1 => f(&c),
            n => f(&format_args!("{}{}", n, c)),
        })
        .to_string()
}


A benchmark of 4MiB of random alphanumeric data, compiled with RUSTFLAGS='-C target-cpu=native':

encode (procedural)     time:   [21.082 ms 21.620 ms 22.211 ms]

encode (fast)           time:   [26.457 ms 27.104 ms 27.882 ms]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe


If you are interested in creating your own iterator, you can mix-and-match the procedural code with more functional code:

struct RunLength<I> {
    iter: I,
    saved: Option<char>,
}

impl<I> RunLength<I>
where
    I: Iterator<Item = char>,
{
    fn new(mut iter: I) -> Self {
        let saved = iter.next(); // See footnote 1
        Self { iter, saved }
    }
}

impl<I> Iterator for RunLength<I>
where
    I: Iterator<Item = char>,
{
    type Item = (char, usize);

    fn next(&mut self) -> Option<Self::Item> {
        let c = self.saved.take().or_else(|| self.iter.next())?;

        let mut count = 1;
        while let Some(n) = self.iter.next() {
            if n == c {
                count += 1
            } else {
                self.saved = Some(n);
                break;
            }
        }

        Some((c, count))
    }
}

pub fn encode_tiny(data: &str) -> String {
    use std::fmt::Write;

    RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
        match count {
            1 => s.push(c),
            n => write!(&mut s, "{}{}", n, c).unwrap(),
        }
        s
    })
}


1 — thanks to Stargateur for pointing out that eagerly getting the first value helps branch prediction.


A benchmark of 4MiB of random alphanumeric data, compiled with RUSTFLAGS='-C target-cpu=native':

encode (procedural)     time:   [19.888 ms 20.301 ms 20.794 ms]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

encode (tiny)           time:   [19.150 ms 19.262 ms 19.399 ms]
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe


I believe this more clearly shows the main fundamental difference between the two implementations: an iterator-based solution is resumable. Every time we call next, we need to see if there was a previous character that we've read (self.saved). This adds a branch to the code that isn't there in the procedural code.


On the flip side, the iterator-based solution is more flexible — we can now compose all sorts of transformations on the data, or write directly to a file instead of a String, etc. The custom iterator can be extended to operate on a generic type instead of char as well, making it very flexible.
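
As a hypothetical sketch of that generalization (my own, not part of the original answer; RunLengthGeneric is a made-up name), the same iterator can be written over any PartialEq item type:

struct RunLengthGeneric<I, T> {
    iter: I,
    saved: Option<T>,
}

impl<I, T> RunLengthGeneric<I, T>
where
    I: Iterator<Item = T>,
{
    fn new(mut iter: I) -> Self {
        // Eagerly pull the first item, mirroring the char-based version.
        let saved = iter.next();
        Self { iter, saved }
    }
}

impl<I, T> Iterator for RunLengthGeneric<I, T>
where
    I: Iterator<Item = T>,
    T: PartialEq,
{
    type Item = (T, usize);

    fn next(&mut self) -> Option<Self::Item> {
        // Use the item saved by the previous call, if any, otherwise advance.
        let v = self.saved.take().or_else(|| self.iter.next())?;

        let mut count = 1;
        while let Some(n) = self.iter.next() {
            if n == v {
                count += 1;
            } else {
                self.saved = Some(n);
                break;
            }
        }

        Some((v, count))
    }
}

For example, RunLengthGeneric::new(b"aaab".iter().copied()) would yield (b'a', 3) and then (b'b', 1).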



If I want to write high performance code, should I ever use this functional style?


I would, until benchmarking shows that it's the bottleneck. Then evaluate why it's the bottleneck.


Always got to show your work, right?

benchmark.rs

use criterion::{criterion_group, criterion_main, Criterion}; // 0.2.11
use rle::*;

fn criterion_benchmark(c: &mut Criterion) {
    let data = rand_data(4 * 1024 * 1024);

    c.bench_function("encode (procedural)", {
        let data = data.clone();
        move |b| b.iter(|| encode_proc(&data))
    });

    c.bench_function("encode (functional)", {
        let data = data.clone();
        move |b| b.iter(|| encode_iter(&data))
    });

    c.bench_function("encode (fast)", {
        let data = data.clone();
        move |b| b.iter(|| encode_slim(&data))
    });

    c.bench_function("encode (tiny)", {
        let data = data.clone();
        move |b| b.iter(|| encode_tiny(&data))
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

lib.rs

use itertools::Itertools; // 0.8.0
use rand; // 0.6.5

pub fn rand_data(len: usize) -> String {
    use rand::distributions::{Alphanumeric, Distribution};
    let mut rng = rand::thread_rng();
    Alphanumeric.sample_iter(&mut rng).take(len).collect()
}

pub fn encode_proc(source: &str) -> String {
    let mut retval = String::new();
    let firstchar = source.chars().next();
    let mut currentchar = match firstchar {
        Some(x) => x,
        None => return retval,
    };
    let mut currentcharcount: u32 = 0;
    for c in source.chars() {
        if c == currentchar {
            currentcharcount += 1;
        } else {
            if currentcharcount > 1 {
                retval.push_str(&currentcharcount.to_string());
            }
            retval.push(currentchar);
            currentchar = c;
            currentcharcount = 1;
        }
    }
    if currentcharcount > 1 {
        retval.push_str(&currentcharcount.to_string());
    }
    retval.push(currentchar);
    retval
}

pub fn encode_iter(data: &str) -> String {
    data.chars()
        .group_by(|&c| c)
        .into_iter()
        .map(|(c, group)| match group.count() {
            1 => c.to_string(),
            n => format!("{}{}", n, c),
        })
        .collect()
}

pub fn encode_slim(data: &str) -> String {
    data.chars()
        .batching(|it| {
            it.next()
                .map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
        })
        .format_with("", |(c, count), f| match count {
            1 => f(&c),
            n => f(&format_args!("{}{}", n, c)),
        })
        .to_string()
}

struct RunLength<I> {
    iter: I,
    saved: Option<char>,
}

impl<I> RunLength<I>
where
    I: Iterator<Item = char>,
{
    fn new(mut iter: I) -> Self {
        let saved = iter.next();
        Self { iter, saved }
    }
}

impl<I> Iterator for RunLength<I>
where
    I: Iterator<Item = char>,
{
    type Item = (char, usize);

    fn next(&mut self) -> Option<Self::Item> {
        let c = self.saved.take().or_else(|| self.iter.next())?;

        let mut count = 1;
        while let Some(n) = self.iter.next() {
            if n == c {
                count += 1
            } else {
                self.saved = Some(n);
                break;
            }
        }

        Some((c, count))
    }
}

pub fn encode_tiny(data: &str) -> String {
    use std::fmt::Write;

    RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
        match count {
            1 => s.push(c),
            n => write!(&mut s, "{}{}", n, c).unwrap(),
        }
        s
    })
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn all_the_same() {
        let data = rand_data(1024);

        let a = encode_proc(&data);
        let b = encode_iter(&data);
        let c = encode_slim(&data);
        let d = encode_tiny(&data);

        assert_eq!(a, b);
        assert_eq!(a, c);
        assert_eq!(a, d);
    }
}
