Rust vs Python program performance results question

Problem description

I wrote a program that counts words.

Here is the program:

use std::collections::HashMap;
use std::io;
use std::io::prelude::*;

#[derive(Debug)]
struct Entry {
    word: String,
    count: u32,
}

static SEPARATORS: &'static [char] = &[
    ' ', ',', '.', '!', '?', '\'', '"', '\n', '(', ')', '#', '{', '}', '[', ']', '-', ';', ':',
];

fn main() {
    if let Err(err) = try_main() {
        if err.kind() == std::io::ErrorKind::BrokenPipe {
            return;
        }
        // Ignore any error that may occur while writing to stderr.
        let _ = writeln!(std::io::stderr(), "{}", err);
    }
}

fn try_main() -> Result<(), std::io::Error> {
    let mut words: HashMap<String, u32> = HashMap::new();
    let stdin = io::stdin();
    for result in stdin.lock().lines() {
        let line = result?;
        line_processor(line, &mut words)
    }
    output(&mut words)?;
    Ok(())
}

fn line_processor(line: String, words: &mut HashMap<String, u32>) {
    let mut word = String::new();

    for c in line.chars() {
        if SEPARATORS.contains(&c) {
            add_word(word, words);
            word = String::new();
        } else {
            word.push_str(&c.to_string());
        }
    }
}

fn add_word(word: String, words: &mut HashMap<String, u32>) {
    if word.len() > 0 {
        if words.contains_key::<str>(&word) {
            words.insert(word.to_string(), words.get(&word).unwrap() + 1);
        } else {
            words.insert(word.to_string(), 1);
        }
        // println!("word >{}<", word.to_string())
    }
}

fn output(words: &mut HashMap<String, u32>) -> Result<(), std::io::Error> {
    let mut stack = Vec::<Entry>::new();

    for (k, v) in words {
        stack.push(Entry {
            word: k.to_string(),
            count: *v,
        });
    }

    stack.sort_by(|a, b| b.count.cmp(&a.count));
    stack.reverse();

    let stdout = io::stdout();
    let mut stdout = stdout.lock();
    while let Some(entry) = stack.pop() {
        writeln!(stdout, "{}\t{}", entry.count, entry.word)?;
    }
    Ok(())
}

It takes some arbitrary text file as input and counts the words to produce output like:

15  the
14  in
11  are
10  and
10  of
9   species
9   bats
8   horseshoe
8   is
6   or
6   as
5   which
5   their

I compile it like this:

cargo build --release

I run it like this:

cat wiki-sample.txt | ./target/release/wordstats  | head -n 50

The wiki-sample.txt file I use is here.

I compared the execution time with the Python (3.8) version, which is:

import sys
from collections import defaultdict

# import unidecode

seps = set(
    [
        " ",
        ",",
        ".",
        "!",
        "?",
        "'",
        '"',
        "\n",
        "(",
        ")",
        "#",
        "{",
        "}",
        "[",
        "]",
        "-",
        ";",
        ":",
    ]
)


def out(result):
    for i in result:
        print(f"{i[1]}\t{i[0]}")


if __name__ == "__main__":
    c = defaultdict(int)

    for line in sys.stdin:
        words = line.split(" ")
        for word in words:
            clean_word = []
            for char in word:
                if char not in seps and char:
                    clean_word.append(char)
            r = "".join(clean_word)
            # r = unidecode.unidecode(r)
            if r:
                c[r] += 1

    r = sorted(list(c.items()), key=lambda x: -x[1])
    try:
        out(r)
    except BrokenPipeError as e:
        pass

I run it like this:

cat /tmp/t.txt | ./venv/bin/python3 src/main.py | head -n 100

  • The average computation times are: Rust -> 5', Python 3.8 -> 19'.
  • The Python version is (I think) not very optimized (the split over the whole line requires an extra O(n)).
  • It is a single-threaded process, and a very simple program.
  • Most of the computation time is spent in the word-loop processing; the output is almost instantaneous.
  • I also removed the library code that strips accents, in order to stay closer to the standard libraries of both languages.
  • Question: is it normal that Rust performs "only" ~3-4 times better?

    I am also wondering whether I am missing something here, because I find the computation time quite long for "only" 100 MB of data. I don't (naively) think there is an approach with a lower big-O for this, but I might be wrong.

    I am used to comparing some Python code to equivalent code in Go, Java or vlang, and I often see something like a 20x to 100x speed factor in those benchmarks.

    Maybe CPython is good at this kind of processing, or maybe I am missing something in the Rust program (I am very new to Rust) that would make it more efficient.

    I am afraid I am missing something big in my tests, but does anyone have any thoughts on this?

    Following people's advice, I now have the version below:

    use std::collections::HashMap;
    use std::io;
    use std::io::prelude::*;
    
    #[derive(Debug)]
    struct Entry<'a> {
        word: &'a str, // word: String,
        count: u32,
    }
    
    static SEPARATORS: &'static [char] = &[
        ' ', ',', '.', '!', '?', '\'', '"', '\n', '(', ')', '#', '{', '}', '[', ']', '-', ';', ':',
    ];
    
    fn main() {
        if let Err(err) = try_main() {
            if err.kind() == std::io::ErrorKind::BrokenPipe {
                return;
            }
            // Ignore any error that may occur while writing to stderr.
            let _ = writeln!(std::io::stderr(), "{}", err);
        }
    }
    
    fn try_main() -> Result<(), std::io::Error> {
        let mut words: HashMap<String, u32> = HashMap::new();
        let stdin = io::stdin();
        for result in stdin.lock().lines() {
            let line = result?;
            line_processor(line, &mut words)
        }
        output(&mut words)?;
        Ok(())
    }
    
    fn line_processor(line: String, words: &mut HashMap<String, u32>) {
        let mut l = line.as_str();
        loop {
            if let Some(pos) = l.find(|c: char| SEPARATORS.contains(&c)) {
                let (head, tail) = l.split_at(pos);
                add_word(head.to_owned(), words);
                l = &tail[1..];
            } else {
                break;
            }
        }
    }
    
    fn add_word(word: String, words: &mut HashMap<String, u32>) {
        if word.len() > 0 {
            let count = words.entry(word).or_insert(0);
            *count += 1;
        }
    }
    
    fn output(words: &mut HashMap<String, u32>) -> Result<(), std::io::Error> {
        let mut stack = Vec::<Entry>::new();
    
        for (k, v) in words {
            stack.push(Entry {
                word: k.as_str(), // word: k.to_string(),
                count: *v,
            });
        }
    
        stack.sort_by(|a, b| a.count.cmp(&b.count));
    
        let stdout = io::stdout();
        let mut stdout = stdout.lock();
        while let Some(entry) = stack.pop() {
            writeln!(stdout, "{}\t{}", entry.count, entry.word)?;
        }
        Ok(())
    }
    
    

    This now takes around 2.6' on my computer, which is much better: almost 10 times faster than the Python version, but still not quite what I expected (that is not a real problem). There might be some other optimizations I have not thought of yet.
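
    One further micro-optimization, sketched below purely as an illustration (it is not part of the measurements in this post): look the word up as a &str first, and only allocate a String when a new key actually has to be inserted, so words that have already been seen cost no allocation at all.

    fn add_word(word: &str, words: &mut HashMap<String, u32>) {
        if !word.is_empty() {
            // A word that is already present only needs its counter bumped.
            if let Some(count) = words.get_mut(word) {
                *count += 1;
            } else {
                // Allocate only for the first occurrence of a word.
                words.insert(word.to_owned(), 1);
            }
        }
    }

    With this signature, the call in line_processor() would pass head instead of head.to_owned().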

    Recommended answer

    In add_word() you work around the borrowing problems by making new copies of word (.to_string()).

    Instead, you could access the counter you want to increment once and for all:

    let count = words.entry(word).or_insert(0);
    *count += 1;
    

    You could also avoid many String reallocations in line_processor() by working directly on the line as a &str.

    let mut l = line.as_str();
    loop {
        if let Some(pos) = l.find(|c: char| SEPARATORS.contains(&c)) {
            let (head, tail) = l.split_at(pos);
            add_word(head.to_owned(), words);
            l = &tail[1..];
        } else {
            break;
        }
    }
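
    An equivalent formulation of this loop, shown only as a sketch, uses str::split() with a character predicate so that each word is a &str slice borrowed from the line. Note one behavioural difference: unlike the find()/split_at() loop above, it also visits the text that follows the last separator on the line.

    fn line_processor(line: String, words: &mut HashMap<String, u32>) {
        // split() with a char predicate yields &str slices of the original
        // line, so no intermediate String is built while scanning.
        for word in line.split(|c: char| SEPARATORS.contains(&c)) {
            // add_word() already ignores the empty slices produced between
            // consecutive separators.
            add_word(word.to_owned(), words);
        }
    }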
    

    When it comes to the output() function, new copies of the strings are made in order to initialize the Entry struct. We could change Entry to

    #[derive(Debug)]
    struct Entry<'a> {
        word: &'a str,  // word: String,
        count: u32,
    }
    

    and then only work on the &str slices inside the original strings (in words).

    stack.push(Entry {
        word: k.as_str(), // word: k.to_string(),
        count: *v,
    });
    

    Moreover, the in-place reversal of the sorted vector can be avoided if we invert the sorting criterion.

    stack.sort_by(|a, b| a.count.cmp(&b.count));
    // stack.reverse();
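
    An equivalent alternative, sketched here assuming the Entry<'_> struct and imports of the updated program above: sort by descending count with std::cmp::Reverse, then print the vector front to back, which removes both the reverse() and the pop() loop.

    use std::cmp::Reverse;

    fn output(words: &mut HashMap<String, u32>) -> Result<(), std::io::Error> {
        let mut stack: Vec<_> = words
            .iter()
            .map(|(k, v)| Entry { word: k.as_str(), count: *v })
            .collect();

        // Reverse(count) sorts the largest counts first, so the vector can
        // simply be printed from front to back.
        stack.sort_by_key(|e| Reverse(e.count));

        let stdout = io::stdout();
        let mut stdout = stdout.lock();
        for entry in &stack {
            writeln!(stdout, "{}\t{}", entry.count, entry.word)?;
        }
        Ok(())
    }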
    

    I guess these are the main bottlenecks in this example.

    On my computer, timing with <wiki-sample.txt >/dev/null gives these speedups:

    original -->  × 1 (reference)
    using l.find()+l.split_at() --> × 1.48
    using words.entry() --> × 1.25
    using both l.find()+l.split_at() and words.entry() --> × 1.73
    using all the preceding and &str in Entry and avoiding reverse --> × 2.05
    
