可能的优化或使用并行计算 [英] Optimization possible or use parallel computing

查看:139
本文介绍了可能的优化或使用并行计算的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这个问题,我需要找到等于一个数字的权力总数。例如:

输入
100 2 会产生<$ c $的输出因为 100 = 10 ^ 2 = 6 ^ 2 + 8 ^ 2 = 1 ^ 2 + 3 ^ 2 + 4 ^ 2 + 5 ^ 2 + 7 ^ 2因此,c> 3 并且 100 3 的输入将产生1的输出,因为 100 = 1 ^ 3 + 2 ^ 3 + 3 ^ 3 + 4 ^ 3



所以我解决这个问题的函数是:

  findNums :: Int  - > Int  - > Int 
findNums a b = length [xs | xs< - (下降1.)子序列[pow x b | x < - [1..c]],foldr(+)(head xs)(tail xs)== a]其中c = root a b 0

root :: Int - > Int - > Int - > Int
root n a i
| pow i a< = n&& pow(i + 1)a> n = i
|否则=根n a(i + 1)

pow :: Int - > Int - > Int
pow_0 = 1
pow na = n * pow n(a - 1)

我找到了所有可能的值,这些值都可以在我的数字集合中加起来到所需的数字。然后我在该列表中找到所有可能的子列表,并查看其中有多少加起来为所需的数字。这是有效的,但由于它是一个详尽的搜索,所以需要很长时间才能输入 800 2 。是否有可能优化序列,使得只有可信的子序列被返回?或者为这类问题寻找并行计算更好?

解决方案

让我们浏览一下几个方面。 / p>

标杆管理



首先:让我们确保我们实际上在进行改进!为此,我们需要一些基准。 标准包非常适合这一点。我们还会确保使用优化进行编译(所有对GHC的调用都是 -O2 )。以下是设置基准的简单方法:

  import Criterion.Main 

- 您的代码(nf(uncurry findNums)(100,2))
,benchfindNums 800 2(nf(unfry findNums)(100,2))
main = defaultMain
[ uncurry findNums)(800,2))
]

nf(findNums 100)2 ,但我选择这种方式,这样我们就不会通过预先计算 100
$ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $'$'
1.000R²(1.000R².. 1.000R²)
表示762.5 ns(760.4 ns .. 765.3 ns)
std dev 7.706 ns(6.378 ns .. 10.59 ns)

基准800 2
时间29.17 s(28.28 s .. 29.87 s)
1.000R²(1.000R².. 1.000R²)
表示29.26 s(29.08 s .. 29.35 s)
std dev 159.2 ms(0.0 s .. 165.2 ms)
由离群值引入的变化:19%(适度膨胀)



使用库



现在,低悬的成果是使用现有的事物实现,并希望他们的作者做得比我们更好。为此,我们将使用标准函数(^)而不是 pow arithmoi 包而不是> integerRoot 。另外,我们会将一个严格的 foldl 换成懒惰的 foldr 。为了我自己的理智,我也将这条很长的路线改为短路线。现在完整的结果如下所示:

  import Criterion.Main 
import Data.List
import Math .NumberTheory.Powers

sum':: Num a => [a] - > a
sum'= foldl'(+)0

findNums :: Int - > Int - > Int
findNums a b =长度
[xs
| xs< - 下降1。子序列$ [x ^ b | x< - [1..c]]
,sum'xs == a
]其中c = integerRoot ba

main = defaultMain
[bench] 100 2(nf(uncurry findNums)(100,2))
,替补800 2(nf(uncurry findNums)(800,2))
]


$ b

现在的基准测试结果如下:

 基准100 2 
时间722.8 ns(721.3 ns .. 724.3 ns)
1.000R²(1.000R².. 1.000R²)
表示722.6 ns(721.4 ns .. 724.1 ns)
std dev 4.440 ns(3.738 ns .. 5.674 ns)

基准线800 2
时间17.16 s(16.93 s .. 17.64 s)
1.000R²(1.000R² .. 1.000R²)
mean 17.05 s(16.99 s .. 17.15 s)
std dev 88.10 ms(0.0 s .. 94.58 ms)

稍微低一点点,花费很少。很好!

更好的算法



子序列的一个重大问题 sum'[x,y,z]>一个,我们仍然看所有以 [x,y,z] 开头的更长的子序列。给定子序列返回类型的结构,我们可以做的事情不多;所以让我们设计一个给我们更多结构的实现。我们将构建一棵树,其中从根节点到任何内部节点的路径都会给我们一个子序列。

  import Data.Tree 

subsequences :: [a] - > Forest a
subsequences [] = []
subsequences(x:xs)= Node x rest:rest其中
rest =子序列xs
pre
$ b

(仅仅为了好玩,由于激进的子树共享,这会产生指数级大的语义树,其空间使用量非常小 - 与原始列表的空间大致相同。 )这种表现最酷的是,如果我们中断搜索,我们会截断大量无趣的结果。这可以通过为列表实现类似于 takeWhile 的实现:

  takeWhileTree :: Monoid m => (m  - > Bool) - >森林m  - >森林m 
takeWhileTree谓词= goForest mempty其中
goForest m森林=森林>> = goTree m
goTree m(节点m'children)=
[节点m(goForest (m·m')个孩子)|谓词m']

让我们试试看。现在完整的代码是:

  import Criterion.Main 
import Data.Foldable
import Data.Monoid
import Data.Tree
import Math.NumberTheory.Powers

subsequencesTree :: [a] - > Forest a
subsequencesTree [] = []
subsequencesTree(x:xs)= Node x rest:rest其中
rest = subsequencesTree xs

takeWhileTree :: Monoid m => (m - > Bool) - >森林m - >森林m
takeWhileTree predicate = goForest mempty其中
goForest m forest = forest>> = goTree m
goTree m(节点m'children)= let m''= m<> ; m'in
[Node m'(goForest m''children)|谓词m'']

leaves :: Forest a - > [[a]]
leaves [] = [[]]
leaves forest = do
节点x孩子< - 森林
xs < - 留下孩子
return(x:xs)

findNums :: Int - > Int - > Int
findNums a b =长度
[xs
| xs < - 离开
。 takeWhileTree(<= Sum a)
。 subsequencesTree
$ [Sum(x ^ b)| x< - [1..c]]
,fold xs == Sum a
]其中c = integerRoot ba

main = defaultMain
[bench] 100 2(nf(uncurry findNums)(100,2))
,替补800 2(nf(uncurry findNums)(800,2))
]

这看起来像很多工作,但从时间上看,它确实值得回报:

 基准100 2 
时间16.67μs(16.53μs。16.77μs)
0.999R²(0.999R².. 1.000R²)
mean 16.60μs(16.52μs。16.72μs)
std dev 325.4 ns(270.5 ns .. 444.1 ns)异常值导致的
差异:17%(适度膨胀)

基准800 2
时间22.59毫秒(22.26毫秒.. 22.89毫秒)
0.999R²(0.999R².. 1.000R²)
平均值22.44毫秒(22.34毫秒.. 22.57毫秒)
std dev 260.3μs(191.6μs.. 332.2μs)

findNums 800 2 中,这是一个大约1000的加速因子。



并行化



我通过使用 concat parMap code> takeWhileTree 而不是(>> =),这样树的单独分支将同时进行探索。在任何情况下,并行化的开销远远超过拥有多个线程的好处。好的,我们在开始的时候就把这个基准放在了位置上!

I have this problem where I need to find the number of sums of powers that are equal to a number. So for example:

An input of 100 2 would yield an output of 3 because 100 = 10^2 = 6^2 + 8^2 = 1^2 + 3^2 + 4^2 + 5^2 + 7^2 and an input of 100 3 would yield an output of 1 because 100 = 1^3 + 2^3 + 3^3 + 4^3

So my function for solving this problem is:

findNums :: Int -> Int -> Int
findNums a b = length [xs | xs <- (drop 1 .) subsequences [pow x b | x <- [1..c]], foldr (+) (head xs) (tail xs) == a] where c = root a b 0 

root :: Int -> Int -> Int -> Int
root n a i
    | pow i a <= n && pow (i+1) a > n = i
    | otherwise = root n a (i+1)

pow :: Int -> Int -> Int
pow _ 0 = 1
pow n a = n * pow n (a - 1)

I find all the possible values that are able to be in my set of numbers that will add up to the desired number. Then I find all possible sublists inside that list and see how many of those add up to the desired number. This works but since it is an exhaustive search it takes a long time for inputs like 800 2. Is it possible to optimize the sequences such that only the "plausible" subsequences are returned? Or is it better to look at parallel computation for this sort of problem?

解决方案

Let's take a tour through a few things.

Benchmarking

First up: let's make sure we're actually making improvements as we go! For that, we'll need some benchmarks. The criterion package is perfect for this. We'll also make sure to compile with optimizations (so -O2 on all calls to GHC). Here's how simple setting up a benchmark can be:

import Criterion.Main

-- your code goes here

main = defaultMain
    [ bench "findNums 100 2" (nf (uncurry findNums) (100, 2))
    , bench "findNums 800 2" (nf (uncurry findNums) (800, 2))
    ]

One could also implement the benchmark as nf (findNums 100) 2, but I choose this way so that we can't "cheat" by precomputing a lookup table for 100, thus pushing all the work into the benchmark setup rather than the part where the benchmark is actually run. Here's the result for the original implementation:

benchmarking 100 2
time                 762.7 ns   (757.4 ns .. 768.5 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 762.5 ns   (760.4 ns .. 765.3 ns)
std dev              7.706 ns   (6.378 ns .. 10.59 ns)

benchmarking 800 2
time                 29.17 s    (28.28 s .. 29.87 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 29.26 s    (29.08 s .. 29.35 s)
std dev              159.2 ms   (0.0 s .. 165.2 ms)
variance introduced by outliers: 19% (moderately inflated)

Use libraries

Now, the low-hanging fruit is to use existing implementations of things and hope their authors did something better than us. To that end, we'll use the standard function (^) instead of pow, and integerRoot from the arithmoi package instead of root. Additionally, we'll swap out the lazy foldr for a strict foldl. For my own sanity, I also reformatted the very long line into short ones. The full result now looks like this:

import Criterion.Main
import Data.List
import Math.NumberTheory.Powers

sum' :: Num a => [a] -> a
sum' = foldl' (+) 0

findNums :: Int -> Int -> Int
findNums a b = length
    [ xs
    | xs <- drop 1 . subsequences $ [x ^ b | x <- [1..c]]
    , sum' xs == a
    ] where c = integerRoot b a

main = defaultMain
    [ bench "100 2" (nf (uncurry findNums) (100, 2))
    , bench "800 2" (nf (uncurry findNums) (800, 2))
    ]

Benchmark results look like this now:

benchmarking 100 2
time                 722.8 ns   (721.3 ns .. 724.3 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 722.6 ns   (721.4 ns .. 724.1 ns)
std dev              4.440 ns   (3.738 ns .. 5.674 ns)

benchmarking 800 2
time                 17.16 s    (16.93 s .. 17.64 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 17.05 s    (16.99 s .. 17.15 s)
std dev              88.10 ms   (0.0 s .. 94.58 ms)

A little under twice as fast with very little effort. Nice!

Better algorithm

A significant problem with subsequences is that, even if we compute that sum' [x,y,z] > a, we still look at all the longer subsequences that start with [x,y,z]. Given the structure of subsequences' return type, there's not much we can do about that; so let's design an implementation that gives us a bit more structure. We'll build a tree, where paths from the root to any internal node give us a subsequence.

import Data.Tree

subsequences :: [a] -> Forest a
subsequences [] = []
subsequences (x:xs) = Node x rest : rest where
    rest = subsequences xs

(Just for fun, this produces exponentially large semantic trees with very small space usage -- roughly as much space as the original list -- due to aggressive subtree sharing.) What's cool about this representation is if we break off the search, we cut off huge swaths of uninteresting results. This can be realized by implementing something like takeWhile for lists:

takeWhileTree :: Monoid m => (m -> Bool) -> Forest m -> Forest m
takeWhileTree predicate = goForest mempty where
    goForest m forest = forest >>= goTree m
    goTree   m (Node m' children) =
        [Node m (goForest (m <> m') children) | predicate m']

Let's give it a try. Complete code is now:

import Criterion.Main
import Data.Foldable
import Data.Monoid
import Data.Tree
import Math.NumberTheory.Powers

subsequencesTree :: [a] -> Forest a
subsequencesTree [] = []
subsequencesTree (x:xs) = Node x rest : rest where
    rest = subsequencesTree xs

takeWhileTree :: Monoid m => (m -> Bool) -> Forest m -> Forest m
takeWhileTree predicate = goForest mempty where
    goForest m forest = forest >>= goTree m
    goTree   m (Node m' children) = let m'' = m <> m' in
        [Node m' (goForest m'' children) | predicate m'']

leaves :: Forest a -> [[a]]
leaves [] = [[]]
leaves forest = do
    Node x children <- forest
    xs <- leaves children
    return (x:xs)

findNums :: Int -> Int -> Int
findNums a b = length
    [ xs
    | xs <- leaves
          . takeWhileTree (<= Sum a)
          . subsequencesTree
          $ [Sum (x ^ b) | x <- [1..c]]
    , fold xs == Sum a
    ] where c = integerRoot b a

main = defaultMain
    [ bench "100 2" (nf (uncurry findNums) (100, 2))
    , bench "800 2" (nf (uncurry findNums) (800, 2))
    ]

This looks like a lot of work, but from the timings, it really pays off:

benchmarking 100 2
time                 16.67 μs   (16.53 μs .. 16.77 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 16.60 μs   (16.52 μs .. 16.72 μs)
std dev              325.4 ns   (270.5 ns .. 444.1 ns)
variance introduced by outliers: 17% (moderately inflated)

benchmarking 800 2
time                 22.59 ms   (22.26 ms .. 22.89 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 22.44 ms   (22.34 ms .. 22.57 ms)
std dev              260.3 μs   (191.6 μs .. 332.2 μs)

That's a speedup factor of about 1000 on findNums 800 2.

Parallelization

I had a go at parallelizing this by using concat and parMap in takeWhileTree instead of (>>=), so that separate branches of the tree would be explored in parallel. In every case the overhead of parallelizing far outweighed the benefit of having several threads. Good thing we put that benchmark in place at the beginning!

这篇关于可能的优化或使用并行计算的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆