Specialization with Constraints
Problem description
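I'm having problems getting GHC to specialize a function with a class constraint. I have a minimal example of my problem here: Foo.hs and Main.hs. The two files compile (GHC 7.6.2, ghc -O3 Main) and run.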
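NOTE: Foo.hs is really stripped down. If you want to see why the constraint is needed, you can see a little more code here. If I put the code in a single file or make many other minor changes, GHC simply inlines the call to plusFastCyc. This will not happen in the real code because plusFastCyc is too large for GHC to inline, even when marked INLINE. The point is to specialize the call to plusFastCyc, not inline it. plusFastCyc is called in many places in the real code, so duplicating such a large function would not be desirable even if I could force GHC to do it.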
The code of interest is plusFastCyc in Foo.hs, reproduced here:

    {-# INLINEABLE plusFastCyc #-}
    {-# SPECIALIZE plusFastCyc ::
             forall m . (Factored m Int) =>
                  (FastCyc (VT U.Vector m) Int) ->
                       (FastCyc (VT U.Vector m) Int) ->
                            (FastCyc (VT U.Vector m) Int) #-}

    -- Although the next specialization makes `fcTest` fast,
    -- it isn't useful to me in my real program because the phantom type M is reified
    -- {-# SPECIALIZE plusFastCyc ::
    --          FastCyc (VT U.Vector M) Int ->
    --               FastCyc (VT U.Vector M) Int ->
    --                    FastCyc (VT U.Vector M) Int #-}

    plusFastCyc :: (Num (t r)) => (FastCyc t r) -> (FastCyc t r) -> (FastCyc t r)
    plusFastCyc (PowBasis v1) (PowBasis v2) = PowBasis $ v1 + v2
The Main.hs file has two drivers: vtTest, which runs in ~3 seconds, and fcTest, which runs in ~83 seconds when compiled with -O3 using the forall'd specialization.
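The core shows that for the vtTest test, the addition code is being specialized to Unboxed vectors over Ints, etc., while generic vector code is used for fcTest. On line 10 of the core, you can see that GHC does write a specialized version of plusFastCyc, compared to the generic version on line 167. The rule for the specialization is on line 225. I believe this rule should fire on line 270. (main6 calls iterate main8 y, so main8 is where plusFastCyc should be specialized.)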
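As a rough illustration of how one might check whether a specialization rule fires, here is a minimal sketch. The function sumSquares is a hypothetical stand-in and has nothing to do with Foo.hs; the GHC flags named in the comment are standard debugging flags, and the rules that fire during simplification are reported in the compiler output.

```haskell
-- Compile with, for example:
--   ghc -O2 -ddump-rule-firings -ddump-simpl -dsuppress-all Sketch.hs
-- -ddump-rule-firings reports each rewrite rule that fires;
-- -ddump-simpl prints the optimized Core so the call site can be inspected.

-- Hypothetical overloaded function (not from the question's code).
{-# INLINABLE sumSquares #-}
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}
sumSquares :: Num a => [a] -> a
sumSquares = foldr (\x acc -> x * x + acc) 0

main :: IO ()
main = print (sumSquares [1, 2, 3 :: Int])  -- prints 14
```

If the specialization is used, the Core for main should call the specialized copy rather than pass a Num dictionary to the generic sumSquares.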
My goal is to make fcTest as fast as vtTest by specializing plusFastCyc. I've found two ways to do this:

1. Explicitly call inline from GHC.Exts in fcTest.
2. Remove the Factored m Int constraint on plusFastCyc.
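Option 1 is unsatisfactory because in the actual code base plusFastCyc is a frequently used operation and a very large function, so it should not be inlined at every use. Rather, GHC should call a specialized version of plusFastCyc. Option 2 is not really an option because I need the constraint in the real code.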
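For reference, Option 1 can be sketched roughly as follows. This is only an illustration: plusGeneric and step are hypothetical names standing in for plusFastCyc and the body of fcTest, and the only library function used is inline from GHC.Exts.

```haskell
import GHC.Exts (inline)

-- Hypothetical stand-in for the large overloaded function.
{-# INLINABLE plusGeneric #-}
plusGeneric :: Num a => a -> a -> a
plusGeneric x y = x + y

-- Option 1: request inlining of the overloaded function at this one call site.
-- Once inlined at type Int, the Num dictionary is known and can be resolved.
step :: Int -> Int -> Int
step y x = inline plusGeneric y x

main :: IO ()
main = print (iterate (step 1) 0 !! 1000)  -- prints 1000
```

The drawback described above is visible in the shape of the code: every call site that needs the fast path has to be wrapped in inline, and each one receives its own copy of the function body.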
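I've tried a variety of options using (and not using) INLINE, INLINABLE, and SPECIALIZE, but nothing seems to work. (EDIT: I may have stripped out too much of plusFastCyc to make my example small, so INLINE might cause the function to be inlined. This doesn't happen in my real code because plusFastCyc is so large.) In this particular example, I'm not getting any "match_co: needs more cases" or "RULE: LHS too complicated to desugar" (and here) warnings, though I was getting many match_co warnings before minimizing the example. Presumably, the "problem" is the Factored m Int constraint in the rule; if I make changes to that constraint, fcTest runs as fast as vtTest.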
Am I doing something GHC just doesn't like? Why won't GHC specialize plusFastCyc, and how can I make it?
UPDATE

The problem persists in GHC 7.8.2, so this question is still relevant.
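GHC also gives an option to SPECIALIZE a type-class instance declaration. I tried this with the (expanded) code of Foo.hs, by putting the following:

    instance (Num r, V.Vector v r, Factored m r) => Num (VT v m r) where
      {-# SPECIALIZE instance ( Factored m Int => Num (VT U.Vector m Int)) #-}
      VT x + VT y = VT $ V.zipWith (+) x y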
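This change, though, did not achieve the desired speedup. What did achieve that performance improvement was manually adding a specialized instance for the type VT U.Vector m Int with the same function definitions, as follows:

    instance (Factored m Int) => Num (VT U.Vector m Int) where
      VT x + VT y = VT $ V.zipWith (+) x y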
This requires adding OverlappingInstances and FlexibleInstances in LANGUAGE. Interestingly, in the example program, the speedup obtained with the overlapping instance remains even if you remove every SPECIALIZE and INLINABLE pragma.
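The shape of the overlapping-instance workaround can be sketched in a self-contained file. Everything below is an assumption made for illustration: Factored, M, VT, and FastCyc are minimal stand-ins for the definitions in Foo.hs, ZipList from base replaces the unboxed vector, and addTwice is a hypothetical caller. The sketch also uses the per-instance OVERLAPPING pragma, available in GHC 7.10 and later, instead of the module-wide OverlappingInstances extension. It shows how the instances resolve, not the speedup itself.

```haskell
{-# LANGUAGE FlexibleContexts      #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Control.Applicative (ZipList (..), liftA2)

-- Hypothetical stand-ins for the definitions in Foo.hs.
class Factored m r                   -- constraint on the phantom index m
data M                               -- one concrete phantom index
instance Factored M Int

newtype VT v m r = VT (v r)          -- container v of r, tagged by m
newtype FastCyc t r = PowBasis (t r)

-- Generic instance: works for any zippy container v.
instance (Num r, Applicative v, Factored m r) => Num (VT v m r) where
  VT x + VT y   = VT (liftA2 (+) x y)
  VT x * VT y   = VT (liftA2 (*) x y)
  negate (VT x) = VT (fmap negate x)
  abs (VT x)    = VT (fmap abs x)
  signum (VT x) = VT (fmap signum x)
  fromInteger   = VT . pure . fromInteger

-- Hand-written copy at a concrete container and element type.
-- It is more specific than the instance above, so it is chosen when both match.
instance {-# OVERLAPPING #-} Factored m Int => Num (VT ZipList m Int) where
  VT x + VT y   = VT (liftA2 (+) x y)
  VT x * VT y   = VT (liftA2 (*) x y)
  negate (VT x) = VT (fmap negate x)
  abs (VT x)    = VT (fmap abs x)
  signum (VT x) = VT (fmap signum x)
  fromInteger   = VT . pure . fromInteger

plusFastCyc :: Num (t r) => FastCyc t r -> FastCyc t r -> FastCyc t r
plusFastCyc (PowBasis v1) (PowBasis v2) = PowBasis (v1 + v2)

-- Hypothetical caller that stays polymorphic in the phantom index m.
addTwice :: Factored m Int
         => FastCyc (VT ZipList m) Int -> FastCyc (VT ZipList m) Int
addTwice x = plusFastCyc x (plusFastCyc x x)

main :: IO ()
main = do
  let x = PowBasis (VT (ZipList [1, 2, 3])) :: FastCyc (VT ZipList M) Int
      PowBasis (VT (ZipList ys)) = addTwice x
  print ys  -- prints [3,6,9]
```

Inside addTwice the instance for VT ZipList m Int is selected even though m is still a type variable, which matches the situation in fcTest where the phantom type is not fixed.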