practical-haskell


Practical Haskell (GHC)

Use this skill when the task is Haskell code quality, performance, or reasoning about evaluation. Assume GHC with optimizations (`-O` / `-O2`) unless the user says otherwise.

Core ideas

  • Purity lets the compiler rewrite code safely; prefer explicit effects in `IO` or an appropriate abstraction.
  • Lazy by default: values are evaluated when needed. That enables composition but can hide space leaks.
  • Types catch many bugs early; use them to encode intent (including `newtype` for domain distinctions).
  • Know what GHC emits: when performance matters, treat the optimized Core (`-ddump-simpl`) as ground truth.
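
The following is a minimal sketch of the purity/effects split. It assumes a hypothetical input format of one integer per line. Parsing is a pure function, and `IO` is confined to a thin wrapper. The names `parseScores` and `totalScore` are illustrative only.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Pure core: no effects, so it is easy to test, and GHC is free to
-- inline, float, or fuse it.
-- Hypothetical format: one integer per line; malformed lines are skipped.
parseScores :: String -> [Int]
parseScores = mapMaybe readMaybe . lines

-- Thin IO shell: the only function that touches the outside world.
-- `totalScore` and its file path argument are hypothetical.
totalScore :: FilePath -> IO Int
totalScore path = sum . parseScores <$> readFile path
```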
  • 纯函数允许编译器安全地重写代码;优先在
    IO
    或合适的抽象中显式处理副作用。
  • 默认惰性求值:值仅在需要时才会被计算。这支持灵活的代码组合,但可能隐藏空间泄漏问题。
  • 类型系统可提前捕获大量错误;利用类型来表达设计意图(包括使用
    newtype
    区分领域概念)。
  • 了解GHC的输出:当性能至关重要时,将优化后的Core代码(通过
    -ddump-simpl
    生成)作为分析的依据。

Always

  • Be explicit about strict vs lazy data and bindings when modeling accumulators, parsers, or long-lived state.
  • Prefer `foldl'` (from `Data.List`, or the strict fold of whichever container library you use) over plain `foldl` for numeric accumulation and other strict accumulators.
  • Profile before micro-optimizing (cost-centre and heap profiling, the eventlog, `ghc-debug`, etc.).
  • Write small composable functions; rely on inlining and specialization rather than giant monoliths.
  • Use fusion-friendly pipelines (`map`, `filter`, `foldr`-based idioms) where appropriate; if allocation matters, validate hot paths in Core.
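
Here is a sketch of making an accumulator's strictness explicit. The hypothetical `minMax` folds with `foldl'` into a constructor whose fields are strict. Each step therefore stores evaluated `Int`s rather than nested `min`/`max` thunks.

```haskell
import Data.List (foldl')

-- Hypothetical accumulator type: the bangs make both fields strict, so
-- whenever foldl' forces a MinMax to WHNF, its fields are evaluated too.
data MinMax = MinMax !Int !Int

minMax :: [Int] -> Maybe (Int, Int)
minMax [] = Nothing
minMax (x : xs) =
  case foldl' step (MinMax x x) xs of
    MinMax lo hi -> Just (lo, hi)
  where
    step (MinMax l h) y = MinMax (min l y) (max h y)
```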

Never

  • Accidentally build large chains of thunks (the classic `foldl (+) 0` over a large list of strict numbers).
  • Ignore space leaks caused by unevaluated structures holding onto memory.
  • Micro-optimize without evidence from profiling or Core.
  • Treat laziness as universally good or bad; decide per use case.
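
One variant of the thunk-chain mistake survives even `foldl'`. `foldl'` forces the accumulator only to WHNF, and a tuple in WHNF can still hold unevaluated components. Below is a minimal sketch with hypothetical names, showing the leaky version and a fix.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Leaky: foldl' forces only the outer pair, so both components can grow
-- into chains of (+) thunks until the result is finally inspected.
sumAndCountLeaky :: [Int] -> (Int, Int)
sumAndCountLeaky = foldl' step (0, 0)
  where
    step (s, n) x = (s + x, n + 1)

-- Fixed: bang patterns force each component before the new pair is built.
sumAndCount :: [Int] -> (Int, Int)
sumAndCount = foldl' step (0, 0)
  where
    step (s, n) x =
      let !s' = s + x
          !n' = n + 1
       in (s', n')
```

Both versions return the same value. They differ only in how much unevaluated work they hold onto along the way. A dedicated accumulator type with strict fields achieves the same effect as the bang patterns.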

Prefer

  • Strict fields (`!`) on accumulator-like constructor fields; `{-# UNPACK #-}` on small strict numeric fields when profiling supports it.
  • `newtype` rather than single-field `data` when you need a distinction with zero runtime cost.
  • `INLINE` / `INLINABLE` / `SPECIALIZE` on hot polymorphic glue when dictionary passing or missing specialization shows up in Core.
  • Worker/wrapper style: a strict internal worker and a small external API.
  • Monomorphic hot loops when polymorphism still costs after specialization attempts.
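
This sketch combines worker/wrapper with unpacked strict fields; `Vec2` and `centroid` are hypothetical. The exported function handles the empty case, and the local worker `go` carries strict counters.

```haskell
{-# LANGUAGE BangPatterns #-}

-- UNPACK on small strict fields asks GHC to store the raw Doubles inside
-- the constructor instead of pointers to boxed Doubles (honoured under -O).
data Vec2 = Vec2 {-# UNPACK #-} !Double {-# UNPACK #-} !Double
  deriving (Eq, Show)

-- Wrapper: small public API that handles the empty case.
centroid :: [Vec2] -> Maybe Vec2
centroid [] = Nothing
centroid vs = Just (go 0 0 0 vs)
  where
    -- Worker: a strict loop over the count and the two running sums.
    go :: Int -> Double -> Double -> [Vec2] -> Vec2
    go !n !sx !sy [] = Vec2 (sx / fromIntegral n) (sy / fromIntegral n)
    go !n !sx !sy (Vec2 x y : rest) = go (n + 1) (sx + x) (sy + y) rest
```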

Laziness and strictness

```haskell
import Data.List (foldl')

-- Infinite lists are fine when consumption is bounded.
naturals :: [Integer]
naturals = [1..]

firstTen :: [Integer]
firstTen = take 10 naturals

-- foldl on strict arithmetic often leaks thunks; foldl' forces as it goes.
badSum :: [Int] -> Int
badSum = foldl (+) 0

goodSum :: [Int] -> Int
goodSum = foldl' (+) 0
```

Bang patterns (`{-# LANGUAGE BangPatterns #-}`) force a binding to WHNF; use them at strategic places (accumulators, values that must not retain thunks).

Strict fields on `data` constructors are evaluated to WHNF when the constructor application itself is evaluated; combine with profiling to avoid over-forcing.
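
Here is a small sketch of both tools. The names `sumTo`, `LazyBox`, and `StrictBox` are hypothetical.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Bang pattern on the accumulator: it is forced to WHNF on every
-- iteration, so no chain of (+) thunks builds up.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc i
      | i > n = acc
      | otherwise = go (acc + i) (i + 1)

-- Lazy vs strict field. Matching on the LazyBox constructor does not touch
-- its field. Evaluating `StrictBox undefined` to WHNF would force
-- `undefined` and throw.
data LazyBox = LazyBox Int
data StrictBox = StrictBox !Int

lazyFieldUntouched :: Bool
lazyFieldUntouched = case LazyBox undefined of
  LazyBox _ -> True
```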

Fusion and lists

List pipelines such as `sum . map f . filter p` often fuse into a single loop under optimization (`-O`/`-O2`). If allocation persists, inspect Core. Avoid materializing large intermediate structures unnecessarily (for example, a redundant `length` or indexing into a huge intermediate list in hot code).
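
The sketch below contrasts the fusible shape with a shape that forces materialization. The function names are hypothetical, and whether fusion actually happens should be confirmed in Core.

```haskell
-- Built only from fusible list combinators. With optimization on, GHC's
-- foldr/build rules can often turn this into one loop with no
-- intermediate lists.
sumSquaresOfEvens :: Int -> Int
sumSquaresOfEvens n = sum (map (\x -> x * x) (filter even [1 .. n]))

-- Two consumers share one list. Whichever runs first forces the whole
-- list, which then stays in memory until the other has walked it.
-- (Empty input gives NaN.)
meanTwoPass :: [Double] -> Double
meanTwoPass xs = sum xs / fromIntegral (length xs)
```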

GHC applies its own rewrite rules internally; custom `{-# RULES #-}` pragmas are an advanced tool and must be validated (for correctness and for their interaction with simplifier phases).
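
For illustration only, here is a custom rule on a hypothetical `myMap` that merges two traversals into one. Whether it fires in a given build can be checked with `-ddump-rule-firings`.

```haskell
-- A hypothetical map-like function.
myMap :: (a -> b) -> [a] -> [b]
myMap _ [] = []
myMap f (x : xs) = f x : myMap f xs
-- Precaution so myMap is not inlined before the rule can match
-- (GHC does not inline self-recursive functions anyway).
{-# NOINLINE myMap #-}

-- GHC does not check that both sides are equal; that proof is on you.
-- Rules only fire when optimization is enabled.
{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}
```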

Newtypes

```haskell
newtype UserId = UserId Int
  deriving (Eq, Ord, Show)

newtype Email = Email String
  deriving (Eq, Show)
```

Use `newtype` for types that share a runtime representation but must not be mixed up. `GeneralizedNewtypeDeriving` can derive further class instances through the underlying type when appropriate and when project policy allows.
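
The sketch below shows `Data.Coerce.coerce` and `GeneralizedNewtypeDeriving` in use. It restates `UserId` so the block stands alone. `OrderId`, `Count`, `lookupUser`, and `rawIds` are hypothetical.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Data.Coerce (coerce)

newtype UserId = UserId Int
  deriving (Eq, Ord, Show)

newtype OrderId = OrderId Int
  deriving (Eq, Ord, Show)

-- Passing an OrderId here is a type error, although both wrap an Int.
lookupUser :: UserId -> [(UserId, String)] -> Maybe String
lookupUser = lookup

-- GeneralizedNewtypeDeriving reuses Int's Num instance for the wrapper.
newtype Count = Count Int
  deriving (Eq, Show, Num)

-- coerce converts between types with the same representation, here
-- without traversing the list.
rawIds :: [UserId] -> [Int]
rawIds = coerce
```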

Specialization and inlining

Polymorphic hot code may pass type-class dictionaries. Mitigations:
  • Give a monomorphic variant for the hot path.
  • Use `{-# SPECIALIZE #-}` for concrete instantiations.
  • Use `{-# INLINABLE #-}` on small polymorphic helpers so call sites can specialize.
Verify with Core, not assumptions.
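
Here are the three mitigations applied to a hypothetical helper, `sumSquares`. Whether the specialized copies are actually used at a given call site should still be checked in Core.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Small polymorphic helper. INLINABLE keeps its unfolding in the
-- interface file so call sites in other modules can specialize it.
sumSquares :: Num a => [a] -> a
sumSquares = go 0
  where
    go !acc [] = acc
    go !acc (x : xs) = go (acc + x * x) xs
{-# INLINABLE sumSquares #-}

-- Ask GHC to also compile dictionary-free copies for the hot types.
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}
{-# SPECIALIZE sumSquares :: [Double] -> Double #-}

-- Monomorphic entry point for the hot path.
sumSquaresInt :: [Int] -> Int
sumSquaresInt = sumSquares
```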

Reading Core (quick guide)

Compile with something like `ghc -O2 -ddump-simpl -dsuppress-all -dno-suppress-type-signatures YourModule.hs` (flags vary by need).
  • In Core, `case` forces its scrutinee to WHNF; `let` bindings of boxed values usually mean heap allocation (a thunk or a constructor).
  • Look for fusion: one tight recursive loop vs multiple passes.
  • Check whether dictionary calls remain in hot loops.
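
One alternative, assuming a throwaway module, is to put the flags in an `OPTIONS_GHC` pragma. Compiling the file then prints its Core. The function `dot` is hypothetical, and the comments describe what to look for, not guaranteed output.

```haskell
{-# OPTIONS_GHC -O2 -ddump-simpl -dsuppress-all -dsuppress-uniques #-}

-- Compiling this module prints its simplified Core. Worth checking in the
-- dump (exact names vary between GHC versions):
--   * whether `dot` became one recursive loop or several passes;
--   * whether that loop works on unboxed Double# values (+##, *##)
--     rather than allocating boxed D# values;
--   * whether any type-class dictionary is still passed around the loop.
dot :: [Double] -> [Double] -> Double
dot xs ys = sum (zipWith (*) xs ys)
```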

Mental checklist

  1. When does each subexpression get forced?
  2. Where might thunks retain memory (closures, lazy fields, `foldl`-style accumulation)?
  3. Will this pipeline fuse, or will it allocate intermediates?
  4. What does simplified Core show for the hot path?
  5. Is the hot code monomorphic and specialized?
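
Here is question 2 applied to a hypothetical word counter. The sketch assumes the `containers` package, which ships with GHC, is available.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- With Data.Map.Lazy, insertWith would store a growing chain of (+)
-- thunks per key. Data.Map.Strict evaluates each value to WHNF on
-- insert, and foldl' forces the map itself at every step.
wordCounts :: [String] -> M.Map String Int
wordCounts = foldl' (\m w -> M.insertWith (+) w 1 m) M.empty
```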

Signature moves (when profiling says so)

  • Strict accumulators: `foldl'`, bang patterns, strict fields.
  • `UNPACK` small strict numeric fields to reduce indirection.
  • `INLINE` / `INLINABLE` / `SPECIALIZE` to recover specialization.
  • Fusion-friendly combinators; avoid accidental intermediate lists in inner loops.
  • Worker/wrapper refactor for clearer strict internals.
  • Re-check Core after each change.

Additional resources

For extended examples and GHC flag recipes, see reference.md.