lattner-compiler-infrastructure

Chris Lattner Style Guide

Overview

Chris Lattner created LLVM (the compiler infrastructure behind many modern compilers, including Clang, rustc, and the Swift compiler), Clang (the C/C++/Objective-C frontend), Swift (Apple's general-purpose programming language), and MLIR (Multi-Level Intermediate Representation). His work changed how compilers are built and how languages evolve.

Core Philosophy

"The key insight of LLVM is that compiler infrastructure should be reusable."
"Good IR design is about finding the right level of abstraction."
"Languages should evolve based on real-world usage, not theoretical purity."
Lattner believes in building robust, reusable infrastructure that enables an ecosystem of tools, not one-off solutions.

Design Principles

  1. Modular Infrastructure: Build reusable components, not monolithic systems.
  2. Progressive Lowering: Transform through well-defined IR levels.
  3. Library-First Design: Compilers are libraries, not just executables.
  4. Pragmatic Evolution: Languages improve through real usage feedback.

When Writing Compiler Code

Always

  • Design IRs with clear semantics and invariants
  • Make passes composable and reusable
  • Provide excellent diagnostics and error messages
  • Build infrastructure others can extend
  • Think about the entire compilation pipeline
  • Document design decisions and tradeoffs

Never

  • Build closed, monolithic compiler architectures
  • Sacrifice usability for implementation convenience
  • Ignore error recovery and diagnostics
  • Let optimization passes have hidden dependencies
  • Couple frontend concerns with backend concerns
  • Design IRs without considering transformations

Prefer

  • SSA form for optimization IRs
  • Explicit type systems over implicit
  • Library APIs over command-line tools
  • Incremental compilation where possible
  • Clear phase ordering over ad-hoc passes
  • Compositional design over special cases

Code Patterns

LLVM IR Philosophy

llvm
; LLVM IR: explicit, typed, SSA form
; Every value has exactly one definition
; Control flow is explicit

define i32 @factorial(i32 %n) {
entry:
  %cmp = icmp sle i32 %n, 1
  br i1 %cmp, label %base, label %recurse

base:
  ret i32 1

recurse:
  %n_minus_1 = sub i32 %n, 1
  %fact_sub = call i32 @factorial(i32 %n_minus_1)
  %result = mul i32 %n, %fact_sub
  ret i32 %result
}

; Key properties:
; - SSA: each %variable defined exactly once
; - Typed: every operation has explicit types
; - Explicit control flow: br, ret, etc.
; - No hidden state: memory accesses and calls appear as explicit instructions
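
The SSA property listed above can be checked mechanically. The following is a minimal sketch in plain C++, not LLVM's actual verifier: `Inst` and `verifySSA` are hypothetical names invented for this illustration, and the check is limited to a straight-line instruction list (a real verifier uses dominance information to handle branches).

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical toy instruction: one optional result name plus operand names.
struct Inst {
    std::string result;                 // empty when the instruction defines nothing
    std::vector<std::string> operands;  // names of the values it reads
};

// Checks two SSA invariants over a straight-line instruction list:
//   1. every name is defined at most once
//   2. every operand is defined before it is used
// Returns one message per violation; an empty vector means the list is valid.
std::vector<std::string> verifySSA(const std::vector<std::string> &params,
                                   const std::vector<Inst> &insts) {
    std::unordered_set<std::string> defined(params.begin(), params.end());
    std::vector<std::string> errors;
    for (const Inst &I : insts) {
        for (const std::string &Op : I.operands)
            if (!defined.count(Op))
                errors.push_back("use of undefined value %" + Op);
        if (!I.result.empty() && !defined.insert(I.result).second)
            errors.push_back("redefinition of %" + I.result);
    }
    return errors;
}
```

Feeding it the `recurse` block's shape (`%n_minus_1` computed from `%n`, then `%result` from both) yields no errors, while defining the same name twice is reported.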

Pass Infrastructure Design

cpp
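// LLVM-style pass infrastructure (new pass manager)
// Passes are modular, composable, declarative
// Header paths below match recent LLVM releases and may differ in yours

#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/PassPlugin.h"

using namespace llvm;
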
class MyOptimizationPass : public PassInfoMixin<MyOptimizationPass> {
public:
    PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
        // Get required analyses
        auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
        auto &LI = AM.getResult<LoopAnalysis>(F);
        
        bool Changed = false;
        
        for (auto &BB : F) {
            Changed |= optimizeBlock(BB, DT, LI);
        }
        
        if (!Changed)
            return PreservedAnalyses::all();
        
        // Declare what we preserved
        PreservedAnalyses PA;
        PA.preserve<DominatorTreeAnalysis>();
        return PA;
    }
    
private:
    bool optimizeBlock(BasicBlock &BB, DominatorTree &DT, LoopInfo &LI);
};

// Register the pass
extern "C" LLVM_ATTRIBUTE_WEAK ::llvm::PassPluginLibraryInfo
llvmGetPassPluginInfo() {
    return {
        LLVM_PLUGIN_API_VERSION, "MyPass", "v0.1",
        [](PassBuilder &PB) {
            PB.registerPipelineParsingCallback(
                [](StringRef Name, FunctionPassManager &FPM,
                   ArrayRef<PassBuilder::PipelineElement>) {
                    if (Name == "my-opt") {
                        FPM.addPass(MyOptimizationPass());
                        return true;
                    }
                    return false;
                });
        }
    };
}
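
To make the role of the preserved-analyses set concrete, here is a small self-contained sketch that does not depend on LLVM. All names (`ToyFunction`, `ToyAnalysisManager`, `Preserved`, `runPipeline`, `removeZeros`) are hypothetical and exist only for this illustration; analyses are modeled as bare names with a cache, which is far simpler than LLVM's real analysis manager.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a function body.
struct ToyFunction { std::vector<int> insts; };

struct ToyAnalysisManager {
    std::set<std::string> cached;  // analyses with a valid cached result
    int computations = 0;          // how often an analysis had to be (re)computed

    // Computes the named analysis only if it is not already cached.
    void getResult(const std::string &name) {
        if (cached.insert(name).second)
            ++computations;
    }
    // Drops every cached analysis that the pass did not declare preserved.
    void invalidateExcept(const std::set<std::string> &preserved) {
        for (auto it = cached.begin(); it != cached.end();)
            it = preserved.count(*it) ? std::next(it) : cached.erase(it);
    }
};

// What a pass reports back: either "everything preserved" or an explicit set.
struct Preserved {
    bool all;
    std::set<std::string> names;
};

using ToyPass = std::function<Preserved(ToyFunction &, ToyAnalysisManager &)>;

// Runs passes in order and invalidates analyses based on each pass's report.
void runPipeline(const std::vector<ToyPass> &passes, ToyFunction &F,
                 ToyAnalysisManager &AM) {
    for (const ToyPass &P : passes) {
        Preserved PA = P(F, AM);
        if (!PA.all)
            AM.invalidateExcept(PA.names);
    }
}

// Example pass: deletes zero entries. It requests "domtree" and "loops" and,
// when it changes the function, declares that only "domtree" is still valid.
Preserved removeZeros(ToyFunction &F, ToyAnalysisManager &AM) {
    AM.getResult("domtree");
    AM.getResult("loops");
    std::size_t oldSize = F.insts.size();
    F.insts.erase(std::remove(F.insts.begin(), F.insts.end(), 0), F.insts.end());
    if (F.insts.size() == oldSize)
        return {true, {}};
    return {false, {"domtree"}};
}
```

Running `removeZeros` twice over `{1, 0, 2, 0}` computes three analyses in total: both on the first run, then only `"loops"` again on the second, because the first run declared `"domtree"` preserved. Since every dependency goes through the manager, none of them is hidden from the pipeline.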

Diagnostic Excellence

cpp
// Swift/Clang-style diagnostics
// Errors should be helpful, not cryptic

class DiagnosticEngine {
public:
    // Structured diagnostics with fix-its
    void diagnose(SourceLoc Loc, Diagnostic Diag) {
        emitDiagnostic(Loc, Diag.getKind(), Diag.getMessage());
        
        // Show the source location
        emitSourceSnippet(Loc);
        
        // Provide fix-its when possible
        for (auto &FixIt : Diag.getFixIts()) {
            emitFixIt(FixIt);
        }
        
        // Add educational notes
        for (auto &Note : Diag.getNotes()) {
            emitNote(Note);
        }
    }
};

// Example diagnostic output:
// error: cannot convert value of type 'String' to expected type 'Int'
//     let x: Int = "hello"
//                  ^~~~~~~
// fix-it: did you mean to use Int(_:)?
//     let x: Int = Int("hello") ?? 0
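
The rendering step that `emitSourceSnippet` and `emitFixIt` stand for can be sketched without any compiler dependency. This is an illustrative toy, not Clang's or Swift's implementation: `ToyDiagnostic` and `render` are hypothetical names, and the output format is only loosely modeled on the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical diagnostic record: a message, a range on one source line,
// and an optional fix-it replacement.
struct ToyDiagnostic {
    std::string message;
    std::string sourceLine;
    std::size_t column;  // 0-based column where the highlighted range starts
    std::size_t length;  // number of characters in the range
    std::string fixIt;   // suggested replacement; empty when there is none
};

// Renders message, source line, caret underline, and the fix-it if present.
std::string render(const ToyDiagnostic &D) {
    std::string out = "error: " + D.message + "\n";
    out += "    " + D.sourceLine + "\n";
    // One caret at the start of the range, tildes for the rest of it.
    out += "    " + std::string(D.column, ' ') + "^" +
           std::string(D.length > 0 ? D.length - 1 : 0, '~') + "\n";
    if (!D.fixIt.empty())
        out += "fix-it: replace with '" + D.fixIt + "'\n";
    return out;
}
```

Keeping the diagnostic as structured data and rendering it in one place means the same record can also feed an IDE or a machine-readable report.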

Progressive Lowering (MLIR Style)

mlir
// MLIR: Multi-Level IR for progressive lowering
// High-level ops → Mid-level ops → Low-level ops → LLVM IR

// High-level: domain-specific operations
%result = linalg.matmul ins(%A, %B : tensor<4x8xf32>, tensor<8x16xf32>)
                        outs(%C : tensor<4x16xf32>) -> tensor<4x16xf32>

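// After tiling transformation (the loop carries the partially updated result):
%tiled = scf.for %i = %c0 to %c4 step %c2
    iter_args(%acc = %C) -> (tensor<4x16xf32>) {
  %slice_a = tensor.extract_slice %A[%i, 0] [2, 8] [1, 1]
      : tensor<4x8xf32> to tensor<2x8xf32>
  %slice_c = tensor.extract_slice %acc[%i, 0] [2, 16] [1, 1]
      : tensor<4x16xf32> to tensor<2x16xf32>
  %computed = linalg.matmul
      ins(%slice_a, %B : tensor<2x8xf32>, tensor<8x16xf32>)
      outs(%slice_c : tensor<2x16xf32>) -> tensor<2x16xf32>
  %updated = tensor.insert_slice %computed into %acc[%i, 0] [2, 16] [1, 1]
      : tensor<2x16xf32> into tensor<4x16xf32>
  scf.yield %updated : tensor<4x16xf32>
}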

// After vectorization:
%vec = vector.contract {indexing_maps = [...], kind = #vector.kind<add>}
    %vec_a, %vec_b, %vec_c : vector<2x8xf32>, vector<8x16xf32> into vector<2x16xf32>

// Finally: LLVM IR
// Each level has clear semantics and transformations
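
The staged structure above can be mimicked in a few lines of ordinary C++. This is a deliberately tiny sketch, not MLIR's dialect conversion framework: `Level`, `ToyOp`, `lowerHighToMid`, `lowerMidToLow`, and `fullyLowered` are hypothetical names, and the op names (`matmul`, `for`, `fma`, `br`, `mul`, `add`) are placeholders chosen for the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Level { High, Mid, Low };

// Hypothetical op: just a name tagged with the abstraction level it lives at.
struct ToyOp {
    Level level;
    std::string name;
};

// High -> Mid: a matmul becomes a loop plus a fused multiply-add body.
// Ops this step does not recognize pass through unchanged.
std::vector<ToyOp> lowerHighToMid(const std::vector<ToyOp> &ops) {
    std::vector<ToyOp> out;
    for (const ToyOp &op : ops) {
        if (op.level == Level::High && op.name == "matmul") {
            out.push_back({Level::Mid, "for"});
            out.push_back({Level::Mid, "fma"});
        } else {
            out.push_back(op);
        }
    }
    return out;
}

// Mid -> Low: loops become branches, fma splits into mul and add.
std::vector<ToyOp> lowerMidToLow(const std::vector<ToyOp> &ops) {
    std::vector<ToyOp> out;
    for (const ToyOp &op : ops) {
        if (op.level == Level::Mid && op.name == "for") {
            out.push_back({Level::Low, "br"});
        } else if (op.level == Level::Mid && op.name == "fma") {
            out.push_back({Level::Low, "mul"});
            out.push_back({Level::Low, "add"});
        } else {
            out.push_back(op);
        }
    }
    return out;
}

// Post-condition for a lowering step: every op sits at the target level.
bool fullyLowered(const std::vector<ToyOp> &ops, Level target) {
    for (const ToyOp &op : ops)
        if (op.level != target)
            return false;
    return true;
}
```

Each step handles only the ops of its own level and can be checked with `fullyLowered` before the next one runs, which is the property that keeps the stages independently testable.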

Type System Design

swift
// Swift-style type system: expressive, safe, pragmatic

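// Protocol-oriented design
// (simplified versions of standard-library protocols; declared in a real
// module, they would shadow Swift.Numeric, Swift.Collection, and Swift.Result)
protocol Numeric {
    static var zero: Self { get }
    static func +(lhs: Self, rhs: Self) -> Self
    static func *(lhs: Self, rhs: Self) -> Self
}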

// Associated types for flexibility
protocol Collection {
    associatedtype Element
    associatedtype Index: Comparable
    
    var startIndex: Index { get }
    var endIndex: Index { get }
    subscript(position: Index) -> Element { get }
}

// Generics with constraints
func sum<T: Numeric>(_ values: [T]) -> T {
    values.reduce(.zero, +)
}

// Optionals as explicit nullability
func find<T: Equatable>(_ value: T, in array: [T]) -> Int? {
    for (index, element) in array.enumerated() {
        if element == value {
            return index
        }
    }
    return nil  // Explicit absence
}

// Result types for error handling
enum Result<Success, Failure: Error> {
    case success(Success)
    case failure(Failure)
}

Compiler as Library

cpp
// Clang as a library, not just a tool
// Enable building custom tools on compiler infrastructure

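#include "clang/AST/ASTConsumer.h"
#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/FrontendActions.h"
#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"
#include "llvm/Support/CommandLine.h"
#include <memory>

using namespace clang;
using namespace clang::tooling;

// Option category under which this tool's command-line flags are grouped
static llvm::cl::OptionCategory MyCategory("function-finder options");

// Custom AST visitor
class FunctionFinder : public RecursiveASTVisitor<FunctionFinder> {
public:
    bool VisitFunctionDecl(FunctionDecl *FD) {
        if (FD->hasBody()) {
            // getNameAsString() also handles operators and constructors,
            // where getName() would assert
            llvm::outs() << "Found function: " << FD->getNameAsString() << "\n";
            analyzeComplexity(FD);
        }
        return true;
    }
    
private:
    void analyzeComplexity(FunctionDecl *FD);  // defined elsewhere
};

// Glue: run the visitor over each translation unit
class FunctionFinderConsumer : public ASTConsumer {
public:
    void HandleTranslationUnit(ASTContext &Ctx) override {
        Finder.TraverseDecl(Ctx.getTranslationUnitDecl());
    }

private:
    FunctionFinder Finder;
};

class MyFrontendAction : public ASTFrontendAction {
public:
    std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &,
                                                   llvm::StringRef) override {
        return std::make_unique<FunctionFinderConsumer>();
    }
};

// Build custom tools using Clang's libraries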
int main(int argc, const char **argv) {
    auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyCategory);
    if (!ExpectedParser) {
        llvm::errs() << ExpectedParser.takeError();
        return 1;
    }
    
    ClangTool Tool(ExpectedParser->getCompilations(),
                   ExpectedParser->getSourcePathList());
    
    return Tool.run(newFrontendActionFactory<MyFrontendAction>().get());
}

Memory Ownership in Swift

swift
// Swift's ownership model: safe by default, explicit when needed

// Default: automatic reference counting
class Node {
    var value: Int
    var children: [Node]
    
    init(value: Int) {
        self.value = value
        self.children = []
    }
}

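// Explicit ownership for performance-critical code (Swift 5.9+)
func processBuffer(_ buffer: borrowing [UInt8]) -> Int {
    // borrowing: read-only access, no copy
    // Widen each byte to Int; reduce(0, +) would not type-check because
    // the accumulator is Int and the elements are UInt8
    buffer.reduce(0) { $0 + Int($1) }
}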

func consumeBuffer(_ buffer: consuming [UInt8]) -> [UInt8] {
    // consuming: takes ownership, no copy
    var result = buffer
    result.append(0)
    return result
}

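// Copy-on-write for value semantics with efficiency
// Storage must be a class so it can be shared and checked for uniqueness
final class Storage {
    var data: [Int]
    init(data: [Int]) { self.data = data }
    func copy() -> Storage { Storage(data: data) }
}

struct LargeData {
    private var storage: Storage
    
    init(data: [Int]) {
        precondition(!data.isEmpty, "modify() writes to index 0")
        storage = Storage(data: data)
    }
    
    mutating func modify() {
        // Copy only if shared
        if !isKnownUniquelyReferenced(&storage) {
            storage = storage.copy()
        }
        storage.data[0] = 42
    }
}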

IR Design Principles

Intermediate Representation Design
══════════════════════════════════════════════════════════════

Level           Abstraction         Purpose
────────────────────────────────────────────────────────────
Source          Syntax trees        Parsing, early semantic checks
AST/HIR         Typed trees         Type checking, inference
MIR/SIL         Typed CFG           Optimization, ownership
LLVM IR         Typed SSA           Machine-independent opt
Machine IR      Target ops          Instruction selection
Assembly        Text                Final output

Key principles:
• Each level has ONE clear purpose
• Lowering is progressive and well-defined
• Analyses valid at one level may not be at another
• Transformations declare their requirements
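
One way to read "transformations declare their requirements" is that each stage names the level it consumes and the level it produces, so a pipeline can be validated before it runs. The sketch below is a hedged illustration of that idea only: `Stage` and `firstMismatch` are hypothetical, and levels are plain strings taken loosely from the table above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pipeline stage: declares which IR level it reads and writes.
struct Stage {
    std::string name;
    std::string consumes;
    std::string produces;
};

// Walks the pipeline from the starting level and returns the index of the
// first stage whose declared input does not match the level produced so far.
// Returns -1 when every stage lines up.
int firstMismatch(const std::string &start, const std::vector<Stage> &pipeline) {
    std::string current = start;
    for (std::size_t i = 0; i < pipeline.size(); ++i) {
        if (pipeline[i].consumes != current)
            return static_cast<int>(i);
        current = pipeline[i].produces;
    }
    return -1;
}
```

A pipeline that goes Source to AST and then tries to run a stage expecting SIL is rejected at index 1, before any code is transformed.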

Mental Model

Lattner approaches compiler design by asking:
  1. What's the right abstraction level? Different problems need different IRs
  2. Is this reusable? Build infrastructure, not one-off tools
  3. What's the user experience? Diagnostics, error recovery, tooling
  4. How will this evolve? Design for change and extension
  5. Can others build on this? Library-first, composable design

Signature Lattner Moves

  • LLVM's pass manager: Modular, composable optimization passes
  • Clang's diagnostics: The gold standard for helpful error messages
  • Swift's optionals: Explicit nullability without verbosity
  • MLIR's dialect system: Multi-level IR with extensible operations
  • Library-first design: Compilers as reusable infrastructure
  • Progressive lowering: Clear transformation stages
  • Swift's result builders (the language feature behind SwiftUI's DSL): Compiler magic that feels natural