compiler-development

Compiler Development Skill

This skill provides comprehensive knowledge of building compilers and language implementations using the LLVM infrastructure.

Compiler Architecture Overview

Classic Three-Phase Design

Source Code → Frontend → Middle-End (Optimizer) → Backend → Machine Code
                ↓              ↓                      ↓
             AST/IR      LLVM IR Passes          Target Code

Frontend Development

Lexical Analysis

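cpp
#include <string>

// Token types for a simple language
enum class TokenKind {
    Identifier,
    Number,
    String,
    Keyword,
    Operator,
    Punctuation,
    EndOfFile
};

// Position of a token in the source text.
// Assumed definition: the original leaves SourceLocation unspecified.
struct SourceLocation {
    unsigned line = 1;
    unsigned column = 1;
};

struct Token {
    TokenKind kind;
    std::string value;        // lexeme text
    SourceLocation location;  // where the token starts
};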
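
To show how these token types get produced, here is a minimal lexer sketch. It is an illustration under assumptions, not part of the original skill: the `tokenize` function is a hypothetical name, `SourceLocation` is replaced by a plain byte offset, and strings and keywords are left out to keep it short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokenKind { Identifier, Number, Operator, Punctuation, EndOfFile };

struct Token {
    TokenKind kind;
    std::string value;
    std::size_t offset;  // byte offset into the source (stand-in for SourceLocation)
};

// Helper so <cctype> functions are never called with a negative char value.
static bool isIdentChar(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
}

// Scan the whole input into a token vector that always ends with EndOfFile.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Identifier: letter or underscore, then letters, digits, underscores
            while (i < src.size() && isIdentChar(src[i])) ++i;
            out.push_back({TokenKind::Identifier, src.substr(start, i - start), start});
        } else if (std::isdigit(c)) {
            // Number: a run of decimal digits
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({TokenKind::Number, src.substr(start, i - start), start});
        } else if (std::string("+-*/=<>").find(src[i]) != std::string::npos) {
            ++i;
            out.push_back({TokenKind::Operator, std::string(1, src[start]), start});
        } else {
            // Everything else is treated as single-character punctuation
            ++i;
            out.push_back({TokenKind::Punctuation, std::string(1, src[start]), start});
        }
    }
    out.push_back({TokenKind::EndOfFile, "", src.size()});
    return out;
}
```

Ending the stream with an explicit `EndOfFile` token lets the parser peek ahead without checking for the end of the vector at every step.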

Parser Implementation

  • Recursive Descent: Easy to implement, good error messages
  • Operator Precedence Parsing: Efficient for expression parsing
  • LALR/LR: Use tools like Bison for complex grammars
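
The first two techniques are often combined: recursive descent for primaries and parentheses, and precedence climbing for binary operators. The sketch below illustrates that combination under assumptions: `ExprParser` and its precedence table are hypothetical, and it evaluates the expression directly instead of building an AST so that it stays short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

class ExprParser {
    std::string src;
    std::size_t pos = 0;

    void skipSpaces() {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    }

    // Binding power of a binary operator, or -1 if `op` is not one.
    static int precedence(char op) {
        switch (op) {
            case '+': case '-': return 10;
            case '*': case '/': return 20;
            default:            return -1;
        }
    }

    // primary := number | '(' expr ')'
    double parsePrimary() {
        skipSpaces();
        if (pos < src.size() && src[pos] == '(') {
            ++pos;
            double v = parseExpr(0);
            skipSpaces();
            if (pos >= src.size() || src[pos] != ')') throw std::runtime_error("expected ')'");
            ++pos;
            return v;
        }
        std::size_t start = pos;
        while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos]))) ++pos;
        if (start == pos) throw std::runtime_error("expected a number");
        return std::stod(src.substr(start, pos - start));
    }

    // Consume binary operators whose precedence is at least `minPrec`.
    double parseExpr(int minPrec) {
        double lhs = parsePrimary();
        for (;;) {
            skipSpaces();
            if (pos >= src.size()) return lhs;
            char op = src[pos];
            int prec = precedence(op);
            if (prec < minPrec) return lhs;    // also stops at ')' (precedence -1)
            ++pos;
            double rhs = parseExpr(prec + 1);  // +1 makes operators left-associative
            switch (op) {
                case '+': lhs += rhs; break;
                case '-': lhs -= rhs; break;
                case '*': lhs *= rhs; break;
                case '/': lhs /= rhs; break;
            }
        }
    }

public:
    explicit ExprParser(std::string text) : src(std::move(text)) {}

    // Parse the whole input; trailing characters are an error.
    double parse() {
        double v = parseExpr(0);
        skipSpaces();
        if (pos != src.size()) throw std::runtime_error("unexpected trailing input");
        return v;
    }
};
```

For example, `ExprParser("8 - 3 - 2").parse()` groups as `(8 - 3) - 2` because the right operand is parsed with a higher minimum precedence than the operator just consumed.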

AST Design

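cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Value.h"
#include <memory>

// Assumed to be defined elsewhere and already positioned inside a basic block.
extern llvm::IRBuilder<> Builder;

class Expr {
public:
    virtual ~Expr() = default;
    // Returns nullptr if code generation fails.
    virtual llvm::Value* codegen() = 0;
};

class BinaryExpr : public Expr {
    std::unique_ptr<Expr> LHS, RHS;
    char Op;
public:
    llvm::Value* codegen() override {
        llvm::Value* L = LHS->codegen();
        llvm::Value* R = RHS->codegen();
        if (!L || !R) return nullptr;  // propagate operand failure

        // Floating-point arithmetic: both operands are assumed to be double.
        switch (Op) {
            case '+': return Builder.CreateFAdd(L, R, "addtmp");
            case '-': return Builder.CreateFSub(L, R, "subtmp");
            case '*': return Builder.CreateFMul(L, R, "multmp");
            case '/': return Builder.CreateFDiv(L, R, "divtmp");
            default:  return nullptr;  // unknown operator
        }
    }
};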
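
The same tree shape can be tried out without LLVM by swapping `codegen()` for a small interpreter. The sketch below is an illustration under assumptions, not the skill's own design: `eval` and `NumberExpr` are hypothetical names, and a `bool` return stands in for the null `llvm::Value*` that signals failure.

```cpp
#include <cassert>
#include <memory>
#include <utility>

class Expr {
public:
    virtual ~Expr() = default;
    // Writes the result to `out`; returns false if evaluation fails.
    virtual bool eval(double& out) const = 0;
};

class NumberExpr : public Expr {
    double Val;
public:
    explicit NumberExpr(double V) : Val(V) {}
    bool eval(double& out) const override { out = Val; return true; }
};

class BinaryExpr : public Expr {
    char Op;
    std::unique_ptr<Expr> LHS, RHS;
public:
    BinaryExpr(char O, std::unique_ptr<Expr> L, std::unique_ptr<Expr> R)
        : Op(O), LHS(std::move(L)), RHS(std::move(R)) {}

    bool eval(double& out) const override {
        double L = 0, R = 0;
        // Evaluate children first and propagate any failure upward,
        // the same way codegen() would propagate a null value.
        if (!LHS->eval(L) || !RHS->eval(R)) return false;
        switch (Op) {
            case '+': out = L + R; return true;
            case '-': out = L - R; return true;
            case '*': out = L * R; return true;
            case '/': out = L / R; return true;
            default:  return false;  // unknown operator
        }
    }
};
```

Each node owns its children through `std::unique_ptr`, so destroying the root frees the whole tree.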

LLVM IR Generation

Module and Context Setup

cpp
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/IRBuilder.h"

class CodeGen {
    std::unique_ptr<llvm::LLVMContext> Context;
    std::unique_ptr<llvm::Module> Module;
    std::unique_ptr<llvm::IRBuilder<>> Builder;
    
public:
    CodeGen() {
        Context = std::make_unique<llvm::LLVMContext>();
        Module = std::make_unique<llvm::Module>("my_module", *Context);
        Builder = std::make_unique<llvm::IRBuilder<>>(*Context);
    }
};

Function Generation

cpp
llvm::Function* createFunction(const std::string& name, 
                                llvm::Type* returnType,
                                std::vector<llvm::Type*> params) {
    llvm::FunctionType* FT = llvm::FunctionType::get(returnType, params, false);
    llvm::Function* F = llvm::Function::Create(
        FT, llvm::Function::ExternalLinkage, name, Module.get());
    
    llvm::BasicBlock* BB = llvm::BasicBlock::Create(*Context, "entry", F);
    Builder->SetInsertPoint(BB);
    
    return F;
}

JIT Compilation

LLVM ORC JIT

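cpp
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/TargetSelect.h"

// Assumes LLVM 15 or later, where lookup() yields an ExecutorAddr.
// handleError is a user-supplied helper that reports the error and does not return.

// The native target must be initialized before the JIT is created
llvm::InitializeNativeTarget();
llvm::InitializeNativeTargetAsmPrinter();

auto JIT = llvm::orc::LLJITBuilder().create();
if (!JIT) {
    handleError(JIT.takeError());
}

// Add module; the returned llvm::Error must be checked
if (auto Err = (*JIT)->addIRModule(llvm::orc::ThreadSafeModule(
        std::move(Module), std::move(Context)))) {
    handleError(std::move(Err));
}

// Look up symbol and execute
auto Sym = (*JIT)->lookup("main");
if (!Sym) {
    handleError(Sym.takeError());
}
auto* MainFn = Sym->toPtr<int (*)()>();
int result = MainFn();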

Optimization Pass Pipeline

New Pass Manager (Recommended)

cpp
#include "llvm/Passes/PassBuilder.h"

void optimizeModule(llvm::Module& M) {
    llvm::PassBuilder PB;
    llvm::LoopAnalysisManager LAM;
    llvm::FunctionAnalysisManager FAM;
    llvm::CGSCCAnalysisManager CGAM;
    llvm::ModuleAnalysisManager MAM;
    
    PB.registerModuleAnalyses(MAM);
    PB.registerCGSCCAnalyses(CGAM);
    PB.registerFunctionAnalyses(FAM);
    PB.registerLoopAnalyses(LAM);
    PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);
    
    llvm::ModulePassManager MPM = PB.buildPerModuleDefaultPipeline(
        llvm::OptimizationLevel::O2);
    MPM.run(M, MAM);
}

Custom Pass Implementation

cpp
struct MyPass : public llvm::PassInfoMixin<MyPass> {
    llvm::PreservedAnalyses run(llvm::Function& F, 
                                 llvm::FunctionAnalysisManager& FAM) {
        for (auto& BB : F) {
            for (auto& I : BB) {
                // Transform instructions
            }
        }
        return llvm::PreservedAnalyses::none();
    }
};

Language Implementation Patterns

Memory-Safe Languages

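  • Instrument generated code with LLVM's sanitizers (AddressSanitizer for out-of-bounds and use-after-free accesses, MemorySanitizer for uninitialized reads)
  • Implement bounds checking by inspecting getelementptr (GEP) instructions and inserting a check before each indexed access
  • Integrate reference counting or garbage collection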
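
One common way to approach the bounds-checking item is to carry the element count alongside the pointer and compare the index against it before every access. The sketch below shows that check in plain C++ rather than in emitted IR. `IntSlice` and `checkedAt` are hypothetical names. The exception is only for illustration: a compiler would more likely branch to a trap or panic block.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical "fat pointer": base address plus element count, as a front end
// might represent an array or slice value.
struct IntSlice {
    int* data;
    std::size_t len;
};

// The comparison a compiler would insert ahead of the address computation.
// An unsigned index means a single compare also rejects "negative" indices.
int& checkedAt(IntSlice s, std::size_t index) {
    if (index >= s.len) {
        throw std::out_of_range("index out of bounds");
    }
    return s.data[index];
}
```

Because the check sits in front of the address computation, an optimizer that can prove the index is in range is free to remove it.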

Type Systems

  • Implement type inference during AST construction
  • Generate appropriate LLVM types (i32, float, struct, ptr)
  • Handle generic types via monomorphization or boxing
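
Monomorphization usually needs a cache so that each combination of generic function and concrete type arguments is emitted only once. The sketch below shows just that bookkeeping, under assumptions: `Monomorphizer` and the `$`-separated mangling scheme are hypothetical, and types are represented by their names as strings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

class Monomorphizer {
    using Key = std::pair<std::string, std::vector<std::string>>;
    // (generic function name, concrete type arguments) -> specialized symbol name
    std::map<Key, std::string> cache;

public:
    // Return the symbol for this instantiation, creating it on first request.
    const std::string& instantiate(const std::string& fn,
                                   const std::vector<std::string>& typeArgs) {
        Key key(fn, typeArgs);
        auto it = cache.find(key);
        if (it != cache.end()) return it->second;  // already specialized

        std::string mangled = fn;
        for (const auto& t : typeArgs) mangled += "$" + t;
        // A real compiler would clone the generic body here and substitute
        // the concrete types before generating IR for it.
        return cache.emplace(std::move(key), std::move(mangled)).first->second;
    }

    std::size_t instantiationCount() const { return cache.size(); }
};
```

Boxing takes the opposite trade-off: one shared copy of the code and uniform pointer-sized values, at the cost of indirection at run time.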

Error Handling

  • Generate exception handling with LLVM's invoke and landingpad instructions
  • Implement Result/Option types as tagged unions
  • Attach a personality function, supplied by the language runtime, to each function that may unwind
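
As an illustration of the tagged-union approach, the sketch below spells out a possible layout for a Result over 32-bit integers: a small tag followed by storage shared by both variants. It is a sketch under assumptions: `ResultI32`, `checkedDiv`, `unwrapOr`, and the error codes are hypothetical, and a real front end would lower this to an LLVM struct type instead of a C++ struct.

```cpp
#include <cassert>
#include <cstdint>

enum class ResultTag : std::uint8_t { Ok = 0, Err = 1 };

struct ResultI32 {
    ResultTag tag;
    union {
        std::int32_t value;  // valid when tag == Ok
        std::int32_t error;  // valid when tag == Err
    } payload;
};

ResultI32 makeOk(std::int32_t v) {
    ResultI32 r;
    r.tag = ResultTag::Ok;
    r.payload.value = v;
    return r;
}

ResultI32 makeErr(std::int32_t code) {
    ResultI32 r;
    r.tag = ResultTag::Err;
    r.payload.error = code;
    return r;
}

// Division that reports failure through the tag instead of unwinding.
// Error codes (hypothetical): 1 = division by zero, 2 = overflow.
ResultI32 checkedDiv(std::int32_t a, std::int32_t b) {
    if (b == 0) return makeErr(1);
    if (a == INT32_MIN && b == -1) return makeErr(2);
    return makeOk(a / b);
}

// Mirrors what generated code does at a use site: branch on the tag first,
// then read only the payload member that the tag says is active.
std::int32_t unwrapOr(ResultI32 r, std::int32_t fallback) {
    return r.tag == ResultTag::Ok ? r.payload.value : fallback;
}
```

Unlike the invoke/landingpad route, this style needs no unwinder or personality function: errors travel through ordinary return values.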

Notable Language Implementations

Systems Languages

  • Rust: Complex borrow checker, trait system → LLVM
  • Zig: Comptime evaluation, safety features
  • Carbon: C++ interop, modern syntax

Scripting Languages

  • Julia: JIT-compiled scientific computing
  • Crystal: Ruby-like syntax, static typing
  • Nim: Python-like, multi-backend

Domain-Specific

  • Solidity: Ethereum smart contracts
  • MLIR: Multi-level IR for ML/AI workloads
  • Halide: Image processing DSL

Development Workflow

  1. Start Simple: Begin with the Kaleidoscope tutorial
  2. Incremental Features: Add one language feature at a time
  3. Test Extensively: Unit tests for each compiler phase
  4. Use LLVM Tools: opt, llc, llvm-dis for debugging IR
  5. Profile and Optimize: Focus on common code patterns

Resources

Official Tutorials

  • LLVM Kaleidoscope: Building a language from scratch
  • Clang internals: Frontend implementation patterns
  • Writing an LLVM Backend: Target code generation

Community Projects

See the DIY Compiler section in README.md for 100+ example implementations across different language paradigms.

Getting Detailed Information

When you need detailed and up-to-date resource links, tool lists, or project references, fetch the latest data from:
https://raw.githubusercontent.com/gmh5225/awesome-llvm-security/refs/heads/main/README.md
This README contains comprehensive curated lists of:
  • 100+ DIY compiler implementations (DIY Compiler section)
  • Toolchain configurations and IDE setup
  • Compiler development tutorials and books