compiler-development
Compiler Development Skill
This skill provides comprehensive knowledge of building compilers and language implementations using the LLVM infrastructure.
Compiler Architecture Overview
Classic Three-Phase Design
Source Code → Frontend → Middle-End (Optimizer) → Backend → Machine Code
                 ↓                  ↓                ↓
               AST/IR        LLVM IR Passes     Target Code
Frontend Development
Lexical Analysis
cpp
#include <string>

// Position of a token in the source text.
// Assumed shape: the original snippet leaves SourceLocation undefined.
struct SourceLocation {
    int line = 1;
    int column = 1;
};

// Token types for a simple language
enum class TokenKind {
    Identifier,
    Number,
    String,
    Keyword,
    Operator,
    Punctuation,
    EndOfFile
};

// A lexed token: its kind, its spelling, and where it was found
struct Token {
    TokenKind kind;
    std::string value;
    SourceLocation location;
};
Parser Implementation
- Recursive Descent: Easy to implement, good error messages
- Operator Precedence Parsing: Efficient for expression parsing
- LALR/LR: Use tools like Bison for complex grammars
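To make the first two approaches concrete, here is a minimal sketch that combines them: recursive descent for primary expressions and precedence climbing for binary operators. The class name ExprParser, the precedence values, and the choice to evaluate directly to a double instead of building an AST are all assumptions made for brevity, not part of the original skill.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical mini-parser: evaluates +, -, *, / and parentheses over doubles.
class ExprParser {
    std::string src;
    std::size_t pos = 0;

    // Skips whitespace and returns the next character without consuming it
    // ('\0' at end of input).
    char peek() {
        while (pos < src.size() &&
               std::isspace(static_cast<unsigned char>(src[pos])))
            ++pos;
        return pos < src.size() ? src[pos] : '\0';
    }

    // Binding power of a binary operator; -1 means "not an operator".
    static int precedence(char op) {
        switch (op) {
            case '+': case '-': return 10;
            case '*': case '/': return 20;
            default:            return -1;
        }
    }

    // Recursive descent rule: primary := number | '(' expr ')'
    double parsePrimary() {
        char c = peek();
        if (c == '(') {
            ++pos;
            double v = parseExpr(0);
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos;
            return v;
        }
        if (std::isdigit(static_cast<unsigned char>(c))) {
            char* end = nullptr;
            double v = std::strtod(src.c_str() + pos, &end);
            pos = static_cast<std::size_t>(end - src.c_str());
            return v;
        }
        throw std::runtime_error("expected a number or '('");
    }

    // Precedence climbing: consumes operators that bind at least as tightly
    // as minPrec.
    double parseExpr(int minPrec) {
        double lhs = parsePrimary();
        for (;;) {
            char op = peek();
            int prec = precedence(op);
            if (prec < 0 || prec < minPrec) return lhs;
            ++pos;
            double rhs = parseExpr(prec + 1);  // +1 gives left associativity
            switch (op) {
                case '+': lhs += rhs; break;
                case '-': lhs -= rhs; break;
                case '*': lhs *= rhs; break;
                case '/': lhs /= rhs; break;
            }
        }
    }

public:
    explicit ExprParser(std::string text) : src(std::move(text)) {}

    // Parses the whole input and rejects trailing characters.
    double parse() {
        double v = parseExpr(0);
        if (peek() != '\0') throw std::runtime_error("unexpected trailing input");
        return v;
    }
};
```

Calling ExprParser("8 - 3 - 2").parse() groups the subtraction to the left because the right operand is parsed with a higher minimum precedence. A real frontend would return AST nodes from parsePrimary and parseExpr rather than numbers.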
AST Design
cpp
#include <memory>
#include <utility>
#include "llvm/IR/IRBuilder.h"

// Assumption: a shared IRBuilder is defined elsewhere in the compiler.
extern llvm::IRBuilder<> Builder;

// Base class for all expression nodes
class Expr {
public:
    virtual ~Expr() = default;
    // Emits IR for this node; returns nullptr on error
    virtual llvm::Value* codegen() = 0;
};

// Binary operator node; operands are assumed to be floating-point values
class BinaryExpr : public Expr {
    std::unique_ptr<Expr> LHS, RHS;
    char Op;
public:
    BinaryExpr(char op, std::unique_ptr<Expr> lhs, std::unique_ptr<Expr> rhs)
        : LHS(std::move(lhs)), RHS(std::move(rhs)), Op(op) {}

    llvm::Value* codegen() override {
        llvm::Value* L = LHS->codegen();
        llvm::Value* R = RHS->codegen();
        if (!L || !R)
            return nullptr;  // propagate operand errors
        switch (Op) {
            case '+': return Builder.CreateFAdd(L, R, "addtmp");
            case '-': return Builder.CreateFSub(L, R, "subtmp");
            case '*': return Builder.CreateFMul(L, R, "multmp");
            case '/': return Builder.CreateFDiv(L, R, "divtmp");
            default:  return nullptr;  // unknown operator
        }
    }
};
LLVM IR Generation
Module and Context Setup
cpp
#include <memory>
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/IRBuilder.h"

// Owns the three objects every IR generator needs
class CodeGen {
    std::unique_ptr<llvm::LLVMContext> Context;   // owns types and constants
    std::unique_ptr<llvm::Module> Module;         // holds functions and globals
    std::unique_ptr<llvm::IRBuilder<>> Builder;   // inserts instructions
public:
    CodeGen() {
        // The context must be created first: the module and builder refer to it
        Context = std::make_unique<llvm::LLVMContext>();
        Module = std::make_unique<llvm::Module>("my_module", *Context);
        Builder = std::make_unique<llvm::IRBuilder<>>(*Context);
    }
};
Function Generation
cpp
#include <string>
#include <vector>

// Assumed to be a member of the CodeGen class above: it uses that class's
// Module, Context, and Builder. Creates an externally visible function with
// an empty entry block and leaves the builder positioned inside it.
llvm::Function* createFunction(const std::string& name,
                               llvm::Type* returnType,
                               const std::vector<llvm::Type*>& params) {
    llvm::FunctionType* FT =
        llvm::FunctionType::get(returnType, params, /*isVarArg=*/false);
    llvm::Function* F = llvm::Function::Create(
        FT, llvm::Function::ExternalLinkage, name, Module.get());
    llvm::BasicBlock* BB = llvm::BasicBlock::Create(*Context, "entry", F);
    Builder->SetInsertPoint(BB);
    return F;
}
JIT Compilation
LLVM ORC JIT
cpp
#include <memory>
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/TargetSelect.h"

// runMain is a hypothetical wrapper: it JIT-compiles the module and runs its
// "main", returning errors to the caller instead of ignoring them
llvm::Expected<int> runMain(std::unique_ptr<llvm::Module> Module,
                            std::unique_ptr<llvm::LLVMContext> Context) {
    // The native target must be registered before the JIT is created
    llvm::InitializeNativeTarget();
    llvm::InitializeNativeTargetAsmPrinter();
    auto JIT = llvm::orc::LLJITBuilder().create();
    if (!JIT)
        return JIT.takeError();
    // Add module; the returned llvm::Error must be checked
    if (llvm::Error Err = (*JIT)->addIRModule(llvm::orc::ThreadSafeModule(
            std::move(Module), std::move(Context))))
        return std::move(Err);
    // Look up symbol and execute
    auto Sym = (*JIT)->lookup("main");
    if (!Sym)
        return Sym.takeError();
    // Recent LLVM releases return an ExecutorAddr here; older ones return a
    // JITEvaluatedSymbol, where the cast is (int(*)())Sym->getAddress()
    auto* MainFn = Sym->toPtr<int (*)()>();
    return MainFn();
}
Optimization Pass Pipeline
New Pass Manager (Recommended)
cpp
#include "llvm/Passes/PassBuilder.h"

// Runs the default O2 pipeline over a module
void optimizeModule(llvm::Module& M) {
    llvm::PassBuilder PB;

    // One analysis manager per IR unit
    llvm::LoopAnalysisManager LAM;
    llvm::FunctionAnalysisManager FAM;
    llvm::CGSCCAnalysisManager CGAM;
    llvm::ModuleAnalysisManager MAM;

    // Register the standard analyses, then let the managers query each other
    PB.registerModuleAnalyses(MAM);
    PB.registerCGSCCAnalyses(CGAM);
    PB.registerFunctionAnalyses(FAM);
    PB.registerLoopAnalyses(LAM);
    PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

    llvm::ModulePassManager MPM = PB.buildPerModuleDefaultPipeline(
        llvm::OptimizationLevel::O2);
    MPM.run(M, MAM);
}
Custom Pass Implementation
cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"

// Skeleton of a function pass for the new pass manager
struct MyPass : public llvm::PassInfoMixin<MyPass> {
    llvm::PreservedAnalyses run(llvm::Function& F,
                                llvm::FunctionAnalysisManager& FAM) {
        for (auto& BB : F) {
            for (auto& I : BB) {
                // Transform instructions
            }
        }
        // Conservative: reports that no analyses survive. A pass that
        // changes nothing can return llvm::PreservedAnalyses::all() instead.
        return llvm::PreservedAnalyses::none();
    }
};
Language Implementation Patterns
Memory-Safe Languages
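- Use LLVM's sanitizer instrumentation (for example AddressSanitizer or MemorySanitizer) to catch memory errors in generated code
- Implement bounds checking by inspecting getelementptr (GEP) instructions and inserting index checks before each access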
- Reference counting or garbage collection integration
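As one possible illustration of the last two points, the sketch below shows runtime helpers that generated code could call: a reference-counted array with a bounds-checked accessor. The names RtArray, rt_array_new, rt_retain, rt_release, and rt_array_at are hypothetical, and the counter is non-atomic, so this assumes a single-threaded runtime.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical runtime object: a reference count and a length stored next to
// the payload pointer.
struct RtArray {
    std::uint32_t refcount;
    std::uint32_t length;
    double* data;
};

// Allocates a zero-initialized array with one owner.
RtArray* rt_array_new(std::uint32_t length) {
    return new RtArray{1, length, new double[length]()};
}

// Called when a new reference to the array is created.
void rt_retain(RtArray* a) { ++a->refcount; }

// Called when a reference goes away; frees the array when the last owner
// releases it. Returns true if the object was freed.
bool rt_release(RtArray* a) {
    if (--a->refcount != 0) return false;
    delete[] a->data;
    delete a;
    return true;
}

// Bounds-checked element access. A compiler could emit a call to this helper,
// or inline the same comparison, before each indexed load or store.
double& rt_array_at(RtArray* a, std::uint32_t index) {
    if (index >= a->length) throw std::out_of_range("array index out of bounds");
    return a->data[index];
}
```

Keeping these checks in a small runtime library means the frontend only has to emit calls; the optimizer can later remove checks it proves redundant once the helpers are inlined.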
Type Systems
- Implement type inference during AST construction
- Generate appropriate LLVM types (i32, float, struct, ptr)
- Handle generic types via monomorphization or boxing
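Monomorphization can be sketched as an instantiation cache: each distinct combination of generic name and concrete type arguments gets one specialized copy. The class Monomorphizer and its "$"-separated mangling scheme are assumptions chosen for illustration; a real compiler would also clone and type-substitute the function body at the marked point.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical instantiation cache for monomorphization.
class Monomorphizer {
    // Key: generic name plus concrete type arguments.
    // Value: mangled name of the specialized instance.
    std::map<std::pair<std::string, std::vector<std::string>>, std::string>
        instances;

public:
    // Returns the specialized name, creating the instance on first use.
    const std::string& instantiate(const std::string& generic,
                                   const std::vector<std::string>& typeArgs) {
        auto key = std::make_pair(generic, typeArgs);
        auto it = instances.find(key);
        if (it != instances.end()) return it->second;

        std::string mangled = generic;
        for (const auto& t : typeArgs) mangled += "$" + t;
        // A real compiler would clone the generic body here, substitute the
        // type arguments, and emit IR under the mangled name.
        return instances.emplace(std::move(key), std::move(mangled))
            .first->second;
    }

    // Number of distinct specializations created so far.
    std::size_t instanceCount() const { return instances.size(); }
};
```

Requesting the same instantiation twice returns the cached name, which is what keeps code size proportional to the number of distinct type combinations rather than the number of call sites. Boxing makes the opposite trade: one copy of the code, with values passed behind pointers.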
Error Handling
- Generate exception handling via LLVM's landingpad/invoke
- Implement Result/Option types as tagged unions
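- Attach a personality function (for example __gxx_personality_v0 from the C++ runtime) to each function that uses invoke/landingpad so the unwinder can find its handlers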
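The tagged-union option avoids unwinding entirely: the error travels back as an ordinary return value. The sketch below shows one layout a compiler might lower a Result of a 32-bit integer to, written in plain C++. The names ResultI32, makeOk, makeErr, checkedDiv, and unwrapOr, and the error code 1, are hypothetical.

```cpp
#include <cstdint>

// Discriminant stored alongside the payload.
enum class ResultTag : std::uint8_t { Ok, Err };

// Hypothetical lowering of a Result type: one tag plus a payload union.
struct ResultI32 {
    ResultTag tag;
    union {
        std::int32_t value;  // valid when tag == ResultTag::Ok
        std::int32_t error;  // valid when tag == ResultTag::Err
    } payload;
};

ResultI32 makeOk(std::int32_t v) {
    ResultI32 r;
    r.tag = ResultTag::Ok;
    r.payload.value = v;
    return r;
}

ResultI32 makeErr(std::int32_t code) {
    ResultI32 r;
    r.tag = ResultTag::Err;
    r.payload.error = code;
    return r;
}

// Checked division: failure is reported through the tag, not by unwinding.
ResultI32 checkedDiv(std::int32_t a, std::int32_t b) {
    // 1 is a hypothetical "invalid division" code; it covers division by
    // zero and the one overflowing case, INT32_MIN / -1.
    if (b == 0 || (a == INT32_MIN && b == -1)) return makeErr(1);
    return makeOk(a / b);
}

// Mirrors the branch-on-tag a compiler would emit for a match expression.
std::int32_t unwrapOr(const ResultI32& r, std::int32_t fallback) {
    return r.tag == ResultTag::Ok ? r.payload.value : fallback;
}
```

In LLVM IR the same shape would typically be a struct holding the tag and a payload large enough for the biggest variant, with a conditional branch on the tag at each use.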
Notable Language Implementations
Systems Languages
- Rust: Complex borrow checker, trait system → LLVM
- Zig: Comptime evaluation, safety features
- Carbon: C++ interop, modern syntax
Scripting Languages
- Julia: JIT-compiled scientific computing
- Crystal: Ruby-like syntax, static typing
- Nim: Python-like syntax, multiple backends (C, C++, JavaScript)
Domain-Specific
- Solidity: Ethereum smart contracts
- MLIR: Multi-level IR for ML/AI workloads
- Halide: Image processing DSL
Development Workflow
- Start Simple: Begin with Kaleidoscope tutorial
- Incremental Features: Add one language feature at a time
- Test Extensively: Unit tests for each compiler phase
- Use LLVM Tools: opt, llc, llvm-dis for debugging IR
- Profile and Optimize: Focus on common code patterns
Resources
Official Tutorials
- LLVM Kaleidoscope: Building a language from scratch
- Clang internals: Frontend implementation patterns
- Writing an LLVM Backend: Target code generation
Community Projects
See the DIY Compiler section in README.md for 100+ example implementations across different language paradigms.
Getting Detailed Information
When you need detailed and up-to-date resource links, tool lists, or project references, fetch the latest data from:
https://raw.githubusercontent.com/gmh5225/awesome-llvm-security/refs/heads/main/README.md
This README contains comprehensive curated lists of:
- 100+ DIY compiler implementations (DIY Compiler section)
- Toolchain configurations and IDE setup
- Compiler development tutorials and books