parser-development
Purpose
Use this skill when creating or modifying Biome's parsers. Covers grammar authoring with ungrammar, lexer implementation, error recovery strategies, and list parsing patterns.
Prerequisites
- Install required tools: `just install-tools`
- Understand the language syntax you're implementing
- Read `crates/biome_parser/CONTRIBUTING.md` for detailed concepts

Common Workflows

Create Grammar for New Language
Create a `.ungram` file in `xtask/codegen/` (e.g., `xtask/codegen/html.ungram`):

```
// html.ungram
// Legend:
//   Name =            -- non-terminal definition
//   'ident'           -- token (terminal)
//   A B               -- sequence
//   A | B             -- alternation
//   A*                -- zero or more repetition
//   (A (',' A)* ','?) -- repetition with separator and optional trailing comma
//   A?                -- zero or one repetition
//   label:A           -- suggested name for field

HtmlRoot = element*

HtmlElement =
    '<'
    tag_name: HtmlName
    attributes: HtmlAttributeList
    '>'
    children: HtmlElementList
    '<' '/' close_tag_name: HtmlName '>'

HtmlAttributeList = HtmlAttribute*

HtmlAttribute =
    HtmlSimpleAttribute
    | HtmlBogusAttribute

HtmlSimpleAttribute =
    name: HtmlName
    '='
    value: HtmlString

HtmlBogusAttribute = /* error recovery node */
```

Naming conventions:
- Prefix all nodes with the language name: `HtmlElement`, `CssRule`
- Unions start with `Any`: `AnyHtmlAttribute`
- Error recovery nodes use `Bogus`: `HtmlBogusAttribute`
- Lists end with `List`: `HtmlAttributeList`
- Lists are mandatory (never optional), empty by default
Generate Parser from Grammar
```shell
# Generate for a specific language
just gen-grammar html

# Generate for multiple languages
just gen-grammar html css

# Generate all grammars
just gen-grammar
```

This creates:
- `biome_html_syntax/src/generated/` - Node definitions
- `biome_html_factory/src/generated/` - Node construction helpers
- Parser skeleton files (you'll implement the actual parsing logic)

Implement a Lexer
Create `lexer/mod.rs` in your parser crate:

```rust
use biome_html_syntax::HtmlSyntaxKind;
use biome_parser::{lexer::Lexer, ParseDiagnostic};

pub(crate) struct HtmlLexer<'source> {
    source: &'source str,
    position: usize,
    current_kind: HtmlSyntaxKind,
    diagnostics: Vec<ParseDiagnostic>,
}

impl<'source> Lexer<'source> for HtmlLexer<'source> {
    const NEWLINE: Self::Kind = HtmlSyntaxKind::NEWLINE;
    const WHITESPACE: Self::Kind = HtmlSyntaxKind::WHITESPACE;

    type Kind = HtmlSyntaxKind;
    type LexContext = ();
    type ReLexContext = ();

    fn source(&self) -> &'source str {
        self.source
    }

    fn current(&self) -> Self::Kind {
        self.current_kind
    }

    fn position(&self) -> usize {
        self.position
    }

    fn advance(&mut self, _context: Self::LexContext) -> Self::Kind {
        // Implement token scanning logic
        let kind = self.read_next_token();
        self.current_kind = kind;
        kind
    }

    // Implement other required methods...
}
```
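The `advance` method above delegates to a token-scanning routine. As a rough, framework-free illustration of what such a routine does, here is a runnable sketch with a simplified stand-in token enum (the kinds and scanning rules below are hypothetical, not Biome's real `HtmlSyntaxKind`):

```rust
// Minimal stand-alone sketch of a lexer's token-scanning loop.
// TokenKind and the rules are simplified stand-ins, not Biome's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    LAngle,     // '<'
    RAngle,     // '>'
    Slash,      // '/'
    Eq,         // '='
    Name,       // run of ASCII letters
    Whitespace, // coalesced run of whitespace
    Error,      // anything unrecognized
    Eof,
}

struct SimpleLexer<'source> {
    source: &'source str,
    position: usize,
}

impl<'source> SimpleLexer<'source> {
    fn new(source: &'source str) -> Self {
        Self { source, position: 0 }
    }

    // Scan one token starting at `self.position`, advancing past it.
    fn next_token(&mut self) -> TokenKind {
        let bytes = self.source.as_bytes();
        let Some(&byte) = bytes.get(self.position) else {
            return TokenKind::Eof;
        };
        self.position += 1;
        match byte {
            b'<' => TokenKind::LAngle,
            b'>' => TokenKind::RAngle,
            b'/' => TokenKind::Slash,
            b'=' => TokenKind::Eq,
            b' ' | b'\t' | b'\n' | b'\r' => {
                // Coalesce consecutive whitespace into one token
                while matches!(bytes.get(self.position).copied(), Some(b' ' | b'\t' | b'\n' | b'\r')) {
                    self.position += 1;
                }
                TokenKind::Whitespace
            }
            b if b.is_ascii_alphabetic() => {
                while matches!(bytes.get(self.position).copied(), Some(b) if b.is_ascii_alphabetic()) {
                    self.position += 1;
                }
                TokenKind::Name
            }
            _ => TokenKind::Error,
        }
    }
}

fn lex(source: &str) -> Vec<TokenKind> {
    let mut lexer = SimpleLexer::new(source);
    let mut tokens = Vec::new();
    loop {
        let kind = lexer.next_token();
        if kind == TokenKind::Eof {
            break;
        }
        tokens.push(kind);
    }
    tokens
}
```

A real lexer additionally tracks the token's byte range and pushes a `ParseDiagnostic` for `Error` tokens rather than silently returning them.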
Implement Token Source
```rust
use biome_html_syntax::HtmlSyntaxKind;
use biome_parser::lexer::BufferedLexer;
use biome_parser::token_source::TokenSourceWithBufferedLexer;

use crate::lexer::HtmlLexer;

pub(crate) struct HtmlTokenSource<'source> {
    lexer: BufferedLexer<HtmlSyntaxKind, HtmlLexer<'source>>,
}

impl<'source> TokenSourceWithBufferedLexer<HtmlLexer<'source>> for HtmlTokenSource<'source> {
    fn lexer(&mut self) -> &mut BufferedLexer<HtmlSyntaxKind, HtmlLexer<'source>> {
        &mut self.lexer
    }
}
```
Write Parse Rules
Example: parsing an `if` statement:

```rust
use biome_parser::prelude::*;
use biome_js_syntax::JsSyntaxKind::*;

fn parse_if_statement(p: &mut JsParser) -> ParsedSyntax {
    // Presence test - return Absent if not at 'if'
    if !p.at(T![if]) {
        return Absent;
    }

    let m = p.start();

    // Parse required tokens
    p.expect(T![if]);
    p.expect(T!['(']);

    // Parse required nodes with error recovery
    parse_any_expression(p).or_add_diagnostic(p, expected_expression);
    p.expect(T![')']);
    parse_block_statement(p).or_add_diagnostic(p, expected_block);

    // Parse optional else clause
    if p.at(T![else]) {
        parse_else_clause(p).ok();
    }

    Present(m.complete(p, JS_IF_STATEMENT))
}
```
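The `Absent`/`Present` protocol can be mimicked outside the framework to see why the presence test must not consume tokens. A toy, runnable model (our own `ParsedSyntax` enum and string tokens, not Biome's real types):

```rust
// Toy model of the ParsedSyntax presence-test protocol.
// Types here are illustrative stand-ins for Biome's real API.
#[derive(Debug, PartialEq)]
enum ParsedSyntax {
    Absent,
    Present(&'static str), // kind of node that was parsed
}

struct Parser {
    tokens: Vec<&'static str>,
    position: usize,
}

impl Parser {
    fn at(&self, kind: &str) -> bool {
        self.tokens.get(self.position).copied() == Some(kind)
    }

    fn bump(&mut self) {
        self.position += 1;
    }
}

// Presence test: if the first token doesn't match, return Absent
// WITHOUT advancing, so the caller can try a different rule next.
fn parse_if_statement(p: &mut Parser) -> ParsedSyntax {
    if !p.at("if") {
        return ParsedSyntax::Absent;
    }
    // Only now is it safe to consume tokens.
    p.bump(); // 'if'
    while p.position < p.tokens.len() {
        p.bump(); // condition and body, elided in this sketch
    }
    ParsedSyntax::Present("IF_STATEMENT")
}
```

The key invariant: after an `Absent` return, the parser position is exactly where it was before the call.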
Parse Lists with Error Recovery
Use `ParseSeparatedList` for comma-separated lists:

```rust
struct ArrayElementsList;

impl ParseSeparatedList for ArrayElementsList {
    type ParsedElement = CompletedMarker;

    fn parse_element(&mut self, p: &mut Parser) -> ParsedSyntax<Self::ParsedElement> {
        parse_array_element(p)
    }

    fn is_at_list_end(&self, p: &mut Parser) -> bool {
        // Stop at the array's closing bracket or at end of file
        p.at(T![']']) || p.at(EOF)
    }

    fn recover(
        &mut self,
        p: &mut Parser,
        parsed_element: ParsedSyntax<Self::ParsedElement>,
    ) -> RecoveryResult {
        parsed_element.or_recover(
            p,
            &ParseRecoveryTokenSet::new(
                JS_BOGUS_EXPRESSION,
                token_set![T![']'], T![,]],
            ),
            expected_array_element,
        )
    }

    fn separating_element_kind(&mut self) -> JsSyntaxKind {
        T![,]
    }
}

// Use the list parser
fn parse_array_elements(p: &mut Parser) -> CompletedMarker {
    let m = p.start();
    ArrayElementsList.parse_list(p);
    m.complete(p, JS_ARRAY_ELEMENT_LIST)
}
```
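Stripped of the framework, the loop a `ParseSeparatedList` implementation drives looks roughly like this. A runnable sketch with plain string tokens (the token model and the alphanumeric "valid element" rule are invented for illustration; the real implementation also records diagnostics):

```rust
// Framework-free sketch of separated-list parsing with recovery.
// "]" terminates the list, "," separates elements; anything that isn't
// a plain alphanumeric word is treated as invalid and recovered.
fn parse_elements(tokens: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut elements = Vec::new();
    let mut bogus = Vec::new(); // stand-in for BOGUS recovery nodes
    let mut i = 0;
    while i < tokens.len() && tokens[i] != "]" {
        let token = tokens[i];
        if !token.is_empty() && token.chars().all(|c| c.is_ascii_alphanumeric()) {
            elements.push(token.to_string());
        } else {
            // Recovery: wrap the unexpected token instead of aborting,
            // so the rest of the list still parses.
            bogus.push(token.to_string());
        }
        i += 1;
        // Consume the separator between elements, if present
        if i < tokens.len() && tokens[i] == "," {
            i += 1;
        }
    }
    (elements, bogus)
}
```

Note that recovery keeps the loop making progress: every iteration consumes at least one token, so a stray token can never hang the parser.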
Implement Error Recovery
Error recovery wraps invalid tokens in `BOGUS` nodes:

```rust
// A recovery set typically includes:
// - List terminator tokens (e.g., ']', '}')
// - Statement terminators (e.g., ';')
// - List separators (e.g., ',')
let recovery_set = token_set![T![']'], T![,], T![;]];

parsed_element.or_recover(
    p,
    &ParseRecoveryTokenSet::new(JS_BOGUS_EXPRESSION, recovery_set),
    expected_expression_error,
)
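In spirit, recovery to a token set means: skip tokens until one from the recovery set appears, and group everything skipped under a bogus node. A toy, runnable version (strings instead of real `TokenSet`s):

```rust
// Toy version of recovery-to-a-token-set: skip until a token in the
// recovery set, returning the skipped tokens as one "bogus" group
// together with the position to resume parsing from.
fn recover_to(tokens: &[&str], start: usize, recovery_set: &[&str]) -> (Vec<String>, usize) {
    let mut skipped = Vec::new();
    let mut i = start;
    while i < tokens.len() && !recovery_set.contains(&tokens[i]) {
        skipped.push(tokens[i].to_string());
        i += 1;
    }
    // `skipped` becomes the children of a BOGUS node; `i` now points at a
    // token the enclosing rule knows how to continue from (or at EOF).
    (skipped, i)
}
```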
```

Handle Conditional Syntax
For syntax that is only valid in certain contexts (e.g., strict mode):

```rust
fn parse_with_statement(p: &mut Parser) -> ParsedSyntax {
    if !p.at(T![with]) {
        return Absent;
    }

    let m = p.start();
    p.bump(T![with]);
    parenthesized_expression(p).or_add_diagnostic(p, expected_expression);
    parse_statement(p).or_add_diagnostic(p, expected_statement);
    let with_stmt = m.complete(p, JS_WITH_STATEMENT);

    // Mark the statement as invalid in strict mode
    let conditional = StrictMode.excluding_syntax(p, with_stmt, |p, marker| {
        p.err_builder(
            "`with` statements are not allowed in strict mode",
            marker.range(p),
        )
    });

    Present(conditional.or_invalid_to_bogus(p))
}
```
Test Parser
Create test files in `crates/biome_html_parser/tests/`:

```
crates/biome_html_parser/tests/
├── html_specs/
│   ├── ok/
│   │   ├── simple_element.html
│   │   └── nested_elements.html
│   └── error/
│       ├── unclosed_tag.html
│       └── invalid_syntax.html
└── html_test.rs
```

Run tests:

```shell
cd crates/biome_html_parser
cargo test
```
Tips
- Presence test: Always return `Absent` if the first token doesn't match - never advance the parser before returning `Absent`
- Required vs optional: Use `p.expect()` for required tokens, `p.eat()` for optional ones
- Missing markers: Use `.or_add_diagnostic()` for required nodes to add missing markers and errors
- Error recovery: Include list terminators, separators, and statement boundaries in recovery sets
- Bogus nodes: Check the grammar for which `BOGUS_*` node types are valid in your context
- Checkpoints: Use `p.checkpoint()` to save state and `p.rewind()` if parsing fails
- Lookahead: Use `p.at()` to check the current token, `p.nth_at()` for lookahead beyond it
- Lists are mandatory: Always create list nodes even if empty - use `parse_list()`, not `parse_list().ok()`
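The checkpoint tip above, as a self-contained runnable sketch. A bare cursor over string tokens stands in for Biome's `Parser`, and the arrow-head rule is a hypothetical example of speculative parsing:

```rust
// Self-contained sketch of checkpoint/rewind backtracking.
struct Cursor {
    tokens: Vec<&'static str>,
    position: usize,
}

impl Cursor {
    fn checkpoint(&self) -> usize {
        self.position
    }

    fn rewind(&mut self, checkpoint: usize) {
        self.position = checkpoint;
    }

    // Try to parse an arrow-function head: '(' ')' '=>'. On failure the
    // cursor may already have consumed tokens, so callers must rewind.
    fn try_parse_arrow_head(&mut self) -> bool {
        for expected in ["(", ")", "=>"] {
            if self.tokens.get(self.position).copied() != Some(expected) {
                return false;
            }
            self.position += 1;
        }
        true
    }
}

fn parse_with_backtracking(tokens: Vec<&'static str>) -> (&'static str, usize) {
    let mut cursor = Cursor { tokens, position: 0 };
    let checkpoint = cursor.checkpoint();
    if cursor.try_parse_arrow_head() {
        ("arrow", cursor.position)
    } else {
        // Speculation failed: restore the saved state so the same
        // tokens can be reparsed as something else.
        cursor.rewind(checkpoint);
        ("parenthesized", cursor.position)
    }
}
```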
Common Patterns
```rust
// Optional token
if p.eat(T![async]) {
    // handle async
}

// Required token with error
p.expect(T!['{']);

// Optional node
parse_type_annotation(p).ok();

// Required node with error
parse_expression(p).or_add_diagnostic(p, expected_expression);

// Lookahead
if p.at(T![if]) || p.at(T![for]) {
    // handle control flow
}

// Checkpoint for backtracking
let checkpoint = p.checkpoint();
if parse_something(p).is_absent() {
    p.rewind(checkpoint);
    parse_something_else(p);
}
```
References
- Full guide: `crates/biome_parser/CONTRIBUTING.md`
- Grammar examples: `xtask/codegen/*.ungram`
- Parser examples: `crates/biome_js_parser/src/syntax/`
- Error recovery: Search for `ParseRecoveryTokenSet` in existing parsers