powershell-utf8-fixer

PowerShell UTF-8 Fixer

Problem

Windows PowerShell (version 5.1 and earlier) needs scripts that contain non-ASCII characters (Korean, Chinese, Japanese, emoji, etc.) to be saved as UTF-8 with a BOM (Byte Order Mark). Without the BOM, it decodes the file using the system's default ANSI code page (CP949 on Korean Windows, for example), so the non-ASCII characters are corrupted.
Symptoms:
  • Korean text appears as ??? or garbled characters
  • Emoji and special characters don't display correctly
  • Write-Host output shows corrupted text
  • One script works fine while another with identical code shows garbled text
Root cause: File encoding mismatch
  • ✓ UTF-8 with BOM (EF BB BF): PowerShell reads correctly
  • ✗ UTF-8 without BOM: PowerShell uses system default encoding → garbled text
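
The mismatch can be reproduced outside PowerShell. The following is a minimal sketch, not part of the tool: it uses Node's `latin1` decoder as a stand-in for a legacy code page (an assumption for illustration; a Korean system would use CP949 and show different, equally wrong characters).

```javascript
// Illustrative only: 'latin1' stands in for a legacy single-byte code page.
const original = 'Write-Host "한글"';

// The bytes as they sit on disk when the file is saved as UTF-8 without a BOM.
const bytesOnDisk = Buffer.from(original, 'utf8');

// Decoding the same bytes with the wrong encoding corrupts only the
// non-ASCII part; the ASCII prefix survives, which is why the script still runs.
const misread = bytesOnDisk.toString('latin1');
const correct = bytesOnDisk.toString('utf8');

console.log(correct === original);             // true
console.log(misread === original);             // false
console.log(misread.startsWith('Write-Host')); // true
```

Because the ASCII parts of the script decode identically either way, the commands execute normally and only the string contents are damaged.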

Quick Fix

When encountering encoding issues in PowerShell scripts:

1. **Check encoding:**

```bash
node scripts/check_powershell_encoding.js <file_or_directory>
```

Or with npm:

```bash
npm run check <file_or_directory>
```

2. **Fix encoding:**

```bash
node scripts/fix_powershell_encoding.js <file_or_directory>
```

Or with npm:

```bash
npm run fix <file_or_directory>
```

Workflow

When Creating New PowerShell Scripts

After creating any .ps1 file with non-ASCII characters:
bash
node scripts/fix_powershell_encoding.js script.ps1
This ensures the file is saved with UTF-8 BOM from the start.

When Diagnosing Display Issues

If a PowerShell script shows garbled text:
  1. Check the encoding:
bash
node scripts/check_powershell_encoding.js problematic_script.ps1
  2. If it shows "UTF-8 without BOM", fix it:
bash
node scripts/fix_powershell_encoding.js problematic_script.ps1
  3. Test the script again; the text should now display correctly

Batch Processing

To check/fix all PowerShell scripts in a directory:
bash
# Check all scripts
node scripts/check_powershell_encoding.js scripts/windows/
# Fix all scripts that need it
node scripts/fix_powershell_encoding.js scripts/windows/
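
For readers curious how a directory argument might be expanded into individual scripts, here is a rough sketch. It is an assumption about the general approach, not the actual code of the bundled scripts, and `collectPs1Files` is a hypothetical name.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: return every .ps1 file under `target`.
// `target` may be a single file or a directory.
function collectPs1Files(target) {
  const stat = fs.statSync(target);
  if (stat.isFile()) {
    return path.extname(target).toLowerCase() === '.ps1' ? [target] : [];
  }
  if (!stat.isDirectory()) {
    return [];
  }
  const found = [];
  for (const entry of fs.readdirSync(target, { withFileTypes: true })) {
    const full = path.join(target, entry.name);
    if (entry.isDirectory()) {
      // Descend into subdirectories so nested scripts are included.
      found.push(...collectPs1Files(full));
    } else if (entry.isFile() && path.extname(entry.name).toLowerCase() === '.ps1') {
      found.push(full);
    }
  }
  return found;
}
```

Calling `collectPs1Files('scripts/windows/')` would return the paths that a check or fix pass then visits one by one.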

Prevention

To prevent encoding issues in the future:
  1. Before committing: Run the checker on modified .ps1 files
  2. In CI/CD: Add encoding validation to your pipeline
  3. Editor settings: Configure your editor to save .ps1 files as UTF-8 with BOM
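
One way the pre-commit step could be wired up is sketched below. This is an illustration under assumptions: the helper names `selectPs1Paths` and `checkStagedScripts` are hypothetical, the checker is assumed to live at `scripts/check_powershell_encoding.js` relative to the repository root, and file names are assumed to be ASCII (git quotes other names in this output by default).

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: pick the .ps1 paths out of newline-separated git output.
function selectPs1Paths(gitOutput) {
  return gitOutput
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.toLowerCase().endsWith('.ps1'));
}

// Hypothetical hook body: check staged .ps1 files and return an exit code.
function checkStagedScripts() {
  // List files that are added, copied, or modified in the index.
  const staged = execFileSync(
    'git',
    ['diff', '--cached', '--name-only', '--diff-filter=ACM'],
    { encoding: 'utf8' }
  );
  let exitCode = 0;
  for (const file of selectPs1Paths(staged)) {
    try {
      execFileSync('node', ['scripts/check_powershell_encoding.js', file], { stdio: 'inherit' });
    } catch (err) {
      // execFileSync throws on a non-zero exit, which the checker uses
      // to report files that need fixing.
      exitCode = 1;
    }
  }
  return exitCode;
}
```

A hook script would end with `process.exit(checkStagedScripts())` so that a failing check blocks the commit.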

Editor Configuration Examples

VS Code (settings.json):
json
{
  "[powershell]": {
    "files.encoding": "utf8bom"
  }
}
Cursor (settings.json):
json
{
  "[powershell]": {
    "files.encoding": "utf8bom"
  }
}

Scripts

check_powershell_encoding.js

Diagnoses encoding issues without modifying files.
Usage:
bash
node scripts/check_powershell_encoding.js <file_or_directory>
Output:
  • ✓ UTF-8 with BOM: File is correctly encoded
  • ⚠ UTF-8 without BOM: File needs fixing
  • ⚠ UTF-16: File uses UTF-16 encoding
  • ✗ Unknown: Unable to detect encoding
Exit codes:
  • 0: All files have UTF-8 BOM
  • 1: Some files need fixing or have errors
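
The four output categories can be derived from a file's leading bytes. The sketch below shows one plausible way to do that; it is an assumption about the approach rather than the script's actual source, and `classifyEncoding` is a hypothetical name.

```javascript
// Hypothetical classifier; the labels mirror the checker's output categories.
function classifyEncoding(buffer) {
  if (buffer.length >= 3 && buffer[0] === 0xef && buffer[1] === 0xbb && buffer[2] === 0xbf) {
    return 'UTF-8 with BOM';
  }
  // UTF-16 files normally start with FF FE (little endian) or FE FF (big endian).
  if (buffer.length >= 2 &&
      ((buffer[0] === 0xff && buffer[1] === 0xfe) || (buffer[0] === 0xfe && buffer[1] === 0xff))) {
    return 'UTF-16';
  }
  // Round-trip test: invalid UTF-8 sequences decode to U+FFFD, so
  // re-encoding them yields bytes that differ from the input.
  const roundTrip = Buffer.from(buffer.toString('utf8'), 'utf8');
  return roundTrip.equals(buffer) ? 'UTF-8 without BOM' : 'Unknown';
}
```

For example, `classifyEncoding(require('fs').readFileSync('script.ps1'))` would return one of the four labels. A file saved in a legacy code page such as CP949 usually fails the round-trip test and lands in "Unknown".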

fix_powershell_encoding.js

Adds UTF-8 BOM to PowerShell files that don't have it.
Usage:
bash
node scripts/fix_powershell_encoding.js <file_or_directory>
Behavior:
  • Reads file content as UTF-8
  • Writes back with the UTF-8 BOM bytes (EF BB BF) prepended, the equivalent of Python's utf-8-sig codec
  • Skips files that already have UTF-8 BOM
  • Processes .ps1 files recursively in directories
Exit codes:
  • 0: All files processed successfully
  • 1: Errors occurred during processing
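
The core of the fix is small enough to sketch. The version below is illustrative, not the script's actual source; `withUtf8Bom` and `fixFile` are hypothetical names, and it works on raw bytes, so it assumes the file content is already valid UTF-8.

```javascript
const fs = require('fs');

const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Hypothetical helper: return the bytes with a UTF-8 BOM in front,
// or the same buffer unchanged when one is already there.
function withUtf8Bom(buffer) {
  const hasBom = buffer.length >= 3 && buffer.subarray(0, 3).equals(UTF8_BOM);
  return hasBom ? buffer : Buffer.concat([UTF8_BOM, buffer]);
}

// Hypothetical helper: rewrite one file in place; returns true if it changed.
function fixFile(filePath) {
  const original = fs.readFileSync(filePath);
  const fixed = withUtf8Bom(original);
  if (fixed === original) {
    return false; // already has a BOM, nothing to write
  }
  fs.writeFileSync(filePath, fixed);
  return true;
}
```

Running the helper twice on the same file is harmless: the second pass sees the BOM and skips the write.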

Technical Details

UTF-8 BOM: The byte sequence EF BB BF at the start of a file signals UTF-8 encoding to PowerShell and other Windows applications.
Why PowerShell needs BOM:
  • Without BOM, Windows PowerShell decodes the script with the system's active ANSI code page (often CP949 or CP1252), not as UTF-8
  • With BOM, PowerShell correctly identifies the file as UTF-8
  • This is specific to Windows PowerShell's file reading behavior; PowerShell 7 and later assume UTF-8 when no BOM is present
Alternative workarounds (not recommended):
  • Adding encoding commands to each script (verbose, error-prone)
  • Using -Encoding UTF8 parameters (these affect what cmdlets read and write, not how PowerShell reads the script file itself)
  • Avoiding non-ASCII characters (limits usability)
The proper solution is to save files with UTF-8 BOM.
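
As a quick check of the byte sequence mentioned above, the snippet below shows that the BOM is simply the character U+FEFF encoded as UTF-8. It is a standalone illustration and not part of the tool.

```javascript
// U+FEFF encoded as UTF-8 is exactly the three BOM bytes.
const bom = Buffer.from('\uFEFF', 'utf8');
console.log(bom.toString('hex')); // efbbbf

// A script saved "with BOM" is just those bytes followed by the text.
const saved = Buffer.concat([bom, Buffer.from('Write-Host "한글"', 'utf8')]);
console.log(saved.subarray(0, 3).toString('hex')); // efbbbf
console.log(saved.length); // 22
```

The file grows by three bytes and its visible content does not change.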