modernize-scientific-stack

Modernize Scientific Computing Stack

This skill provides guidance for migrating legacy Python 2 scientific computing code to modern Python 3 with contemporary libraries and best practices.

When to Use This Skill

Apply this skill when:
  • Migrating Python 2 scientific scripts to Python 3
  • Updating legacy data processing code using outdated patterns
  • Modernizing scripts that use deprecated file handling, string encoding, or numerical libraries
  • Converting scripts from csv module to pandas for data analysis
  • Replacing os.path with pathlib for path manipulation
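
As a concrete illustration of the csv-to-pandas conversion, here is a minimal before/after sketch (the `value` column name is an illustrative assumption, not from any particular legacy script):

```python
import csv

import pandas as pd


# Legacy pattern: manual aggregation with the csv module.
def mean_value_legacy(path):
    total, count = 0.0, 0
    with open(path, encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            total += float(row["value"])
            count += 1
    return total / count


# Modern replacement: the same computation as one pandas expression.
def mean_value(path):
    return pd.read_csv(path, encoding="utf-8")["value"].mean()
```

Beyond brevity, the pandas version handles type inference, missing values, and large files more robustly than hand-rolled loops.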

Approach

Phase 1: Complete Code Discovery

Before making any changes, ensure complete understanding of the existing codebase:
  1. Read all source files completely - If a file read is truncated, request the full content before proceeding. Never assume file contents based on partial reads.
  2. Identify all dependencies - Check for:
    • Import statements (standard library and third-party)
    • Configuration files (JSON, YAML, INI)
    • Data files (CSV, Excel, pickle)
    • Environment requirements
  3. Map the data flow - Understand:
    • Input file formats and encodings
    • Data transformations applied
    • Output format requirements
    • Any intermediate files or caches
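
Step 2 (dependency identification) can be partly automated. A minimal sketch, assuming the legacy sources may still contain Python 2 syntax that Python 3's parser rejects:

```python
import ast
import re
from pathlib import Path


def list_imports(source_dir: str) -> dict[str, set[str]]:
    """Map each .py file under source_dir to its top-level imported modules."""
    imports: dict[str, set[str]] = {}
    for path in Path(source_dir).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        names: set[str] = set()
        try:
            for node in ast.walk(ast.parse(text)):
                if isinstance(node, ast.Import):
                    names.update(a.name.split(".")[0] for a in node.names)
                elif isinstance(node, ast.ImportFrom) and node.module:
                    names.add(node.module.split(".")[0])
        except SyntaxError:
            # Python 2 files (e.g. bare `print` statements) will not parse
            # under Python 3; fall back to a rough regex scan.
            names = set(re.findall(r"^\s*(?:import|from)\s+(\w+)", text, re.M))
        imports[str(path)] = names
    return imports
```

The result distinguishes standard-library from third-party modules only by inspection, but it gives a complete starting inventory for requirements.txt.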

Phase 2: Identify Migration Requirements

Common Python 2 to Python 3 migration patterns in scientific code:
| Legacy Pattern | Modern Replacement |
| --- | --- |
| `print "text"` | `print("text")` |
| `unicode()` / `str()` | `str()` with explicit encoding |
| `open(file)` | `open(file, encoding='utf-8')` |
| `os.path.join()` | `pathlib.Path()` |
| `csv` module | `pandas.read_csv()` |
| `for key in dict.keys()` | `for key in dict` |
| `dict.has_key(x)` | `x in dict` |
| Manual file iteration | Context managers (`with` statements) |
| `xrange()` | `range()` |
| Integer division `/` | Explicit `//` or float division |
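
Several of these replacements often appear together in the same function. A small before/after sketch (names are illustrative):

```python
# Python 2 original:
#   def ratios(items, counts, total):
#       result = []
#       for i in xrange(len(items)):
#           if counts.has_key(items[i]):
#               result.append(counts[items[i]] / total)   # integer division!
#       return result

def ratios(items, counts, total):
    """Python 3 version: direct iteration, `in` membership test, true division."""
    return [counts[item] / total for item in items if item in counts]
```

Note the silent behavior change: in Python 2, `1 / 2` was `0`; in Python 3 it is `0.5`. Audit every division during migration rather than assuming the old result was intended.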

Phase 3: Implementation Strategy

  1. Create the modernized script with these priorities:
    • UTF-8 encoding for all file operations
    • pathlib.Path for all file path manipulations
    • pandas for CSV/data processing
    • Type hints where beneficial
    • Context managers for resource handling
  2. Handle configuration files - Check for file existence before reading:
    ```python
    config_path = Path("config.json")
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding='utf-8'))
    ```
  3. Create requirements.txt - Include all dependencies with version constraints
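
A requirements.txt for a typical modernized scientific stack might look like the following. The package set and version bounds are illustrative; pin only what the migrated code actually imports:

```text
numpy>=1.24,<3.0
pandas>=2.0,<3.0
openpyxl>=3.1,<4.0   # only if Excel files are read via pandas
```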

Phase 4: Verification Protocol

Critical: Always verify file operations
After writing any file, read it back to confirm:
  • The complete content was written (not truncated)
  • The syntax is valid
  • All imports are present
Testing sequence:
  1. Syntax validation - Run Python syntax check:
    ```bash
    python -m py_compile script.py
    ```
  2. Import verification - Test all imports resolve:
    ```bash
    python -c "from script import *"
    ```
  3. Functional test - Run the script and compare output to expected results
  4. Output validation - Verify output format matches requirements exactly
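
The four-step sequence above can be wrapped in one helper. This is a sketch, not a fixed protocol; it assumes the script is self-contained, runnable from its own directory, and produces deterministic stdout:

```python
import py_compile
import subprocess
import sys
from pathlib import Path


def verify_migration(script: str, expected_output: str) -> bool:
    """Run syntax, import, functional, and output checks on a migrated script."""
    script_path = Path(script)
    # 1. Syntax validation (same check as `python -m py_compile`).
    py_compile.compile(str(script_path), doraise=True)
    # 2. Import verification: importing the module executes its top-level
    #    code, so unresolved imports surface here.
    subprocess.run(
        [sys.executable, "-c", f"import {script_path.stem}"],
        cwd=script_path.parent, check=True, capture_output=True,
    )
    # 3. Functional test: run the script and capture stdout.
    result = subprocess.run(
        [sys.executable, script_path.name],
        cwd=script_path.parent, check=True, capture_output=True, text=True,
    )
    # 4. Output validation: exact, character-by-character comparison.
    return result.stdout == Path(expected_output).read_text(encoding="utf-8")
```

Any failing step raises (steps 1-3) or returns False (step 4), so a True result means all four checks passed.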

Common Pitfalls to Avoid

  1. Truncated file content - Never proceed with partial file reads. If a response shows `... [truncated]` or incomplete content, request the full file before continuing.
  2. Unverified writes - After using a write operation, always read the file back to confirm the complete content was written correctly.
  3. Encoding issues - Always specify `encoding='utf-8'` explicitly in file operations. Legacy scripts often have implicit ASCII assumptions.
  4. Path string concatenation - Replace all `os.path.join()` calls and string concatenation for paths with `pathlib.Path` operations.
  5. Missing edge case handling:
    • Empty data files or datasets
    • Missing required files
    • Invalid data types in CSV columns
    • Stations/entities with no matching data
  6. Environment setup repetition - When setting up environments (venv, PATH), verify the setup persists rather than repeating in each command.
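
The edge cases in pitfall 5 can be handled at the point of loading. A defensive-loading sketch, assuming illustrative column names (`station`, `value`) rather than any particular dataset:

```python
from pathlib import Path

import pandas as pd


def load_readings(csv_path: str) -> pd.DataFrame:
    """Load a measurements CSV while covering the common edge cases."""
    path = Path(csv_path)
    if not path.exists():                           # missing required file
        raise FileNotFoundError(f"input file not found: {path}")
    try:
        df = pd.read_csv(path, encoding="utf-8")
    except pd.errors.EmptyDataError:                # completely empty file
        return pd.DataFrame(columns=["station", "value"])
    # Invalid data types: coerce bad values to NaN instead of crashing,
    # then drop the unusable rows.
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    return df.dropna(subset=["value"])
```

Stations with no matching data then fall out naturally as empty groups in any later `groupby`, rather than as exceptions mid-pipeline.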

Verification Checklist

Before marking the task complete, confirm:
  • All source files were read completely (no truncation)
  • Written files were verified by reading back
  • All Python 2 patterns have been converted
  • File encodings are explicitly specified
  • pathlib is used for all path operations
  • pandas is used for data processing (where appropriate)
  • requirements.txt includes all dependencies
  • Script runs without errors
  • Output matches expected format exactly
  • Edge cases are handled (empty data, missing files)

Output Validation

When the task specifies an expected output format, verify the output matches exactly:
  1. Run the modernized script
  2. Capture the output
  3. Compare against expected format character-by-character if needed
  4. Pay attention to:
    • Decimal precision in numerical output
    • Whitespace and formatting
    • Order of output items
    • Units and labels
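
When outputs differ, a unified diff pinpoints the mismatch (decimal precision, whitespace, ordering) faster than eyeballing. A minimal helper using the standard library:

```python
import difflib


def diff_output(actual: str, expected: str) -> list[str]:
    """Return unified-diff lines between expected and actual output.

    An empty list means the outputs match exactly, including whitespace.
    """
    return list(difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="expected", tofile="actual",
    ))
```

Print the returned lines on failure; a one-character whitespace or precision difference that character-by-character inspection might miss shows up as a paired `-`/`+` line.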