modernize-scientific-stack
Modernize Scientific Computing Stack
This skill provides guidance for migrating legacy Python 2 scientific computing code to modern Python 3 with contemporary libraries and best practices.
When to Use This Skill
Apply this skill when:
- Migrating Python 2 scientific scripts to Python 3
- Updating legacy data processing code using outdated patterns
- Modernizing scripts that use deprecated file handling, string encoding, or numerical libraries
- Converting scripts from the csv module to pandas for data analysis
- Replacing os.path with pathlib for path manipulation
Approach
Phase 1: Complete Code Discovery
Before making any changes, ensure complete understanding of the existing codebase:
- Read all source files completely - If a file read is truncated, request the full content before proceeding. Never assume file contents based on partial reads.
- Identify all dependencies - Check for:
  - Import statements (standard library and third-party)
  - Configuration files (JSON, YAML, INI)
  - Data files (CSV, Excel, pickle)
  - Environment requirements
- Map the data flow - Understand:
  - Input file formats and encodings
  - Data transformations applied
  - Output format requirements
  - Any intermediate files or caches
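The dependency check above can be sketched with a small scanner. This is a minimal illustration, not a complete tool: it uses a regular expression rather than the `ast` module because Python 3's parser rejects Python 2 syntax such as `print` statements. The function names and the sample source are hypothetical, and the regex is a heuristic that will also match import-like lines inside docstrings.

```python
import re
from pathlib import Path

# Heuristic: matches "import x", "import x.y", and "from x import y" at line start.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE)


def find_top_level_imports(source: str) -> set[str]:
    """Return the top-level module names imported in a source string."""
    return set(IMPORT_RE.findall(source))


def scan_directory(root: Path) -> dict[str, set[str]]:
    """Map each .py file under root to the modules it imports."""
    results: dict[str, set[str]] = {}
    for path in sorted(root.rglob("*.py")):
        # errors="replace" keeps the scan going on files with unknown legacy encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        results[str(path)] = find_top_level_imports(text)
    return results


# Hypothetical legacy snippet; the print statement would break ast.parse in Python 3.
legacy_source = '''
import csv
import os.path
from collections import defaultdict
print "loading data"
'''
print(sorted(find_top_level_imports(legacy_source)))  # ['collections', 'csv', 'os']
```

Comparing the result against the standard library tells you which names need an entry in requirements.txt.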
Phase 2: Identify Migration Requirements
Common Python 2 to Python 3 migration patterns in scientific code:
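| Legacy Pattern | Modern Replacement |
|---|---|
| `open()` with an implicit encoding | `open()` with explicit `encoding='utf-8'` |
| `os.path.join()` and path string concatenation | `pathlib.Path` operations |
| `csv` module for data analysis | pandas |
| Manual file iteration | Context managers (`with` statements) |
| Integer division | Explicit `//` (floor) or `/` (true division) |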
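A few of these migration patterns are easier to see in code. The sketch below covers explicit integer division, `dict.items()`, `pathlib`, and a context manager with an explicit encoding. The function names, station labels, and file name are invented for illustration and do not come from any particular legacy script.

```python
import tempfile
from pathlib import Path

# Python 2: 7 / 2 == 3 (ints)          -> Python 3: 7 / 2 == 3.5 and 7 // 2 == 3
# Python 2: for k, v in d.iteritems()  -> Python 3: for k, v in d.items()


def mean_by_station(readings: dict[str, list[float]]) -> dict[str, float]:
    """Average each station's readings, skipping stations with no data."""
    return {
        station: sum(values) / len(values)  # "/" is always true division in Python 3
        for station, values in readings.items()
        if values
    }


def bin_index(value: float, bin_width: int) -> int:
    """Use floor division when an integer bin number is the intent."""
    return int(value // bin_width)


with tempfile.TemporaryDirectory() as tmp:
    # pathlib's "/" operator replaces os.path.join and string concatenation.
    data_path = Path(tmp) / "readings.txt"
    data_path.write_text("température\n21.5\n", encoding="utf-8")
    # The context manager closes the file even if parsing raises.
    with data_path.open(encoding="utf-8") as handle:
        header = next(handle).strip()

print(mean_by_station({"A": [1.0, 2.0], "B": []}))  # {'A': 1.5}
print(bin_index(7, 2))  # 3
```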
Phase 3: Implementation Strategy
- Create the modernized script with these priorities:
  - UTF-8 encoding for all file operations
  - pathlib.Path for all file path manipulations
  - pandas for CSV/data processing
  - Type hints where beneficial
  - Context managers for resource handling
- Handle configuration files - Check for file existence before reading:
  ```python
  import json
  from pathlib import Path

  config_path = Path("config.json")
  if config_path.exists():
      config = json.loads(config_path.read_text(encoding='utf-8'))
  ```
- Create requirements.txt - Include all dependencies with version constraints
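The existence check for configuration files leaves open what happens when the file is missing. One possible approach, sketched below, is to overlay the file's values on a set of defaults. The `DEFAULT_CONFIG` keys and the `load_config` helper are assumptions made for illustration; a real script should take its defaults from the legacy code.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

# Hypothetical defaults; real keys depend on the script being migrated.
DEFAULT_CONFIG: dict[str, Any] = {"input_file": "data.csv", "precision": 2}


def load_config(config_path: Path) -> dict[str, Any]:
    """Return the defaults overlaid with values from config_path, if it exists."""
    config = dict(DEFAULT_CONFIG)
    if config_path.exists():
        config.update(json.loads(config_path.read_text(encoding="utf-8")))
    return config


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    missing = load_config(path)  # file absent: defaults only
    path.write_text(json.dumps({"precision": 4}), encoding="utf-8")
    loaded = load_config(path)   # file present: overrides applied

print(missing["precision"], loaded["precision"])  # 2 4
```

Whether a missing config file should fall back to defaults or stop the script is a design choice; if the legacy script failed on a missing file, raising an error may be the more faithful behavior.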
Phase 4: Verification Protocol
Critical: Always verify file operations
After writing any file, read it back to confirm:
- The complete content was written (not truncated)
- The syntax is valid
- All imports are present
Testing sequence:
- Syntax validation - Run Python syntax check:
  ```bash
  python -m py_compile script.py
  ```
- Import verification - Test that all imports resolve. This executes the module's top-level code, so keep the script's entry point behind an `if __name__ == "__main__":` guard:
  ```bash
  python -c "from script import *"
  ```
- Functional test - Run the script and compare output to expected results
- Output validation - Verify output format matches requirements exactly
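The read-back check and the syntax check can be combined into one helper. The following is a minimal sketch using only the standard library; `write_and_verify` is a hypothetical name, and the file contents are placeholders.

```python
import py_compile
import tempfile
from pathlib import Path


def write_and_verify(path: Path, content: str) -> None:
    """Write content, read it back, and confirm the result compiles."""
    path.write_text(content, encoding="utf-8")
    written = path.read_text(encoding="utf-8")
    if written != content:
        raise IOError(f"{path} differs from the intended content")
    # doraise=True turns syntax errors into py_compile.PyCompileError.
    py_compile.compile(str(path), doraise=True)


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "script.py"
    write_and_verify(good, "print('ok')\n")
    verified = good.exists()

    try:
        # A leftover Python 2 print statement is a syntax error in Python 3.
        write_and_verify(Path(tmp) / "broken.py", "print 'legacy'\n")
        broken_rejected = False
    except py_compile.PyCompileError:
        broken_rejected = True

print(verified, broken_rejected)  # True True
```

Compiling only proves the file parses. It does not check that imports resolve or that the output is correct, so the functional and output steps are still needed.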
Common Pitfalls to Avoid
- Truncated file content - Never proceed with partial file reads. If a response shows `... [truncated]` or incomplete content, request the full file before continuing.
- Unverified writes - After using a write operation, always read the file back to confirm the complete content was written correctly.
- Encoding issues - Always specify `encoding='utf-8'` explicitly in file operations. Legacy scripts often have implicit ASCII assumptions.
- Path string concatenation - Replace all `os.path.join()` and string concatenation for paths with `pathlib.Path` operations.
- Missing edge case handling:
  - Empty data files or datasets
  - Missing required files
  - Invalid data types in CSV columns
  - Stations/entities with no matching data
- Environment setup repetition - When setting up environments (venv, PATH), verify the setup persists rather than repeating in each command.
Verification Checklist
Before marking the task complete, confirm:
- All source files were read completely (no truncation)
- Written files were verified by reading back
- All Python 2 patterns have been converted
- File encodings are explicitly specified
- pathlib is used for all path operations
- pandas is used for data processing (where appropriate)
- requirements.txt includes all dependencies
- Script runs without errors
- Output matches expected format exactly
- Edge cases are handled (empty data, missing files)
Output Validation
When the task specifies an expected output format, verify the output matches exactly:
- Run the modernized script
- Capture the output
- Compare against expected format character-by-character if needed
- Pay attention to:
  - Decimal precision in numerical output
  - Whitespace and formatting
  - Order of output items
  - Units and labels
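A character-by-character comparison can be sketched with a short helper that reports the first mismatch, plus `difflib` for a line-level view. The helper names and the sample output strings are hypothetical.

```python
import difflib


def first_difference(expected: str, actual: str):
    """Return (index, expected_char, actual_char) for the first mismatch, or None."""
    for index, (exp_char, act_char) in enumerate(zip(expected, actual)):
        if exp_char != act_char:
            return index, exp_char, act_char
    if len(expected) != len(actual):
        # One string is a prefix of the other; the missing side is reported as "".
        index = min(len(expected), len(actual))
        return index, expected[index:index + 1], actual[index:index + 1]
    return None


def show_diff(expected: str, actual: str) -> str:
    """Return a unified diff of the two outputs, line by line."""
    return "\n".join(
        difflib.unified_diff(
            expected.splitlines(), actual.splitlines(),
            fromfile="expected", tofile="actual", lineterm="",
        )
    )


expected = "Station A: 21.50 C\n"
wrong = f"Station A: {21.5:.1f} C\n"  # one decimal place instead of two
right = f"Station A: {21.5:.2f} C\n"

print(first_difference(expected, wrong))  # (15, '0', ' ')
print(first_difference(expected, right))  # None
print(show_diff(expected, wrong))
```

Reporting the index of the first mismatch makes precision and whitespace errors visible that are easy to miss when reading the two outputs by eye.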