modernize-scientific-stack

Modernize Scientific Computing Stack

This skill provides guidance for migrating legacy Python 2 scientific computing code to modern Python 3 with contemporary libraries and best practices.

When to Use This Skill

Apply this skill when:
  • Migrating Python 2 scientific scripts to Python 3
  • Updating legacy data processing code using outdated patterns
  • Modernizing scripts that use deprecated file handling, string encoding, or numerical libraries
  • Converting scripts from csv module to pandas for data analysis
  • Replacing os.path with pathlib for path manipulation
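
As a concrete illustration of the csv-to-pandas conversion, here is a minimal before/after sketch (the `value` column name is an illustrative assumption, not from any particular legacy script):

```python
import csv

import pandas as pd


# Legacy pattern: manual aggregation with the csv module.
def mean_value_legacy(path):
    total, count = 0.0, 0
    with open(path, encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            total += float(row["value"])
            count += 1
    return total / count


# Modern replacement: the same computation as one pandas expression.
def mean_value(path):
    return pd.read_csv(path, encoding="utf-8")["value"].mean()
```

Beyond brevity, the pandas version handles type inference, missing values, and large files more robustly than hand-rolled loops.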

Approach

Phase 1: Complete Code Discovery

Before making any changes, ensure complete understanding of the existing codebase:
  1. Read all source files completely - If a file read is truncated, request the full content before proceeding. Never assume file contents based on partial reads.
  2. Identify all dependencies - Check for:
    • Import statements (standard library and third-party)
    • Configuration files (JSON, YAML, INI)
    • Data files (CSV, Excel, pickle)
    • Environment requirements
  3. Map the data flow - Understand:
    • Input file formats and encodings
    • Data transformations applied
    • Output format requirements
    • Any intermediate files or caches
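
Step 2 (dependency identification) can be partly automated. A minimal sketch, assuming the legacy sources may still contain Python 2 syntax that Python 3's parser rejects:

```python
import ast
import re
from pathlib import Path


def list_imports(source_dir: str) -> dict[str, set[str]]:
    """Map each .py file under source_dir to its top-level imported modules."""
    imports: dict[str, set[str]] = {}
    for path in Path(source_dir).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        names: set[str] = set()
        try:
            for node in ast.walk(ast.parse(text)):
                if isinstance(node, ast.Import):
                    names.update(a.name.split(".")[0] for a in node.names)
                elif isinstance(node, ast.ImportFrom) and node.module:
                    names.add(node.module.split(".")[0])
        except SyntaxError:
            # Python 2 files (e.g. bare `print` statements) will not parse
            # under Python 3; fall back to a rough regex scan.
            names = set(re.findall(r"^\s*(?:import|from)\s+(\w+)", text, re.M))
        imports[str(path)] = names
    return imports
```

The result distinguishes standard-library from third-party modules only by inspection, but it gives a complete starting inventory for requirements.txt.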

Phase 2: Identify Migration Requirements

Common Python 2 to Python 3 migration patterns in scientific code:
| Legacy Pattern | Modern Replacement |
| --- | --- |
| `print "text"` | `print("text")` |
| `unicode()` / `str()` | `str()` with explicit encoding |
| `open(file)` | `open(file, encoding='utf-8')` |
| `os.path.join()` | `pathlib.Path()` |
| `csv` module | `pandas.read_csv()` |
| `for key in dict.keys()` | `for key in dict` |
| `dict.has_key(x)` | `x in dict` |
| Manual file iteration | Context managers (`with` statements) |
| `xrange()` | `range()` |
| Integer division `/` | Explicit `//` or float division |
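
Several of these replacements often appear together in the same function. A small before/after sketch (names are illustrative):

```python
# Python 2 original:
#   def ratios(items, counts, total):
#       result = []
#       for i in xrange(len(items)):
#           if counts.has_key(items[i]):
#               result.append(counts[items[i]] / total)   # integer division!
#       return result

def ratios(items, counts, total):
    """Python 3 version: direct iteration, `in` membership test, true division."""
    return [counts[item] / total for item in items if item in counts]
```

Note the silent behavior change: in Python 2, `1 / 2` was `0`; in Python 3 it is `0.5`. Audit every division during migration rather than assuming the old result was intended.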

Phase 3: Implementation Strategy

  1. Create the modernized script with these priorities:
    • UTF-8 encoding for all file operations
    • pathlib.Path for all file path manipulations
    • pandas for CSV/data processing
    • Type hints where beneficial
    • Context managers for resource handling
  2. Handle configuration files - Check for file existence before reading:
    ```python
    config_path = Path("config.json")
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding='utf-8'))
    ```
  3. Create requirements.txt - Include all dependencies with version constraints
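
A requirements.txt for a typical modernized scientific stack might look like the following. The package set and version bounds are illustrative; pin only what the migrated code actually imports:

```text
numpy>=1.24,<3.0
pandas>=2.0,<3.0
openpyxl>=3.1,<4.0   # only if Excel files are read via pandas
```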

Phase 4: Verification Protocol

Critical: Always verify file operations
After writing any file, read it back to confirm:
  • The complete content was written (not truncated)
  • The syntax is valid
  • All imports are present
Testing sequence:
  1. Syntax validation - Run Python syntax check:
    ```bash
    python -m py_compile script.py
    ```
  2. Import verification - Test all imports resolve:
    ```bash
    python -c "from script import *"
    ```
  3. Functional test - Run the script and compare output to expected results
  4. Output validation - Verify output format matches requirements exactly
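
The four-step sequence above can be wrapped in one helper. This is a sketch, not a fixed protocol; it assumes the script is self-contained, runnable from its own directory, and produces deterministic stdout:

```python
import py_compile
import subprocess
import sys
from pathlib import Path


def verify_migration(script: str, expected_output: str) -> bool:
    """Run syntax, import, functional, and output checks on a migrated script."""
    script_path = Path(script)
    # 1. Syntax validation (same check as `python -m py_compile`).
    py_compile.compile(str(script_path), doraise=True)
    # 2. Import verification: importing the module executes its top-level
    #    code, so unresolved imports surface here.
    subprocess.run(
        [sys.executable, "-c", f"import {script_path.stem}"],
        cwd=script_path.parent, check=True, capture_output=True,
    )
    # 3. Functional test: run the script and capture stdout.
    result = subprocess.run(
        [sys.executable, script_path.name],
        cwd=script_path.parent, check=True, capture_output=True, text=True,
    )
    # 4. Output validation: exact, character-by-character comparison.
    return result.stdout == Path(expected_output).read_text(encoding="utf-8")
```

Any failing step raises (steps 1-3) or returns False (step 4), so a True result means all four checks passed.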

Common Pitfalls to Avoid

  1. Truncated file content - Never proceed with partial file reads. If a response shows `... [truncated]` or incomplete content, request the full file before continuing.
  2. Unverified writes - After using a write operation, always read the file back to confirm the complete content was written correctly.
  3. Encoding issues - Always specify `encoding='utf-8'` explicitly in file operations. Legacy scripts often have implicit ASCII assumptions.
  4. Path string concatenation - Replace all `os.path.join()` calls and string concatenation for paths with `pathlib.Path` operations.
  5. Missing edge case handling:
    • Empty data files or datasets
    • Missing required files
    • Invalid data types in CSV columns
    • Stations/entities with no matching data
  6. Environment setup repetition - When setting up environments (venv, PATH), verify the setup persists rather than repeating in each command.
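
The edge cases in pitfall 5 can be handled at the point of loading. A defensive-loading sketch, assuming illustrative column names (`station`, `value`) rather than any particular dataset:

```python
from pathlib import Path

import pandas as pd


def load_readings(csv_path: str) -> pd.DataFrame:
    """Load a measurements CSV while covering the common edge cases."""
    path = Path(csv_path)
    if not path.exists():                           # missing required file
        raise FileNotFoundError(f"input file not found: {path}")
    try:
        df = pd.read_csv(path, encoding="utf-8")
    except pd.errors.EmptyDataError:                # completely empty file
        return pd.DataFrame(columns=["station", "value"])
    # Invalid data types: coerce bad values to NaN instead of crashing,
    # then drop the unusable rows.
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    return df.dropna(subset=["value"])
```

Stations with no matching data then fall out naturally as empty groups in any later `groupby`, rather than as exceptions mid-pipeline.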

Verification Checklist

Before marking the task complete, confirm:
  • All source files were read completely (no truncation)
  • Written files were verified by reading back
  • All Python 2 patterns have been converted
  • File encodings are explicitly specified
  • pathlib is used for all path operations
  • pandas is used for data processing (where appropriate)
  • requirements.txt includes all dependencies
  • Script runs without errors
  • Output matches expected format exactly
  • Edge cases are handled (empty data, missing files)

Output Validation

When the task specifies an expected output format, verify the output matches exactly:
  1. Run the modernized script
  2. Capture the output
  3. Compare against expected format character-by-character if needed
  4. Pay attention to:
    • Decimal precision in numerical output
    • Whitespace and formatting
    • Order of output items
    • Units and labels
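
When outputs differ, a unified diff pinpoints the mismatch (decimal precision, whitespace, ordering) faster than eyeballing. A minimal helper using the standard library:

```python
import difflib


def diff_output(actual: str, expected: str) -> list[str]:
    """Return unified-diff lines between expected and actual output.

    An empty list means the outputs match exactly, including whitespace.
    """
    return list(difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="expected", tofile="actual",
    ))
```

Print the returned lines on failure; a one-character whitespace or precision difference that character-by-character inspection might miss shows up as a paired `-`/`+` line.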