ci-optimization-specialist

Optimizes GitHub Actions CI/CD workflows through test sharding, intelligent caching, and workflow parallelization. Use when CI execution time exceeds limits, costs are too high, or workflows need parallelization.

NPX Install

npx skill4agent add d-oit/do-novelist-ai ci-optimization-specialist

CI Optimization Specialist

Quick Start

This skill optimizes GitHub Actions workflows for:
  1. Test sharding: Parallel test execution across multiple runners
  2. Caching: pnpm store, Playwright browsers, Vite build cache
  3. Workflow optimization: Job dependencies and concurrency

When to Use

  • CI execution time exceeds 10-15 minutes
  • GitHub Actions costs too high
  • Need faster developer feedback loops
  • Tests not parallelized

Test Sharding Setup

Basic Pattern (Automatic Distribution)

Add a matrix strategy to `.github/workflows/ci.yml`:
yaml
e2e-tests:
  name: 🧪 E2E Tests [Shard ${{ matrix.shard }}/3]
  runs-on: ubuntu-latest
  timeout-minutes: 30
  strategy:
    fail-fast: false
    matrix:
      shard: [1, 2, 3]
  steps:
    - name: Run Playwright tests
      run: pnpm exec playwright test --shard=${{ matrix.shard }}/3
      env:
        CI: true
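
Expected improvement: 60-65% less wall-clock time with 3 shards. The theoretical maximum is about 67% (each shard runs a third of the suite); per-shard setup and uneven shard sizes account for the gap.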
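
The shard total is hard-coded as 3 in both the job name and the `--shard` flag, so it has to be edited together with the matrix list. A minimal sketch that reads the total from the `strategy.job-total` context instead, assuming `shard` is the only matrix dimension (an extra dimension such as a browser list would multiply `job-total`):

```yaml
e2e-tests:
  name: 🧪 E2E Tests [Shard ${{ matrix.shard }}/${{ strategy.job-total }}]
  runs-on: ubuntu-latest
  timeout-minutes: 30
  strategy:
    fail-fast: false
    matrix:
      shard: [1, 2, 3, 4] # adding a shard only requires extending this list
  steps:
    # checkout, pnpm setup, and install steps omitted, as in the pattern above
    - name: Run Playwright tests
      # strategy.job-total is the number of jobs the matrix expands to (4 here)
      run: pnpm exec playwright test --shard=${{ matrix.shard }}/${{ strategy.job-total }}
      env:
        CI: true
```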

Advanced Pattern (Manual Distribution)

For unbalanced test suites, manually distribute by duration:
yaml
matrix:
  include:
    - shard: 1
      pattern: 'ai-generation|project-management' # Heavy tests
    - shard: 2
      pattern: 'project-wizard|settings|publishing' # Medium tests
    - shard: 3
      pattern: 'world-building|versioning|mock-validation' # Light tests

# In step:
run: pnpm exec playwright test --grep "${{ matrix.pattern }}"
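
One risk of manual distribution is that a test matching none of the patterns never runs in CI. A sketch of a catch-all fourth shard using Playwright's `--grep-invert` flag, which runs every test whose title does not match the combined pattern. The `args` matrix key (replacing `pattern`) and the fourth shard are illustrative additions, not part of the original setup:

```yaml
matrix:
  include:
    - shard: 1
      args: '--grep "ai-generation|project-management"' # Heavy tests
    - shard: 2
      args: '--grep "project-wizard|settings|publishing"' # Medium tests
    - shard: 3
      args: '--grep "world-building|versioning|mock-validation"' # Light tests
    # Catch-all: every test whose title matches none of the patterns above.
    # --pass-with-no-tests keeps this shard green when nothing is left over.
    - shard: 4
      args: '--grep-invert "ai-generation|project-management|project-wizard|settings|publishing|world-building|versioning|mock-validation" --pass-with-no-tests'

# In step (the double quotes inside args keep the shell from treating | as a pipe):
run: pnpm exec playwright test ${{ matrix.args }}
```

If this is used, the shard total in the job name should be updated to 4 as well.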

Critical Caching Patterns

pnpm Store Cache

ALWAYS cache the pnpm store to avoid re-downloading packages on every run:
yaml
- name: Get pnpm store directory
  id: pnpm-cache
  shell: bash
  run: echo "STORE_PATH=$(pnpm store path)" >> $GITHUB_OUTPUT

- name: Setup pnpm cache
  uses: actions/cache@v4
  with:
    path: ${{ steps.pnpm-cache.outputs.STORE_PATH }}
    key: ${{ runner.os }}-pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}
    restore-keys: |
      ${{ runner.os }}-pnpm-store-
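
The `pnpm store path` step only works once pnpm is already installed on the runner. As a shorter alternative to the two steps above, `actions/setup-node` can cache the pnpm store itself through its `cache: pnpm` input. A sketch, assuming the project pins its pnpm version in the `packageManager` field of `package.json` (which `pnpm/action-setup@v4` reads when no `version` input is given) and runs on Node 20:

```yaml
- uses: actions/checkout@v4

# Installs pnpm; must run before setup-node so the pnpm cache can be resolved
- uses: pnpm/action-setup@v4

- uses: actions/setup-node@v4
  with:
    node-version: 20 # assumption: use the project's Node version
    cache: pnpm # caches the pnpm store, keyed on pnpm-lock.yaml by default

# --frozen-lockfile fails instead of rewriting pnpm-lock.yaml, keeping the cache key stable
- name: Install dependencies
  run: pnpm install --frozen-lockfile
```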

Playwright Browsers Cache

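Cache the 500MB+ browser binaries. The cache only holds the browsers in ~/.cache/ms-playwright, not OS packages, so system dependencies still have to be installed on a cache hit: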
yaml
- name: Cache Playwright browsers
  uses: actions/cache@v4
  id: playwright-cache
  with:
    path: ~/.cache/ms-playwright
    key: ${{ runner.os }}-playwright-${{ hashFiles('**/pnpm-lock.yaml') }}

- name: Install Playwright browsers
  if: steps.playwright-cache.outputs.cache-hit != 'true'
  run: pnpm exec playwright install --with-deps chromium

- name: Install Playwright system dependencies
  if: steps.playwright-cache.outputs.cache-hit == 'true'
  run: pnpm exec playwright install-deps chromium
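
Keying on `pnpm-lock.yaml` invalidates the browser cache on every dependency change, even when Playwright itself was not upgraded. A sketch that keys on the installed Playwright version instead; it assumes dependencies are already installed and that `playwright --version` prints a line of the form `Version 1.x.y`. The two install steps above stay unchanged:

```yaml
- name: Get Playwright version
  id: playwright-version
  shell: bash
  # Assumes output like "Version 1.x.y"; awk keeps only the version number
  run: echo "VERSION=$(pnpm exec playwright --version | awk '{print $2}')" >> "$GITHUB_OUTPUT"

- name: Cache Playwright browsers
  uses: actions/cache@v4
  id: playwright-cache
  with:
    path: ~/.cache/ms-playwright
    # Invalidated only when the Playwright version changes
    key: ${{ runner.os }}-playwright-${{ steps.playwright-version.outputs.VERSION }}
```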

Vite Build Cache

For monorepos or frequent builds:
yaml
- name: Cache Vite build
  uses: actions/cache@v4
  with:
    path: |
      dist/
      node_modules/.vite/
    key: ${{ runner.os }}-vite-${{ hashFiles('src/**', 'vite.config.ts') }}
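
Restoring `dist/` only saves time if the build step is skipped on a cache hit; otherwise the build runs anyway and overwrites it. A sketch, assuming the project builds with `pnpm run build`, with the lockfile added to the key so dependency upgrades also force a rebuild (other build inputs, such as `index.html` or `public/**`, can be appended to `hashFiles` the same way):

```yaml
- name: Cache Vite build
  id: vite-cache
  uses: actions/cache@v4
  with:
    path: |
      dist/
      node_modules/.vite/
    # Lockfile included so dependency upgrades invalidate the cached build
    key: ${{ runner.os }}-vite-${{ hashFiles('src/**', 'vite.config.ts', 'pnpm-lock.yaml') }}

- name: Build
  # cache-hit is 'true' only for an exact key match, so stale output is never reused
  if: steps.vite-cache.outputs.cache-hit != 'true'
  run: pnpm run build
```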

Workflow Optimization

Job Dependencies

Use `needs` to control execution order:
yaml
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - name: Build
        run: pnpm run build
      - name: Run unit tests
        run: pnpm test

  e2e-tests:
    needs: build-and-test # Wait for build to complete
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3]
    steps:
      - name: Run E2E tests
        run: pnpm exec playwright test --shard=${{ matrix.shard }}/3
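
`needs` only orders jobs: each job still starts on a fresh runner, so the `dist/` output of `build-and-test` is not available to `e2e-tests` unless it is passed along. A sketch using artifact upload and download, assuming the build writes to `dist/` and both jobs run the usual checkout and install steps (omitted here); the artifact name `dist` is illustrative:

```yaml
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      # ... checkout, pnpm setup, install ...
      - name: Build
        run: pnpm run build
      - name: Upload build output
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/
          retention-days: 1 # only needed by downstream jobs in this run

  e2e-tests:
    needs: build-and-test
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3]
    steps:
      # ... checkout, pnpm setup, install ...
      - name: Download build output
        uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/ # restores the files uploaded from dist/ above
      - name: Run E2E tests
        run: pnpm exec playwright test --shard=${{ matrix.shard }}/3
```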

Concurrency Control

Cancel an in-progress run when a newer run of the same workflow starts for the same branch or pull request:
yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
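
With `cancel-in-progress: true`, a push to the default branch also cancels the previous run there, so some commits on main never get a finished run. A sketch that cancels only for pull-request events by using an expression; whether main needs a complete run history is a project decision:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  # true for pull_request events, false for pushes (e.g. to main)
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
```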

Artifact Management

Per-Shard Artifacts

Upload test reports from each shard:
yaml
- name: Upload Playwright report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: playwright-report-shard-${{ matrix.shard }}-${{ github.sha }}
    path: playwright-report/
    retention-days: 7
    compression-level: 6
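
Per-shard HTML reports have to be opened one at a time. Playwright can instead write a blob report per shard and combine them with `playwright merge-reports`. A sketch, assuming Playwright 1.37 or later (the release that introduced blob reports); the `blob-report-*` artifact names and the `merge-reports` job name are illustrative:

```yaml
jobs:
  e2e-tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3]
    steps:
      # ... checkout, pnpm setup, install, browser cache ...
      - name: Run Playwright tests
        # The blob reporter writes to blob-report/ by default
        run: pnpm exec playwright test --shard=${{ matrix.shard }}/3 --reporter=blob
      - name: Upload blob report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ matrix.shard }}
          path: blob-report/
          retention-days: 1

  merge-reports:
    needs: e2e-tests
    if: always() # merge even when some shards failed
    runs-on: ubuntu-latest
    steps:
      # ... checkout, pnpm setup, install ...
      - name: Download all blob reports
        uses: actions/download-artifact@v4
        with:
          pattern: blob-report-*
          path: all-blob-reports
          merge-multiple: true # put every shard's files in one directory
      - name: Merge into one HTML report
        run: pnpm exec playwright merge-reports --reporter html ./all-blob-reports
      - name: Upload merged report
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report-${{ github.sha }}
          path: playwright-report/
          retention-days: 7
```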

Artifact Cleanup

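Set short retention for test reports to reduce storage costs:
yaml
retention-days: 7 # Default is 90 days unless changed in repository or organization settings
compression-level: 6 # Zlib level 0-9; 6 is already the default, 9 trades upload time for smaller artifacts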

Performance Monitoring

Expected Benchmarks

| Optimization | Before | After | Improvement |
|---|---|---|---|
| Test sharding (3 shards) | 27 min | 9-10 min | 60-65% |
| pnpm cache hit | 2-3 min | 10-15 s | 85-90% |
| Playwright cache hit | 1-2 min | 5-10 s | 90-95% |
| Vite build cache | 1-2 min | 5-10 s | 90-95% |

Regression Detection

Set timeout thresholds as guardrails:
yaml
timeout-minutes: 30 # Fail if shard exceeds 30 minutes

Monitor shard execution times and rebalance if one shard consistently exceeds the others by more than 2 minutes.
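
`timeout-minutes` can also be set on a single step, so a hung test run fails at the test step instead of consuming the whole job budget. A sketch; the 20- and 30-minute values are illustrative:

```yaml
e2e-tests:
  runs-on: ubuntu-latest
  timeout-minutes: 30 # ceiling for the whole shard, including setup
  strategy:
    fail-fast: false
    matrix:
      shard: [1, 2, 3]
  steps:
    # ... checkout, caching, install ...
    - name: Run Playwright tests
      timeout-minutes: 20 # this step fails first if the tests alone hang
      run: pnpm exec playwright test --shard=${{ matrix.shard }}/3
```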

Optimization Workflow

Phase 1: Baseline

  1. Record current CI execution times
  2. Identify slowest jobs
  3. Measure cache hit rates (check Actions logs)

Phase 2: Implement Caching

  1. Add pnpm store cache (highest impact)
  2. Add Playwright browser cache
  3. Add build caches if applicable
  4. Verify cache keys work correctly

Phase 3: Implement Sharding

  1. Calculate optimal shard count (target 3-5 min per shard)
  2. Add matrix strategy to workflow
  3. Test locally: `pnpm exec playwright test --shard=1/3`
  4. Monitor shard balance in CI
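
The shard count in step 1 can be estimated from the Phase 1 baseline. Every shard repeats the fixed setup work (checkout, install, browser restore), so only the rest of the per-shard budget is available for tests. Writing T_tests for total serial test time, T_setup for per-shard setup, and T_target for the desired shard duration (symbols introduced here for illustration):

```latex
N = \left\lceil \frac{T_{\text{tests}}}{T_{\text{target}} - T_{\text{setup}}} \right\rceil,
\qquad
T_{\text{shard}} \approx T_{\text{setup}} + \frac{T_{\text{tests}}}{N}
```

For example, with a hypothetical 24 min of tests, 1 min of setup, and a 5 min target: N = ⌈24 / (5 - 1)⌉ = 6 shards, each taking about 1 + 24/6 = 5 min.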

Phase 4: Monitor & Adjust

  1. Track execution times over 5-10 runs
  2. Identify unbalanced shards (>2 min variance)
  3. Adjust shard distribution if needed
  4. Set up alerts for regressions

Common Issues

Shard imbalance (one shard takes 2x longer)
  • Use manual distribution with `--grep` patterns
  • Group tests by weight (heavy, medium, light) so each shard's total duration is similar
Cache misses despite correct key
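  • Verify that `hashFiles` glob patterns match actual files (`hashFiles` returns an empty string when nothing matches, so the key silently loses its hash)
  • Check whether the lock file changes on every run (it shouldn't); installing with `pnpm install --frozen-lockfile` prevents CI from rewriting it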
Playwright install fails with cache hit
  • Install system dependencies separately with `playwright install-deps`; the cache restores browser binaries only, not OS packages
Tests fail in CI but pass locally
  • Check environment variables (CI=true may affect behavior)
  • Verify mock setup works in parallel execution
  • Increase timeouts for slow operations

Success Criteria

  • CI execution time < 15 minutes total
  • Cache hit rate > 85% for dependencies
  • Shard execution time variance < 2 minutes
  • Zero timeout failures from slow tests

References

For detailed examples and templates: