Total: 50,473 skills. Data Processing has 2,559 skills.
Showing 12 of 2,559 skills
Complete guide for Apache Spark data processing including RDDs, DataFrames, Spark SQL, streaming, MLlib, and production deployment
Retrieves protein structure data from RCSB PDB, PDBe, and AlphaFold with protein disambiguation, quality assessment, and comprehensive structural profiles. Creates detailed structure reports with experimental metadata, ligand information, and download links. Use when users need protein structures, 3D models, crystallography data, or mention PDB IDs (4-character codes like 1ABC) or UniProt accessions.
Analyze cost reduction initiatives and operational efficiency measures from earnings transcripts, including headcount actions, facility consolidation, and productivity improvements.
Retrieve paper metadata from arXiv using keyword queries and save results as JSONL (`papers/papers_raw.jsonl`). **Trigger**: arXiv, arxiv, paper search, metadata retrieval, literature search, fetch metadata, offline import. **Use when**: an initial paper set is needed (Stage C1 of a survey/snapshot) and the source is arXiv (online search or offline import of an export). **Skip if**: a usable `papers/papers_raw.jsonl` already exists, or the data source is not arXiv. **Network**: online search requires network access; offline `--input <export.*>` does not. **Guardrail**: metadata only; do not write long prose in `output/`.
Expert-level research methodology, academic writing, statistical analysis, and scientific investigation
Use when animating charts, graphs, dashboards, data transitions, or any information visualization work.
Designs reliable ETL and data synchronization jobs with incremental updates, idempotency guarantees, watermark tracking, error handling, and retry logic. Use for "ETL jobs", "data sync", "incremental sync", or "data pipeline".
Complete DataForSEO API integration for SEO data and analysis. Use when the user asks for keyword research, search volume, SERP analysis, backlink audits, competitor analysis, rank tracking, domain authority, technical SEO audits, content monitoring, Google Trends, or any SEO-related data queries. Covers all DataForSEO APIs including SERP, Keywords Data, DataForSEO Labs, Backlinks, OnPage, Domain Analytics, Content Analysis, Business Data, Merchant, App Data, and AI Optimization APIs. Outputs CSV files.
Use for creating websets, running searches, importing CSV data, managing items, and adding enrichments to extract structured data.
Use when implementing data governance frameworks, building data catalogs, establishing data lineage, defining data quality rules, or setting up data stewardship programs. Covers metadata management, data quality, and compliance.
Guide for modernizing legacy Python 2 scientific computing code to Python 3 with modern libraries. This skill should be used when migrating scientific scripts involving data processing, numerical computation, or analysis from Python 2 to Python 3, or when updating deprecated scientific computing patterns to modern equivalents (pandas, numpy, pathlib).
Calculate statistical significance for A/B tests. Sample size estimation, power analysis, and conversion rate comparisons with confidence intervals.
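As an illustration of the conversion rate comparison named in the last entry, here is a minimal sketch of a two-sided two-proportion z-test with a confidence interval for the difference in rates. The listing does not say which method the skill itself uses, so the function name `two_proportion_z_test`, its parameters, and the choice of a pooled z-test are all assumptions made for this example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Compare conversion rates of variants A and B (hypothetical helper).

    conv_a, conv_b: number of conversions in each variant
    n_a, n_b:       number of visitors in each variant
    z_crit:         critical value for the interval (1.96 is roughly 95%)
    Returns (z, two_sided_p_value, (ci_low, ci_high)) for the difference B - A.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error, used for the test statistic under H0: p_a == p_b.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled

    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Unpooled standard error, used for the interval on the difference.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return z, p_value, ci


if __name__ == "__main__":
    # Made-up illustrative counts, not data from any real experiment.
    z, p, ci = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z={z:.2f}, p={p:.4f}, CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

The normal approximation behind this test is only reasonable when each variant has a fair number of conversions and non-conversions. For small samples an exact test would be a safer choice.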