Total 50,523 skills, Data Processing has 2561 skills
Workload-aware architecture design for Apache Doris. MUST USE when designing data architectures, choosing between data models, planning ingestion strategies, sizing clusters, or translating business requirements into Apache Doris system designs. Complements doris-best-practices with decision frameworks and a sizing-first workflow. Use when the user describes a workload involving: IoT, sensor data, telemetry, real-time analytics, dashboard, log analysis, log search, CDC sync, time-series, device monitoring, point query service, ad-hoc analytics, lakehouse federation, ETL/ELT pipeline, report analytics, clickstream, user behavior, observability, metrics, fleet tracking, or any OLAP workload requiring table design from scratch. Also triggers on prompts like: "design a table for...", "how should I store...", "build an architecture for...", "we have X devices sending data every Y seconds", "recommend a cluster size for...", "what data model should I use for...", "we need to ingest X GB/day", "migrate from MySQL/PostgreSQL to Apache Doris". Also use for legacy analytics/search/serving stack consolidation prompts even when Apache Doris is not named explicitly, including replacing or migrating from Impala, Kudu, Elasticsearch/ES, Greenplum, Presto, HBase, Hive, Hadoop, Redis, or Lambda-style multi-engine data platforms.
Use this skill when working with Brain Imaging Data Structure (BIDS) datasets: organizing neuroscience and biomedical data (MRI, EEG, MEG, iEEG, PET, microscopy, NIRS, motion capture, EMG, MR spectroscopy, behavioral), querying BIDS layouts, validating compliance, converting DICOM to BIDS, writing metadata sidecars, or creating BIDS derivatives.
Query the Precision Medicine Knowledge Graph (PrimeKG) for multiscale biological data including genes, drugs, diseases, phenotypes, and more.
Bayesian modeling with PyMC. Build hierarchical models, run MCMC (NUTS) and variational inference, compare models with LOO/WAIC, and perform posterior checks for probabilistic programming and inference.
Build and analyze phylogenetic trees using MAFFT (multiple alignment), IQ-TREE 2 (maximum likelihood), and FastTree (fast NJ/ML). Visualize with ETE3 or FigTree. For evolutionary analysis, microbial genomics, viral phylodynamics, protein family analysis, and molecular clock studies.
Use when defining field mappings for data streams, populating ecs.yml with ECS field references, selecting ECS categorization values, choosing custom field types, or troubleshooting mapping validation failures.
Run OpenMMDL molecular dynamics workflows via the FastFold Workflows API (`openmmdl_v1`) from local topology + optional ligand files, prepare draft scripts, execute drafts, wait for completion, fetch artifacts/metrics, and extract trajectory frames. Use when users ask for OpenMMDL, protein-ligand MD, OpenMMDL script preparation, or `/openmmdl/results/<workflow_id>` reruns.
Best practices for developing tools, dashboards and interactive data apps with HoloViz Panel. Create reactive, component-based UIs with widgets, layouts, templates, and real-time updates. Use when developing interactive data exploration tools, dashboards, data apps, or any interactive Python web application. Supports file uploads, streaming data, multi-page apps, and integration with HoloViews, hvPlot, Pandas, Polars, DuckDB and the rest of the HoloViz and PyData ecosystems.
Generate financial analytics and insights from ~/Documents/finances/ data. Terminal report shows: net worth trends (30d/90d/1y), asset allocation, liability table with APR and monthly interest, cash flow (last 30 days), Bitcoin detail with sparkline. HTML dashboard: interactive plotly charts for all of the above over time. Use when: reviewing finances, answering questions about net worth, spending patterns, debt paydown progress, Bitcoin holdings, asset allocation. Keywords: financial report, net worth, spending, cash flow, debt, liabilities, bitcoin, how am I doing financially, finances summary, show me my finances, portfolio.
Track data lineage and provenance from source to consumption. Use when auditing data flows, debugging data quality issues, ensuring compliance (GDPR, SOX), or understanding data dependencies. Covers lineage tracking, impact analysis, data catalogs, and metadata management.
Explains core Apache Beam programming model concepts including PCollections, PTransforms, Pipelines, and Runners. Use when learning Beam fundamentals or explaining pipeline concepts.
Scan multiple symbols with indicator conditions. Find stocks matching RSI oversold, EMA crossovers, Supertrend signals, and custom filter combinations.