Search Results: dataset

Found 288 Skills

nemo-gym-pivot-datasets

Use when creating, validating, or documenting NeMo Gym pivot datasets from rollout, trajectory, chat-completion, Responses API, or tool-call artifacts. Covers Gym Responses-style row conversion, pivot selection, single-step tool-use configs, agent_ref alignment, verifier knobs, expected-action row contracts, and train/eval usage.

5 scripts/Checked

AI & Machine Learning | lllllllama/ai-paper-repro...

paper-context-resolver

Optional sub-skill for README-first AI repo reproduction. Use only when README and repository files leave a narrow reproduction-critical gap and the task is to resolve a specific paper detail such as dataset split, preprocessing, evaluation protocol, checkpoint mapping, or runtime assumption from primary paper sources while recording conflicts. Do not use for general paper summary, repo scanning, environment setup, command execution, title-only paper lookup, or replacing README guidance by default.

125.5k

AI & Machine Learning | lllllllama/ai-paper-repro...

env-and-assets-bootstrap

Sub-skill for environment and asset preparation in README-first AI repo reproduction. Use when the task is specifically to prepare a conservative conda-first environment, checkpoint and dataset path assumptions, cache location hints, and setup notes before any run on a README-documented repository. Do not use for repo scanning, full orchestration, paper interpretation, final run reporting, or generic environment setup that is not tied to a specific reproduction target.

125.2k

2 scripts/Attention

Data Processing | supercent-io/skills-templ...

data-analysis

Analyze datasets to extract insights, identify patterns, and generate reports. Use when exploring data, creating visualizations, or performing statistical analysis. Handles CSV, JSON, SQL queries, and Python pandas operations.

13.9k

AI & Machine Learning | firecrawl/firecrawl-workf...

firecrawl-knowledge-base

Build a knowledge base from web content with Firecrawl. Use for local reference docs, RAG-ready chunks, fine-tuning datasets, documentation mirrors, topic corpora, or LLM-ready markdown organized from web sources.

456

Mobile Development | flutter/skills

flutter-handling-concurrency

Executes long-running tasks in background isolates to keep the UI responsive. Use when performing heavy computations or parsing large datasets.

149

AI & Machine Learning | davila7/claude-code-templ...

hqq-quantization

Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when deploying with vLLM or HuggingFace Transformers.

AI & Machine Learning | k-dense-ai/claude-scienti...

hypothesis-generation

Structured hypothesis formulation from observations. Use when you have experimental observations or data and need to formulate testable hypotheses with predictions, propose mechanisms, and design experiments to test them. Follows scientific method framework. For open-ended ideation use scientific-brainstorming; for automated LLM-driven hypothesis testing on datasets use hypogenic.

Data Processing | davila7/claude-code-templ...

pyopenms

Python interface to OpenMS for mass spectrometry data analysis. Use for LC-MS/MS proteomics and metabolomics workflows including file handling (mzML, mzXML, mzTab, FASTA, pepXML, protXML, mzIdentML), signal processing, feature detection, peptide identification, and quantitative analysis. Apply when working with mass spectrometry data, analyzing proteomics experiments, or processing metabolomics datasets.

Data Processing | k-dense-ai/claude-scienti...

polars

Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.

Data Processing | davila7/claude-code-templ...

arboreto

Infer gene regulatory networks (GRNs) from gene expression data using scalable algorithms (GRNBoost2, GENIE3). Use when analyzing transcriptomics data (bulk RNA-seq, single-cell RNA-seq) to identify transcription factor-target gene relationships and regulatory interactions. Supports distributed computation for large-scale datasets.

1 scripts/Checked

AI & Machine Learning | k-dense-ai/claude-scienti...

hypogenic

Automated LLM-driven hypothesis generation and testing on tabular datasets. Use when you want to systematically explore hypotheses about patterns in empirical data (e.g., deception detection, content analysis). Combines literature insights with data-driven hypothesis testing. For manual hypothesis formulation use hypothesis-generation; for creative ideation use scientific-brainstorming.
