Search Results: dataset

Found 288 Skills

Data Processing | gemini-cli-extensions/dat...

gcp-pipeline-resource-provisioning

Automates declarative resource creation and provisioning for data pipelines, supporting BigQuery, Dataform, Dataproc, BigQuery Data Transfer Service (DTS), and other resources. It manages environment-specific configurations (dev, staging, prod) through a deployment.yaml file.

Use when:
- Modifying or creating deployment.yaml for deployment settings.
- Resolving environment-specific variables (e.g., Project IDs, Regions) for deployment.
- Provisioning supported infrastructure like BigQuery datasets/tables, Dataform resources, or DTS resources via deployment.yaml.

Do not use when:
- Resources already exist.
- Managing resources not supported by `gcloud beta orchestration-pipelines resource-types list`.
- Managing general cloud infrastructure (VMs, networks, Kubernetes, IAM policies), which is better suited to Terraform.
- Infrastructure spans multiple cloud providers (AWS, Azure, etc.).
- The project already uses Terraform for the target resources.

Data Processing | postplusai/postplus-skill...

x-tools

Local execution tools for X/Twitter hosted collection workflows, including actor runs, dataset normalization, tweet ranking, account ranking, audience graph construction, and language clustering.

18 scripts/Attention

Data Processing | postplusai/postplus-skill...

xiaohongshu-tools

Local execution tools for Xiaohongshu/Rednote hosted collection workflows, including actor runs, dataset normalization, account and post ranking, comment clustering, product-pool ranking, and topic-map building.

20 scripts/Attention

DevOps & Cloud Services | getcompanion-ai/feynman

runpod-compute

Provision and manage GPU pods on RunPod for long-running experiments. Use when the user needs persistent GPU compute with SSH access, large datasets, or multi-step experiments.

AI & Machine Learning | getcompanion-ai/feynman

ml-training-recipe

Find implementable ML training recipes from papers, datasets, docs, and code. Use when the user wants to fine-tune, train, reproduce, or choose a practical ML method, dataset, hyperparameter setup, or benchmark recipe.

Data Processing | davila7/claude-code-templ...

geo-database

Access NCBI GEO for gene expression and genomics data. Search and download microarray and RNA-seq datasets (GSE, GSM, GPL) and retrieve SOFT/Matrix files for transcriptomics and expression analysis.

Data Processing | davila7/claude-code-templ...

anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

AI & Machine Learning | davila7/claude-code-templ...

hypogenic

Automated hypothesis generation and testing using large language models. Use this skill when generating scientific hypotheses from datasets, combining literature insights with empirical data, testing hypotheses against observational data, or conducting systematic hypothesis exploration for research discovery in domains like deception detection, AI content detection, mental health analysis, or other empirical research tasks.

Tools & Utilities | davila7/claude-code-templ...

get-available-resources

This skill should be used at the start of any computationally intensive scientific task to detect and report available system resources (CPU cores, GPUs, memory, disk space). It creates a JSON file with resource information and strategic recommendations that inform computational approach decisions such as whether to use parallel processing (joblib, multiprocessing), out-of-core computing (Dask, Zarr), GPU acceleration (PyTorch, JAX), or memory-efficient strategies. Use this skill before running analyses, training models, processing large datasets, or any task where resource constraints matter.

1 scripts/Attention

Data Processing | davila7/claude-code-templ...

omero-integration

Microscopy data management platform. Access images via Python, retrieve datasets, analyze pixels, manage ROIs and annotations, and run batch processing for high-content screening and microscopy workflows.

AI & Machine Learning | davila7/claude-code-templ...

geniml

This skill should be used when working with genomic interval data (BED files) for machine learning tasks. Use for training region embeddings (Region2Vec, BEDspace), single-cell ATAC-seq analysis (scEmbed), building consensus peaks (universes), or any ML-based analysis of genomic regions. Applies to BED file collections, scATAC-seq data, chromatin accessibility datasets, and region-based genomic feature learning.

Data Processing | ailabs-393/ai-labs-claude...

data-analyst

This skill should be used when analyzing CSV datasets, handling missing values through intelligent imputation, and creating interactive dashboards to visualize data trends. Use this skill for tasks involving data quality assessment, automated missing value detection and filling, statistical analysis, and generating Plotly Dash dashboards for exploratory data analysis.

4 scripts/Checked