Loading...
Loading...
Found 200 Skills
Deep-dive data profiling for a specific table. Use when the user asks to profile a table, wants statistics about a dataset, asks about data quality, or needs to understand a table's structure and content. Requires a table name.
This skill should be used when analyzing CSV datasets, handling missing values through intelligent imputation, and creating interactive dashboards to visualize data trends. Use this skill for tasks involving data quality assessment, automated missing value detection and filling, statistical analysis, and generating Plotly Dash dashboards for exploratory data analysis.
React performance optimization patterns using memoization, code splitting, and efficient rendering strategies. Use when optimizing slow React applications, reducing bundle size, or improving user experience with large datasets.
Profile and explore datasets to understand their shape, quality, and patterns before analysis. Use when encountering a new dataset, assessing data quality, discovering column distributions, identifying nulls and outliers, or deciding which dimensions to analyze.
Retrieves gene expression and omics datasets from ArrayExpress and BioStudies with gene disambiguation, experiment quality assessment, and structured reports. Creates comprehensive dataset profiles with metadata, sample information, and download links. Use when users need expression data, omics datasets, or mention ArrayExpress (E-MTAB, E-GEOD) or BioStudies (S-BSST) accessions.
Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, and inspect datasets. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.
Data journalism workflows for analysis, visualization, and storytelling. Use when analyzing datasets, creating charts and maps, cleaning messy data, calculating statistics or building data-driven stories. Essential for reporters, newsrooms and researchers working with quantitative information.
Use this skill for setting up vector similarity search with pgvector for AI/ML embeddings, RAG applications, or semantic search. **Trigger when user asks to:** - Store or search vector embeddings in PostgreSQL - Set up semantic search, similarity search, or nearest neighbor search - Create HNSW or IVFFlat indexes for vectors - Implement RAG (Retrieval Augmented Generation) with PostgreSQL - Optimize pgvector performance, recall, or memory usage - Use binary quantization for large vector datasets **Keywords:** pgvector, embeddings, semantic search, vector similarity, HNSW, IVFFlat, halfvec, cosine distance, nearest neighbor, RAG, LLM, AI search Covers: halfvec storage, HNSW index configuration (m, ef_construction, ef_search), quantization strategies, filtered search, bulk loading, and performance tuning.
An analytical in-process SQL database management system. Designed for fast analytical queries (OLAP). Highly interoperable with Python's data ecosystem (Pandas, NumPy, Arrow, Polars). Supports querying files (CSV, Parquet, JSON) directly without an ingestion step. Use for complex SQL queries on Pandas/Polars data, querying large Parquet/CSV files directly, joining data from different sources, analytical pipelines, local datasets too big for Excel, intermediate data storage and feature engineering for ML.
Compare two datasets to find differences, added/removed rows, changed values. Use for data validation, ETL verification, or tracking changes.
Guidance for data resharding tasks that involve reorganizing files across directory structures with constraints on file sizes and directory contents. This skill applies when redistributing datasets, splitting large files, or reorganizing data into shards while maintaining constraints like maximum files per directory or maximum file sizes. Use when tasks involve resharding, data partitioning, or directory-constrained file reorganization.
Use this skill when the user wants to debug, diagnose, or systematically iterate on an experiment that already exists, or when they need a structured experiment log for tracking runs, hypotheses, failures, results, and next steps during active research. Apply it to underperforming methods, training that will not converge, regressions after a change, inconsistent results across datasets, aimless experimentation without progress, and questions like 'why doesn't this work?', 'no progress after many attempts', or 'how should I investigate this failure?'. Also use it for setting up practical experiment logging/record-keeping that supports debugging and iteration. Do not use it for designing a brand-new experiment pipeline or full experiment program (use experiment-pipeline), generating research ideas, fixing isolated coding/syntax errors, or writing retrospective summaries into research memory/notes/knowledge bases.