Total 50,483 skills, Data Processing has 2,559 skills
Ingest and transform large data files (CSV/JSON) into Elasticsearch indices. Stream-based processing for files up to 30GB, cross-version migration (ES 8.x ↔ 9.x), custom JavaScript transformations, and reindexing with transforms. Use when you need to load data into Elasticsearch, migrate indices, or transform data during ingestion.
Use when the request mentions "NetworkX", "graph analysis", "network analysis", "graph algorithms", "shortest path", "centrality", "PageRank", "community detection", "social network", or "knowledge graph".
Data visualization for charts and graphs. Use when the user asks for "画图/图表/可视化" (drawing, charts, or visualization). Creates static PNG or interactive HTML charts from data.
Guidance for building and fixing Cython extensions, particularly for numpy compatibility issues. This skill should be used when tasks involve compiling Cython code, fixing deprecated numpy type errors, or resolving compatibility issues between Cython extensions and modern numpy versions (2.0+).
Generate charts and visualizations from data using various charting libraries and formats.
Google Optimization Tools. An open-source software suite for optimization, specialized in vehicle routing, flows, integer and linear programming, and constraint programming. Features the world-class CP-SAT solver. Use for vehicle routing problems (VRP), scheduling, bin packing, knapsack problems, linear programming (LP), integer programming (MIP), network flows, constraint programming, combinatorial optimization, resource allocation, shift scheduling, job-shop scheduling, and discrete optimization problems.
Use when the request mentions "data visualization", "plotting", "charts", "matplotlib", "plotly", "seaborn", "graphs", "figures", "heatmap", "scatter plot", "bar chart", or "interactive plots".
The practice of collecting, analyzing, and acting on data to drive product decisions. Great analytics isn't about dashboards; it's about insights that lead to action. Every metric should answer a question that changes behavior. This skill covers event tracking, metrics design, dashboards, user behavior analysis, and data-driven decision making. The best analytics teams measure what matters, not what's easy to measure. Use when any of "analytics, metrics, tracking, dashboard, funnel, cohort, retention, events, KPI, measure, data, insights, conversion, engagement" is mentioned.
Native Arrow filesystem integration with PyArrow. Optimized for Parquet workflows, zero-copy data transfer, predicate pushdown, and column pruning. Covers S3, GCS, HDFS with PyArrow datasets.
Delta Lake integration with cloud storage (S3, GCS, Azure). Covers storage_options, PyArrow filesystem, time travel, and partitioned writes.
Elasticsearch cluster management
Manage database, file system, and API connections for Sling. Use when setting up connections, testing connectivity, discovering tables/files, or configuring credentials.