Search Results: computer-vision

Found 28 Skills

AI & Machine Learning | microsoft/agent-skills

azure-ai-vision-imageanalysis-py

Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks. Triggers: "image analysis", "computer vision", "OCR", "object detection", "ImageAnalysisClient", "image caption".

AI & Machine Learning | nvidia/skills

vss-deploy-detection-tracking-2d

Use to deploy, run, debug, or tear down the RTVI-CV 2D detection / tracking microservice and call its REST API. Not for VLM, embedding, or analytics — use the matching vss-* skill.

AI & Machine Learning | mindrally/skills

transformers-huggingface

Expert guidance for working with the Hugging Face Transformers library for NLP, computer vision, and multimodal AI tasks.

AI & Machine Learning | nvidia/skills

tao-train-reid

Person re-identification (ReID). Learns discriminative embeddings to match the same person across different camera views, based on metric learning. Use when training, evaluating, exporting, or running inference for a TAO person re-identification model. Trigger phrases include "train ReID", "person re-identification", "cross-camera person matching", "ReID embeddings", "person re-id".

AI & Machine Learning | nvidia/skills

tao-port-huggingface-model

Integrate a HuggingFace Computer Vision model into the NVIDIA TAO Toolkit ecosystem (tao-core config, tao-pytorch trainer, tao-deploy TensorRT pipeline). Use when the user asks to "integrate a HuggingFace model into TAO", "add an HF model to TAO Toolkit", "wire a HuggingFace ViT/DETR/SegFormer into tao-pytorch", "build a TAO trainer + deploy pipeline for an HF CV model", or pastes a HuggingFace model URL/ID and wants it turned into a TAO model. Covers the full 7-phase loop: prerequisites check, HuggingFace inspection and validation, codebase exploration, tao-core configuration and native trainer implementation, ONNX export plus TensorRT deploy integration, packaging and L0 testing, container-based end-to-end validation, and (conditional) accuracy/latency tuning. Supports classification, object detection, semantic / instance / panoptic segmentation, zero-shot detection, and depth estimation.

Mobile Development | software-mansion-labs/rea...

react-native-executorch

Build on-device AI into React Native apps using ExecuTorch. Provides hooks for LLMs, computer vision, OCR, audio processing, and embeddings without cloud dependencies. Use when building AI features into mobile apps - AI chatbots, image recognition, speech processing, or text search.

AI & Machine Learning | charleswiltgen/axiom

axiom-vision-ref

Vision framework API, VNDetectHumanHandPoseRequest, VNDetectHumanBodyPoseRequest, person segmentation, face detection, VNImageRequestHandler, recognized points, joint landmarks, VNRecognizeTextRequest, VNDetectBarcodesRequest, DataScannerViewController, VNDocumentCameraViewController, RecognizeDocumentsRequest

AI & Machine Learning | davila7/claude-code-templ...

segment-anything-model

Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.

AI & Machine Learning | promptingcompany/nv-skill...

tao-train-action-recognition

Action recognition from video sequences. Supports RGB, optical flow, and joint (multi-stream) input types for classifying temporal actions in video clips. Use when training, evaluating, exporting, or running inference on a TAO action-recognition model. Trigger phrases include "train action recognition", "video action classification", "RGB + optical flow action model", "TAO ActionRecognition".

AI & Machine Learning | promptingcompany/nv-skill...

tao-train-fast-foundation-stereo

Real-time stereo depth estimation using FastFoundationStereo (FFS), the distilled bp2 commercial variant of FoundationStereo. Predicts disparity maps from stereo image pairs with ~10× lower latency than full FoundationStereo. Use when training, evaluating, exporting, or running inference for a TAO FastFoundationStereo (FFS) model. Trigger phrases include "train fast stereo", "real-time stereo disparity", "FastFoundationStereo", "distilled stereo depth".

Mobile Development | charleswiltgen/axiom

axiom-vision

subject segmentation, VNGenerateForegroundInstanceMaskRequest, isolate object from hand, VisionKit subject lifting, image foreground detection, instance masks, class-agnostic segmentation, VNRecognizeTextRequest, OCR, VNDetectBarcodesRequest, DataScannerViewController, document scanning, RecognizeDocumentsRequest

AI & Machine Learning | letta-ai/skills

video-processing

Guide for video analysis and frame-level event detection tasks using OpenCV and similar libraries. This skill should be used when detecting events in videos (jumps, movements, gestures), extracting frames, analyzing motion patterns, or implementing computer vision algorithms on video data. It provides verification strategies and helps avoid common pitfalls in video processing workflows.
