Loading...
Loading...
Found 13 Skills
Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds. Supports BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track alignments, handle padding/truncation. Integrates seamlessly with transformers. Use when you need high-performance tokenization or custom tokenizer training.
Language-independent tokenizer treating text as raw Unicode. Supports BPE and Unigram algorithms. Fast (50k sentences/sec), lightweight (6MB memory), deterministic vocabulary. Used by T5, ALBERT, XLNet, mBART. Train on raw text without pre-tokenization. Use when you need multilingual support, CJK languages, or reproducible tokenization.
Use when "tokenizers", "HuggingFace tokenizer", "BPE", "WordPiece", or asking about "train tokenizer", "custom vocabulary", "tokenization", "subword", "fast tokenizer", "encode text"
tokenization과 context window를 중심으로 긴 입력 처리 한계와 실무 대응 방법(분할, 요약, 우선순위화)을 학습시키는 모듈.
Hugging Face Transformers best practices including model loading, tokenization, fine-tuning workflows, and inference optimization. Use when working with transformer models, fine-tuning LLMs, implementing NLP tasks, or optimizing transformer inference.
Detect and mask PII (names, emails, phones, SSN, addresses) in text and CSV files. Multiple masking strategies with reversible tokenization option.
Train your own GPT-2 level LLM for under $100 using nanochat, Karpathy's minimal hackable harness covering tokenization, pretraining, finetuning, evaluation, inference, and chat UI.
End-to-end Stellar development playbook. Covers Soroban smart contracts (Rust SDK), Stellar CLI, JavaScript/Python/Go SDKs for client apps, Stellar RPC (preferred) and Horizon API (legacy), Stellar Assets vs Soroban tokens (SAC bridge), wallet integration (Freighter, Stellar Wallets Kit), smart accounts with passkeys, status-sensitive zero-knowledge proof patterns, testing strategies, security patterns, and common pitfalls. Optimized for payments, asset tokenization, DeFi, privacy-aware applications, and financial applications. Use when building on Stellar, Soroban, or working with XLM, Stellar Assets, trustlines, anchors, SEPs, ZK proofs, or the Stellar RPC/Horizon APIs.
Use this skill when building NLP pipelines, implementing text classification, semantic search, embeddings, or summarization. Triggers on text preprocessing, tokenization, embeddings, vector search, named entity recognition, sentiment analysis, text classification, summarization, and any task requiring natural language processing.
High-performance toolkit for genomic interval analysis in Rust with Python bindings. Use when working with genomic regions, BED files, coverage tracks, overlap detection, tokenization for ML models, or fragment analysis in computational genomics and machine learning applications.
Work with state-of-the-art machine learning models for NLP, computer vision, audio, and multimodal tasks using HuggingFace Transformers. This skill should be used when fine-tuning pre-trained models, performing inference with pipelines, generating text, training sequence models, or working with BERT, GPT, T5, ViT, and other transformer architectures. Covers model loading, tokenization, training with Trainer API, text generation strategies, and task-specific patterns for classification, NER, QA, summarization, translation, and image tasks. (plugin:scientific-packages@claude-scientific-skills)
Implement PCI DSS compliance requirements for secure handling of payment card data and payment systems. Use when securing payment processing, achieving PCI compliance, or implementing payment card security measures.