Search Results: tokenization

Found 22 Skills

AI & Machine Learning | davila7/claude-code-templ...

sentencepiece

Language-independent tokenizer treating text as raw Unicode. Supports BPE and Unigram algorithms. Fast (50k sentences/sec), lightweight (6MB memory), deterministic vocabulary. Used by T5, ALBERT, XLNet, mBART. Train on raw text without pre-tokenization. Use when you need multilingual support, CJK languages, or reproducible tokenization.

Tools & Utilities | okx/plugin-store

pendle

Pendle Finance yield tokenization plugin. Buy or sell fixed-yield PT tokens, trade YT yield tokens, provide or remove AMM liquidity, and mint or redeem PT+YT pairs. Trigger phrases: buy PT, sell PT, buy YT, sell YT, Pendle fixed yield, Pendle liquidity, add liquidity Pendle, remove liquidity Pendle, mint PT YT, redeem PT YT, Pendle positions, Pendle markets, Pendle APY. Chinese: 购买PT, 出售PT, 购买YT, 出售YT, Pendle固定收益, Pendle流动性, Pendle持仓, Pendle市场

Tools & Utilities | okx/plugin-store

pendle-plugin

AI & Machine Learning | eyadsibai/ltk

huggingface-tokenizers

Use when the request mentions "tokenizers", "HuggingFace tokenizer", "BPE", or "WordPiece", or asks about "train tokenizer", "custom vocabulary", "tokenization", "subword", "fast tokenizer", or "encode text".

Data Processing | dkyazzentwatwa/chatgpt-sk...

data-anonymizer

Detect and mask PII (names, emails, phone numbers, SSNs, addresses) in text and CSV files. Offers multiple masking strategies, including an optional reversible tokenization mode.

AI & Machine Learning | vuralserhat86/antigravity...

huggingface_transformers

Hugging Face Transformers best practices including model loading, tokenization, fine-tuning workflows, and inference optimization. Use when working with transformer models, fine-tuning LLMs, implementing NLP tasks, or optimizing transformer inference.

AI & Machine Learning | orchestra-research/ai-res...

huggingface-tokenizers

Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds. Supports BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track alignments, handle padding/truncation. Integrates seamlessly with transformers. Use when you need high-performance tokenization or custom tokenizer training.

AI & Machine Learning | olorolor/fundamentals-wit...

module2-tokens-context

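A module that teaches the limits of long-input processing and practical ways to handle them (chunking, summarization, prioritization), centered on tokenization and the context window.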

Security & Compliance | vtexdocs/ai-skills

payment-pci-security

Apply when handling credit card data, implementing secureProxyUrl flows, or working with payment security and proxy code. Covers PCI DSS compliance, Secure Proxy card tokenization, sensitive data handling rules, X-PROVIDER-Forward-To header usage, and custom token creation. Use for any payment connector that processes credit, debit, or co-branded card payments to prevent data breaches and PCI violations.

AI & Machine Learning | neo4j-contrib/neo4j-skill...

neo4j-genai-plugin-skill

Use Neo4j GenAI Plugin ai.text.* functions and procedures for in-Cypher embedding generation, text completion, structured output, chat, tokenization, and batch ingestion. Covers ai.text.embed(), ai.text.embedBatch(), ai.text.completion(), ai.text.structuredCompletion(), ai.text.aggregateCompletion(), ai.text.chat(), ai.text.tokenCount(), ai.text.chunkByTokenLimit(), and provider configuration for OpenAI, Azure OpenAI, VertexAI, and Amazon Bedrock. Requires CYPHER 25. Replaces deprecated genai.vector.encode(). Use when writing pure-Cypher GraphRAG, embedding nodes in-graph, generating structured maps from prompts, or calling LLMs inside Cypher queries. Does NOT handle neo4j-graphrag Python library pipelines — use neo4j-graphrag-skill. Does NOT handle vector index creation/search — use neo4j-vector-index-skill.

AI & Machine Learning | daemon-blockint-tech/agen...

research-engineer-scientist-tokens

Guides research engineering and science on LLM tokens—hypotheses about context use, tokenization, compression, and inference efficiency; rigorous benchmarks (tokens per task, quality–cost Pareto); ablation design; instrumentation and reproducible logs; and research memos that inform product decisions. Use when designing token-efficiency experiments, measuring context utilization, comparing compression or routing methods, analyzing tokenizer effects, or writing technical reports on token/cost trade-offs—not for phased cost roadmaps and owners (ai-token-improvement-plan-engineer), production context pipeline implementation (ai-context-engineer), single-prompt edits (prompt-engineer), general non-token AI research (ai-researcher), or shipping features (ai-engineer).

AI & Machine Learning | absolutelyskilled/absolut...

nlp-engineering

Use this skill when building NLP pipelines, implementing text classification, semantic search, embeddings, or summarization. Triggers on text preprocessing, tokenization, embeddings, vector search, named entity recognition, sentiment analysis, text classification, summarization, and any task requiring natural language processing.
