Implement LDA topic modeling to discover latent topics in document collections. Use this skill when the user needs to extract topics from a text corpus, categorize documents by theme, or explore thematic structure — even if they say 'what are the main topics', 'topic extraction', or 'document clustering by theme'.
npx skill4agent add asgard-ai-platform/skills algo-nlp-lda

IRON LAW: The Number of Topics K Must Be Chosen, Not Discovered
LDA does NOT tell you how many topics exist. K is a hyperparameter.
Too few topics: overly broad, mixed themes. Too many: fragmented,
redundant topics. Use coherence score (C_v) to compare K values,
but the final choice requires human judgment on topic interpretability.

{
  "topics": [{"id": 0, "label": "finance", "top_words": ["revenue", "profit", "quarter", "growth"], "coherence": 0.55}],
  "doc_topics": [{"doc_id": "d1", "dominant_topic": 0, "topic_distribution": [0.7, 0.1, 0.2]}],
  "metadata": {"K": 10, "coherence_avg": 0.48, "documents": 5000, "vocabulary": 8000}
}

| Input | Expected | Why |
|---|---|---|
| Very short documents | Poor topic assignment | Too few words for reliable mixture estimation |
| Homogeneous corpus | 1-2 topics dominate | All documents are similar, limited topic diversity |
| K=1 | Single topic = corpus vocabulary | Degenerate case, no discrimination |
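
The iron law above says K has to be swept and compared, not read off the model. The sketch below illustrates that loop using only the Python standard library. Everything in it is an assumption made for illustration: `fit_toy_lda`, `umass_coherence`, and `sweep_k` are hypothetical helpers, the six-document corpus is made-up toy data, and a pairwise-averaged UMass-style coherence is swapped in as a simpler stand-in for C_v, which needs a sliding-window estimator that is too long to show here. It is not the skill's own implementation.

```python
import math
import random
from collections import Counter
from itertools import combinations


def fit_toy_lda(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        assignments.append(zs)
        for w, k in zip(doc, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def umass_coherence(top_words, docs):
    """Mean of log((D(wi, wj) + 1) / D(wj)) over ranked top-word pairs."""
    doc_sets = [set(doc) for doc in docs]
    scores = []
    for j, i in combinations(range(len(top_words)), 2):  # wj is ranked above wi
        wj, wi = top_words[j], top_words[i]
        d_j = sum(wj in s for s in doc_sets)
        d_ij = sum(wj in s and wi in s for s in doc_sets)
        scores.append(math.log((d_ij + 1) / d_j))
    return sum(scores) / len(scores) if scores else 0.0


def sweep_k(docs, k_values, top_n=3):
    """Fit one toy model per K and return {K: average topic coherence}."""
    results = {}
    for k in k_values:
        tops = [
            [w for w, c in counts.most_common(top_n) if c > 0]
            for counts in fit_toy_lda(docs, k)
        ]
        # Topics with fewer than two words have no pairs to score; skip them.
        scored = [umass_coherence(top, docs) for top in tops if len(top) >= 2]
        results[k] = sum(scored) / len(scored) if scored else float("nan")
    return results


# Made-up toy corpus: three finance-like and three sports-like documents.
docs = [
    "revenue profit quarter growth revenue".split(),
    "profit growth revenue quarter".split(),
    "quarter revenue profit".split(),
    "goal match team player goal".split(),
    "team player match goal".split(),
    "player goal team".split(),
]
scores = sweep_k(docs, k_values=[1, 2, 3])
for k, score in scores.items():
    print(f"K={k}: coherence={score:.3f}")
```

The printed scores only rank the candidates. The top words for each K still need to be read by a person before K is fixed. With K=1 the single topic's counts equal the corpus word counts, which is the degenerate case in the table above.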
references/topic-evaluation.md
references/advanced-lda.md