Build a production-ready multilabel classifier on tabular data using XGBoost wrapped in MultiOutputClassifier. Use when each row can have multiple labels simultaneously (tags, attributes, gene functions, content moderation categories, multi-disease detection). Covers hamming loss, per-label metrics, label co-occurrence, MultiOutputClassifier vs ClassifierChain, and per-label SHAP. Default to this for any tabular multilabel problem.
Install:

```bash
npx skill4agent add brojonat/llmsrules multilabel-classification
```

Tags: `MultiOutputClassifier`, `ClassifierChain`, `multiclass-classification`, `binary-classification`

Project layout:

```
<project>/
├── data/
├── src/
│   ├── train.py     # ibis read → MultiOutputClassifier(XGBClassifier) → MLflow
│   ├── predict.py   # reload, return per-row label vector + per-label probas
│   └── plots.py     # label balance, co-occurrence, per-label metrics, cardinality
├── notebooks/
│   └── demo.py
└── mlruns/
```

Read the training table with ibis and split feature columns from label columns:

```python
import ibis
table = ibis.duckdb.connect().read_parquet("data/train.parquet")
feature_cols = [c for c in table.columns if c.startswith("feature_")]
label_cols = [c for c in table.columns if c.startswith("label_")]
data = (
    table
    .select(*feature_cols, *label_cols)
    .execute()
)
X = data[feature_cols]
Y = data[label_cols].to_numpy().astype(int)  # shape: (n_samples, n_labels)
```

`Y` is the full 0/1 label matrix; `MultiOutputClassifier` takes it as-is and fits one binary classifier per label column.
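Before training, check label balance and cardinality; a minimal sketch using the `Y` and `label_cols` loaded above (the same stats `plots.py` is meant to visualize):

```python
import numpy as np

# Per-label prevalence: fraction of rows where each label is on.
for lbl, p in zip(label_cols, Y.mean(axis=0)):
    print(f"{lbl}: {p:.3f}")

# Label cardinality: average number of active labels per row.
cardinality = float(Y.sum(axis=1).mean())
# Label density: cardinality normalized by the number of labels.
density = cardinality / Y.shape[1]
print(f"cardinality={cardinality:.2f}, density={density:.3f}")
```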
Build the pipeline:

```python
from sklearn.compose import ColumnTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
def build_pipeline(feature_cols, seed):
    return Pipeline([
        ("preprocess", ColumnTransformer([("num", StandardScaler(), feature_cols)])),
        ("clf", MultiOutputClassifier(
            XGBClassifier(
                n_estimators=300,
                max_depth=4,
                learning_rate=0.05,
                subsample=0.8,
                colsample_bytree=0.8,
                reg_lambda=1.0,
                objective="binary:logistic",
                eval_metric="logloss",
                random_state=seed,
                n_jobs=-1,
            ),
            n_jobs=-1,  # parallelize across labels
        )),
    ])
```

`MultiOutputClassifier` clones the inner `binary:logistic` `XGBClassifier` and fits one independent model per label column; no information is shared across labels during training.
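The evaluation snippets below assume `Y_test` and `Y_pred` from a held-out split; one way to produce them, as a sketch (the 80/20 split and the seed value are assumptions, and a plain random split ignores multilabel stratification):

```python
from sklearn.model_selection import train_test_split

seed = 42  # assumed; keep in sync with the logged seed
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=seed
)

pipe = build_pipeline(feature_cols, seed)
pipe.fit(X_train, Y_train)     # one XGBoost model fitted per label column
Y_pred = pipe.predict(X_test)  # shape: (n_test, n_labels), values 0/1
```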
Score the held-out predictions with multilabel-aware metrics:

```python
from sklearn.metrics import hamming_loss, accuracy_score

ham = hamming_loss(Y_test, Y_pred)            # primary metric, lower = better
exact_match = accuracy_score(Y_test, Y_pred)  # subset accuracy — too strict alone
```

`f1_score` supports several averaging modes for multilabel targets (including `samples`); pick the one that matches the question you are asking:

| Average | What it computes | When to use |
|---|---|---|
| macro | Unweighted mean of per-label F1 | All labels matter equally — rare labels drag the average down (good) |
| micro | F1 over all (row, label) decisions pooled into one binary problem | Overall correctness across all label slots |
| weighted | Per-label F1 weighted by support | Weights toward common labels — hides rare-label failures |
| samples | Per-row F1, then averaged across rows | Per-row "did we get the labels mostly right?" — useful for tagging tasks |
```python
from sklearn.metrics import f1_score

f1_macro = f1_score(Y_test, Y_pred, average="macro", zero_division=0)
f1_micro = f1_score(Y_test, Y_pred, average="micro", zero_division=0)
f1_weighted = f1_score(Y_test, Y_pred, average="weighted", zero_division=0)
f1_samples = f1_score(Y_test, Y_pred, average="samples", zero_division=0)
```
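`predict.py` also returns per-label probabilities: with `MultiOutputClassifier`, `predict_proba` yields one `(n_samples, 2)` array per label. A sketch of turning those into per-label ranking metrics, assuming the fitted `pipe`, `X_test`, and `Y_test` from above:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# One (n_samples, 2) array per label; column 1 is P(label is on).
proba_list = pipe.predict_proba(X_test)
P = np.column_stack([p[:, 1] for p in proba_list])  # (n_samples, n_labels)

for i, lbl in enumerate(label_cols):
    if Y_test[:, i].min() == Y_test[:, i].max():
        continue  # ROC AUC is undefined when only one class appears
    print(lbl, round(roc_auc_score(Y_test[:, i], P[:, i]), 3))
```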
`MultiOutputClassifier` treats every label as independent, so measure how strongly labels actually co-occur before accepting that assumption:

```python
import numpy as np

n_labels = Y.shape[1]
cooc = np.zeros((n_labels, n_labels))
for i in range(n_labels):
    i_count = int(Y[:, i].sum())
    if i_count == 0:
        continue
    for j in range(n_labels):
        cooc[i, j] = float(((Y[:, i] == 1) & (Y[:, j] == 1)).sum() / i_count)
# cooc[i, j] = "given label_i is on, how often is label_j also on?"
```
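`plots.py` renders this matrix as a heatmap; a minimal matplotlib sketch (figure size and output filename are placeholders):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 7))
im = ax.imshow(cooc, vmin=0.0, vmax=1.0, cmap="viridis")
ax.set_xticks(range(n_labels), labels=label_cols, rotation=90)
ax.set_yticks(range(n_labels), labels=label_cols)
ax.set_xlabel("label_j also on")
ax.set_ylabel("given label_i is on")
fig.colorbar(im, ax=ax, label="P(label_j | label_i)")
fig.tight_layout()
fig.savefig("label_cooccurrence.png", dpi=150)
```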
If strong co-occurrence shows up, `ClassifierChain` can exploit it: each label's model receives the previous labels' predictions as extra features, whereas `MultiOutputClassifier` ignores them. Swap it in as a comparison run:

```python
from sklearn.multioutput import ClassifierChain

clf_chain = ClassifierChain(
    XGBClassifier(...),
    order=[0, 1, 2, 3, 4, 5],  # or "random" for cross-validated stability
    random_state=42,
)
```

`ClassifierChain` predictions depend on the chain order, and errors early in the chain propagate to later labels, so keep the `MultiOutputClassifier` run as the baseline and promote the `ClassifierChain` only if it wins on the held-out metrics above.
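One common way to reduce order sensitivity, following the scikit-learn classifier-chain example, is to ensemble several chains with random orders and average their predictions; a sketch with an assumed ensemble of 10 chains:

```python
import numpy as np
from sklearn.multioutput import ClassifierChain
from xgboost import XGBClassifier

chains = [
    ClassifierChain(
        XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05),
        order="random",
        random_state=s,
    )
    for s in range(10)
]
for chain in chains:
    chain.fit(X_train, Y_train)

# Average the 0/1 predictions across chains, then threshold at 0.5.
Y_pred_chains = np.array([chain.predict(X_test) for chain in chains])
Y_pred_chain = (Y_pred_chains.mean(axis=0) >= 0.5).astype(int)
```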
Whichever wrapper wins, log one F1 per label so rare-label regressions stay visible:

```python
import mlflow

for i, lbl in enumerate(label_cols):
    f1_i = float(f1_score(Y_test[:, i], Y_pred[:, i], average="binary", zero_division=0))
    mlflow.log_metric(f"test_f1__{lbl}", f1_i)
```

Log these per run:

| Kind | What |
|---|---|
| Params | data path, n_rows, n_features, n_labels, label_columns, seed, hyperparameters |
| Metrics | hamming_loss (primary), subset_accuracy, F1 macro / micro / weighted / samples, per-label F1 (one metric per label), label cardinality (true vs predicted) |
| Tags | data hash, label cardinality / density from sidecar |
| Artifacts | model, label balance bar, co-occurrence heatmap, per-label metrics bar, label cardinality histogram (true vs pred) |
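A minimal `train.py` logging sketch against that table (the run name and artifact filename are placeholders; `ham`, `exact_match`, `f1_macro`, `pipe`, and `seed` come from the snippets above):

```python
import mlflow
import mlflow.sklearn

with mlflow.start_run(run_name="multilabel-xgb"):
    mlflow.log_params({
        "n_rows": len(X),
        "n_features": len(feature_cols),
        "n_labels": len(label_cols),
        "label_columns": ",".join(label_cols),
        "seed": seed,
    })
    mlflow.log_metric("test_hamming_loss", ham)
    mlflow.log_metric("test_subset_accuracy", exact_match)
    mlflow.log_metric("test_f1_macro", f1_macro)
    mlflow.sklearn.log_model(pipe, "model")
    mlflow.log_artifact("label_cooccurrence.png")  # from the heatmap sketch above
```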
Gotchas:

- `MultiOutputClassifier` fits one copy of the estimator per label, so training time scales with the number of labels; set `MultiOutputClassifier(..., n_jobs=-1)` to parallelize across labels.
- Pass `zero_division=0` to precision/recall/F1 everywhere; without `zero_division=0`, labels with no predicted positives emit warnings and distort the averages.
- A `scale_pos_weight` set on the inner XGBClassifier is cloned identically to every label by `MultiOutputClassifier`; for row-level weighting pass `sample_weight` to `fit` instead.
- Multilabel targets are `Y.shape == (n_samples, n_labels)` 0/1 indicators, not multiclass `y.shape == (n_samples,)` with integer classes in `[0, n_classes)`; `f1_score` with `average="samples"` only works on the multilabel form.
- `notebooks/demo.py` walks the full `MultiOutputClassifier` flow end to end.
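A cheap guard against that multilabel-vs-multiclass mix-up, as a sketch to drop in wherever `Y` is built:

```python
import numpy as np

assert Y.ndim == 2, f"expected (n_samples, n_labels), got shape {Y.shape}"
assert set(np.unique(Y)) <= {0, 1}, "multilabel targets must be 0/1 indicators"
```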