# hf-model-inference

Guidance for setting up HuggingFace model inference services with Flask APIs. This skill applies when downloading HuggingFace models, creating inference endpoints, or building ML model serving APIs. Use for tasks involving the transformers library, model caching, and REST API creation for ML models.
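The per-model cache layout assumed throughout this skill (each model in its own directory under a fixed root such as `/app/model_cache/model_name`) can be sketched with a small helper. `model_cache_dir` is a hypothetical name for illustration, not part of any library:

```python
from pathlib import Path

# Hypothetical helper: map a HuggingFace model id to a per-model
# cache directory under a fixed root, e.g. /app/model_cache/<name>.
# Slashes in ids like "distilbert/distilbert-base-uncased" are
# flattened so each model gets exactly one directory.
def model_cache_dir(model_id: str, root: str = "/app/model_cache") -> Path:
    safe_name = model_id.replace("/", "--")
    return Path(root) / safe_name
```

The resulting path can then be passed as the `cache_dir` argument shown in the caching section below.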
## Installation

Install the skill:

```shell
npx skill4agent add letta-ai/skills hf-model-inference
```

## Environment setup

Use a Python environment manager such as `uv`, `pip`, or `conda`, and install the required packages: `transformers`, `torch` (or `tensorflow`), and `flask`.

## Model caching

Download models into a per-model cache directory such as `/app/model_cache/model_name` so repeated runs reuse the cached weights:

```python
from transformers import pipeline

model = pipeline("task-type", model="model-name", cache_dir="/path/to/cache")
```

## Flask API

Expose inference through a Flask endpoint. Load the model once at startup, not per request:

```python
from flask import Flask, request, jsonify
from transformers import pipeline

app = Flask(__name__)
model = None  # Load at startup

@app.route('/predict', methods=['POST'])
def predict():
    # Handle inference
    pass
```

Return errors as JSON in the form `{"error": "message"}`. Bind to `0.0.0.0` so the service is reachable from outside the container:

```python
app.run(host='0.0.0.0', port=5000)
```

## Testing

Exercise the endpoint with valid input, a second distinct input, and an empty payload:

```shell
curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "valid input text"}'

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "different input text"}'

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{}'
```

Verify that `torch` imports cleanly in the service environment, and expose a `/health` endpoint for liveness checks.
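Putting the pieces above together, a minimal runnable sketch of the service might look like the following. The `model` function here is a stand-in so the example runs without downloading weights; in a real service it would be a `transformers` pipeline loaded once at startup with the `cache_dir` shown earlier:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Stand-in for a transformers pipeline (hypothetical output shape);
# swap in pipeline("task-type", model="model-name", cache_dir=...)
# loaded once at startup in a real service.
def model(text):
    return [{"label": "POSITIVE", "score": 0.99}]

@app.route('/health', methods=['GET'])
def health():
    # Liveness check for container orchestration.
    return jsonify({"status": "ok"})

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(silent=True) or {}
    text = data.get("text")
    if not text:
        # Error responses use the skill's {"error": "message"} format.
        return jsonify({"error": "missing 'text' field"}), 400
    try:
        result = model(text)
    except Exception as exc:
        return jsonify({"error": str(exc)}), 500
    return jsonify({"result": result})

if __name__ == "__main__":
    # 0.0.0.0 makes the service reachable from outside the container.
    app.run(host='0.0.0.0', port=5000)
```

The empty-payload curl test above should hit the 400 branch, while the two valid-input tests should return a JSON `result`.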