---
name: run-models
description: Run AI models on Replicate via predictions, webhooks, and streaming.
---
## Docs
- Reference: https://replicate.com/docs/llms.txt
- OpenAPI schema: https://api.replicate.com/openapi.json
- MCP server: https://mcp.replicate.com
- Per-model docs: https://replicate.com/{owner}/{model}/llms.txt
- Set `Accept: text/markdown` when requesting docs pages for Markdown responses.
## Workflow
- Choose the right model - Search with the API or ask the user.
- Get model metadata - Fetch input and output schema via API.
- Create prediction - POST to /v1/predictions.
- Poll for results - GET the prediction until its status is terminal ("succeeded", "failed", or "canceled").
- Return output - Usually URLs to generated content.
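This create-then-poll loop can be sketched in Python. The `create` and `get` callables here are hypothetical stand-ins for an HTTP client wrapping `POST /v1/predictions` and `GET /v1/predictions/{id}`, so the polling logic itself stays client-agnostic:

```python
import time
from typing import Callable

def run_prediction(create: Callable[[], dict],
                   get: Callable[[str], dict],
                   poll_interval: float = 1.0,
                   timeout: float = 300.0) -> dict:
    """Create a prediction and poll until it reaches a terminal state.

    `create` wraps POST /v1/predictions; `get` wraps
    GET /v1/predictions/{id}. Both return the prediction object as a dict.
    """
    prediction = create()
    deadline = time.monotonic() + timeout
    # "starting" and "processing" are the two non-terminal states.
    while prediction["status"] in ("starting", "processing"):
        if time.monotonic() > deadline:
            raise TimeoutError(f"prediction {prediction['id']} still running")
        time.sleep(poll_interval)
        prediction = get(prediction["id"])
    return prediction
```

On success, `prediction["output"]` usually holds URLs to the generated content.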
## Three ways to get output

- Polling: create a prediction, store its `id` from the response, and poll until completion.
- Blocking: set a `Prefer: wait` header when creating a prediction for a synchronous response. Only recommended for very fast models; max 60 seconds.
- Webhooks: set an HTTPS webhook URL when creating a prediction, and Replicate will POST to that URL when the prediction completes.
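A minimal sketch of assembling a blocking request, assuming the `Prefer: wait=N` form of the header; the helper name and return shape are illustrative, not part of the API:

```python
import json

API_URL = "https://api.replicate.com/v1/predictions"

def build_blocking_request(version: str, model_input: dict,
                           token: str, wait_seconds: int = 60):
    """Build the URL, headers, and body for a blocking prediction request.

    With `Prefer: wait=N` the server holds the connection open for up to
    N seconds (capped at 60) and returns the finished prediction in one
    round-trip — suitable only for very fast models.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Prefer": f"wait={min(wait_seconds, 60)}",
    }
    body = json.dumps({"version": version, "input": model_input})
    return API_URL, headers, body
```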
## Guidelines

- Use the `POST /v1/predictions` endpoint, as it supports both official and community models.
- Every model has its own OpenAPI schema. Always fetch and check model schemas to make sure you're setting valid inputs. Even popular models change their schemas.
- Validate input parameters against schema constraints (`minimum`, `maximum`, `enum` values). Don't generate values that violate them.
- When unsure about a parameter value, use the model's default example or omit the optional parameter.
- Don't set optional inputs unless you have a reason to. Stick to the required inputs and let the model's defaults do the work.
- Use HTTPS URLs for file inputs whenever possible. You can also send base64-encoded files, but they should be avoided.
- Fire off multiple predictions concurrently. Don't wait for one to finish before starting the next.
- Output file URLs expire after 1 hour, so back them up if you need to keep them, using a service like Cloudflare R2.
- Webhooks are a good mechanism for receiving and storing prediction output.
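Constraint checking from the guidelines above can be sketched as a small validator. The helper names are hypothetical; the `spec` dicts mirror the per-property schemas found in a model's OpenAPI document:

```python
def validate_input(value, spec: dict):
    """Check one input value against a property schema's constraints
    (minimum / maximum / enum). Raises ValueError on violation."""
    if "enum" in spec and value not in spec["enum"]:
        raise ValueError(f"{value!r} not in allowed values {spec['enum']}")
    if "minimum" in spec and value < spec["minimum"]:
        raise ValueError(f"{value!r} below minimum {spec['minimum']}")
    if "maximum" in spec and value > spec["maximum"]:
        raise ValueError(f"{value!r} above maximum {spec['maximum']}")
    return value

def validate_inputs(inputs: dict, properties: dict) -> dict:
    """Validate every provided input against its property schema."""
    for name, value in inputs.items():
        if name in properties:
            validate_input(value, properties[name])
    return inputs
```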
## Predictions

- A prediction goes through these states: `starting` -> `processing` -> `succeeded` / `failed` / `canceled`.
- Official models use the `owner/name` format. Community models require `owner/name:version_id`.
- The `POST /v1/predictions` endpoint handles both.
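A small sketch of both conventions; the helper names are illustrative, and the version id in the usage example is a made-up placeholder:

```python
def model_ref(owner: str, name: str, version_id: str = "") -> str:
    """Format a model reference: official models are `owner/name`,
    community models `owner/name:version_id`."""
    ref = f"{owner}/{name}"
    return f"{ref}:{version_id}" if version_id else ref

# Terminal states a polled prediction can end in.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}

def is_terminal(status: str) -> bool:
    return status in TERMINAL_STATES
```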
## Webhooks

- Set `webhook` to an HTTPS URL when creating a prediction. Replicate POSTs the full prediction object when it completes.
- Filter events with `webhook_events_filter`: `start`, `output`, `logs`, `completed`.
- Validate webhook signatures using the `Webhook-ID`, `Webhook-Timestamp`, and `Webhook-Signature` headers. Get the signing secret from `GET /v1/webhooks/default/secret`.
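Signature validation can be sketched as below, assuming the svix-style scheme: the secret looks like `whsec_<base64>`, the signed content is `{id}.{timestamp}.{body}`, and `Webhook-Signature` holds space-separated `v1,<base64-signature>` entries. Confirm the exact scheme against the official docs before relying on this:

```python
import base64
import hashlib
import hmac

def verify_webhook(secret: str, webhook_id: str, timestamp: str,
                   body: str, signature_header: str) -> bool:
    """Verify a webhook using the Webhook-ID, Webhook-Timestamp, and
    Webhook-Signature headers and the signing secret from
    GET /v1/webhooks/default/secret."""
    # The key is the base64 payload after the "whsec_" prefix.
    key = base64.b64decode(secret.split("_", 1)[1])
    signed_content = f"{webhook_id}.{timestamp}.{body}"
    expected = base64.b64encode(
        hmac.new(key, signed_content.encode(), hashlib.sha256).digest()
    ).decode()
    # The header may carry several space-separated "v1,<sig>" entries.
    received = [s.split(",", 1)[1] for s in signature_header.split()]
    return any(hmac.compare_digest(expected, sig) for sig in received)
```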
## Prediction lifetime

- Set `lifetime` to auto-cancel predictions that run too long (e.g. `30s`, `5m`, `1h`). Measured from creation time.
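If the client also wants to stop polling once the server-side lifetime would have expired, a tiny parser for these duration strings helps (the helper name is illustrative):

```python
def lifetime_seconds(lifetime: str) -> int:
    """Parse a duration string like '30s', '5m', or '1h' into seconds,
    e.g. to align a client-side polling timeout with the server-side cap."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(lifetime[:-1]) * units[lifetime[-1]]
```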
## Streaming

- Language models that support streaming include a `stream` URL in the response. Use server-sent events (SSE) to receive incremental output.
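A minimal SSE parser over a buffered response body; a real client should read the stream incrementally, and the event names shown in the usage test (`output`, `done`) are assumptions to confirm against the model's streaming docs:

```python
def parse_sse(raw: str):
    """Yield (event, data) pairs from a server-sent-events payload.

    Per the SSE format: `event:` names the event, `data:` lines carry the
    payload (one leading space is stripped), and a blank line ends an event.
    """
    event, data = "message", []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[6:].lstrip()
        elif line.startswith("data:"):
            v = line[5:]
            data.append(v[1:] if v.startswith(" ") else v)
        elif line == "":  # blank line terminates one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
```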
## File handling
- Prefer HTTPS URLs for file inputs. Output URLs from one prediction can be passed directly as file inputs to the next model.
- Output file URLs expire after 1 hour. Download and store them immediately if you need to keep them.
## Multi-model workflows
- Chain models by passing output URLs as file inputs to the next model.
- Start all independent predictions in parallel, then collect results.
- Output URLs are valid for 1 hour, which is enough for pipeline steps.
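The parallel fan-out step can be sketched with a thread pool; `run` is a hypothetical stand-in for one create-and-poll call per input, returning that prediction's output URL:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

def fan_out(run: Callable[[dict], str], inputs: Sequence[dict]) -> list[str]:
    """Run independent predictions in parallel and collect their output
    URLs in input order, so they can be chained into the next model."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(run, inputs))
```

Threads suit this fan-out because each `run` call spends nearly all its time blocked on HTTP polling rather than on CPU work.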