name: resemble-detect
description: Deepfake detection and media safety — detect AI-generated audio, images, video, and text, trace synthesis sources, apply watermarks, verify speaker identity, and analyze media intelligence using Resemble AI

Resemble Detect — Deepfake Detection & Media Safety

Analyze audio, image, video, and text for synthetic manipulation, AI-generated content, watermarks, speaker identity, and media intelligence using the Resemble AI platform.

Core Principle — THE IRON LAW

"NEVER DECLARE MEDIA AS REAL OR FAKE WITHOUT A COMPLETED DETECTION RESULT."

Do not guess, infer, or speculate about media authenticity. Every authenticity claim must be backed by a completed Resemble detect job with a returned label, score, and status: "completed". If the detection is still processing, wait. If it failed, say so — do not substitute your own judgment.

When to Use

Use this skill whenever the user's request involves any of these:

Checking if audio, video, image, or text is AI-generated or manipulated
Detecting deepfakes in any media format
Verifying media authenticity or provenance
Identifying which AI platform synthesized audio (source tracing)
Applying or detecting watermarks on media
Analyzing media for speaker info, emotion, transcription, or misinformation
Asking natural-language questions about detection results
Matching or verifying speaker identity against known voice profiles
Detecting AI-generated or machine-written text
Any mention of: "deepfake", "fake detection", "synthetic media", "voice verification", "watermark", "media forensics", "authenticity check", "source tracing", "is this real", "AI-written text", "text detection"

Do NOT use for text-to-speech generation, voice cloning, or speech-to-text transcription — those are separate Resemble capabilities.

Capability Decision Tree

User wants to...	Use this	API endpoint
Check if media is AI-generated / deepfake	Deepfake Detection	`POST /detect`
Know which AI platform made fake audio	Audio Source Tracing	`POST /detect` with `audio_source_tracing: true`
Get speaker info, emotion, transcription from media	Intelligence	`POST /intelligence`
Ask questions about a completed detection	Detect Intelligence	`POST /detects/{uuid}/intelligence`
Apply an invisible watermark to media	Watermark Apply	`POST /watermark/apply`
Check if media contains a watermark	Watermark Detect	`POST /watermark/detect`
Verify a speaker's identity against known profiles	Identity Search	`POST /identity/search`
Check if text is AI-generated	Text Detection	`POST /text_detect`
Create a voice identity profile for future matching	Identity Create	`POST /identity`

When multiple capabilities apply (e.g., user wants deepfake detection AND intelligence), combine them in a single POST /detect call using the intelligence: true flag rather than making separate requests.

Required Setup

API Key: Bearer token from the Resemble AI dashboard
Base URL: https://app.resemble.ai/api/v2
Auth Header: Authorization: Bearer <RESEMBLE_API_KEY>
Media Requirement: All media must be at a publicly accessible HTTPS URL

If the user provides a local file path instead of a URL, inform them the file must be hosted at a public HTTPS URL first. Do not attempt to upload local files to the API.
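A minimal sketch of the setup above, assuming the key is read from an environment variable named `RESEMBLE_API_KEY`. The helper names `auth_headers` and `require_public_https` are hypothetical, not part of any Resemble SDK. The check only rejects local paths and non-HTTPS URLs; it cannot confirm that a URL is actually reachable from the public internet.

```python
import os
from urllib.parse import urlparse

BASE_URL = "https://app.resemble.ai/api/v2"  # from Required Setup


def auth_headers(api_key: str) -> dict:
    """Headers for a JSON request with Bearer authentication."""
    if not api_key:
        raise ValueError("missing Resemble API key")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def require_public_https(media: str) -> str:
    """Reject local file paths and non-HTTPS URLs before calling the API."""
    parsed = urlparse(media)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(
            f"{media!r} is not an HTTPS URL; host the file at a public HTTPS URL first"
        )
    return media


if __name__ == "__main__":
    key = os.environ.get("RESEMBLE_API_KEY", "")
    print("key configured" if key else "RESEMBLE_API_KEY is not set")
```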

MCP Tools Available

When the Resemble MCP server is connected, use these tools instead of raw API calls:

Tool	Purpose
`resemble_docs_lookup`	Get comprehensive docs for any detect sub-topic
`resemble_search`	Search across all documentation
`resemble_api_endpoint`	Get exact OpenAPI spec for any endpoint
`resemble_api_search`	Find endpoints by keyword
`resemble_get_page`	Read specific documentation pages
`resemble_list_topics`	List all available topics

Tool usage pattern: Use resemble_docs_lookup with topic "detect" to get the full picture, then resemble_api_endpoint for exact request/response schemas before making API calls.

Phase 1: Deepfake Detection

The core capability. Submit any audio, image, or video for AI-generated content analysis.

Submit a Detection

POST /detect
Content-Type: application/json
Authorization: Bearer <API_KEY>

{
  "url": "https://example.com/media.mp4",
  "visualize": true,
  "intelligence": true,
  "audio_source_tracing": true
}

Parameters:

Parameter	Type	Required	Description
`url`	string	Yes	HTTPS URL to audio, image, or video file
`callback_url`	string	No	Webhook URL for async completion notification
`visualize`	boolean	No	Generate heatmap/visualization artifacts
`intelligence`	boolean	No	Run multimodal intelligence analysis alongside detection
`audio_source_tracing`	boolean	No	Identify which AI platform synthesized fake audio
`frame_length`	integer	No	Audio/video analysis window size in seconds (1–4, default 2)
`start_region`	number	No	Start of segment to analyze (seconds)
`end_region`	number	No	End of segment to analyze (seconds)
`model_types`	string	No	`"image"` or `"talking_head"` (for face-swap detection)
`use_reverse_search`	boolean	No	Enable reverse image search (image only)
`use_ood_detector`	boolean	No	Enable out-of-distribution detection
`zero_retention_mode`	boolean	No	Auto-delete media after detection completes

Supported formats:

Audio: WAV, MP3, OGG, M4A, FLAC
Video: MP4, MOV, AVI, WMV
Image: JPG, PNG, GIF, WEBP
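The parameter table can be turned into a small local validator before anything is sent. This is a sketch only: `build_detect_payload` is a hypothetical helper, and the rule that `start_region` must be smaller than `end_region` is an assumption rather than documented API behavior.

```python
from urllib.parse import urlparse


def build_detect_payload(url, *, frame_length=None, start_region=None,
                         end_region=None, **flags):
    """Build the JSON body for POST /detect, checking what can be checked locally."""
    if urlparse(url).scheme != "https":
        raise ValueError("url must be a publicly accessible HTTPS URL")
    payload = {"url": url}
    if frame_length is not None:
        if frame_length not in (1, 2, 3, 4):
            raise ValueError("frame_length must be an integer from 1 to 4")
        payload["frame_length"] = frame_length
    # Assumption: a region only makes sense when it starts before it ends.
    if start_region is not None and end_region is not None:
        if start_region >= end_region:
            raise ValueError("start_region must be smaller than end_region")
    for key, value in (("start_region", start_region), ("end_region", end_region)):
        if value is not None:
            payload[key] = value
    # Optional flags such as visualize=True pass through; unset (None) flags are omitted.
    payload.update({k: v for k, v in flags.items() if v is not None})
    return payload
```

For example, `build_detect_payload("https://example.com/media.mp4", visualize=True, zero_retention_mode=True)` yields a body with only the three keys that were actually set.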

Poll for Results

Detection is asynchronous. Poll GET /detect/{uuid} until status is "completed" or "failed".

GET /detect/{uuid}
Authorization: Bearer <API_KEY>

Polling best practice: Start at 2s intervals, back off to 5s, then 10s. Most detections complete within 10–60 seconds depending on media length.
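The polling schedule could be sketched as follows. `fetch` is a caller-supplied function that performs `GET /detect/{uuid}` and returns the detection object as a dict. The number of polls per tier, the 120-second cap, and the assumption that `status` is a top-level key of that dict are illustrative choices, not documented values.

```python
import time


def backoff_delays(max_wait=120.0):
    """Yield waits of 2s, then 5s, then 10s until max_wait seconds are used up."""
    schedule = [2.0] * 3 + [5.0] * 3  # three polls per tier is an assumption
    waited = 0.0
    attempt = 0
    while waited < max_wait:
        delay = schedule[attempt] if attempt < len(schedule) else 10.0
        yield delay
        waited += delay
        attempt += 1


def poll_detection(fetch, uuid, sleep=time.sleep, max_wait=120.0):
    """Call fetch(uuid) until status is 'completed' or 'failed', then return the result."""
    for delay in backoff_delays(max_wait):
        result = fetch(uuid)
        if result.get("status") in ("completed", "failed"):
            return result
        sleep(delay)
    raise TimeoutError(f"detection {uuid} did not finish within {max_wait}s")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; a `failed` result is returned to the caller rather than retried, so it can be reported as a failure.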

Reading Results by Media Type

Audio results — in metrics:

{
  "label": "fake",
  "score": ["0.92", "0.88", "0.95"],
  "consistency": "0.91",
  "aggregated_score": "0.92",
  "image": "https://..."
}

label: "fake" or "real" — the verdict
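score: Per-chunk prediction scores (array; string-encoded in the example above, so convert to float before comparing)
aggregated_score: Overall confidence (0.0–1.0, higher = more likely synthetic; also string-encoded in the example above)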
consistency: How consistent the prediction is across chunks
image: Visualization heatmap URL (if visualize: true)
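Because the audio example above shows scores as strings, a small normalizing step avoids accidental string comparisons. `parse_audio_metrics` is a hypothetical helper, and it assumes the field names shown in the example.

```python
def parse_audio_metrics(metrics: dict) -> dict:
    """Convert string-encoded audio scores to floats and keep the label."""
    return {
        "label": metrics["label"],
        "chunk_scores": [float(s) for s in metrics.get("score", [])],
        "aggregated_score": float(metrics["aggregated_score"]),
        "consistency": float(metrics["consistency"]),
    }


example = {
    "label": "fake",
    "score": ["0.92", "0.88", "0.95"],
    "consistency": "0.91",
    "aggregated_score": "0.92",
}
parsed = parse_audio_metrics(example)
print(parsed["label"], parsed["aggregated_score"], min(parsed["chunk_scores"]))
```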

Image results — in image_metrics:

{
  "type": "ImageAnalysis",
  "label": "fake",
  "score": 0.87,
  "image": "https://...",
  "ifl": { "score": 0.82, "heatmap": "https://..." },
  "reverse_image_search_sources": [
    { "url": "...", "title": "...", "verdict": "known_fake", "similarity": 0.95 }
  ]
}

label / score: Verdict and confidence
ifl: Invisible Frequency Layer analysis with heatmap
reverse_image_search_sources: Known sources found online (if use_reverse_search: true)

Video results — in video_metrics:

{
  "label": "fake",
  "score": 0.89,
  "certainty": 0.91,
  "children": [
    {
      "type": "VideoResult",
      "conclusion": "Fake",
      "score": 0.89,
      "timestamp": 2.5,
      "children": [...]
    }
  ]
}

Hierarchical tree of frame-level and segment-level results
Each child may carry type, conclusion, timestamp, score, and certainty, and may have nested children
Video with audio track returns both metrics (audio) and video_metrics (visual)
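One way to read the hierarchical `video_metrics` tree is a depth-first walk. The sketch below assumes only the keys visible in the example (`children`, `score`, `timestamp`); the 0.5 threshold and both function names are hypothetical.

```python
def iter_video_nodes(node, depth=0):
    """Depth-first walk over a video_metrics tree, yielding (depth, node)."""
    yield depth, node
    for child in node.get("children") or []:
        if isinstance(child, dict):
            yield from iter_video_nodes(child, depth + 1)


def flagged_timestamps(video_metrics, threshold=0.5):
    """Timestamps of nodes whose score is at or above the threshold."""
    return [
        node["timestamp"]
        for _, node in iter_video_nodes(video_metrics)
        if "timestamp" in node and node.get("score", 0.0) >= threshold
    ]
```

The root node has no `timestamp` in the example, so it is skipped by `flagged_timestamps` and only segment-level and frame-level nodes are reported.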

Interpreting Scores

Score Range	Interpretation
0.0 – 0.3	Strong indication of authentic/real media
0.3 – 0.5	Inconclusive — recommend additional analysis
0.5 – 0.7	Likely synthetic — flag for review
0.7 – 1.0	High confidence synthetic/AI-generated

Always present scores with context. Say "The detection returned a score of 0.87, indicating high confidence that this audio is AI-generated" — never just "it's fake."
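The interpretation table maps directly onto a lookup function. The table's ranges share their endpoints, so this sketch assumes a boundary value (0.3, 0.5, 0.7) belongs to the higher band; `interpret_score` and `present` are hypothetical names.

```python
def interpret_score(score: float) -> str:
    """Map a 0.0-1.0 detection score to the interpretation table's wording."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score < 0.3:
        return "Strong indication of authentic/real media"
    if score < 0.5:
        return "Inconclusive, recommend additional analysis"
    if score < 0.7:
        return "Likely synthetic, flag for review"
    return "High confidence synthetic/AI-generated"


def present(media_kind: str, label: str, score: float) -> str:
    """Report label and score together, never the label alone."""
    return (
        f"The detection labeled this {media_kind} '{label}' with a score of "
        f"{score:.2f}. Interpretation: {interpret_score(score)}."
    )


print(present("audio", "fake", 0.87))
```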

Phase 2: Intelligence — Media Analysis

Analyze media for rich structured insights independent of or alongside detection.

Standalone Intelligence

POST /intelligence
Content-Type: application/json
Authorization: Bearer <API_KEY>

{
  "url": "https://example.com/audio.mp3",
  "json": true
}

Parameters:

Parameter	Type	Required	Description
`url`	string	One of `url` / `media_token`	HTTPS URL to media file
`media_token`	string	One of `url` / `media_token`	Token from secure upload (alternative to URL)
`detect_id`	string	No	UUID of existing detect to associate
`media_type`	string	No	`"audio"`, `"video"`, or `"image"` (auto-detected)
`json`	boolean	No	Return structured fields (default: false for audio/video, true for image)
`callback_url`	string	No	Webhook for async mode

Audio/Video structured response (json: true):

speaker_info — speaker description (age, gender)
language / dialect — detected language
emotion — detected emotional state
speaking_style — conversational, formal, etc.
context — inferred context of the speech
message — content summary
abnormalities — anomalies detected in the media
transcription — full transcript
translation — translation if non-English
misinformation — misinformation analysis

Image structured response:

scene_description — what the image shows
subjects — people/objects identified
authenticity_analysis — visual authenticity assessment
context_and_setting — environment description
abnormalities — visual anomalies
misinformation — misinformation analysis

Detect Intelligence — Ask Questions About Results

After a detection completes, ask natural-language questions about it:

POST /detects/{detect_uuid}/intelligence
Content-Type: application/json
Authorization: Bearer <API_KEY>

{
  "query": "How confident is the model that this audio is fake?"
}

This returns a question UUID. Poll GET /detects/{detect_uuid}/intelligence/{question_uuid} until status is "completed" to get the answer.

Good questions to suggest:

"Summarize the detection results in plain language"
"What specific indicators suggest this is AI-generated?"
"How do the audio and video detection results differ?"
"What is the confidence level and what does it mean?"
"Are there any inconsistencies in the analysis?"

Status flow: pending → processing → completed (or failed)

Prerequisite: The detection must have status: "completed". Submitting a question against a processing or failed detection returns a 422 error.
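The prerequisite can be enforced client-side so the 422 is never triggered. This sketch assumes the detection object exposes `uuid` and `status` as top-level keys; `build_question_request` is a hypothetical helper that only prepares the path and body.

```python
def build_question_request(detection: dict, query: str):
    """Return (path, body) for a Detect Intelligence question, or raise if too early."""
    status = detection.get("status")
    if status != "completed":
        # Asking now would return a 422, so stop before sending anything.
        raise RuntimeError(f"detection status is {status!r}, not 'completed'")
    if not query.strip():
        raise ValueError("query must not be empty")
    return f"/detects/{detection['uuid']}/intelligence", {"query": query}
```

The returned path is relative to the base URL; the response's question UUID is then polled at `GET /detects/{detect_uuid}/intelligence/{question_uuid}` as described above.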

Phase 3: Audio Source Tracing

When audio is detected as synthetic (label: "fake"), identify which AI platform generated it.

Enable it by setting audio_source_tracing: true in the POST /detect request.

Result appears in the detection response under audio_source_tracing:

{
  "label": "elevenlabs",
  "error_message": null
}

Known source labels include: resemble_ai, elevenlabs, real, and others as the model expands.

Important: Source tracing only runs when audio is labeled as "fake". If the audio is "real", no source tracing result will appear.

Standalone query:

GET /audio_source_tracings — list all source tracing reports
GET /audio_source_tracings/{uuid} — get specific report
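Reading the tracing result safely means checking the audio label first. The sketch below assumes `metrics` and `audio_source_tracing` are sibling keys of the detection object, as the sections above suggest; `source_platform` is a hypothetical helper.

```python
def source_platform(detection: dict):
    """Return the traced platform label, or None when tracing did not run."""
    metrics = detection.get("metrics") or {}
    if metrics.get("label") != "fake":
        return None  # tracing only runs on audio labeled "fake"
    tracing = detection.get("audio_source_tracing") or {}
    if tracing.get("error_message"):
        raise RuntimeError(f"source tracing failed: {tracing['error_message']}")
    return tracing.get("label")
```

Returning `None` for "real" audio keeps the caller from reporting a platform that was never traced.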

Phase 4: Watermarking

Apply invisible watermarks to media for provenance tracking, or detect existing watermarks.

Apply a Watermark

POST /watermark/apply
Content-Type: application/json
Authorization: Bearer <API_KEY>
Prefer: wait

{
  "url": "https://example.com/image.png",
  "strength": 0.3,
  "custom_message": "my-organization"
}

Parameter	Type	Required	Description
`url`	string	Yes	HTTPS URL to media file
`strength`	number	No	Watermark strength 0.0–1.0 (image/video only, default 0.2)
`custom_message`	string	No	Custom message to embed (image/video only, default "resembleai")

Add Prefer: wait header for synchronous response
Without it, poll GET /watermark/apply/{uuid}/result
Response includes watermarked_media URL to download the watermarked file
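A sketch of preparing the apply request, with the `Prefer: wait` header made optional. `build_watermark_apply` is a hypothetical helper; it validates `strength` against the documented 0.0–1.0 range and leaves defaults to the API when a parameter is omitted.

```python
def build_watermark_apply(url, api_key, *, strength=None, custom_message=None, wait=True):
    """Return (headers, body) for POST /watermark/apply."""
    if strength is not None and not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if wait:
        headers["Prefer"] = "wait"  # synchronous response instead of polling
    body = {"url": url}
    if strength is not None:
        body["strength"] = strength
    if custom_message is not None:
        body["custom_message"] = custom_message
    return headers, body
```

With `wait=False` the header is left out, and the caller is expected to poll `GET /watermark/apply/{uuid}/result` instead.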

Detect a Watermark

POST /watermark/detect
Content-Type: application/json
Authorization: Bearer <API_KEY>
Prefer: wait

{
  "url": "https://example.com/suspect-image.png"
}

Audio detection result:

{ "has_watermark": true, "confidence": 0.95 }

Image/Video detection result:

{ "has_watermark": true }

Phase 5: Identity — Speaker Verification (Beta)

Create voice identity profiles and match incoming audio against them.

Beta feature — requires joining the preview program. Inform the user if they encounter access errors.

Create an Identity Profile

POST /identity
Content-Type: application/json
Authorization: Bearer <API_KEY>

{
  "audio_url": "https://example.com/known-speaker.wav",
  "name": "Jane Doe"
}

Search Against Known Identities

POST /identity/search
Content-Type: application/json
Authorization: Bearer <API_KEY>

{
  "audio_url": "https://example.com/unknown-speaker.wav",
  "top_k": 5
}

Response:

{
  "success": true,
  "item": [
    { "uuid": "...", "name": "Jane Doe", "confidence": 0.92, "distance": 0.08 }
  ]
}

Lower distance = closer match. Higher confidence = stronger match.
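Picking the closest candidate from a search response might look like this. The `max_distance` cutoff of 0.25 is an illustrative assumption, not a documented threshold, and `best_match` is a hypothetical helper that relies on the response shape shown above.

```python
def best_match(response: dict, max_distance: float = 0.25):
    """Return the candidate with the smallest distance, or None if none is close enough."""
    candidates = response.get("item") or []
    if not response.get("success") or not candidates:
        return None
    top = min(candidates, key=lambda c: c["distance"])
    return top if top["distance"] <= max_distance else None


response = {
    "success": True,
    "item": [
        {"uuid": "u2", "name": "John Roe", "confidence": 0.55, "distance": 0.45},
        {"uuid": "u1", "name": "Jane Doe", "confidence": 0.92, "distance": 0.08},
    ],
}
match = best_match(response)
print(match["name"] if match else "no confident match")
```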

Phase 6: Text Detection

Detect whether text content is AI-generated or human-written.

Beta feature — requires the detect_beta_user role or a billing plan that includes the dfd_text product.

Submit a Text Detection

POST /text_detect
Content-Type: application/json
Authorization: Bearer <API_KEY>

Add the Prefer: wait header for a synchronous (blocking) response. Without it, the job runs asynchronously — poll or use a callback.

Parameters:

Parameter	Type	Required	Description
`text`	string	Yes	Text to analyze (max 100,000 characters)
`thinking`	string	No	Always use `"low"` (default)
`threshold`	float	No	Decision threshold 0.0–1.0 (default: 0.5)
`callback_url`	string	No	Webhook URL for async completion notification
`privacy_mode`	boolean	No	If true, text content is not stored after analysis

Response:

{
  "success": true,
  "item": {
    "uuid": "abc-123",
    "status": "completed",
    "prediction": "ai",
    "confidence": 0.91,
    "text_content": "This is some text to analyze.",
    "privacy_mode": false,
    "created_at": "...",
    "updated_at": "..."
  }
}

prediction: "ai" or "human" — the verdict
confidence: 0.0–1.0, higher = more confident in the prediction
status: "processing", "completed", or "failed"
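Text over the 100,000-character limit has to be split before submission. The sketch below breaks at whitespace where possible; `split_text` and `build_text_payload` are hypothetical helpers, and chunking changes what each detection sees, so results apply per chunk rather than to the whole document.

```python
MAX_CHARS = 100_000  # limit from the parameter table


def split_text(text: str, limit: int = MAX_CHARS) -> list:
    """Split text into chunks of at most `limit` characters, preferring spaces."""
    if limit < 1:
        raise ValueError("limit must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Prefer to break just after the last space inside the window.
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut + 1
        chunks.append(text[start:end])
        start = end
    return chunks


def build_text_payload(text: str, *, threshold: float = 0.5, privacy_mode: bool = False) -> dict:
    """Body for POST /text_detect with the documented defaults."""
    if len(text) > MAX_CHARS:
        raise ValueError("text exceeds 100,000 characters; split it first")
    return {"text": text, "thinking": "low", "threshold": threshold,
            "privacy_mode": privacy_mode}
```

Joining the chunks reproduces the original text exactly, so nothing is dropped at the boundaries.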

Poll for Results

If you did not use Prefer: wait, poll until status is "completed" or "failed":

GET /text_detect/{uuid}
Authorization: Bearer <API_KEY>

List Text Detections

GET /text_detect
Authorization: Bearer <API_KEY>

Returns paginated text detections for the team.

Callback

If callback_url was provided, a POST is sent on completion:

{ "success": true, "item": { ... } }

On failure:

{ "success": false, "item": { ... }, "error": "Error message here" }
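A receiving endpoint could normalize both callback shapes into one result. This is a sketch that parses the raw POST body only; `handle_text_callback` is a hypothetical name, and authentication of the incoming request is left to the web framework in use.

```python
import json


def handle_text_callback(raw_body: bytes) -> dict:
    """Normalize a text-detection callback into a flat result dict."""
    payload = json.loads(raw_body)
    item = payload.get("item") or {}
    if not payload.get("success"):
        return {
            "ok": False,
            "uuid": item.get("uuid"),
            "error": payload.get("error", "unknown error"),
        }
    return {
        "ok": True,
        "uuid": item.get("uuid"),
        "prediction": item.get("prediction"),
        "confidence": item.get("confidence"),
    }
```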

Recommended Workflows

Full Media Forensics (Most Thorough)

For a comprehensive analysis, combine all capabilities:

Submit detection with all flags enabled:

{
  "url": "https://example.com/suspect.mp4",
  "visualize": true,
  "intelligence": true,
  "audio_source_tracing": true,
  "use_reverse_search": true
}

Poll until status: "completed"
Read metrics / image_metrics / video_metrics for the verdict
Read intelligence.description for structured media analysis
If audio labeled "fake", check audio_source_tracing.label for the source platform
Ask follow-up questions via Detect Intelligence if anything needs clarification
Check for watermarks via POST /watermark/detect if provenance is relevant

Quick Authenticity Check (Fastest)

For a fast pass/fail:

Submit minimal detection: { "url": "..." }
Poll until complete
Check label and aggregated_score (audio) or label and score (image/video)
Report result with score context

Provenance Pipeline (Content Creators)

For creators who want to prove their content is authentic:

Apply watermark to original content: POST /watermark/apply
Distribute watermarked media
Later, verify provenance: POST /watermark/detect against any copy

Red Flags — Stop and Reassess

Declaring authenticity without a detection result — Never say media is real or fake based on visual/auditory inspection alone
Ignoring the score and reporting only the label — A "fake" label with score 0.51 means something very different from score 0.95
Submitting local file paths to the API — The API requires publicly accessible HTTPS URLs (does not apply to text detection)
Sending text longer than 100,000 characters to text detection — Split into chunks or inform the user of the limit
Polling too aggressively — Start at 2s intervals and back off to 5s, then 10s; do not loop at <1s
Asking Detect Intelligence questions before detection completes — Results in 422 error
Expecting source tracing on "real" audio — Source tracing only runs on audio labeled "fake"
Treating beta features (Identity, Text Detection) as production-ready — Warn users about beta status
Ignoring zero_retention_mode for sensitive media — Always suggest this flag when the user indicates the media is sensitive or private
Making multiple separate API calls when flags can combine — Use intelligence: true and audio_source_tracing: true on the detection call instead of separate requests

Response Presentation Guidelines

When presenting results to users:

Lead with the verdict — "The detection indicates this audio is likely AI-generated (score: 0.87)"
Provide score context — Use the score interpretation table above
Mention limitations — Detection is probabilistic, not absolute proof
Include actionable next steps — Suggest intelligence queries, source tracing, or watermark checks as appropriate
For inconclusive results (0.3–0.5) — Explicitly state the result is inconclusive and recommend additional analysis with different parameters or manual review
Never present detection as legal evidence — Detection results are analytical tools, not forensic certifications

Error Handling

Error	Cause	Resolution
400	Invalid request body or missing `url`	Check required parameters
401	Invalid or missing API key	Verify `RESEMBLE_API_KEY`
404	Detection UUID not found	Verify the UUID from the creation response
422	Detection not completed (for Intelligence)	Wait for detection to reach `completed` status
429	Rate limited	Back off and retry with exponential delay
500	Server error	Retry once, then report to user
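The table above can be encoded as a small decision function. The one-second base delay and the action names (`retry`, `wait`, `report`) are assumptions chosen for illustration; `next_action` is a hypothetical helper, not part of the API.

```python
def next_action(status_code: int, attempt: int, base_delay: float = 1.0):
    """Map an HTTP error to (action, detail); attempt counts prior tries from 0."""
    if status_code == 429:
        # Exponential backoff: base_delay, 2x, 4x, ...
        return ("retry", base_delay * 2 ** attempt)
    if status_code == 500:
        if attempt == 0:
            return ("retry", base_delay)
        return ("report", "server error persisted after one retry")
    if status_code == 422:
        return ("wait", "detection has not reached completed status")
    messages = {
        400: "invalid request body or missing url; check required parameters",
        401: "invalid or missing API key; verify RESEMBLE_API_KEY",
        404: "detection UUID not found; verify the UUID from the creation response",
    }
    return ("report", messages.get(status_code, f"unexpected HTTP status {status_code}"))
```

For `retry`, the detail is the number of seconds to wait before the next attempt; for the other actions it is a message suitable for relaying to the user.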

Privacy & Compliance Notes

Zero retention mode: Set zero_retention_mode: true to auto-delete media after analysis. The URL is redacted and media_deleted is set to true post-completion.
Text privacy mode: Set privacy_mode: true on text detection to prevent text content from being stored after analysis.
Data handling: Media URLs and text content are stored by default. For GDPR/compliance-sensitive workflows, enable zero retention (media) or privacy mode (text).
Callback security: If using callback_url, ensure the endpoint is HTTPS and authenticated on the receiving end.