Multimodal video analysis MCP. Extracts transcripts, OCR text, and visual chart data from YouTube, Vimeo, and Instagram. 10 free credits to start.

Transcripts capture half the signal. When a speaker says "look at this chart" and points at a number on screen, a transcript-only tool loses the data. Contendeo returns it.

Contendeo is a true multimodal video analysis MCP server. We process the actual video frames alongside the audio to give your LLM complete context. Our pipeline (yt-dlp → ffmpeg → Groq Whisper → Tesseract OCR → Claude Vision) extracts timestamped transcripts, keyframe descriptions, and hard OCR data into one structured output. Works with YouTube, Instagram Reels, Vimeo, Twitter/X, TikTok, and direct video URLs.

**Four tools:**

- `quick_transcribe`: fast audio transcription with speaker labels (1 credit)
- `deep_analyze`: full multimodal extraction: transcript + visual keyframe analysis + OCR + chart/diagram data (5 credits)
- `clip_context`: analyze a specific timestamp range without processing the full video (1–3 credits)
- `batch_analyze`: parallel processing of up to 10 videos with cross-video synthesis

**Pricing & Free Tier:** Pay only for processing. Cache hits are free, and failures are refunded automatically. Get 10 free credits on signup, no card required. Create your account at [contendeo.app](https://contendeo.app).
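To get a feel for how far the 10 free credits go, the per-tool costs listed above can be summed for a planned set of calls. The following is a rough sketch only: the `CREDIT_COST` table and `estimate_credits` helper are hypothetical names, not part of Contendeo, and the estimate assumes no cache hits or refunds. `batch_analyze` is left out because no credit cost is stated for it.

```python
# Credit costs as stated in the listing, stored as (min, max) per call.
# batch_analyze is omitted: the listing gives no cost for it.
CREDIT_COST = {
    "quick_transcribe": (1, 1),
    "deep_analyze": (5, 5),
    "clip_context": (1, 3),
}
FREE_CREDITS = 10  # granted on signup, per the listing


def estimate_credits(calls):
    """Return (min, max) credits for a mapping of tool name -> call count.

    Assumes every call is billed (no cache hits, no refunded failures).
    """
    low = high = 0
    for tool, count in calls.items():
        if tool not in CREDIT_COST:
            raise KeyError(f"no published credit cost for {tool!r}")
        cost_min, cost_max = CREDIT_COST[tool]
        low += cost_min * count
        high += cost_max * count
    return low, high


plan = {"deep_analyze": 1, "quick_transcribe": 2, "clip_context": 1}
low, high = estimate_credits(plan)
print(f"{low}-{high} credits; fits free tier: {high <= FREE_CREDITS}")
```

With this plan the worst case is 10 credits, so it would just fit within the free allowance.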
How to connect
https://server.smithery.ai/highcryptoclub/contendeo/mcp
curl -X POST https://server.smithery.ai/highcryptoclub/contendeo/mcp \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}'
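The `curl` call above sends an `initialize` request with empty `params`. As a hedged sketch of what a fuller exchange might look like, the snippet below builds JSON-RPC 2.0 bodies for `initialize`, `tools/list`, and `tools/call`, which are method names from the MCP specification. Several things here are assumptions: the `rpc` helper is hypothetical, the protocol version string and client name are placeholders, and the `url` argument for `deep_analyze` is a guess (the real input schema should be read from the `tools/list` response). The snippet only prints the payloads; each could be passed to `curl -d` in place of the body shown above.

```python
import json
from itertools import count

_ids = count(1)  # simple incrementing JSON-RPC request ids


def rpc(method, params=None):
    """Build a JSON-RPC 2.0 request dict (hypothetical helper)."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Handshake: the MCP spec describes these fields for initialize.
initialize = rpc("initialize", {
    "protocolVersion": "2025-03-26",  # assumed; use a version the server supports
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1.0"},  # placeholder
})

# Discover the tools and their input schemas.
list_tools = rpc("tools/list")

# Call one tool. The "url" argument name is an assumption, not documented here.
call_tool = rpc("tools/call", {
    "name": "deep_analyze",
    "arguments": {"url": "https://example.com/video.mp4"},
})

for message in (initialize, list_tools, call_tool):
    print(json.dumps(message))
```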