Mohammad Ausaf

Tinify

Enterprise Agentic AI Platform for Autonomous Content Generation

What It Is

Tinify is an internal AI content platform at Galleri5. Users describe what they want in natural language, and the system decomposes that into a multi-step workflow — coordinating image generation, video synthesis, audio, and post-processing across 30+ models from different providers (Gemini, Replicate, FAL, BytePlus, ElevenLabs).

I built most of the backend. The core challenge wasn't just calling AI APIs — it was making autonomous workflow generation work inside a conversational interface.

Architecture

Workflow Planner: When a user sends a complex request, Gemini 2.5 Pro generates a JSON workflow — a DAG of tasks with dependencies. Each node specifies a tool, inputs, and execution mode (sync/async).
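
A planner output, heavily simplified, looks roughly like this (the node and field names here are illustrative, not the exact schema):

```python
# Illustrative shape of a planner-generated workflow: a DAG of tool calls
# with explicit dependencies and per-node execution modes.
workflow = {
    "nodes": [
        {
            "id": "generate_image",
            "tool": "image_generation",
            "mode": "sync",
            "inputs": {"prompt": "a neon storefront at night"},
            "depends_on": [],
        },
        {
            "id": "animate_image",
            "tool": "video_synthesis",
            "mode": "async",
            # Reference an upstream node's output by source + output_key.
            "inputs": {"image": {"source": "generate_image", "output_key": "s3_url"}},
            "depends_on": ["generate_image"],
        },
    ]
}
```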

DAG Execution: Nodes are topologically sorted (Kahn's algorithm), dependencies resolved at runtime. A node can reference outputs from previous nodes: {"source": "generate_image", "output_key": "s3_url"}.
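
A minimal sketch of that execution step, assuming the node shape shown above:

```python
from collections import deque

def topological_order(nodes):
    """Kahn's algorithm over the planner's DAG; raises if the plan has a cycle."""
    indegree = {n["id"]: len(n["depends_on"]) for n in nodes}
    children = {n["id"]: [] for n in nodes}
    for n in nodes:
        for dep in n["depends_on"]:
            children[dep].append(n["id"])
    queue = deque([nid for nid, deg in indegree.items() if deg == 0])
    order = []
    while queue:
        nid = queue.popleft()
        order.append(nid)
        for child in children[nid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("workflow contains a cycle")
    return order

def resolve_input(value, results):
    """Replace {"source": ..., "output_key": ...} references with upstream outputs."""
    if isinstance(value, dict) and "source" in value and "output_key" in value:
        return results[value["source"]][value["output_key"]]
    return value
```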

Dual Execution Modes: Fast tools (text, image gen) run synchronously. Heavy tools (video) submit async jobs and poll status endpoints. Background tasks monitor completion and trigger downstream nodes.
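
In sketch form, the dispatcher looks something like this (the provider.invoke, provider.submit, and provider.poll methods are stand-ins for the real client calls, not actual SDK names):

```python
import asyncio

async def run_node(node, results, provider):
    """Dispatch one DAG node by execution mode."""
    inputs = {
        k: (results[v["source"]][v["output_key"]]
            if isinstance(v, dict) and "source" in v else v)
        for k, v in node["inputs"].items()
    }
    if node["mode"] == "sync":
        # Fast tools (text, image gen) return within the request.
        return await provider.invoke(node["tool"], inputs)
    # Heavy tools (video) submit a job, then poll its status endpoint.
    job_id = await provider.submit(node["tool"], inputs)
    while True:
        status = await provider.poll(job_id)
        if status["state"] == "completed":
            return status["output"]
        if status["state"] == "failed":
            raise RuntimeError(f"job {job_id} failed: {status.get('error')}")
        await asyncio.sleep(2)  # back off between status checks
```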

Provider Abstraction: A factory pattern routes requests to the right provider. Cost-aware selection downgrades to cheaper models when user credits run low.
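
A stripped-down version of the idea (the model names and credit costs below are placeholders, not the platform's actual routing table):

```python
# Candidate models per tool, ordered from preferred to cheapest fallback.
PROVIDERS = {
    "image_generation": [
        {"name": "gemini-imagen", "cost_credits": 4},
        {"name": "fal-flux-schnell", "cost_credits": 1},  # cheaper fallback
    ],
}

def select_provider(tool: str, user_credits: int) -> dict:
    """Prefer the default model, but downgrade when credits run low."""
    candidates = PROVIDERS[tool]
    for candidate in candidates:
        if user_credits >= candidate["cost_credits"] * 5:  # comfortable margin left
            return candidate
    return min(candidates, key=lambda c: c["cost_credits"])
```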

Hard Problem #1: Stateless Planner in Multi-Turn Chat

The workflow planner is stateless — it takes a prompt and outputs a DAG. But it lives inside a chat that has history, context, uploaded files, and previous generations. The user might say "now make that image into a video" — and "that image" refers to something from 5 messages ago.

Solution: Before calling the planner, I build a context window — recent messages, file references with their S3 URIs, previous workflow outputs. This gets injected into the planner prompt so it can resolve references like "that image" to actual asset URLs. The generated workflow then executes independently, but its outputs feed back into the chat as assistant messages, maintaining conversational continuity.
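
A rough sketch of that assembly step (the thread fields here are illustrative, not the exact document schema); the returned text is prepended to the planner prompt:

```python
def build_planner_context(thread: dict, max_messages: int = 10) -> str:
    """Flatten recent chat state into text the stateless planner can ground against."""
    lines = []
    for msg in thread["messages"][-max_messages:]:
        lines.append(f'{msg["role"]}: {msg["text"]}')
    for f in thread.get("files", []):
        lines.append(f'uploaded file: {f["name"]} -> {f["s3_uri"]}')
    for out in thread.get("workflow_outputs", []):
        lines.append(f'previous output ({out["node_id"]}): {out["s3_url"]}')
    return "\n".join(lines)
```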

The tricky part: the planner doesn't "remember" the chat, but the chat needs to feel like a continuous conversation. Bridging these two paradigms without leaking context or breaking execution was the core design problem.

Hard Problem #2: Error Handling Across 30+ Models

External AI providers fail in different ways — rate limits, balance exhaustion, model unavailability, timeouts, malformed responses. Each failure mode needs different handling: retry, fallback, refund, or alert.

Solution: AI-powered error classification. Errors get sent to Gemini Flash with a classification prompt — it distinguishes "balance_exhaustion" from "rate_limit" from "general_error". Each category routes to a different Slack channel with appropriate urgency. Financial errors ping specific team members immediately.
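
A simplified version of the triage step, assuming a google-genai style client (the channel names are placeholders):

```python
import json

# Each category routes to a different Slack channel with appropriate urgency;
# financial errors page specific owners immediately.
CATEGORY_CHANNELS = {
    "balance_exhaustion": "#alerts-billing",
    "rate_limit": "#alerts-providers",
    "general_error": "#alerts-general",
}

CLASSIFY_PROMPT = (
    "Classify this provider error as one of: balance_exhaustion, rate_limit, "
    "general_error. Reply with JSON: {\"category\": ...}.\n\nError:\n"
)

def classify_error(client, error_text: str) -> str:
    response = client.models.generate_content(
        model="gemini-2.5-flash", contents=CLASSIFY_PROMPT + error_text
    )
    try:
        return json.loads(response.text)["category"]
    except (ValueError, KeyError):
        return "general_error"  # fall back to the low-urgency channel
```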

Job-level handling: 180-second timeouts trigger automatic cancellation via provider APIs (both FAL and Replicate expose cancel endpoints). On timeout or failure, credits are automatically refunded to the user's account. No manual intervention needed.
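
Roughly (the provider and credits interfaces here are stand-ins for the real clients):

```python
import asyncio

JOB_TIMEOUT_SECONDS = 180

async def poll_until_done(provider, job_id, interval=2):
    while True:
        status = await provider.poll(job_id)
        if status["state"] in ("completed", "failed"):
            return status
        await asyncio.sleep(interval)

async def await_job(provider, job_id, credits, user_id, cost):
    """Wait for an async job; on timeout, cancel at the provider and refund credits."""
    try:
        return await asyncio.wait_for(poll_until_done(provider, job_id), JOB_TIMEOUT_SECONDS)
    except asyncio.TimeoutError:
        await provider.cancel(job_id)        # cancel the stuck job upstream
        await credits.refund(user_id, cost)  # refund without manual intervention
        raise
```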

Infrastructure

Caching: Redis-first with MongoDB fallback. Thread contexts cached for sub-10ms retrieval; if Redis is down, system degrades gracefully to 50-200ms DB queries.
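
The read path, sketched with redis-py's asyncio client and a Motor collection (key names and TTL are illustrative):

```python
import json
from redis.asyncio import Redis
from redis.exceptions import RedisError

class ThreadContextStore:
    """Redis-first reads with MongoDB fallback."""

    def __init__(self, redis_client: Redis, mongo_collection, ttl_seconds: int = 3600):
        self.redis = redis_client
        self.mongo = mongo_collection
        self.ttl = ttl_seconds

    async def get_thread_context(self, thread_id: str) -> dict | None:
        key = f"thread:{thread_id}"
        try:
            cached = await self.redis.get(key)  # sub-10ms when warm
            if cached:
                return json.loads(cached)
        except RedisError:
            pass  # Redis down: degrade gracefully to the DB path
        doc = await self.mongo.find_one({"_id": thread_id})  # 50-200ms query
        if doc is not None:
            try:
                await self.redis.set(key, json.dumps(doc, default=str), ex=self.ttl)
            except RedisError:
                pass  # cache repopulation is best-effort
        return doc
```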

Rate Limiting: Sliding window implementation using Redis sorted sets. Per-user, per-provider limits (default 60 req/hour). Accurate to milliseconds.
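
A minimal version of the check (the key layout is one common approach, not necessarily the exact schema):

```python
import time
import uuid
from redis.asyncio import Redis

async def allow_request(r: Redis, user_id: str, provider: str,
                        limit: int = 60, window_seconds: int = 3600) -> bool:
    """Sliding-window check over a Redis sorted set keyed per user and provider."""
    key = f"ratelimit:{user_id}:{provider}"
    now_ms = int(time.time() * 1000)                  # millisecond-precision scores
    window_start = now_ms - window_seconds * 1000
    await r.zremrangebyscore(key, 0, window_start)    # evict entries outside the window
    if await r.zcard(key) >= limit:
        return False
    member = f"{now_ms}:{uuid.uuid4().hex}"           # unique even within the same ms
    await r.zadd(key, {member: now_ms})
    await r.expire(key, window_seconds)
    return True
```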

Connection Pooling: Differential pool sizes — 500 connections for high-traffic databases, 200 for lower-traffic ones. Prevents connection exhaustion under load.
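
With Motor this is just a per-client maxPoolSize (the URIs below are placeholders):

```python
from motor.motor_asyncio import AsyncIOMotorClient

# Larger pool for the hot path, smaller one for low-traffic collections.
high_traffic = AsyncIOMotorClient("mongodb://primary-cluster", maxPoolSize=500)
low_traffic = AsyncIOMotorClient("mongodb://analytics-cluster", maxPoolSize=200)
```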

Credit System: Unified ledger across multiple collections (assets, model garden, image studio). Single pool prevents feature arbitrage. Dynamic per-model pricing stored in MongoDB for hot-swapping without deploys.
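
Sketched with PyMongo (collection and field names are assumptions):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["tinify"]

def debit_for_generation(user_id: str, model_id: str) -> bool:
    """Charge the user's unified credit balance for one generation."""
    pricing = db.model_pricing.find_one({"model_id": model_id})  # hot-swappable, no deploy
    cost = pricing["credits"] if pricing else 1
    result = db.users.update_one(
        {"_id": user_id, "credits": {"$gte": cost}},  # debit only if the balance covers it
        {"$inc": {"credits": -cost}},
    )
    return result.modified_count == 1
```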

Built with Tinify

Production content generated using this platform.