API Providers
The API layer provides unified access to four AI generation engines for images and videos. All providers are accessed through a single REST API at https://frai.paradream.info/api/v1/creative, authenticated via Cloudflare Access.
Provider Capabilities
| Capability | Seedance (Jimeng) | Gemini (Veo 3) | Kling | Sora 2 |
|---|---|---|---|---|
| Text-to-Image | Yes | Yes | No | No |
| Image-to-Image | Yes | Yes | No | No |
| Text-to-Video | Yes | Yes | Yes | Yes |
| Image-to-Video | Yes | Yes | Yes | Yes |
| Synced Audio | No | No | Yes | Yes |
| Transition Video | Yes | No | No | No |
| Multi-shot Narrative | No | No | Yes | No |
| Sound Generation | No | No | Yes | No |
| Omni-Video (refs/editing) | No | No | Yes | No |
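The capability matrix above can be encoded as a lookup table for choosing a provider before a request is built. This is a minimal sketch in Python. The provider keys (seedance, gemini, kling, sora) and the capability strings are hypothetical labels chosen for illustration, not identifiers the API defines.

```python
# Capability matrix transcribed from the table above. The keys and capability
# strings are hypothetical labels, not API identifiers.
CAPABILITIES: dict[str, set[str]] = {
    "seedance": {"text-to-image", "image-to-image", "text-to-video",
                 "image-to-video", "transition-video"},
    "gemini": {"text-to-image", "image-to-image", "text-to-video",
               "image-to-video"},
    "kling": {"text-to-video", "image-to-video", "synced-audio",
              "multi-shot-narrative", "sound-generation", "omni-video"},
    "sora": {"text-to-video", "image-to-video", "synced-audio"},
}


def providers_for(*required: str) -> list[str]:
    """Return the providers that support every requested capability, sorted by name."""
    return sorted(
        name for name, caps in CAPABILITIES.items()
        if set(required) <= caps  # subset test: all required capabilities present
    )


print(providers_for("text-to-image"))                   # ['gemini', 'seedance']
print(providers_for("image-to-video", "synced-audio"))  # ['kling', 'sora']
```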
Authentication
All API requests require Cloudflare Access headers. Run the setup script once per machine:
bash skills/creative-engine/scripts/creative-api-setup.sh
This saves credentials to ~/.config/creative-api/. All subsequent API calls read from there automatically.
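A client outside the provided scripts would need to attach the saved credentials to every request. The sketch below assumes, hypothetically, that the setup script stores a Cloudflare Access service token as JSON in a file named credentials.json with client_id and client_secret keys; the real file layout under ~/.config/creative-api/ is not documented here. The CF-Access-Client-Id and CF-Access-Client-Secret headers are the standard Cloudflare Access service-token headers.

```python
import json
import urllib.request
from pathlib import Path

BASE_URL = "https://frai.paradream.info/api/v1/creative"
# Hypothetical file name and keys: the layout written by
# creative-api-setup.sh may differ.
CREDENTIALS_FILE = Path.home() / ".config" / "creative-api" / "credentials.json"


def load_access_headers(path: Path = CREDENTIALS_FILE) -> dict[str, str]:
    """Read a Cloudflare Access service token and return the request headers."""
    creds = json.loads(path.read_text(encoding="utf-8"))
    return {
        "CF-Access-Client-Id": creds["client_id"],
        "CF-Access-Client-Secret": creds["client_secret"],
    }


def build_request(path: str, headers: dict[str, str]) -> urllib.request.Request:
    """Prepare (but do not send) a GET request against the creative API."""
    return urllib.request.Request(f"{BASE_URL}/{path.lstrip('/')}", headers=headers)
```

Sending the prepared request is then a matter of passing it to urllib.request.urlopen.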
Seedance
Seedance 2.0 (Jimeng) generates images and videos via the jimeng engine. It is best suited to Chinese-style, anime, and other creative video generation, and it can run on jimeng's free credits.
Capabilities
Text-to-image, image-to-image, text-to-video, image-to-video, and transition video (unique to Seedance — morphs between a first and last image).
Authentication
QR code authentication via the jimeng mobile app. The API generates a QR code, the user scans it, and the system polls for confirmation.
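The QR flow reduces to a poll-until-confirmed loop. This is a minimal sketch assuming a hypothetical check_status callable that wraps whatever endpoint reports the scan state and returns one of the hypothetical strings pending, confirmed, or expired. The interval and timeout defaults are also assumptions.

```python
import time
from typing import Callable

# Hypothetical status strings; the real QR-login endpoint may report
# confirmation differently.
PENDING, CONFIRMED, EXPIRED = "pending", "confirmed", "expired"


def wait_for_qr_login(
    check_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the scan is confirmed (True), or the code expires or time runs out (False)."""
    waited = 0.0  # summed sleep intervals, so a fake sleep can be injected in tests
    while waited <= timeout_s:
        status = check_status()
        if status == CONFIRMED:
            return True
        if status == EXPIRED:
            return False  # the caller should request a fresh QR code
        sleep(interval_s)
        waited += interval_s
    return False
```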
Job Handling
Async generation. By default, jobs are submitted to the job-runner Worker for automatic polling, R2 upload, and Slack notification. Manual polling is available as an opt-in for immediate results.
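For the opt-in manual path, polling with capped exponential backoff keeps request volume low on long jobs while still returning promptly on short ones. A sketch under stated assumptions: fetch is a hypothetical callable returning the job as a dict, and the status field with its succeeded and failed terminal values is assumed, not documented here.

```python
import time
from typing import Any, Callable

# Hypothetical terminal states; check the real job payload for actual values.
DONE_STATES = {"succeeded", "failed"}


def poll_until_done(
    fetch: Callable[[], dict[str, Any]],
    first_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch() with capped exponential backoff until the job reaches a terminal state."""
    delay, waited = first_delay_s, 0.0
    while True:
        job = fetch()
        if job.get("status") in DONE_STATES:
            return job
        if waited + delay > timeout_s:
            raise TimeoutError(f"job still {job.get('status')!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)  # double the gap, never beyond the cap
```

With the defaults, the gaps between requests run 5, 10, 20, 40, then 60 seconds.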
Kling
Kling — Video generation with multi-shot narrative, sound generation, and omni-video editing.
Capabilities
Text-to-video, image-to-video, multi-shot narrative (create coherent multi-shot sequences), sound generation (add audio to video), and omni-video (reference-based editing and multi-shot composition).
Authentication
Access key + secret key from the Kling platform.
Prompt Enhancement
Uses the Subject + Action + Scene + Camera + Style five-element formula. A dedicated prompt guide provides scene templates and Kling-specific camera motion options.
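The five-element formula maps naturally onto a small record type. This is a minimal sketch: the ShotPrompt name and the comma-joined output format are assumptions, and the dedicated Kling prompt guide may recommend different phrasing or ordering.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """The five elements of the Subject + Action + Scene + Camera + Style formula."""
    subject: str
    action: str
    scene: str
    camera: str
    style: str

    def render(self) -> str:
        # Join in formula order, skipping blank elements so no stray comma remains.
        parts = [self.subject, self.action, self.scene, self.camera, self.style]
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = ShotPrompt("a red fox", "trotting through fresh snow",
                    "birch forest at dawn", "slow dolly-in", "cinematic, film grain")
print(prompt.render())
```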
Job Handling
Async generation (2-10 minutes). Default: job-runner with Slack notification. Manual polling available via GET /generations/kling/:task_id.
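The manual polling endpoint can be combined with the documented 2 to 10 minute window to decide when to check a task. A sketch assuming the endpoint path is relative to the base URL given at the top of this page; the 30-second interval is an arbitrary choice, and the function names are hypothetical.

```python
from urllib.parse import quote

BASE_URL = "https://frai.paradream.info/api/v1/creative"  # assumed prefix for the path


def kling_task_url(task_id: str) -> str:
    """URL for GET /generations/kling/:task_id, with the id percent-encoded."""
    return f"{BASE_URL}/generations/kling/{quote(task_id, safe='')}"


def poll_offsets(min_s: int = 120, max_s: int = 600, every_s: int = 30) -> list[int]:
    """Seconds after submission at which to poll: start at the 2-minute minimum,
    then every every_s seconds, always including the 10-minute maximum."""
    if min_s > max_s:
        raise ValueError("min_s must not exceed max_s")
    offsets = list(range(min_s, max_s + 1, every_s))
    if offsets[-1] != max_s:
        offsets.append(max_s)  # make sure the upper bound is checked once
    return offsets
```

With the defaults this schedules 17 checks, the first at 120 s and the last at 600 s; nothing is requested before the earliest documented completion time.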
Sora
OpenAI Sora 2 — High-quality cinematic video generation with auto-generated synced audio.
Models
| Model | Resolution | Duration (s) | Best For |
|---|---|---|---|
| sora-2 | 1280x720 / 720x1280 | 4, 8, 12 | Fast iteration, drafts |
| sora-2-pro | 1792x1024 / 1024x1792 | 4, 8, 12 | Production quality, cinematic footage |
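Checking a request against the table before submitting avoids paying for a job that fails on an unsupported size or duration. A minimal sketch: the allowed values are copied from the table rather than queried from the API, and the field names model, size, and seconds are assumptions about the request payload.

```python
# Allowed values transcribed from the table above (sizes as width x height).
SORA_MODELS: dict[str, dict[str, set]] = {
    "sora-2": {"sizes": {"1280x720", "720x1280"}, "seconds": {4, 8, 12}},
    "sora-2-pro": {"sizes": {"1792x1024", "1024x1792"}, "seconds": {4, 8, 12}},
}


def validate_sora_request(model: str, size: str, seconds: int) -> None:
    """Raise ValueError if the model/size/duration combination is not in the table."""
    spec = SORA_MODELS.get(model)
    if spec is None:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(SORA_MODELS)}")
    if size not in spec["sizes"]:
        raise ValueError(f"{model} supports sizes {sorted(spec['sizes'])}, got {size!r}")
    if seconds not in spec["seconds"]:
        raise ValueError(f"{model} supports durations {sorted(spec['seconds'])} s, got {seconds}")
```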
Capabilities
Text-to-video and image-to-video with automatically generated synced audio. Sora handles natural language prompts well — describe scenes conversationally and include audio cues for specific sounds.
Authentication
No manual auth step needed. Uses OPENAI_API_KEY configured in the media-gateway environment.
Job Handling
Async generation. Typical times: sora-2 at 4s duration takes 30-60 seconds; sora-2-pro at 8s takes 90-120 seconds. Default: job-runner with Slack notification.
Gemini
Google Gemini via flow4api — Video generation (Veo 3) and image generation (Nano Banana family).
Models
Video — Veo 3:
| Model | Quality | Best For |
|---|---|---|
| veo_3_1_t2v_fast_landscape | Fast, good | Iteration, drafts, landscape videos |
| veo_3_1_t2v_landscape | High | Production, cinematic footage |
Image — Nano Banana:
| Model | Resolution | Best For |
|---|---|---|
| Nano Banana 2 (default) | Up to 4K | General use — fast, high quality, affordable |
| Nano Banana Pro | Up to 2K | Fine-grained control, complex compositions |
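The two tables suggest a simple selection rule: the fast Veo model for drafts, the high-quality one for production, and Nano Banana 2 for images unless fine control is needed and the target resolution fits Nano Banana Pro's listed 2K limit. A sketch of that rule; the function names are hypothetical, and the Nano Banana entries are display names because the API identifiers for those models are not given here.

```python
# Veo names are the identifiers from the table above. The Nano Banana entries are
# display names; their API identifiers are not documented on this page.
VEO_MODELS = {
    "draft": "veo_3_1_t2v_fast_landscape",  # fast, good: iteration, drafts
    "production": "veo_3_1_t2v_landscape",  # high: production, cinematic footage
}
IMAGE_MAX_RES = {"Nano Banana 2": "4K", "Nano Banana Pro": "2K"}
RES_RANK = {"2K": 2, "4K": 4}


def pick_video_model(production: bool = False) -> str:
    """Fast model for iteration unless production quality is requested."""
    return VEO_MODELS["production" if production else "draft"]


def pick_image_model(resolution: str = "2K", fine_control: bool = False) -> str:
    """Default to Nano Banana 2; use Pro for fine control if the resolution allows."""
    model = "Nano Banana Pro" if fine_control else "Nano Banana 2"
    if RES_RANK[resolution] > RES_RANK[IMAGE_MAX_RES[model]]:
        raise ValueError(f"{model} is listed as up to {IMAGE_MAX_RES[model]}, not {resolution}")
    return model
```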
Capabilities
Text-to-image, image-to-image, text-to-video, image-to-video, and 4K image output. Nano Banana models leverage Gemini’s world knowledge — they understand text in images, create infographics, and accurately render specific subjects.
Authentication
Pre-configured — no user action required. Uses an internal flow4api service.
Response Model
Synchronous — results are returned directly in the API response. No polling needed. This makes Gemini the fastest provider for iteration.
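Because Gemini answers synchronously while the other providers return a task to poll, client code can branch on the shape of the response. A sketch assuming hypothetical response shapes: an inline result under a result key for synchronous calls and a task_id key for async ones. The real field names may differ.

```python
from typing import Any, Callable


def resolve_result(
    response: dict[str, Any],
    poll: Callable[[str], dict[str, Any]],
) -> dict[str, Any]:
    """Return the finished result, polling only when it is not already inline."""
    if "result" in response:   # synchronous path (e.g. Gemini): nothing to wait for
        return response["result"]
    if "task_id" in response:  # async path: hand the task id to a poller
        return poll(response["task_id"])
    raise ValueError("response has neither 'result' nor 'task_id'")
```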
Shared Workflow
Across all providers, the generation workflow follows the same pattern:
- Check API health and credentials
- Select provider (or let the user choose on first generation)
- Enhance prompts using the five-element formula
- Submit the generation request
- Handle async polling or receive sync response
- Download the result and upload to R2 storage with full metadata
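The steps above can be sketched as a single orchestration function with one injectable hook per stage, which keeps the provider-specific parts (polling, storage) swappable. Everything here is hypothetical: the WorkflowSteps hooks stand in for real API calls, and the metadata keys are illustrative rather than the schema actually stored in R2.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class WorkflowSteps:
    """Hypothetical hooks, one per workflow stage; real ones would call the API."""
    check_health: Callable[[], bool]
    choose_provider: Callable[[], str]
    enhance_prompt: Callable[[str], str]
    submit: Callable[[str, str], dict[str, Any]]
    wait_for_result: Callable[[dict[str, Any]], dict[str, Any]]
    store: Callable[[dict[str, Any], dict[str, Any]], str]


def generate(prompt: str, steps: WorkflowSteps, provider: Optional[str] = None) -> str:
    """Run the shared workflow once and return the storage key of the stored result."""
    # 1. Check API health and credentials.
    if not steps.check_health():
        raise RuntimeError("API health or credential check failed")
    # 2. Select a provider, asking the user when none was given.
    provider = provider or steps.choose_provider()
    # 3. Enhance the prompt (five-element formula).
    enhanced = steps.enhance_prompt(prompt)
    # 4. Submit the generation request.
    response = steps.submit(provider, enhanced)
    # 5. Poll an async job, or pass a synchronous result straight through.
    result = steps.wait_for_result(response)
    # 6. Store the result with its metadata and return the storage key.
    metadata = {"provider": provider, "prompt": prompt, "enhanced_prompt": enhanced}
    return steps.store(result, metadata)
```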