API Providers

The API layer provides unified access to four AI generation engines for images and videos. All providers are accessed through a single REST API at https://frai.paradream.info/api/v1/creative, authenticated via Cloudflare Access.

Provider Capabilities

Capability	Seedance (Jimeng)	Gemini (Veo 3 / Nano Banana)	Kling	Sora 2
Text-to-Image	Yes	Yes	No	No
Image-to-Image	Yes	Yes	No	No
Text-to-Video	Yes	Yes	Yes	Yes
Image-to-Video	Yes	Yes	Yes	Yes
Synced Audio	No	No	Yes	Yes
Transition Video	Yes	No	No	No
Multi-shot Narrative	No	No	Yes	No
Sound Generation	No	No	Yes	No
Omni-Video (refs/editing)	No	No	Yes	No
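
For routing decisions, the table above can be transcribed into a lookup structure. The following is a minimal sketch, not part of the API itself: the lowercase provider keys (`seedance`, `gemini`, `kling`, `sora`) and the `providers_for` helper are illustrative names chosen here.

```python
# Capability matrix transcribed from the table above.
# The lowercase provider keys are illustrative, not confirmed API identifiers.
CAPABILITIES = {
    "Text-to-Image": {"seedance", "gemini"},
    "Image-to-Image": {"seedance", "gemini"},
    "Text-to-Video": {"seedance", "gemini", "kling", "sora"},
    "Image-to-Video": {"seedance", "gemini", "kling", "sora"},
    "Synced Audio": {"kling", "sora"},
    "Transition Video": {"seedance"},
    "Multi-shot Narrative": {"kling"},
    "Sound Generation": {"kling"},
    "Omni-Video": {"kling"},
}


def providers_for(*capabilities):
    """Return the sorted providers that support every requested capability."""
    if not capabilities:
        raise ValueError("at least one capability is required")
    result = None
    for capability in capabilities:
        supported = CAPABILITIES[capability]  # KeyError for unknown names
        result = set(supported) if result is None else result & supported
    return sorted(result)
```

For example, `providers_for("Text-to-Video", "Synced Audio")` narrows the choice to Kling and Sora, while a combination no single provider covers returns an empty list.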

Authentication

All API requests require Cloudflare Access headers. Run the setup script once per machine:


bash skills/creative-engine/scripts/creative-api-setup.sh

This saves credentials to ~/.config/creative-api/. All subsequent API calls read from there automatically.
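
As a rough illustration of what reading those credentials might look like, the sketch below builds request headers from a file in that directory. It assumes the setup script stores a Cloudflare Access service token; `CF-Access-Client-Id` and `CF-Access-Client-Secret` are the standard header names for such tokens. The file name `credentials.json` and the keys `client_id` and `client_secret` are hypothetical, so check what the script actually writes.

```python
import json
from pathlib import Path

# Directory named in the text above.
CONFIG_DIR = Path.home() / ".config" / "creative-api"


def load_access_headers(config_dir=CONFIG_DIR, filename="credentials.json"):
    """Build Cloudflare Access service-token headers from a saved JSON file.

    The file name and the JSON keys are assumptions made for this sketch.
    """
    path = Path(config_dir) / filename
    data = json.loads(path.read_text(encoding="utf-8"))
    return {
        "CF-Access-Client-Id": data["client_id"],
        "CF-Access-Client-Secret": data["client_secret"],
    }
```

The returned dictionary can be passed as the headers of any HTTP client call against the API.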

Seedance

Seedance 2.0 (Jimeng) — Image and video generation via the jimeng engine. Best for Chinese-style, anime, and creative video generation with free credits.

Capabilities

Seedance supports text-to-image, image-to-image, text-to-video, image-to-video, and transition video. Transition video is unique to Seedance: it morphs between a first and a last image.

Authentication

QR code authentication via the jimeng mobile app. The API generates a QR code, the user scans it, and the system polls for confirmation.
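
The scan-then-poll flow can be sketched as a small loop. This is an illustration only: the status strings `pending`, `confirmed`, and `expired`, the default interval and timeout, and the `check_status` callable (which would wrap the real status request) are all assumptions.

```python
import time


def wait_for_qr_confirmation(check_status, interval_s=2.0, timeout_s=120.0,
                             sleep=time.sleep, clock=time.monotonic):
    """Poll until the QR login is confirmed, expires, or times out.

    check_status is a caller-supplied callable returning a status string.
    Returns True only when the status becomes "confirmed".
    """
    deadline = clock() + timeout_s
    while True:
        status = check_status()
        if status == "confirmed":
            return True
        if status == "expired":
            return False
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.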

Job Handling

Async generation. By default, jobs are submitted to the job-runner Worker for automatic polling, R2 upload, and Slack notification. Manual polling is available as an opt-in for immediate results.

Kling

Kling — Video generation with multi-shot narrative, sound generation, and omni-video editing.

Capabilities

Kling supports text-to-video, image-to-video, multi-shot narrative (coherent multi-shot sequences), sound generation (adding audio to a video), and omni-video (reference-based editing and multi-shot composition).

Authentication

Access key + secret key from the Kling platform.

Prompt Enhancement

Uses the Subject + Action + Scene + Camera + Style five-element formula. A dedicated prompt guide provides scene templates and Kling-specific camera motion options.
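
To make the formula concrete, here is a minimal sketch that assembles a prompt from the five elements. The `PromptElements` class, the `build_prompt` function, and the comma-joined output format are illustrative choices, not the behaviour of the dedicated prompt guide.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PromptElements:
    """The five elements of the formula, in order."""
    subject: str
    action: str
    scene: str
    camera: str
    style: str


def build_prompt(elements):
    """Join the five elements into one prompt, rejecting blank elements."""
    values = asdict(elements)  # keeps field declaration order
    missing = [name for name, value in values.items() if not value.strip()]
    if missing:
        raise ValueError("missing prompt elements: " + ", ".join(missing))
    return ", ".join(value.strip() for value in values.values())
```

A call such as `build_prompt(PromptElements("a red fox", "leaps over a log", "snowy forest at dawn", "slow tracking shot", "cinematic"))` yields a single comma-separated prompt covering all five elements.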

Job Handling

Async generation (2-10 minutes). Default: job-runner with Slack notification. Manual polling available via GET /generations/kling/:task_id.
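
For manual polling, the status URL and a polling schedule might be derived as follows. This sketch assumes the `/generations/kling/:task_id` route lives under the base URL given at the top of this page. The 15-second interval is an arbitrary choice, and the 600-second ceiling mirrors the upper end of the 2-10 minute range.

```python
from urllib.parse import quote

# Base URL from the introduction; the route is assumed to be relative to it.
BASE_URL = "https://frai.paradream.info/api/v1/creative"


def kling_status_url(task_id, base_url=BASE_URL):
    """Return the manual-polling URL for a Kling task."""
    if not task_id:
        raise ValueError("task_id must be non-empty")
    # Percent-encode the id so it cannot alter the path structure.
    return f"{base_url.rstrip('/')}/generations/kling/{quote(task_id, safe='')}"


def poll_schedule(interval_s=15, max_wait_s=600):
    """Yield the elapsed times (in seconds) at which to poll."""
    elapsed = interval_s
    while elapsed <= max_wait_s:
        yield elapsed
        elapsed += interval_s
```

A caller would issue a GET to `kling_status_url(task_id)` at each point in the schedule, with the Cloudflare Access headers attached, and stop once the job reports a terminal state.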

Sora

OpenAI Sora 2 — High-quality cinematic video generation with auto-generated synced audio.

Models

Model	Resolution	Duration	Best For
`sora-2`	1280x720 / 720x1280	4, 8, 12s	Fast iteration, drafts
`sora-2-pro`	1792x1024 / 1024x1792	4, 8, 12s	Production quality, cinematic footage
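
The constraints in this table lend themselves to a client-side check before a request is submitted. The sketch below encodes only the values listed above. The function name and the parameter names `size` and `seconds` are illustrative and may not match the request fields the gateway expects.

```python
# Resolutions and durations transcribed from the model table above.
SORA_MODELS = {
    "sora-2": {"sizes": {"1280x720", "720x1280"}, "durations": {4, 8, 12}},
    "sora-2-pro": {"sizes": {"1792x1024", "1024x1792"}, "durations": {4, 8, 12}},
}


def validate_sora_request(model, size, seconds):
    """Return a list of problems; an empty list means the request fits the table."""
    spec = SORA_MODELS.get(model)
    if spec is None:
        return [f"unknown model: {model}"]
    problems = []
    if size not in spec["sizes"]:
        problems.append(f"{model} does not support size {size}")
    if seconds not in spec["durations"]:
        problems.append(f"{model} does not support {seconds}s clips")
    return problems
```

Returning a list rather than raising on the first issue lets a caller report every problem at once.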

Capabilities

Text-to-video and image-to-video with automatically generated synced audio. Sora handles natural language prompts well — describe scenes conversationally and include audio cues for specific sounds.

Authentication

No manual auth step needed. Uses OPENAI_API_KEY configured in the media-gateway environment.

Job Handling

Async generation. Typical times: sora-2 at 4s duration takes 30-60 seconds; sora-2-pro at 8s takes 90-120 seconds. Default: job-runner with Slack notification.

Gemini

Google Gemini via flow4api — Video generation (Veo 3) and image generation (Nano Banana family).

Models

Video — Veo 3:

Model	Quality	Best For
`veo_3_1_t2v_fast_landscape`	Fast, good	Iteration, drafts, landscape videos
`veo_3_1_t2v_landscape`	High	Production, cinematic footage

Image — Nano Banana:

Model	Resolution	Best For
Nano Banana 2 (default)	Up to 4K	General use — fast, high quality, affordable
Nano Banana Pro	Up to 2K	Fine-grained control, complex compositions

Capabilities

Text-to-image, image-to-image, text-to-video, and image-to-video, with image output up to 4K on Nano Banana 2. Nano Banana models leverage Gemini’s world knowledge: they understand text in images, create infographics, and accurately render specific subjects.

Authentication

Pre-configured — no user action required. Uses an internal flow4api service.

Response Model

Synchronous — results are returned directly in the API response. No polling needed. This makes Gemini the fastest provider for iteration.

Shared Workflow

Across all providers, the generation workflow follows the same pattern:

1. Check API health and credentials
2. Select provider (or let the user choose on first generation)
3. Enhance prompts using the five-element formula
4. Submit the generation request
5. Handle async polling or receive sync response
6. Download the result and upload to R2 storage with full metadata
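
The workflow above can be sketched as a single function that runs the steps in order. Everything here is illustrative: each step is a caller-supplied callable standing in for the real implementation, and only Gemini is treated as synchronous, following the provider descriptions on this page.

```python
def run_generation(prompt, provider, *, check_health, enhance, submit,
                   wait_for_result, store, sync_providers=("gemini",)):
    """Run the shared workflow steps in order using caller-supplied callables."""
    # Step 1: check API health and credentials.
    if not check_health():
        raise RuntimeError("API health or credential check failed")
    # Step 2: the provider has already been selected and is passed in.
    # Step 3: enhance the prompt.
    enhanced = enhance(prompt)
    # Step 4: submit the generation request.
    response = submit(provider, enhanced)
    # Step 5: sync providers return the result directly; others must be awaited.
    if provider in sync_providers:
        result = response
    else:
        result = wait_for_result(provider, response)
    # Step 6: store the result together with its metadata.
    return store(result, {"provider": provider, "prompt": enhanced})
```

Passing the steps in as callables keeps the control flow visible and lets each step be swapped for a stub during testing.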