
API Providers

The API layer provides unified access to four AI generation engines for images and videos. All providers are accessed through a single REST API at https://frai.paradream.info/api/v1/creative, authenticated via Cloudflare Access.

Provider Capabilities

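| Capability | Seedance (Jimeng) | Gemini (Veo 3 / Nano Banana) | Kling | Sora 2 |
|---|---|---|---|---|
| Text-to-Image | Yes | Yes | No | No |
| Image-to-Image | Yes | Yes | No | No |
| Text-to-Video | Yes | Yes | Yes | Yes |
| Image-to-Video | Yes | Yes | Yes | Yes |
| Synced Audio | No | No | Yes | Yes |
| Transition Video | Yes | No | No | No |
| Multi-shot Narrative | No | No | Yes | No |
| Sound Generation | No | No | Yes | No |
| Omni-Video (refs/editing) | No | No | Yes | No |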
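If a client picks the provider programmatically (step 2 of the shared workflow), the matrix can be encoded as a lookup. This is a minimal sketch: the provider keys and capability strings are hypothetical labels chosen for illustration, not identifiers the API is known to accept.

```python
# Capability matrix transcribed from the table above.
# Provider keys and capability names are hypothetical identifiers.
CAPABILITIES: dict[str, set[str]] = {
    "seedance": {"text-to-image", "image-to-image", "text-to-video",
                 "image-to-video", "transition-video"},
    "gemini": {"text-to-image", "image-to-image", "text-to-video",
               "image-to-video"},
    "kling": {"text-to-video", "image-to-video", "synced-audio",
              "multi-shot", "sound-generation", "omni-video"},
    "sora": {"text-to-video", "image-to-video", "synced-audio"},
}


def providers_for(*required: str) -> list[str]:
    """Return providers that support every requested capability, in table order."""
    needed = set(required)
    return [name for name, caps in CAPABILITIES.items() if needed <= caps]


print(providers_for("text-to-image"))                   # ['seedance', 'gemini']
print(providers_for("image-to-video", "synced-audio"))  # ['kling', 'sora']
```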

Authentication

All API requests require Cloudflare Access headers. Run the setup script once per machine:

```bash
bash skills/creative-engine/scripts/creative-api-setup.sh
```

This saves credentials to ~/.config/creative-api/. All subsequent API calls read from there automatically.
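A minimal sketch of attaching those credentials to a request, assuming the setup script stores a Cloudflare Access service token. The `CF-Access-Client-Id` / `CF-Access-Client-Secret` header names are Cloudflare's standard service-token headers; the file name `credentials.json` and its `client_id` / `client_secret` keys are hypothetical, since the script's actual storage format is not documented here.

```python
import json
from pathlib import Path

BASE_URL = "https://frai.paradream.info/api/v1/creative"
# Hypothetical file name and keys: the setup script's real output format
# under ~/.config/creative-api/ may differ.
DEFAULT_CREDS = Path.home() / ".config" / "creative-api" / "credentials.json"


def access_headers(creds_path: Path = DEFAULT_CREDS) -> dict[str, str]:
    """Build Cloudflare Access service-token headers from saved credentials."""
    creds = json.loads(creds_path.read_text())
    return {
        "CF-Access-Client-Id": creds["client_id"],
        "CF-Access-Client-Secret": creds["client_secret"],
    }


def endpoint(path: str) -> str:
    """Join a relative API path onto the base URL."""
    return f"{BASE_URL}/{path.lstrip('/')}"
```

For example, `urllib.request.Request(endpoint("/generations/kling/TASK_ID"), headers=access_headers())` builds an authenticated request for the Kling status endpoint.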


Seedance

Seedance 2.0 (Jimeng) provides image and video generation through the jimeng engine. It is best suited to Chinese-style, anime, and other creative video work, and generation can draw on jimeng's free credits.

Capabilities

Text-to-image, image-to-image, text-to-video, image-to-video, and transition video, which is unique to Seedance and morphs between a first and a last image.

Authentication

QR code authentication via the jimeng mobile app. The API generates a QR code, the user scans it, and the system polls for confirmation.
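The scan-and-confirm loop can be sketched as a bounded poll. The status strings (`confirmed`, `expired`) and the `check_status` callable are hypothetical stand-ins for whatever the QR-login endpoint actually returns; `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

# Hypothetical status strings; the real QR-login endpoint may use others.
CONFIRMED, EXPIRED = "confirmed", "expired"


def wait_for_qr_scan(check_status: Callable[[], str],
                     interval_s: float = 2.0,
                     max_attempts: int = 60,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll the QR-login status until the user confirms in the jimeng app."""
    for _ in range(max_attempts):
        status = check_status()
        if status == CONFIRMED:
            return
        if status == EXPIRED:
            raise RuntimeError("QR code expired; request a new one")
        sleep(interval_s)  # still pending, or scanned but not yet confirmed
    raise TimeoutError("QR login was not confirmed in time")
```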

Job Handling

Async generation. By default, jobs are submitted to the job-runner Worker for automatic polling, R2 upload, and Slack notification. Manual polling is available as an opt-in for immediate results.


Kling

Kling — Video generation with multi-shot narrative, sound generation, and omni-video editing.

Capabilities

Text-to-video, image-to-video, multi-shot narrative (create coherent multi-shot sequences), sound generation (add audio to video), and omni-video (reference-based editing and multi-shot composition).

Authentication

Access key + secret key from the Kling platform.

Prompt Enhancement

Uses the Subject + Action + Scene + Camera + Style five-element formula. A dedicated prompt guide provides scene templates and Kling-specific camera motion options.
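A minimal sketch of assembling a prompt from the five elements. Joining the elements with commas in formula order is an assumption, not the prompt guide's own template, and the camera phrase in the example is generic wording rather than one of the Kling-specific camera motion options.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """The five elements of the Subject + Action + Scene + Camera + Style formula."""
    subject: str
    action: str
    scene: str
    camera: str
    style: str

    def render(self) -> str:
        # Keep the formula's order and skip any element left blank.
        parts = [self.subject, self.action, self.scene, self.camera, self.style]
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = ShotPrompt(
    subject="a red fox",
    action="leaping over a frozen stream",
    scene="in a snowy birch forest at dawn",
    camera="slow tracking shot from the side",
    style="cinematic, soft golden light",
)
print(prompt.render())
```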

Job Handling

Async generation (2-10 minutes). Default: job-runner with Slack notification. Manual polling available via GET /generations/kling/:task_id.
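For the manual path, a polling loop against `GET /generations/kling/:task_id` might look like this sketch. The `status` field and its terminal values `succeeded` / `failed` are hypothetical, since the response shape is not documented here; the 12-minute deadline simply leaves headroom over the 10-minute upper bound quoted above. `headers` should carry the Cloudflare Access credentials, and `fetch` and `sleep` are injectable for testing.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

BASE_URL = "https://frai.paradream.info/api/v1/creative"


def kling_status_url(task_id: str) -> str:
    # Percent-encode the ID so it stays a single path segment.
    return f"{BASE_URL}/generations/kling/{urllib.parse.quote(task_id, safe='')}"


def http_get_json(url: str, headers: dict[str, str]) -> dict:
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def poll_kling(task_id: str, headers: dict[str, str],
               fetch: Callable[[str, dict[str, str]], dict] = http_get_json,
               interval_s: float = 15.0, deadline_s: float = 12 * 60,
               sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the task reaches a terminal status or the deadline passes."""
    url = kling_status_url(task_id)
    waited = 0.0
    while True:
        body = fetch(url, headers)
        # Hypothetical field and values; check the gateway's real response.
        if body.get("status") in ("succeeded", "failed"):
            return body
        if waited >= deadline_s:
            raise TimeoutError(f"Kling task {task_id} still running after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```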


Sora

OpenAI Sora 2 — High-quality cinematic video generation with auto-generated synced audio.

Models

| Model | Resolution | Duration | Best For |
|---|---|---|---|
| sora-2 | 1280x720 / 720x1280 | 4, 8, 12 s | Fast iteration, drafts |
| sora-2-pro | 1792x1024 / 1024x1792 | 4, 8, 12 s | Production quality, cinematic footage |
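Because each model accepts only certain sizes and durations, a client can reject invalid combinations before submitting. This sketch checks only the combinations listed in the table; the `size` and `seconds` parameter names are assumptions about the request body.

```python
# Allowed combinations transcribed from the table above.
SORA_MODELS = {
    "sora-2": {"sizes": {"1280x720", "720x1280"}, "seconds": {4, 8, 12}},
    "sora-2-pro": {"sizes": {"1792x1024", "1024x1792"}, "seconds": {4, 8, 12}},
}


def check_sora_request(model: str, size: str, seconds: int) -> list[str]:
    """Return a list of problems; an empty list means the combination is valid."""
    spec = SORA_MODELS.get(model)
    if spec is None:
        return [f"unknown model {model!r}"]
    problems = []
    if size not in spec["sizes"]:
        problems.append(f"{model} does not support size {size}")
    if seconds not in spec["seconds"]:
        problems.append(f"{model} does not support {seconds}s clips")
    return problems
```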

Capabilities

Text-to-video and image-to-video with automatically generated synced audio. Sora handles natural language prompts well — describe scenes conversationally and include audio cues for specific sounds.

Authentication

No manual auth step needed. Uses OPENAI_API_KEY configured in the media-gateway environment.

Job Handling

Async generation. Typical times: sora-2 at 4s duration takes 30-60 seconds; sora-2-pro at 8s takes 90-120 seconds. Default: job-runner with Slack notification.


Gemini

Google Gemini via flow4api — Video generation (Veo 3) and image generation (Nano Banana family).

Models

Video — Veo 3:

| Model | Quality | Best For |
|---|---|---|
| veo_3_1_t2v_fast_landscape | Fast, good | Iteration, drafts, landscape videos |
| veo_3_1_t2v_landscape | High | Production, cinematic footage |

Image — Nano Banana:

| Model | Resolution | Best For |
|---|---|---|
| Nano Banana 2 (default) | Up to 4K | General use: fast, high quality, affordable |
| Nano Banana Pro | Up to 2K | Fine-grained control, complex compositions |
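The "Best For" guidance in both tables can be expressed as a small selection rule. The Veo IDs come from the video table; the Nano Banana identifiers are hypothetical, because only display names are documented. The sketch also encodes the resolution limit: Nano Banana Pro tops out at 2K, so a 4K request falls back to Nano Banana 2.

```python
VEO_FAST = "veo_3_1_t2v_fast_landscape"
VEO_HIGH = "veo_3_1_t2v_landscape"
NANO_BANANA_2 = "nano-banana-2"      # hypothetical identifier
NANO_BANANA_PRO = "nano-banana-pro"  # hypothetical identifier


def pick_gemini_model(kind: str, *, final: bool = False,
                      fine_control: bool = False, need_4k: bool = False) -> str:
    """Choose a Gemini model following the 'Best For' columns."""
    if kind == "video":
        # Fast model for iteration and drafts, high-quality model for production.
        return VEO_HIGH if final else VEO_FAST
    if kind == "image":
        # Pro only when fine control is needed and 2K is enough.
        if fine_control and not need_4k:
            return NANO_BANANA_PRO
        return NANO_BANANA_2
    raise ValueError(f"unknown kind {kind!r}")
```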

Capabilities

Text-to-image, image-to-image, text-to-video, image-to-video, and 4K image output. Nano Banana models leverage Gemini’s world knowledge — they understand text in images, create infographics, and accurately render specific subjects.

Authentication

Pre-configured — no user action required. Uses an internal flow4api service.

Response Model

Synchronous — results are returned directly in the API response. No polling needed. This makes Gemini the fastest provider for iteration.
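Since only Gemini answers synchronously, a client's result handling splits three ways. This sketch assumes async providers return a `task_id` field (hypothetical) and that `poll` is a caller-supplied function; returning `None` stands for the default job-runner path, where the result arrives later via Slack.

```python
from typing import Callable, Optional

# Gemini answers synchronously; Seedance, Kling, and Sora are async.
SYNC_PROVIDERS = {"gemini"}


def handle_submission(provider: str, submit_response: dict,
                      poll: Callable[[str, str], dict],
                      manual: bool = False) -> Optional[dict]:
    """Return the result now, or None when the job-runner will deliver it."""
    if provider in SYNC_PROVIDERS:
        return submit_response  # the result is already in the response body
    if not manual:
        # Default: the job-runner Worker polls and sends a Slack notification.
        return None
    # Opt-in manual polling; "task_id" is a hypothetical response field.
    return poll(provider, submit_response["task_id"])
```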

Shared Workflow

Across all providers, the generation workflow follows the same pattern:

  1. Check API health and credentials
  2. Select a provider (or let the user choose on the first generation)
  3. Enhance prompts using the five-element formula
  4. Submit the generation request
  5. Poll for async results, or receive the synchronous response
  6. Download the result and upload to R2 storage with full metadata
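The six steps can be wired together as a sketch with one caller-supplied hook per step. The `Workflow` class, its hook names, and the metadata fields are hypothetical; the point is the ordering, with the health check first and R2 storage last.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Workflow:
    """Hypothetical hooks, one per step of the shared workflow."""
    check_health: Callable[[], bool]
    choose_provider: Callable[[], str]
    enhance: Callable[[str], str]
    submit: Callable[[str, str], dict]
    await_result: Callable[[str, dict], dict]
    store: Callable[[dict, dict], str]

    def run(self, prompt: str, provider: Optional[str] = None) -> str:
        # Step 1: fail fast if the API or credentials are not usable.
        if not self.check_health():
            raise RuntimeError("creative API unhealthy or credentials missing")
        # Step 2: use the caller's provider, else ask (e.g. on first generation).
        provider = provider or self.choose_provider()
        # Step 3: expand the raw idea with the five-element formula.
        enhanced = self.enhance(prompt)
        # Steps 4-5: submit, then poll (async) or read the response (sync).
        response = self.submit(provider, enhanced)
        result = self.await_result(provider, response)
        # Step 6: upload to R2 with metadata; returns the stored object's key.
        metadata = {"provider": provider, "prompt": prompt,
                    "enhanced_prompt": enhanced}
        return self.store(result, metadata)
```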