Post-Process Skills

Three post-processing sub-skills for manipulating generated media using ffmpeg. All scripts are Python-based, require ffmpeg/ffprobe in PATH, and follow a config-driven approach: the agent builds a JSON config from the user’s request, passes it to the script, and reports results.

Video Overlay

Overlay images and text on top of existing videos — logos, watermarks, stickers, subtitles, captions, taglines, and brand marks. Auto-calculates position, size, and font parameters based on video resolution.

Key Features

  • Image Overlays — Logos, watermarks, stickers with configurable position, scale, opacity, and timing
  • Text Overlays — Two style presets: hook (big bold centered text for POV/impact) and subtitle (bottom bar with semi-transparent background for captions)
  • Auto-Sizing — Position and scale are calculated relative to video resolution (e.g., logo defaults to 15% of video width)
  • Batch Processing — Apply the same overlay config to multiple videos
  • Reusable Templates — Save overlay configurations as named templates for future use

Defaults

| Overlay Type | Position | Scale | Opacity |
|---|---|---|---|
| Logo | top-right | 15% of width | 1.0 |
| Watermark | bottom-right | 15% of width | 0.3-0.5 |
| Hook text | centered | 7% of height | 1.0 |
| Subtitle | bottom | 4% of height | 1.0 (with background) |
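
As a rough illustration of how resolution-relative defaults like these could turn into pixel values, here is a minimal Python sketch. The scale fractions come from the defaults listed above; the function names (`overlay_size`, `anchor_xy`) and the 3% edge margin are assumptions made for the example, not the skill's actual implementation.

```python
from typing import Tuple

# Scale fractions and positions are taken from the Defaults table.
# "basis" records which video dimension each fraction applies to.
DEFAULTS = {
    "logo":      {"position": "top-right",    "scale": 0.15, "basis": "width"},
    "watermark": {"position": "bottom-right", "scale": 0.15, "basis": "width"},
    "hook":      {"position": "centered",     "scale": 0.07, "basis": "height"},
    "subtitle":  {"position": "bottom",       "scale": 0.04, "basis": "height"},
}


def overlay_size(kind: str, video_w: int, video_h: int) -> int:
    """Overlay width (image types) or font size (text types), in pixels."""
    spec = DEFAULTS[kind]
    basis = video_w if spec["basis"] == "width" else video_h
    return round(basis * spec["scale"])


def anchor_xy(position: str, video_w: int, video_h: int,
              item_w: int, item_h: int,
              margin_frac: float = 0.03) -> Tuple[int, int]:
    """Top-left pixel coordinate for an item placed at a named position.

    margin_frac is a hypothetical edge margin (fraction of the shorter side).
    """
    margin = round(min(video_w, video_h) * margin_frac)
    parts = position.split("-")
    horiz = "left" if "left" in parts else "right" if "right" in parts else "center"
    vert = "top" if "top" in parts else "bottom" if "bottom" in parts else "center"
    x = {"left": margin,
         "center": (video_w - item_w) // 2,
         "right": video_w - item_w - margin}[horiz]
    y = {"top": margin,
         "center": (video_h - item_h) // 2,
         "bottom": video_h - item_h - margin}[vert]
    return x, y


# Example: a logo on a 1920x1080 video, assuming the scaled logo is 100 px tall.
logo_w = overlay_size("logo", 1920, 1080)
print(logo_w, anchor_xy("top-right", 1920, 1080, logo_w, 100))
```

Deriving everything from the probed width and height means the same config can be reused across resolutions without hand-tuned pixel values.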

Requirements

ffmpeg, ffprobe, Python 3.8+, Pillow >= 10.1.0


Video Assembler

Batch-assemble short videos (target ~15s) from raw footage. The user provides a fixed opening clip, a fixed closing clip, and a pool of available middle segments. The system calculates how much middle time is needed, proposes a segment strategy, then auto-generates all permutations as separate video files.

Key Features

  • Permutation Engine — Given N available segments and k slots, generates P(N,k) unique video variations
  • Duration Calculation — Automatically calculates middle segment durations to hit the target total duration, accounting for transition overlap
  • Transition Support — Crossfade, dissolve, wipe, slide, or hard cut between segments (default: 0.3s crossfade)
  • Manifest Output — Generates manifest.json tracing which segments appear in each output video
  • Dry Run — Preview permutation count and strategy with --dry-run before generating

How It Works

Source Videos --> Define opening/closing + segment pool --> Calculate middle duration needed --> Propose strategy (segment count, durations per slot) --> Generate P(N,k) permutations --> Extract, normalize, concat --> P(N,k) output MP4s
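
The duration step can be sketched as follows. Each crossfade overlaps two adjacent clips, so a video built from `k` middle slots plus an opening and a closing loses `(k + 1)` transition lengths from the sum of its clip durations. The function names below are hypothetical, and splitting the middle time equally across slots is an assumption; the actual strategy proposed by the system may differ.

```python
from typing import List, Sequence


def middle_slot_durations(target: float, opening: float, closing: float,
                          slots: int, transition: float = 0.3) -> List[float]:
    """Per-slot durations so the assembled video hits the target length.

    Assumes an equal split across slots (an illustrative choice).
    Use transition=0.0 for hard cuts.
    """
    if slots < 1:
        raise ValueError("need at least one middle slot")
    clips = slots + 2                       # opening + middle slots + closing
    overlap = (clips - 1) * transition      # time consumed by crossfades
    middle_total = target - opening - closing + overlap
    if middle_total <= 0:
        raise ValueError("opening and closing already exceed the target")
    return [middle_total / slots] * slots


def assembled_duration(clip_durations: Sequence[float],
                       transition: float = 0.3) -> float:
    """Final length of clips joined with overlapping transitions."""
    return sum(clip_durations) - (len(clip_durations) - 1) * transition


# Example: 15 s target, 3 s opening, 2 s closing, three middle slots.
slots = middle_slot_durations(15.0, 3.0, 2.0, slots=3)
print(slots, assembled_duration([3.0, *slots, 2.0]))
```

With the default 0.3s crossfade, the three middle slots need 11.2s in total rather than 10s, because four transitions each consume 0.3s of overlap.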

Permutation Scale

| Available (N) | Use (k) | Output Videos |
|---|---|---|
| 3 | 2 | 6 |
| 5 | 3 | 60 |
| 6 | 4 | 360 |
| 8 | 3 | 336 |

When P(N,k) exceeds 200, the system warns and asks for confirmation, or suggests setting a max_videos cap.
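
The counts above follow from P(N,k) = N! / (N-k)!, the number of ordered selections of k segments out of N without repetition. A minimal sketch of the counting and threshold check, using only the Python standard library, might look like this. The function name `plan_permutations` is hypothetical, and taking the first `max_videos` orderings when a cap is set is an assumption; the real script may select differently.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

WARN_THRESHOLD = 200  # threshold stated in the text


def plan_permutations(segments: Sequence[str], k: int,
                      max_videos: Optional[int] = None
                      ) -> Tuple[int, bool, List[Tuple[str, ...]]]:
    """Return (P(N,k), whether confirmation is needed, orderings to render)."""
    total = math.perm(len(segments), k)          # N! / (N-k)!, Python 3.8+
    needs_confirmation = total > WARN_THRESHOLD
    orderings = itertools.permutations(segments, k)
    if max_videos is not None:
        # Assumption: a cap keeps the first max_videos orderings.
        orderings = itertools.islice(orderings, max_videos)
    return total, needs_confirmation, list(orderings)


total, confirm, orderings = plan_permutations(["a", "b", "c"], 2)
print(total, confirm, orderings)
```

Because `itertools.permutations` is lazy, the count can be checked with `math.perm` before any ordering is materialized, which is what makes a dry run cheap.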

Requirements

ffmpeg, ffprobe, Python 3.8+


Audio Toolkit

Extract, trim, concatenate, mix, speed-change, and replace audio tracks. Also includes TTS (text-to-speech) voice generation with emotion and voice personality control. Operations can be chained into pipelines.

Operations

| Operation | Description |
|---|---|
| extract | Pull the audio track out of a video file |
| trim | Cut audio to a time range (start/end timecodes) |
| concat | Join audio files end-to-end |
| mix | Blend multiple tracks with per-track volume control and time-based automation |
| speed | Change playback speed without affecting pitch (0.25x to 4.0x) |
| replace | Swap the audio track in a video file |
| tts | Generate speech from text using xAI Grok TTS |
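
For the speed operation, one common way to change tempo without shifting pitch in ffmpeg is the `atempo` audio filter. Whether the toolkit uses `atempo` internally is an assumption; the sketch below only shows how the documented 0.25x to 4.0x range could be mapped onto it. Older ffmpeg builds restrict each `atempo` instance to 0.5 through 2.0, so the helper chains instances to stay inside that range. The function names are hypothetical.

```python
from typing import List


def atempo_chain(speed: float) -> str:
    """Build an ffmpeg audio filter string for a pitch-preserving speed change.

    Each atempo factor is kept within 0.5..2.0 by chaining several instances.
    """
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    factors = []
    remaining = speed
    while remaining > 2.0:      # peel off 2x steps for fast playback
        factors.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:      # peel off 0.5x steps for slow playback
        factors.append(0.5)
        remaining /= 0.5
    factors.append(remaining)
    return ",".join(f"atempo={f:g}" for f in factors)


def speed_command(src: str, dst: str, speed: float) -> List[str]:
    """Argument list for subprocess.run; nothing is executed here."""
    return ["ffmpeg", "-y", "-i", src, "-filter:a", atempo_chain(speed), dst]


print(speed_command("voice.mp3", "voice_fast.mp3", 1.5))
```

Building the command as a list rather than a shell string avoids quoting problems when file names contain spaces.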

TTS Voice Generation

The TTS engine supports five voice personalities and nine emotion modifiers. The agent infers the appropriate voice and emotion from the user’s natural language description — users never need to know technical voice IDs.

Voices: eve (energetic), ara (warm/friendly), rex (professional), sal (neutral/balanced), leo (authoritative)

Emotions: soft, loud, whisper, fast, slow, sing, cry, high pitch, low pitch (can be combined)

Inline tags for dynamic speech: [pause], [long-pause], [laugh], [sigh], [gasp], <whisper>, <soft>, <loud>

Pipeline Chaining

Operations chain sequentially — each step’s output becomes the next step’s input:

input.mp4 --> extract --> trim(2-8s) --> speed(1.5x) --> output.mp3
tts("text") --> speed(1.5x) --> replace(video.mp4) --> output.mp4
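
The chaining idea can be sketched as a small runner that threads one file path through a list of steps. Everything here is illustrative: the step dictionary shape (`op` plus parameters), the `run_pipeline` name, and the stub handlers are assumptions, and the stubs only derive file names instead of invoking ffmpeg.

```python
from typing import Any, Callable, Dict, List

# A handler takes the current file path and the step's parameters,
# and returns the path of the file it produced.
Handler = Callable[[str, Dict[str, Any]], str]


def run_pipeline(source: str, steps: List[Dict[str, Any]],
                 handlers: Dict[str, Handler]) -> str:
    """Run steps in order; each step's output is the next step's input."""
    current = source
    for step in steps:
        params = {key: value for key, value in step.items() if key != "op"}
        current = handlers[step["op"]](current, params)
    return current


def make_stub(name: str, ext: str = "") -> Handler:
    """Stand-in handler: appends the op name, optionally changes the extension."""
    def handler(path: str, params: Dict[str, Any]) -> str:
        stem, _, old_ext = path.rpartition(".")
        return f"{stem}_{name}.{ext or old_ext}"
    return handler


handlers = {
    "extract": make_stub("extract", ext="mp3"),
    "trim": make_stub("trim"),
    "speed": make_stub("speed"),
}

steps = [
    {"op": "extract"},
    {"op": "trim", "start": 2, "end": 8},
    {"op": "speed", "factor": 1.5},
]
print(run_pipeline("input.mp4", steps, handlers))
```

Keeping each operation behind the same path-in, path-out signature is what lets steps be reordered or combined freely.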

Requirements

ffmpeg, ffprobe, Python 3.8+