Post-Process Skills
Three post-processing sub-skills for manipulating generated media using ffmpeg. All scripts are Python-based, require ffmpeg/ffprobe in PATH, and follow a config-driven approach: the agent builds a JSON config from the user’s request, passes it to the script, and reports results.
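The config-driven flow can be sketched as follows; the config schema and script name here are illustrative assumptions, not the skills' documented interface:

```python
import json

# Hypothetical overlay config built by the agent from a user request.
config = {
    "input": "clip.mp4",
    "overlays": [
        {"type": "logo", "path": "brand.png", "position": "top-right", "opacity": 1.0}
    ],
    "output": "clip_branded.mp4",
}

# Write the config to disk for the script to consume.
with open("overlay_config.json", "w") as f:
    json.dump(config, f, indent=2)

# The script would then read the config and shell out to ffmpeg, e.g.:
# subprocess.run(["python", "video_overlay.py", "--config", "overlay_config.json"], check=True)
```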
Video Overlay
Overlay images and text on top of existing videos — logos, watermarks, stickers, subtitles, captions, taglines, and brand marks. Auto-calculates position, size, and font parameters based on video resolution.
Key Features
- Image Overlays — Logos, watermarks, stickers with configurable position, scale, opacity, and timing
- Text Overlays — Two style presets: `hook` (big bold centered text for POV/impact) and `subtitle` (bottom bar with semi-transparent background for captions)
- Auto-Sizing — Position and scale are calculated relative to video resolution (e.g., logo defaults to 15% of video width)
- Batch Processing — Apply the same overlay config to multiple videos
- Reusable Templates — Save overlay configurations as named templates for future use
Defaults
| Overlay Type | Position | Scale | Opacity |
|---|---|---|---|
| Logo | top-right | 15% of width | 1.0 |
| Watermark | bottom-right | 15% of width | 0.3-0.5 |
| Hook text | centered | 7% of height | 1.0 |
| Subtitle | bottom | 4% of height | 1.0 (with background) |
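The auto-sizing defaults in the table reduce to a resolution-relative calculation. A minimal sketch (the table mapping and function name are illustrative, not the script's actual API):

```python
# Defaults from the table above: image overlays scale against width,
# text overlays against height.
DEFAULTS = {
    "logo":      {"anchor": "top-right",    "scale": 0.15, "axis": "w", "opacity": 1.0},
    "watermark": {"anchor": "bottom-right", "scale": 0.15, "axis": "w", "opacity": 0.4},
    "hook":      {"anchor": "center",       "scale": 0.07, "axis": "h", "opacity": 1.0},
    "subtitle":  {"anchor": "bottom",       "scale": 0.04, "axis": "h", "opacity": 1.0},
}

def sized(kind, video_w, video_h):
    """Resolve an overlay's pixel size (width or font size) from the video resolution."""
    d = DEFAULTS[kind]
    base = video_w if d["axis"] == "w" else video_h
    return round(base * d["scale"])
```

For a 1920x1080 video, this yields a 288 px logo, a 76 px hook font, and a 43 px subtitle font.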
Requirements
ffmpeg, ffprobe, Python 3.8+, Pillow >= 10.1.0
Video Assembler
Batch-assemble short videos (target ~15s) from raw footage. The user provides a fixed opening clip, a fixed closing clip, and a pool of available middle segments. The system calculates how much middle time is needed, proposes a segment strategy, then auto-generates all permutations as separate video files.
Key Features
- Permutation Engine — Given N available segments and k slots, generates P(N,k) unique video variations
- Duration Calculation — Automatically calculates middle segment durations to hit the target total duration, accounting for transition overlap
- Transition Support — Crossfade, dissolve, wipe, slide, or hard cut between segments (default: 0.3s crossfade)
- Manifest Output — Generates `manifest.json` tracing which segments appear in each output video
- Dry Run — Preview permutation count and strategy with `--dry-run` before generating
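The duration calculation above can be sketched as a small function, assuming every junction (opening to first middle, between middles, last middle to closing) is a crossfade that overlaps `fade` seconds of the adjacent clips; the function name is illustrative:

```python
def middle_seconds_needed(target, opening, closing, k, fade=0.3):
    """Total middle-segment time so the final cut hits `target` seconds.

    With an opening clip, k middle segments, and a closing clip joined by
    crossfades, there are k + 1 transitions, and each one shortens the
    total by `fade` seconds of overlap, so that time must be added back.
    """
    return target - opening - closing + (k + 1) * fade
```

For example, a 15 s target with a 3 s opening, a 2 s closing, three middle slots, and the default 0.3 s crossfade needs about 11.2 s of middle footage, roughly 3.7 s per slot.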
How It Works
Source Videos --> Define opening/closing + segment pool
--> Calculate middle duration needed
--> Propose strategy (segment count, durations per slot)
--> Generate P(N,k) permutations
--> Extract, normalize, concat --> N output MP4s
Permutation Scale
| Available (N) | Use (k) | Output Videos |
|---|---|---|
| 3 | 2 | 6 |
| 5 | 3 | 60 |
| 6 | 4 | 360 |
| 8 | 3 | 336 |
When P(N,k) exceeds 200, the system warns, asks for confirmation, and suggests setting a `max_videos` cap.
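The permutation engine and the cap check map directly onto the standard library; a minimal sketch (the `plan` function and its signature are illustrative, not the script's actual API):

```python
from itertools import permutations
from math import perm  # Python 3.8+, matching the stated requirement

def plan(segments, k, max_videos=200):
    """Enumerate the P(N, k) segment orderings, refusing to exceed the cap."""
    count = perm(len(segments), k)
    if count > max_videos:
        raise ValueError(
            f"P({len(segments)},{k}) = {count} exceeds max_videos={max_videos}"
        )
    return list(permutations(segments, k))
```

With 5 segments and 3 slots this returns the 60 orderings from the table; with 8 segments and 3 slots it refuses, since 336 > 200.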
Requirements
ffmpeg, ffprobe, Python 3.8+
Audio Toolkit
Extract, trim, concatenate, mix, speed-change, and replace audio tracks. Also includes TTS (text-to-speech) voice generation with emotion and voice personality control. Operations can be chained into pipelines.
Operations
| Operation | Description |
|---|---|
| `extract` | Pull the audio track out of a video file |
| `trim` | Cut audio to a time range (start/end timecodes) |
| `concat` | Join audio files end-to-end |
| `mix` | Blend multiple tracks with per-track volume control and time-based automation |
| `speed` | Change playback speed without affecting pitch (0.25x to 4.0x) |
| `replace` | Swap the audio track in a video file |
| `tts` | Generate speech from text using xAI Grok TTS |
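The 0.25x-4.0x range for the `speed` operation is wider than what a single ffmpeg `atempo` filter accepts (0.5-2.0 in older builds), so out-of-range speeds are typically composed from in-range steps. A sketch of such a chain builder (an assumption about how the toolkit implements it, not its documented internals):

```python
def atempo_chain(speed):
    """Build an ffmpeg atempo filter chain for pitch-preserving speed change.

    A single atempo filter historically accepts factors only in [0.5, 2.0],
    so speeds outside that range are factored into multiple in-range steps.
    """
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25x and 4.0x")
    parts = []
    while speed > 2.0:          # e.g. 4.0x -> 2.0 * 2.0
        parts.append(2.0)
        speed /= 2.0
    while speed < 0.5:          # e.g. 0.25x -> 0.5 * 0.5
        parts.append(0.5)
        speed /= 0.5
    parts.append(speed)
    return ",".join(f"atempo={p:g}" for p in parts)
```

The result plugs into an ffmpeg audio filter, e.g. `-filter:a "atempo=2,atempo=2"` for 4.0x.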
TTS Voice Generation
The TTS engine supports five voice personalities and nine emotion modifiers. The agent infers the appropriate voice and emotion from the user’s natural language description — users never need to know technical voice IDs.
Voices: eve (energetic), ara (warm/friendly), rex (professional), sal (neutral/balanced), leo (authoritative)
Emotions: soft, loud, whisper, fast, slow, sing, cry, high pitch, low pitch (can be combined)
Inline tags for dynamic speech: [pause], [long-pause], [laugh], [sigh], [gasp], <whisper>, <soft>, <loud>
Pipeline Chaining
Operations chain sequentially — each step’s output becomes the next step’s input:
input.mp4 --> extract --> trim(2-8s) --> speed(1.5x) --> output.mp3
tts("text") --> speed(1.5x) --> replace(video.mp4) --> output.mp4
Requirements
ffmpeg, ffprobe, Python 3.8+
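The sequential chaining described under Pipeline Chaining amounts to folding each operation over the previous step's output. A minimal sketch, where the operations are stand-ins that record what happened rather than the toolkit's real ffmpeg handlers:

```python
def run_pipeline(path, steps):
    """Feed each step's output path into the next step."""
    for step in steps:
        path = step(path)
    return path

# Stand-in operations: each returns a "new output path" tagging the step.
extract = lambda p: p + "|extract"
trim = lambda start, end: (lambda p: p + f"|trim({start}-{end}s)")
speed = lambda x: (lambda p: p + f"|speed({x}x)")

result = run_pipeline("input.mp4", [extract, trim(2, 8), speed(1.5)])
```

This mirrors the first example pipeline above: extract, then trim 2-8 s, then speed up 1.5x.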