Post-Process Skills
Three post-processing sub-skills for manipulating generated media using ffmpeg. All scripts are Python-based, require ffmpeg/ffprobe in PATH, and follow a config-driven approach: the agent builds a JSON config from the user’s request, passes it to the script, and reports results.
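The config-driven flow can be sketched as follows; the config schema and script name here are illustrative assumptions, not the skills' documented interface:

```python
import json

# Hypothetical overlay config built by the agent from a user request.
config = {
    "input": "clip.mp4",
    "overlays": [
        {"type": "logo", "path": "brand.png", "position": "top-right", "opacity": 1.0}
    ],
    "output": "clip_branded.mp4",
}

# Write the config to disk for the script to consume.
with open("overlay_config.json", "w") as f:
    json.dump(config, f, indent=2)

# The script would then read the config and shell out to ffmpeg, e.g.:
# subprocess.run(["python", "video_overlay.py", "--config", "overlay_config.json"], check=True)
```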
Video Overlay
Overlay images and text on top of existing videos — logos, watermarks, stickers, subtitles, captions, taglines, and brand marks. Auto-calculates position, size, and font parameters based on video resolution.
Key Features
- Image Overlays — Logos, watermarks, stickers with configurable position, scale, opacity, and timing
- Text Overlays — Two style presets: `hook` (big bold centered text for POV/impact) and `subtitle` (bottom bar with semi-transparent background for captions)
- Auto-Sizing — Position and scale are calculated relative to video resolution (e.g., logo defaults to 15% of video width)
- Batch Processing — Apply the same overlay config to multiple videos
- Reusable Templates — Save overlay configurations as named templates for future use
Defaults
| Overlay Type | Position | Scale | Opacity |
|---|---|---|---|
| Logo | top-right | 15% of width | 1.0 |
| Watermark | bottom-right | 15% of width | 0.3-0.5 |
| Hook text | centered | 7% of height | 1.0 |
| Subtitle | bottom | 4% of height | 1.0 (with background) |
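The auto-sizing defaults in the table reduce to a resolution-relative calculation. A minimal sketch (the table mapping and function name are illustrative, not the script's actual API):

```python
# Defaults from the table above: image overlays scale against width,
# text overlays against height.
DEFAULTS = {
    "logo":      {"anchor": "top-right",    "scale": 0.15, "axis": "w", "opacity": 1.0},
    "watermark": {"anchor": "bottom-right", "scale": 0.15, "axis": "w", "opacity": 0.4},
    "hook":      {"anchor": "center",       "scale": 0.07, "axis": "h", "opacity": 1.0},
    "subtitle":  {"anchor": "bottom",       "scale": 0.04, "axis": "h", "opacity": 1.0},
}

def sized(kind, video_w, video_h):
    """Resolve an overlay's pixel size (width or font size) from the video resolution."""
    d = DEFAULTS[kind]
    base = video_w if d["axis"] == "w" else video_h
    return round(base * d["scale"])
```

For a 1920x1080 video, this yields a 288 px logo, a 76 px hook font, and a 43 px subtitle font.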
Requirements
ffmpeg, ffprobe, Python 3.8+, Pillow >= 10.1.0
Video Assembler
Batch-assemble short videos (target ~15s) from raw footage. The user provides a fixed opening clip, a fixed closing clip, and a pool of available middle segments. The system calculates how much middle time is needed, proposes a segment strategy, then auto-generates all permutations as separate video files.
Key Features
- Permutation Engine — Given N available segments and k slots, generates P(N,k) unique video variations
- Duration Calculation — Automatically calculates middle segment durations to hit the target total duration, accounting for transition overlap
- Transition Support — Crossfade, dissolve, wipe, slide, or hard cut between segments (default: 0.3s crossfade)
- Manifest Output — Generates `manifest.json` tracing which segments appear in each output video
- Dry Run — Preview permutation count and strategy with `--dry-run` before generating
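The duration calculation above can be sketched as a small function, assuming every junction (opening to first middle, between middles, last middle to closing) is a crossfade that overlaps `fade` seconds of the adjacent clips; the function name is illustrative:

```python
def middle_seconds_needed(target, opening, closing, k, fade=0.3):
    """Total middle-segment time so the final cut hits `target` seconds.

    With an opening clip, k middle segments, and a closing clip joined by
    crossfades, there are k + 1 transitions, and each one shortens the
    total by `fade` seconds of overlap, so that time must be added back.
    """
    return target - opening - closing + (k + 1) * fade
```

For example, a 15 s target with a 3 s opening, a 2 s closing, three middle slots, and the default 0.3 s crossfade needs about 11.2 s of middle footage, roughly 3.7 s per slot.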
How It Works
Source Videos --> Define opening/closing + segment pool
--> Calculate middle duration needed
--> Propose strategy (segment count, durations per slot)
--> Generate P(N,k) permutations
--> Extract, normalize, concat --> N output MP4s
Permutation Scale
| Available (N) | Use (k) | Output Videos |
|---|---|---|
| 3 | 2 | 6 |
| 5 | 3 | 60 |
| 6 | 4 | 360 |
| 8 | 3 | 336 |
When P(N,k) exceeds 200, the system warns, asks for confirmation, and suggests setting a `max_videos` cap.
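The permutation engine and the cap check map directly onto the standard library; a minimal sketch (the `plan` function and its signature are illustrative, not the script's actual API):

```python
from itertools import permutations
from math import perm  # Python 3.8+, matching the stated requirement

def plan(segments, k, max_videos=200):
    """Enumerate the P(N, k) segment orderings, refusing to exceed the cap."""
    count = perm(len(segments), k)
    if count > max_videos:
        raise ValueError(
            f"P({len(segments)},{k}) = {count} exceeds max_videos={max_videos}"
        )
    return list(permutations(segments, k))
```

With 5 segments and 3 slots this returns the 60 orderings from the table; with 8 segments and 3 slots it refuses, since 336 > 200.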
Requirements
ffmpeg, ffprobe, Python 3.8+
Audio Toolkit
Extract, trim, concatenate, mix, speed-change, and replace audio tracks. Also includes TTS (text-to-speech) voice generation with emotion and voice personality control. Operations can be chained into pipelines.
Operations
| Operation | Description |
|---|---|
| `extract` | Pull the audio track out of a video file |
| `trim` | Cut audio to a time range (start/end timecodes) |
| `concat` | Join audio files end-to-end |
| `mix` | Blend multiple tracks with per-track volume control and time-based automation |
| `speed` | Change playback speed without affecting pitch (0.25x to 4.0x) |
| `replace` | Swap the audio track in a video file |
| `tts` | Generate speech from text using xAI Grok TTS |
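The 0.25x-4.0x range for the `speed` operation is wider than what a single ffmpeg `atempo` filter accepts (0.5-2.0 in older builds), so out-of-range speeds are typically composed from in-range steps. A sketch of such a chain builder (an assumption about how the toolkit implements it, not its documented internals):

```python
def atempo_chain(speed):
    """Build an ffmpeg atempo filter chain for pitch-preserving speed change.

    A single atempo filter historically accepts factors only in [0.5, 2.0],
    so speeds outside that range are factored into multiple in-range steps.
    """
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25x and 4.0x")
    parts = []
    while speed > 2.0:          # e.g. 4.0x -> 2.0 * 2.0
        parts.append(2.0)
        speed /= 2.0
    while speed < 0.5:          # e.g. 0.25x -> 0.5 * 0.5
        parts.append(0.5)
        speed /= 0.5
    parts.append(speed)
    return ",".join(f"atempo={p:g}" for p in parts)
```

The result plugs into an ffmpeg audio filter, e.g. `-filter:a "atempo=2,atempo=2"` for 4.0x.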
TTS Voice Generation
The TTS engine supports five voice personalities and nine emotion modifiers. The agent infers the appropriate voice and emotion from the user’s natural language description — users never need to know technical voice IDs.
Voices: eve (energetic), ara (warm/friendly), rex (professional), sal (neutral/balanced), leo (authoritative)
Emotions: soft, loud, whisper, fast, slow, sing, cry, high pitch, low pitch (can be combined)
Inline tags for dynamic speech: [pause], [long-pause], [laugh], [sigh], [gasp], <whisper>, <soft>, <loud>
Pipeline Chaining
Operations chain sequentially — each step’s output becomes the next step’s input:
input.mp4 --> extract --> trim(2-8s) --> speed(1.5x) --> output.mp3
tts("text") --> speed(1.5x) --> replace(video.mp4) --> output.mp4
Requirements
ffmpeg, ffprobe, Python 3.8+
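The sequential chaining described under Pipeline Chaining amounts to folding each operation over the previous step's output. A minimal sketch, where the operations are stand-ins that record what happened rather than the toolkit's real ffmpeg handlers:

```python
def run_pipeline(path, steps):
    """Feed each step's output path into the next step."""
    for step in steps:
        path = step(path)
    return path

# Stand-in operations: each returns a "new output path" tagging the step.
extract = lambda p: p + "|extract"
trim = lambda start, end: (lambda p: p + f"|trim({start}-{end}s)")
speed = lambda x: (lambda p: p + f"|speed({x}x)")

result = run_pipeline("input.mp4", [extract, trim(2, 8), speed(1.5)])
```

This mirrors the first example pipeline above: extract, then trim 2-8 s, then speed up 1.5x.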