Animated Caption Styles
Each effect changes how your captions appear on screen. From subtle fades to cinematic glitches, pick the style that matches your brand and content type.
Karaoke Captions
Karaoke captions display the entire sentence or phrase on screen from the start, then highlight each word individually as it's spoken. Unlike the Build or Pop categories, where words appear and disappear, Karaoke keeps all text visible at all times; only the styling of the active word changes. This creates a reading experience similar to a teleprompter or lyric video, where viewers can read ahead, follow along at their own pace, or glance away and easily find their place again. The highlighting mechanism is driven by word-level timestamps from Whisper AI, which keep the visual highlight in sync with the spoken audio. VideoCaptions.AI offers four highlight sub-styles: Scale (the active word grows slightly), Background (a colored block appears behind the active word), Bounce (the active word bounces vertically), and ColorChange (the active word switches to your chosen highlight color). Each sub-style can be combined with a custom highlight color for full creative control.
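As a rough sketch of how word-level timestamps can drive the highlight, the function below picks the active word for a given frame. The `TimedWord` shape, the function names, and the choice to highlight nothing during pauses are assumptions made for illustration, not VideoCaptions.AI's actual data model.

```typescript
// Hypothetical word shape: start and end times in seconds.
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

// Convert a frame number to seconds for a given frame rate.
function frameToSeconds(frame: number, fps: number): number {
  return frame / fps;
}

// Index of the word being spoken at `timeSec`, or -1 during a pause.
function activeWordIndex(words: TimedWord[], timeSec: number): number {
  return words.findIndex((w) => timeSec >= w.start && timeSec < w.end);
}

const words: TimedWord[] = [
  { text: "Hello", start: 0.0, end: 0.4 },
  { text: "world", start: 0.5, end: 0.9 },
];

// Frame 18 at 30 fps is 0.6 s, which falls inside "world".
const active = activeWordIndex(words, frameToSeconds(18, 30));
```

Every word stays rendered; only the word at the returned index would receive the Scale, Background, Bounce, or ColorChange styling.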
Typewriter Effect
The Typewriter effect is a character-level reveal animation where each letter of a word appears sequentially from left to right, as if being typed on screen in real time. Unlike effects that animate the entire word as a single unit (like FadeIn or ScaleUp), Typewriter breaks the word into individual characters and reveals them over the effect's duration. This creates a distinctive 'terminal' or 'being written' aesthetic that feels deliberate and measured. The effect works particularly well for storytelling, narration, and educational content where the pacing should feel thoughtful rather than energetic. When used with the Build category, words type themselves onto screen one by one as the speaker says them, accumulating into complete sentences. The result is a reading experience that matches the natural pace of speech — each word types itself out just as the speaker begins to say it, creating a deeply satisfying audio-visual sync.
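A minimal sketch of the character-level reveal, assuming the effect is driven by a `progress` value that runs from 0 to 1 over the effect's duration (the function name and that parameter are illustrative assumptions):

```typescript
// Returns the portion of `word` that is visible at a given progress (0..1).
function typewriterText(word: string, progress: number): string {
  const clamped = Math.min(1, Math.max(0, progress));
  // Array.from splits by code point, so emoji and other astral characters stay intact.
  const chars = Array.from(word);
  const visibleCount = Math.floor(clamped * chars.length);
  return chars.slice(0, visibleCount).join("");
}
```

Halfway through the effect, `typewriterText("hello", 0.5)` yields `"he"`; at progress 1 the full word is shown.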
Flash Captions
Flash is a caption category (not an animation effect) that changes how words are displayed within a page. Instead of appearing one by one as in Build or Pop, all words in the page appear simultaneously. The entire phrase materializes on screen as a complete unit, stays visible for the page's duration, then disappears when the next page begins. This creates a 'statement card' aesthetic: each page is a self-contained message that hits the viewer all at once. Flash is the simplest category visually, but it is arguably the most impactful for the right content: bold declarations, key quotes, calls to action, and any moment where you want the viewer to absorb an entire phrase instantly. When combined with the ScaleUp effect, Flash creates the MrBeast-style caption that has become a visual language of high-energy YouTube and TikTok content. With no entrance effect (effectName: none), Flash simply appears: clean, instant, no-nonsense.
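The difference between categories comes down to which words are on screen at a given moment. The sketch below encodes the behavior described on this page; the lowercase category strings and the function name are hypothetical, not the product's configuration keys.

```typescript
// Hypothetical category identifiers, named after the categories on this page.
type Category = "build" | "pop" | "flash" | "karaoke";

// Decides whether the word at `wordIndex` is visible while the word at
// `activeIndex` is being spoken.
function isWordVisible(category: Category, wordIndex: number, activeIndex: number): boolean {
  if (category === "build") {
    return wordIndex <= activeIndex; // words accumulate as they are spoken
  }
  if (category === "pop") {
    return wordIndex === activeIndex; // one word at a time
  }
  return true; // flash and karaoke: the whole page is visible throughout
}
```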
MrBeast Style Captions
MrBeast-style captions are among the most recognizable caption aesthetics on the internet. Characterized by large, bold sans-serif text (typically Bangers, Impact, or similar display fonts) that springs onto screen with a scale-up animation, these captions have become visual shorthand for 'high-production YouTube content.' The style uses the Flash category (all words appear simultaneously) combined with the ScaleUp effect (words spring from 80% to 100% size with a physics-based spring animation). Text is typically white with a heavy dark stroke (4-6px) for readability over any background, centered on screen, and kept to 2-4 words per page for maximum punch. What makes this style so effective is its physicality: the spring animation gives the text a sense of weight and momentum, as if the words are being stamped onto the screen. This creates a visceral impact that flat, unanimated text cannot match. Many high-profile YouTube creators use some variant of this style.
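Keeping pages to 2-4 words means splitting the transcript into short chunks. A naive way to do that is sketched below; `paginateWords` is a hypothetical helper, and a real paginator would likely also respect pauses and punctuation.

```typescript
// Splits a list of words into pages of at most `maxWordsPerPage` words.
function paginateWords(words: string[], maxWordsPerPage: number): string[][] {
  if (maxWordsPerPage < 1) {
    throw new RangeError("maxWordsPerPage must be at least 1");
  }
  const pages: string[][] = [];
  for (let i = 0; i < words.length; i += maxWordsPerPage) {
    pages.push(words.slice(i, i + maxWordsPerPage));
  }
  return pages;
}

const pages = paginateWords("I gave away one million dollars".split(" "), 3);
// pages is [["I", "gave", "away"], ["one", "million", "dollars"]]
```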
Glitch Effect
The Glitch effect creates a digital distortion animation where the word's text splits into separate RGB color channels (red, green, blue) that offset from each other horizontally, combined with a random positional shake that simulates a malfunctioning display. The effect progresses from maximum distortion to clean text, creating an entrance animation that looks like the word is 'glitching into existence' from digital noise. This aesthetic draws from cyberpunk visual culture, VHS corruption, and CRT monitor artifacts — a style that resonates strongly with tech, gaming, and sci-fi communities on social media. The Glitch effect uses seeded randomness to ensure that the shake pattern is deterministic — the same frame always produces the same visual, which is critical for Remotion's frame-based rendering model. The result is chaotic-looking but perfectly reproducible.
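To illustrate why seeded randomness matters for frame-based rendering, here is a sketch in which the shake and channel offsets depend only on the frame number and the effect progress. Remotion ships its own seeded `random()` helper; a small standalone generator (one mulberry32 step) is used here so the example runs without dependencies. The pixel ranges and all function names are assumptions.

```typescript
// Deterministic pseudo-random number in [0, 1) from an integer seed (one mulberry32 step).
function seededRandom(seed: number): number {
  let t = (seed + 0x6d2b79f5) | 0;
  t = Math.imul(t ^ (t >>> 15), t | 1);
  t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
  return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
}

// Offsets for one frame. `progress` runs 0..1; distortion fades out as it approaches 1.
function glitchOffsets(frame: number, progress: number, maxShakePx = 8, maxSplitPx = 6) {
  const intensity = 1 - Math.min(1, Math.max(0, progress));
  return {
    shakeX: (seededRandom(frame * 2) - 0.5) * 2 * maxShakePx * intensity,
    shakeY: (seededRandom(frame * 2 + 1) - 0.5) * 2 * maxShakePx * intensity,
    redX: -maxSplitPx * intensity, // red channel shifts left
    blueX: maxSplitPx * intensity, // blue channel shifts right
  };
}
```

Because nothing here reads `Math.random()` or the clock, rendering frame 10 twice (or on two different machines) yields identical offsets.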
Neon Pulse Effect
Neon Pulse is a two-phase animation: first, the text fades in from transparent to fully visible; then, a glow effect behind the text pulses rhythmically like a neon sign breathing in the dark. Unlike most effects that play once and stop, Neon Pulse is a looping animation: the glow continues to pulse for as long as the word is on screen. This creates a living, atmospheric quality where the captions feel like luminous objects rather than flat text overlays. The effect pairs naturally with dark video backgrounds, nightlife footage, music content, and anything with a retro-futuristic or synthwave aesthetic. The pulsing glow uses a sinusoidal oscillation that smoothly increases and decreases the glow's blur radius and opacity, creating a 'breathing' pattern reminiscent of a neon sign fluctuating in brightness.
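The two phases can be sketched as a linear fade-in plus a sine wave that modulates the glow. The fade length, pulse rate, and the blur and opacity ranges below are illustrative guesses, not the product's actual values.

```typescript
// Glow parameters for one frame of a Neon Pulse style animation.
function neonPulse(frame: number, fps: number, fadeInFrames = 10, pulseHz = 1) {
  // Phase 1: text opacity ramps from 0 to 1 and then holds.
  const opacity = Math.min(1, Math.max(0, frame / fadeInFrames));
  // Phase 2: a sine wave remapped from [-1, 1] to [0, 1] loops indefinitely.
  const wave = 0.5 + 0.5 * Math.sin(2 * Math.PI * pulseHz * (frame / fps));
  return {
    opacity,
    glowBlurPx: 8 + 12 * wave, // oscillates between 8 and 20 px
    glowOpacity: 0.5 + 0.5 * wave, // oscillates between 0.5 and 1
  };
}
```

The blur and opacity could then feed a CSS `text-shadow`, for example one built from `glowBlurPx` and the highlight color.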
Bounce Effect
The Bounce effect animates each word entering from below its final position, overshooting upward, then settling into place with a spring-based physics simulation. The motion follows a damped spring curve: the word accelerates upward, passes its target position (overshoots), bounces back down past it, and oscillates with decreasing amplitude until it settles. This creates a physically believable 'bouncy' entrance that feels weighty and satisfying. Unlike simple ease-in transitions, the spring physics give the text a sense of mass — heavier-looking fonts appear to bounce with more momentum, while lighter fonts feel quick and snappy. The Bounce effect works particularly well with the Pop category (one word at a time) for TikTok-style content, creating a rapid-fire sequence where each word bounces into view with infectious energy.
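The damped spring curve has a closed form for the underdamped case, which makes the overshoot easy to see. This is a sketch under assumed parameters (a 3 Hz natural frequency, a damping ratio of 0.4, and a 40 px starting offset), not the product's actual spring configuration.

```typescript
// Underdamped spring response from 0 toward 1. Requires 0 < dampingRatio < 1.
function springProgress(tSec: number, frequencyHz = 3, dampingRatio = 0.4): number {
  if (tSec <= 0) return 0;
  const omega = 2 * Math.PI * frequencyHz; // natural angular frequency
  const omegaD = omega * Math.sqrt(1 - dampingRatio * dampingRatio); // damped frequency
  const decay = Math.exp(-dampingRatio * omega * tSec);
  return (
    1 -
    decay *
      (Math.cos(omegaD * tSec) + ((dampingRatio * omega) / omegaD) * Math.sin(omegaD * tSec))
  );
}

// Vertical offset in px: positive is below the final position, negative is above it.
function bounceOffsetY(frame: number, fps: number, startOffsetPx = 40): number {
  return startOffsetPx * (1 - springProgress(frame / fps));
}
```

With these values the word starts 40 px low, rises above its resting position within the first few frames at 30 fps, and then oscillates with shrinking amplitude until the offset is effectively zero.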
Fade In Effect
Fade In is the most understated and versatile effect in VideoCaptions.AI's library. It animates text from fully transparent to fully opaque, with a subtle upward Y-axis translation that gives the entrance a sense of direction without being dramatic. The spring-based easing ensures the motion feels natural rather than mechanical: the text doesn't just linearly appear, it gently decelerates into its final position with a soft settle. This simplicity is its greatest strength: Fade In suits virtually any platform, content type, and audience. It rarely feels out of place, distracts from the message, or clashes with the video's mood. Professional contexts where Bounce or Glitch would be inappropriate still welcome Fade In. The effect serves the content rather than competing with it, making it the default choice for creators who want polished captions that don't draw attention to themselves.
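One way to get a soft settle without any overshoot is a critically damped spring, sketched below. Whether the product uses critical damping is not stated on this page; the stiffness-like `omega` value and the 12 px rise are assumptions.

```typescript
// Critically damped response from 0 toward 1: decelerates smoothly, never overshoots.
function settle(tSec: number, omega = 12): number {
  if (tSec <= 0) return 0;
  return 1 - (1 + omega * tSec) * Math.exp(-omega * tSec);
}

// Style values for one frame: opacity rises while the text drifts up into place.
function fadeInStyle(frame: number, fps: number, risePx = 12) {
  const p = settle(frame / fps);
  return {
    opacity: p,
    transform: `translateY(${((1 - p) * risePx).toFixed(2)}px)`,
  };
}
```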
Scale Up Effect
Scale Up animates text from approximately 80% of its final size to 100%, using a spring physics simulation that creates a characteristic overshoot-and-settle motion. The word appears small, rapidly grows past its target size (overshoots to roughly 105-110%), then bounces back and settles at 100%. This spring behavior gives the text a sense of weight and physicality — it feels like the word is being stamped onto the screen with force. Scale Up is the primary building block of the MrBeast caption aesthetic and is one of the most popular effects across YouTube, TikTok, and Instagram. It occupies the sweet spot between 'subtle' (Fade In) and 'dramatic' (Bounce, Glitch): it's attention-grabbing enough to emphasize key moments but controlled enough to use consistently without causing fatigue.
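The overshoot-and-settle motion can be reproduced by stepping a mass-spring-damper forward in time. The sketch below uses a simple semi-implicit Euler integrator with assumed values (stiffness 100, damping 8, mass 1); Remotion's own `spring()` helper exposes similarly named settings, but this is not its implementation.

```typescript
// Per-frame scale values for a spring that starts at 0.8 and rests at 1.0.
function scaleUpFrames(
  totalFrames: number,
  fps: number,
  stiffness = 100,
  damping = 8,
  mass = 1,
): number[] {
  const target = 1;
  const subSteps = 20; // integrate in small steps for accuracy
  const dt = 1 / (fps * subSteps);
  let scale = 0.8;
  let velocity = 0;
  const frames: number[] = [];
  for (let f = 0; f < totalFrames; f++) {
    frames.push(scale);
    for (let s = 0; s < subSteps; s++) {
      const acceleration = (-stiffness * (scale - target) - damping * velocity) / mass;
      velocity += acceleration * dt;
      scale += velocity * dt;
    }
  }
  return frames;
}
```

With these assumed settings the scale peaks at roughly 105% before settling at 100%, in line with the range described above.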
Mask Slide Effect
Mask Slide uses an overflow:hidden container as a clipping mask, with the text inside sliding upward from below the container's bottom edge to its final position. The result is text that appears to be revealed by an invisible window sliding open — only the portion of the text that has entered the visible window is shown, creating a clean, geometric reveal with hard edges. This technique is borrowed from professional motion graphics and broadcast design, where mask reveals have been a staple of title sequences and lower thirds for decades. The effect feels distinctly more 'designed' than Fade In or Scale Up because the hard-edged reveal creates geometric precision that organic effects lack. Mask Slide works exceptionally well for clean, modern aesthetics — think Apple keynote text animations, news broadcast lower thirds, or editorial video content.
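In style terms, the technique needs two nested elements: an outer clipping container and an inner text element that slides. A minimal sketch, assuming a `progress` value from 0 to 1 and a hypothetical `maskSlideStyles` helper:

```typescript
// Styles for the clipping container and the sliding text at a given progress (0..1).
function maskSlideStyles(progress: number) {
  const p = Math.min(1, Math.max(0, progress));
  return {
    // The container hides anything outside its own box.
    container: { display: "inline-block", overflow: "hidden" },
    // At progress 0 the text sits one full line height below the box (fully clipped).
    text: { display: "inline-block", transform: `translateY(${(1 - p) * 100}%)` },
  };
}
```

Percentages in `translateY` are relative to the element's own height, so the text starts exactly one line below the window regardless of font size.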
Flip Up Effect
Flip Up uses CSS 3D transforms to rotate text around its horizontal axis (rotateX), starting face-down (rotated -90 degrees) and rotating to upright (0 degrees). Combined with a perspective value that creates realistic depth foreshortening, the text appears to 'flip' up from a surface below the screen, like a departures board or a physical sign rotating into view. The 3D perspective adds a visual depth dimension that purely 2D effects (Fade In, Scale Up, Mask Slide) cannot achieve. This makes Flip Up feel cinematic and premium — it suggests three-dimensional space in what is otherwise a flat text overlay. The effect is subtle enough for regular use but distinctive enough to add real production value, occupying a unique middle ground between the simplicity of Fade In and the drama of Bounce or Glitch.
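The rotation maps directly onto a CSS transform string. In this sketch the 600 px perspective and the function name are assumptions; a bottom `transform-origin` on the element would make the text hinge from its baseline.

```typescript
// CSS transform for a Flip Up style entrance at a given progress (0..1).
function flipUpTransform(progress: number, perspectivePx = 600): string {
  const p = Math.min(1, Math.max(0, progress));
  const angleDeg = -90 * (1 - p); // -90 degrees at the start, 0 when upright
  return `perspective(${perspectivePx}px) rotateX(${angleDeg}deg)`;
}
```

A smaller perspective value exaggerates the foreshortening; a larger one flattens it toward a plain vertical squash.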
Wave Effect
The Wave effect applies a per-character vertical oscillation following a sinusoidal wave function. Each character in the word moves up and down independently, with the phase offset creating a wave that travels through the text from left to right. Like Neon Pulse, Wave is a looping effect — the oscillation continues for as long as the word is on screen, creating perpetual motion that draws and holds the eye. The wave amplitude, frequency, and phase offset per character are calculated to create a natural, fluid ripple effect. The visual result resembles text floating on water, characters dancing, or letters bouncing on an invisible trampoline. Wave is one of the most visually distinctive effects in the library because it operates at the individual character level rather than the word level, and because it never stops moving.
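The per-character motion can be written as y_i(t) = A sin(2πft − iφ), where i is the character index. The amplitude, frequency, and phase step below are placeholder values chosen for illustration.

```typescript
// Vertical offset in px for each character of `word` at a given frame.
function waveOffsets(
  word: string,
  frame: number,
  fps: number,
  amplitudePx = 6,
  frequencyHz = 1.5,
  phaseStepRad = 0.6,
): number[] {
  const t = frame / fps;
  // Each character lags the previous one by `phaseStepRad`,
  // so the crest travels through the word from left to right.
  return Array.from(word).map(
    (_, i) => amplitudePx * Math.sin(2 * Math.PI * frequencyHz * t - i * phaseStepRad),
  );
}
```

Since the offsets depend only on the frame number, the loop never needs to "end": any frame can be rendered independently.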
Scramble Effect
The Scramble effect (also called Decode) displays the word as a string of random characters that progressively 'decode' into the correct letters from left to right. At the start of the effect, every character is randomized — letters, numbers, and symbols cycle rapidly. As the effect progresses, characters lock into their correct values from left to right: the first character resolves first, then the second, and so on until the full word is readable. Characters that haven't resolved yet continue to cycle through random glyphs, creating the appearance of a cipher being cracked or a password being decoded. This aesthetic is deeply rooted in hacker/cyberpunk culture — think The Matrix's falling code, password-cracking scenes in movies, or terminal-based text adventures. Scramble pairs naturally with tech, coding, AI, cybersecurity, and gaming content where the 'digital decode' metaphor resonates with the audience.
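A sketch of the left-to-right decode: characters before the resolved boundary show their real value, and the rest show a pseudo-random glyph chosen from the frame number so that renders are reproducible. The glyph set, the seeding scheme, and the function names are assumptions.

```typescript
const GLYPHS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789#$%&@";

// Deterministic pseudo-random number in [0, 1) from an integer seed (one mulberry32 step).
function seededRandom(seed: number): number {
  let t = (seed + 0x6d2b79f5) | 0;
  t = Math.imul(t ^ (t >>> 15), t | 1);
  t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
  return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
}

// Text to display at a given progress (0..1) and frame.
function scramble(word: string, progress: number, frame: number): string {
  const chars = Array.from(word);
  const p = Math.min(1, Math.max(0, progress));
  const resolvedCount = Math.floor(p * chars.length);
  return chars
    .map((ch, i) => {
      if (i < resolvedCount) return ch; // already decoded
      // Unresolved positions get a new glyph each frame.
      const r = seededRandom(frame * 1000 + i);
      return GLYPHS.charAt(Math.floor(r * GLYPHS.length));
    })
    .join("");
}
```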
Slide Left Effect
Slide Left uses the same clipping mask technique as Mask Slide, but reveals text horizontally rather than vertically. A container with overflow:hidden clips the text, and the reveal progresses from left to right (or right to left, following the name's directional cue). The result is a horizontal wipe that progressively uncovers the text from one side, like a curtain being drawn to reveal a sign. This horizontal directionality creates a different visual rhythm than vertical reveals: it follows the natural reading direction of left-to-right languages, making the reveal feel like the text is being 'written' or 'uncovered' in reading order. Slide Left shares Mask Slide's geometric precision and professional quality, but the horizontal axis gives it a different energy: more flowing and directional, less rising and appearing.
Flip Card Effect
Flip Card applies a 3D rotation around the Y-axis (rotateY), starting at 180 degrees (text facing away from the viewer, hidden) and rotating to 0 degrees (facing the viewer, fully visible). With CSS perspective applied, this creates the convincing illusion of a card being flipped toward the viewer. During the first half of the rotation the text faces away; at the 90-degree midpoint it is edge-on, and from there it swings around to face the viewer. This midpoint creates a natural 'reveal' moment: the word stays hidden until the card turns past edge-on, then swings into view, which adds dramatic timing that other effects don't provide. Flip Card is the most cinematic entrance animation in the library. Its Y-axis rotation suggests that the text existed before you saw it and was simply turned to face you, a concept that adds narrative depth to what is otherwise a text overlay. This makes it ideal for reveal moments, surprise statistics, and before/after transitions.
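The rotation and its midpoint can be sketched as follows. The example assumes the element uses `backface-visibility: hidden` so the reversed text is not shown while the card faces away; the 800 px perspective and the `frontFacing` flag are illustrative.

```typescript
// Style values for a Flip Card style entrance at a given progress (0..1).
function flipCard(progress: number, perspectivePx = 800) {
  const p = Math.min(1, Math.max(0, progress));
  const angleDeg = 180 * (1 - p); // 180 degrees (facing away) down to 0 (facing the viewer)
  return {
    transform: `perspective(${perspectivePx}px) rotateY(${angleDeg}deg)`,
    backfaceVisibility: "hidden",
    // The front of the card is only visible once it has turned past edge-on.
    frontFacing: angleDeg < 90,
  };
}
```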