Transcript output guide: SRT, VTT & TXT export formats + live caption sync

When you export speech-to-text transcripts, you need to choose the right output format for your use case. The three most common formats, SRT, WebVTT (VTT), and TXT, each serve different purposes and offer distinct capabilities. SRT works with nearly every video platform and player, WebVTT adds styling and positioning features for web video, and TXT produces searchable transcripts without timing information.
Understanding these format differences directly impacts your video accessibility, SEO performance, and viewer experience. The wrong format choice can result in broken captions, poor readability, or missing features that your platform supports. This guide explains each format's structure, capabilities, and limitations, then shows you when to use SRT for maximum compatibility, WebVTT for web-based projects, or TXT for content repurposing and accessibility compliance.
What is SRT?
SRT is a simple text file that stores subtitles with timestamps. This means you get a plain text document that tells video players exactly when to show each line of text on screen. SRT files work with almost every video platform and player because they use the most basic format possible—just text, numbers, and timing codes.
The format gets its name from SubRip, the DVD ripping software that first created this structure. Today, SRT is a de facto standard because it works almost everywhere, from YouTube to professional video editing software.
SRT file structure
Every SRT file follows the same four-part pattern that repeats for each subtitle:
1
00:00:05,000 --> 00:00:08,000
This is the first subtitle line.

2
00:00:08,500 --> 00:00:12,000
And this is the second subtitle
that spans two lines.

Here's what each part does:
- Sequential number: Each subtitle gets a number (1, 2, 3...) to show the order
- Timestamps: Start and end times in hours:minutes:seconds,milliseconds format
- Text content: The actual words that appear on screen, usually 1-2 lines
- Blank line: An empty line that separates each subtitle block
You must save SRT files with UTF-8 encoding and the .srt extension. UTF-8 handles international characters like accents or symbols, making your subtitles work across different languages.
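The four-part pattern above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Cue` tuple, `srt_timestamp`, and `build_srt` are hypothetical names chosen for this example, and cue times are assumed to be integers in milliseconds.

```python
from typing import Iterable, Tuple

# Hypothetical cue shape for this sketch: (start_ms, end_ms, text)
Cue = Tuple[int, int, str]


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as HH:MM:SS,mmm (SRT puts a comma before milliseconds)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(cues: Iterable[Cue]) -> str:
    """Join cues into SRT blocks: number, timing line, text, blank line."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n{text}\n"
        )
    # Each block ends with a newline, so joining on "\n" leaves one blank line between blocks.
    return "\n".join(blocks)


cues = [
    (5_000, 8_000, "This is the first subtitle line."),
    (8_500, 12_000, "And this is the second subtitle\nthat spans two lines."),
]
srt_text = build_srt(cues)
print(srt_text)

# Writing with an explicit encoding keeps accented characters intact:
# with open("captions.srt", "w", encoding="utf-8") as f:
#     f.write(srt_text)
```

Passing `encoding="utf-8"` explicitly matters because Python's default file encoding depends on the operating system.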
SRT capabilities and limitations
SRT handles the basics perfectly but can't do advanced formatting. Here's what you can and can't do:
What SRT supports:
- Plain text display
- Basic HTML tags like <b>, <i>, and <u>
- Speaker labels using brackets like [John]
- Line breaks for two-line subtitles
What SRT doesn't support:
- Text colors or custom fonts
- Positioning subtitles anywhere except bottom-center
- Animation effects or transitions
- Metadata like language information
Think of SRT as the reliable workhorse—it does exactly what you need without complications.
What is WebVTT?
WebVTT is a subtitle format designed specifically for web browsers and HTML5 video. This means it includes all of SRT's basic features plus advanced styling and positioning options that work natively in web browsers. WebVTT stands for Web Video Text Tracks and is standardized by the W3C (the organization that sets web standards).
The key advantage is native browser support—you don't need plugins or special software to display WebVTT captions in web video players.
WebVTT structure and syntax
WebVTT files look similar to SRT but have some important differences:
WEBVTT

00:00:05.000 --> 00:00:08.000
This is a WebVTT subtitle.

00:00:08.500 --> 00:00:12.000 position:10% align:left
<v Speaker1>This subtitle appears on the left side.

00:00:12.500 --> 00:00:16.000
<b>Bold text</b> and <i>italic text</i> are supported.

The main structural differences:
- Required header: Every WebVTT file must start with "WEBVTT" on the first line
- Period for milliseconds: Uses 05.000 instead of 05,000 like SRT
- Inline positioning: You can add position and alignment settings directly in the timestamp line
- Voice tags: Use <v Speaker> to identify who's talking
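Because the two formats differ mainly in the header and the millisecond separator, a basic SRT file can be converted to WebVTT mechanically. The sketch below shows one way to do it. `srt_to_vtt` is a hypothetical helper name, and it assumes well-formed SRT input with two-digit hours. The numeric cue indexes are left in place, since WebVTT accepts them as optional cue identifiers.

```python
import re

# Matches an SRT timing line such as "00:00:05,000 --> 00:00:08,000".
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})\s*$"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT: add the header and switch ',' to '.' in timing lines."""
    out = ["WEBVTT", ""]  # required header followed by a blank line
    for line in srt_text.strip().splitlines():
        match = SRT_TIMING.match(line)
        if match:
            start, start_ms, end, end_ms = match.groups()
            out.append(f"{start}.{start_ms} --> {end}.{end_ms}")
        else:
            out.append(line)  # cue numbers, text, and blank lines pass through
    return "\n".join(out) + "\n"


srt_sample = (
    "1\n00:00:05,000 --> 00:00:08,000\nThis is the first subtitle line.\n\n"
    "2\n00:00:08,500 --> 00:00:12,000\nSecond line.\n"
)
vtt_text = srt_to_vtt(srt_sample)
print(vtt_text)
```

Only timing lines are rewritten, so commas inside the caption text are left untouched.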
WebVTT styling and advanced features
WebVTT's real power comes from its styling capabilities that make captions look professional and accessible.
Text styling options:
- Colors and other CSS styling through class spans like <c.highlight> and the ::cue selector
- Bold, italic, and underline with <b>, <i>, and <u> tags
- Ruby text (<ruby> and <rt>) for languages that need pronunciation guides
Positioning controls:
- position:50% moves text horizontally across the screen
- align:center controls text alignment (left, center, right)
- line:90% sets vertical position from the top
- size:80% adjusts the width of the caption area
Speaker identification:
<v John>I think we should review the proposal.
<v Sarah>That's a great idea. When can we meet?

These features make WebVTT perfect when you want captions that enhance rather than distract from your video content.
The <v Speaker> tag is a valid WebVTT feature, but AssemblyAI does not automatically include it in VTT exports. Developers must enable speaker labeling, use the transcript results, and programmatically construct a custom VTT file if they want speaker-tagged captions.
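One way to construct such a file is sketched below. It assumes each utterance is a dictionary with `speaker`, `start`, `end` (both in milliseconds), and `text` keys. That shape is an assumption for illustration, so check it against the actual transcript response before relying on it. `vtt_timestamp`, `escape_cue_text`, and `utterances_to_vtt` are hypothetical helper names.

```python
from typing import Dict, List


def vtt_timestamp(ms: int) -> str:
    """Format milliseconds as HH:MM:SS.mmm (WebVTT uses a period before milliseconds)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def escape_cue_text(text: str) -> str:
    """Escape characters that WebVTT cue text would otherwise read as markup."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def utterances_to_vtt(utterances: List[Dict]) -> str:
    """Build a WebVTT document with one voice-tagged cue per utterance."""
    lines = ["WEBVTT", ""]
    for utt in utterances:
        lines.append(f"{vtt_timestamp(utt['start'])} --> {vtt_timestamp(utt['end'])}")
        lines.append(f"<v Speaker {utt['speaker']}>{escape_cue_text(utt['text'])}")
        lines.append("")  # a blank line ends the cue
    return "\n".join(lines)


# Assumed utterance shape, for illustration only.
sample_utterances = [
    {"speaker": "A", "start": 0, "end": 2500,
     "text": "I think we should review the proposal."},
    {"speaker": "B", "start": 2800, "end": 5200,
     "text": "That's a great idea. When can we meet?"},
]
speaker_vtt = utterances_to_vtt(sample_utterances)
print(speaker_vtt)
```

An utterance can run much longer than a comfortable caption, so a fuller version would likely split long utterances into several cues using word-level timings.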
TXT format for untimed transcripts
TXT format gives you plain text without any timing information. You get just the spoken words in a simple text file, with no timestamps and no formatting. While this isn't technically a subtitle format, TXT exports cover needs that timed formats handle poorly:
- Accessibility compliance: Provide text alternatives for screen readers
- SEO optimization: Search engines can read and index text content from your videos
- Content repurposing: Turn video content into blog posts or social media content
- Translation workflows: Translators work faster with plain text than timed formats
- Documentation: Create searchable records of meetings or interviews
The main advantage is simplicity—anyone can open, read, edit, and copy TXT files without special software.
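If you only have a timed file, you can recover an untimed transcript by stripping the cue numbers, timing lines, and inline tags. The following is a rough sketch for SRT input. `srt_to_plain_text` is a hypothetical helper, and it assumes blocks are separated by blank lines as described earlier.

```python
import re

# A timing line, accepting either ',' (SRT) or '.' (WebVTT) before milliseconds.
TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")
# Simple inline tags such as <b>, </i>, or <u>.
TAG = re.compile(r"</?[^>]+>")


def srt_to_plain_text(srt_text: str) -> str:
    """Drop cue numbers, timing lines, and inline tags; return the text as one paragraph."""
    pieces = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]  # skip the sequential number
        if lines and TIMING_LINE.match(lines[0]):
            lines = lines[1:]  # skip the timing line
        for line in lines:
            cleaned = TAG.sub("", line).strip()
            if cleaned:
                pieces.append(cleaned)
    return " ".join(pieces)


timed = (
    "1\n00:00:05,000 --> 00:00:08,000\nThis is the <i>first</i> subtitle line.\n\n"
    "2\n00:00:08,500 --> 00:00:12,000\nAnd this is the second subtitle\n"
    "that spans two lines.\n"
)
plain_text = srt_to_plain_text(timed)
print(plain_text)
```

The result is a single paragraph. Restoring paragraph or speaker breaks would need extra information that the timed file does not carry.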
SRT vs VTT vs TXT comparison
Each format excels in different scenarios. Your choice depends on where you're publishing and what features you need.
When to choose SRT: You need subtitles that work everywhere without complications. SRT is your safe choice for maximum compatibility.
When to choose WebVTT: You're publishing to web platforms and want professional styling, positioning, or live streaming capabilities.
When to choose TXT: You need searchable content, accessibility documentation, or material for content repurposing.
Platform-specific requirements
Different platforms have specific preferences that affect how your subtitles display:
YouTube: Accepts both SRT and WebVTT uploads. Its automatic timing feature works from an untimed transcript rather than a timed caption file.
Facebook: Only accepts SRT and strips all formatting tags, so keep it simple.
Vimeo: Supports both SRT and WebVTT with full styling capabilities.
LinkedIn: Accepts SRT but limits lines to 80 characters maximum.
Zoom: Uses WebVTT for webinar recordings and replay features.
Understanding these requirements prevents formatting issues and ensures your captions display correctly.
Live caption sync for streaming
Live streaming needs real-time caption generation that appears within 1-3 seconds of spoken words. Unlike pre-recorded video where you can perfect timing afterward, live captions must process and display instantly while maintaining accuracy.
WebVTT works best for live streaming because it supports progressive delivery—captions can be sent in individual chunks as they're created rather than waiting for complete files.
Key requirements for live captions:
- Sub-second speech processing to maintain natural flow
- Buffer management that balances speed with accuracy
- Error recovery for network interruptions or delays
- Format compatibility with streaming protocols
Modern streaming platforms use specialized APIs that convert speech to text in real-time. AssemblyAI's real-time streaming API returns JSON messages, and developers must transform those responses—such as `Turn` events—into VTT or SRT caption chunks for live caption workflows.
This enables features like live accessibility for webinars, real-time captions for virtual events, and instant subtitles for social media live streams.
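The transformation step might look something like the sketch below, which turns finished turns into WebVTT cues and ignores partial results. The message shape used here (a `type` of `"Turn"`, an `end_of_turn` flag, a `transcript` string, and a `words` list with `start` and `end` in milliseconds) is an assumption made for this example and should be checked against the streaming API's documentation. `turn_to_vtt_cue` is a hypothetical helper name.

```python
from typing import Dict, Optional


def vtt_timestamp(ms: int) -> str:
    """Format milliseconds as HH:MM:SS.mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def turn_to_vtt_cue(message: Dict) -> Optional[str]:
    """Return one WebVTT cue for a finished turn, or None for any other message."""
    if message.get("type") != "Turn" or not message.get("end_of_turn"):
        return None  # skip partial results and non-transcript messages
    words = message.get("words") or []
    text = (message.get("transcript") or "").strip()
    if not words or not text:
        return None
    start_ms, end_ms = words[0]["start"], words[-1]["end"]
    return f"{vtt_timestamp(start_ms)} --> {vtt_timestamp(end_ms)}\n{text}\n"


# Simulated messages in the assumed shape; a real client would read these from a socket.
messages = [
    {"type": "Turn", "end_of_turn": False, "transcript": "welcome to",
     "words": [{"start": 1200, "end": 1500, "text": "welcome"}]},
    {"type": "Turn", "end_of_turn": True, "transcript": "Welcome to the webinar.",
     "words": [{"start": 1200, "end": 1500, "text": "Welcome"},
               {"start": 2400, "end": 2900, "text": "webinar."}]},
]

chunks = ["WEBVTT\n"]  # the header goes out once, before the first cue
for message in messages:
    cue = turn_to_vtt_cue(message)
    if cue:
        chunks.append(cue)
live_vtt = "\n".join(chunks)
print(live_vtt)
```

Waiting for the end of a turn keeps captions stable at the cost of some latency. A lower-latency design could also display partial results and replace them once the turn completes.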
Subtitle export best practices
Following proven practices ensures your subtitles are readable and professional regardless of format.
Technical requirements:
- Use UTF-8 encoding for international character support
- Save files with correct extensions (.srt, .vtt, .txt)
- Test files with your target video players before publishing
Timing and readability:
- Keep each subtitle on screen for 1 to 5 seconds for comfortable reading
- Leave a gap of 0.5 to 1 second between subtitles to prevent fatigue
- Limit lines to 32-40 characters for optimal reading speed
- Use at most 2 lines per subtitle to avoid covering video content
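These timing and readability guidelines are easy to check automatically before publishing. The sketch below flags cues that break them. The thresholds mirror the numbers in the list above, and `check_cue` is a hypothetical helper name rather than part of any library.

```python
from typing import List

# Thresholds taken from the guidelines above; adjust them to your own style guide.
MIN_DURATION_MS = 1_000
MAX_DURATION_MS = 5_000
MAX_CHARS_PER_LINE = 40
MAX_LINES = 2


def check_cue(start_ms: int, end_ms: int, text: str) -> List[str]:
    """Return readability warnings for one cue (an empty list means it passes)."""
    warnings = []
    duration = end_ms - start_ms
    if duration < MIN_DURATION_MS:
        warnings.append(f"too short: {duration} ms")
    elif duration > MAX_DURATION_MS:
        warnings.append(f"too long: {duration} ms")
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        warnings.append(f"too many lines: {len(lines)}")
    for line in lines:
        if len(line) > MAX_CHARS_PER_LINE:
            warnings.append(f"line over {MAX_CHARS_PER_LINE} chars: {len(line)}")
    return warnings


print(check_cue(5_000, 8_000, "This is the first subtitle line."))
print(check_cue(0, 500, "Hi"))
print(check_cue(0, 6_000, "x" * 45))
```

Running a check like this over every cue before upload catches most readability problems without manual review.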
Content formatting:
- Break lines at natural speech pauses or grammatical boundaries
- Maintain consistent speaker identification throughout files
- Include speaker labels for off-screen voices or multiple speakers
File management:
- Use descriptive naming like video-title_english_v1.srt
- Keep original and edited versions in separate folders
- Maintain version control for multiple edits or translations
These practices ensure your subtitles enhance rather than distract from the viewing experience.
Final words
Subtitle formats serve different needs in the modern video ecosystem. SRT provides universal compatibility when you need captions that work everywhere. WebVTT offers advanced features for web-based video with styling and positioning capabilities. TXT transcripts handle non-synchronized needs from SEO optimization to accessibility compliance.
Modern Voice AI platforms simplify the entire workflow from audio input to formatted subtitle output. AssemblyAI provides direct export endpoints for basic SRT and VTT subtitle files, but these exports do not include speaker labels. To create speaker-labeled subtitle files, developers must use the transcript JSON response—specifically the `utterances` array—and generate the subtitle file manually. The subtitle export endpoint supports limited formatting controls, such as `chars_per_caption`, which can indirectly affect subtitle segmentation. It does not offer direct controls for subtitle duration or gap timing.
Frequently asked questions
Should I use SRT or WebVTT for YouTube videos?
Either works. YouTube accepts both SRT and WebVTT uploads, and you can adjust timing and text afterward in YouTube's caption editor. Choose SRT if you plan to publish the same file on other platforms, or WebVTT if you already maintain VTT files for web players. YouTube's automatic timing works from an untimed transcript, so it does not depend on which timed format you pick.
Can I edit subtitle files in a regular text editor?
Yes, both SRT and WebVTT files are plain text formats you can edit in Notepad, TextEdit, or any text editor. Just make sure to save the file with the correct encoding (UTF-8) and file extension (.srt or .vtt) to ensure compatibility with video players.
Do subtitle files work across different video platforms?
SRT files work on virtually all video platforms and players, making them the safest choice for cross-platform use. WebVTT files work on most modern web platforms but may not be compatible with older software or non-web video players. TXT files aren't subtitle formats but can be imported into most caption editing software.
What happens if my subtitle timing is off?
Incorrect timing makes captions appear too early or too late relative to speech, creating a confusing viewing experience. You can fix timing issues by editing the timestamp values in the subtitle file or using video editing software with subtitle timing adjustment features. Most platforms also offer built-in caption editors for making timing corrections.
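When every caption is off by the same amount, editing timestamps by hand is tedious, and a small script can apply one offset to the whole file. The following is a sketch under stated assumptions: `shift_timestamps` is a hypothetical helper, and it expects timestamps in the full HH:MM:SS form with either a comma (SRT) or a period (WebVTT) before the milliseconds.

```python
import re

# HH:MM:SS,mmm or HH:MM:SS.mmm; the separator is captured so it can be preserved.
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})([,.])(\d{3})")


def shift_timestamps(subtitle_text: str, offset_ms: int) -> str:
    """Shift every timestamp on timing lines by offset_ms (negative means earlier)."""

    def shift(match):
        h, m, s, sep, ms = match.groups()
        total = (int(h) * 3600 + int(m) * 60 + int(s)) * 1000 + int(ms) + offset_ms
        total = max(total, 0)  # clamp so no cue starts before zero
        hours, rem = divmod(total, 3_600_000)
        minutes, rem = divmod(rem, 60_000)
        seconds, millis = divmod(rem, 1_000)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"

    shifted = []
    for line in subtitle_text.splitlines():
        # Only touch timing lines so times quoted in dialogue stay unchanged.
        shifted.append(TIMESTAMP.sub(shift, line) if "-->" in line else line)
    result = "\n".join(shifted)
    return result + "\n" if subtitle_text.endswith("\n") else result


late = "1\n00:00:05,000 --> 00:00:08,000\nHello\n"
print(shift_timestamps(late, 1_500))
```

A constant offset only fixes uniform drift. If captions fall further out of sync as the video plays, the cause is usually a frame-rate mismatch, which needs scaling rather than shifting.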
