Video caption making

This page was last edited on 27 December 2022, at 21:57.

Captions in videos help viewers follow what is being said, aid viewers who are deaf or hard of hearing, provide a translation into another language, or add context or commentary about the subject at hand.

When making captions for videos, the biggest choice is whether the captions will become part of the video image itself or whether the caption information will stay separate from the video information. There is no unified terminology for these two options, but some call the first burned-in captions and the second embedded captions.

Embedded captions

From a video project management perspective, the embedded caption is more flexible and easier to manage than the burned-in caption. In an embedded caption, the caption text is stored separately from the video information. Time information is added to the caption text to tell the video player when to show each piece of caption.

In a computer environment, the caption text can be a text file that is separate from the video file. It can also live in the same file as the video, with the file internally storing text and video information separately. DVD and post-DVD technology also support embedded captions: the disc reserves a portion of its space for caption information, in multiple languages or versions. Before DVD, in the days of VHS tapes, there was no embedded caption support, and the captions were burned into the video itself.

The biggest advantage of embedded captions over burned-in captions is their flexibility. Specifically:

  • When playing a video with embedded captions, the video player is technically displaying two separate images: the video itself and, layered on top of it, the captions. Because of this, the caption "layer" can be freely adjusted: font size and color, caption location, even time sync. In theory, the captions could even be shown on a completely separate screen, such as participants' smartphones.
    • Often, a caption size that is large enough for one person viewing on a computer screen needs to be increased when the video is projected onto a large screen for a group.
    • Caption size may need to be increased if the viewers are mostly seniors with weaker vision.
    • In some screening environments, the bottom of the screen is obstructed by people's heads, which makes the captions harder to see since they are at the bottom by default. With embedded captions, it's possible to move the captions to the top of the screen.
  • Multiple captions (usually in multiple languages) can be provided for one video, letting the viewer choose which language to use. To achieve the same outcome with burned-in captions, we must create an entire copy of the video for each language, which uses more storage space.
  • Even after the initial production, the captions can be edited at any time by anyone, with no specialized software required. All that is needed is a text editor. Because the captions can always be edited, there is less pressure to have "the perfect captions" by the end of the production cycle.
  • Because editing the captions is so easy, creating, proofing and revising captions during production is much simpler than when relying on dedicated video editing software like Adobe Premiere. When the video producer (who is not bilingual in the target language) and the caption maker are different people, the caption making process in video editing software usually goes like this: the producer makes the video -> the translator types each line with time codes and emails it to the producer -> the producer copies and pastes the captions at each time code -> the translator revises the lines pasted at the wrong time codes -> the proofreader submits typed feedback on each correction -> the producer incorporates it all back in again. With embedded captions, the producer can stay completely out of the caption making process.

There are some disadvantages as well.

Not every video player supports captions, and in some video players the setting is buried deep in the preferences page. So if you go somewhere new (for example, a university campus) with only your video on a USB stick, there is no guarantee that the video with captions will work out of the box. Relying on a video hosted on an online platform like YouTube is generally not a good idea, since you depend on the quality of the internet connection on-site. Ironically, though, YouTube can be a good backup for captioned videos, since it supports captions.

Depending on the video player, captions may not be displayed clearly or in a large font, and there may be little flexibility in how they are displayed. Gom Player, by contrast, is extremely flexible in the way it displays embedded captions.

In general, independent video producers are more at home with making burned-in captions, because too often their client is not technology-savvy enough to play the video properly with embedded captions. So even though the producer delivered a working video, the captions do not load and the producer is asked for follow-up, which is basically tech support. After many experiences like these, it is likely that many video makers have permanently switched to burned-in captions.

Burned-in captions

With burned-in captions, the captions technically remain "embedded" (meaning they are editable) inside the video editing software, but when the software generates the final video, the letters become a part of the image in the video.

The main advantage of the burned-in caption is that it will be displayed in any setting. It can also be convenient for the video producer, since the captions are all managed in one place (the video editing program), especially if only a few lines of caption are needed throughout the video, as opposed to captioning the entire video. Also, as the captions are a part of the video, there are no limits to how creatively they can be styled. Korean and Japanese entertainment programs are known for making partial captions (they do not capture everything that is said in the program) that are themselves a part of the humor, styled colorfully, in big letters, and intermixed with graphics.

How to make embedded captions

In a computer setting, the embedded caption is a simple text file. Its content follows a simple format that shows the text along with timecodes for the text. The timecodes specify when each caption will be shown.

The internals

You do not need to know about the internal structure of a caption file. You can use one of the many tools that create the caption format automatically when you provide them with time and text information. If that is all you need, you can skip ahead to the next section.

Below is an example of the SBV (SubViewer) caption format, which is used by YouTube:

0:01:27.000,0:01:29.000 (A)
so we want to make sure that (B)

0:01:29.000,0:01:31.000 (C)
we're registering as many people as possible today (D)

Line A, which is the timecode for Line B, indicates that Line B will be shown from the 1 minute 27 second mark until the 1 minute 29 second mark. Line C, which is the timecode for Line D, indicates that Line D will be shown from the 1 minute 29 second mark until the 1 minute 31 second mark.

Below is an example of the SMI (SAMI) caption format, which is supported by Gom Player, for the same portion as shown above:

<SYNC Start=87000><P>so we want to make sure that (B)
<SYNC Start=89000><P>we're registering as many people as possible today (D)
<SYNC START=91000><P>&nbsp;

Here, the timecode is formatted differently. The 87000 on Line B indicates that the caption will be shown from the 87,000 millisecond mark, that is, the 87 second mark, or 1 minute (60 seconds) and 27 seconds. The caption from Line B will be replaced by Line D, which starts at the 1 minute 29 second mark.

Unlike the first format shown above, the SMI format does not specify the end point of each caption, so the last caption would otherwise stay on screen forever. When we don't want to display anything on screen, we add a "blank" line, borrowing the "&nbsp;" code from HTML, to override the previous caption.
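The relationship between the two timecode styles can be sketched in Python. This is only an illustration under stated assumptions: the function names stamp_to_ms and cues_to_smi_lines are made up for this example, and the output contains only SYNC lines like the ones shown above, not a complete caption file.

```python
def stamp_to_ms(stamp):
    """Convert a timestamp such as '0:01:27.000' to milliseconds."""
    hours, minutes, seconds = stamp.split(":")
    whole, _, frac = seconds.partition(".")
    millis = int((frac + "000")[:3])  # pad or trim the fraction to 3 digits
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 1000 + millis


def cues_to_smi_lines(cues):
    """Turn (start, end, text) cues into SMI-style SYNC lines.

    A blank "&nbsp;" line is added whenever the next cue does not start
    exactly where the current one ends, so the caption gets cleared.
    """
    lines = []
    for i, (start, end, text) in enumerate(cues):
        start_ms, end_ms = stamp_to_ms(start), stamp_to_ms(end)
        lines.append(f"<SYNC Start={start_ms}><P>{text}")
        next_start = stamp_to_ms(cues[i + 1][0]) if i + 1 < len(cues) else None
        if next_start != end_ms:
            lines.append(f"<SYNC Start={end_ms}><P>&nbsp;")
    return lines


cues = [
    ("0:01:27.000", "0:01:29.000", "so we want to make sure that"),
    ("0:01:29.000", "0:01:31.000", "we're registering as many people as possible today"),
]
smi_lines = cues_to_smi_lines(cues)
print("\n".join(smi_lines))
```

With the two captions from the example, the first caption needs no blank line because the second one starts at the same moment it ends; only the final caption is followed by a "&nbsp;" line, at the 91000 millisecond mark.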

Embedded captions using Excel

For translating and captioning an entire video with a high-quality translation, I recommend using Excel for the captions, which makes the process of translation and proofreading much more streamlined, more so even than using dedicated tools.

Here are the steps involved:

Document and Spreadsheet used in the above example

These steps assume that the original video audio is in English, and that we want to add Korean captions to it. Even someone who is not technology-savvy will be able to follow steps 1-5. Steps 6 and 7 may require a little more knowledge, and Step 8 may be a bit more demanding technology-wise.

  1. Transcribe the entire video in English. Keep the text in paragraph format if possible, as this makes the translation easier. If we plan on displaying English captions on the video as well, be sure to transcribe carefully. If we only plan on displaying Korean, the quality of the English transcription is not as important, since it's only used for production purposes. Another use for the English and translated text is to include the full text in the video description or, if the video is posted on a website, to add it below the video. This provides more metadata that can help people researching related topics: their search keywords may not match the title of the video, but they may match lines said in the video, which will be in the transcript.
  2. Translate the text.
  3. Proofread the text.
  4. Split the English text into caption-friendly chunks, one per line. Usually 2-3 seconds per line is best, although 1 second is okay if the text is short enough. Just listen to the whole video while going over the text, pressing Enter to mark each break.
  5. Split the Korean text into the same number of lines as the English.
  6. Copy and paste the English text and Korean text into a spreadsheet that has pre-entered formulas that convert timecodes and text into caption formats.
  7. Time each line. All you have to enter is the seconds. The spreadsheet linked above automatically adds a minute each time the entered value drops from a higher second mark (e.g. 55 seconds) to a lower one in the next minute (e.g. 3 seconds), so an experienced person could time code the entire video in 1-2 passes, entering the seconds live.
  8. Copy and paste the SMI or SBV column into a new text file. If using Korean or another non-Latin script, be sure to save the text file with UTF-8 encoding. (Use a fully featured editor like Notepad++ for this purpose.)
    1. In the case of the SBV format, replace the text string "linebreak" with an actual line break ("\n").
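The timing logic described in Step 7 can be sketched in Python. This is not the spreadsheet's actual formula, only an approximation of the behavior described above; the function names and the start_minute parameter are assumptions made for this illustration.

```python
def expand_seconds(seconds_marks, start_minute=0):
    """Turn typed seconds (0-59) into absolute seconds, adding a minute
    whenever the value drops (for example 55 followed by 3)."""
    absolute = []
    minute = start_minute
    previous = None
    for sec in seconds_marks:
        if previous is not None and sec < previous:
            minute += 1  # the seconds wrapped around into a new minute
        absolute.append(minute * 60 + sec)
        previous = sec
    return absolute


def to_sbv_stamp(total_seconds):
    """Format whole seconds as an SBV timestamp, e.g. 87 -> '0:01:27.000'."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}.000"


def build_sbv(start_marks, texts, final_end_mark, start_minute=0):
    """Build SBV text where each caption ends when the next one starts."""
    if len(start_marks) != len(texts):
        raise ValueError("every caption line needs exactly one start time")
    times = expand_seconds(list(start_marks) + [final_end_mark], start_minute)
    blocks = [
        f"{to_sbv_stamp(start)},{to_sbv_stamp(end)}\n{text}"
        for start, end, text in zip(times[:-1], times[1:], texts)
    ]
    return "\n\n".join(blocks) + "\n"


texts = [
    "so we want to make sure that",
    "we're registering as many people as possible today",
]
sbv_text = build_sbv([27, 29], texts, final_end_mark=31, start_minute=1)
print(sbv_text)
```

One limitation of this approach is that a silent gap longer than a minute cannot be detected from the seconds alone, so such gaps would need a manual correction.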

Most video players will recognize the caption file if it has the same file name as the video, with only the file extension being different. You are done! The caption is finished.
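As a rough sketch of the naming rule and the UTF-8 requirement, the caption file could also be written with a few lines of Python instead of a text editor. The file name talk.mp4, the helper names, and the sample Korean text are all made up for this illustration.

```python
import tempfile
from pathlib import Path


def caption_path_for(video_path, extension=".smi"):
    """Return the caption path: same name as the video, different extension."""
    return Path(video_path).with_suffix(extension)


def save_caption(video_path, caption_text, extension=".smi"):
    """Write the caption text next to the video, always encoded as UTF-8."""
    target = caption_path_for(video_path, extension)
    target.write_text(caption_text, encoding="utf-8")
    return target


# Demonstration in a temporary folder so no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    video = Path(folder) / "talk.mp4"  # hypothetical video file name
    # Sample Korean caption text ("hello") to exercise the UTF-8 encoding.
    saved = save_caption(video, "<SYNC Start=87000><P>안녕하세요\n")
    saved_name = saved.name
    round_trip = saved.read_text(encoding="utf-8")

print(saved_name)
```

Passing extension=".sbv" would produce the matching SBV file name in the same way.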

YouTube allows you to upload pre-made SBV files made in this way. YouTube also used to have a community translation program (community contributions, discontinued in 2020), where YouTube channel owners could opt in to allow any viewer to volunteer their time to translate a video and submit the captions. For example, Crash Course is a major YouTube channel that benefited from captions in dozens of languages voluntarily submitted by its viewers.

Embedded captions using dedicated tools

There are dedicated programs that let you enter time codes and text in their interface and generate caption files as output. The method above is better, however, especially when reviewing translations of complex sentences: language works better in context, which the line-by-line nature of these programs makes harder to follow, whereas in a Word document a translator is in their "natural environment", so to speak.

YouTube and Facebook also have their own online interface through which it is possible to add captions.

How to make burned-in captions

Usually, video editing programs include some functionality to generate burned-in captions. Adobe Premiere allows the user to export the captions as either burned-in or embedded captions. It even allows users to import external embedded caption files, although it doesn't seem to support Unicode in its imports.