Anatomy of a Laugh Track

Introduction

Laugh tracks have been a familiar part of sitcoms and other television comedy shows since the early 1950s. A laugh track can be defined as a prerecorded superposition of samples of people laughing, mixed into a television show to "enhance" it. A single laugh track can combine a variety of male and female laughter. Individual, distinct laughs are often added at the end so that the track sounds like the laughter of a live audience rather than something generated from prerecorded samples. All of this makes it very difficult to detect a laugh track with relatively simple heuristics.

Frequency Analysis

Comparing the Discrete Fourier Transforms (DFTs) of a laugh track (Fig. 1) and a two-minute sample of a television show (Fig. 2), we notice some characteristics of the laugh track, such as a slightly larger spike at one of the mid-range frequencies. This spike is difficult to detect, however, and otherwise there is not much difference between the spectra of the two signals. The spikes we see are also not characteristic of every laugh track, so it would be difficult to detect laugh tracks solely by looking in the frequency domain. Once we factor in variability, such as the dominance of male or female voices in an individual laugh track and the varying lengths and intensities of a laugh, detection using DFTs and a bandpass filtering scheme becomes even more difficult.

TV Show Sample
Laugh Track Sample
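
The spectral comparison described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the analysis code used for the figures: the function names, the Hann window, and the band limits passed to band_energy_fraction are all hypothetical choices, and the demo signal is a synthetic tone rather than real audio.

```python
import numpy as np

def magnitude_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, DFT magnitudes) for a real-valued signal."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(len(signal))  # taper the ends to reduce spectral leakage
    spectrum = np.fft.rfft(signal * window)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, np.abs(spectrum)

def band_energy_fraction(signal, sample_rate, low_hz, high_hz):
    """Fraction of the spectral energy that lies inside [low_hz, high_hz]."""
    freqs, mags = magnitude_spectrum(signal, sample_rate)
    power = mags ** 2
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    total = power.sum()
    return float(power[in_band].sum() / total) if total > 0 else 0.0

# Synthetic demo (assumption): one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

freqs, mags = magnitude_spectrum(tone, sample_rate)
peak_hz = freqs[np.argmax(mags)]  # frequency of the largest spike
```

Applying band_energy_fraction to a laugh track clip and to a clip of ordinary dialogue would give one number per clip to compare. As the text notes, such a mid-range difference is small and not present in every laugh track, so a fixed band threshold is unlikely to be reliable on its own.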

Time Analysis

It is much easier to look for a laugh track in the time domain, because the characteristic envelope of a laugh track, shown in the figure below, is much more prominent than the spikes in the DFTs of the two signals. This envelope follows the magnitude of a typical laugh track: after a joke, people abruptly start laughing, and then the laughter slowly dies down. We found this to be the case in almost every laugh track instance in the TV shows that we looked at. Even if a laugh track is short, we see the same envelope, just compressed in time relative to the example. If a laugh track is quieter or louder, the envelope is still visible, but detection based on height thresholds becomes more difficult, because regular speech may look very similar to the envelope when a laugh track is short and quiet.

Laugh Track in the Time Domain
A laugh track in the time domain, with the characteristic envelope drawn in red on the original waveform.
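
One simple way to look for this shape is to rectify the signal, smooth it with a moving average, and compare how quickly the envelope rises to its peak with how slowly it falls afterward. The sketch below illustrates that idea and is not necessarily the algorithm used in our detection scheme: the function names, the 50 ms smoothing window, the 10% floor, and the 2:1 decay-to-attack ratio are all assumed values chosen for the example, and the test signal is synthetic noise rather than a real laugh track.

```python
import numpy as np

def amplitude_envelope(signal, sample_rate, window_s=0.05):
    """Rectify the signal and smooth it with a moving average."""
    rectified = np.abs(np.asarray(signal, dtype=float))
    width = max(1, int(window_s * sample_rate))
    kernel = np.ones(width) / width
    return np.convolve(rectified, kernel, mode="same")

def attack_and_decay(envelope, sample_rate, floor=0.1):
    """Return (attack, decay) in seconds around the envelope's peak.

    Attack runs from the last sample before the peak that is below
    floor * peak up to the peak; decay runs from the peak to the first
    later sample that is below floor * peak.
    """
    envelope = np.asarray(envelope, dtype=float)
    peak = int(np.argmax(envelope))
    below = envelope < floor * envelope[peak]
    before = np.flatnonzero(below[:peak])
    after = np.flatnonzero(below[peak:])
    start = before[-1] if before.size else 0
    end = peak + after[0] if after.size else len(envelope) - 1
    return (peak - start) / sample_rate, (end - peak) / sample_rate

def has_laugh_shape(envelope, sample_rate, ratio=2.0):
    """True if the decay lasts more than `ratio` times the attack."""
    attack, decay = attack_and_decay(envelope, sample_rate)
    return decay > ratio * attack

# Synthetic demo (assumption): noise that starts abruptly at 0.5 s
# and then dies away exponentially, mimicking the envelope above.
sample_rate = 8000
rng = np.random.default_rng(0)
t = np.arange(3 * sample_rate) / sample_rate
shape = np.where(t < 0.5, 0.0, np.exp(-(t - 0.5) / 0.6))
burst = shape * rng.standard_normal(t.size)

env = amplitude_envelope(burst, sample_rate)
attack, decay = attack_and_decay(env, sample_rate)
```

Because the test compares attack and decay durations rather than absolute heights, it is less sensitive to how loud the laugh track is, which is the weakness of height thresholds noted above. A real detector would also need to slide over the recording segment by segment rather than assume a single peak.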

Conclusion

Other approaches to detecting laugh tracks were also considered, but envelope detection in the time domain proved to be the most effective. Matched filtering and searching for distinctive characteristics in the frequency domain both proved fruitless. The envelope in the figure above is quite distinctive and relatively easy to detect, even with a fairly simple algorithm, so this approach was used in our detection and removal scheme.