Recently I was working on an algorithm to detect untagged commercials aired on different TV channels.
The purpose was simple. Once a commercial is detected, it can be tagged for later recognition.
The task was quite challenging, since the audio stream of a commercial does not differ noticeably in its spectrogram or in any other audio property that would allow finding it automatically.
So I decided to apply a simple approach: fingerprint the entire content that runs on TV channel A during 24h, and then query it against the content from TV channel B. The goal was to detect repeating content. I assumed that the repeating content would contain lots of commercials.
This assumption paid off. I started to “catch” lots of commercials aired on different TV channels.
Let’s start with a density plot of the repetitions. It was generated by fingerprinting and comparing a 24h period of CNN vs CNBC streams. The X-axis shows the length of the repeating chunks, and the Y-axis shows how often they occur.
By looking at the density plot, we quickly spot more repetitions that are 5, 15, 30, and 60 seconds long. It makes total sense, as commercials are generally of “discrete” length that divides 60 evenly.
From the graph, we can see that repetitions of 15 and 30 seconds are the most frequent. After checking those time frames, my assumption proved to be correct: all of them were commercials.
Interestingly enough, there are no repeating regions of 40 and 50 seconds.
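As a rough sketch of how the counts behind such a density plot could be computed, the snippet below rounds each repetition length to the nearest second and counts occurrences. The function name `length_histogram`, the `bin_width` parameter, and the sample lengths are all illustrative assumptions, not the original implementation or real measurements.

```python
from collections import Counter


def length_histogram(lengths_sec, bin_width=1):
    """Count repetitions per length bin (lengths rounded to the nearest bin)."""
    counts = Counter()
    for length in lengths_sec:
        # Snap the length to the nearest multiple of bin_width.
        nearest_bin = round(length / bin_width) * bin_width
        counts[nearest_bin] += 1
    # Return the bins in ascending order of length.
    return dict(sorted(counts.items()))


# Hypothetical repetition lengths in seconds (made-up sample data).
sample_lengths = [14.9, 15.2, 15.0, 29.8, 30.1, 6.4, 60.0]
histogram = length_histogram(sample_lengths)
for length_bin, count in histogram.items():
    print(f"{length_bin:>3}s: {count}")
```

Peaks in this histogram at 15 and 30 would correspond to the dense regions described above.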
Next, let’s plot the same graph for CNN vs FOX across the same day.
The plot is slightly different: we see denser regions of non-discrete repetitions, specifically around 6, 7, and 8 seconds. That’s because CNN and FOX share a lot more news content than CNN and CNBC do. These repetitions are simply news footage of the same day’s events.
By the way, after running repeating content detection on four different TV channels, the longest repetition was a 4-minute-long Donald Trump interview about China and the trade war.
To analyze just the commercials, I kept only the repetitions that are 5, 10, 15, 20, 30, 45, or 60 seconds long. Next, to catch more ads, I fingerprinted FOX, CNBC, and MSNBC vs CNN data.
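A minimal sketch of this filtering step might look like the following. The post does not say how close a repetition has to be to a standard length, so the `tolerance` of half a second is purely an assumption, as are the function names and the `(start_sec, length_sec)` tuple layout.

```python
# Standard commercial lengths (in seconds) kept in the analysis.
COMMERCIAL_LENGTHS = (5, 10, 15, 20, 30, 45, 60)


def nearest_commercial_length(length_sec, tolerance=0.5):
    """Return the standard length within `tolerance` seconds, or None."""
    closest = min(COMMERCIAL_LENGTHS, key=lambda std: abs(std - length_sec))
    return closest if abs(closest - length_sec) <= tolerance else None


def keep_commercials(repetitions, tolerance=0.5):
    """Keep only repetitions whose length matches a standard commercial length.

    `repetitions` is a list of (start_sec, length_sec) tuples (assumed layout).
    """
    return [
        (start, length)
        for start, length in repetitions
        if nearest_commercial_length(length, tolerance) is not None
    ]


# Hypothetical repetitions: a 30s ad, a 7s news clip, and a 45s ad.
sample = [(0, 30.1), (100, 7.3), (200, 44.8)]
commercials = keep_commercials(sample)
print(commercials)
```

A tighter or looser tolerance trades missed ads against news clips that happen to be close to a standard length.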
Querying CNN data against three other major TV channels provided enough repeating regions to find peak hours and other interesting data.
Just a note: the described approach identifies only commercials that run across different TV channels. Channel-specific commercials go undetected.
Below is the heatmap of repeating commercials on (FOX + CNBC + MSNBC) vs CNN during a 24h period.
Let’s also plot the heatmap of (CNN + CNBC + MSNBC) vs FOX and see if we can spot any similarities.
Neither FOX nor CNN prefers putting commercials in the first 10 minutes of the hour. These slots receive the fewest repetitions.
Interestingly enough, both FOX and CNN have a decreased amount of cross-commercials during the 8pm-10pm time frame.
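One way to build the grid behind such a heatmap is sketched below. It assumes the heatmap axes are hour of day and minute of the hour, and that each repetition is stored as a `(start_sec, length_sec)` tuple with the start measured in seconds since midnight; both are assumptions made for illustration, and `commercial_heatmap` is a hypothetical name.

```python
def commercial_heatmap(repetitions):
    """Build a 24 x 60 grid where grid[hour][minute] counts the
    commercials that start in that minute of that hour."""
    grid = [[0] * 60 for _ in range(24)]
    for start_sec, _length_sec in repetitions:
        hour = int(start_sec // 3600) % 24
        minute = int(start_sec % 3600 // 60)
        grid[hour][minute] += 1
    return grid


# Hypothetical repetitions: two ads at 20:12 and one at 00:01.
sample = [
    (20 * 3600 + 12 * 60 + 5, 30),
    (20 * 3600 + 12 * 60 + 40, 15),
    (61, 15),
]
grid = commercial_heatmap(sample)
print(grid[20][12], grid[0][1])
```

With real data, low counts in columns 0 to 9 of every row would reflect the quiet first 10 minutes of the hour.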
Now let’s see at what hour of the day we get more ads aired. Below is a plot of how many repeating seconds occur at different hours of the day on CNN, FOX, and MSNBC. All hours are ET.
This graph doesn’t reveal much information, aside from the fact that different TV channels prefer almost the same commercial time across different hours of the day.
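The per-hour totals for a plot like this can be sketched as a simple sum of repetition lengths grouped by hour. The snippet assumes `(start_sec, length_sec)` tuples with the start in seconds since midnight, and it attributes each repetition entirely to the hour in which it starts; both choices are simplifying assumptions.

```python
def seconds_per_hour(repetitions):
    """Sum repeating seconds for each hour of the day (0-23).

    A repetition is counted in the hour where it starts, even if it
    runs past the hour boundary (a simplification).
    """
    totals = [0.0] * 24
    for start_sec, length_sec in repetitions:
        hour = int(start_sec // 3600) % 24
        totals[hour] += length_sec
    return totals


# Hypothetical repetitions: two ads in the 01:00 hour, one in the 23:00 hour.
sample = [(3600, 30), (3700, 15), (82800, 60)]
totals = seconds_per_hour(sample)
daily_total = sum(totals)
print(totals[1], totals[23], daily_total)
```

Summing the 24 values gives the total repeating seconds for the whole day.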
Having this info, we can now take a look at how many hours of commercials are aired during the day.
|Station|Length in seconds for 24h period|
Another interesting insight is the preferred length of a commercial at different times of the day. Let’s plot a scatterplot of FOX commercial lengths during different hours of the day.
There is definitely a preference for shorter ads in the 8pm-10pm time slot. Also, a 45-second commercial is a rare breed.
The final plot is a word cloud of the words used during the repetitions detected across CNN and CNBC.
If you watch this word cloud long enough, you can start hearing a typical TV commercial in the background.