Jogging TV, Sprinting YouTube

Oleg Sobchuk
6 min read · May 16, 2018

Recently I learned about Vox Borders, a travel show on YouTube, and immediately became a fan. I binge-watched all the episodes in one breath (fortunately, they were short), exhaled, and then asked myself: Why am I so attracted to it, almost addicted? This personal (and thus not too interesting) question led me to a more general (and much more interesting) problem: Does television evolve, and if so, in what direction?

It doesn’t take much thinking to figure out that the coolness of Vox Borders doesn’t exclusively lie in its content. Yes, its topics are entertaining, but so are the topics of many other travel shows. Instead, it is cool because of the form, because of how this show is made. You’ll notice this yourself if you watch this 4-minute episode — one of the shortest.

So: not content, but form. That is, not the figure of the presenter, not the vending machines, not the chopsticks or robots, but shots, camera movement, and music. In most TV shows, all of those are carefully hidden and hard to notice: they are simply a scaffolding for telling a compelling story. That’s how it usually is.

Usually, but not in this case. Here, the form is not hidden; it is pushed to the forefront. It is almost as if this episode weren’t about Japan but about film editing, with Japan serving as just an excuse: raw material as good as any other.

For comparison, check out another video, also about Japanese vending machines. This is an excerpt from Departures, a Canadian travel TV show from 2008. Ten years ago it was popular, even acclaimed, but to a contemporary viewer (me) it looks about as entertaining as cave paintings. (I am not an archaeologist, by the way, so this isn’t a compliment.)

My personal taste tells me that the ten years separating Departures and Vox Borders were not wasted: there has been clear progress on pretty much every level of video storytelling. Take, for example, the overall plot schema. Instead of two random guys poking at unusual things in a foreign country, we have a narrative structured around an intriguing question: Why does Japan have so many vending machines? Such question-oriented narration is a proven technique for arousing viewers’ curiosity. But even more importantly, Vox Borders seems much more dynamic. There are many more things happening per minute of screen time. Can we somehow measure this “many more things” quantitatively?

Luckily, there exists cinemetrics: a relatively new research program devoted to the scientific study of cinema. Scholars of cinemetrics measure various aspects of movies and then use this data to explain film production, aesthetics, or history. One of the key aspects measured (actually, the key aspect) is average shot length. Shots (the chunks of film between cuts) play pretty much the same role for a cinemetrics scholar as words do for a linguist: they are the central units of analysis.

I used a simple shot-counting tool to compare average shot length (ASL) in Vox Borders and Departures. Not unexpectedly, ASL in the Vox Borders episode turned out to be much shorter: 2.1 seconds, versus 3.6 seconds in Departures. We can get a slightly better idea of the difference between the two videos if we look at the full distributions of their shot lengths. Look: the Vox Borders episode includes sequences of 5–7 shots with an ASL of less than a second! “Many more things,” then, may mean “many more shots.”

Vox Borders

Nothing like this happens in the Departures episode. Most shots are longer than a second; besides, the shorter shots here aren’t lumped together in sequences, as they are in Vox Borders. Instead, they alternate evenly with the longer shots.

Departures
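For readers who want to try this kind of measurement themselves, here is a minimal sketch of the arithmetic. It is not the tool I used; it simply assumes you already have a list of cut timestamps in seconds (the first being the start of the video, the last its end). The function names, the one-second threshold, the minimum run of five shots, and the sample timestamps are all invented for illustration. The sketch also reads a "sequence" as a run of consecutive shots that are each under the threshold, which is one possible interpretation.

```python
from typing import List, Tuple


def shot_lengths(cut_times: List[float]) -> List[float]:
    """Turn sorted cut timestamps (seconds) into per-shot durations.

    n timestamps (start of video, every cut, end of video) describe n - 1 shots.
    """
    return [later - earlier for earlier, later in zip(cut_times, cut_times[1:])]


def average_shot_length(lengths: List[float]) -> float:
    """ASL = total running time divided by the number of shots."""
    return sum(lengths) / len(lengths)


def short_runs(
    lengths: List[float], threshold: float = 1.0, min_run: int = 5
) -> List[Tuple[int, int]]:
    """Find runs of at least `min_run` consecutive shots shorter than `threshold`.

    Returns (index_of_first_shot, number_of_shots) pairs.
    """
    runs = []
    start = None  # index where the current run of short shots began
    for i, length in enumerate(lengths):
        if length < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # A run may continue to the very last shot.
    if start is not None and len(lengths) - start >= min_run:
        runs.append((start, len(lengths) - start))
    return runs


# Made-up timestamps for illustration only, not measurements from either show.
cuts = [0.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 9.5, 12.0]
lengths = shot_lengths(cuts)
print(average_shot_length(lengths))  # 1.5
print(short_runs(lengths))           # [(1, 5)]
```

In this toy example, eight shots fill twelve seconds, so the ASL is 1.5 seconds, and the five half-second shots in the middle form exactly the kind of "sprinting" sequence described above.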

Of course, this preliminary data is not enough to support confident comparisons between these two shows in general, let alone confident statements about the evolution of television (or YouTube-vision). Still, I think these plots allow us to propose a hypothesis: TV and YouTube shows are becoming more rapid in their editing, using faster montage.

This hypothesis matches what we know about the history of editing in film. Cinemetrists have shown that, over the past century, ASL in movies has been shrinking. Look at this graph from a recent study by James Cutting and Ayse Candan. Each point is the ASL (or mean shot duration, as the authors call it) of a single movie.

Shot length decreased from ~16 seconds to ~4. Interestingly, we see very few films with an ASL of ~2 seconds or less: just a few dots in the bottom right corner. And yet, this is the place occupied by the Vox Borders episode: the frontier of video editing, testing the speed limits of our perception.

I haven’t looked into it, but I expect that ASL is not the only parameter of Vox Borders that is exceptionally quick. We can probably find something similar in the amount of onscreen motion, that is, “optical change created by moving objects, people, and shadows,” which Cutting et al. (2011) measured with their “visual activity index.” In fact, you might have noticed that there is a lot of motion in Vox Borders. The presenter is always walking, the camera makes sharp (and sudden) turns… Some fish need to be in constant movement in order not to drown. Vox Borders reminds me of these fish. And, like these fish, Vox certainly doesn’t want to drown in the ocean called YouTube.

Anyway, I will stop my — very preliminary — analysis here. Instead, let’s ask: How could we explain this quickness?

Here is my best guess. Vox Borders episodes have rapid editing because they can afford it. In other words: because these episodes are short. (The shortest one is only 1 min 39 sec long; the longest is 15 min 52 sec.) My experience as a viewer tells me that, while watching short Vox videos is highly entertaining, it is also demanding. To get the most out of these rapidly edited sequences, I have to put some effort into focusing my attention: to actually keep noticing what is happening in all these half-second shots. But attention is limited. Being highly focused for a short time may be as exhausting as being somewhat less focused for a long time.

It resembles running: you can sprint for only several minutes, but you can jog for hours. I wouldn’t be surprised if attention worked similarly: we can be very focused for only a short time (Vox Borders), but we can be less focused for much longer (Departures). Sprinting-like YouTube versus jogging-like TV — this may be our distinction.

This hypothesis could explain why we observe “sprinting” shots in other brief genres, such as TV commercials or music videos. They too can afford it. Traditional television shows, however, can’t. They need to hold viewers’ attention for longer periods of time (up to 60 minutes), and thus can’t exhaust viewers with sprinting shots.

Even if this explanation is correct (which we don’t know yet), many questions remain. For example, is this really a recent tendency? If yes, then when did it begin? And, most importantly, why? These questions deserve some research, and I may actually do it in the future.

In the meantime, check out other Vox Borders episodes. They may be worth your attention!

Oleg Sobchuk

I write about the evolution of art. Graphs, long-term trends, and speculative ideas are included. Max Planck Institute for the Science of Human History, Jena.