How our VAMP project will tackle the challenges of the voice/audio revolution

Jul 26, 2018

The shift towards audio and voice-based digital user interfaces forces publishers to develop new tools, adapt workflows and find new ways of monetization. This week, Google DNI agreed to co-fund a €1.2 million open-source project with a collaborative approach, which we will be working on until the end of 2020. It is called VAMP, pun intended: just as AMP has become a standard for fast mobile news, the Voice and Audio Monetization Platform strives to standardize and innovate audio content, ads and technology through several modules.

  • Content Delivery: enabling editors to deliver customized audio content to different outlets, e.g. smart speakers, audio platforms, social media.
  • Content Discovery: allowing users to search and find audio content according to their interests and consumption needs, e.g. while commuting, during housework, for a quick update.
  • Monetization: establishing revenue streams through innovative ad formats and ad serving, e.g. automated, performance-based ad insertion, and new paid offerings.
  • Analytics: providing reliable metrics for content and ad performance, e.g. completion rates and drop-off points.

Our goal is to deliver and monetize journalistic audio content on a multitude of platforms, including voice-based interfaces. With its modular approach, VAMP will enable the inclusion of existing tools and cooperation with partners, and ultimately stimulate innovation in the audio/voice ecosystem.

Main fields of work for VAMP: An open infrastructure that provides components for delivering and monetizing journalistic audio/voice content

Right now, the shift towards voice UI and audio content, reflected in the rise of podcasts and voice-operated personal assistants, is more or less overwhelming the publishing industry, and this paradigm shift is far from complete. Most of the digital news ecosystem is still based on text. Classical CMS solutions are fit to publish text or to stream video, but not to distribute audio content simultaneously to classic and voice-based UIs.

Beyond that, audio and voice content creators rely heavily on a small set of dominant platforms that do not necessarily share publishers’ business interests and that provide only over-simplified tools, which do not fully meet publishers’ requirements, e.g. advanced analytics or monetization opportunities. Global technology companies are creating voice interfaces, and a multitude of startups and established players in the media market are experimenting with this technology and potentially relevant content, but none really addresses the need for a comprehensive framework that enables delivery, discovery, analysis and monetization of audio content. Even the best products in this field do not necessarily help in establishing a stable, holistic ecosystem. This is true for audio content, and even more so for voice-UI offerings.

All in all, the ecosystem for digital audio content and voice interfaces is still in development, and VAMP aims to generally open this emerging market to publishers. We want to foster journalistic product innovation by working with existing platforms and other interested media outlets — and by tackling the challenges in these four main fields that require immediate action, as the following examples and first cases illustrate.

Modular framework for optimized monetization: Built on efficient production, delivery and analytics of audio and voice content

1. Content Delivery

  • Bridging the audio-text gap: For us, voice-engineering our primarily text-based content by adding standardized features for audio is a clear first case. Each article will be supplied with voice-optimized abstracts that can be machine-read. News briefings could be generated from homepages. Editors should be supported by linguistic content checks.
  • Rebundling audio: This is about structuring audio content for repackaging and reusing — automatically slicing up podcasts into sections that can be delivered according to the demands of different devices.
  • Distribution: For publishers, it is essential to control the distribution of content; for editors, this process needs to be easy. A custom database and player to distribute audio to different platforms facilitates the collection of data for our analytics module.
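
The rebundling idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the VAMP implementation: the `Section` type, its fields and the three-minute budget for a smart-speaker briefing are all hypothetical, and the sections are assumed to have been marked up already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """One reusable slice of an episode (all fields are hypothetical)."""
    title: str
    start_s: float  # offset into the episode, in seconds
    end_s: float

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def bundle_for_outlet(sections: list[Section], max_total_s: float) -> list[Section]:
    """Greedily keep sections, in editorial order, that still fit the time budget."""
    bundle: list[Section] = []
    used = 0.0
    for section in sections:
        if used + section.duration_s <= max_total_s:
            bundle.append(section)
            used += section.duration_s
    return bundle


episode = [
    Section("Headlines", 0, 90),
    Section("Interview", 90, 690),
    Section("Weather", 690, 750),
]

# A smart-speaker briefing with an assumed three-minute budget
briefing = bundle_for_outlet(episode, max_total_s=180)
print([s.title for s in briefing])
```

The same episode could be passed through again with a larger budget for an audio platform, so that one recording serves several outlets.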

2. Content Discovery

First examples for making audio content searchable and easy to discover:

  • Metadata and tagging: Adding metadata and tags to each audio snippet (using speech-to-text technology) to make audio content searchable. This also builds the basis to deliver audio in topic-based dossiers according to user demands or queries.
  • Social media integration: Developing a simple workflow to clip soundbites and combine them with compelling visuals and subtitles to expand content reach.

First outline of content model and architecture: Creating agile journalistic audio by adding metadata and adjustable ads
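
To make the metadata and tagging idea more concrete, here is a minimal sketch of a keyword index over audio snippets. It assumes that transcripts already exist as the output of some speech-to-text step; the snippet IDs, the sample texts and the function names are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical speech-to-text output, keyed by snippet ID
transcripts = {
    "ep12-s1": "The election results are in and coalition talks begin",
    "ep12-s2": "Weather outlook for the weekend with rain in the north",
    "ep13-s1": "Coalition talks stall over the budget",
}


def build_index(texts: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of snippet IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for snippet_id, text in texts.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(snippet_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the snippets that contain every query word, sorted by ID."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(transcripts)
print(search(index, "coalition talks"))
```

A real system would likely add stemming, stop-word handling and editorial tags on top, but even this simple index is enough to group snippets into a topic-based dossier for a given query.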

3. Monetization

We are addressing two overlapping markets in different stages of maturity. On the one hand, we are further developing existing audio offerings. On the other hand, we are tackling a nascent voice-UI market where monetization is in its infancy. The advertising-based monetization of audio content and podcasts is currently focused on selling sponsorships. This approach relies heavily on the specific native-ad-like character of the advertisement, as moderators typically deliver the sponsor’s message themselves. The limits of this system are obvious, however, as it lacks possibilities to target ads or buy a certain quantifiable reach. Clients spend their ad money without being fully able to control whether goals are met. To address this, we suggest, for example:

  • Consumption-driven ad sales and pay-per-hour ad offerings. Packages such as pay-per-hour — in which ad clients pay for real minutes listened to — are more attractive to advertisers and enable publishers to optimize their inventory. Reliability of audio ad delivery is a key factor in increasing its attractiveness for clients. Customers will buy a specific amount of listeners’ time; ads are dynamically rendered into streams based on this sales mechanism, standardizing the ad product.
  • Establishing new forms of audio ads/integrations. Ad clients would like to target their message more effectively to specifically desired audiences and have them delivered in a more standardized way. However, they do not want to lose the specific touch of podcast ad formats (classic radio ads would be no alternative). A versatile audio ad management combined with a tailored player and distribution technology is key for that.
  • Establishing new forms of voice ads/integrations. The same challenges exist with voice UI, but they are even harder to address due to the limited number of ad slots on current platforms. From a publisher’s perspective, these ad limitations diminish the appeal of this technology. We want to apply versatile ad management and player/distribution technology not only to audio content and ad offerings, but also to voice UI products.
  • Human voice ad solutions: Establishing native-speaker ad solutions for voice. As some platforms only allow ad inserts spoken by a human voice, a dynamic audio ad server can also solve voice UI monetization issues.
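
The pay-per-hour idea in the first bullet comes down to simple arithmetic, sketched below. The hourly rate, the booked volume and the session data are made-up numbers, and capping the invoice at the booked amount is an assumption about how such a package might work, not a description of an existing product.

```python
def billable_hours(listened_seconds: list[float]) -> float:
    """Sum the ad-supported listening time actually delivered, in hours."""
    return sum(listened_seconds) / 3600


def invoice_eur(
    listened_seconds: list[float],
    booked_hours: float,
    rate_per_hour_eur: float,
) -> float:
    """Bill the delivered listening time, capped at what the client booked."""
    delivered = min(billable_hours(listened_seconds), booked_hours)
    return round(delivered * rate_per_hour_eur, 2)


# Four hypothetical listening sessions: 7,200 seconds, i.e. two hours in total
sessions = [1800, 900, 2700, 1800]

print(invoice_eur(sessions, booked_hours=5, rate_per_hour_eur=40.0))  # 80.0
```

Once the booked hours are reached, an ad server working this way would stop inserting the campaign, which is what would make the delivered reach quantifiable for the client.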

In the field of transactional and subscription-based monetization, we are, for example, currently working with Audible on a paid podcast that could be used behind a paywall in an audio app, and we plan to integrate audio versions of articles into our paid content model. For the VAMP project, we want to take these approaches to the next level and see the following opportunities:

  • Establishing paid podcasts in a freemium audio/voice environment to be built in the VAMP context, either as a separate product for users or as part of existing subscription offerings. With our flexible paid content technology we could experiment with selling different types of podcasts (journalistic formats, service offerings, niche products for target groups etc.) and ascertain whether there is potential for paid offerings.
  • Packaging news content with audiobooks and other long formats. We could explore the potential of creating platforms for long-form audio or promoting and selling our editors’ books; selling audiobooks by integrating our bestseller lists; developing a freemium cross-promotion system for audiobooks etc.

4. Analytics

Developing an analytics and engagement model that enables deep understanding of listeners will be key for success:

  • Consolidated audio metrics: We will create a data hub that uses APIs and website scraping to collect and consolidate metrics.
  • Content performance analytics: The absence of tools such as Google Analytics or Chartbeat in the audio realm would be eased by basic stats tools that analyze listener curves, exit moments or top voice request keywords.
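
As an illustration of such basic stats, the following sketch derives a listener curve, a completion rate and the most common exit minute from per-listener exit times. The input format (one exit time in seconds per listener), the 90 percent completion threshold and the sample data are assumptions made for this example.

```python
from collections import Counter
from typing import Optional


def listener_curve(exit_seconds: list[int], duration_s: int, step_s: int = 60) -> list[float]:
    """Share of listeners still present at each step (expects at least one listener)."""
    total = len(exit_seconds)
    return [
        sum(1 for e in exit_seconds if e >= t) / total
        for t in range(0, duration_s + 1, step_s)
    ]


def completion_rate(exit_seconds: list[int], duration_s: int, threshold: float = 0.9) -> float:
    """Share of listeners who heard at least `threshold` of the episode."""
    done = sum(1 for e in exit_seconds if e >= threshold * duration_s)
    return done / len(exit_seconds)


def top_exit_minute(exit_seconds: list[int], duration_s: int) -> Optional[int]:
    """Minute (0-based) in which most listeners left before the end, if any did."""
    early = Counter(e // 60 for e in exit_seconds if e < duration_s)
    return early.most_common(1)[0][0] if early else None


# Hypothetical exit times for eight listeners of a five-minute episode
exits = [30, 65, 70, 300, 300, 300, 110, 300]

print(listener_curve(exits, 300))
print(completion_rate(exits, 300))
print(top_exit_minute(exits, 300))
```

In this sample, half of the audience finishes the episode and most early exits happen in the second minute, which is the kind of drop-off point an editor would want to look at.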

As funny as it may sound, the success of VAMP will also be measured by the degree to which it helps to establish KPIs for success. Some examples of possible audio/voice metrics:

  • Journalistic/Delivery KPIs: amount/percentage of voice-enabled text-first content, minutes of newly produced and voice-enabled voice-first content, minutes of accessible voice-enabled content etc.
  • Consumption-driven KPIs: (growth of) listened minutes of our audio offerings, (growth of) number of voice UI interactions etc.
  • Technical KPIs: percentage of text-first content items (e.g. regular articles) that are fully voice-enabled automatically; percentage of audio-first content items (e.g. podcasts) that are managed and served through our open and automated system, etc.
  • Business-related KPIs: Of course, revenue and profit growth are the primary goals of all commercial enterprises. In the nascent market of voice-enabled audio content we face the following challenges with classic KPIs: First, there are no standard metrics established for the monetization of digital audio content on voice-interfaces. Second, market developments are highly unpredictable and revenue streams depend on the behavior of dominant tech players. For these reasons, defining monetization metrics is one focal point of VAMP itself, as described above. Additionally, we want to use the following proxies to measure commercial success of VAMP: growth of potential ad integration points, (growth of) automated advertising insertion, growth of paid content revenue.

All analytics data will be focused on the interests of the different target groups or stakeholders in the audio/voice ecosystem. They vary greatly, as the following sketch of different personas illustrates, and a more holistic view needs to be established in the years to come:

First outline of VAMP personas: User centric approach to content production, ad placement and delivery

Our basic goals, besides the ultimate goal of new monetization opportunities, are publisher-friendly standards and a push for innovation in the voice/audio content ecosystem. For us, collaboration is key to reaching these goals. Among the first partners we are talking to are Verlagsgruppe Random House and several news websites from different countries. VAMP wants to address a multitude of needs and approaches, which led us to this open setting. The international partner network for VAMP is being built right now; the project itself will start in a few months.

For the same reason, namely broadening VAMP’s impact on market standards, our technology will be open-source. Each finished building block will be published in a public repository and may be used by any person or company that also shares its further developments. We want to spread effective and efficient standards for publishers. To work seamlessly with publishers’ existing solutions, the modular backbone of our system will ideally be able to integrate a broad variety of external modules and tools, e.g. production and editing tools, content management systems, speech-to-text transcription and text-to-speech software. We will use modular technology inspired by modern frameworks such as ESC that host a multitude of differently structured types of content. Our technological framework will make use of adaptable infrastructures and modern programming languages. Thus, we are planning to use cloud technology for storage and analysis of information/data as well as technologies such as Firebase for real-time event tracking and communication. We aim to automate as much as possible and will spend a significant share of the production budget on tech.
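
One way to picture a modular backbone that accepts external tools is a small registry of interchangeable processing steps. The sketch below is purely illustrative: the registry, the step names and the placeholder transcription are hypothetical and do not describe the framework VAMP will actually use.

```python
from typing import Callable

# A module is any function that takes a content item and returns an enriched copy
Module = Callable[[dict], dict]

_registry: dict[str, Module] = {}


def register(name: str) -> Callable[[Module], Module]:
    """Decorator that makes a module available to pipelines under `name`."""
    def decorator(fn: Module) -> Module:
        _registry[name] = fn
        return fn
    return decorator


@register("transcribe")
def placeholder_transcribe(item: dict) -> dict:
    # Stand-in for an external speech-to-text service
    return {**item, "transcript": "coalition talks begin in berlin"}


@register("tag")
def tag_from_transcript(item: dict) -> dict:
    # Naive tagging: every transcript word longer than four letters becomes a tag
    words = item.get("transcript", "").split()
    return {**item, "tags": sorted({w for w in words if len(w) > 4})}


def run_pipeline(item: dict, steps: list[str]) -> dict:
    """Apply the registered modules to the item in the given order."""
    for step in steps:
        item = _registry[step](item)
    return item


result = run_pipeline({"id": "ep12-s1"}, ["transcribe", "tag"])
print(result["tags"])
```

Because every step only has to accept and return a content item, a publisher could swap the placeholder for its own transcription service without touching the rest of the pipeline.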

The “deliverable product” Google DNI asked us for will ideally be a flexible solution for a new business opportunity: a solution that will ideally be developed by others alongside us, and one that will surely remain a work in progress in a dynamic environment.

26 July 2018, by Christina Elmer, Kerstin Fröhlich, Kurt Jansson, Charlotte Meyer-Hamme, Stefan Plöchinger, Matthias Streitz




DER SPIEGEL × Devblog. How we develop our products further, and what we learn along the way.