
FIBEP 02/2024 Newsletter - CPO Spotlight 

Updated: Feb 28

Trevor Back, Chief Product Officer, Speechmatics 

 

  • What is your background and what is included in your current role at Speechmatics? 

 

I loved space as a kid, so I ended up doing a PhD in computational astrophysics. At the end of my PhD, I realized that about ten people in the world would care about the research I was doing, and that it would probably never be observable in the real world. So, I decided to go looking for a bit more real-world impact. 

 

Fortunately for me, I came across the founders of DeepMind back in 2012 and joined as their first Product Manager, when the company was about 15 people. I then spent the next nine years there looking at a variety of applications of AI, including image search for fashion and iOS games. I also spent a lot of time on healthcare and science applications, including on AlphaFold. 

 

I then left DeepMind to begin my own startup called Shift Lab. There, we applied AI to a variety of other industries, including manufacturing, e-commerce, and supply chain. I then joined Speechmatics in 2023, and it's been an exciting journey so far. Here, I am responsible for product strategy and execution, leading the team on product development and ensuring products meet market needs and align with our company goals. My role is collaborative, working closely with other departments such as engineering, marketing, sales, and customer success to ensure that products are successfully brought to market and meet customer expectations. 

 

  • What differentiates Speechmatics from other LLM/STT companies? 

A commitment to accuracy built on deep expertise and innovation – Speechmatics has pioneered several approaches to providing highly accurate speech-to-text APIs, including our self-supervised learning approach to Automatic Speech Recognition (ASR). Our core focus on ASR accuracy will not stop as we continue our mission to ‘Understand Every Voice’. We understand that any ‘downstream’ activity (i.e. anything that uses a transcript to do something else of value) will only be valuable if it is built on accurate foundations. 

Language and translation coverage – Speechmatics currently offers transcription in 50 languages (including local dialects and accents) and also offers fast, low-latency translation in 30+ languages. Both transcription and translation are available with a single API call. Broad coverage, high accuracy, and a single API call is a potent combination. 
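To make the single-call workflow concrete, here is a minimal sketch of a job configuration that requests transcription and translation together. The field names (`transcription_config`, `translation_config`, `target_languages`) are illustrative assumptions about the request schema, not confirmed API details; consult the official Speechmatics documentation for the real endpoint and fields.

```python
import json

# Illustrative only: one job config carries both the transcription
# settings and the translation targets, so a single request yields
# the source-language transcript plus translated versions.
job_config = {
    "type": "transcription",
    "transcription_config": {
        "language": "en",          # source language of the audio
        "diarization": "speaker",  # label who said what
    },
    "translation_config": {
        # translated transcripts returned alongside the original
        "target_languages": ["es", "de"],
    },
}

# This JSON payload would accompany the audio file in one API call.
payload = json.dumps(job_config)
```

The design point is that the caller never orchestrates two services: translation piggybacks on the transcription job, so adding a language is one line in the config rather than a second API integration.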

A broad and growing range of capabilities built on top of transcription – Speechmatics is rapidly adding capabilities on top of our transcription. In the last quarter of 2023, we added speaker and channel diarization, advanced punctuation, summarization (pulling out the most salient parts of a transcript and making them easier to read), and sentiment analysis (determining whether the transcribed audio was positive, neutral, or negative in emotion) and will continue to add new capabilities over time. 

Our real-time transcription – Speechmatics has built all our underlying code and algorithms to work as effectively for real-time as for pre-recorded audio, something unheard of in the world of ASR. Real-time allows customers to transcribe in any of our supported languages before their audio has finished, getting transcripts within seconds without compromising on accuracy. It means they can start adding value on top of the transcript immediately, opening up competitive advantages for themselves. 

  • What are the greatest challenges ahead for Speechmatics when it comes to serving your customers and developing your offer? 

Our biggest challenge is also our company mission statement – to ‘Understand Every Voice’. For us, we want to be able to transcribe voices not just in every language, but in every dialect and accent, and in any environment (including noisy ones, for example, a reporter on the street). This is tricky because there are over 7,000 languages spoken, and understanding each of these presents a unique set of technical challenges. We also want to always be able to do this as fast as possible to enable real-time or live transcription. 

This is a technical challenge, but it is also challenging due to our diverse customer base across industries. Each customer’s needs and requirements are unique given their business and customer priorities, so ensuring we deliver maximum value to all our customers with everything we do is a constant (but welcome) challenge. 

There is also a primary challenge for Media Intelligence: the ability to monitor a massive amount of data. The storm of media creation (podcasts, social media) is vast, spanning multiple languages and dialects, making it almost impossible to read all those transcripts. The way to address this is with intelligence capabilities. Summaries, chapters, translation, and topic categorization will all help analysts work efficiently and report mentions back to customers quickly. 

 

  • What is the focus for Speechmatics in 2024 and how will you get there? 

We will continue to combine the best of Automatic Speech Recognition (ASR) with the latest breakthroughs in self-supervised learning, Large Language Models (LLMs) and other developments from the world of AI. 

Our goal is to create a seamless interaction between people and the technology they use, harnessing the power of our voices. This will include adding more languages to our offering, as well as continued improvements to transcription accuracy, so that we can better understand not just the words being said but the meaning and intention behind them. 

 

  • When it comes to the actual data behind media intelligence, what kind of data or media not currently used can be interesting in the future? 

 

In the realm of media intelligence, future data trends may witness a rise in multilingual conversations. As AI continues to evolve, conversations conducted in one language might prompt responses in a different language, underscoring the need for improved multilingual language processing capabilities. 

Social media platforms, particularly YouTube and TikTok, are on the rise, with regional variations requiring a multi-platform approach to social media marketing. Podcasts are also contributing to real-time media consumption. We are seeing a shift from written text media to video and audio content, which includes lots of accents, dialects, and languages. 

 

 

  • How do you think the media intelligence industry will change in the next five years, and what are the greatest challenges ahead? 

 

The media intelligence industry is poised for substantial changes in the next five years. A notable transformation may involve a shift towards advanced AI technologies, resulting in enhanced data analysis and interpretation. However, this progress brings challenges, including ensuring ethical data use, addressing privacy concerns, and adapting to rapidly evolving technological landscapes. The main one is monitoring real content amid the storm of AI-generated content. What is real and what isn’t? There will be more volume to cover and more fake news or false positives. 

The volumes of content in audio and video will exponentially grow. Content creators will continue to appear at all levels across all regions as social media becomes more readily available and accessible to everyone in real-time. 

 

  • How could Speechmatics help the industry regarding the above point? 

 

Speechmatics can play a pivotal role in adapting to the evolving nature of media intelligence. We are continuously expanding our language portfolio, helping to bring products to the largest audience possible – in real-time – without the hassle of managing multiple different language APIs and lengthy setup times. Unified APIs mean workstreams are simplified whilst delivering accuracy in downstream tasks. 

Through our cutting-edge speech recognition and language processing technologies, including advancements in multilingual and real-time translation models, Speechmatics can facilitate the industry's adaptation to multilingual dialogues. Its sophisticated features significantly improve the precision and speed of extracting information from spoken material, thereby providing comprehensive insights into media intelligence. 

 

 

  • How would you like to see FIBEP develop over the next five years? 

 

Looking ahead, it would be great to see FIBEP evolve by embracing emerging technologies and industry trends. This could involve developing standardized practices for handling new types of media data, addressing ethical considerations, and fostering innovation within the industry. The continuation of promoting global collaboration among its members will ensure the organization remains at the forefront of advancements in the media intelligence industry.  

 

 

Trevor Bio: 

Trevor is the Chief Product Officer at Speechmatics. He is an established product leader with over a decade of experience in machine learning and AI. A former AI startup founder himself, Trevor was also an early DeepMind employee and led the team that commercialised AlphaFold, which in turn gave rise to the DeepMind spin-out, Isomorphic Labs. 

 
