According to TechCrunch, Meta has released an "open" implementation of the generate-a-podcast feature in Google's NotebookLM. Called NotebookLlama, the project uses Meta's own Llama models for much of its processing. Like NotebookLM, NotebookLlama can create podcast-style digests from uploaded text files. It first generates a transcript from a file, such as a PDF of a news article or blog post. It then adds dramatization and interruptions before converting the transcript to speech with open text-to-speech models.
However, NotebookLlama's audio quality does not match NotebookLM's. The samples reviewed sound distinctly robotic, and the voices occasionally talk over each other at odd moments. The Meta researchers behind the project acknowledge that the text-to-speech model limits how natural the output can sound, and they suggest that stronger models could improve quality. They also propose an alternative approach in which two agents debate the topic to write the podcast outline, rather than relying on a single model as the project currently does.
NotebookLlama is not the first attempt to replicate NotebookLM's podcast feature; other projects have tried, with mixed success. But every AI-generated podcast, NotebookLM's included, shares the same problem: hallucination, in which the model produces inaccurate or fabricated information. This remains a significant challenge for developers of AI podcast generators.