Like almost everyone, we were impressed by NotebookLM’s ability to generate podcasts: two virtual people holding a discussion. You can give it some links, and it will generate a podcast based on those links. The podcasts were interesting and engaging. But they also had some limitations.
The problem with NotebookLM is that, while you can give it a prompt, it largely does what it’s going to do. It generates a podcast with two voices, one male and one female, and gives you little control over the result. There’s an optional prompt to customize the conversation, but that single prompt doesn’t allow you to do much. Specifically, you can’t tell it which topics to discuss or in what order to discuss them. You can try, but it won’t listen. It also isn’t conversational, which is something of a surprise now that we’ve all gotten used to chatting with AIs. You can’t tell it to iterate by saying “That was good, but please generate a new version changing these details” like you can with ChatGPT or Gemini.
Can we do better? Can we integrate our knowledge of books and technology with AI’s ability to summarize? We’ve argued (and will continue to argue) that simply learning how to use AI isn’t enough; you need to learn how to do something with AI that is better than what the AI could do on its own. You need to integrate artificial intelligence with human intelligence. To see what that would look like in practice, we built our own toolchain that gives us much more control over the results. It’s a multistage pipeline:
- We use AI to generate a summary for each chapter of a book, making sure that all the important topics are covered.
- We use AI to assemble the chapter summaries into a single summary. This step essentially gives us an extended outline.
- We use AI to generate a two-person dialogue that becomes the podcast script.
- We edit the script by hand, again making sure that the summaries cover the right topics in the right order. This is also an opportunity to correct errors and hallucinations.
- We use Google’s text-to-speech multispeaker API (still in preview) to generate a summary podcast with two participants.
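As a rough illustration, the first four stages above could be wired together along these lines. This is only a sketch under stated assumptions, not the toolchain itself: `llm` is a hypothetical callable that takes a prompt and returns text (standing in for whichever model is used), the prompt wording is invented for the example, and the final speech synthesis stage is left out because the article doesn't describe how that API is called.

```python
from typing import Callable, List

# Hypothetical model interface: takes a prompt string, returns generated text.
LLM = Callable[[str], str]


def summarize_chapters(chapters: List[str], llm: LLM) -> List[str]:
    """Stage 1: one summary per chapter."""
    prompt = "Summarize this chapter, covering all the important topics:\n\n"
    return [llm(prompt + chapter) for chapter in chapters]


def combine_summaries(summaries: List[str], llm: LLM) -> str:
    """Stage 2: merge chapter summaries into a single extended outline."""
    joined = "\n\n".join(
        f"Chapter {i}: {summary}" for i, summary in enumerate(summaries, 1)
    )
    prompt = "Combine these chapter summaries into one summary, in chapter order:\n\n"
    return llm(prompt + joined)


def write_dialogue(outline: str, llm: LLM) -> str:
    """Stage 3: turn the outline into a two-person podcast script."""
    prompt = "Write a two-person podcast dialogue that follows this outline in order:\n\n"
    return llm(prompt + outline)


def build_script(
    chapters: List[str],
    llm: LLM,
    edit: Callable[[str], str] = lambda script: script,
) -> str:
    """Run stages 1 to 3, then hand the draft to a human editing step (stage 4)."""
    summaries = summarize_chapters(chapters, llm)
    outline = combine_summaries(summaries, llm)
    draft = write_dialogue(outline, llm)
    return edit(draft)
```

Keeping the outline as an explicit intermediate result is what makes the hand-editing step possible: a person can inspect and reorder the topics before any audio is generated.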
Why are we focusing on summaries? Summaries interest us for several reasons. First, let’s face it: Having two nonexistent people discuss something you wrote is fascinating, especially since they sound genuinely interested and excited. Hearing the voices of nonexistent cyberpeople discuss your work makes you feel like you’re living in a sci-fi fantasy. More practically: Generative AI is good at summarization. There are few errors and almost no outright hallucinations. Finally, our users want summarization. On O’Reilly Answers, our customers frequently ask for summaries: summarize this book, summarize this chapter. They want to find the information they need. They want to find out whether they really need to read the book and, if so, which parts. A summary helps them do that while saving time. It lets them discover quickly whether the book will be helpful, and does so better than the back cover copy or a blurb on Amazon.
With that in mind, we had to think through what the most useful summary would be for our members. Should there be a single speaker or two? When a single synthesized voice summarized the book, my eyes (ears?) glazed over quickly. It was much easier to listen to a podcast-style summary where the virtual participants were excited and enthusiastic, like the ones on NotebookLM, than to a lecture. The give and take of a discussion, even if simulated, gave the podcasts energy that a single speaker didn’t have.
How long should the summary be? That’s an important question. At some point, the listener loses interest. We could feed a book’s entire text into a speech synthesis model and get an audio version. We may yet do that; it’s a product some people want. But on the whole, we expect summaries to be minutes long rather than hours. I might listen for 10 minutes, maybe 30 if it’s a topic or a speaker that I find fascinating. But I’m notably impatient when I listen to podcasts, and I don’t have a commute or other downtime for listening. Your preferences and your situation may be much different.
What exactly do listeners expect from these podcasts? Do users expect to learn, or do they only want to find out whether the book has what they’re looking for? That depends on the topic. I can’t see someone learning Go from a summary. Maybe more to the point, I don’t see someone who is fluent in Go learning how to program with AI from one. Summaries are useful for presenting a book’s key ideas: For example, the summaries of Cloud Native Go gave a good overview of how Go could be used to address the issues faced by people writing software that runs in the cloud. But really learning this material requires looking at examples, writing code, and practicing, all of which are out of bounds in a medium that’s limited to audio. I’ve heard AIs read out source code listings in Python; it’s awful and useless. Learning is more likely with a book like Facilitating Software Architecture, which is more about concepts and ideas than code. Someone could come away from the discussion with some useful ideas and possibly put them into practice. But again, the podcast summary is only an overview. To get all the value and detail, you need the book. In a recent article, Ethan Mollick writes, “Asking for a summary is not the same as reading for yourself. Asking AI to solve a problem for you is not an effective way to learn, even if it feels like it should be. To learn something new, you are going to have to do the reading and thinking yourself.”
Another difference between the NotebookLM podcasts and ours may be more important. The podcasts we generated from our toolchain are all about six minutes long. The podcasts generated by NotebookLM are in the 10- to 25-minute range. The longer length could allow the NotebookLM podcasts to be more detailed, but in reality that’s not what happens. Rather than discussing the book itself, NotebookLM tends to use the book as a jumping-off point for a broader discussion. The O’Reilly-generated podcasts are more directed. They follow the book’s structure because we provided a plan, an outline, for the AI to follow. The virtual podcasters still express enthusiasm and still bring in ideas from other sources, but they’re headed in a direction. The longer NotebookLM podcasts, in contrast, can seem aimless, looping back around to pick up ideas they’ve already covered. To me, at least, that feels like an important point. Granted, using the book as the jumping-off point for a broader discussion is also useful, and there’s a balance that needs to be maintained. You don’t want it to feel like you’re listening to the table of contents. But you also don’t want it to feel unfocused. And if you want a discussion of a book, you should get a discussion of the book.
None of these AI-generated podcasts are without limitations. An AI-generated summary isn’t good at detecting and reflecting on nuances in the original writing. With NotebookLM, that obviously wasn’t under our control. With our own toolchain, we could certainly edit the script to reflect whatever we wanted, but the voices themselves weren’t under our control and wouldn’t necessarily follow the text’s lead. (It’s arguable that reflecting the nuances of a 250-page book in a six-minute podcast is a losing proposition.) Bias, a kind of implied nuance, is a bigger issue. Our first experiments with NotebookLM tended to have the female voice asking the questions, with the male voice providing the answers, though that seemed to improve over time. Our toolchain gave us control, because we provided the script. We won’t claim that we were unbiased (nobody should make claims like that), but at least we controlled how our virtual people presented themselves.
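One simple way to spot the question-and-answer imbalance described above is to count how many questions each speaker asks in a script. The following is a minimal sketch, assuming the script is plain text with one turn per line in the form `Speaker: text`. That format and the speaker names are hypothetical; the article doesn't describe how the scripts are actually stored.

```python
from collections import Counter


def question_counts(script: str) -> Counter:
    """Count the turns that end with a question mark, per speaker.

    Assumes one turn per line, written as "Speaker: text" (hypothetical format).
    Lines without a colon are ignored.
    """
    counts: Counter = Counter()
    for line in script.splitlines():
        speaker, sep, text = line.partition(":")
        if sep and text.strip().endswith("?"):
            counts[speaker.strip()] += 1
    return counts


# Illustrative input with made-up speaker names.
sample = "\n".join([
    "Ana: What problem does the book address?",
    "Ben: Building software that runs in the cloud.",
    "Ben: Why does the language matter there?",
    "Ana: Mostly because of concurrency.",
])
counts = question_counts(sample)
```

A heavily lopsided count doesn't prove bias by itself, but it flags scripts that deserve a second look during the hand-editing step.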
Our experiments are finished; it’s time to show you what we created. We’ve taken five books, generated short podcasts summarizing each with both NotebookLM and our toolchain, and posted both sets on oreilly.com. We’ll be adding more books in 2025. Listen to them and see what works for you. And please let us know what you think!