A Coding Implementation to Build an Interactive Transcript and PDF Analysis with Lyzr Chatbot Framework

In this tutorial, we introduce a streamlined approach for extracting, processing, and analyzing YouTube video transcripts using Lyzr, an AI-powered framework designed to simplify interaction with textual data. Leveraging Lyzr’s intuitive ChatBot interface alongside the youtube-transcript-api and FPDF, users can convert video content into structured PDF documents and conduct insightful analyses through dynamic interactions. This tutorial is ideal for researchers, educators, and content creators seeking to derive meaningful insights, generate summaries, and formulate creative questions directly from multimedia resources.

Setting Up the Environment

To get started, we need to set up the necessary environment. The following command installs essential Python libraries:

!pip install lyzr youtube-transcript-api fpdf2 ipywidgets

We also ensure the DejaVu Sans font is installed on the system to support full Unicode text rendering within the generated PDF files:

!apt-get update -qq && apt-get install -y fonts-dejavu-core

Configuring OpenAI API Access

Next, we configure OpenAI API key access:

import os
import openai

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"  # replace with your actual key
openai.api_key = os.environ["OPENAI_API_KEY"]

Importing Essential Libraries

We import the libraries required for the rest of the tutorial:

import json
from lyzr import ChatBot
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled, NoTranscriptFound, CouldNotRetrieveTranscript
from fpdf import FPDF
from ipywidgets import Textarea, Button, Output, Layout
from IPython.display import display, Markdown
import re

Function: Converting Transcript to PDF

The transcript_to_pdf function automates converting YouTube video transcripts into clean, readable PDF documents. It retrieves the transcript using the YouTubeTranscriptApi, handles exceptions gracefully, and formats the text to avoid layout issues.

def transcript_to_pdf(video_id: str, output_pdf_path: str) -> bool:
    try:
        entries = YouTubeTranscriptApi.get_transcript(video_id)
    except (TranscriptsDisabled, NoTranscriptFound, CouldNotRetrieveTranscript):
        try:
            entries = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])
        except Exception:
            print(f"[!] No transcript for {video_id}")
            return False
    except Exception as e:
        print(f"[!] Error fetching transcript for {video_id}: {e}")
        return False

    text = "\n".join(e['text'] for e in entries).strip()
    if not text:
        print(f"[!] Empty transcript for {video_id}")
        return False

    pdf = FPDF()
    pdf.add_page()

    font_path = "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"
    try:
        if os.path.exists(font_path):
            pdf.add_font("DejaVu", "", font_path)
            pdf.set_font("DejaVu", size=10)
        else:
            pdf.set_font("Arial", size=10)
    except Exception:
        pdf.set_font("Arial", size=10)

    pdf.set_margins(20, 20, 20)
    pdf.set_auto_page_break(auto=True, margin=25)

    def process_text_for_pdf(text):
        text = re.sub(r'\s+', ' ', text)
        text = text.replace('\n\n', '\n')

        processed_lines = []
        for paragraph in text.split('\n'):
            if not paragraph.strip():
                continue

            words = paragraph.split()
            processed_words = []
            for word in words:
                if len(word) > 50:
                    chunks = [word[i:i+50] for i in range(0, len(word), 50)]
                    processed_words.extend(chunks)
                else:
                    processed_words.append(word)

            processed_lines.append(' '.join(processed_words))

        return processed_lines

    processed_lines = process_text_for_pdf(text)

    for line in processed_lines:
        if line.strip():
            try:
                pdf.multi_cell(0, 8, line.encode('utf-8', 'replace').decode('utf-8'), align='L')
                pdf.ln(2)
            except Exception as e:
                print(f"[!] Warning: Skipped problematic line: {str(e)[:100]}...")
                continue

    try:
        pdf.output(output_pdf_path)
        print(f"[+] PDF saved: {output_pdf_path}")
        return True
    except Exception as e:
        print(f"[!] Error saving PDF: {e}")
        return False

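If you start from full YouTube URLs rather than bare video IDs, a small helper can extract the 11-character ID before calling transcript_to_pdf. This is our own addition, not part of Lyzr or youtube-transcript-api:

```python
import re
from typing import Optional

def extract_video_id(url: str) -> Optional[str]:
    """Extract the 11-character video ID from common YouTube URL formats
    (watch?v=..., /embed/..., and youtu.be short links)."""
    match = re.search(r"(?:v=|/embed/|youtu\.be/)([A-Za-z0-9_-]{11})", url)
    return match.group(1) if match else None
```

With this in place, you can pass watch, embed, or short-link URLs and feed the extracted ID straight into transcript_to_pdf.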
Function: Creating Interactive Chat

The create_interactive_chat function builds a lightweight notebook UI with ipywidgets: a text area for typing questions, a send button, and a scrollable output area that displays each question alongside the agent's reply:

def create_interactive_chat(agent):
    input_area = Textarea(
        placeholder="Type a question…", layout=Layout(width='80%', height='80px')
    )
    send_button = Button(description="Send", button_style="success")
    output_area = Output(layout=Layout(
        border='1px solid gray', width='80%', height='200px', overflow='auto'
    ))

    def on_send(btn):
        question = input_area.value.strip()
        if not question:
            return
        with output_area:
            print(f">> You: {question}")
            try:
                print("<< Bot:", agent.chat(question), "\n")
            except Exception as e:
                print(f"[!] Error: {e}\n")

    send_button.on_click(on_send)
    display(input_area, send_button, output_area)

Main Function

The main function drives the entire pipeline: it converts each video's transcript to a PDF, initializes a Lyzr PDF-chat agent on the first successful PDF, runs a set of predefined analysis questions, saves the answers as JSON and a Markdown report, optionally compares two videos, and finally launches the interactive chat:

def main():
    video_ids = ["dQw4w9WgXcQ", "jNQXAC9IVRw"]
    processed = []

    for vid in video_ids:
        pdf_path = f"{vid}.pdf"
        if transcript_to_pdf(vid, pdf_path):
            processed.append((vid, pdf_path))
        else:
            print(f"[!] Skipping {vid} — no transcript available.")

    if not processed:
        print("[!] No PDFs generated. Please try other video IDs.")
        return

    first_vid, first_pdf = processed[0]
    print(f"[+] Initializing PDF-chat agent for video {first_vid}…")
    bot = ChatBot.pdf_chat(
        input_files=[first_pdf]
    )

    questions = [
        "Summarize the transcript in 2–3 sentences.",
        "What are the top 5 insights and why?",
        "List any recommendations or action items mentioned.",
        "Write 3 quiz questions to test comprehension.",
        "Suggest 5 creative prompts to explore further."
    ]
    responses = {}
    for q in questions:
        print(f"[?] {q}")
        try:
            resp = bot.chat(q)
        except Exception as e:
            resp = f"[!] Agent error: {e}"
        responses[q] = resp
        print(f"[/] {resp}\n" + "-"*60 + "\n")

    with open('responses.json','w',encoding='utf-8') as f:
        json.dump(responses,f,indent=2)
    md = "# Transcript Analysis Report\n\n"
    for q,a in responses.items():
        md += f"## Q: {q}\n{a}\n\n"
    with open('report.md','w',encoding='utf-8') as f:
        f.write(md)

    display(Markdown(md))

    if len(processed) > 1:
        print("[+] Generating comparison…")
        _, pdf1 = processed[0]
        _, pdf2 = processed[1]
        compare_bot = ChatBot.pdf_chat(
            input_files=[pdf1, pdf2]
        )
        comparison = compare_bot.chat(
            "Compare the main themes of these two videos and highlight key differences."
        )
        print("[+] Comparison Result:\n", comparison)

    print("\n=== Interactive Chat (Video 1) ===")
    create_interactive_chat(bot)
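The report assembly inside main() can also be factored into a small, testable helper. This sketch, our own refactor rather than anything required by Lyzr, produces the same Markdown layout:

```python
def build_report(responses: dict) -> str:
    """Render question/answer pairs in the same Markdown layout main() writes out."""
    md = "# Transcript Analysis Report\n\n"
    for q, a in responses.items():
        md += f"## Q: {q}\n{a}\n\n"
    return md
```

Keeping the formatting in a pure function makes it easy to unit-test the report structure without running the agent.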

Execution Control

We ensure that the main() function runs only when the script is executed directly:

if __name__ == "__main__":
    main()

Conclusion

By integrating Lyzr into our workflow as demonstrated in this tutorial, we can transform YouTube videos into actionable knowledge. Lyzr’s intelligent PDF-chat capability simplifies extracting core themes and generating comprehensive summaries while enabling engaging, interactive exploration of content through a conversational interface. Adopting Lyzr empowers users to unlock deeper insights and enhances productivity when working with video transcripts, whether for academic research, educational purposes, or creative content analysis.
