
LangGraph Tutorial: A Step-by-Step Guide to Creating a Text Analysis Pipeline


Estimated reading time: 5 minutes

Introduction to LangGraph

LangGraph is a framework by LangChain designed for creating stateful, multi-actor applications with large language models (LLMs). It provides the structure and tools needed to build sophisticated AI agents through a graph-based approach. This allows us to design how different capabilities will connect and how information will flow through our agent.

Key Features

  • State Management: Maintain persistent state across interactions
  • Flexible Routing: Define complex flows between components
  • Persistence: Save and resume workflows
  • Visualization: See and understand your agent’s structure

Setting Up Our Environment

Before diving into the code, let’s set up our development environment.

Installation

Install the required packages:

pip install langgraph langchain langchain-openai python-dotenv

Setting Up API Keys

To use OpenAI’s models, you will need an API key. You can create one in your OpenAI account dashboard; store it in a `.env` file rather than hard-coding it into your scripts.
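With the key saved in a `.env` file as `OPENAI_API_KEY=...`, loading it at startup takes only a few lines with python-dotenv. A minimal sketch (the warning message is illustrative):

```python
import os

# Load variables from a local .env file if python-dotenv is installed.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

api_key = os.environ.get("OPENAI_API_KEY", "")
if not api_key:
    print("Warning: OPENAI_API_KEY is not set")
```

LangChain's OpenAI integrations read `OPENAI_API_KEY` from the environment automatically, so nothing else needs to be passed around.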

Understanding the Power of Coordinated Processing

LangGraph allows us to create a multi-step text analysis pipeline. This pipeline will include:

  • Text Classification: Categorizing input text into predefined categories
  • Entity Extraction: Identifying key entities from the text
  • Text Summarization: Generating a concise summary of the input text

Building Our Text Analysis Pipeline

We will import the necessary packages and design our agent’s memory using a TypedDict to track information.


from typing import List, TypedDict

class State(TypedDict):
    text: str
    classification: str
    entities: List[str]
    summary: str

Next, we initialize our language model:

from langchain_openai import ChatOpenAI

# temperature=0 keeps outputs deterministic, which suits classification
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

Creating Our Agent’s Core Capabilities

We will create functions for each type of analysis:


from langchain_core.messages import HumanMessage
from langchain_core.prompts import PromptTemplate

def classification_node(state: State):
    """Classify the text into one of: News, Blog, Research, or Other."""
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Classify the following text into one of the categories: News, Blog, Research, or Other.\n\nText:{text}\n\nCategory:"
    )
    message = HumanMessage(content=prompt.format(text=state["text"]))
    classification = llm.invoke([message]).content.strip()
    return {"classification": classification}

Similarly, we will define the entity_extraction_node and summarization_node functions.

Bringing It All Together

We will connect these capabilities into a coordinated system using LangGraph:


from langgraph.graph import StateGraph, END

workflow = StateGraph(State)
workflow.add_node("classification_node", classification_node)
workflow.add_node("entity_extraction", entity_extraction_node)
workflow.add_node("summarization", summarization_node)
workflow.set_entry_point("classification_node")
workflow.add_edge("classification_node", "entity_extraction")
workflow.add_edge("entity_extraction", "summarization")
workflow.add_edge("summarization", END)
app = workflow.compile()

Try with Your Own Text

Test the pipeline with your own text samples:


sample_text = """ OpenAI has announced the GPT-4 model... """
state_input = {"text": sample_text}
result = app.invoke(state_input)
print("Classification:", result["classification"])
print("Entities:", result["entities"])
print("Summary:", result["summary"])

Adding More Capabilities (Advanced)

We can enhance our pipeline by adding a sentiment analysis node. This requires updating the state structure:


class EnhancedState(TypedDict):
    text: str
    classification: str
    entities: List[str]
    summary: str
    sentiment: str

Define the new sentiment node and update the workflow accordingly.

Adding Conditional Edges (Advanced Logic)

Conditional edges allow our graph to act intelligently based on the data in the current state. We will create a routing function to manage this logic.


def route_after_classification(state: EnhancedState) -> bool:
    # News and Research go through entity extraction; everything else
    # skips straight to summarization (see path_map below).
    category = state["classification"].lower()
    return category in ["news", "research"]

Define the conditional workflow and compile it:


conditional_workflow = StateGraph(EnhancedState)
conditional_workflow.add_node("classification_node", classification_node)
conditional_workflow.add_node("entity_extraction", entity_extraction_node)
conditional_workflow.add_node("summarization", summarization_node)
conditional_workflow.add_node("sentiment_analysis", sentiment_node)
conditional_workflow.set_entry_point("classification_node")
conditional_workflow.add_conditional_edges(
    "classification_node",
    route_after_classification,
    path_map={True: "entity_extraction", False: "summarization"},
)
conditional_workflow.add_edge("entity_extraction", "summarization")
conditional_workflow.add_edge("summarization", "sentiment_analysis")
conditional_workflow.add_edge("sentiment_analysis", END)
conditional_app = conditional_workflow.compile()

Conclusion

In this tutorial, we’ve built a text processing pipeline using LangGraph, exploring its capabilities for classification, entity extraction, and summarization. We also enhanced our pipeline with additional capabilities and conditional edges for dynamic processing.

Next Steps

  • Add more nodes to extend your agent’s capabilities
  • Experiment with different LLMs and parameters
  • Explore LangGraph’s state persistence features for ongoing conversations

