A Coding Guide Implementing ScrapeGraph and Gemini AI for an Automated, Scalable, Insight-Driven Competitive Intelligence and Market Analysis Workflow

In this tutorial, we show how to combine ScrapeGraph's scraping tools with Gemini AI to automate the collection, parsing, and analysis of competitor information. ScrapeGraph's SmartScraperTool and MarkdownifyTool extract details on product offerings, pricing strategies, technology stacks, and market presence directly from competitor websites. Gemini's language model then synthesizes these data points into structured, actionable intelligence. Because ScrapeGraph handles the raw extraction, analysts can spend their time on strategic interpretation rather than manual data gathering.

Prerequisites

To get started, ensure you have the necessary libraries installed. You will need:

  • langchain-scrapegraph for advanced web scraping
  • langchain-google-genai for integrating Gemini AI
  • pandas for data analysis
  • matplotlib for data visualization
  • seaborn for statistical data visualization

Run the following command to install or upgrade these libraries:

%pip install --quiet -U langchain-scrapegraph langchain-google-genai pandas matplotlib seaborn

Setting Up Your Environment

Begin by importing the essential Python libraries for creating a secure, data-driven pipeline. These libraries include:

  • getpass and os for reading API keys securely and managing environment variables
  • json for handling serialized data
  • pandas for DataFrame operations
  • typing for type hints
  • datetime for timestamping
  • matplotlib.pyplot and seaborn for visualization

Environment Configuration

The script checks whether your ScrapeGraph and Google API keys are set as environment variables. If not, it prompts for them with getpass, which does not echo the input:

import getpass
import os

if not os.environ.get("SGAI_API_KEY"):
    os.environ["SGAI_API_KEY"] = getpass.getpass("ScrapeGraph AI API key:\n")

if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key for Gemini:\n")
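
The getpass prompt suits an interactive notebook. In an unattended run (a cron job or CI pipeline, for instance) nobody is there to answer it, so you may prefer to fail fast instead. The require_env helper below is a hypothetical addition, not part of the tutorial; it is a minimal sketch that uses only the standard library:

```python
import os
from typing import Dict, Iterable


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Return the requested environment variables, or raise listing every missing one.

    Hypothetical helper for unattended runs, where an interactive prompt cannot be answered.
    """
    names = list(names)  # allow any iterable, including one-shot generators
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}
```

For example, calling require_env(["SGAI_API_KEY", "GOOGLE_API_KEY"]) at the top of a scheduled script stops immediately with a clear message instead of waiting for input.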

Importing Tools

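Next, import and instantiate the ScrapeGraph tools from langchain-scrapegraph:

  • SmartScraperTool, which extracts structured data from a page according to a natural-language prompt
  • SearchScraperTool, which runs a web search and extracts information from the results
  • MarkdownifyTool, which converts a web page into Markdown
  • GetCreditsTool, which reports your remaining ScrapeGraph API credits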

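We also configure LangChain's ChatGoogleGenerativeAI class to use the gemini-1.5-flash model for the analysis steps. A low temperature (0.1) reduces randomness, which suits structured extraction and side-by-side comparison:

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.1,  # low temperature for more consistent analytical output
    convert_system_message_to_human=True,  # merge system prompts into the first human message
)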
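
The analysis steps ask Gemini for structured insights, and chat models often wrap JSON in a Markdown code fence or surround it with prose. The extract_json helper below is hypothetical (not part of LangChain or ScrapeGraph); it is a minimal, tolerant parser for the text of a reply, such as llm.invoke(prompt).content:

```python
import json
import re
from typing import Any

# Three backticks, built programmatically so this snippet can itself sit inside a Markdown fence.
FENCE = "`" * 3
# Matches an optional "json"-tagged fenced block and captures its body.
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(reply: str) -> Any:
    """Parse JSON from an LLM reply (hypothetical helper, not a library API).

    Tries, in order: the whole reply, the first fenced block, and the
    outermost {...} span. Raises ValueError if none of them parses.
    """
    candidates = [reply.strip()]
    match = FENCED_BLOCK.search(reply)
    if match:
        candidates.append(match.group(1).strip())
    start, end = reply.find("{"), reply.rfind("}")
    if start != -1 and end > start:
        candidates.append(reply[start : end + 1])

    for text in candidates:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue
    raise ValueError("No valid JSON object found in model reply")
```

Trying the cheapest interpretation first (the raw reply) and falling back to fenced or embedded JSON keeps well-behaved replies fast while still recovering from chatty ones.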

Defining the CompetitiveAnalyzer Class

The CompetitiveAnalyzer class orchestrates end-to-end competitor research: it scrapes company information with ScrapeGraph tools, compiles and cleans the results, and passes them to Gemini to generate structured competitive insights.
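
The class's full source is not shown here. As a rough sketch of the orchestration flow only, the hypothetical CompetitiveAnalyzerSketch below takes the scraper and the analyzer as plain callables; in the real workflow these would wrap SmartScraperTool and the Gemini llm, but that wiring, the record layout, and the cleaning rule are our assumptions:

```python
from typing import Any, Callable, Dict, List

# Hypothetical signatures: a scraper maps a URL to raw fields,
# an analyzer maps the cleaned records to insight text.
ScrapeFn = Callable[[str], Dict[str, Any]]
AnalyzeFn = Callable[[List[Dict[str, Any]]], str]


class CompetitiveAnalyzerSketch:
    """Minimal orchestration skeleton (not the tutorial's actual class)."""

    def __init__(self, scrape_fn: ScrapeFn, analyze_fn: AnalyzeFn) -> None:
        self.scrape_fn = scrape_fn  # e.g. a wrapper around SmartScraperTool
        self.analyze_fn = analyze_fn  # e.g. a wrapper around the Gemini llm
        self.results: List[Dict[str, Any]] = []

    def scrape_competitor_data(self, name: str, url: str) -> Dict[str, Any]:
        """Scrape one competitor, recording failures instead of raising."""
        try:
            raw = self.scrape_fn(url)
            # Cleaning step: drop empty values so the analyzer sees only real data.
            data = {k: v for k, v in raw.items() if v not in (None, "", [], {})}
            return {"name": name, "url": url, "status": "success", "data": data}
        except Exception as exc:  # one failing site should not stop the whole run
            return {"name": name, "url": url, "status": "error", "error": str(exc)}

    def analyze_competitor_landscape(self, competitors: List[Dict[str, str]]) -> Dict[str, Any]:
        """Scrape every competitor, then send the successful records to the analyzer."""
        self.results = [self.scrape_competitor_data(c["name"], c["url"]) for c in competitors]
        successes = [r for r in self.results if r["status"] == "success"]
        insights = self.analyze_fn(successes) if successes else "No data to analyze."
        return {"results": self.results, "insights": insights}
```

Injecting the two callables is a design choice for this sketch: it lets you exercise the orchestration logic with fake functions before spending ScrapeGraph credits or Gemini quota.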

Key Methods

  • scrape_competitor_data: Scrapes data from a competitor's website, extracting information such as the company name, product offerings, pricing, and technology stack.
  • analyze_competitor_landscape: Analyzes multiple competitors and generates insights from the scraped data.
  • generate_summary_stats: Produces summary statistics for the analysis, including the scraping success rate.
  • export_results: Exports the results to JSON and CSV files.
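
The last two methods can be sketched as standalone functions over a list of per-competitor result records. This is a minimal stdlib version under our own assumptions: each record has name, url, and status keys (plus error on failure), the success rate is a percentage, and the CSV holds only flat columns. The tutorial itself may use pandas and a different layout:

```python
import csv
import json
from typing import Any, Dict, List, Tuple


def generate_summary_stats(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count successes and failures and compute a success rate in percent."""
    total = len(results)
    successes = sum(1 for r in results if r.get("status") == "success")
    return {
        "total_competitors": total,
        "successful_scrapes": successes,
        "failed_scrapes": total - successes,
        # Guard against division by zero for an empty run.
        "success_rate": round(100 * successes / total, 1) if total else 0.0,
    }


def export_results(results: List[Dict[str, Any]], stem: str) -> Tuple[str, str]:
    """Write full results to <stem>.json and a flat summary to <stem>.csv."""
    json_path, csv_path = f"{stem}.json", f"{stem}.csv"
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, ensure_ascii=False)

    # The CSV keeps only flat columns; nested scraped data stays in the JSON file.
    fields = ["name", "url", "status", "error"]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # Missing keys become empty cells; extra keys such as "data" are ignored.
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(results)
    return json_path, csv_path
```

Splitting the output this way keeps the CSV easy to open in a spreadsheet while the JSON file preserves everything the scraper returned.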

Running Competitive Analyses

To initiate the analysis, you can define specific competitor groups, such as AI/SaaS or e-commerce platforms, and run the respective analysis functions:

AI/SaaS Analysis Function

def run_ai_saas_analysis():
    analyzer = CompetitiveAnalyzer()
    ai_saas_competitors = [
        {"name": "OpenAI", "url": "https://openai.com"},
        {"name": "Anthropic", "url": "https://anthropic.com"},
        {"name": "Hugging Face", "url": "https://huggingface.co"},
        {"name": "Cohere", "url": "https://cohere.ai"},
        {"name": "Scale AI", "url": "https://scale.com"},
    ]
    results = analyzer.analyze_competitor_landscape(ai_saas_competitors)
    ...

E-commerce Analysis Function

def run_ecommerce_analysis():
    analyzer = CompetitiveAnalyzer()
    ecommerce_competitors = [
        {"name": "Shopify", "url": "https://shopify.com"},
        {"name": "WooCommerce", "url": "https://woocommerce.com"},
        {"name": "BigCommerce", "url": "https://bigcommerce.com"},
        {"name": "Magento", "url": "https://magento.com"},
    ]
    results = analyzer.analyze_competitor_landscape(ecommerce_competitors)
    ...

Social Media Monitoring

Additionally, the social_media_monitoring_chain function monitors competitor social media presence and engagement strategies. It analyzes aspects such as platform presence, content strategy patterns, and community building approaches:

from typing import List
from langchain_core.runnables import RunnableConfig
def social_media_monitoring_chain(company_urls: List[str], config: RunnableConfig):
    ...
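
The function body is elided above. As a rough, hypothetical sketch of the same idea, the code below builds one prompt covering the three aspects named in the description and applies it to each URL through an injected scrape_fn, recording per-URL failures. In a real run, scrape_fn could wrap SmartScraperTool; treat that wiring, the prompt wording, and every name in the block as assumptions to check against the langchain-scrapegraph documentation:

```python
from typing import Any, Callable, Dict, List

# The three aspects named in the tutorial's description of the chain.
SOCIAL_ASPECTS = [
    "platform presence",
    "content strategy patterns",
    "community building approaches",
]


def build_social_prompt(aspects: List[str] = SOCIAL_ASPECTS) -> str:
    """Build one extraction prompt that asks for each aspect as a JSON key."""
    bullet_list = "\n".join(f"- {a}" for a in aspects)
    return (
        "Analyze this company's social media presence and engagement strategy. "
        "Return JSON with one key per aspect:\n" + bullet_list
    )


def monitor_social_media(
    company_urls: List[str], scrape_fn: Callable[[str, str], Any]
) -> Dict[str, Any]:
    """Apply the prompt to every URL; scrape_fn(prompt, url) is a hypothetical scraper wrapper."""
    prompt = build_social_prompt()
    report: Dict[str, Any] = {}
    for url in company_urls:
        try:
            report[url] = {"status": "success", "data": scrape_fn(prompt, url)}
        except Exception as exc:  # keep monitoring the remaining URLs
            report[url] = {"status": "error", "error": str(exc)}
    return report
```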

Conclusion

Integrating ScrapeGraph’s scraping capabilities with Gemini AI transforms a traditionally time-consuming competitive intelligence workflow into an efficient, repeatable pipeline. ScrapeGraph handles the heavy lifting of fetching and normalizing web-based information, while Gemini’s language understanding turns that raw data into high-level strategic recommendations. This automation allows businesses to rapidly assess market positioning, identify feature gaps, and uncover emerging opportunities with minimal manual intervention.

For the complete code, see the accompanying GitHub repository.