A Coding Guide Implementing ScrapeGraph and Gemini AI for an Automated, Scalable, Insight-Driven Competitive Intelligence and Market Analysis Workflow
In this tutorial, we demonstrate how to combine ScrapeGraph’s scraping tools with Gemini AI to automate the collection, parsing, and analysis of competitor information. Using ScrapeGraph’s SmartScraperTool and MarkdownifyTool, users can extract details about product offerings, pricing strategies, technology stacks, and market presence directly from competitor websites. The tutorial then uses Gemini’s language model to synthesize these disparate data points into structured, actionable intelligence. Because ScrapeGraph automates the raw extraction, analysts can focus on strategic interpretation rather than manual data gathering.
Prerequisites
To get started, ensure you have the necessary libraries installed. You will need:
- langchain-scrapegraph for advanced web scraping
- langchain-google-genai for integrating Gemini AI
- pandas for data analysis
- matplotlib for data visualization
- seaborn for statistical data visualization
Run the following command to install or upgrade these libraries:
```python
%pip install --quiet -U langchain-scrapegraph langchain-google-genai pandas matplotlib seaborn
```
Setting Up Your Environment
Begin by importing the essential Python libraries for creating a secure, data-driven pipeline. These libraries include:
- getpass and os for managing passwords and environment variables
- json for handling serialized data
- pandas for robust DataFrame operations
- typing for type hints
- datetime for timestamping
- matplotlib.pyplot and seaborn for visualization
Environment Configuration
Check if your ScrapeGraph and Google API keys are set. If not, the script will prompt you securely for these keys:
```python
import getpass
import os

# Prompt for each key only if it is not already set in the environment.
if not os.environ.get("SGAI_API_KEY"):
    os.environ["SGAI_API_KEY"] = getpass.getpass("ScrapeGraph AI API key:\n")
if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key for Gemini:\n")
```
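The same check can be factored into a small helper so that adding a third key later is a one-line change. This is a minimal sketch rather than the tutorial's code; `ensure_api_key`, `ensure_all_keys`, and `REQUIRED_KEYS` are hypothetical names, and the behavior is identical to the two `if` statements above.

```python
import getpass
import os


def ensure_api_key(env_var: str, prompt: str) -> str:
    """Return the value of env_var, prompting for it only if it is unset."""
    value = os.environ.get(env_var)
    if not value:
        # getpass does not echo the typed key back to the screen.
        value = getpass.getpass(prompt)
        os.environ[env_var] = value
    return value


# Hypothetical registry holding the same two keys checked above.
REQUIRED_KEYS = {
    "SGAI_API_KEY": "ScrapeGraph AI API key:\n",
    "GOOGLE_API_KEY": "Google API key for Gemini:\n",
}


def ensure_all_keys(keys: dict = REQUIRED_KEYS) -> None:
    """Check every key in the registry, prompting only for missing ones."""
    for env_var, prompt in keys.items():
        ensure_api_key(env_var, prompt)
```

Calling `ensure_all_keys()` once at the top of the notebook replaces the repeated `if` blocks.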
Importing Tools
Import and instantiate ScrapeGraph tools:
- SmartScraperTool
- SearchScraperTool
- MarkdownifyTool
- GetCreditsTool
We also configure ChatGoogleGenerativeAI with the “gemini-1.5-flash” model, which performs the analysis:
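```python
from langchain_google_genai import ChatGoogleGenerativeAI

# A low temperature makes the analysis output more consistent across runs.
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.1,
    convert_system_message_to_human=True,
)
```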
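The article does not show how Gemini's reply is turned into structured data. One possible approach, sketched here under the assumption that the prompt asks the model to answer in JSON, is to build the prompt from the scraped records and then parse the reply, tolerating the markdown code fence that models often wrap around JSON. `build_analysis_prompt`, `parse_llm_json`, and the JSON keys in the prompt are hypothetical.

```python
import json
from typing import Any, Dict, List

FENCE = "`" * 3  # three backticks, the markdown code-fence marker


def build_analysis_prompt(scraped: List[Dict[str, Any]]) -> str:
    """Hypothetical prompt asking for a JSON-only competitive summary."""
    return (
        "You are a competitive intelligence analyst. Using the scraped data "
        "below, return only a JSON object with the keys 'market_positioning', "
        "'feature_gaps', and 'opportunities'.\n\n" + json.dumps(scraped, indent=2)
    )


def parse_llm_json(text: str) -> Any:
    """Parse JSON from a model reply, allowing an optional surrounding code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag)...
        lines = cleaned.splitlines()[1:]
        # ...and the closing fence line, if present.
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        cleaned = "\n".join(lines)
    return json.loads(cleaned)
```

In the notebook this might be used as `insights = parse_llm_json(llm.invoke(build_analysis_prompt(scraped)).content)`, where `scraped` is a hypothetical list of per-competitor dictionaries; `json.loads` still raises if the model ignores the JSON instruction, which is usually preferable to silently accepting free text.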
Defining the CompetitiveAnalyzer Class
The CompetitiveAnalyzer class orchestrates end-to-end competitor research: it scrapes detailed company information with ScrapeGraph tools, compiles and cleans the results, and uses Gemini AI to generate structured competitive insights.
Key Functions
- scrape_competitor_data: Scrape comprehensive data from a competitor’s website. This function extracts information such as company name, product offerings, pricing, and technology stack.
- analyze_competitor_landscape: Analyze multiple competitors and generate insights based on the scraped data.
- generate_summary_stats: Produce summary statistics from the analysis, including success rates.
- export_results: Export results to JSON and CSV files.
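The four methods listed above fit together roughly as follows. This is a hypothetical skeleton, not the tutorial's CompetitiveAnalyzer: `scrape_fn` stands in for a ScrapeGraph call such as SmartScraperTool and `summarize_fn` for a Gemini call, and both are injected so the sketch runs without API keys. The record fields and CSV columns are assumptions.

```python
import csv
import json
from datetime import datetime
from typing import Any, Callable, Dict, List


class CompetitiveAnalyzerSketch:
    """Hypothetical skeleton mirroring the four methods described above."""

    CSV_FIELDS = ["company", "url", "status", "scraped_at", "data"]

    def __init__(self, scrape_fn: Callable[[str, str], Dict[str, Any]],
                 summarize_fn: Callable[[List[Dict[str, Any]]], str]):
        self.scrape_fn = scrape_fn          # stand-in for a ScrapeGraph tool
        self.summarize_fn = summarize_fn    # stand-in for a Gemini call
        self.results: List[Dict[str, Any]] = []

    def scrape_competitor_data(self, name: str, url: str) -> Dict[str, Any]:
        record = {"company": name, "url": url,
                  "scraped_at": datetime.now().isoformat()}
        try:
            record["data"] = self.scrape_fn(name, url)
            record["status"] = "success"
        except Exception as exc:  # one failing site should not abort the run
            record["data"] = {}
            record["status"] = "error"
            record["error"] = str(exc)
        return record

    def analyze_competitor_landscape(self, competitors: List[Dict[str, str]]) -> Dict[str, Any]:
        self.results = [self.scrape_competitor_data(c["name"], c["url"])
                        for c in competitors]
        successful = [r for r in self.results if r["status"] == "success"]
        # Only send successfully scraped records to the language model.
        insights = self.summarize_fn(successful) if successful else None
        return {"competitors": self.results, "insights": insights}

    def generate_summary_stats(self) -> Dict[str, Any]:
        total = len(self.results)
        ok = sum(r["status"] == "success" for r in self.results)
        return {"total": total, "successful": ok,
                "success_rate": ok / total if total else 0.0}

    def export_results(self, json_path: str, csv_path: str) -> None:
        with open(json_path, "w", encoding="utf-8") as f:
            json.dump(self.results, f, indent=2)
        # One flat CSV row per competitor; nested scraped data becomes a JSON string.
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=self.CSV_FIELDS)
            writer.writeheader()
            for r in self.results:
                row = {k: r.get(k, "") for k in self.CSV_FIELDS[:-1]}
                row["data"] = json.dumps(r["data"])
                writer.writerow(row)
```

Keeping failed scrapes as `error` records, rather than dropping them, is what lets `generate_summary_stats` report a meaningful success rate.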
Running Competitive Analyses
To initiate the analysis, you can define specific competitor groups, such as AI/SaaS or e-commerce platforms, and run the respective analysis functions:
AI/SaaS Analysis Function
```python
def run_ai_saas_analysis():
    analyzer = CompetitiveAnalyzer()
    ai_saas_competitors = [
        {"name": "OpenAI", "url": "https://openai.com"},
        {"name": "Anthropic", "url": "https://anthropic.com"},
        {"name": "Hugging Face", "url": "https://huggingface.co"},
        {"name": "Cohere", "url": "https://cohere.ai"},
        {"name": "Scale AI", "url": "https://scale.com"},
    ]
    results = analyzer.analyze_competitor_landscape(ai_saas_competitors)
    ...
```
E-commerce Analysis Function
```python
def run_ecommerce_analysis():
    analyzer = CompetitiveAnalyzer()
    ecommerce_competitors = [
        {"name": "Shopify", "url": "https://shopify.com"},
        {"name": "WooCommerce", "url": "https://woocommerce.com"},
        {"name": "BigCommerce", "url": "https://bigcommerce.com"},
        {"name": "Magento", "url": "https://magento.com"},
    ]
    results = analyzer.analyze_competitor_landscape(ecommerce_competitors)
    ...
```
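Since run_ai_saas_analysis and run_ecommerce_analysis differ only in their competitor lists, one possible refactor (a sketch, not the tutorial's code) keeps the lists in a dictionary and passes the chosen one to a single runner. `COMPETITOR_GROUPS` and `run_group_analysis` are hypothetical names; the analyzer is supplied through a factory argument so the sketch runs on its own, and in the notebook you would pass CompetitiveAnalyzer.

```python
from typing import Any, Callable, Dict, List

# The same competitor lists as above, keyed by market segment.
COMPETITOR_GROUPS: Dict[str, List[Dict[str, str]]] = {
    "ai_saas": [
        {"name": "OpenAI", "url": "https://openai.com"},
        {"name": "Anthropic", "url": "https://anthropic.com"},
        {"name": "Hugging Face", "url": "https://huggingface.co"},
        {"name": "Cohere", "url": "https://cohere.ai"},
        {"name": "Scale AI", "url": "https://scale.com"},
    ],
    "ecommerce": [
        {"name": "Shopify", "url": "https://shopify.com"},
        {"name": "WooCommerce", "url": "https://woocommerce.com"},
        {"name": "BigCommerce", "url": "https://bigcommerce.com"},
        {"name": "Magento", "url": "https://magento.com"},
    ],
}


def run_group_analysis(group: str, analyzer_factory: Callable[[], Any],
                       groups: Dict[str, List[Dict[str, str]]] = COMPETITOR_GROUPS) -> Any:
    """Analyze one segment with a fresh analyzer, as the two functions above do."""
    if group not in groups:
        raise KeyError(f"Unknown competitor group {group!r}; choose from {sorted(groups)}")
    analyzer = analyzer_factory()  # e.g. CompetitiveAnalyzer
    return analyzer.analyze_competitor_landscape(groups[group])
```

For example, `run_group_analysis("ai_saas", CompetitiveAnalyzer)` would reproduce the first function's scraping and analysis call, though not the steps elided as `...` above. Adding a new segment then only means adding a dictionary entry.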
Social Media Monitoring
Additionally, the social_media_monitoring_chain function monitors competitor social media presence and engagement strategies. It analyzes aspects such as platform presence, content strategy patterns, and community building approaches:
```python
from typing import List
from langchain_core.runnables import RunnableConfig
def social_media_monitoring_chain(company_urls: List[str], config: RunnableConfig):
    ...
```
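The function body is elided above, but the three aspects it analyzes suggest one prompt applied to each company URL. The sketch below, with hypothetical names `SOCIAL_ASPECTS`, `build_social_prompt`, and `plan_social_requests`, only prepares those requests. The `website_url` and `user_prompt` keys follow the input format shown in langchain-scrapegraph's SmartScraperTool examples; treat them as an assumption and check them against your installed version.

```python
from typing import Dict, List

# The three aspects named in the text; the parenthetical wording is hypothetical.
SOCIAL_ASPECTS = [
    "platform presence (which social networks the company links to)",
    "content strategy patterns (topics, formats, posting cadence)",
    "community building approaches (forums, events, developer programs)",
]


def build_social_prompt(aspects: List[str] = SOCIAL_ASPECTS) -> str:
    """Build one extraction prompt covering every aspect."""
    bullets = "\n".join(f"- {a}" for a in aspects)
    return ("From this company's website, describe its social media strategy, covering:\n"
            + bullets + "\nAnswer in JSON with one key per aspect.")


def plan_social_requests(company_urls: List[str]) -> List[Dict[str, str]]:
    """Pair each unique URL with the shared prompt, one request per company."""
    prompt = build_social_prompt()
    # dict.fromkeys drops duplicate URLs while keeping their original order.
    return [{"website_url": url, "user_prompt": prompt}
            for url in dict.fromkeys(company_urls)]
```

Each resulting dictionary could then be passed to a scraper invocation such as `SmartScraperTool().invoke(request)`, with the per-company answers collected for Gemini to compare.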
Conclusion
Integrating ScrapeGraph’s scraping capabilities with Gemini AI transforms a traditionally time-consuming competitive intelligence workflow into an efficient, repeatable pipeline. ScrapeGraph handles the heavy lifting of fetching and normalizing web-based information, while Gemini’s language understanding turns that raw data into high-level strategic recommendations. This automation allows businesses to rapidly assess market positioning, identify feature gaps, and uncover emerging opportunities with minimal manual intervention.
For further resources, see the tutorial's accompanying GitHub repository.