Build Custom AI Tools That Combine Machine Learning and Statistical Analysis for Your AI Agents
The ability to build custom tools is essential for creating flexible, capable AI agents. This tutorial demonstrates how to build a powerful data analysis tool in Python that can be plugged into AI agents powered by LangChain. By defining a structured schema for user inputs and implementing key functionalities such as correlation analysis, clustering, outlier detection, and target variable profiling, the tool transforms raw tabular data into actionable insights. Leveraging the modularity of LangChain’s BaseTool, the implementation shows how developers can encapsulate domain-specific logic and build reusable components that extend the analytical capabilities of autonomous AI systems.
Installation of Required Packages
To begin, install the essential Python packages for data analysis, visualization, machine learning, and LangChain tool development:
!pip install langchain langchain-core pandas numpy matplotlib seaborn scikit-learn
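The code in the sections that follow assumes a handful of imports: the typing helpers and Pydantic classes for the input schema, the LangChain tool base classes, and the scikit-learn utilities used for scaling, clustering, and silhouette scoring. Depending on your langchain-core version, the Pydantic classes may need to come from pydantic directly (shown here) or from langchain_core.pydantic_v1.

from typing import Any, Dict, List, Optional, Tuple

import numpy as np
import pandas as pd
from pydantic import BaseModel, Field

from langchain_core.tools import BaseTool, ToolException

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler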
Input Schema Definition
We define the input schema for the custom analysis tool using Pydantic’s BaseModel. The DataAnalysisInput class ensures that incoming data follows a structured format, allowing users to specify the dataset, the type of analysis, an optional target column, and the maximum number of clusters for clustering tasks. It serves as a clean interface for validating inputs before analysis begins.
class DataAnalysisInput(BaseModel):
    data: List[Dict[str, Any]] = Field(description="List of data records as dictionaries")
    analysis_type: str = Field(default="comprehensive", description="Type of analysis: 'comprehensive', 'clustering', 'correlation', 'outlier'")
    target_column: Optional[str] = Field(default=None, description="Target column for focused analysis")
    max_clusters: int = Field(default=5, description="Maximum clusters for clustering analysis")
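As a quick sanity check, you can instantiate the schema directly; Pydantic validates the field types and fills in the defaults (the record below is purely illustrative):

validated = DataAnalysisInput(data=[{"age": 30, "income": 52000}], analysis_type="correlation")
print(validated.max_clusters)  # prints 5, the default when not provided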
Intelligent Data Analyzer Class
The IntelligentDataAnalyzer class is a custom tool built using LangChain’s BaseTool, designed to perform comprehensive data analysis on structured datasets. It integrates multiple analytical methods, including correlation matrix generation, K-Means clustering with silhouette scoring, outlier detection using IQR and z-score, and descriptive statistics on a target column, into a unified pipeline. The tool not only extracts valuable insights but also auto-generates recommendations and a summary report, making it highly useful for building AI agents that require decision-support capabilities grounded in data.
class IntelligentDataAnalyzer(BaseTool):
    name: str = "intelligent_data_analyzer"
    description: str = "Advanced data analysis tool that performs statistical analysis, machine learning clustering, outlier detection, correlation analysis, and generates visualizations with actionable insights."
    args_schema: type[BaseModel] = DataAnalysisInput
    response_format: str = "content_and_artifact"

    def _run(self, data: List[Dict], analysis_type: str = "comprehensive", target_column: Optional[str] = None, max_clusters: int = 5) -> Tuple[str, Dict]:
        try:
            df = pd.DataFrame(data)
            if df.empty:
                raise ToolException("Dataset is empty")
            insights = {"dataset_info": self._get_dataset_info(df)}
            if analysis_type in ["comprehensive", "correlation"]:
                insights["correlation_analysis"] = self._correlation_analysis(df)
            if analysis_type in ["comprehensive", "clustering"]:
                insights["clustering_analysis"] = self._clustering_analysis(df, max_clusters)
            if analysis_type in ["comprehensive", "outlier"]:
                insights["outlier_detection"] = self._outlier_detection(df)
            if target_column and target_column in df.columns:
                insights["target_analysis"] = self._target_analysis(df, target_column)
            recommendations = self._generate_recommendations(df, insights)
            summary = self._create_analysis_summary(insights, recommendations)
            artifact = {
                "insights": insights,
                "recommendations": recommendations,
                "data_shape": df.shape,
                "analysis_type": analysis_type,
                "numeric_columns": df.select_dtypes(include=[np.number]).columns.tolist(),
                "categorical_columns": df.select_dtypes(include=['object']).columns.tolist()
            }
            return summary, artifact
        except Exception as e:
            raise ToolException(f"Analysis failed: {str(e)}")
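The _run method delegates to several private helpers that are referenced but not shown above. The sketch below is one plausible way to fill them in, following the behavior described earlier (correlation matrix with strong-pair detection, K-Means with silhouette scoring, IQR and z-score outlier counts, and target profiling); _generate_recommendations and _create_analysis_summary are kept to minimal placeholders so the tool runs end to end. Treat these as illustrative implementations, to be placed inside the IntelligentDataAnalyzer class:

    def _get_dataset_info(self, df: pd.DataFrame) -> Dict:
        return {"rows": len(df), "columns": len(df.columns), "missing_values": int(df.isnull().sum().sum())}

    def _correlation_analysis(self, df: pd.DataFrame) -> Dict:
        numeric_df = df.select_dtypes(include=[np.number])
        if numeric_df.shape[1] < 2:
            return {"note": "At least two numeric columns are required for correlation analysis"}
        corr = numeric_df.corr()
        # Flag pairs whose absolute correlation exceeds 0.7 as "strong"
        strong = [
            {"pair": (a, b), "correlation": round(float(corr.loc[a, b]), 3)}
            for i, a in enumerate(corr.columns) for b in corr.columns[i + 1:]
            if abs(corr.loc[a, b]) > 0.7
        ]
        return {"correlation_matrix": corr.round(3).to_dict(), "strong_correlations": strong}

    def _clustering_analysis(self, df: pd.DataFrame, max_clusters: int) -> Dict:
        numeric_df = df.select_dtypes(include=[np.number]).dropna()
        if numeric_df.shape[0] < 4 or numeric_df.shape[1] < 2:
            return {"note": "Not enough numeric data for clustering"}
        scaled = StandardScaler().fit_transform(numeric_df)
        best = {"k": None, "silhouette": -1.0}
        # Try k = 2 .. max_clusters and keep the k with the best silhouette score
        for k in range(2, min(max_clusters, len(numeric_df) - 1) + 1):
            labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(scaled)
            score = float(silhouette_score(scaled, labels))
            if score > best["silhouette"]:
                best = {"k": k, "silhouette": round(score, 3)}
        return {"optimal_clusters": best["k"], "silhouette_score": best["silhouette"]}

    def _outlier_detection(self, df: pd.DataFrame) -> Dict:
        outliers = {}
        for col in df.select_dtypes(include=[np.number]).columns:
            series = df[col].dropna()
            q1, q3 = series.quantile([0.25, 0.75])
            iqr = q3 - q1
            iqr_count = int(((series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)).sum())
            std = float(series.std(ddof=0))
            z_count = int((np.abs((series - series.mean()) / std) > 3).sum()) if std > 0 else 0
            outliers[col] = {"iqr_outliers": iqr_count, "z_score_outliers": z_count}
        return outliers

    def _target_analysis(self, df: pd.DataFrame, target_column: str) -> Dict:
        target = df[target_column]
        if pd.api.types.is_numeric_dtype(target):
            return {"mean": float(target.mean()), "std": float(target.std()),
                    "min": float(target.min()), "max": float(target.max())}
        return {"value_counts": target.value_counts().to_dict()}

    def _generate_recommendations(self, df: pd.DataFrame, insights: Dict) -> List[str]:
        recs = []
        if insights.get("correlation_analysis", {}).get("strong_correlations"):
            recs.append("Strongly correlated features detected; consider dropping redundant columns.")
        if insights["dataset_info"]["missing_values"] > 0:
            recs.append("Dataset contains missing values; impute or remove them before modeling.")
        return recs or ["No immediate data-quality issues detected."]

    def _create_analysis_summary(self, insights: Dict, recommendations: List[str]) -> str:
        info = insights["dataset_info"]
        lines = [f"Analyzed {info['rows']} rows across {info['columns']} columns."]
        clustering = insights.get("clustering_analysis", {})
        if clustering.get("optimal_clusters"):
            lines.append(f"Best clustering: {clustering['optimal_clusters']} clusters (silhouette = {clustering['silhouette_score']}).")
        lines.extend(f"Recommendation: {r}" for r in recommendations)
        return "\n".join(lines)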
Sample Data Analysis
We initialize the IntelligentDataAnalyzer tool and feed it a sample dataset comprising demographic and satisfaction data. By specifying the analysis type as “comprehensive” and setting “satisfaction” as the target column, the tool performs a full suite of analyses, including statistical profiling, correlation checking, clustering, outlier detection, and target distribution analysis. The final output is a human-readable summary and structured insights that demonstrate how an AI agent can automatically process and interpret real-world tabular data.
data_analyzer = IntelligentDataAnalyzer()
sample_data = [
    {"age": 25, "income": 50000, "education": "Bachelor", "satisfaction": 7},
    {"age": 35, "income": 75000, "education": "Master", "satisfaction": 8},
    {"age": 45, "income": 90000, "education": "PhD", "satisfaction": 6},
    {"age": 28, "income": 45000, "education": "Bachelor", "satisfaction": 7},
    {"age": 52, "income": 120000, "education": "Master", "satisfaction": 9},
]

result = data_analyzer.invoke({
    "data": sample_data,
    "analysis_type": "comprehensive",
    "target_column": "satisfaction"
})
print("Analysis Summary:")
print(result)
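Because the tool declares response_format="content_and_artifact", invoking it with a plain argument dictionary (as above) returns only the textual summary. In recent langchain-core versions you can instead pass a tool-call-style payload to receive a ToolMessage whose artifact field carries the structured insights; a minimal sketch, with an arbitrary call id:

message = data_analyzer.invoke({
    "name": "intelligent_data_analyzer",
    "args": {"data": sample_data, "analysis_type": "comprehensive", "target_column": "satisfaction"},
    "id": "call_1",  # placeholder id for illustration
    "type": "tool_call",
})
print(message.artifact["recommendations"])  # the structured artifact built in _run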
Conclusion
We have created an advanced custom tool that integrates with AI agents. The IntelligentDataAnalyzer class handles a diverse range of analytical tasks, from statistical profiling to machine learning-based clustering, and presents its insights in a structured output with clear recommendations. This approach highlights how custom LangChain tools can bridge the gap between data science and interactive AI, making agents more context-aware and capable of delivering rich, data-driven decisions.