AI and the Brain: How DINOv3 Models Reveal Insights into Human Visual Processing
Introduction
Understanding how the brain builds internal representations of the visual world is a significant challenge in neuroscience. Over the past decade, deep learning has transformed computer vision, producing neural networks that achieve human-level accuracy on recognition tasks and, in some respects, process images in ways resembling the human brain. This overlap raises an intriguing question: can studying AI models enhance our understanding of how the brain learns to perceive visual stimuli?
Researchers at Meta AI and École Normale Supérieure explored this question by examining DINOv3, a self-supervised vision transformer trained on billions of natural images. They compared DINOv3's internal activations with human brain responses to the same images, using two complementary neuroimaging techniques. Functional magnetic resonance imaging (fMRI) provided high-resolution spatial maps of cortical activity, while magnetoencephalography (MEG) captured the precise timing of brain responses. Together, the two datasets offered a comprehensive view of visual information processing in the brain.
Technical Details
The research team investigated three factors potentially influencing brain-model similarity: model size, the volume of training data, and the type of images used for training. They trained multiple versions of DINOv3, varying these factors independently.
Brain-Model Similarity
The study revealed strong evidence of convergence between DINOv3 and human brain responses. The model’s activations predicted fMRI signals in both early visual regions and higher-order cortical areas, with peak voxel correlations reaching R = 0.45. MEG results indicated that alignment began as early as 70 milliseconds after image onset and persisted for up to three seconds. Notably, early DINOv3 layers aligned with regions such as V1 and V2, while deeper layers correlated with activity in higher-order areas, including parts of the prefrontal cortex.
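Comparisons like these are typically made with a linear encoding model: model activations are regressed onto voxel responses, and the correlation between predicted and held-out measured responses scores the alignment. The paper's exact pipeline is not reproduced here; the following is a minimal sketch on synthetic data, where the dimensions, ridge penalty, and train/test split are all illustrative choices, not values from the study.

```python
import numpy as np
from numpy.linalg import solve

rng = np.random.default_rng(0)
n_images, n_features, n_voxels = 200, 64, 10

# Synthetic stand-ins: per-image model activations (X) and voxel responses (Y)
# generated from a hidden linear map plus noise.
X = rng.standard_normal((n_images, n_features))
W_true = rng.standard_normal((n_features, n_voxels))
Y = X @ W_true + 0.5 * rng.standard_normal((n_images, n_voxels))

# Hold out the last 50 images for evaluation.
X_tr, X_te = X[:150], X[150:]
Y_tr, Y_te = Y[:150], Y[150:]

# Ridge regression: W = (X'X + alpha*I)^-1 X'Y, fit on the training split.
alpha = 1.0
W = solve(X_tr.T @ X_tr + alpha * np.eye(n_features), X_tr.T @ Y_tr)
Y_pred = X_te @ W

def pearson_per_voxel(a, b):
    """Pearson correlation between columns of a and b (one value per voxel)."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a**2).sum(axis=0) * (b**2).sum(axis=0))

r = pearson_per_voxel(Y_pred, Y_te)
print(f"peak voxel R = {r.max():.2f}")
```

In the study, this kind of per-voxel score is what yields figures such as the reported peak correlation of R = 0.45; on the clean synthetic data above, correlations are unrealistically high.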
Training Trajectories
Tracking these similarities throughout the training process revealed a developmental trajectory. Low-level visual alignments emerged early, after only a small fraction of training, while higher-level alignments required billions of images. This mirrors the human brain’s development, where sensory areas mature earlier than associative cortices. The study indicated that temporal alignment emerged fastest, spatial alignment more slowly, and encoding similarity appeared in between, highlighting the layered nature of representational development.
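One simple way to quantify "emergence" along such a trajectory is to score alignment at a series of training checkpoints and record when each brain region's score first reaches most of its final value. The sketch below illustrates this on hypothetical alignment curves; the checkpoint schedule, scores, and 90% threshold are all assumptions for illustration, not the paper's measurements.

```python
import numpy as np

# Checkpoints expressed as images seen during training (log-spaced).
checkpoints = np.array([1e6, 1e7, 1e8, 1e9, 1e10])

# Hypothetical alignment curves: low-level regions saturate early,
# high-level regions keep improving with more data.
scores = {
    "V1 (low-level)": np.array([0.30, 0.38, 0.40, 0.41, 0.41]),
    "prefrontal (high-level)": np.array([0.02, 0.05, 0.12, 0.25, 0.33]),
}

def emergence_point(ckpts, curve, frac=0.9):
    """First checkpoint at which alignment reaches `frac` of its final value."""
    idx = int(np.argmax(curve >= frac * curve[-1]))
    return ckpts[idx]

v1_emergence = emergence_point(checkpoints, scores["V1 (low-level)"])
pfc_emergence = emergence_point(checkpoints, scores["prefrontal (high-level)"])
print(f"V1 reaches 90% of final alignment by ~{v1_emergence:.0e} images")
print(f"prefrontal reaches 90% of final alignment by ~{pfc_emergence:.0e} images")
```

Under this definition, a low-level region "emerges" orders of magnitude earlier in training than a high-level one, mirroring the staged pattern the study describes.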
Role of Model Factors
The influence of model factors was also significant. Larger models consistently achieved higher similarity scores, particularly in higher-order cortical regions. Extended training improved alignment across the board, with high-level representations benefiting the most from prolonged exposure. The type of images used in training was crucial; models trained on human-centric images exhibited the strongest alignment, while those trained on satellite or cellular images showed partial convergence in early visual regions but weaker similarity in higher-order areas. This underscores the importance of ecologically relevant data for capturing the full spectrum of human-like representations.
Links to Cortical Properties
Interestingly, the timing of DINOv3’s representation emergence aligned with structural and functional properties of the cortex. Regions with greater developmental expansion, thicker cortex, or slower intrinsic timescales aligned later in training, while highly myelinated regions aligned earlier, reflecting their role in rapid information processing. These correlations suggest that AI models can provide insights into the biological principles underlying cortical organization.
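Relationships like these are naturally tested with a rank correlation between each region's emergence time and a cortical property. The sketch below uses a hand-implemented Spearman correlation on invented per-region values, chosen only to illustrate the reported directions of effect (later emergence with greater expansion, earlier emergence with heavier myelination); none of the numbers come from the paper.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation via Pearson on ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

# Hypothetical per-region values (not the paper's data).
emergence_time = np.array([1.0, 2.0, 5.0, 8.0, 9.0])  # later = larger
expansion = np.array([0.2, 0.5, 0.9, 1.4, 1.6])       # developmental expansion
myelination = np.array([1.5, 1.2, 0.8, 0.4, 0.3])     # cortical myelination

rho_expansion = spearman(expansion, emergence_time)
rho_myelin = spearman(myelination, emergence_time)
print(f"expansion vs. emergence:   rho = {rho_expansion:+.2f}")
print(f"myelination vs. emergence: rho = {rho_myelin:+.2f}")
```

A positive rho for expansion and a negative rho for myelination would correspond to the qualitative pattern the study reports.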
Nativism vs. Empiricism
The study highlights a balance between innate structure and learning. DINOv3’s architecture features a hierarchical processing pipeline, but full brain-like similarity only emerged with extended training on ecologically valid data. This interplay between architectural priors and experience resonates with ongoing debates in cognitive science regarding nativism and empiricism.
Developmental Parallels
The parallels to human development are striking. Just as sensory cortices in the brain mature quickly and associative areas develop more slowly, DINOv3 aligned with sensory regions early in training and with prefrontal areas much later. This suggests that training trajectories in large-scale AI models may serve as computational analogues for the staged maturation of human brain functions.
Beyond the Visual Pathway
The results extend beyond traditional visual pathways. DINOv3 demonstrated alignment in prefrontal and multimodal regions, raising questions about whether such models capture higher-order features relevant for reasoning and decision-making. While this study focused solely on DINOv3, it points toward exciting possibilities for using AI as a tool to test hypotheses about brain organization and development.
Conclusion
This research indicates that self-supervised vision models like DINOv3 are not just powerful computer vision systems; they also approximate aspects of human visual processing. By studying how models learn to perceive, we gain valuable insights into how the human brain develops the ability to interpret the world.
Check out the PAPER for more detailed insights.