Researchers at UT Austin Introduce Panda: A Foundation Model for Nonlinear Dynamics Pretrained on 20,000 Chaotic ODEs Discovered via Evolutionary Search
Chaotic systems, from fluid flows to brain activity, are highly sensitive to initial conditions, which makes long-term prediction fundamentally difficult: small modeling errors compound quickly, limiting the efficacy of many scientific machine learning (SciML) approaches. Traditional forecasting methods are typically trained on one specific time series, or on datasets that lack genuine dynamical structure. Recent work has shown that local forecasting models can predict chaotic systems more accurately over extended horizons by learning the numerical rules that govern them.
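To make this sensitivity concrete, here is a minimal sketch (illustrative only, not from the paper) that integrates the classic Lorenz-63 system from two initial conditions differing by 1e-8 and tracks how fast the trajectories separate:

```python
# Sensitive dependence on initial conditions in the Lorenz system
# (standard parameter values sigma=10, rho=28, beta=8/3).
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

t_eval = np.linspace(0, 25, 2500)
x0 = np.array([1.0, 1.0, 1.0])
x0_perturbed = x0 + 1e-8  # tiny initial error

sol_a = solve_ivp(lorenz, (0, 25), x0, t_eval=t_eval, rtol=1e-9)
sol_b = solve_ivp(lorenz, (0, 25), x0_perturbed, t_eval=t_eval, rtol=1e-9)

# The separation grows roughly exponentially until it saturates at
# the attractor's diameter: a 1e-8 error becomes order one.
err = np.linalg.norm(sol_a.y - sol_b.y, axis=0)
print(f"initial error: {err[0]:.2e}, final error: {err[-1]:.2e}")
```

This exponential error growth is why pointwise forecasts of chaotic systems degrade after a few characteristic (Lyapunov) times, no matter how good the model.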
The remaining challenge is out-of-domain generalization: building models that adapt to and accurately forecast new, unseen dynamical systems. This requires combining prior knowledge with local adaptability. Current methods are often constrained by their reliance on task-specific data and can overlook essential properties of dynamical systems such as ergodicity, channel coupling, and conserved quantities.
Machine learning for dynamical systems (MLDS) exploits the distinctive attributes of these systems as inductive biases, including fixed relationships among variables and invariant statistical measures. MLDS models use these properties to build more accurate and generalizable models, sometimes incorporating probabilistic or latent-variable techniques. While datasets of dynamical systems can be curated and new systems generated through parameter perturbations or symbolic methods, these approaches do not reliably produce diverse or stable dynamics. Structural stability is the core difficulty: small changes often fail to produce genuinely new behaviors, while large changes tend to collapse the system into trivial dynamics.
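As a concrete example of an invariant statistical measure, the sketch below (again illustrative, not from the paper) shows that the long-run histogram of a chaotic Lorenz trajectory is nearly independent of the initial condition, even though individual trajectories diverge pointwise:

```python
# Long-run statistics as an invariant of the dynamics: two different
# initial conditions yield nearly identical histograms of the x-coordinate.
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s):
    x, y, z = s
    return [10.0 * (y - x), x * (28.0 - z) - y, x * y - 8.0 / 3.0 * z]

t_eval = np.linspace(0, 500, 100_000)
hists = []
for x0 in ([1.0, 1.0, 1.0], [-5.0, 7.0, 20.0]):
    sol = solve_ivp(lorenz, (0, 500), x0, t_eval=t_eval, rtol=1e-8)
    hists.append(np.histogram(sol.y[0], bins=50, range=(-25, 25), density=True)[0])

# Pointwise-diverging trajectories share nearly identical statistics.
print(f"max histogram difference: {np.abs(hists[0] - hists[1]).max():.3f}")
```

Statistics of this kind are exactly the sort of property a forecasting model can exploit as an inductive bias, and the sort that purely task-specific training tends to ignore.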
Foundation models aim to address these issues through transfer learning and zero-shot inference, but most current models perform on par with standard time series models or struggle to generate meaningful dynamical variety. Techniques such as embedding spaces and symbolic discovery have made progress, yet a richer and more diverse sampling of dynamical behaviors remains an open need.
Panda: Overview and Innovations
Researchers at the Oden Institute, UT Austin, have introduced Panda (Patched Attention for Nonlinear Dynamics), a model pretrained solely on synthetic data derived from 20,000 algorithmically generated chaotic systems. These systems were produced by an evolutionary algorithm seeded with known chaotic ordinary differential equations (ODEs). Despite being trained only on low-dimensional ODEs, Panda demonstrates strong zero-shot forecasting on real-world nonlinear systems, including fluid dynamics and electrophysiology, and generalizes unexpectedly to partial differential equations (PDEs).
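For a flavor of what one synthetic training example might look like, the sketch below samples a trajectory from the Rössler system, one of the classic chaotic ODEs; the paper's actual 20,000 evolved systems, integration settings, and preprocessing are not reproduced here:

```python
# Illustrative only: sampling one multivariate training trajectory
# from a known chaotic ODE (the Roessler system, standard parameters).
import numpy as np
from scipy.integrate import solve_ivp

def rossler(t, state, a=0.2, b=0.2, c=5.7):
    x, y, z = state
    return [-y - z, x + a * y, b + z * (x - c)]

# Discard an initial transient so samples lie on the attractor.
burn = solve_ivp(rossler, (0, 100), [1.0, 1.0, 1.0], rtol=1e-8)
x0 = burn.y[:, -1]

t_eval = np.linspace(0, 200, 4096)
sol = solve_ivp(rossler, (0, 200), x0, t_eval=t_eval, rtol=1e-8)
trajectory = sol.y.T  # shape (4096, 3): a 3-channel time series
```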
Panda integrates innovations such as masked pretraining, channel attention, and kernelized patching to effectively capture dynamical structure. A neural scaling law emerges, linking Panda’s forecasting performance to the diversity of training systems.
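The paper's exact kernelized patching layer is not spelled out here; as a rough illustration of the idea, the following PyTorch sketch embeds sliding patches of each channel (in the PatchTST style) and applies a degree-2 polynomial feature lift as one hypothetical way to "kernelize" a patch before projecting it to the model dimension. The class name, patch sizes, and lift are assumptions for illustration:

```python
# Hypothetical sketch: patch a multivariate series into windows and
# lift each window with degree-2 polynomial features before projecting.
import torch
import torch.nn as nn

class KernelizedPatchEmbed(nn.Module):
    def __init__(self, patch_len=16, stride=8, d_model=128):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        # Patch values plus the upper triangle of their outer product.
        lifted_dim = patch_len + patch_len * (patch_len + 1) // 2
        self.proj = nn.Linear(lifted_dim, d_model)

    def forward(self, x):
        # x: (batch, channels, time) -> (batch, channels, n_patches, patch_len)
        patches = x.unfold(-1, self.patch_len, self.stride)
        # Degree-2 lift: concatenate the patch with its pairwise products.
        outer = patches.unsqueeze(-1) * patches.unsqueeze(-2)
        idx = torch.triu_indices(self.patch_len, self.patch_len)
        lifted = torch.cat([patches, outer[..., idx[0], idx[1]]], dim=-1)
        return self.proj(lifted)  # (batch, channels, n_patches, d_model)

tokens = KernelizedPatchEmbed()(torch.randn(4, 3, 512))
print(tokens.shape)  # torch.Size([4, 3, 63, 128])
```

The intent of any such lift is to expose nonlinear interactions within a patch to the linear projection, echoing the kernel trick.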
The researchers generated 20,000 novel chaotic systems with a genetic algorithm seeded by a curated set of 135 known chaotic ODEs. Candidate systems were mutated and recombined via a skew-product construction, and only those that passed rigorous tests for genuine chaos were retained. Augmentations such as time-delay embeddings and affine transformations expanded the dataset while preserving its dynamics, and a separate set of 9,300 unseen systems was held out for zero-shot testing. Panda builds on PatchTST, adding channel attention, temporal-channel attention layers, and dynamic embeddings inspired by Koopman operator theory.
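A skew product couples two systems asymmetrically: a driver evolves autonomously while its state is injected into the right-hand side of a response system. The sketch below combines the Lorenz and Rössler vector fields in this way; the random linear coupling is an illustrative guess, not the paper's actual evolutionary operator:

```python
# Hedged sketch of a skew-product recombination of two chaotic ODEs.
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(s):
    x, y, z = s
    return np.array([10.0 * (y - x), x * (28.0 - z) - y, x * y - 8.0 / 3.0 * z])

def rossler(s):
    x, y, z = s
    return np.array([-y - z, x + 0.2 * y, 0.2 + z * (x - 5.7)])

rng = np.random.default_rng(0)
C = 0.1 * rng.standard_normal((3, 3))  # random driver-to-response coupling

def skew_product(t, state):
    driver, response = state[:3], state[3:]
    # The driver is autonomous; the response feels the driver additively.
    return np.concatenate([lorenz(driver), rossler(response) + C @ driver])

sol = solve_ivp(skew_product, (0, 50), np.ones(6), t_eval=np.linspace(0, 50, 5000))
```

In a pipeline like the one described, each candidate produced this way would still need to pass a chaoticity test before being admitted to the training set.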
Performance and Generalization
Panda exhibits robust zero-shot forecasting on previously unseen nonlinear dynamical systems, outperforming models such as Chronos-SFT across a range of metrics and prediction horizons. Although trained exclusively on 3D systems, it generalizes to higher-dimensional dynamics thanks to its channel attention mechanism. Remarkably, Panda also succeeds on real-world experimental data and on chaotic PDEs, such as the Kuramoto-Sivashinsky equation and the von Kármán vortex street, despite never encountering PDEs during training.
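The dimension-agnostic behavior comes from attending over the channel axis rather than fixing a channel count. The generic PyTorch sketch below (not Panda's exact layer) runs multi-head attention across channels, so the same weights accept 3-channel or 9-channel inputs:

```python
# Generic channel attention: tokens attend across the channel axis,
# so the layer works for any number of input channels.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, channels, n_patches, d_model)
        b, c, p, d = x.shape
        # Fold patches into the batch so attention runs over channels.
        x = x.permute(0, 2, 1, 3).reshape(b * p, c, d)
        out, _ = self.attn(x, x, x)
        return out.reshape(b, p, c, d).permute(0, 2, 1, 3)

layer = ChannelAttention()
print(layer(torch.randn(2, 3, 63, 128)).shape)  # 3-channel input
print(layer(torch.randn(2, 9, 63, 128)).shape)  # 9 channels, same weights
```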
Architectural ablations confirm the importance of channel attention and the dynamic embeddings. Performance also follows a neural scaling law in the diversity of training systems, and the model's attention maps show interpretable patterns suggestive of nonlinear resonance and attractor-sensitive structure, indicating a capacity for broad generalization across complex dynamical behaviors.
Conclusion
Panda is a pretrained model aimed at uncovering generalizable patterns in dynamical systems. Trained on a vast, diverse set of synthetic chaotic systems, it delivers impressive zero-shot forecasts on unseen real-world data and even on partial differential equations, despite being trained solely on low-dimensional ODEs. Its performance improves with the diversity of training systems, following a neural scaling law, and its attention patterns exhibit emergent nonlinear resonance. Although the current focus is low-dimensional dynamics, the approach could extend to higher-dimensional systems by exploiting sparse interactions. Future work may explore alternative pretraining strategies to improve rollout performance when forecasting chaotic behavior.
For further details, check out the Paper. All credit for this research goes to the researchers of this project.