Anthrogen Introduces Odyssey: A 102B Parameter Protein Language Model
Target Audience Analysis
The target audience for Anthrogen’s Odyssey model includes researchers and professionals in the fields of bioinformatics, computational biology, and protein engineering. This audience typically faces several pain points, including:
- Difficulty in designing proteins whose sequence and structure jointly satisfy functional constraints.
- Data scarcity and training-efficiency limits when building models for protein design.
- Challenges in optimizing protein properties such as potency, specificity, stability, and manufacturability.
Their goals are to apply advanced modeling techniques to innovate in protein design, improve research outcomes, and potentially discover new therapeutic proteins. This audience values detailed technical specifications and peer-reviewed data over marketing language, and prefers direct, clear communication focused on practicality and results.
Problem Targeted by Odyssey
Odyssey addresses the challenges of protein design by integrating amino acid sequences with their 3D structures and functional context. Traditional models rely on global self-attention, which does not reflect the geometric reality that effects appearing long-range in sequence are usually mediated by local neighborhoods in 3D. Anthrogen reframes this as a locality problem and proposes a new propagation rule, termed Consensus, that aligns better with the physical realities of protein folding and function.
Input Representation and Tokenization
Odyssey is a multimodal model that combines sequence tokens, structure tokens, and lightweight functional cues into a unified representation. It employs a finite scalar quantizer (FSQ) to convert 3D geometry into compact discrete tokens, so the model can handle structure with the same token machinery it uses for sequence. Functional cues can include domain tags, secondary structure hints, orthologous group labels, or descriptive text, giving Odyssey simultaneous access to local sequence patterns and long-range geometric relations.
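FSQ itself is a published quantization scheme: bound each latent dimension, round it to a small fixed set of levels, and treat the Cartesian product of those levels as an implicit codebook. The PyTorch sketch below illustrates that idea only; the level counts, latent dimensionality, and the upstream geometry encoder are illustrative assumptions, not Odyssey's actual configuration.

```python
import torch
import torch.nn as nn

class FSQ(nn.Module):
    """Minimal finite scalar quantizer: squash each latent dimension into a bounded
    range, round it to a few discrete levels, and pack the per-dimension codes into
    a single integer structure token (mixed-radix encoding)."""

    def __init__(self, levels=(7, 7, 7, 5, 5, 5)):  # odd level counts keep the rounding exact; illustrative only
        super().__init__()
        self.register_buffer("levels", torch.tensor(levels, dtype=torch.float32))
        basis = torch.cumprod(torch.tensor((1,) + levels[:-1], dtype=torch.float32), dim=0)
        self.register_buffer("basis", basis)            # radix weights for packing codes into one id

    def forward(self, z):                               # z: (..., len(levels)) continuous geometry features
        half = (self.levels - 1) / 2
        bounded = torch.tanh(z) * half                  # each dimension now lives in [-half, half]
        quantized = bounded + (torch.round(bounded) - bounded).detach()     # straight-through rounding
        ids = ((quantized + half) * self.basis).sum(dim=-1).round().long()  # one token id per residue
        return quantized, ids

# Usage: quantize per-residue geometric features from some structure encoder (hypothetical shapes).
feats = torch.randn(2, 128, 6)                          # (batch, residues, latent dims)
codes, structure_tokens = FSQ()(feats)                  # structure_tokens: (2, 128) discrete ids
```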
Backbone Change: Consensus Instead of Self-Attention
Odyssey replaces global self-attention with iterative, locality-aware updates over a sparse contact or sequence graph. Nearby neighborhoods reach agreement first, and that consensus then propagates outward through the chain and contact graph. Notably, Consensus scales as O(L) rather than the O(L²) of self-attention, making it more efficient for longer sequences and multi-domain constructs.
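Anthrogen does not spell out the Consensus update rule in this summary, so the sketch below should be read only as an illustration of what a locality-aware propagation layer over a sparse sequence-plus-contact graph can look like; the layer name, message function, and neighbor construction are assumptions, not the published mechanism.

```python
import torch
import torch.nn as nn

class LocalConsensusLayer(nn.Module):
    """Illustrative locality-aware update: each residue mixes its state with a fixed
    number of graph neighbors (sequence and 3D-contact edges), so cost scales with
    L * k rather than L^2. Stacking layers lets local agreement spread outward,
    one neighborhood hop per layer."""

    def __init__(self, dim, hidden=None):
        super().__init__()
        hidden = hidden or 4 * dim
        self.msg = nn.Linear(2 * dim, dim)              # message computed from (residue, neighbor) state pairs
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x, nbr_idx):
        # x: (B, L, D) residue states; nbr_idx: (B, L, K) neighbor indices into the length-L chain
        B, L, D = x.shape
        K = nbr_idx.shape[-1]
        flat = (nbr_idx + torch.arange(B, device=x.device).view(B, 1, 1) * L).reshape(-1)
        nbrs = x.reshape(B * L, D)[flat].view(B, L, K, D)                 # gather neighbor states, O(L*K)
        center = x.unsqueeze(2).expand(B, L, K, D)
        agree = self.msg(torch.cat([center, nbrs], dim=-1)).mean(dim=2)   # local "consensus" signal
        x = self.norm1(x + agree)
        return self.norm2(x + self.ffn(x))

# Usage with a random sparse neighbor graph (in practice: sequence neighbors plus 3D contacts).
B, L, K, D = 2, 256, 16, 64
x = torch.randn(B, L, D)
nbr_idx = torch.randint(0, L, (B, L, K))
out = LocalConsensusLayer(D)(x, nbr_idx)                # (2, 256, 64)
```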
Anthrogen also reports that Consensus is more robust to learning rate variation, reducing the likelihood of brittle training runs.
Training Objective and Generation: Discrete Diffusion
Odyssey is trained with discrete diffusion over sequence and structure tokens. The forward diffusion process applies masking noise that mimics mutations, while a reverse-time denoiser learns to reconstruct consistent sequences and coordinates. At inference, running the reverse process enables conditional generation and protein editing, allowing targeted modifications while keeping sequence and structure consistent with each other.
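The sketch below illustrates one training step of absorbing-state ("masking") discrete diffusion on a single token stream, assuming a generic transformer-style denoiser; Odyssey trains jointly over sequence and structure tokens, and its noise schedule, loss weighting, and time conditioning are not specified in this summary.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_step(denoiser, tokens, mask_id, optimizer):
    """One absorbing-state discrete diffusion step: sample a corruption level t,
    replace roughly a fraction t of tokens with MASK (the forward process), and
    train the denoiser to recover the originals at the corrupted positions
    (the reverse process)."""
    B, L = tokens.shape
    t = torch.rand(B, 1, device=tokens.device)                   # per-sequence corruption level in (0, 1)
    corrupt = torch.rand(B, L, device=tokens.device) < t
    corrupt[:, 0] |= ~corrupt.any(dim=1)                         # ensure at least one corrupted position
    noisy = torch.where(corrupt, torch.full_like(tokens, mask_id), tokens)

    logits = denoiser(noisy)                                     # (B, L, vocab); some variants also condition on t
    loss = F.cross_entropy(logits[corrupt], tokens[corrupt])     # score only the corrupted positions
    # diffusion-style reweighting of the loss by the noise level is omitted for brevity

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference, the same denoiser would be applied iteratively, starting from fully or partially masked inputs, which is what makes conditional generation and targeted editing possible.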
Anthrogen reports that, in matched evaluations, discrete diffusion consistently outperforms masked language modeling, achieving lower training perplexities than both complex and simple masking baselines. This supports Odyssey's aim of co-designing proteins by integrating sequence and structural information.
Key Takeaways
- Odyssey is a multimodal protein model family that integrates sequence, structure, and functional context, with production models available at 1.2B, 8B, and 102B parameters.
- The Consensus mechanism provides locality-aware propagation that scales as O(L) and shows greater robustness to learning rate at larger scales.
- FSQ translates 3D coordinates into discrete structure tokens for combined sequence and structure modeling.
- Discrete diffusion trains a reverse-time denoiser that outperforms masked language models in matched evaluations and shows strong data efficiency.
Odyssey operationalizes joint sequence and structure modeling, enabling conditional design and editing within practical constraints. With scaling up to 102B parameters and notable gains in compute cost and data efficiency, it stands to meaningfully advance protein engineering research.
For more detailed insights, refer to the original paper to explore technical specifications and findings.