The diffusion model has garnered enormous attention in the computer vision field and has emerged as a promising approach to protein design, enabling precise structure and sequence generation. Here, PRO-LDM is introduced: a modular, multi-task framework that combines design fidelity with computational efficiency by integrating the diffusion model in latent space. The model learns biological representations at local and global levels, either to design natural-like species with enhanced diversity or to optimize protein properties and functions. Its modular nature also enables integration with alternative pre-trained encoders to enhance generalization. Outlier design can be implemented by adjusting the classifier-free guidance, which enables PRO-LDM to sample vastly different regions of the latent space. The approach is demonstrated by generating a novel green fluorescent protein (GFP) variant with notably enhanced fluorescence in multiple working scenarios, along with increased solubility and stability. The model provides a versatile tool for effectively extracting physicochemical and evolutionary information from sequences to design new proteins with optimized performance.
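To illustrate how a single guidance knob can steer sampling toward different latent regions, the sketch below shows the widely used classifier-free guidance combination step, in which the denoiser's conditional and unconditional noise predictions are blended with a guidance scale `w`. This is a minimal sketch that assumes the common formulation from the diffusion literature; PRO-LDM's exact parameterization may differ, and the function name `cfg_noise` and the toy latent vectors are hypothetical.

```python
from typing import List, Sequence


def cfg_noise(eps_cond: Sequence[float], eps_uncond: Sequence[float], w: float) -> List[float]:
    """Combine noise predictions with classifier-free guidance (hypothetical helper).

    Assumes the common form eps = eps_uncond + w * (eps_cond - eps_uncond):
      w = 0  -> purely unconditional prediction
      w = 1  -> purely conditional prediction
      w > 1  -> extrapolates past the conditional prediction, pushing samples
                further along the conditioning direction in latent space
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("conditional and unconditional predictions must have the same length")
    # Element-wise blend over the latent dimensions
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    # Toy 2-D latent predictions (illustrative values only)
    eps_c = [1.0, 0.0]
    eps_u = [0.0, 0.0]
    for w in (0.0, 1.0, 3.0):
        print(w, cfg_noise(eps_c, eps_u, w))
```

In a full sampler this combined prediction would replace the raw denoiser output at every reverse-diffusion step, so increasing `w` moves the whole trajectory, not just one step, toward or beyond the conditioned region.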