Students will compare different NLP models and the protein sequence vectors they generate, looking for possible empirical or statistical correlations with performance and stability metrics relevant to biopharmaceutical development. The vectors will then be organized into a structured database.
Please see the PDF for a detailed project description. When registering for this project in UniTime, look for 'Merck (NLP on Protein Sequences)' in the Note section and select the appropriate CRN.