Problem Statement: The erosion of the U.S. military advantage over China and Russia undermines our ability to deter aggression and coercion in the Indo-Pacific.

Central Idea: A credible military advantage translates into continued economic and political freedom.
The project's primary goals include:

1. Researching and understanding the latest LLM fine-tuning techniques, such as mixture-of-experts (MoE), ensemble learning, and LoRA adapters (see the LoRA sketch after this list).
2. Collecting open-source code from resources such as GitHub, data science blogs, and other online platforms.
3. Cleaning and transforming multimodal, unstructured data (e.g., Word and PDF files) into a structured CSV format using LLMs (a minimal extraction sketch follows this list).
4. Building a training framework for the LLM, including reward functions and evaluation metrics (see the reward-function sketch below).
5. Utilizing AFRL HPC resources to fine-tune the model once the training framework, data cleaning, and evaluation metrics are in place.
6. Evaluating the trade-offs between using larger LLMs such as Llama 400B and fine-tuning smaller models (a back-of-envelope memory estimate follows this list).

The project requires a cross-functional team: some members need strong Python skills, and all must be able to learn new material independently.
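For goal 1, the sketch below shows one common fine-tuning technique from the list, a LoRA adapter attached to a frozen base model via the Hugging Face peft library. The base model name and all hyperparameters are illustrative assumptions, not project-specified values.

```python
# Minimal LoRA fine-tuning setup with Hugging Face PEFT.
# Model name and hyperparameters are assumptions for illustration only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed base model

config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base parameters
```

Because only the adapter weights train, this keeps optimizer-state memory small, which is the property that makes goal 6's comparison interesting.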
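For goal 3, a minimal sketch of the LLM-driven extraction step, assuming plain text has already been pulled out of the Word or PDF files. The target schema, the model name, and the OpenAI client are all placeholder assumptions; any LLM endpoint with a chat-completion interface could stand in.

```python
# Sketch: turn unstructured document text into structured CSV rows via an LLM.
# FIELDS and the OpenAI client are assumptions, not project requirements.
import csv
import json
from openai import OpenAI

FIELDS = ["title", "author", "date", "summary"]  # assumed target schema

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_record(doc_text: str) -> dict:
    """Ask the model to emit one JSON object matching FIELDS."""
    prompt = (
        "Extract the following fields from the document as a JSON object "
        f"with keys {FIELDS}. Document:\n{doc_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; a local HPC-hosted LLM would also work
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # constrain output to JSON
    )
    return json.loads(response.choices[0].message.content)

def write_csv(docs: list[str], path: str = "records.csv") -> None:
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for doc in docs:
            record = extract_record(doc)
            writer.writerow({k: record.get(k, "") for k in FIELDS})  # drop extra keys
```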
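For goal 4, one way a reward function and an evaluation metric might look. The schema-coverage reward is an assumed proxy for output quality, and the required keys reuse the hypothetical schema above; the real framework would define task-specific signals.

```python
# Illustrative reward function and evaluation metric for the training framework.
# The schema check is an assumed proxy reward, not the project's actual design.
import json

REQUIRED_KEYS = {"title", "author", "date", "summary"}  # assumed schema

def reward(model_output: str) -> float:
    """Reward well-formed JSON that covers the required schema."""
    try:
        record = json.loads(model_output)
    except json.JSONDecodeError:
        return 0.0  # unparseable output earns nothing
    covered = REQUIRED_KEYS & set(record)
    return len(covered) / len(REQUIRED_KEYS)  # partial credit per field

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Evaluation metric: fraction of outputs that match their reference exactly."""
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references) if references else 0.0
```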
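For goal 6, a back-of-envelope memory estimate that frames the large-versus-small trade-off. The fp16 and Adam-state assumptions are rough rules of thumb, not measured figures, and ignore activations and batch size.

```python
# Rough GPU memory estimates: full fine-tune of a large model vs. LoRA on a
# small one. All constants are coarse assumptions (fp16 weights, Adam states).
def full_finetune_gb(params_b: float, bytes_per_param: int = 2) -> float:
    """Weights + gradients + Adam moments: roughly 4x the weight memory."""
    return params_b * bytes_per_param * 4

def lora_finetune_gb(params_b: float, trainable_frac: float = 0.01,
                     bytes_per_param: int = 2) -> float:
    """Frozen fp16 weights plus optimizer state only for the adapter."""
    frozen = params_b * bytes_per_param
    adapter = params_b * trainable_frac * bytes_per_param * 4
    return frozen + adapter

print(f"Full fine-tune, 400B model: ~{full_finetune_gb(400):,.0f} GB")  # ~3,200 GB
print(f"LoRA fine-tune, 8B model:  ~{lora_finetune_gb(8):,.0f} GB")    # ~17 GB
```

Under these assumptions, full fine-tuning at the 400B scale demands multi-node HPC allocations, while LoRA on a small model fits on a single accelerator, which is the trade-off goal 6 asks the team to quantify.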