• Lecture time: Tuesday: 13:30:00 - 14:20:00 ET
  • Lab time: Thursday: 13:30:00 - 15:20:00 ET
  • Domain: Aerospace
  • Keywords: Data Cleaning, Data Modeling, Geospatial Analysis, Large Language Models
  • Tools: Data Analytics, Geolocation, Geospatial Data Analysis, Large Language Models
  • Citizenship: U.S. Citizens Required
Summary

Problem Statement: Eroding U.S. military advantage over China and Russia undermines our ability to deter aggression and coercion in the Indo-Pacific.

Central Idea: A credible military advantage translates into continued economic and political freedom.

Description

The project's primary goals include:

1. Researching and understanding the latest LLM fine-tuning techniques, such as mixture of experts, ensemble learning, and LoRA adapters.
2. Collecting open-source code from resources such as GitHub, data science blogs, and other online platforms.
3. Cleaning and transforming multimodal, unstructured data (e.g., Word and PDF files) into a structured CSV format using LLMs.
4. Building a training framework for the LLM, including reward functions and evaluation metrics.
5. Utilizing AFRL HPC resources to fine-tune the model once the training framework, data cleaning, and evaluation metrics are established.
6. Evaluating the trade-offs between using larger LLMs like Llama 400B and fine-tuning smaller models.

The project requires a cross-functional team: some members need strong Python skills, and all must be able to learn independently and teach themselves new material.
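
As a rough illustration of goals 3 and 4, the sketch below turns unstructured text into CSV rows and scores the result against a reference. Everything in it is an assumption made for illustration: the field names in `FIELDS`, the sample documents, and the functions `stub_llm_extract`, `records_to_csv`, and `field_accuracy` are hypothetical and not part of the project. In particular, `stub_llm_extract` is a simple "key: value" parser standing in for a real LLM call.

```python
import csv
import io

# Hypothetical schema; the project's real fields are not specified.
FIELDS = ["title", "author", "year"]


def stub_llm_extract(text):
    """Stand-in for an LLM extraction call: parses 'key: value' lines."""
    record = {field: "" for field in FIELDS}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in record:
            record[key] = value.strip()
    return record


def records_to_csv(records):
    """Serialize a list of record dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def field_accuracy(predicted, reference):
    """Fraction of reference fields that the prediction matches exactly."""
    total = len(reference) * len(FIELDS)
    if total == 0:
        return 0.0
    correct = sum(
        1
        for pred, ref in zip(predicted, reference)
        for field in FIELDS
        if pred[field] == ref[field]
    )
    return correct / total


# Made-up sample documents; the second one is missing its author.
docs = [
    "Title: Orbit Notes\nAuthor: A. Smith\nYear: 2021",
    "Title: Sensor Log\nYear: 2019",
]
reference = [
    {"title": "Orbit Notes", "author": "A. Smith", "year": "2021"},
    {"title": "Sensor Log", "author": "B. Jones", "year": "2019"},
]

predicted = [stub_llm_extract(doc) for doc in docs]
csv_text = records_to_csv(predicted)
score = field_accuracy(predicted, reference)
print(csv_text)
print(f"field accuracy: {score:.3f}")
```

An exact-match score like this could serve as one simple evaluation metric, or as a crude reward signal during fine-tuning. A real framework would likely need softer matching and per-field weighting.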