University of Alaska - Alaska Center for Energy and Power

Efficient Energy Research: Building an Advanced Language Model and Interface

This student team will develop a robust large language model (LLM) capable of analyzing energy-related documents. The LLM will extract insights from energy-related PDFs, perform labeling and cleaning tasks, and give researchers actionable information. The team will also build a user-friendly web application that gives researchers access to the LLM's capabilities.

Energy researchers often struggle to find pertinent information across large collections of documents. By deploying a domain-specific LLM, the team aims to ease this data discovery problem. The goal extends beyond the University of Alaska: the team envisions an open source solution that benefits multiple universities and reinforces collaborative knowledge sharing. If successful, the project could speed up energy research through rapid document analysis and support broader knowledge dissemination, both in academia and in industries shaped by energy trends.

The planned design includes:
- Design Architecture and System Design: Develop a comprehensive system architecture describing how the LLM, the web application, and the cloud infrastructure interact, and define component roles, interfaces, and data flows so that the pieces integrate cleanly.
- Fault Tolerance and High Availability: Implement redundancy and failover mechanisms to minimize service disruptions.
- Cost Efficiency: Structure the architecture with cost in mind, using resource allocation strategies that balance performance requirements against budget constraints.

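As one illustration of the fault tolerance item above, the following is a minimal Python sketch of client-side failover, assuming the LLM is served by several interchangeable backends (for example, a primary model server and a standby replica) that can each be wrapped as a callable. The names `call_with_failover` and `AllBackendsFailed`, the `retries_per_backend` and `backoff_seconds` parameters, and the choice of `ConnectionError`/`TimeoutError` as the retryable failures are hypothetical choices for this sketch, not part of the project's stated design.

```python
import time
from typing import Callable, Sequence

# Hypothetical backend type: takes a prompt, returns the model's text output.
Backend = Callable[[str], str]


class AllBackendsFailed(Exception):
    """Raised when every backend has exhausted its retries."""


def call_with_failover(
    backends: Sequence[Backend],
    prompt: str,
    retries_per_backend: int = 2,
    backoff_seconds: float = 0.5,
) -> str:
    """Send `prompt` to each backend in priority order until one succeeds."""
    errors: list[Exception] = []
    for backend in backends:
        for attempt in range(retries_per_backend):
            try:
                return backend(prompt)
            except (ConnectionError, TimeoutError) as exc:
                # Only connection-level failures are treated as retryable here;
                # any other exception propagates to the caller unchanged.
                errors.append(exc)
                if attempt < retries_per_backend - 1:
                    # Exponential backoff before retrying the same backend.
                    time.sleep(backoff_seconds * 2 ** attempt)
    raise AllBackendsFailed(f"all {len(errors)} attempts failed: {errors!r}")


if __name__ == "__main__":
    def primary(prompt: str) -> str:
        raise ConnectionError("primary model server unreachable")

    def standby(prompt: str) -> str:
        return f"[standby] analysis of: {prompt}"

    # prints "[standby] analysis of: 2023 wind report"
    print(call_with_failover([primary, standby], "2023 wind report", backoff_seconds=0.0))
```

Trying backends in priority order keeps the primary server as the normal path while a standby absorbs outages; in a cloud deployment this role might instead be handled by a load balancer or orchestration layer, with client-side retries as a second line of defense.
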
The team will also optimize system components for responsiveness and efficiency, using caching, load balancing, and other performance-enhancing techniques.

The planned work includes:
- Data Collection and Preprocessing: Gather a diverse set of energy-related documents in PDF and other formats.
  - Use optical character recognition (OCR) to extract text from PDFs and other documents.
  - Clean and preprocess the extracted text, handling noise, formatting issues, and errors.
- Data Labeling and Annotation: Manually label and annotate a subset of the data for training and validation. Labels could include categories (e.g., renewable energy, fossil fuels), key extracted information (e.g., dates, quantities), sentiment, and so on.
- Model Selection and Training: Choose a suitable pre-trained language model architecture (e.g., BERT, GPT-3) as a starting point, fine-tune it on the labeled energy data to make it domain-specific, and experiment with hyperparameters, optimization techniques, and training strategies to reach the desired performance.
- Web Application Development: Build a user-friendly web interface with a framework such as Flask or Django.
  - Implement a mechanism for users to upload energy-related documents and receive analysis results.
  - Integrate the trained LLM into the application to process user input and generate insights.

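The text-cleaning step above could start from a few regular-expression passes over the OCR output. The sketch below is a minimal Python example under assumed conventions: it assumes the OCR engine returns plain text with line breaks preserved as `\n`, treats lines containing only a page number as noise, and rejoins words hyphenated across line breaks. The function name `clean_ocr_text` and these heuristics are illustrative assumptions rather than the team's pipeline; a real corpus would need rules tuned to its own kinds of noise.

```python
import re
import unicodedata

# Heuristic: a line holding only "12" or "Page 12 of 40" is treated as a page number.
PAGE_NUMBER_LINE = re.compile(
    r"^[ \t]*(?:page[ \t]+)?\d+(?:[ \t]+of[ \t]+\d+)?[ \t]*$\n?",
    re.IGNORECASE | re.MULTILINE,
)
# Heuristic: "genera-\ntion" becomes "generation" (may wrongly merge true compounds).
LINE_BREAK_HYPHEN = re.compile(r"(\w)-\n(\w)")


def clean_ocr_text(raw: str) -> str:
    """Normalize raw OCR text into clean paragraphs separated by blank lines."""
    # NFKC folds compatibility characters, e.g. the ligature "ﬁ" becomes "fi".
    text = unicodedata.normalize("NFKC", raw)
    # Remove page-number lines first so hyphenated words split across pages rejoin.
    text = PAGE_NUMBER_LINE.sub("", text)
    text = LINE_BREAK_HYPHEN.sub(r"\1\2", text)
    # Blank lines separate paragraphs; single line breaks inside a paragraph
    # are layout artifacts, so collapse every whitespace run to one space.
    paragraphs = (re.sub(r"\s+", " ", p).strip() for p in re.split(r"\n\s*\n", text))
    return "\n\n".join(p for p in paragraphs if p)


if __name__ == "__main__":
    sample = "Wind genera-\ntion rose in the\nﬁrst quarter.\nPage 3 of 10\n\nSolar   output fell."
    print(clean_ocr_text(sample))
```

On the sample, the ligature is normalized, the page-number line is dropped, "genera-tion" is rejoined, and the result is two clean paragraphs. Keeping each heuristic as a separate compiled pattern makes it easy to disable a rule that misfires on a particular document set.
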
The work continues with evaluation, deployment, and refinement:
- Model Evaluation and Iteration: Evaluate the trained LLM on validation datasets using metrics tied to the project's goals (e.g., accuracy, and precision and recall for information extraction), then iterate on training and fine-tuning based on the results to improve accuracy and relevance.
- Deployment: Deploy the web application on a suitable server or cloud platform, ensuring it is accessible to researchers and can handle a reasonable amount of user traffic.
- Documentation: Provide comprehensive documentation on how to use the web application and interpret the model's results.
- User Feedback: Gather feedback from potential users and researchers on the web application's usability and functionality, and refine the interface so that it meets researchers' needs.

Ultimately, the team aims to deliver a fully trained Energy Large Language Model specialized in energy data analysis and intended to perform well at semantic search, information retrieval, summarization, and classification. The team will also create an intuitive web application tailored to energy researchers; its integration with the LLM will let researchers access energy insights efficiently. In-depth user documentation will help researchers make full use of the tool. The project will be released as open source to foster collaboration among universities and to encourage knowledge sharing, innovation, and advancement within academia.
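
The evaluation step described above mentions precision and recall for information extraction. One common way to compute them is micro-averaging over a validation set, shown in this minimal Python sketch. It assumes each document's extracted facts can be represented as (field, value) pairs and that only exact matches count as correct; the function name `micro_prf`, the field names, and the sample values are invented for illustration, and a real evaluation might need normalization (for example, of date formats or units) or partial-credit matching.

```python
from typing import Iterable

# A hypothetical extracted fact: (field name, normalized value).
Fact = tuple[str, str]


def micro_prf(docs: Iterable[tuple[set[Fact], set[Fact]]]) -> dict[str, float]:
    """Micro-averaged precision, recall, and F1 over (predicted, gold) pairs."""
    tp = fp = fn = 0
    for predicted, gold in docs:
        tp += len(predicted & gold)  # extracted and correct
        fp += len(predicted - gold)  # extracted but wrong or spurious
        fn += len(gold - predicted)  # present in the labels but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Each entry pairs the model's extracted facts with the hand-labeled facts.
    validation = [
        ({("date", "2021-06-30"), ("capacity_mw", "50")},
         {("date", "2021-06-30"), ("capacity_mw", "45")}),
        ({("fuel", "diesel")},
         {("fuel", "diesel"), ("date", "2019-01-01")}),
    ]
    print(micro_prf(validation))
```

In this toy validation set, two of the three extracted facts are correct (precision 2/3) and two of the four labeled facts are found (recall 1/2). Micro-averaging weights every fact equally, so documents with many fields count more; a per-document (macro) average is a reasonable alternative if each document should count equally.
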

Faculty Adviser

Rajesh Subramanyan, Affiliate Professor, Electrical & Computer Engineering

Students

(Gerald) Ichiro Nakata
Aaron Hong
Akash Shetty
Benjamin Jiang
Brian Han
Joni Nguyen
Najib Haidar
Whitney Waldinger