Welcome to My Project Page
Hello there! I'm excited to share a personal project I've been pouring my time into: an AI model that tackles advanced math competition questions from the AI Mathematical Olympiad (AIMO) Progress Prize 2. I have a strong interest in connecting top-tier language models with real-world challenges, and math competitions provide an excellent testing ground.
Why I'm Doing This
- Practical Experience Through Competitions: Kaggle gives me a chance to test my ideas about AI and math reasoning in a direct, real-world setting. Submitting solutions and reviewing the results isn't just about the leaderboard. Every submission helps me understand what's working and what isn't.
- Aiming for 47/50 Accuracy: The competition has about 50 difficult math problems, and getting at least 47 of them right (94%) is a formidable target. My plan is to use recent AI innovations, particularly models built for reasoning, to handle multi-step math questions in a logical, organized way.
- Learning Through AI Papers and Reimplementation: I believe that reading AI research papers, and even trying to reimplement or adapt some of their methods, is incredibly valuable. It's one of the best ways to see how top researchers approach the same types of problems. By applying what I learn, I can keep refining my own workflow, stay current with cutting-edge methods, and pick up insights that might give me an edge in the competition.
What I've Tried So Far
- Starting with an Agentic Setup: I initially used Agent Zero to give my model access to tools like code execution. The idea was to let the AI "decide" if it needed to run code or look something up. In practice, math-focused LLMs and agent instructions didn't mesh well at first, so I had to rethink how they should interact.
- Collecting and Cleaning Data:
  - Began with about 600k questions from NuminaMath CoT and NuminaMath TIR. Although this was a strong start, I realized it only covered certain difficulty ranges.
  - Added harder items from Omni-MATH (filtered to a specific difficulty level) and more varied questions from the AoPS dataset.
- Generating Synthetic Solutions: I tested different open-source reasoning models such as Marco-o1 and QwQ-32B-Preview. Each was asked to solve the same questions multiple times, often with partial hints or final answers provided to spark alternative approaches.
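To make the sampling idea above concrete, here is a minimal Python sketch. It assumes a hypothetical `generate` callable that wraps whichever model is being tested, and it assumes solutions report their final answer in `\boxed{}`. The names `build_prompt`, `extract_boxed`, and `sample_solutions` are illustrative and are not taken from the project's code.

```python
import re
from typing import Callable, List, Optional


def build_prompt(question: str, hint: Optional[str] = None,
                 final_answer: Optional[str] = None) -> str:
    """Assemble a prompt, optionally revealing a hint or the final answer."""
    parts = [f"Problem: {question}"]
    if hint:
        parts.append(f"Hint: {hint}")
    if final_answer:
        parts.append(f"The final answer is {final_answer}. Show how to reach it.")
    parts.append("Solve step by step and put the final answer in \\boxed{}.")
    return "\n".join(parts)


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the text, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def sample_solutions(question: str, reference: str,
                     generate: Callable[[str], str],
                     n_samples: int = 4,
                     hint: Optional[str] = None) -> List[str]:
    """Sample several solutions and keep those whose answer matches the reference."""
    prompt = build_prompt(question, hint=hint)
    kept = []
    for _ in range(n_samples):
        solution = generate(prompt)  # hypothetical model call
        if extract_boxed(solution) == reference:
            kept.append(solution)
    return kept
```

Keeping only the samples that agree with the known answer is a simple form of rejection sampling. Whether the hint or the final answer goes into the prompt is a per-question choice, so both are optional arguments.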
Fine-Tuning My Model
- Step-by-Step Reasoning: I'm preparing to train a model that lays out its thought process instead of jumping straight to the answer. This approach should make debugging easier, since I can trace where the logic might slip up.
- Tool Integration: Although I've moved away from a fully agentic setup for now, I'm keeping track of how a model might benefit from code execution or symbolic math libraries. If it proves useful down the line, I can revisit and incorporate these tools more formally.
- Quality Checks: I'm gradually expanding the dataset with additional questions and tidying up any questionable labels. Over time, these periodic updates will help me maintain better consistency and accuracy as I refine (and eventually fine-tune) the model.
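One possible way to surface questionable labels is to compare each stored label with the majority answer across several sampled solutions. The sketch below assumes a hypothetical record layout with `id`, `label`, and `sampled_answers` fields, and the 0.75 agreement threshold is an arbitrary illustrative value, not a tuned one.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def majority_answer(answers: List[Optional[str]]) -> Tuple[Optional[str], float]:
    """Return the most common non-missing answer and its share of all samples."""
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None, 0.0
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def flag_questionable(records: List[Dict], min_agreement: float = 0.75) -> List[str]:
    """List ids whose label disagrees with a strong majority of sampled answers."""
    flagged = []
    for rec in records:
        answer, agreement = majority_answer(rec["sampled_answers"])
        if answer is not None and answer != rec["label"] and agreement >= min_agreement:
            flagged.append(rec["id"])
    return flagged
```

A flagged item is only a candidate for manual review. The models can agree with each other and still be wrong, so the label should not be overwritten automatically.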
Where Things Stand
- Data is Ready to Go: The combined sets from NuminaMath, Omni-MATH, and AoPS are all merged and filtered. I've also added synthetic solutions for many of these items.
- Model Training Ongoing: I'm still training and testing the fine-tuned math model. Initial runs look good, but I'm aiming for more consistent accuracy before making a formal competition submission.
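For readers curious what the merge-and-filter step could involve, here is a minimal sketch. It assumes each source is a list of dictionaries with a `question` field and an optional numeric `difficulty` field. Those field names and the `merge_datasets` helper are hypothetical and do not reflect the datasets' actual schemas.

```python
import re
from typing import Dict, List, Optional


def normalize(question: str) -> str:
    """Collapse whitespace and lowercase so near-identical questions compare equal."""
    return re.sub(r"\s+", " ", question).strip().lower()


def merge_datasets(sources: Dict[str, List[dict]],
                   min_difficulty: Optional[float] = None) -> List[dict]:
    """Merge sources, drop rows below the difficulty cutoff, and deduplicate."""
    seen = set()
    merged = []
    for name, rows in sources.items():
        for row in rows:
            difficulty = row.get("difficulty")
            # Rows without a difficulty score are kept rather than guessed at.
            if (min_difficulty is not None and difficulty is not None
                    and difficulty < min_difficulty):
                continue
            key = normalize(row["question"])
            if key in seen:
                continue  # the same question already arrived from an earlier source
            seen.add(key)
            merged.append({**row, "source": name})
    return merged
```

Exact-match deduplication on normalized text is the simplest option. It will miss paraphrased duplicates, which would need fuzzy or embedding-based matching.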
Future Plans
- Incorporating Extra Tools: I'd like to include a system that automatically checks partial steps, possibly by running code or using symbolic math libraries.
- Smarter Validation: I plan to build a small test set that lets me quickly confirm whether a new idea works, so I don't waste competition submissions on unproven experiments.
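A local validation loop along these lines might look like the sketch below. It assumes a hypothetical `solve` callable that returns the model's full text output, that the final answer appears as an integer in `\boxed{}`, and that answers are compared modulo 1000. The modulo convention is my assumption about the competition's answer format and should be checked against the official rules.

```python
import re
from typing import Callable, List, Optional, Tuple


def parse_answer(text: str) -> Optional[int]:
    """Read the last boxed integer from the output, reduced modulo 1000."""
    matches = re.findall(r"\\boxed\{\s*(-?\d+)\s*\}", text)
    if not matches:
        return None
    return int(matches[-1]) % 1000  # assumed answer convention


def evaluate(dev_set: List[Tuple[str, int]],
             solve: Callable[[str], str]) -> Tuple[float, list]:
    """Return accuracy on the dev set plus the list of failed items."""
    correct = 0
    failures = []
    for question, expected in dev_set:
        predicted = parse_answer(solve(question))  # hypothetical model call
        if predicted == expected % 1000:
            correct += 1
        else:
            failures.append((question, expected, predicted))
    accuracy = correct / len(dev_set) if dev_set else 0.0
    return accuracy, failures
```

Returning the failures alongside the accuracy makes it easy to inspect where a new idea breaks before using up a real submission.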
Relevant Links
I've always believed math provides a perfect test for an AI's reasoning skills. This project is my way of pushing the limits of that idea and contributing to the open-source community.
Thank you for visiting my page and for your interest in this project!
For more information about this project, including code samples and detailed implementation: