I put OpenAI's new o3-mini model to the test — and the results are staggering