Hi. Sorry to tell you, but your test is flawed. You are not using DeepSeek R1; you are using the older, outdated V3 model. Both are available in the DeepSeek app, but you need to switch on R1 by clicking / toggling ON the blue "Thinking (R1)" option. When the R1 model is in use, the "Thinking" message appears with the output. The old V3 is good, but R1 is so much better. Comparing against the older V3 is like using an early OpenAI model to compare against a new model. I am surprised that, even though the old V3 was used, it still performs comparably. Can you update the title to show that this is a comparison of V3 vs Gemini, so it's clear and not misleading? And could you do a comparison with the R1 model? Possibly, let's also compare with OpenAI's most recent model release, along with Qwen 2.5 (the new model, not the old model with the same name).