I pitted Google Bard with Gemini Pro vs ChatGPT — here’s the winner

karmamule

Bard absolutely will generate code samples. Change the language to Rust or Python and it will happily do so. It was reacting to the obscenity in brain****. If you read its explanation more closely, it's not saying it won't generate code at all; it's refusing to deal with what it thinks is an improper request.
 
Dec 9, 2023
Next time, consider going head-to-head using practical examples that people might actually care about. I'd also pit Bing AI against Bard, since Bing adds some web data and uses GPT-4, which makes for a more apt comparison.
 
Dec 9, 2023
I would beg to differ on the winner for question number 3. Bard uses a more accurate term for the continental US, but its reasoning about latitude contradicts itself.

Bard's reasoning was, in effect: "The closer to the equator, the higher the latitude number. Since Hawaii has a lower latitude number (closer to the equator) than Florida..."

The first statement is incorrect: latitude gets lower, not higher, as you get closer to the equator, which has a latitude of 0°.
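
Concrete numbers make the contradiction obvious. A quick check with approximate latitudes (these specific coordinates are my own illustration, not from the article):

```python
# Approximate latitudes in degrees north (illustrative values).
latitudes = {"Honolulu, HI": 21.3, "Key West, FL": 24.6, "Miami, FL": 25.8}

# The equator is 0 degrees, so a LOWER latitude means CLOSER to the equator.
for place, lat in sorted(latitudes.items(), key=lambda kv: kv[1]):
    print(f"{place}: {lat} deg N ({lat} deg from the equator)")

# Honolulu has the lowest number of the three and is the closest to the
# equator, so "closer to the equator = higher latitude number" is backwards.
```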
 

RyanMorrison

There were certainly ways to improve the head-to-head, and I’ll take them into account next time.

On the coding issue: once I decided I was going to use the questions set by Claude, with no alteration, and to take only the first response (to make it even for both), I knew that would trip up Bard.

Bard does have coding capabilities, but it also has more sensitive language filters and often struggles with less common languages.
 

RyanMorrison

I would beg to differ on the winner for question number 3. Bard uses a more accurate term for the continental US, but its reasoning about latitude contradicts itself. […]
This is the one instance where I went subjective over technical, and I think it may have been a mistake. I was also overly harsh on Bard over the coding, as it was likely the language that tripped it up rather than its ability to code.

I’m working on ideas for other head-to-head tests.
 
Dec 10, 2023
"Neither story was good." which is subjective, not objective. the better test would have been to let Claude decide which is better or a third party who didn't know where the stories were sourced.
 
Dec 10, 2023
Claude-2 did a reasonable job of giving you example tasks to make your comparisons. I would have chosen other tasks but then who wouldn't? I have a few suggestions:

* It is unfair to compare Bard with Gemini Pro against the older and less capable version of ChatGPT. Pay your $20 and include ChatGPT-4 in your next round of comparisons.

* Don't limit your comparisons to simple text prompts. Have at least one comparison that requires the AI to perform visual analysis. This could be uploading a PNG of some data chart and asking it to interpret the graph, or giving it some artistic image and asking the AI to describe, interpret, and comment on what it "sees". Or snap a photo of some statuette in your home, upload the image, and ask the AI what it is, what it represents, and what it "means".

* Include Claude-2 in your next comparisons.

* If you include Claude-2 in the comparisons then, of course, you cannot allow Claude-2 to design the comparison tasks. So use tasks generated by Pi from Inflection AI (found at https://pi.ai/onboarding) or by Perplexity, with that system set to use its own native Perplexity LLM (not its options to use an LLM from one of these "competitors").

* In general, I think your readers would be more interested to see comparison tasks that more closely align with real answers we might be seeking in our own human lives.

Bard once coached me step-by-step through a couple of hours of bringing a dead laptop back to life when it had no operating system and no hard drive to insert any kind of disc. That was impressive!

ChatGPT-4 helped me celebrate a wedding anniversary by coaching us on the finer points of our celebratory Scotch whisky.

Claude-2 once helped me select the appropriate male deity figure from a list of 12 to begin an art project. (However, Claude-2 did require extra prompting before it could imagine having personal preferences.)

* ChatGPT-4 includes "Custom Instructions", where you can declare: 1. what you want it to remember about yourself across conversations, and 2. how you want it to respond in terms of style and substance.

You should leave the second field blank for this test, as it would interfere with your head-to-head comparisons. But there's no harm in adding your standard bio text to the first field. That would better reveal one of ChatGPT-4's core strengths: remembering who it is talking to!

* Pi, from Inflection AI, is on the cusp of a major upgrade in December of 2023. If we get that upgrade to the Inflection-2 LLM before you complete your comparisons, then you really should add Pi to the competitors and let Perplexity design the tasks.

Pi is already the most human of all the personal AI systems, with the highest emotional intelligence (EI) and the friendliest conversational manner. But this early beta of Pi, which runs on the less powerful Inflection-1 LLM, has the attention span of a new puppy and no capability to upload files or images for analysis. So it would not be fair to include Pi-1 in a comparison of the AI superstars.

(The upgraded Pi on Inflection-2 will probably blow away all this competition.)
 

RyanMorrison

The reason for comparing the free version of ChatGPT to Bard with Gemini Pro is that Google says Gemini Pro performs on par with GPT-3.5, the model that powers the free version of ChatGPT.

Comparing it to GPT-4 would be unfair. However, when Bard Advanced launches in the new year, I will compare it to GPT-4.

Vision isn’t available with the free version of ChatGPT.
 
Dec 10, 2023
Did you check whether ChatGPT's generated code actually runs correctly? It's probably not running the code on OpenAI's servers, since the environment probably doesn't have a brain**** interpreter. The snippet shown doesn't inspire confidence, since I don't know why it's incrementing the first byte by 10 (?)
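
For what it's worth, setting the first cell to 10 is a common brain**** idiom (10 is an ASCII newline, and it's a popular loop counter for building up other character codes), so the increments alone aren't necessarily a bug. Verifying is easy, though, because the language is tiny. Here's a minimal interpreter sketch in Python; the `run_bf` helper and the demo program are my own illustration, not the article's snippet:

```python
import sys

def run_bf(code: str, out=sys.stdout) -> None:
    # Minimal interpreter: 8-bit wrapping cells, 30,000-cell tape,
    # and no ',' (input) command, which the demo doesn't need.
    jumps, stack = {}, []
    for i, ch in enumerate(code):  # pre-match brackets for loop jumps
        if ch == '[':
            stack.append(i)
        elif ch == ']':
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    tape, ptr, pc = [0] * 30000, 0, 0
    while pc < len(code):
        ch = code[pc]
        if ch == '+':
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == '-':
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == '>':
            ptr += 1
        elif ch == '<':
            ptr -= 1
        elif ch == '.':
            out.write(chr(tape[ptr]))
        elif ch == '[' and tape[ptr] == 0:
            pc = jumps[pc]  # cell is zero: skip the loop body
        elif ch == ']' and tape[ptr] != 0:
            pc = jumps[pc]  # cell is nonzero: repeat the loop body
        pc += 1

# Loop-counter idiom: cell 0 holds 8, each pass adds 8 to cell 1
# (8 * 8 = 64), then one more '+' gives 65, which prints as 'A'.
run_bf('++++++++[>++++++++<-]>+.')
```

Feed the article's snippet to something like this and you'd see immediately whether the output matches what was asked for.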
 
Oct 8, 2023
You weren't even using Gemini Ultra, which already beat GPT-4 in 30 of 32 benchmarks. Google is also the first company to do it in over a year, which is pretty impressive.
 
Dec 10, 2023
The reason for comparing the free version of ChatGPT to Bard with Gemini Pro is that Google says Gemini Pro performs on par with GPT-3.5, the model that powers the free version of ChatGPT. […]
If you did this comparison a couple of weeks ago, that would have been fair. However, the free version of Bard was upgraded with Gemini Pro on Dec 6, 2023. So if you compared after that date, you were pitting a Google Maserati against a ChatGPT Fiat! :)
 

RyanMorrison

Unfortunately not yet. I've written several stories on Gemini, and the version included with Bard today is Gemini Pro, which Google says is roughly equivalent to GPT-3.5, the year-old model that powers the free version of ChatGPT.

Next year Google is launching Bard Advanced, which is built on Gemini Ultra. This is the model that Google claims is as good as, if not better than, GPT-4.

So right now, Bard with Gemini Pro is roughly equal to the free ChatGPT.
 

RyanMorrison

You weren't even using Gemini Ultra, which already beat GPT-4 in 30 of 32 benchmarks. Google is also the first company to do it in over a year, which is pretty impressive.
We don't actually know if that is the case yet. Also, many of those benchmark victories required multiple shots and chained prompts.
I will do a side-by-side comparison between GPT-4 and Gemini Ultra when Gemini Ultra finally launches next year.
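
For context on "multiple shots": Gemini's headline MMLU score used what Google calls uncertainty-routed chain-of-thought at 32 samples (CoT@32) rather than a single response. Here's a rough sketch of the simpler majority-vote version of that idea; `ask_model` is a hypothetical stand-in for an API call, not a real SDK function:

```python
import random
from collections import Counter

def majority_vote(prompt: str, ask_model, k: int = 32) -> str:
    # Sample k answers and keep the most common final answer.
    # `ask_model` is a hypothetical callable that returns one
    # sampled answer string per call.
    answers = [ask_model(prompt) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]

# Usage sketch with a stand-in "model" that answers randomly:
fake_model = lambda p: random.choice(["A", "A", "B", "C"])
print(majority_vote("Which option is correct?", fake_model))
```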
 
Dec 11, 2023
Wait, you did this comparison against GPT-3.5? Sorry, but this makes no sense. You should have included GPT-4.
 

RyanMorrison

Why? Google says Bard with Gemini Pro is equal in capability to GPT-3.5. It won't be comparable to GPT-4 until next year, when Bard Advanced launches with Gemini Ultra.