ChatGPT Shambles for Gary Marcus Prompt

Sat, Feb 8, 2025

Gary Marcus recently wrote an article titled ChatGPT in Shambles. The prompt asked ChatGPT to produce a table of every U.S. state with its population, area, and median household income.

Make a table of every state in U.S., including population, area, median house hold income, sorted in order of median household income.

chatgpt-original-output

The output stopped after only twenty states, and the final row contained only the name of the state.

ChatGPT - My Attempt

When I tried the same prompt while logged in to ChatGPT, it returned all the states with their incomes. I did not verify the data quality; I checked only the structure.

chatgpt-1 chatgpt-2 chatgpt-3
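I did the structure check above by eye. Here is a minimal sketch of how it could be automated, assuming the model's answer is copied out as a pipe-delimited Markdown table with the income in the last column. The function names and the sample rows are hypothetical placeholders, not real data or real model output.

```python
import re

EXPECTED_STATES = 50  # assumption: the 50 states, excluding D.C. and territories


def parse_markdown_table(text):
    """Return the data rows of a pipe-delimited table as lists of cell strings."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore prose around the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the separator row such as |---|---|
        if all(re.fullmatch(r":?-+:?", c) for c in cells):
            continue
        rows.append(cells)
    return rows[1:]  # drop the header row


def to_number(cell):
    """Turn '$98,461' or '6,164,660' into a float; raises ValueError if no digits."""
    return float(re.sub(r"[^\d.]", "", cell))


def check_structure(text, expected_rows=EXPECTED_STATES, expected_cols=4):
    """Return a list of problems; an empty list means the structure looks fine."""
    problems = []
    rows = parse_markdown_table(text)
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    incomes = []
    for row in rows:
        if len(row) != expected_cols or any(cell == "" for cell in row):
            problems.append(f"incomplete row: {row}")
            continue
        try:
            # Assumption: income is the last column.
            incomes.append(to_number(row[-1]))
        except ValueError:
            problems.append(f"non-numeric income in row: {row}")
    # The prompt does not say which direction to sort, so accept either.
    if incomes != sorted(incomes) and incomes != sorted(incomes, reverse=True):
        problems.append("rows are not sorted by median household income")
    return problems


# Made-up placeholder rows that mimic a truncated answer (not real figures).
sample = """
| State | Population | Area (sq mi) | Median household income |
|-------|------------|--------------|-------------------------|
| State A | 1,000 | 10 | $90,000 |
| State B | 2,000 | 20 | $80,000 |
| State C |
"""

print(check_structure(sample, expected_rows=3))
```

On the placeholder sample, the check flags the last row, which has a state name and nothing else, the same kind of truncation seen in the original output. Like my manual check, it says nothing about whether the numbers themselves are correct.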

My guess is that either the model was fine-tuned in the meantime (which I doubt, given the short interval) or the output is non-deterministic and differs between a logged-in user and an anonymous one.

I tried the same prompt with other models.

Claude

Claude produced well-structured output with an extra summary, and then asked whether I wanted any further tasks done.

claude-output

Deepseek

deepseek-output

Similar to Claude, DeepSeek produced all the states along with a summary.

Gemini 2.0 Flash

gemini-flash-pro-2.0-output

The Gemini output is by far the best structured, with a rank column, an option to export the results to Google Sheets, and a summary at the end.

Le Chat

lechat

Le Chat produced all fifty states along with its sources.

Overall, the exercise shows small variations across models, and the other models clearly produced better output than ChatGPT. It is confusing to see such different behaviour from ChatGPT itself.