Study shows AI struggles with NYT Connections game despite advanced capabilities


Judith Brown Clarke, Vice President for Equity and Inclusion and Chief Diversity Officer, Stony Brook University

A recent study led by Tuhin Chakrabarty, an assistant professor in Stony Brook's Department of Computer Science, in collaboration with researchers from Columbia University, sheds light on how AI models handle abstract reasoning challenges. The research focused on the New York Times word game 'Connections,' which offers a distinctive benchmark for testing Large Language Models (LLMs).

Although AI systems have long been able to defeat top chess players, the study found that even the most advanced LLM tested, Claude 3.5 Sonnet, could fully solve only 18% of 'Connections' games. The analysis covered more than 400 games, and both novice and expert human players outperformed the AI.

In 'Connections,' players must sort a 4x4 grid of 16 words into four groups of four based on shared characteristics. For instance, 'Followers,' 'Sheep,' 'Puppets,' and 'Lemmings' can be grouped as 'Conformists.' Success requires reasoning across several types of knowledge, including semantic and encyclopedic knowledge.

Chakrabarty explained, "While the task might seem easy to some, many of these words can be easily grouped into several other categories." He noted that these alternative groupings act as red herrings designed to make the game harder.

The research found that LLMs are relatively strong on puzzles involving semantic relations but struggle with other types of knowledge, such as multiword expressions and knowledge that combines word form and meaning. Five LLMs were tested: Google's Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet, OpenAI's GPT-4 Omni, Meta's Llama 3.1 405B, and Mistral Large 2 (Mistral-AI, 2024). While these models could partially solve some puzzles, their overall performance fell well short of that of human players.

Further details on the study are available on the AI Innovation Institute website.
