This Puzzle Shows Just How Far LLMs Have Progressed in a Little Over a Year

Contents

The puzzle
Interaction with Sonnet 4.5
The final answer
Summary

The capabilities of LLMs have progressed dramatically in the last few years, but it’s hard to quantify just how good they’ve become.

That got me thinking back to a geometric puzzle I came across on a YouTube channel last year. This was in June 2024, and I tried to get the leading large language model at the time (GPT-4o) to solve it. It didn’t go well: finding a solution took a lot of effort. So I wondered how the latest LLMs would fare with the same puzzle.

The puzzle

Here’s a quick reminder of what I was asking the LLM to solve back then. Assume we have a cross-shaped grid of 20 dots/nodes: two dots in each of the top two rows, six in each of the two middle rows, and two in each of the bottom two rows. Each node is exactly one unit away from its horizontal and vertical neighbours. It looks like this,

Now, the question I wanted to answer was this:

How many distinct squares can be drawn on this diagram?

It quickly became clear that GPT-4o did not know the answer, so I changed tack slightly and instead asked it this.

I would like a Python program that plots out all the squares we can 
draw on the attached diagram, assuming that the corners of any square 
must lie on one of the spots on the diagram. Assume each adjacent spot is 
1 unit apart in both the x and y directions. Also print out a summary of 
the number of squares of the same size and what their side lengths are

Long story short, I eventually got GPT-4o to come up with a correct Python-based solution. Still, it took me around two hours and well over 40 iterations of me going back and forth with the model to refine its answer until it came up with something that worked.

NB: have you worked out the answer yet? Even now, I still can’t quite believe that you can draw 21 squares on this grid.
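If you’d like to check that figure yourself, here is a minimal sketch of one way to count the squares. It is my own illustration rather than anything either model produced, and the coordinates are an assumption: I’ve placed the cross so that x and y both run from 1 to 6, with the arms occupying columns 3-4 and rows 3-4. The names POINTS, find_squares and squared_side are purely illustrative. The idea is that any two corners joined by an edge fix the other two corners, which you get by rotating that edge through 90 degrees.

```python
from collections import Counter

# Assumed coordinates for the cross-shaped grid: 20 dots, 1 unit apart.
POINTS = {(x, y) for x in range(1, 7) for y in range(1, 7)
          if x in (3, 4) or y in (3, 4)}


def find_squares(points):
    """Return every square (as a frozenset of corners) drawn on `points`."""
    found = set()
    for (px, py) in points:
        for (qx, qy) in points:
            if (px, py) == (qx, qy):
                continue
            dx, dy = qx - px, qy - py
            # Rotate the edge p->q by 90 degrees to locate the other corners.
            r = (qx - dy, qy + dx)
            s = (px - dy, py + dx)
            if r in points and s in points:
                # A frozenset removes duplicates found from different edges.
                found.add(frozenset([(px, py), (qx, qy), r, s]))
    return found


def squared_side(square):
    """Smallest squared distance between two corners, i.e. side length squared."""
    pts = list(square)
    return min((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
               for i, a in enumerate(pts) for b in pts[i + 1:])


squares = find_squares(POINTS)
by_size = Counter(squared_side(sq) for sq in squares)
print(len(squares))
print(sorted(by_size.items()))
```

Working with squared side lengths keeps everything in integers, so there is no floating-point tolerance to worry about. Under the assumed coordinates, this prints 21 followed by [(1, 9), (2, 4), (5, 2), (8, 4), (13, 2)].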

Interaction with Sonnet 4.5

This brings us back to the present day and Anthropic’s recent release of what is considered the best coding model available today, namely Claude Sonnet 4.5. It’s been around 16 months since I last looked at this, so just how much have LLMs improved in the intervening period? Let’s see how Sonnet 4.5 copes with the puzzle.

To use Sonnet 4.5, I visited Anthropic’s homepage.

https://www.anthropic.com

You should see a page like this.

Click the large black “Try Claude” button located near the top right of the screen. You will be asked to log in at this point. If you don’t already have an account with Anthropic, please set one up now. It’s completely free to do so.

Once you are logged in, you will be taken to a screen displaying a familiar chat-based window that allows you to interact with and ask questions of Sonnet 4.5.

Before I asked the model to provide a Python program to solve the puzzle, I asked it a straight question to see if it already knew the answer.

It came back with,

Looking at this grid, I need to count all possible squares of different 
sizes that can be formed by connecting the dots.
Let me first identify the grid structure. The dots form:

2 dots in the top row
2 dots in the second row
6 dots in the third row
6 dots in the fourth row
2 dots in the fifth row
2 dots in the sixth row

This appears to be a cross or plus-sign shaped grid pattern.
Now I'll count squares by size:
1×1 squares: These are formed by any 4 dots that make a unit square.

In the horizontal middle section (rows 3-4): 5 squares per row × 1 = 5 squares
In the vertical middle section (columns where the cross intersects): counting 
the vertical segments gives us 5 more squares
Total: 10 small squares

2×2 squares: These need 4×4 arrangements of dots.

In the horizontal section: 4 squares
In the vertical section: 4 squares
Overlap in center: -1 (counted twice)
Total: 7 squares of size 2×2

3×3 squares and larger: The grid pattern is only 6 dots wide at its maximum, but the cross shape limits where larger squares can fit. Checking the geometry:

One 3×3 square can fit in the center where the arms of the cross overlap
Total: 1 square of size 3×3

Total: 10 + 7 + 1 = 18 squares

So, confidently incorrect. Intrigued by the model’s reference to a 3×3 square (which is impossible on this grid), I asked it to show me the square on the diagram. It replied with this.

Looking at the grid more carefully, I need to reconsider whether a 3×3 square 
actually exists.
For a 3×3 square to exist, I would need a 4×4 arrangement of dots 
(since a 3×3 square requires 4 dots on each side). Let me check the grid 
structure:
The grid appears to be:

Rows 1-2: only 2 dots each (centered)
Rows 3-4: 6 dots each (full width)
Rows 5-6: only 2 dots each (centered)

For a 3×3 square, I would need 4 consecutive rows and 4 consecutive columns all with dots at the intersection points. However, looking at the cross/plus pattern:

The top only has 2 columns of dots
The middle has 6 columns of dots
The bottom only has 2 columns of dots

Since rows 1-2 and rows 5-6 only have 2 dots each (not spanning the full width), 
there isn't actually a 4×4 block of dots anywhere on this grid.
I need to correct my answer: There is NO 3×3 square possible on this grid.
The total should be: 10 (1×1 squares) + 7 (2×2 squares) = 17 squares
I apologize for the error in my initial count. The cross-shaped pattern 
prevents any square larger than 2×2 from being formed.

Alright, it semi-redeemed itself with that answer, but it still didn’t arrive at the correct total of 21 squares.
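To see why that count is still off, it helps to separate upright squares from tilted ones. The sketch below is mine, not the model’s, and it assumes the cross layout the model described, placed on columns 3-4 and rows 3-4 of a 6×6 lattice (the coordinates and the names GRID and axis_aligned_counts are illustrative). It counts only the squares whose sides are parallel to the axes.

```python
# Assumed coordinates for the cross-shaped grid: 20 dots, 1 unit apart.
GRID = {(x, y) for x in range(1, 7) for y in range(1, 7)
        if x in (3, 4) or y in (3, 4)}


def axis_aligned_counts(points, max_side=5):
    """Count upright squares of each integer side length from 1 to max_side."""
    counts = {}
    for side in range(1, max_side + 1):
        # Treat (x, y) as the bottom-left corner and look for the other three.
        counts[side] = sum(
            1 for (x, y) in points
            if {(x + side, y), (x, y + side), (x + side, y + side)} <= points
        )
    return counts


upright = axis_aligned_counts(GRID)
print(upright)
```

With these assumptions it prints {1: 9, 2: 0, 3: 0, 4: 0, 5: 0}: nine upright unit squares and no upright 2×2 squares at all. The remaining 12 of the 21 squares are all tilted, and the chat answer never considered those.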

The final answer

To keep the comparison fair across the 16-month gap, I asked Sonnet 4.5 the exact same question I had asked GPT-4o, which, as a reminder, was this.

I would like a Python program that plots out all the squares we can draw on 
the attached diagram, assuming that the corners of any square must lie on one 
of the spots on the diagram. Assume each adjacent spot is 1 unit apart in 
both the x and y directions. Also print out a summary of the number of 
squares of the same size and what their side lengths are

Sonnet 4.5 thought for all of 5 seconds before generating its Python code.

import matplotlib.pyplot as plt
import numpy as np
from itertools import combinations
from collections import defaultdict

# Define the grid points based on the image
points = [
    # Top two rows (2 dots each, centered)
    (3, 6), (4, 6),
    (3, 5), (4, 5),
    # Middle two rows (6 dots each, full width)
    (1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4),
    (1, 3), (2, 3), (3, 3), (4, 3), (5, 3), (6, 3),
    # Bottom two rows (2 dots each, centered)
    (3, 2), (4, 2),
    (3, 1), (4, 1),
]

def is_square(p1, p2, p3, p4):
    """Check if four points form a square and return side length if true."""
    points = [p1, p2, p3, p4]
    
    # Calculate all pairwise distances
    distances = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = np.sqrt((points[i][0] - points[j][0])**2 + 
                          (points[i][1] - points[j][1])**2)
            distances.append(dist)
    
    distances.sort()
    
    # For a square: 4 equal sides and 2 equal diagonals
    # Diagonals should be sqrt(2) times the side length
    if len(distances) == 6:
        sides = distances[:4]
        diagonals = distances[4:]
        
        # Check if all sides are equal
        if np.allclose(sides, [sides[0]] * 4, atol=1e-6):
            # Check if diagonals are equal
            if np.allclose(diagonals, [diagonals[0]] * 2, atol=1e-6):
                # Check if diagonal = side * sqrt(2)
                if np.allclose(diagonals[0], sides[0] * np.sqrt(2), atol=1e-6):
                    return True, sides[0]
    
    return False, 0

def order_square_points(p1, p2, p3, p4):
    """Order points of a square in counterclockwise order starting from bottom-left."""
    points = [p1, p2, p3, p4]
    # Find centroid
    cx = sum(p[0] for p in points) / 4
    cy = sum(p[1] for p in points) / 4
    
    # Sort by angle from centroid
    def angle(p):
        return np.arctan2(p[1] - cy, p[0] - cx)
    
    points.sort(key=angle)
    return points

# Find all squares
squares = []
squares_by_size = defaultdict(list)

for combo in combinations(points, 4):
    is_sq, side_length = is_square(*combo)
    if is_sq:
        ordered = order_square_points(*combo)
        squares.append(ordered)
        squares_by_size[round(side_length, 2)].append(ordered)

# Print summary
print("=" * 50)
print("SUMMARY OF SQUARES FOUND")
print("=" * 50)
total_squares = 0
for side_length in sorted(squares_by_size.keys()):
    count = len(squares_by_size[side_length])
    total_squares += count
    print(f"Side length {side_length}: {count} square(s)")

print(f"\nTotal squares: {total_squares}")
print("=" * 50)

# Create visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 14))
axes = axes.flatten()

# Plot 1: All dots
ax = axes[0]
x_coords = [p[0] for p in points]
y_coords = [p[1] for p in points]
ax.scatter(x_coords, y_coords, c='black', s=100, zorder=3)
ax.set_xlim(0, 7)
ax.set_ylim(0, 7)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.set_title('Original Grid', fontsize=14, fontweight='bold')
ax.set_xlabel('X')
ax.set_ylabel('Y')

# Plot 2: All squares overlaid
ax = axes[1]
ax.scatter(x_coords, y_coords, c='black', s=100, zorder=3)
colors = plt.cm.rainbow(np.linspace(0, 1, len(squares)))
for square, color in zip(squares, colors):
    square_closed = square + [square[0]]
    xs = [p[0] for p in square_closed]
    ys = [p[1] for p in square_closed]
    ax.plot(xs, ys, c=color, linewidth=2, alpha=0.6)
ax.set_xlim(0, 7)
ax.set_ylim(0, 7)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.set_title(f'All {total_squares} Squares Overlaid', fontsize=14, fontweight='bold')
ax.set_xlabel('X')
ax.set_ylabel('Y')

# Plot 3 & 4: Squares by size
plot_idx = 2
for side_length in sorted(squares_by_size.keys()):
    if plot_idx >= len(axes):
        break
    
    ax = axes[plot_idx]
    ax.scatter(x_coords, y_coords, c='lightgray', s=100, zorder=2)
    
    squares_of_size = squares_by_size[side_length]
    colors = plt.cm.viridis(np.linspace(0, 1, len(squares_of_size)))
    
    for square, color in zip(squares_of_size, colors):
        square_closed = square + [square[0]]
        xs = [p[0] for p in square_closed]
        ys = [p[1] for p in square_closed]
        ax.plot(xs, ys, c=color, linewidth=2.5, marker='o', markersize=8)
    
    ax.set_xlim(0, 7)
    ax.set_ylim(0, 7)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)
    ax.set_title(f'Squares with side length {side_length} ({len(squares_of_size)} total)', 
                 fontsize=14, fontweight='bold')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    plot_idx += 1

plt.tight_layout()
plt.show()

At first sight, I thought there was no way Sonnet could have come up with a correct solution so quickly. GPT-4o had taken hours. However, I installed the libraries the code needed, pasted it into a Jupyter Notebook cell and ran it, which gave the following output. I have omitted the graphs, which were also output, for now.

==================================================
SUMMARY OF SQUARES FOUND
==================================================
Side length 1.0: 9 square(s)
Side length 1.41: 4 square(s)
Side length 2.24: 2 square(s)
Side length 2.83: 4 square(s)
Side length 3.61: 2 square(s)

Total squares: 21
==================================================

#
# Plus some graphs that I'm not showing here
#

That shocked me. The answer was absolutely spot on.
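Those decimal side lengths look odd until you remember that a tilted square’s edge runs a whole number of dots across and a whole number up, so its length is √(a² + b²). As a quick sanity check (my own sketch, where the edge vectors are my reading of the five sizes rather than something the model printed), Python reproduces the rounded values:

```python
import math

# Assumed (across, up) edge vectors for the five square sizes in the summary.
edge_vectors = [(1, 0), (1, 1), (1, 2), (2, 2), (2, 3)]

# Side length of each square, rounded to 2 decimals like the model's summary.
rounded = [round(math.hypot(a, b), 2) for a, b in edge_vectors]

for (a, b), side in zip(edge_vectors, rounded):
    print(f"edge ({a}, {b}): side^2 = {a * a + b * b}, side = {side}")
```

This gives 1.0, 1.41, 2.24, 2.83 and 3.61, matching the summary, so the five sizes are 1, √2, √5, √8 and √13.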

The only thing the model didn’t quite get right was the plotting: it didn’t output a plot for each set of differently sized squares. Its 2×2 figure layout left room for only two per-size panels, so it just did the nine 1×1s and the four √2×√2 ones. I solved that by asking Sonnet to include the rest, too.

Can you print the graphs in square side order. Also can you have two graphs  
side by side on each "line"

This is what it produced.

Beautiful.

Summary

To demonstrate just how dramatically LLMs have advanced in about a year, I decided to revisit a challenging geometric puzzle I first tried to solve with GPT-4o back in June 2024. The puzzle was to write a Python program that finds and plots all possible squares on a specific cross-shaped grid of dots.

My experience a little over a year ago was a struggle; it took me roughly two hours and over 40 prompts to guide GPT-4o to a correct Python solution.

Fast forward to today, and I tested the new Claude Sonnet 4.5. When I first asked the model the question directly, it failed to calculate the correct number of squares. Not a great start. However, the real test was giving it the exact same prompt I used on GPT-4o.

To my surprise, it produced a complete, correct Python solution in one shot. The code it generated not only found all 21 squares but also correctly categorised them by their unique side lengths and generated detailed plots to visualise them. While I needed one quick follow-up prompt to perfect the plots, the core problem was solved instantly.

Could it be that the very act of my trying to solve this puzzle last year and publishing my findings introduced it to the web-o-sphere, meaning Anthropic have simply crawled it and incorporated it into their model knowledge base? Yes, I suppose that could be it, but then why couldn’t the model answer the first direct question I asked it about the total number of squares correctly?

To me, this experiment starkly illustrates the incredible leap in LLM capability. What was once a two-hour iterative struggle with the leading model of its time 16 months ago is now a five-second, one-shot success with the leading model today.