Claude’s Latest Model is a Game-changer for Calculations and Code

Claude 3.7 Sonnet promises to write code and solve math better than any Anthropic AI model yet. In this post, we'll put it to the test to see how good it really is.

Written by
Matt Jasinski
and
Tom Nassr

March 3, 2025

Anthropic has unveiled Claude 3.7 Sonnet, its latest AI model, featuring a groundbreaking "hybrid reasoning" system. 

Unlike previous iterations that relied on separate models for different tasks, this new approach enables the model to seamlessly switch between rapid responses and deliberate, step-by-step reasoning. 

This dual capability makes Claude 3.7 not just faster but also more effective at handling complex, multi-step problems in math, physics, and coding. 

In this post, we’ll take a look at what makes 3.7 Sonnet different under the hood. Then, we’ll run some real-world tests to see how the Extended and Normal ‘Thinking Modes’ perform with a couple of math and coding challenges. 

What Makes Claude 3.7 Sonnet Different?

One of the standout features of this model is its Extended Thinking Mode, which allows for deeper, more structured reasoning when enabled. This mode is particularly useful for tackling complex mathematical proofs, intricate programming tasks, and high-stakes problem-solving. 

While 3.7 Sonnet is available to all users, the Extended thinking mode is currently only accessible for users with a pro account. 

Users can control how much processing time the model dedicates to internal deliberation by setting a "thinking budget", determining the number of tokens allocated for reflection before generating a response. 

According to benchmark tests, Claude 3.7 Sonnet significantly outperforms its predecessors in software development tasks. For example, it scored 62.3% accuracy on the SWE-bench Verified benchmark, which evaluates AI performance on real-world software issues. This makes it one of the most competent AI models for debugging and code generation today.

A New Era for AI-Powered Development

Beyond general improvements, Anthropic is also rolling out Claude Code, an experimental AI coding assistant designed to streamline programming tasks directly from the developer’s terminal. 

This marks a shift toward AI tools that actively assist in real-world coding environments rather than merely offering suggestions in chat-based interactions. Currently, Claude Code is in a limited research preview but is expected to become a valuable tool for software engineers and automation experts. 

Putting Claude 3.7 Sonnet to the test: Extended vs. Normal thinking modes

With these improvements in mind, the big question is: how well does Claude 3.7 Sonnet actually perform in real-world applications? In the next section of this post, we'll run practical tests to evaluate its ability to solve mathematical problems and write production-ready code efficiently. 

Simple math and logic

We’ll begin with a simple problem:

How many times does the letter P appear in the following sentence?

Alan picked three ripe apples. 

While it’s trivially easy for most people to count 4 of the letter P in that sentence, it’s the kind of basic question that frequently stumps generative AI models. 

3.7 Sonnet Normal Thinking Mode

And sure enough, 3.7 Sonnet failed to give a correct answer in our tests with the Normal thinking mode enabled, reporting only 3 P’s in the sentence. The double P in “apple” proved to be a point of confusion for the AI, as is often the case. 

3.7 sonnet struggles with math and logic on normal thinking mode

3.7 Sonnet Extended Thinking Mode

However, enabling the Extended thinking mode instantly provides the correct answer. 

3.7 Sonnet extended easily answers math and logic problems

As you can see in the screenshot above, the Extended thinking model makes sure to break down this simple problem into granular components. In this case, it analyzes each individual letter to assess whether or not it’s a P, and tallies up the result to give us an answer. 

Of course, running the same prompt several times will likely produce different results, but this early test is a clear indication that the Extended version of 3.7 Sonnet is more reliable when it comes to math and logic. 

Advanced math

Now, let’s see how both modes of the model handle a more complex problem. Here’s a question we’ve borrowed from Q-51’s GMAT practice set:

If x is a positive integer such that (x - 1)(x - 3)(x - 5)....(x - 93) < 0, how many values can x take?

Per Q-51, the correct answer is 23. Let’s see how both modes of 3.7 Sonnet handle this problem. 

3.7 Sonnet Normal Thinking Mode

When we ask 3.7 Sonnet to solve this problem using the Normal thinking mode, we can see it instantly try to break the problem down in detail to solve it. 

3.7 Sonnet breaks down its thinking in real time

However, its final answer is not correct. It offers up 48 as the solution instead of 23. 

3.7 Sonnet struggles with complex math problems

3.7 Sonnet Extended Thinking Mode

When we give the exact same prompt to 3.7 Sonnet Extended, we can see that it takes much more time to try and solve the problem. 

3.7 Sonnet Extended takes nearly 2 minutes to solve a complex math problem

It ultimately takes about 1 minute and 45 seconds to arrive at an answer, but this time it’s the correct result of 23. 

3.7 Sonnet extended solves advanced math problems

As we can see, ratcheting up the complexity causes 3.7 Sonnet Extended to take a much longer time in answering, but it’s worth it in the end as it’s able to provide an accurate solution. 

Coding challenge

Next, let’s see how the two modes of 3.7 Sonnet handle a coding problem. We’ll ask both to make a website with a scrolling text animation using the following prompt:

Create a website with HTML, CSS, and Javascript. The website is for a fictional widget company, Wetzel's Widgets. In the hero section of the site, include the text "Our Widgets can handle __". In the blank at the end of the sentence, animate a series of nouns like "Productivity", "Synergy", "Markets", etc. The animation should scroll through the list of nouns. As one noun scrolls up and disappears, the next scrolls up and takes its place.

3.7 Sonnet Normal Thinking Mode

Our first attempt with the normal thinking mode results in a rather well-designed web page, but no animation at all. It doesn’t even include any of the text that’s meant to be in the animated scroll. 

3.7 sonnet struggles to animate text with code on normal thinking mode

A second attempt with the Normal thinking mode performs a bit better, but the scrolling text is misaligned and floats down the page with each rotation.  

3.7 sonnet produces a flawed CSS animation on normal thinking mode

Unfortunately, neither of these attempts are going to be usable without significant edits. 

3.7 Sonnet Extended Thinking Mode

When given the same prompt, the Extended thinking mode delivers a much better result. The scrolling animation works exactly as described in the prompt. 

3.7 sonnet creates the requested animation on extended thinking mode

This is an excellent starting point for a website with the animation we described. 

Claude 3.7 Sonnet: Finally, a Reliable AI for Code and Math

After putting Claude 3.7 Sonnet Extended through its paces, the results are clear: this version finally delivers the reliability developers and automation experts have been waiting for.

In Extended Thinking Mode, Claude consistently produced working automation scripts, accurate formulas, and stable API configurations. Whether you need to transform data, set up tool integrations, or generate custom functions for your low-code platforms, Claude 3.7 Extended delivers code that actually works the first time—a major improvement over previous iterations.

The difference was especially stark when comparing normal thinking mode with extended mode. While the default setting sometimes struggled with multi-step logic, the extended version methodically worked through complex math problems and debugged its own mistakes, making it a much more practical tool for real-world applications.

The Bottom Line

Claude is finally evolving into the reliable AI partner we've wanted for coding and mathematical tasks, saving teams hours of debugging, troubleshooting, and rewriting. If you're building automated workflows, this model can drastically reduce frustration and wasted time.

Taking AI Automation to the Next Level

Of course, even with these advancements, having the right implementation partner makes all the difference. That’s where we come in.

At XRay.Tech, we specialize in turning AI and automation into real productivity gains for your team. Instead of wrestling with repetitive tasks or spending hours fixing broken scripts, we help you:

• Identify your biggest time-wasters
• Streamline clunky processes
• Build custom low-code tools that actually fit how your team works

The result? Your team gets back to what actually matters, while AI and automation handle the rest.

Want to see how much time AI-powered automation can save your business?Book a free 15-minute consultation with a solutions engineer today.

Similar Blog Posts

Not sure where to start with automation?

Hop on a 15-minute call with an XRay automation consultant to discuss your options and learn more about how we can help your team to get more done.

Schedule a 15-Minute Call