AI Coding Agents — Websites Tested


Quality Grades of AI Coding Agent Websites

The hottest demos these days are AI Coding Agents — and the competition among them is fierce. These AI coding assistants don’t just help with a few lines of code here and there; they can now create a basic marketing webpage from just a prompt — even deploy and host it. And they’re getting better at generating more complex code and apps every day. But are they generating quality products? Most folks’ intuition is “not yet.” Let’s see what AI thinks about it.

One CEO recently claimed their coding agents write the majority of the code in their company. We should be able to assume these AI agents are building their own marketing sites, right? If their AI agents didn’t build their marketing pages — well, that’s even stranger and might imply the agents aren’t even quite ready for basic websites yet. We’ll give them the benefit of the doubt.

So, how good are these AI Coding Agents’ marketing pages? There’s probably no nerdier way to find out than to have AI Testing Agents check them out. The Checkie.AI testing agents analyzed the marketing pages and found that only two of them earned a quality grade of A — congratulations to Cursor and GitHub Copilot! Here’s a screenshot of the Checkie.AI quality dashboard for Cursor’s marketing page. We can see that the Cursor site has great performance: even the slowest page loads in around 100 ms. The worst bug discovered was just missing alt text, and while the simulated user feedback looks pretty positive overall, there are some accessibility issues that need addressing.

How does Cursor stack up against other apps on quality? Let’s break it down. Cursor significantly outperforms its competitors when it comes to content quality and emotional connection with users. Their marketing pages do an exceptional job conveying their value proposition and building trust with visitors through clear, engaging content.

However, the picture isn’t uniformly rosy. While Cursor excels in many areas, it lags somewhat behind the average when it comes to accessibility features. This highlights an important reality in the current state of AI-generated websites: even the best performers still have significant room for improvement.

It’s worth emphasizing that we’re still far from perfection. Even Cursor’s impressive A-grade performance shows there’s substantial room for growth. This gap between current capabilities and ideal implementation suggests that while AI coding agents have made remarkable progress, they’re still maturing technologies rather than fully polished solutions.

Let’s examine Qodo, the AI Coding Agent that specifically markets itself as focused on delivering quality software. Ironically, the Testing Agents uncovered several issues with their marketing site, particularly in the Accessibility and Content areas.

One fascinating discovery was a broken countdown timer for an upcoming webinar. This finding showcases one of the most powerful aspects of AI-driven testing: its ability to detect issues through intuitive observation rather than predefined test cases. The AI testing bots identified that something was “off” about the timer even though there were no obvious errors on the page, and they did so without a hand-crafted, traditional test case. The timer rendered without error, but the AI recognized it didn’t “look right” — much like a human tester noticing that something feels wrong before being able to pinpoint the exact issue.

This type of intuitive testing demonstrates a significant advantage of AI testing tools. They can identify potential problems that might slip through conventional testing approaches, which typically rely on specific, predefined test cases — and only cover those checks. Traditional automated tests would have likely missed this issue entirely since it required a more nuanced understanding of how the component should appear and behave.
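
To make the contrast concrete, here is a minimal sketch of the kind of hand-written check a traditional suite would need for that timer, using Playwright’s Python API with a hypothetical selector and URL. The assertion only catches the bug because someone anticipated that exact failure mode in advance.

```python
# Hypothetical hand-written check for a countdown timer.
# The selector and URL are illustrative; a real test would target the site's actual markup.
import re
import time
from playwright.sync_api import sync_playwright

def countdown_seconds(page) -> int:
    # Parse something like "02:14:37" out of the timer element (assumed selector).
    text = page.locator(".countdown").inner_text()
    h, m, s = (int(x) for x in re.findall(r"\d+", text)[:3])
    return h * 3600 + m * 60 + s

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/webinar")  # placeholder URL

    first = countdown_seconds(page)
    time.sleep(2)
    second = countdown_seconds(page)

    # This assertion only exists because a human predicted this exact failure.
    assert second < first, "Countdown timer is not ticking down"
    browser.close()
```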

Qodo: Broken Countdown Timer on main page

Qodo’s recent rebranding effort revealed another interesting quality control issue: inconsistent brand capitalization across their site, appearing variously as ‘Qodo’, ‘qodo’, and ‘QODO’. While not a critical error, this type of inconsistency is exactly the kind of detail that should be addressed during a rebranding transition — and continuously monitored going forward. It’s particularly ironic for a tool that positions itself as quality-focused to have these basic brand consistency issues on its website.
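
A check like this is easy to automate once someone knows to look for it. Here is a rough sketch, again using Playwright with an illustrative URL, that flags every capitalization variant of the brand name found in a page’s visible text:

```python
# Rough brand-consistency scan: collect every capitalization of "qodo" in visible text.
import re
from collections import Counter
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.qodo.ai")  # illustrative; a real scan would crawl every page

    visible_text = page.inner_text("body")
    variants = Counter(re.findall(r"\bqodo\b", visible_text, flags=re.IGNORECASE))

    # Anything other than the canonical spelling is worth a human look.
    unexpected = {v: n for v, n in variants.items() if v != "Qodo"}
    if unexpected:
        print("Inconsistent brand capitalization found:", unexpected)
    browser.close()
```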

Even more concerning is what the AI testing agents discovered on v0’s marketing site: a scripting error that completely broke the page rendering. This is more than a minor oversight — it’s a fundamental functionality issue that undermines v0’s credibility as a generator of quality code. While AI coding agents are impressive in demos with their speed and capabilities, this kind of basic failure highlights a crucial point: rapid development needs to be balanced with proper testing and quality assurance. And in the world of AI-generated code, only AI Testing Agents can keep up.

The v0 situation is particularly telling. For a tool that’s meant to help developers write better code, having a broken marketing page due to a script error doesn’t inspire confidence in their product. It raises a valid question: if they can’t ensure their own marketing site works properly, how can users trust their tool to generate reliable code for production applications?

v0 Generated — an exception!
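
Catching this class of failure doesn’t require anything exotic, either. A minimal sketch that listens for uncaught script errors during page load (Playwright again, with an illustrative target URL) would surface it automatically:

```python
# Fail the check if any uncaught JavaScript exception fires while the page loads.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    errors = []
    # "pageerror" fires for uncaught exceptions thrown by the page's own scripts.
    page.on("pageerror", lambda exc: errors.append(str(exc)))

    page.goto("https://v0.dev", wait_until="networkidle")  # illustrative target

    assert not errors, f"Uncaught script errors on page load: {errors}"
    browser.close()
```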

The AI testing agents’ simulated user feedback on Devin’s website reveals an interesting disconnect between their intended edgy, distinctive brand and actual user reception. By generating diverse virtual users with different perspectives and expectations, the testing agents uncovered that Devin’s attempts at being different and edgy might actually be missing the mark with their target audience.

This simulated feedback provides valuable insights that raw technical metrics alone couldn’t capture. While Devin clearly aims to stand out from the crowd with their unconventional approach, the testing suggests that what they view as edgy and innovative, users might perceive as confusing or off-putting. This highlights a crucial lesson in web design and branding: being different for difference’s sake doesn’t always resonate with users, even in the cutting-edge AI development space.

The gap between Devin’s creative vision and user reception raises an important question about balancing distinctive branding with user expectations and usefulness. Even in a field as innovative as AI development tools, users still value clarity and familiarity in their interactions with a product’s website.

The Checkie.AI Agents’ comprehensive scan across AI Code Generation Tool websites reveals some interesting patterns in the types of issues that consistently appear. The most common problems cluster around responsive design, usability, networking, functionality, and content issues. This pattern suggests specific areas teams should focus on when leveraging AI-generated code.

What’s particularly intriguing is what’s not showing up as frequent problems: Authentication, GDPR compliance, and Privacy-related issues are notably absent or minimal across these sites. This could be an early indicator of AI’s capabilities in generating code — perhaps suggesting that AI models are actually quite good at implementing security and privacy-related patterns correctly, likely due to their training on modern best practices and standardized security implementations.

For teams adopting AI-generated code, this analysis points to clear testing priorities:
1. Pay extra attention to responsive design testing across different devices and screen sizes (see the sketch after this list)
2. Thoroughly test functionality and network interactions
3. Carefully review usability and content
4. While still important, security and privacy features might require less intensive testing focus
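
For the first item, here is a rough sketch of what that can look like in practice, assuming a Playwright-based check; the URL and viewport list are illustrative:

```python
# Load one page at several common viewport sizes and capture screenshots for review.
from playwright.sync_api import sync_playwright

VIEWPORTS = {  # illustrative set of device sizes
    "mobile": {"width": 375, "height": 812},
    "tablet": {"width": 768, "height": 1024},
    "desktop": {"width": 1440, "height": 900},
}

with sync_playwright() as p:
    browser = p.chromium.launch()
    for name, size in VIEWPORTS.items():
        page = browser.new_page(viewport=size)
        page.goto("https://example.com")  # placeholder for the page under test
        # Full-page screenshots can be diffed between runs, reviewed by hand, or handed to an agent.
        page.screenshot(path=f"homepage-{name}.png", full_page=True)
        page.close()
    browser.close()
```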

This data provides valuable insights into the current state of AI code generation — showing both where the technology excels and where it requires careful inspection, review and testing.

The analysis is clear: AI Code Generation Agents, while powerful and promising, aren’t yet producing perfect code — and they definitely benefit from the oversight of emerging AI Testing Agents. The issues uncovered across these platforms’ own marketing sites serve as a real-world case study for both the potential and current limitations of AI-generated code.

These findings demonstrate that even as AI coding tools become more sophisticated, quality assurance remains crucial. AI Testing Agents can play a vital role here, offering comprehensive analysis that goes beyond traditional testing approaches — from detecting subtle UI inconsistencies to evaluating user experience through simulated interactions.

For anyone using AI code generation tools, or teams simply wanting to ensure their applications meet high-quality standards, thorough testing is essential. AI Testing Agents can provide detailed quality analysis and help catch issues before they impact your users.

Sign up at Checkie.AI if you’d like the AI Testing Agents to start testing your website, or your customers’ websites.

Jason Arbon

Founder, Checkie.AI
