Tech Giants in Silicon Valley Build Amazon

Tech Giants in Silicon Valley Build Amazon and Gmail-Style Platforms for AI Training

Silicon Valley’s New Obsession: Training A.I. on “Fake” Versions of the Internet

Several young tech companies are quietly building near-perfect imitations of some of the world’s most recognizable websites — not to scam users, but to teach artificial intelligence how to navigate the web and eventually take over tasks now handled by white-collar workers.

One of those replicas recently caught the attention of United Airlines’ legal team.

Earlier this summer, the airline discovered a website that looked almost indistinguishable from its own. The copycat version mirrored United’s booking flows for flights, hotels and rental cars. It offered the same kinds of links for frequent-flier accounts and special offers. It even used the United name and logo.

United’s lawyers responded with a formal takedown notice, claiming copyright infringement.

The site’s creator, entrepreneur Div Garg, quickly rebranded the page as “Fly Unified” and stripped out the United branding. He said he wasn’t trying to pass himself off as the airline. His goal was something very different: building a realistic training environment for A.I.

Garg’s start-up, AGI, is one of a cluster of Silicon Valley companies that have spent the past year cloning well-known websites so that A.I. systems can practice using them. The idea is that if an A.I. model learns how to complete tasks on a convincing copy of United.com, it should be able to perform the same steps on the real thing — booking a flight, changing an itinerary or browsing loyalty perks.

These so-called “shadow sites” are a key ingredient in the industry’s push to turn today’s chatbots into full-fledged A.I. agents: systems that don’t just answer questions, but can carry out tasks like scheduling, shopping, number crunching and data entry with minimal human oversight. Many tech executives believe that, as these agents improve, they could begin to displace portions of white-collar jobs.

“We want to create training environments that encompass entire jobs people do,” said Robert Farlow, the founder of another start-up, Plato, which also builds replicas of popular web apps and business software.

From Scraping the Web to Rebuilding It

The rise of these replica sites reflects just how far tech companies are willing to go in search of more data to feed A.I. systems.

In the first wave of generative A.I., companies vacuumed up text, audio and images from across the open internet. When publishers and platforms started blocking automated scraping, the industry began hunting for fresh sources of material. Now, some start-ups are taking a different approach: instead of grabbing data from existing sites, they’re recreating the sites themselves and generating new interaction data from scratch.

Backed by about $10 million from Menlo Ventures and other investors, Garg’s company has spun up clones of Amazon, Airbnb and Gmail, among others. The look-alikes sport playful names like Omnizon (for Amazon), Staynb (for Airbnb) and Go Mail (for Gmail).

These duplicate interfaces serve as playgrounds for A.I. systems using reinforcement learning — a technique where software agents learn by trial and error, repeatedly attempting tasks and adjusting based on rewards or penalties. Instead of training purely on recordings of how humans use websites, the A.I. can produce millions of its own interactions inside these synthetic environments.
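To make the trial-and-error idea concrete, here is a minimal sketch of tabular Q-learning, one textbook form of reinforcement learning, on a toy storefront. Everything in it is an assumption made for illustration: the page names, the click names, the reward values and the `step`, `train` and `greedy_path` helpers are hypothetical, and nothing here describes how AGI, Plato or any A.I. lab actually trains its agents.

```python
import random

# Hypothetical toy storefront: four pages plus a terminal confirmation page.
PAGES = ["home", "results", "product", "cart", "confirmation"]
# Hypothetical clicks; on page i, only ACTIONS[i] moves the agent forward.
ACTIONS = ["search", "open_product", "add_to_cart", "checkout"]


def step(page, action):
    """Apply one click (page and action are indices); return (next_page, reward, done)."""
    if action == page:  # the single click that advances from this page
        next_page = page + 1
        done = next_page == len(PAGES) - 1
        return next_page, (1.0 if done else 0.0), done
    return page, -0.1, False  # a wrong click: small penalty, page unchanged


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by repeated trial and error."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in PAGES]
    for _ in range(episodes):
        page = 0
        for _ in range(50):  # cap the number of clicks per attempt
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore a random click
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[page][a])
            next_page, reward, done = step(page, action)
            # Q-learning update: nudge the estimate toward reward plus
            # the discounted value of the best click on the next page.
            target = reward if done else reward + gamma * max(q[next_page])
            q[page][action] += alpha * (target - q[page][action])
            page = next_page
            if done:
                break
    return q


def greedy_path(q):
    """Return the highest-valued click on each non-terminal page."""
    return [
        ACTIONS[max(range(len(ACTIONS)), key=lambda a: q[p][a])]
        for p in range(len(PAGES) - 1)
    ]


if __name__ == "__main__":
    print(greedy_path(train()))
```

In this toy setup the agent is never told the right order of clicks. It only sees rewards and penalties, and after a few hundred simulated attempts its preferred click on each page should line up with the search, open, add-to-cart and checkout sequence. Real web agents rely on large neural networks rather than a small lookup table, but the loop of acting, scoring and adjusting is the same basic idea.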

In theory, large A.I. models could practice directly on real sites. In practice, that runs into a wall. Commerce and travel platforms such as Amazon and Airbnb routinely block automated bots, especially when they hammer the same actions over and over — exactly what reinforcement learning requires.

“When you’re training, you want to run thousands of agents in parallel so they can explore every corner of a site and try all sorts of behaviors,” Garg said. “If you attempt that on a production website, you’ll be shut down very quickly.”

Why These Shadow Sites Matter for A.I. Agents

Modern A.I. tools are built on neural networks — mathematical systems that excel at finding patterns in large quantities of text, images and audio. But in the last year, major labs like OpenAI have effectively burned through most of the readily available English-language text on the public web.

To keep improving their systems, companies have leaned more heavily on reinforcement learning. The method first gained traction in areas like math and software engineering, where models can grind through countless problems and gradually learn which sequences of steps produce correct answers.

Now, big players like OpenAI, Google, Amazon and Anthropic are applying the same playbook to A.I. agents.

Initially, they trained models using recordings of real people performing tasks online — ordering food on DoorDash, entering numbers into Microsoft Excel, or filling out forms. By watching cursor movements and keystrokes, the systems learned the basic mechanics of web and app interfaces.

To accelerate that training, the big A.I. labs are now hiring lesser-known companies such as AGI, Plato and another replica-builder called Matrices to construct large catalogs of practice environments.

“You want the A.I. to test all the different ways to complete a task, even weird or inefficient ones, so it can discover the best routes,” said John Qian, chief executive of Matrices. “You can’t realistically do that level of experimentation on a live production site.”

Although these training sites are mainly intended for internal use, some start-ups have left them accessible on the open web to showcase their capabilities to potential customers like Google, Amazon or OpenAI.

Legal Gray Zones and Copyright Fights

That visibility has also attracted legal scrutiny.

After AGI removed corporate logos and exact brand names from its clones, Garg said he was no longer worried about United Airlines pursuing additional action. Qian voiced a similar view but acknowledged that A.I. research is venturing into unsettled legal territory. Farlow declined to address the legal questions.

Robin Feldman, a professor at U.C. Law San Francisco and author of “AI Versus IP,” said that using these shadow sites as training grounds could still infringe on the copyrights of the companies whose interfaces are being copied. But courts might eventually rule that this kind of use is allowed under existing law, she added.

“These companies are essentially shooting first and trying to sort out the legal consequences afterward,” Feldman said. “The technology is moving much faster than the legal system, and some of the choices made now may come back to haunt the companies making them.”

The broader A.I. world is already embroiled in intellectual-property battles. The New York Times has sued OpenAI and Microsoft, claiming their A.I. models misused copyrighted news content; the companies have denied the allegations. More recently, Amazon filed suit against Perplexity, a start-up whose A.I. tools aim to automate shopping on Amazon’s platform.

Today’s A.I. Agents Still Stumble — and Lag Behind Humans

Despite the ambition, early A.I. agents remain glitchy.

Companies such as OpenAI and Anthropic have publicly demoed systems that can order groceries on Instacart or take notes in cloud word processors like Google Docs. But in real-world tests, these tools often misclick, get stuck, or fail to complete the requested task.

“There’s a big gap between what companies want these agents to do and what they’re actually capable of today,” said Rayan Krishnan, chief executive of Vals AI, which evaluates cutting-edge A.I. technologies. “Right now they’re far too slow and error-prone to be truly useful. In many cases, it’s faster to just click through the site yourself.”

Experts are divided on how quickly these shortcomings will be ironed out. Some believe that with enough training data and better model architectures, agents will become reliable co-workers. Others question whether people and businesses really want this level of automation — or whether the owners of major websites will ultimately tolerate fleets of bots operating on their platforms, even indirectly.

Automating White-Collar Work

For the start-ups building these replicas, the endgame is clear: recreate the digital tools that define office life today so that A.I. systems can learn to use them as well as, or better than, humans.

“If you can reconstruct the software and websites people rely on for their jobs, you can train A.I. to do those jobs — and eventually to outperform human workers,” Farlow said.

Whether that vision becomes reality will depend on legal rulings, technical breakthroughs and the willingness of businesses and regulators to let software agents roam the web. But for now, Silicon Valley is betting that the fastest way to teach A.I. how to work online is to rebuild the internet — one clone at a time.
