AI This Week: ChatGPT Wows with Stunning Image Magic While Google Teaches Gemini to “Think”

While you were going about your typical week, AI took several significant leaps forward. OpenAI’s ChatGPT can now transform your ideas into stunning visuals with its GPT-4o upgrade, while Google’s new Gemini 2.5 models “think” before answering. Not to be outdone, Anthropic finally equipped Claude with the ability to search the web. Meanwhile, the industry continues to surprise, with AI search upstart Perplexity making a bold play for TikTok and Cloudflare devising an ingenious trap for unauthorized data scrapers. 

OpenAI Upgrades ChatGPT’s Image Generation Capabilities with GPT-4o

OpenAI has unveiled the first major upgrade to ChatGPT’s image generation capabilities in more than a year. ChatGPT can now use the GPT-4o model to create and modify images natively; until now, GPT-4o produced only text inside ChatGPT, and image requests were handled by the separate DALL-E 3 model.

The enhanced image generation feature is available immediately to subscribers on OpenAI’s $200-per-month Pro plan, as well as to Plus and Team subscribers. However, just one day after the announcement, OpenAI CEO Sam Altman said the rollout to free users would be delayed, explaining that the feature is “more popular than we expected.” The feature will eventually reach developers who use the company’s API as well.

According to OpenAI, GPT-4o’s image generation process takes slightly longer than the previous DALL-E 3 model but produces more accurate and detailed images. The new system can edit existing images, including those containing people, by transforming them or “inpainting” details like foreground and background objects.

What makes GPT-4o’s approach distinct is its “autoregressive” image generation method, which builds images sequentially from left to right and top to bottom rather than creating the entire image simultaneously. Since its launch, the feature has sparked a viral trend of users transforming photos into Studio Ghibli-style images across social media platforms—a phenomenon that even Altman himself has participated in.


Featured Image: Our co-founders, Shawn and Anthony, giving Studio Ghibli vibes thanks to ChatGPT’s new image editing powers with GPT-4o
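
To make the “autoregressive” idea above concrete, here is a toy sketch of raster-order generation: a grid is filled one cell at a time, left to right and top to bottom, and each new cell is computed only from cells that already exist. This is purely illustrative. OpenAI has not published GPT-4o’s internals in this article, and the function name `generate_raster` and the averaging “model” are hypothetical stand-ins, not the real system.

```python
import random


def generate_raster(width, height, seed=0):
    """Toy autoregressive generator (illustrative only, not GPT-4o).

    Fills a grid one cell at a time in raster order. Each cell is
    computed from already-generated neighbours (left and above),
    never from cells that come later in the order.
    """
    rng = random.Random(seed)
    grid = [[None] * width for _ in range(height)]
    order = []  # records the sequence in which cells were produced

    for y in range(height):          # top to bottom
        for x in range(width):       # left to right
            left = grid[y][x - 1] if x > 0 else 0
            up = grid[y - 1][x] if y > 0 else 0
            # Hypothetical "model": average of the context plus a little noise.
            grid[y][x] = (left + up) // 2 + rng.randint(0, 3)
            order.append((x, y))

    return grid, order


grid, order = generate_raster(4, 3)
print(order[:5])  # first cells produced, in raster order
```

The contrast is with approaches that refine every pixel of the image at once; here, nothing at the bottom right exists until everything before it has been generated.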

OpenAI disclosed to the Wall Street Journal that GPT-4o was trained on “publicly available data” and proprietary data from partnerships with companies like Shutterstock. Chief Operating Officer Brad Lightcap emphasized the company’s commitment to respecting artists’ rights, stating that policies are in place to prevent the generation of images that directly mimic living artists’ work. OpenAI maintains an opt-out form allowing creators to request the removal of their works from training datasets and respects requests to block its web-scraping bots.

This update follows Google’s recent experimental native image output for Gemini 2.0 Flash, which gained viral attention but faced criticism for insufficient guardrails that allowed users to remove watermarks and create images of copyrighted characters.

Google Debuts Gemini 2.5 Family with Advanced Reasoning Capabilities

Google has introduced its new Gemini 2.5 family of AI models, featuring built-in reasoning capabilities that allow the systems to “think” before generating responses. The first release in this lineup, Gemini 2.5 Pro Experimental, is being positioned as Google’s most intelligent model to date and is immediately available through Google AI Studio and to Gemini Advanced subscribers ($20/month).

The company has committed to incorporating reasoning capabilities into all its future AI models, joining the industry-wide race to develop systems that can carefully work through problems and fact-check before delivering answers. Since OpenAI launched the first AI reasoning model (o1) in September 2024, multiple companies, including Anthropic, DeepSeek, and xAI, have introduced similar technologies.

According to Google’s benchmarks, Gemini 2.5 Pro outperforms several competing models in specific areas. On the Aider Polyglot evaluation for code editing, it achieved a 68.6% score, reportedly surpassing offerings from OpenAI, Anthropic, and DeepSeek. However, on the SWE-bench Verified test measuring software development capabilities, it scored 63.8%, falling behind Anthropic’s Claude 3.7 Sonnet (70.3%) while still outperforming OpenAI’s o3-mini and DeepSeek’s R1.

The new model launches with an impressive 1 million token context window (approximately 750,000 words), with plans to double this capacity to 2 million tokens soon. Google has emphasized the model’s particular strengths in creating visually appealing web applications and supporting agentic coding applications.
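
The word estimate above follows from a simple ratio. As a rough sketch, assuming about 0.75 English words per token (the ratio implied by the figures quoted here, not an official Google number; actual ratios vary by tokenizer and language), the arithmetic looks like this:

```python
# Assumption: ~0.75 words per token, the ratio implied by the article's
# "1 million tokens is approximately 750,000 words" figure.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens, words_per_token=WORDS_PER_TOKEN):
    """Estimate how many English words fit in a given token budget."""
    return round(tokens * words_per_token)


print(tokens_to_words(1_000_000))  # current context window
print(tokens_to_words(2_000_000))  # planned doubled window
```

Under the same assumption, the planned 2 million token window would hold roughly 1.5 million words.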

While Google has not yet disclosed API pricing for Gemini 2.5 Pro, the company has promised to provide this information in the coming weeks. 

Anthropic Brings Web Search to Claude, Catching Up with Competitors

Anthropic has introduced web search capabilities to its AI chatbot Claude, a significant addition that brings it in line with major competitors like ChatGPT, Google’s Gemini, and Mistral’s Le Chat. This new feature allows Claude to access and incorporate current information from the internet when responding to user queries.

The web search functionality is currently available in preview for paid Claude users in the United States, with plans to expand access to free users and additional countries soon. Users can enable the feature through their profile settings in the Claude web app. Notably, the capability is limited to Claude 3.7 Sonnet, Anthropic’s latest model.

When utilizing web search, Claude provides direct citations within its responses, allowing users to verify the sources of information. This approach transforms traditional search by delivering relevant information in a conversational format rather than requiring users to sift through search results themselves.

Early testing reveals that while the feature doesn’t consistently activate for all current events queries, it does provide cited answers when triggered, pulling from diverse sources including social media platforms and news outlets like NPR and Reuters.

Perplexity AI Makes Surprise Bid for TikTok Amid Divestment Deadline

AI search startup Perplexity has publicly confirmed its interest in acquiring TikTok, joining the race to purchase the video-sharing platform as it faces a potential US ban if it doesn’t divest from Chinese owner ByteDance.

In a blog post, the San Francisco-based company outlined its vision for integrating its AI-powered search capabilities with TikTok’s video platform. Perplexity argued that combining its “answer engine” with TikTok’s extensive video library would create “the best search experience in the world” while avoiding monopolistic concerns that might arise if a larger tech competitor acquired the platform.

Perplexity’s announcement follows comments from President Donald Trump, who stated that the US is in discussions with four different groups interested in TikTok. Trump didn’t name the potential buyers, but reported contenders include “The People’s Bid for TikTok” (launched by Project Liberty, the organization founded by real estate and sports entrepreneur Frank McCourt), Microsoft, Oracle, and a group involving internet personality MrBeast (Jimmy Donaldson).

The AI company’s proposal includes building US-based infrastructure for TikTok with American oversight, rebuilding TikTok’s recommendation algorithm “from the ground up,” making the “For You” feed open-source, and enabling fact-checking features that would allow users to cross-reference information in videos.

Cloudflare Introduces ‘AI Labyrinth’ to Counter Unauthorized Web Scrapers

Cloudflare has unveiled a creative approach to combating unauthorized web scraping with its new “AI Labyrinth” tool. Rather than simply blocking web-crawling bots that collect data without permission, the service purposely lures them into an elaborate maze of AI-generated decoy content designed to waste their resources.

The company, which processes over 50 billion web crawler requests daily, developed this free, opt-in tool as an alternative to the traditional robots.txt system—a protocol that many AI companies have reportedly ignored. When the system detects suspicious bot behavior, it redirects the crawler to follow links leading to fake pages filled with scientifically accurate but irrelevant content that has no connection to the website’s actual data.
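
Why robots.txt is easy to ignore becomes clearer in code: the file is only a set of published rules, and it is up to each crawler to check them. The sketch below uses Python’s standard `urllib.robotparser` to show what a well-behaved crawler does before fetching a page. The bot name `ExampleAIBot`, the rules, and the URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is blocked everywhere,
# all other clients are blocked only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt rules permit this fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("ExampleAIBot", "https://example.com/articles/1"))
print(is_allowed("SomeBrowser", "https://example.com/articles/1"))
print(is_allowed("SomeBrowser", "https://example.com/private/data"))
```

Nothing enforces the result of this check. A crawler that skips it simply fetches the page anyway, which is the gap tools like AI Labyrinth are meant to address.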

Cloudflare positions AI Labyrinth as both a defensive measure and a “next-generation honeypot.” By enticing bots to follow links deeper into the labyrinth—something human visitors wouldn’t do—the system can better identify and fingerprint malicious crawlers while discovering new bot patterns. The company emphasizes that these trap links remain invisible to normal human visitors.
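
The honeypot logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cloudflare’s implementation: the `/labyrinth/` path prefix, the `DecoyTracker` class, and the depth threshold are all hypothetical. The idea is simply that a client requesting several decoy pages, which human visitors never see, is very likely an automated crawler.

```python
from collections import defaultdict

# Hypothetical URL prefix under which invisible decoy pages are served.
DECOY_PREFIX = "/labyrinth/"


class DecoyTracker:
    """Counts decoy-page hits per client and flags likely crawlers."""

    def __init__(self, flag_depth=2):
        self.flag_depth = flag_depth      # decoy hits needed before flagging
        self.depth = defaultdict(int)     # client id -> decoy pages visited

    def record(self, client_id, path):
        """Log one request; return True once the client looks like a bot."""
        if path.startswith(DECOY_PREFIX):
            self.depth[client_id] += 1
        return self.depth[client_id] >= self.flag_depth


tracker = DecoyTracker(flag_depth=2)
print(tracker.record("visitor-1", "/blog/post"))       # normal page
print(tracker.record("crawler-9", "/labyrinth/a"))     # first decoy hit
print(tracker.record("crawler-9", "/labyrinth/a/b"))   # follows deeper
```

A real system would combine a signal like this with other fingerprints (headers, request timing, IP reputation) rather than rely on a single counter.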

Website administrators can activate the feature through their Cloudflare dashboard’s Bot Management settings. The company describes this release as just the first iteration of its generative AI-powered bot deterrence strategy, with plans to create entire networks of linked URLs that would be difficult for bots to identify as decoys. The approach bears similarities to Nepenthes, another tool designed to trap crawlers in AI-generated junk data for extended periods.
