The pace of frontier model development hit a new high this week with Google launching Gemini 3 just days after OpenAI’s GPT-5.1 release. But the more striking shift is happening in how these models are being deployed. AI is moving into operating systems, running in taskbars, making financial decisions, and conducting months-long scientific research in single runs. Meanwhile, platforms like Stack Overflow are fundamentally reorganizing to serve as infrastructure for these agents rather than destinations for humans. The transformation from AI tools to AI actors is accelerating faster than most predicted.
Listen to the AI-Powered Audio Recap
This AI-generated podcast is based on our editor team’s AI This Week posts. We use advanced tools like Google NotebookLM, Descript, and ElevenLabs to turn written insights into an engaging audio experience. While the process is AI-assisted, our team ensures each episode meets our quality standards. We’d love your feedback—let us know how we can make it even better.
🛠️ Development Tools
Google Launches Gemini 3 with Coding Interface
Google released Gemini 3 on Tuesday, its most advanced foundation model to date, now available through the Gemini app and AI search interface. The launch arrives just seven months after Gemini 2.5 and less than a week after OpenAI’s GPT-5.1 release, underscoring the rapid pace of frontier model development.
Gemini 3 achieved the highest score ever recorded on the Humanity’s Last Exam benchmark with 37.4 points, surpassing the previous record of 31.64 held by GPT-5 Pro. The model also topped LMArena’s human-preference leaderboard. Google’s head of product for Gemini, Tulsee Doshi, noted that the model shows unprecedented depth and nuance in its reasoning.

A research-focused variant, Gemini 3 Deep Think, will launch for Google AI Ultra subscribers after additional safety testing. Google reports the Gemini app now has over 650 million monthly active users, with 13 million software developers integrating the model into their workflows.
Alongside the base model, Google introduced Antigravity, a Gemini-powered coding interface that enables multi-pane agentic coding similar to tools like Warp or Cursor 2.0. Antigravity combines a prompt window, a command-line interface, and a browser view, allowing developers to see the real-time impact of code changes. DeepMind CTO Koray Kavukcuoglu explained that the agent can work across editors, terminals, and browsers to help build applications more effectively.
Nano Banana Pro Brings 4K Image Generation
Google also launched Nano Banana Pro, its new image generation model powered by Gemini 3. The upgrade delivers significantly higher-resolution output—up to 4K, compared to the original Nano Banana’s 1024×1024-pixel limit—along with improved text rendering across multiple languages and styles. The model can also draw on web search, letting users request images grounded in live information, such as an illustrated recipe or a set of study flashcards.
Nano Banana Pro targets professional use cases with granular controls over camera angles, lighting, depth of field, focus, and color grading. The system can incorporate up to 6 high-fidelity reference shots or blend up to 14 objects into a single image while maintaining visual consistency across up to 5 people. However, the enhanced quality comes at a cost: $0.139 per 2K image and $0.24 per 4K image, compared to $0.039 per 2K image for the original model.
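The pricing gap adds up quickly at batch scale. A minimal sketch, using only the per-image prices quoted above (the `batch_cost` helper itself is hypothetical, not part of any Google API):

```python
# Illustrative cost comparison from the per-image prices in the article.
# The price table values come from the article; the helper is hypothetical.

PRICES = {
    ("original", "2k"): 0.039,
    ("pro", "2k"): 0.139,
    ("pro", "4k"): 0.240,
}

def batch_cost(model: str, resolution: str, count: int) -> float:
    """Total cost in USD for generating `count` images at the listed rate."""
    return round(PRICES[(model, resolution)] * count, 2)

# A 100-image 2K batch: $3.90 on the original model vs. $13.90 on Pro.
print(batch_cost("original", "2k", 100))  # 3.9
print(batch_cost("pro", "2k", 100))       # 13.9
print(batch_cost("pro", "4k", 100))       # 24.0
```

At these rates, Pro roughly triples 2K generation costs, which matters mostly for high-volume workflows rather than one-off professional shots.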
The new image model is rolling out across Google’s product ecosystem. The Gemini app now defaults to Nano Banana Pro for image generation, though free-tier users face generation limits before reverting to the original model. Paid subscribers get higher thresholds and access to the model in NotebookLM, while Ultra subscribers can use it in Flow, Google’s video tool. Workspace customers will find it integrated into Slides and Vids, and developers can access it through the Gemini API, Google AI Studio, and Antigravity.
Google is also embedding SynthID, its watermarking technology, into the Gemini app. Users can upload images to verify whether they were created or modified by Google’s image models, with plans to support C2PA content credential detection for broader verification capabilities.
🧬 Scientific Discovery
FutureHouse Unveils Kosmos AI Scientist
FutureHouse has launched Kosmos, an AI research system that represents a significant advance over its predecessor, Robin. The organization is also spinning out a commercial entity, Edison Scientific, to manage the platform while maintaining a free tier for academics.
Kosmos addresses a key limitation of earlier AI scientists: the inability to synthesize large amounts of information due to finite context windows. The system uses structured world models to maintain coherence across tens of millions of tokens, enabling it to process far more information than previous agents. A typical run involves analyzing 1,500 papers and executing 42,000 lines of code.
Beta testers report that Kosmos completes in one day what would typically require six months of their time, with the system achieving 79.4% accuracy in its conclusions. The researchers validated this time-savings estimate by tracking how long it took humans to make the same discoveries Kosmos reproduced, which averaged around four months per finding.
The system has already produced seven scientific discoveries across neuroscience, materials science, and genetics. Three reproduced findings from unpublished work or papers outside its training data, while four made novel contributions. One notable discovery identified that neurons vulnerable to Alzheimer’s disease show reduced flippase gene expression with age, potentially triggering neuronal degradation—a finding validated in human tissue samples.
Kosmos differs from typical chatbots in its operation. It requires careful prompting and takes time to run, making it more comparable to lab equipment than a conversational tool. The system can pursue irrelevant leads or chase statistical noise, so researchers often run it multiple times on the same objective. Edison Scientific charges $200 per run, with founding subscribers locking in that rate before future increases.
⚙️ Enterprise & Developer Infrastructure
Microsoft Reimagines Windows as an ‘Agentic OS’
Microsoft announced plans to transform Windows 11 into what it calls an “agentic OS” by embedding AI agents throughout the operating system. The company is starting with taskbar integration that lets agents run in the background while users work on other tasks.
The new Ask Copilot feature in the taskbar combines local file search with Copilot capabilities and provides access to both Microsoft 365 Copilot and third-party AI agents. When users assign tasks to an agent, it minimizes to the taskbar and continues working independently. Users can hover over the taskbar icon to check progress, with visual indicators showing status: yellow exclamation marks when the agent needs input and green checkmarks when tasks are complete.
Microsoft built the system using the Model Context Protocol, which provides a standardized framework for agents to discover tools and other agents through a secure on-device registry. For security purposes, each agent operates in its own workspace using a separate Windows account, creating a sandboxed environment isolated from the main Windows session.
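To make the discovery model concrete, here is a toy sketch of an MCP-style on-device registry that agents query for tools by capability. All names and structures are invented for illustration; the actual Windows registry and MCP SDK interfaces differ:

```python
# Toy sketch of MCP-style tool discovery: agents query an on-device
# registry for tools advertising a capability. Entirely hypothetical;
# the real Windows agent registry and MCP SDK APIs are different.
from dataclasses import dataclass, field

@dataclass
class ToolEntry:
    name: str
    description: str
    capabilities: set = field(default_factory=set)

class DeviceRegistry:
    def __init__(self):
        self._tools: dict[str, ToolEntry] = {}

    def register(self, entry: ToolEntry) -> None:
        self._tools[entry.name] = entry

    def discover(self, capability: str) -> list[ToolEntry]:
        """Return every registered tool advertising the capability."""
        return [t for t in self._tools.values() if capability in t.capabilities]

registry = DeviceRegistry()
registry.register(ToolEntry("file-search", "Local file indexer", {"search", "files"}))
registry.register(ToolEntry("excel-export", "Table-to-spreadsheet converter", {"tables", "export"}))

print([t.name for t in registry.discover("search")])  # ['file-search']
```

The point of the registry pattern is that agents never hard-code tool locations: they ask at runtime, which is what lets third-party agents plug into the same taskbar surface as Microsoft's own.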
Beyond the taskbar, Microsoft is integrating Copilot directly into File Explorer, enabling users to summarize documents, answer questions about files, or draft emails based on document content with a single click. The company is also enhancing Click to Do on Copilot Plus PCs, allowing users to convert any table they see into an Excel spreadsheet for manipulation.
Windows chief Pavan Davuluri and corporate VP Navjot Virk emphasized that all AI features are opt-in, giving users control over when and how they engage with agents. The company is pursuing a hybrid approach that combines local AI processing on Copilot Plus PCs with cloud-powered capabilities through Copilot.
Stack Overflow Pivots to Enterprise AI Infrastructure
Stack Overflow announced a suite of products at Microsoft Ignite that repositions the developer Q&A platform as an enterprise AI data provider. The company’s new focus centers on Stack Overflow Internal, which transforms its traditional forum format into a system that feeds information to AI agents.
Stack Overflow Internal functions as an enterprise version of the public forum with enhanced security and administrative controls, but its key feature is integration with the Model Context Protocol. The product emerged after the company noticed that enterprise customers were already using its API for training.

Stack Overflow has also established content licensing agreements with AI labs, allowing them to train models on public forum data for flat fees. CEO Prashanth Chandrasekar compared these arrangements to Reddit’s AI licensing deals, which have generated over $200 million for that platform, though he declined to provide specific figures for Stack Overflow’s agreements.
A crucial element of the new system is the metadata layer that accompanies each question-answer pair. Beyond basic information like authorship and timestamps, Stack Overflow exports content tags and assessments of internal coherence. These factors combine to create reliability scores that help AI agents determine how much to trust each answer.
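Stack Overflow has not published how these signals are weighted, but the idea of folding metadata into a single trust score can be sketched roughly as follows. The fields, weights, and thresholds below are all invented for illustration:

```python
# Hypothetical sketch of combining answer metadata into a reliability
# score. Stack Overflow has not disclosed its scoring formula; every
# field name and weight here is illustrative only.
def reliability_score(metadata: dict) -> float:
    """Map metadata signals onto a 0..1 trust score (illustrative weights)."""
    accepted = 1.0 if metadata.get("accepted") else 0.0
    votes = min(metadata.get("votes", 0) / 50, 1.0)      # cap vote influence
    coherence = metadata.get("coherence", 0.5)           # 0..1 coherence assessment
    freshness = 1.0 if metadata.get("age_days", 0) < 365 else 0.5
    return round(0.3 * accepted + 0.3 * votes + 0.25 * coherence + 0.15 * freshness, 3)

answer = {"accepted": True, "votes": 25, "coherence": 0.9, "age_days": 120}
print(reliability_score(answer))  # 0.825
```

Whatever the real formula looks like, exposing a score like this lets a consuming agent decide per answer whether to trust it outright, cross-check it, or discard it.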
CTO Jody Bailey explained that customers can establish their own tagging systems or let Stack Overflow generate them automatically. The company plans to build knowledge graphs that connect concepts and information, reducing the burden on AI systems to make those connections independently.
While Stack Overflow isn’t building AI agents itself, Bailey highlighted the potential for agents to write their own Stack Overflow queries when they encounter knowledge gaps or can’t answer questions, essentially enabling AI systems to crowdsource expertise from human developers when needed.
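The fallback Bailey describes could look something like the sketch below: an agent answers from its knowledge base when confidence is high enough, and otherwise queues the question for human developers. The confidence threshold and interfaces are invented for illustration:

```python
# Hypothetical sketch of an agent escalating a knowledge gap to a
# human Q&A queue instead of guessing. Threshold and data shapes
# are illustrative, not any real Stack Overflow API.
def answer_or_escalate(question: str, knowledge_base: dict, confidence_floor: float = 0.7):
    entry = knowledge_base.get(question)
    if entry and entry["confidence"] >= confidence_floor:
        return ("answer", entry["text"])
    # Knowledge gap: post the question for human developers to answer.
    return ("escalate", f"Posted to internal Q&A: {question!r}")

kb = {"How do we rotate the API key?": {"text": "See the ops runbook, step 4.", "confidence": 0.9}}

print(answer_or_escalate("How do we rotate the API key?", kb))
print(answer_or_escalate("Why does the nightly build fail on ARM?", kb))
```

The first call returns the stored answer; the second escalates, which is the crowdsourcing loop Bailey sketches: agents generating new questions for humans when their knowledge runs out.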
🌎 Consumer Applications
Google Takes AI Travel Tools Global
Google is expanding its AI-powered travel features, making them available to users worldwide. The Flight Deals tool, which launched in August for the US, Canada, and India, now works in over 200 countries and supports more than 60 languages. The tool helps travelers find affordable destinations by analyzing their travel preferences and surfacing the best available deals.
The company is also adding travel planning to Canvas, its AI-powered organizational tool in Search. Users can describe their ideal trip, and Canvas will pull together flight and hotel data, Google Maps information, and web content to create a complete itinerary. The system suggests hotels based on specific criteria and recommends restaurants and activities based on proximity to accommodations. Canvas is currently available on desktop for US users who’ve opted into the AI Mode experiment.
Google is also expanding its agentic booking features to all US users. The system can search across multiple reservation platforms to find real-time availability for restaurants based on party size, date, time, and cuisine preferences. Google plans to extend this capability to flight and hotel bookings, allowing users to compare options and complete reservations directly within AI Mode.
Intuit Partners with OpenAI in $100M+ Integration Deal
Intuit has secured a multi-year agreement with OpenAI worth over $100 million to embed its financial software suite within ChatGPT. The partnership will make TurboTax, Credit Karma, QuickBooks, and Mailchimp accessible through ChatGPT’s interface, allowing users to handle tasks like calculating potential tax refunds, exploring credit products, and managing business operations.
The integration will enable Intuit’s applications to access user financial data with permission, performing actions like sending marketing campaigns or issuing invoice reminders directly through ChatGPT. Users will also be able to evaluate credit cards, personal loans, and mortgage options within the chat interface.

This arrangement differs from other ChatGPT integrations because it involves financial decisions where accuracy is critical. Intuit spokesperson Bruce Chan explained that the company employs multiple validation techniques and leverages large domain-specific datasets to reduce the risk of errors or false information. He emphasized that responses draw on Intuit’s accumulated expertise and comprehensive customer data to ensure relevance and accuracy.
While Intuit maintains it stands behind the accuracy guarantees of its products, the company didn’t specify who would be responsible for errors resulting from AI-generated advice or recommendations.
The deal extends beyond ChatGPT integration. Intuit will expand its use of OpenAI’s models throughout its business operations. The company already employs a mix of OpenAI models alongside other commercial and open-source options. Intuit views the partnership as an additional distribution channel for its small business and consumer finance tools, providing access to ChatGPT’s user base.
The agreement also covers Intuit’s ongoing use of ChatGPT Enterprise for internal employee workflows. This builds on Intuit’s broader AI strategy, which includes Intuit Assist, an AI assistant launched in 2023 that operates across the company’s product lineup.
Stay ahead of the curve – join our community today!
Follow us for the latest discoveries, innovations, and discussions that shape the world of artificial intelligence.
