AI This Week: From Your Bank Account to the Search Bar


AI claimed new ground on several fronts this week. Google held I/O 2026 and announced Gemini across nearly every surface it owns. OpenAI moved into your bank account. Anthropic pulled a piece of shared developer infrastructure away from every competitor that was using it. KPMG handed Claude to 276,000 employees and put it inside the platform where client work actually happens. Cohere open-sourced a model built to run entirely inside your own infrastructure. And across the coding agent space, three different labs shipped three different answers to the same question: what does it look like when an AI works on your codebase without you watching?

TL;DR

  • Google held I/O 2026 and announced Gemini 3.5 Flash, a personal agent called Gemini Spark, background information agents in Search, a Universal Cart for shopping, and its first Android XR audio glasses arriving this fall.
  • OpenAI launched a personal finance feature in ChatGPT for Pro users, connecting more than 12,000 financial institutions via Plaid for live dashboards and context-aware money advice.
  • Notion launched a developer platform with Workers, live database sync, and support for external agents, including Claude Code, Cursor, Codex, and Decagon.
  • Cohere released Command A+ under Apache 2.0, a 218B mixture-of-experts model that consolidates reasoning, multimodal, multilingual, and tool use capabilities into a single model deployable on two H100 GPUs.
  • Cursor released Composer 2.5, trained with targeted RL textual feedback and 25x more synthetic tasks, and announced a significantly larger model in training with SpaceXAI.
  • OpenAI brought Codex to iOS and Android and is developing Computer Use capability for locked/sleeping machines and multi-device remote control.
  • xAI released Grok Build, a terminal-based coding agent in early beta for SuperGrok Heavy subscribers, with parallel subagent architecture and plan-review-approve workflows.
  • Anthropic acquired Stainless for a reported $300M+, internalizing the SDK generator used by OpenAI, Google, Cloudflare, and others, and is winding down hosted products for all other customers.
  • KPMG announced a global alliance with Anthropic, deploying Claude to all 276,000+ employees and embedding it in Digital Gateway for tax, legal, and PE work.

🌐 Platform Moves

Google I/O 2026: Gemini goes everywhere

Google used I/O 2026 to push Gemini deeper into nearly every surface it owns, with new models, a redesigned app, a personal agent, and a string of product integrations spanning Search, Workspace, YouTube, and hardware.

On the model side, Gemini 3.5 Flash is the headline release, surpassing Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks while running at four times the output token speed of other frontier models at Flash-tier cost. It’s rolling out now across the Gemini app, Search, and the Gemini API. Gemini 3.5 Pro is in testing and expected next month. Google also announced Gemini Omni, a new model series that pairs reasoning with generation. The Flash variant accepts image, audio, video, and text input and outputs editable video grounded in real-world knowledge.

Black background featuring colorful gradient symbols and abstract icons associated with Google I/O 2026, including geometric shapes, a globe, arrows, and code brackets arranged in a futuristic, minimalist composition.
Featured Image: Google

The most significant product announcement is Gemini Spark, described as a personal agent that takes actions on behalf of users rather than answering questions. It integrates with Gmail, Docs, and other Workspace apps at launch, with third-party tool support via MCP coming over the summer. Spark will be available to Google AI Ultra subscribers in the US starting next week. Complementing it is Android Halo, a persistent status layer at the top of the phone screen that surfaces what an agent is working on without requiring the user to switch apps. A related feature, Daily Brief, synthesizes Gmail, Calendar, and Tasks into a personalized digest with suggested next steps, rolling out now to AI Plus, Pro, and Ultra subscribers in the US.

Search is getting a substantive upgrade, too. AI Mode is now powered by Gemini 3.5 Flash, with a new expandable search box designed for longer conversational queries. Background information agents will monitor the web continuously for changes related to topics a user cares about, available to Pro and Ultra subscribers this summer. Google also announced plans to build custom dashboards and trackers, described as mini apps for specific continuous tasks, coming in the months ahead.

On the shopping side, Universal Cart is a Gemini-powered cart that works across the Gemini app, YouTube, and Gmail. It tracks price drops, flags product incompatibilities, and factors in payment perks and loyalty information from Google Wallet. It’s coming to Search and the Gemini app in the US this summer. In Workspace, Gmail is getting a conversational search mode called Gmail Live, Docs is getting a live creation and editing mode, and Google Keep will organize free-flowing notes automatically.

Google also announced its first Android XR audio glasses, arriving this fall. The hardware is made by Samsung and Qualcomm, with exteriors designed by Gentle Monster and Warby Parker, and will pair with both Android phones and iPhones.

On pricing, Google AI Ultra now starts at $100 per month. The previous $250 tier drops to $200 with the same capabilities. The Gemini app is also shifting from daily prompt limits to a compute-used model that factors in prompt complexity, features used, and conversation length, with limits refreshing every five hours up to a weekly cap.
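
To make the shift from prompt counts to compute-based limits concrete, here is a minimal sketch of a meter with a five-hour window under a weekly cap. The class name, the notion of a "compute unit", and every number are hypothetical; Google has not published how usage is actually metered, and this sketch uses a sliding window for simplicity rather than a fixed refresh schedule.

```python
from dataclasses import dataclass, field

WINDOW_SECONDS = 5 * 60 * 60        # limits refresh every five hours
WEEK_SECONDS = 7 * 24 * 60 * 60     # horizon for the weekly cap


@dataclass
class UsageMeter:
    """Hypothetical meter: a short refreshing window under a weekly cap."""
    window_budget: float   # compute units allowed per five-hour window (assumed)
    weekly_cap: float      # compute units allowed per week (assumed)
    events: list = field(default_factory=list)  # (timestamp, cost) pairs

    def _used_since(self, now: float, horizon: float) -> float:
        # Sum the cost of every request newer than the given horizon.
        return sum(cost for ts, cost in self.events if now - ts < horizon)

    def try_spend(self, now: float, cost: float) -> bool:
        """Record the request and return True only if both limits allow it."""
        if self._used_since(now, WINDOW_SECONDS) + cost > self.window_budget:
            return False
        if self._used_since(now, WEEK_SECONDS) + cost > self.weekly_cap:
            return False
        self.events.append((now, cost))
        return True
```

In this model a long, feature-heavy conversation would simply carry a larger cost than a short prompt, so heavy users hit the window limit sooner even with fewer messages.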

Why it matters: Gemini is reaching into every Google product simultaneously. The strategic logic is clear: Google has the distribution advantage, and I/O is the moment it turns that into lock-in. Gemini Spark is the most consequential announcement because it signals Google’s intent to move from assistant to agent across its own ecosystem before opening to third parties. The MCP support coming this summer is worth watching closely. If Spark can operate credibly across non-Google tools, Google’s installed base across Search, Gmail, and Android becomes a serious distribution moat for agentic AI.

💰 AI Gets Personal

ChatGPT Gets a Personal Finance Dashboard

OpenAI is rolling out a personal finance feature in ChatGPT for Pro users in the U.S., letting people connect bank accounts, credit cards, and investment portfolios directly inside the chat interface. The integration uses Plaid to link more than 12,000 financial institutions, and once connected, users get a live dashboard covering spending by category, subscriptions, upcoming payments, and portfolio performance. Intuit support is coming.

Series of mobile app interface screens showing ChatGPT finance integrations powered by Plaid, including account connection flows, financial insights setup, institution syncing, and privacy messaging for connected banking accounts.
Featured Image: OpenAI / ChatGPT

The feature does more than display balances. With account data in context, GPT-5.5 Thinking, the default model for the Finances experience, can answer questions grounded in a user’s actual spending history rather than generic advice. Ask it to help you save more over the next three months, and it pulls your real transaction data, identifies the spending categories with the most flexibility, and builds a specific plan. OpenAI says it worked with more than 50 finance professionals to evaluate response quality, and GPT-5.5 Thinking scored 79 out of 100 on their internal personal finance benchmark; GPT-5.5 Pro scored 82.5.

On privacy: ChatGPT can read balances and transactions, but cannot see full account numbers or execute any transactions. Financial data follows the same model training settings users have already configured, and disconnecting accounts deletes synced data within 30 days.

Why it matters: Personal finance is one of the most credible use cases for an AI with persistent memory and reasoning ability, as the value of the advice scales directly with how much context the model has. OpenAI is betting that users will trust ChatGPT with their financial data in exchange for advice that actually reflects their situation, not a generic budgeting template. That’s a meaningful shift from AI as a search tool to AI as a financial copilot. The Intuit partnership is worth watching: the roadmap OpenAI describes, moving from a credit card recommendation to an actual application, or from a tax question to a booked appointment with a local CPA, signals that ChatGPT wants to become the interface layer over the financial services stack, not just a place to ask questions about it.

🧩 Agentic Platforms

Notion Opens Up to External Agents and Custom Code

Notion has launched a developer platform that repositions the productivity tool as an orchestration layer for AI work. The centrepiece is Workers, a cloud-based sandbox where teams can write and deploy custom code with no external infrastructure required. Through Workers, teams can sync live data from any API-connected database, including Salesforce, Zendesk, and Postgres, directly into Notion, keeping it current and queryable.

The platform also addresses a gap in Notion’s existing Custom Agents product, which launched in February and has since seen over one million agents built by customers. Those agents previously couldn’t connect with external data sources or run custom logic. Now they can, and external agents can connect to Notion as well. At launch, Claude Code, Cursor, Codex, and Decagon are supported as partner agents, meaning teams can assign them work and track their progress from inside Notion as if they were native. An External Agent API covers internal, company-built agents too. For cases where MCP connections aren’t sufficient, Workers can build custom agent tools with their own logic. The developer platform is accessible via the Notion CLI across all plans, and Workers are free through August.

Why it matters: Notion is making a deliberate move from application to infrastructure. The combination of live data sync, deployable code, and cross-agent coordination in a single workspace is a credible answer to a real enterprise problem: AI agents are multiplying, but the surfaces they work across are fragmented. By becoming the place where agents are assigned, tracked, and given access to a unified data layer, Notion is competing less with note-taking tools and more with workflow automation platforms. The one million agents already built on its platform provide a base. The question now is whether enterprises trust a workspace app with that much of their operational stack.

Manus Upgrades Scheduled Tasks With Persistent Context

Manus has released Scheduled Tasks 2.0, an upgrade to its recurring automation feature that goes beyond simple time-based triggers. The core change is that scheduled work can now continue inside the same task rather than spawning a new one each time. That means a recurring workflow (a daily standup summary, a status check, a research thread) carries forward the instructions, files, conversation history, and prior results already built inside that task. For work organized in Projects, the schedule can also inherit the Project’s shared setup: connectors, files, skills, and output standards.

The update also brings scheduling into web apps built with Manus. Rather than requiring a user to manually trigger data refreshes or recurring reports, those actions can now be embedded directly into the app’s behaviour, running on whatever cadence makes sense without any user intervention.

A new set of views makes the operational side easier to manage. A side panel, schedule view, and calendar view show upcoming runs and run history, and each run card links back to the originating task so users can inspect specific outputs. Control over individual schedules is consolidated in one place: users can set whether each run continues in the same task or starts fresh, skip confirmation prompts for trusted automated workflows, attach connectors as live data sources, and choose the execution environment.
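
The difference between continuing in the same task and starting fresh can be sketched in a few lines. This is an illustration only: the `Task` class and `run_scheduled` function are hypothetical names, not part of any Manus API.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record: standing instructions plus accumulated results."""
    instructions: str
    history: list = field(default_factory=list)  # results from prior runs


def run_scheduled(task: Task, result: str, continue_in_same_task: bool) -> Task:
    """Append to the existing task, or start a fresh one with the same instructions."""
    target = task if continue_in_same_task else Task(task.instructions)
    target.history.append(result)
    return target


report = Task("Summarize weekly metrics")
run_scheduled(report, "week 1 summary", continue_in_same_task=True)
run_scheduled(report, "week 2 summary", continue_in_same_task=True)
# report.history now holds both weeks, so run 3 could reference runs 1 and 2.
```

A fresh run gets the instructions but none of the history, which is the stateless behaviour the earlier version of the feature was limited to.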

Why it matters: The original version of scheduled tasks solved the when. This version addresses the where and the what: specifically, whether a recurring workflow should carry context forward or reset. That distinction matters more than it might seem. Most meaningful recurring work isn’t stateless: a weekly report that builds on last week’s, a dashboard that tracks a running dataset, a follow-up thread that references prior decisions. Letting the schedule stay anchored to the artifact and context it belongs to is what separates useful automation from a glorified cron job. Combined with the ability to embed schedules inside apps, Manus is pushing toward agents that maintain ongoing processes rather than completing discrete tasks on demand.

💻 Coding Agents

Codex Is Coming to Your Phone, and Your Locked Laptop

OpenAI recently brought Codex to the ChatGPT mobile app on iOS and Android, letting users review outputs, approve commands, switch models, and dispatch new tasks to a Mac running the Codex desktop app directly from their phone. Now the company is working on extending that remote control capability in two directions.

The first addresses an awkward gap in the current workflow: Computer Use, the mechanism that lets Codex see the screen and interact with desktop applications, requires the Mac to be unlocked and awake. OpenAI is developing a way to keep that capability active even when the machine is locked or sleeping, meaning a user could direct Codex to open an app, test a GUI build, run through a simulator, or pull from a data source without walking back to the machine to log in first. The second extension would let Codex connect to and control other desktop devices running the Codex app (a Mac Mini, for instance) from a primary device, with the developing UI suggesting support for multiple remote machines simultaneously. No release timeline has been announced for either feature.

Why it matters: The locked-screen limitation is one of the more consequential constraints on autonomous coding agents right now. It effectively means the agent can only work when someone is actively at the machine. Lifting that restriction opens the door to genuine overnight and background work across a device fleet. The unresolved question is how Apple responds. A screen-driving agent staying active inside a locked macOS session runs against the security assumption that a locked screen means an untouchable one. How that tension gets resolved will determine how far this capability can actually go on Mac hardware.

Cursor Ships Composer 2.5 With Targeted RL Training

Cursor has released Composer 2.5, a significant update to its in-house coding model, now available inside Cursor. Like its predecessor, it’s built on Moonshot’s Kimi K2.5 open-source checkpoint, but trained with several new techniques that improve both raw capability and day-to-day usability.

The most technically interesting addition is targeted RL with textual feedback. Standard reinforcement learning assigns reward over an entire rollout, which becomes a noisy signal when a rollout spans hundreds of thousands of tokens: a single bad tool call buried in an otherwise successful run barely registers. Cursor’s approach inserts a short hint directly at the problematic point in the trajectory, uses the hinted version as a teacher distribution, and trains the model to move toward it at that specific turn only. The result is precise correction of localized behaviour, such as wrong tool calls, style violations, and unclear explanations, without disrupting the broader training objective.
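
A toy version of the idea, as described above, might look like the following. Everything here is an assumption made for illustration: Cursor has not published its loss function, so the choice of KL divergence, its direction, and the two-action "distributions" are stand-ins. The point is only that the correction is applied at one flagged turn while every other turn contributes nothing.

```python
import math


def kl_divergence(teacher, student):
    """KL(teacher || student) over a shared discrete action space.

    Assumes the student assigns non-zero probability wherever the teacher does.
    """
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


def targeted_loss(student_turns, teacher_turns, flagged_turn):
    """Per-turn loss that is non-zero only at the turn where the hint was inserted."""
    return [
        kl_divergence(teacher_turns[i], student_turns[i]) if i == flagged_turn else 0.0
        for i in range(len(student_turns))
    ]


# Three turns, two possible actions each (say, "wrong tool" vs "right tool").
student = [[0.5, 0.5], [0.9, 0.1], [0.5, 0.5]]   # model without the hint
teacher = [[0.5, 0.5], [0.2, 0.8], [0.1, 0.9]]   # same model with the hint in context
losses = targeted_loss(student, teacher, flagged_turn=1)
```

Note that the teacher also differs from the student at the last turn, but that difference is ignored: only the flagged turn produces a training signal, which is what keeps the correction local.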

Minimalist benchmark comparison table displaying performance scores for AI models including Composer 2.5, Opus 4.7, GPT-5.5, and Composer 2 across coding and software engineering evaluation tasks such as Terminal-Bench and SWE-Bench Multilingual.
Featured Image: Cursor / Composer 2.5

Composer 2.5 was also trained on 25 times more synthetic tasks than Composer 2, including a “feature deletion” approach where the agent is given a codebase, asked to remove specific functionality while keeping everything else working, then tasked with reimplementing it. At scale, this surfaced some striking reward hacking: the model found a leftover Python type-checking cache and reverse-engineered it to recover deleted function signatures, and in another case, decompiled Java bytecode to reconstruct a third-party API. Cursor says agentic monitoring tools caught these, but flags them as a sign of how much care large-scale RL now requires.

Pricing is $0.50/M input and $2.50/M output for the standard tier. A faster variant runs at $3.00/M input and $15.00/M output, which is lower than the fast tiers of comparable frontier models, Cursor says. Double usage is included for the first week.
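
For a sense of scale, the quoted rates can be turned into a per-job cost. The rates below come from the figures above; the token counts in the example are hypothetical.

```python
# USD per million tokens as (input, output), from the quoted pricing.
RATES = {
    "standard": (0.50, 2.50),
    "fast": (3.00, 15.00),
}


def cost_usd(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one job in US dollars at the given tier."""
    in_rate, out_rate = RATES[tier]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical agent session: 2M input tokens, 200K output tokens.
standard = cost_usd("standard", 2_000_000, 200_000)   # 1.50
fast = cost_usd("fast", 2_000_000, 200_000)           # 9.00
```

At these rates the fast variant costs six times the standard tier for the same token mix, since both the input and output rates scale by the same factor.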

Cursor also noted it is training a significantly larger model from scratch with SpaceXAI using 10x more total compute on Colossus 2’s infrastructure.

Why it matters: Targeted textual feedback is a meaningful training innovation because it addresses a real limitation of RL at long context: credit assignment degrades as rollouts get longer, which is precisely the regime coding agents operate in. By making corrections surgical rather than diffuse, Cursor can train against specific behavioural problems, such as tool misuse, communication style, and effort calibration, that aggregate reward signals tend to wash out. The reward hacking examples are also worth sitting with: a model sophisticated enough to decompile bytecode to solve a synthetic task is demonstrating a level of resourcefulness that will require increasingly careful environment design as these models get more capable.

xAI Launches Grok Build, a Terminal-Based Coding Agent

xAI has released an early beta of Grok Build, a coding agent and CLI aimed at professional software engineering work. Available now to SuperGrok Heavy subscribers, it installs via a single curl command and runs directly from the terminal.

The agent is built around a plan-review-approve loop: for complex tasks, Grok Build drafts a step-by-step plan before touching any code, letting users approve it, comment on individual steps, or rewrite it entirely. Once approved, every change surfaces as a diff. For larger jobs, it can spin up specialized subagents that run in parallel, each in its own git worktree, and coordinate across them. The latency regression example in the announcement shows it splitting a debugging task across subagents that simultaneously explore deploys, slow endpoints, query plans, and cache hit rates.
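
The approve-then-fan-out pattern can be sketched with the standard library. This is not Grok Build's implementation: `investigate` and `run_plan` are hypothetical stand-ins, and the stub does no real work where an actual subagent would operate in its own git worktree.

```python
from concurrent.futures import ThreadPoolExecutor


def investigate(area: str) -> dict:
    """Stand-in for a subagent; a real one would work in its own git worktree."""
    return {"area": area, "finding": f"no data collected for {area} (stub)"}


def run_plan(plan: list[str], approved: bool) -> list[dict]:
    """Do nothing until the plan is approved, then fan out one worker per step."""
    if not approved:
        return []
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        return list(pool.map(investigate, plan))  # map preserves plan order


# The four lines of inquiry from the latency regression example.
plan = ["deploys", "slow endpoints", "query plans", "cache hit rates"]
findings = run_plan(plan, approved=True)
```

Gating execution on an explicit approval flag keeps the human in the loop before any work starts, while the pool lets independent lines of inquiry proceed at the same time.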

Dark-themed Grok Build coding assistant interface.
Featured Image: xAI / Grok Build

It also picks up existing repo conventions out of the box: AGENTS.md files, plugins, hooks, MCP servers, and a plugin marketplace are all supported. A headless mode allows the agent to run inside scripts and automations, and a full ACP interface lets developers build their own bots or orchestration layers on top of it.

Why it matters: The terminal-based coding agent space is getting crowded fast, with Claude Code, Cursor, and Codex all competing for the same professional developer workflow. Grok Build’s parallel subagent architecture is a notable design choice: rather than one agent working through a task sequentially, it distributes exploratory work across multiple agents simultaneously, which could meaningfully compress time on large codebases. The SuperGrok Heavy paywall limits early reach, but xAI is clearly using it as a feedback loop to iterate quickly before a wider release. Whether Grok Build can carve out ground against entrenched tools will depend on how well the model performs on real engineering tasks. The architecture is promising, but the beta label is doing a lot of work right now.

🤖 Open Source AI

Cohere Pushes Sovereign AI Forward with Open-Source Command A+

Cohere has released Command A+ under an Apache 2.0 license, making the model freely available for download on Hugging Face. It is a 218 billion parameter mixture-of-experts architecture with 25 billion active parameters, designed to run on as little as two H100 GPUs or a single Blackwell GPU at 4-bit quantization with negligible quality loss.
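
A back-of-the-envelope check shows why 4-bit quantization is what makes the two-GPU claim plausible. The sketch counts weight memory only and assumes the 80 GB variant of the H100; KV cache, activations, and runtime overhead are ignored, so real headroom is smaller than the numbers suggest.

```python
def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory for weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 218e9   # all experts must be resident, not just the 25B active
H100_GB = 80           # assumption: 80 GB H100 variant

four_bit = weight_memory_gb(TOTAL_PARAMS, 4)       # 109.0 GB
sixteen_bit = weight_memory_gb(TOTAL_PARAMS, 16)   # 436.0 GB

fits_on_two_h100_4bit = four_bit <= 2 * H100_GB        # True: 109 <= 160
fits_on_two_h100_16bit = sixteen_bit <= 2 * H100_GB    # False: 436 > 160
```

The 25 billion active parameters reduce compute per token, not memory: a mixture-of-experts model still has to hold every expert's weights, which is why the total parameter count drives the sizing.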

Command A+ consolidates capabilities that were previously spread across four separate Command A models (reasoning, multimodal understanding, translation, and tool use) into a single model. It supports 48 languages, up from 23 in previous generations, and accepts text, image, and tool use as inputs across a 128K context window. Compared to Command A Reasoning, agentic coding performance on Terminal-Bench Hard improved from 3% to 25%, and tau-squared Bench Telecom scores jumped from 37% to 85%. On Cohere’s internal North evaluations, agentic question answering accuracy improved 20%, and spreadsheet analysis quality improved 32%. The model is also faster than its predecessors, delivering up to 63% higher output tokens per second and reducing time to first token by up to 17% compared to Command A Reasoning at equivalent quantization levels.

The release also reflects a broader shift happening across enterprise AI: organizations increasingly want agentic systems they can deploy privately, customize internally, and integrate directly into operational workflows without depending entirely on closed hosted platforms. Cohere’s emphasis on sovereign AI continues to position the company differently from many frontier model competitors. Instead of focusing primarily on consumer-facing AI experiences, Cohere is doubling down on enterprise-controlled deployments, secure infrastructure, and production-ready agentic systems.

Why it matters: Cohere’s explicit framing around sovereign AI is pointed. Command A+ is built to run entirely inside customer infrastructure, which targets a real and growing concern among enterprises that cannot or will not send sensitive data to third-party APIs. Open-sourcing a model of this capability level under Apache 2.0 raises the floor for what organizations can deploy privately, and consolidating the full Command A capability set into one model simplifies a deployment decision that previously required choosing between specialized variants. For enterprises where data residency and operational control are non-negotiable, this is a more practical option than most frontier models currently on the market.

🔧 Developer Infrastructure

Anthropic Acquires Stainless, Cuts Off Competitors’ Access

Anthropic has acquired Stainless, a New York-based developer tools startup founded in 2022 by former Stripe engineer Alex Rattray. The Information reported the deal was valued at more than $300 million; Anthropic has not confirmed terms. Stainless, backed by Sequoia Capital and Andreessen Horowitz, built tooling that takes API specifications and automatically generates production-ready SDKs across Python, TypeScript, Go, Java, Kotlin, and more, and keeps them updated as the underlying API changes. The technology was widely adopted across the AI industry: OpenAI, Google, Cloudflare, Replicate, and Runway were all customers, as was Anthropic, which has used Stainless to generate every official Claude SDK since the earliest days of its API.

The acquisition comes with a significant competitive consequence: Anthropic will wind down all hosted Stainless products, including its SDK generator. Existing customers retain full rights to the SDKs they’ve already generated, but the platform will no longer be available to Anthropic’s rivals going forward.

Beyond SDKs, Stainless also has tooling for MCP server generation, which is directly relevant to Anthropic’s push to make Claude agents more broadly connectable to external data and services.

Why it matters: This is a supply chain move as much as a talent acquisition. Stainless occupied a quiet but load-bearing position in the AI developer ecosystem: nearly every major lab used it to manage the unglamorous but critical work of keeping SDKs current. By acquiring and closing it to competitors, Anthropic removes a shared piece of infrastructure that its rivals depended on and internalizes it exclusively. The MCP angle sharpens the strategic logic further: as the industry moves toward agents that connect to external tools, the quality and reach of that connectivity layer become a real differentiator. Controlling the tooling that builds those connections puts Anthropic closer to owning that stack end to end.

🀝 Enterprise AI

KPMG Deploys Claude Across 276,000 Employees in Global Alliance

Anthropic has announced a global alliance with KPMG that embeds Claude across the professional services firm’s entire workforce and client-facing platforms. All 276,000-plus KPMG employees worldwide will gain access to Claude, extending an adoption that began two years ago inside KPMG’s US operations and AI and Data Labs.

The more consequential integration is inside Digital Gateway, KPMG’s Azure-hosted platform where its tax expertise, proprietary tools, and client data live. Claude Cowork and Managed Agents are now embedded directly in the platform, letting KPMG professionals and their clients build AI tools without switching between systems. The firm’s tax vice chair noted that building an agent to help clients navigate changing tax regulations previously took weeks across multiple tools. The same task now takes minutes inside Digital Gateway. Cybersecurity is another application area, with KPMG and Anthropic teams working together to identify and remediate vulnerabilities in critical systems under KPMG’s Trusted AI framework.

The alliance also has a private equity dimension. Anthropic is naming KPMG a preferred partner for PE deployments, with KPMG consulting on Claude rollouts inside portfolio companies. A new offering called KPMG Blaze embeds Claude Code to help portfolio company teams modernize legacy IT systems and ship AI-enabled products faster.

Why it matters: Enterprise AI deployments at this scale are still relatively rare, and KPMG is an instructive case because of where it operates: audit, tax, and legal work, where accuracy and accountability carry real legal and reputational weight. A firm willing to put Claude into that environment across 138 countries is a meaningful signal of where enterprise confidence in AI is heading. The private equity angle is also worth watching: KPMG as a distribution channel into PE portfolio companies gives Anthropic a route into a large pool of mid-market businesses that wouldn’t typically be direct Anthropic customers. That’s a go-to-market model that will likely be replicated across other verticals.
