AI This Week: Midjourney Goes Video, Google Goes Conversational

[Featured image: close-up of a girl with platinum blonde hair, partially obscured by a bird in soft focus]

AI companies made bold moves across multiple fronts this week, from creative tools to hardware partnerships to interface innovations. Midjourney finally launched its video generation capabilities, Google transformed search into a conversational experience, and Meta expanded its smart glasses lineup with a sports-focused collaboration. Meanwhile, OpenAI provided the first concrete timeline for GPT-5 while exploring new monetization strategies.

Midjourney Makes Its Video Debut

Midjourney entered the video generation market with its V1 model, marking a significant expansion beyond the image creation tools that made the company a household name among digital artists. V1 transforms static images into short video clips, letting users animate either uploaded photos or images created with Midjourney’s existing models.

How It Works

V1 operates as an image-to-video system, producing four different five-second clips from a single input image. True to Midjourney’s roots, the service remains exclusively available through Discord, maintaining the platform’s community-driven approach that has distinguished it from competitors. Users can extend these initial clips by four-second increments up to four times, potentially creating videos as long as 21 seconds.
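The 21-second ceiling follows directly from those numbers. The snippet below is purely illustrative arithmetic based on the limits described above, not Midjourney code:

```python
# Illustrative arithmetic only -- constants taken from Midjourney's published limits.
BASE_CLIP_SECONDS = 5    # each generated clip starts at five seconds
EXTENSION_SECONDS = 4    # each extension adds four seconds
MAX_EXTENSIONS = 4       # a clip can be extended up to four times

max_length = BASE_CLIP_SECONDS + EXTENSION_SECONDS * MAX_EXTENSIONS
print(max_length)  # 21 seconds
```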

The model offers both automated and manual animation controls. Users can let the system randomly animate their images or provide specific text descriptions to guide the animation process. Additional settings enable fine-tuning of camera movement and subject motion intensity, providing creators with varying degrees of control over the final output.

[Image: blurred motion shot of a man in a blue jacket playing drums. Featured image: Midjourney]

The Competitive Landscape

Midjourney’s entry into video generation puts it in direct competition with established players like OpenAI’s Sora, Runway’s Gen 4, Adobe’s Firefly, and Google’s Veo 3. However, CEO David Holz positions this as just one step toward the company’s broader ambition: developing AI capable of real-time open-world simulations. The roadmap includes future models for 3D rendering and real-time AI applications.

The launch arrives amid legal turbulence, coming just one week after Disney and Universal filed a lawsuit against Midjourney, alleging that its AI models generate images of copyrighted characters, such as Homer Simpson and Darth Vader. This lawsuit reflects broader industry tensions as entertainment companies grapple with AI tools that may compete with or devalue creative work.

Google Introduces Conversational Search with Live Voice

Google rolled out Search Live, bringing conversational voice interactions to its mobile search experience. Available through the AI Mode experiment in the Google app for Android and iOS users in the United States, the feature transforms search from discrete queries into flowing conversations.

The new capability addresses scenarios where typing isn’t practical—when users are multitasking, driving, or have their hands full. Instead of tapping out questions, users can speak naturally and receive audio responses, then continue the conversation with follow-up questions without starting over.

How It Works

Search Live operates through a dedicated “Live” icon beneath the search bar in the Google app. Users can ask complex questions verbally and receive AI-generated audio responses, with relevant web links appearing on the screen for further exploration. The system maintains conversation context, allowing for natural follow-ups, such as asking for clarification or additional details on the same topic.

The feature runs in the background, meaning users can switch to other apps while continuing their search conversation. A transcript function converts the audio exchange into text, allowing users to transition between voice and typed queries within the same session. All interactions are saved in AI Mode history for later reference.

[Image: digital search interface with glowing voice and visual search icons. Featured image: Google]

Technical Foundation

Google powers Search Live with a customized version of Gemini optimized for voice interactions. The underlying system leverages Google’s established search quality and information infrastructure, aiming to maintain reliability while adding conversational capabilities.

The implementation uses what Google calls a “query fan-out technique,” which generates multiple related searches from a single voice query to surface diverse web content. The approach is designed to expose users to a broader range of sources and support more exploratory search behavior.
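Google hasn’t published the mechanics of the fan-out, but the general pattern can be sketched in a few lines of Python. Everything below (the expand_query helper, the example sub-queries, and the web-search stub) is hypothetical; it only illustrates turning one spoken question into several parallel searches whose results are merged:

```python
from concurrent.futures import ThreadPoolExecutor

def expand_query(spoken_query: str) -> list[str]:
    """Hypothetical expansion step: derive several related searches
    from a single voice query (in practice this would be model-driven)."""
    return [
        spoken_query,
        f"{spoken_query} explained",
        f"{spoken_query} examples",
        f"best sources on {spoken_query}",
    ]

def web_search(query: str) -> list[str]:
    """Stub standing in for a real search backend; returns result URLs."""
    return [f"https://example.com/result-for/{query.replace(' ', '-')}"]

def fan_out(spoken_query: str) -> list[str]:
    """Run the related searches in parallel and merge the link lists,
    de-duplicating while preserving order."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(web_search, expand_query(spoken_query)))
    seen, merged = set(), []
    for links in result_lists:
        for link in links:
            if link not in seen:
                seen.add(link)
                merged.append(link)
    return merged

print(fan_out("how do noise-cancelling headphones work"))
```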

Future Capabilities

Google plans to expand Live functionality with visual search integration in the coming months. The roadmap includes camera-enabled conversations, allowing users to show Search what they’re seeing in real time while maintaining the conversational format. This would combine Google’s existing visual search capabilities with the new voice interaction model.

The launch represents Google’s effort to make search more natural and accessible while competing with conversational AI interfaces that have gained popularity. By integrating voice conversation directly into search, rather than requiring separate apps or interfaces, Google aims to reduce friction for users seeking quick, hands-free access to information.

Another Google Launch: Gemini CLI Targets Terminal Users

Google introduced Gemini CLI this week, an open-source command-line tool designed to bring AI assistance directly into developers’ terminal environments. The tool represents Google’s latest effort to capture mindshare among programmers who have increasingly adopted AI coding assistants from competitors.

Gemini CLI connects Google’s AI models to local codebases, enabling natural language interactions for common development tasks. Developers can ask the tool to explain complex code sections, write new features, debug existing code, or execute terminal commands through conversational prompts.
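In practice those interactions happen directly in the terminal; the sketch below simply shells out to the gemini binary with a one-off natural-language prompt. It assumes Gemini CLI is installed and on the PATH, and that a non-interactive prompt can be passed with the -p flag, as described in the tool’s launch documentation; treat the exact flag and output handling as assumptions rather than a definitive recipe:

```python
import subprocess

def ask_gemini(prompt: str) -> str:
    """Send a one-off natural-language prompt to the locally installed
    Gemini CLI and return its text output. Assumes the `gemini` binary is
    on PATH and accepts a non-interactive prompt via `-p` (an assumption
    based on the tool's launch documentation)."""
    result = subprocess.run(
        ["gemini", "-p", prompt],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(ask_gemini("Explain what the main entry point of this repository does"))
```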

Beyond Code Generation

While coding assistance forms the primary use case, Google designed Gemini CLI with broader capabilities in mind. The tool can generate videos using Google’s Veo 3 model, create research reports through the company’s Deep Research agent, and access real-time information via Google Search integration.

The system also supports connections to MCP (Model Context Protocol) servers, allowing developers to link external databases and services to their AI workflows. This extensibility positions the tool as more than just a coding assistant, potentially serving as a general-purpose AI interface for technical users.

Competitive Positioning

The launch puts Google in direct competition with established terminal-based AI tools like OpenAI’s Codex CLI and Anthropic’s Claude Code. These command-line tools have gained popularity among developers for their speed and integration capabilities compared to web-based alternatives.

Google’s timing reflects the growing importance of developer relationships in the AI space. Since launching Gemini 2.5 Pro in April, Google has seen increased adoption among programmers, many of whom use third-party tools like Cursor and GitHub Copilot that integrate multiple AI models. By offering a direct solution, Google aims to build stronger relationships with this influential user base.

Open Source Strategy and Adoption

Google released Gemini CLI under the Apache 2.0 license, one of the most permissive open-source licenses available. The company expects community contributions through GitHub to enhance the tool’s capabilities and drive adoption through developer collaboration.

To encourage uptake, Google offers generous free-tier limits: 60 model requests per minute and 1,000 requests per day. According to the company, these allowances are roughly double what a typical developer uses, removing cost barriers for experimentation and regular use.
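Those quotas are enforced on Google’s side, but a client can budget against them locally. The sketch below is a minimal, hypothetical throttle built only from the two numbers above; it is not part of Gemini CLI itself:

```python
import time

class QuotaBudget:
    """Minimal client-side budget for the free-tier limits Google describes:
    60 model requests per minute and 1,000 requests per day."""

    PER_MINUTE = 60
    PER_DAY = 1_000

    def __init__(self):
        self.minute_window: list[float] = []  # timestamps of requests in the last 60 s
        self.daily_count = 0

    def acquire(self) -> bool:
        """Return True when a request may be sent, sleeping briefly if the
        per-minute window is full; return False once the daily quota is spent."""
        if self.daily_count >= self.PER_DAY:
            return False
        now = time.monotonic()
        self.minute_window = [t for t in self.minute_window if now - t < 60]
        if len(self.minute_window) >= self.PER_MINUTE:
            time.sleep(60 - (now - self.minute_window[0]))
        self.minute_window.append(time.monotonic())
        self.daily_count += 1
        return True

budget = QuotaBudget()
if budget.acquire():
    pass  # safe to issue one model request here
```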

The move reflects broader industry recognition that developer adoption drives enterprise AI sales, as programming teams often influence their organizations’ technology choices.

OpenAI’s Summer Plans: GPT-5 Timeline and Advertising Strategy

OpenAI CEO Sam Altman provided the first concrete timeline for GPT-5 during the new OpenAI Podcast, announcing the next-generation model will arrive sometime this summer. While Altman stopped short of giving a specific launch date, the announcement signals OpenAI’s confidence in the model’s development progress.

Early testing reports suggest GPT-5 represents a substantial improvement over its predecessor, with testers describing it as “materially better” than GPT-4. The upgrade comes as OpenAI faces intensifying competition from rivals and seeks to maintain its position in the rapidly evolving AI landscape.

Advertising Considerations and Trust Concerns

Altman also addressed potential monetization changes, revealing he’s “not totally against” introducing advertisements to ChatGPT. However, Altman emphasized strict boundaries around ad implementation. He warned that allowing advertisers to influence the model’s actual responses would constitute “a trust-destroying moment” for users. Instead, any advertising would need to remain separate from the AI’s output, potentially appearing in sidebars or other distinct areas of the interface.

The advertising approach would require extensive care to maintain user trust, with Altman noting the burden of proof would be “very high” to demonstrate that ads genuinely benefit users without compromising the AI’s objectivity or accuracy.

Meta and Oakley Team Up for Performance AI Glasses

Meta continues expanding its smart glasses portfolio with a new partnership targeting athletes and sports enthusiasts. The collaboration with Oakley brings together the eyewear brand’s 50-year legacy in sports performance with Meta’s AI technology, creating what they’re calling “Performance AI glasses.”

The Oakley Meta HSTN (pronounced “Houston”) represents the first product from this partnership, designed specifically for active users who want hands-free capture and AI assistance during physical activities. Unlike the lifestyle-focused Ray-Ban Meta glasses, these target serious athletes and sports fans with features tailored to performance scenarios.

[Image: close-up of a woman wearing Meta smart glasses with white frames and orange-tinted lenses. Featured image: Meta]

Technical Upgrades for Active Use

The HSTN model includes several improvements over existing Meta smart glasses. Battery life extends to eight hours of typical use with up to 19 hours on standby, addressing one of the primary limitations of wearable tech during extended activities. A quick 20-minute charge provides 50% of the battery’s capacity, while the included charging case offers an additional 48 hours of power.

Video quality gets a significant boost with Ultra HD 3K recording, allowing users to capture high-resolution footage of their activities. The IPX4 water resistance rating makes the glasses suitable for sweaty workouts and light rain exposure, though they’re not designed for swimming or heavy water sports.

AI Integration for Athletes

Meta AI powers voice-activated features designed for sports contexts. Users can ask about weather conditions like wind speed for golf shots or check surf conditions before hitting the waves. The hands-free video recording and social media posting capabilities let athletes document their achievements without interrupting their activities.

The glasses integrate Oakley’s PRIZM lens technology, which manipulates light at the molecular level to enhance visual contrast and reduce glare. This proprietary technology aims to help athletes perceive subtle visual cues more clearly and react more quickly during competition.

Market Positioning and Availability

Six different frame and lens combinations offer options for various sports and lighting conditions, all of which are compatible with prescription lenses. A limited edition model celebrating Oakley’s 50th anniversary features gold accents and 24K PRIZM Polar lenses.

Pricing starts at $399 for standard models, with the limited-edition version priced at $499. Pre-orders begin July 11, with initial availability across 15 countries, including the US, Canada, major European markets, and Australia. Expansion to Mexico, India, and the UAE is planned for later this year.
