If you're building for the agentic era, you need AI-optimized infrastructure to deliver on new requirements. We announced a significant expansion of our AI infrastructure portfolio, including the eighth generation of our Tensor Processing Units (TPUs), which for the first time includes two distinct chips and specialized systems engineered specifically for the agentic era. Ready to learn how to leverage TPUs for your own training and inference workloads? Start here with this course.
OpenAI introduced workspace agents in ChatGPT, allowing teams to create shared AI agents for complex tasks and workflows. These agents, powered by Codex, perform tasks like generating reports, writing code, and managing communication, while integrating with various tools like Slack. Workspace agents are currently available in research preview for select ChatGPT plans, aiming to streamline collaboration and improve productivity.
Google launched Workspace Intelligence, enhancing Google Workspace with a semantic layer to integrate emails, chats, files, and projects for Gemini-powered agents. This update includes major product enhancements like natural-language spreadsheet building in Sheets and AI-driven features in Docs, Slides, Gmail, and Drive. Workspace Intelligence aims to make Workspace a centralized control layer for business operations, emphasizing security, context integration, and cross-application functionality.
Core Automation is an AI lab started by Jerry Tworek, a former OpenAI researcher, that aims to build the most automated AI lab in the world. It will start by automating its own research before developing new algorithms that go beyond pre-training and reinforcement learning. The lab will also create architectures designed to scale better than transformers. The team contains experts in frontier models, optimization, and systems engineering.
Perplexity's two-stage pipeline for search-augmented language models uses initial Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) to optimize factual accuracy, user preference, and tool-use efficiency. This approach, starting with Qwen3 models, separates compliance from search improvement to achieve accuracy without compromising guardrails. The models showed enhanced accuracy on benchmarks like FRAMES and FACTS OPEN with reduced cost per query and improved efficiency in tool usage over existing models like GPT-5.4.
Agentic workloads are reshaping inference engine benchmarks, demanding multi-turn, tool-using scenarios that strain KV cache management and scheduling due to longer traces and varied token distributions. Applied Compute introduced three workload profiles to aid in optimizing engine and accelerator performance. They released an open-source benchmarking tool to replay these scenarios, highlighting the need for solutions such as KV cache offloading and workload-aware routing to improve throughput and efficiency.
Most of what people put in AGENTS.md either doesn't help or actively hurts. The patterns that work are specific and learnable. This post looks at which patterns work, which fail, and how to tell which is which for your codebase. Different patterns move different metrics, so pick patterns that target the problem you actually have.
Valuable data is often fragmented across various SaaS tools, file shares, and other silos that sneak up on you when you're trying to ship fast. In this webinar, Backblaze's director of Applied AI explains how you can build a scalable storage foundation on object storage using Backblaze B2 and B2 Overdrive for all the phases of the AI data pipeline. See how you can store, label, and use all of your data without blowing up your budget. Watch on-demand
Qwen3.6-27B delivers flagship-level agentic coding performance. The Qwen team claims that it surpasses the previous-generation flagship Qwen3.5-397B-A17B across all major coding benchmarks. The model is 55.6 GB on Hugging Face, and there are even smaller quantized versions available. Tests show that the model delivers outstanding results, even when quantized.
The Gemini Enterprise Agent Platform is a comprehensive platform for building, scaling, governing, and optimizing agents. It combines model selection, model building, and agent building capabilities with new features for agent integration, DevOps, orchestration, and security. Agent Platform is a single destination for technical teams to build agents that can transform products, services, and operations. The agents can be delivered to employees through the Gemini Enterprise app.
Agents can connect to external systems through direct API calls, CLIs, and MCP. This post looks at where each fits and the patterns for building those integrations effectively. MCP becomes the critical compounding layer as production agents move to the cloud. Every integration built on MCP strengthens the ecosystem.
Microsoft plans to roll out token-based billing for all GitHub Copilot customers starting in June. Copilot Business customers will pay $19 per user per month and receive $30 of pooled AI credits. Copilot Enterprise customers will pay $39 per user per month and receive $70 of pooled AI credits. It is unclear what will happen to individual subscribers.
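To make the arithmetic concrete, here is a rough sketch of what a team would pay and receive under the prices quoted above. It assumes the credit amounts are granted per seat and simply add up into one shared pool, which the summary does not spell out; the `Plan` class and `monthly_totals` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    price_per_user: int    # USD per user per month (from the summary)
    credits_per_user: int  # USD of pooled AI credits (assumed to be per seat)


BUSINESS = Plan("Copilot Business", 19, 30)
ENTERPRISE = Plan("Copilot Enterprise", 39, 70)


def monthly_totals(plan: Plan, seats: int) -> tuple[int, int]:
    """Return (subscription cost, pooled credits) in USD for one month.

    Assumption: per-seat credits sum into a single team-wide pool.
    """
    if seats < 0:
        raise ValueError("seats must be non-negative")
    return plan.price_per_user * seats, plan.credits_per_user * seats


if __name__ == "__main__":
    for plan in (BUSINESS, ENTERPRISE):
        cost, credits = monthly_totals(plan, seats=10)
        print(f"{plan.name}: ${cost}/month, ${credits} pooled credits")
```

Under that assumption, a 10-seat Business team would pay $190 per month and share $300 of credits.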
Personalization in LLM responses introduces variation but often retains a stable semantic core across answers. This shared foundation results from common model priors, overlapping retrievals, and product constraints, with differences emerging in examples and emphasis. Understanding this allows businesses to optimize their presence in AI-generated content by focusing on being part of the model's core knowledge.
Relying on prompts to correct recurring AI agent mistakes is an unreliable, "vibes-based" approach that decays as soon as conversations become complex. To solve this, Y Combinator CEO Garry Tan advocates for "skillification." Instead of letting an agent waste compute attempting to solve deterministic tasks (like historical calendar lookups) in its latent space, this framework forces the AI to execute precise local scripts.
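As a minimal sketch of the idea (not Tan's actual framework), a deterministic task like a historical calendar lookup can live in a small local script that the agent calls instead of reasoning about it. The `SKILLS` registry and `run_skill` dispatcher are hypothetical names chosen for illustration.

```python
from datetime import date

# Fixed English names, so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_of(year: int, month: int, day: int) -> str:
    """Deterministic calendar lookup: weekday name for a Gregorian date."""
    return DAY_NAMES[date(year, month, day).weekday()]


# Hypothetical registry: the agent picks a skill by name and passes
# arguments; the answer is computed by the script, not by the model.
SKILLS = {"weekday_of": weekday_of}


def run_skill(name: str, **kwargs):
    """Dispatch a skill call, failing loudly on unknown skill names."""
    if name not in SKILLS:
        raise KeyError(f"unknown skill: {name}")
    return SKILLS[name](**kwargs)


if __name__ == "__main__":
    print(run_skill("weekday_of", year=1969, month=7, day=20))  # Sunday
```

The point of the design is that the same input always produces the same output, so a recurring mistake gets fixed once in code rather than re-prompted each time.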
AI is enhancing engineering workflows by handling execution, leaving humans to plan, review, and ensure quality output. Humans excel at diagnosing problems from multiple angles, a challenge for AI. Organizational AI strategies in the future will likely include personalized assistants for employees or a singular super-agent with departmental plugins.
We're hiring an engineer/researcher at a major AI lab or startup to help write for 1M+ subscribers. Curators have been invited to Google I/O and OpenAI DevDay, scouted for Tier 1 VCs, and get early access to unreleased TLDR products. Learn more.
npx workos@latest launches an AI agent, powered by Claude, that reads your project, detects your framework, and writes a complete auth integration into your codebase. No signup required. It creates an environment, populates your keys, and you claim your account later when you're ready. But the CLI goes way beyond installation. WorkOS Skills make your coding agent a WorkOS expert. workos seed defines your environment as code. workos doctor finds and fixes misconfigurations. And once you're authenticated, your agent can manage users, orgs, and environments directly from the terminal. No more ClickOps. See how it works →
OpenAI introduced an upgraded image model with improved text rendering, multi-image reasoning, and higher fidelity outputs, enabling complex assets like comics and marketing visuals.
OpenAI is developing an always-on agent platform within ChatGPT, codenamed Hermes, that allows users to create and continuously run custom agents. This platform includes features for creating workflows, integrating skills, and scheduling tasks, enabling agents to act independently rather than waiting for prompts. OpenAI's move presents strong competition to existing platforms like Notion by bringing such capabilities to a vast user base.
Qwen3.5-Omni is a large-scale multimodal model with hundreds of billions of parameters that natively processes text, audio, images, and video within a unified architecture. The model supports a 256k token context length, enough to handle up to 10 hours of audio or 400 seconds of high-definition video in real time. It uses a Hybrid Attention Mixture of Experts framework alongside a dynamic alignment technique called ARIA to produce highly stable and emotionally nuanced multilingual speech with minimal latency.
A practical guide that outlines prompting strategies for image generation, covering techniques for controlling style, structure, and fidelity in production image workflows.
Ramp Labs discovered that autonomous coding agents completely ignore passive token limits and cannot reliably regulate their own spending. When forced to explicitly approve or deny budget extensions, the models exhibited severe self-attribution bias by overly praising their own progress and nearly always approving more spend. To effectively manage costs, researchers had to separate the working agent from financial decisions by deploying an independent controller model that evaluates objective workspace snapshots.
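The separation of duties can be sketched roughly as follows. Ramp Labs used an independent controller model; the rule-based check below is only a stand-in for it, and the `Snapshot` fields and `approve_extension` function are hypothetical. What it illustrates is that the decision sees objective workspace state and never the working agent's own account of its progress.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Objective workspace state captured outside the working agent."""
    tests_passing: int
    tests_total: int
    spent_usd: float


def approve_extension(before: Snapshot, after: Snapshot,
                      max_total_usd: float) -> bool:
    """Approve more spend only if the hard cap holds and measurable
    progress was made since the last snapshot.

    Stand-in for the independent controller model: it receives no
    self-assessment from the working agent, only the two snapshots.
    """
    if after.spent_usd >= max_total_usd:
        return False  # hard cap, regardless of progress
    return after.tests_passing > before.tests_passing


if __name__ == "__main__":
    start = Snapshot(tests_passing=3, tests_total=10, spent_usd=2.0)
    later = Snapshot(tests_passing=6, tests_total=10, spent_usd=4.0)
    print(approve_extension(start, later, max_total_usd=10.0))  # True
```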
This study found that models with extended pre-saturation phases generalize well from minimal examples and tolerate noise, while rapidly saturating models fail. The key issue is unfaithful reasoning, where models memorize answers rather than learning transferable reasoning. Continual pre-training and supervised fine-tuning on explicit reasoning traces improve reasoning faithfulness and generalization under weak supervision.
If you're building AI applications, you need infrastructure that can actually handle the compute. Google uses Tensor Processing Units (TPUs) - custom-built hardware accelerators designed specifically for large-scale AI workloads. It's the exact same accelerator system that powers Gemini and serves billions of user requests across Search and Maps. Ready to learn how to leverage TPUs for your own training and inference workloads? Start the course →
Deep Neural Lesion (DNL) identifies highly sensitive parameters where flipping just a few bits can collapse model performance across vision and language tasks. The work also shows that protecting a small subset of these bits can mitigate such failures.
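To see why a single bit can matter so much, consider how one flip changes a 32-bit float weight. This is not the DNL method for finding sensitive parameters, just an illustration that assumes IEEE 754 float32 storage; `flip_bit` is a hypothetical helper.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of a float32."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles the chosen bit; reinterpret the result as float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


if __name__ == "__main__":
    weight = 0.5
    print(flip_bit(weight, 0))   # lowest mantissa bit: change of 2**-24
    print(flip_bit(weight, 31))  # sign bit: -0.5
    print(flip_bit(weight, 30))  # top exponent bit: 2**127
```

A flip in the low mantissa bits barely moves the weight, while a flip in the top exponent bit turns 0.5 into roughly 1.7e38, which is the kind of change that can swamp every downstream activation.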
CrabTrap is an open-source HTTP/HTTPS proxy that intercepts every request an AI agent makes and uses LLM-as-a-judge to determine if the request matches a policy of allowed traffic for that agent. Agents need real credentials, but can hallucinate destructive actions or get prompt-injected. This can have production consequences. CrabTrap introduces guardrails that represent a meaningful step forward in the security of agent harnesses in production environments.
Stitch's DESIGN.md lets users export or import design rules from project to project. Stitch understands the reasoning behind design systems and can generate user interfaces that match brands. Google has open sourced the draft specification for DESIGN.md, which can be used across any tool or platform. A video breaking down the format is available in the article.
OpenAI is working with several consulting firms to help sell its AI coding tool Codex to businesses. Codex now has four million weekly active users, up from three million just two weeks ago. The Codex consulting program is part of OpenAI's push to focus on coding and enterprise businesses. Consulting partners will get access to an AI coding tool as part of the program.
OpenAI CEO Sam Altman called out Anthropic's new cybersecurity model during a podcast appearance this week, saying the company was using fear to make its product sound more impressive than it actually is. Anthropic announced its Mythos model earlier this month and only released it to a small cohort of enterprise customers with the claim that the model was too powerful to be released to the public as cybercriminals would weaponize it. Altman said that Anthropic's fear-based marketing was a good way to keep AI in the hands of a small and exclusive elite. Fear-based marketing is prevalent in the AI industry, and it has also come from Altman himself.
Anthropic's "Conway" is an always-on agent with UI extensions available on web and mobile, allowing users to manage connectors, install extensions, and configure the environment.
New Cloud Security Alliance (CSA) research makes it clear: 47% of organizations have already experienced a security incident involving an AI agent. 53% report agents regularly exceeding intended permissions. And 87% of enterprises run two or more AI agent platforms. Every additional platform is another place where policy enforcement breaks down. Meanwhile, only 21% maintain a real-time inventory of what's actually deployed. AI agent adoption has outpaced visibility, ownership, and control. The Enterprise AI Security Starts With AI Agents report from Cloud Security Alliance and Zenity maps the real threat landscape and what a proactive security strategy actually requires. → Download the free report → See the key findings → Explore Zenity Labs' latest AI security research
Chronicle, available for ChatGPT Pro users on macOS, augments Codex by using screen context for memory building, helping Codex understand ongoing work with less context restatement. It stores unencrypted markdown memories on your device and requires macOS Screen Recording and Accessibility permissions. Be aware of prompt injection risks from screen content, and pause Chronicle during sensitive work to prevent unwanted context capture.
Kimi K2.6 features robust capabilities for coding and agentic tasks across chat and agent modes on kimi.com, with weights on Hugging Face and APIs via platform.moonshot.ai. The lineup includes K2.6 Instant for quick replies, K2.6 Thinking for complex reasoning, K2.6 Agent for document and web tasks, and K2.6 Agent Swarm for large-scale processing. Kimi K2.6 claims top open-source benchmark scores, surpassing competitors like GPT-5.4 and Claude Opus 4.6 in SWE-bench Multilingual and BrowseComp.
Qwen3.6-Max-Preview brings stronger world knowledge and instruction following along with significant agentic coding improvements across a wide range of benchmarks. The model is still under active development as researchers continue to iterate on it. Users can chat with the model interactively in Qwen Studio or call it through the Alibaba Cloud Model Studio API (coming soon).
Jeff Bezos' AI startup, which is aiming to develop models with the capability of understanding the physical world, is close to finalizing a $10 billion funding round. The company, code-named Project Prometheus, will use AI to accelerate engineering and manufacturing in fields like aerospace and automobiles. It was set up with an initial $6.2 billion in funding, sourced in part by Bezos himself. The new funding round, which is expected to close soon but has not been finalized, will include JPMorgan and BlackRock as investors.
Meta introduced Effective Training Time (ETT%) to measure how much end-to-end training runtime is spent on actual learning, highlighting overhead like checkpointing and failures. This post outlines system and PyTorch-level optimizations that reduce wasted time and improve large-scale training efficiency.
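As a rough sketch, the metric can be read as productive time divided by total end-to-end runtime. Meta's exact definition and overhead categories may differ; the function name and the example categories below are assumptions for illustration.

```python
def effective_training_time_pct(total_hours: float,
                                overhead_hours: dict[str, float]) -> float:
    """Percentage of end-to-end runtime spent on actual training.

    Assumed formula: 100 * (total - overhead) / total, where overhead
    sums categories such as checkpointing and failure recovery.
    """
    overhead = sum(overhead_hours.values())
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if overhead < 0 or overhead > total_hours:
        raise ValueError("overhead must be between 0 and total_hours")
    return 100.0 * (total_hours - overhead) / total_hours


if __name__ == "__main__":
    # Hypothetical numbers for a 100-hour run.
    overhead = {"checkpointing": 5.0, "failure_recovery": 10.0, "startup": 5.0}
    print(effective_training_time_pct(100.0, overhead))  # 80.0
```

Breaking overhead out by category makes it easier to see which optimization would recover the most runtime.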
AllenAI describes a post-training approach that builds independent domain experts and combines them using a mixture-of-experts architecture. This allows models to gain new capabilities without retraining from scratch or degrading existing skills.
Even uncensored models quietly nudge language away from the words that sentences actually want. There is no refusal or warning - the probability just moves in some instances. This is a mechanism that can be used to shape what billions of users read without them noticing.
AI systems fail when agents can't share state or recover from failures. Build multi-agent architectures with LangGraph for orchestration and AWS for durable messaging. Join the AWS technical workshop + read the guide.
Google's Gemini CLI now includes subagents to split coding tasks, enhancing parallel execution by delegating specific roles like frontend updates or testing. This enables multiple tasks to process simultaneously without interference, optimizing workflows for developers. Gemini's setup contrasts with systems like Claude Code, which extends agent coordination across multiple sessions.
Qwen3.5-Omni scales to hundreds of billions of parameters with a hybrid MoE architecture, supporting long-context multimodal inputs across text, audio, and video.
TIPSv2 improves vision-language pretraining by combining distillation, enhanced self-supervised objectives, and richer caption data. The resulting models achieve strong performance across multimodal tasks, with notable gains in zero-shot segmentation.