It looked like Saturday night would be very bad. Sentry's Head of AI, Indragie, was online when Seer, Sentry's AI debugger, started failing. The issue was an upstream infra outage on the provider's side - but there was no way to figure that out without digging into dashboards. So Indragie pulled up a new tool: Seer Agent. He told it what he was seeing and waited for the answer. It came in seconds. Seer fixed Seer. Read the blog to get the full story. Curious? See how it works in the live webinar on May 19th.
Grok Build is a coding agent that runs from the terminal. It is now in early beta for SuperGrok Heavy subscribers. AGENTS.md, plugins, hooks, skills, and MCP servers all work out of the box. Grok Build supports subagents for larger tasks, along with deep worktree integration, so users can launch subagents in their own worktrees. A headless mode makes it easy to run agents inside scripts and automations.
Cursor detailed a new system for configuring cloud-based development environments tailored to autonomous coding agents. It supports multi-repo setups, environment configuration as code, automated setup workflows, and governance controls for managing fleets of parallel agents.
Bloomberg reported that OpenAI explored legal options against Apple over dissatisfaction with how deeply ChatGPT was integrated into Apple's ecosystem and the limited subscriber growth that followed.
Anthropic outlines two possible 2028 global AI leadership scenarios: one where the US retains its compute advantage and shapes AI norms, and another where China competes closely due to policy inaction. The US currently leads due to strong export controls and advanced chip technology preventing China from keeping pace. Closing loopholes on compute access and restricting distillation attacks are crucial for maintaining the US lead and ensuring democracies shape AI governance.
There are two ways to sandbox an agent that can execute code: isolate the tool or isolate the agent. Agents should have nothing worth stealing and nothing worth preserving. Isolating the agent requires an extra network hop on every operation and more services to deploy, but there are no secrets to steal, no state to preserve, and agents can be killed, restarted, and scaled independently.
Production failures don't come from bad code. They come from correct code entering a system nobody fully modeled. AI code review tools see the diff. They don't see configurations, dependencies, user behavior, or infrastructure under load. AI code simulations offer a better approach to understanding production impact before code ships. Learn more → | Book a demo →
Codex has implemented hooks and programmatic tokens to make it easier to automate and customize code. Hooks can customize the Codex loop with scripts that run at key points in a task. Programmatic access provides scoped credentials for Business and Enterprise teams. A video showing how to create access tokens for Codex automations is available.
Raindrop Workshop gives Claude Code the ability to read traces, write evals against codebases, and fix what's broken. It provides livestreamed traces, coding-agent integration, a self-healing eval loop, and local replay. Raindrop Workshop is compatible with TypeScript, Python, Go, and Rust, and most popular SDKs, providers, and coding agents.
Genkit is a framework for building full-stack, AI-powered, and agentic applications for any platform. It supports TypeScript, Go, Dart, and Python. Genkit uses composable hooks that intercept generation calls to implement retries and fallbacks for reliability, human approval before destructive tool calls, and observability across every layer. Its middleware system runs a tool loop that repeats until the model is done. The Genkit Developer UI can be used to inspect, test, and debug applications and middleware execution.
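To make the retry-and-fallback idea concrete, here is a minimal sketch in plain Python. It does not use Genkit's API: `with_retries_and_fallback` and the three stand-in model functions are hypothetical names invented for illustration, and a real hook would wrap actual generation calls.

```python
def with_retries_and_fallback(primary, fallback, max_attempts=3):
    """Wrap a generation call: retry the primary model, then fall back."""
    def call(prompt):
        for _ in range(max_attempts):
            try:
                return primary(prompt)
            except RuntimeError:
                continue  # treat as transient and try the primary again
        return fallback(prompt)  # primary used up all of its attempts
    return call


# Hypothetical stand-ins for model calls (not real SDK functions).
attempts = {"count": 0}

def flaky_model(prompt):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated transient failure")
    return "primary: " + prompt

def broken_model(prompt):
    raise RuntimeError("simulated outage")

def backup_model(prompt):
    return "backup: " + prompt


# The flaky model succeeds on its third attempt; the broken one never does.
recovered = with_retries_and_fallback(flaky_model, backup_model)("hello")
degraded = with_retries_and_fallback(broken_model, backup_model)("hello")
```

Because the wrapper returns a function with the same signature as the call it wraps, wrappers of this kind can be stacked, which is roughly what "composable" means here.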
Asynchronous batching can reduce idle time between CPU and GPU cycles, improving GPU utilization for inference by 22%. By using CUDA streams and events, CPU tasks prepare batch N+1 during batch N's GPU computation, eliminating idle gaps. This method yields more efficient GPU operations without changing kernels or models, enhancing generation speed substantially.
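The overlap pattern can be sketched without a GPU. The example below is a simplified stand-in, not the article's implementation: a worker thread plays the role that CUDA streams and events play in the real system, and `prepare_batch` and `run_on_device` are hypothetical placeholders for CPU-side preparation and the GPU forward pass.

```python
from concurrent.futures import ThreadPoolExecutor


def prepare_batch(index):
    # Placeholder for CPU work such as tokenization, padding, and copies.
    return [index * 10 + i for i in range(4)]


def run_on_device(batch):
    # Placeholder for the GPU forward pass on an already prepared batch.
    return sum(batch)


def pipelined_inference(num_batches):
    """Prepare batch N+1 in the background while batch N is computed."""
    results = []
    if num_batches <= 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as cpu_worker:
        pending = cpu_worker.submit(prepare_batch, 0)
        for n in range(num_batches):
            batch = pending.result()  # block until batch N is ready
            if n + 1 < num_batches:
                # Start preparing batch N+1 before computing batch N.
                pending = cpu_worker.submit(prepare_batch, n + 1)
            results.append(run_on_device(batch))
    return results


outputs = pipelined_inference(3)
```

The key design point is the ordering inside the loop: the next preparation is submitted before the current compute step starts, so the two can overlap instead of alternating.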
Microsoft signed a deal with OpenAI in late April that amended the company's exclusive license to OpenAI models, freed OpenAI to sell on any other cloud, and removed the AGI clause that would have triggered changes to Microsoft's IP rights once OpenAI's board declared the threshold reached. Microsoft will keep its IP license through 2032, along with a 27% stake worth roughly $135 billion. Microsoft is reportedly looking to purchase Inception, a company that builds diffusion-based language models. It is interesting that Microsoft would spend $13 billion on a partner and then immediately start a shadow procurement process for a replacement.
SpaceXAI is reportedly losing top talent across coding, world models, and Grok voice. Rivals like Meta and Thinking Machines Lab are scooping up former staff. Elon Musk's culture of extreme work has led some staff to leave. Several of the exits could have been driven by a desire to cash out.
Nvidia has announced a partnership with Ineffable Intelligence, a startup pursuing superintelligence that was founded in late 2025 by UCL professor and former lead of DeepMind's reinforcement learning team, David Silver.
TLDR's Applied AI team is tasked with making every process at TLDR legible to code, runnable by anyone, and composable into larger workflows. Join a small, fast-moving team using the latest AI tools with an unlimited token budget. Learn more.
Unwrap is the AI-powered customer intelligence platform trusted at scale by Stripe, Southwest Airlines, Perplexity, and other customer-centric companies. With Unwrap, you get: all customer feedback automatically categorized by AI + NLP; structured feedback you can query using Unwrap Assistant, or in your favorite tools using Unwrap's MCP; real-time alerts for new feedback that demands your attention; and a clear view of customer sentiment. 👉 Unwrap is offering a free trial of its tools to TLDR subscribers. Grab time with the team to get set up.
Cerebras raised $5.5 billion in its US IPO at a market valuation of about $40 billion. This was the largest IPO this year so far. The IPO drew orders for more than 20 times the number of shares available. The offering was led by Morgan Stanley, Citigroup, Barclays, and UBS.
Anthropic launched Claude for Small Business, a package of connectors and workflows that embeds Claude into tools like QuickBooks, PayPal, HubSpot, Google Workspace, and Microsoft 365.
Anthropic has passed OpenAI in business adoption. More businesses used Anthropic than OpenAI in April. Anthropic has quadrupled business adoption over the last year, while OpenAI grew business adoption by only 0.3%. The pace of development in the AI industry is overriding the typical forces of vendor stickiness.
Recursive Superintelligence has raised more than $650 million at a valuation of more than $4 billion to build AI that can improve itself with little or no help from human developers. Its seven co-founders include notable researchers from many of the industry's leading AI companies. Many of these researchers specialize in AI systems that can run for long periods in pursuit of goals. While the researchers are bullish on the idea of AIs recursively improving themselves, the current technology is a long way from the point where humans can be removed from the loop.
OpenAI detailed the engineering behind Codex's Windows sandbox, which constrained local commands, file access, and networking permissions while still allowing coding agents to operate effectively on developer machines.
Vercel analyzed seven months of AI Gateway production traffic spanning hundreds of models and over 200,000 teams. The report showed rapid growth in agentic workloads, increasing adoption of open-source models, and heavy multi-model routing in large-scale deployments.
Superstar researchers at frontier labs can earn over a hundred times more than the average AI postdoc. Researcher quantity doesn't easily make up for quality in the field of AI. Even a 2x researcher can earn far more than the median because their contributions easily scale to billions of users. If they can add something that multiple 1x researchers can't, then it's worth paying a lot to capture it.
Agents are a mess to manage. With Agent Foundry, you get a single place to govern agents like OpenClaw and Claude Code. Real enforcement at the agent level, deployed in your own environment or SentinelOne's for zero lock-in. Powered by Prompt Security from SentinelOne. Join the waitlist
@cline/sdk is an open-source framework for building agentic applications. It has a plugin architecture that makes it easy to customize, and the framework has all the features expected from agents, like checkpoints, web fetch, MCPs, cron jobs, subagents, and more. The SDK can be used to run agents from CI/CD pipelines, create automations for end-to-end workflows, or embed agents directly inside products.
Perplexity outlined the security systems powering its autonomous Computer agent, including Firecracker microVM isolation, scoped connector permissions, and prompt injection defenses.
PyTorch 2.12 shipped major infrastructure updates including faster CUDA eigendecomposition, a unified graph capture API, MX quantization export support, and fused Adagrad optimizers.
Microsoft's MDASH AI system uses more than 100 specialized AI agents to work together across multiple AI models to find real-world software vulnerabilities. A set of agents scans code for potential vulnerabilities, and then a separate group of agents debates whether each finding is real and exploitable. A final stage constructs proof-of-concept attacks to confirm the bugs exist. MDASH surpassed Anthropic's Mythos model on the CyberGym benchmark, a test that measures how well AI systems can reproduce real-world vulnerabilities.
Krishna Rao, Anthropic's CFO, joined the company two years ago when run-rate revenue was about $250 million. It is now $30 billion. Rao helped raise around $75 billion and is responsible for the procurement and allocation of compute. This post links to an interview with Rao where he discusses compute, raising funds, pricing dynamics, how Anthropic's finance team uses Claude, Mythos, biotech and healthcare, and much more.
Your usage changes fast - your spend doesn't have to. Archera offers insured cloud commitments on AWS / Azure / GCP; you get reservation savings without the downside. Start with $0 platform fees
AI security is on everyone's mind - but few have a roadmap they can stand behind. The SANS AI Security Maturity Model helps you assess your current stage and progress with confidence using a 5-stage framework with defined controls, metrics, and actions: ✅ Mapped to NIST AI RMF, EU AI Act, ISO 42001, and OWASP ✅ Evidence-based scoring models across Protect, Govern, and Utilize ✅ Step-by-step guidance your team can immediately apply. Download the SANS AI Security Maturity Model eBook by Chris Cochran, Field CISO and VP of AI Security. Browse more SANS AI resources for ways to build, break, and defend AI in production.
Google and SpaceX were reportedly discussing orbital data centers as part of broader efforts to expand AI compute infrastructure beyond Earth-based facilities.
Meta's Muse Spark foundational model is now powering Meta AI across the company's services. The model enables faster voice responses, smarter shopping assistance, and real-time visual recognition through device cameras. The initial rollout targets users in the US and Canada.
Fast mode for Claude Opus 4.7 is now available in research preview in the API and Claude Code, and on Cursor, Emergent, Factory, v0, Warp, and Windsurf. Fast mode is currently opt-in, but it will eventually become the default. A link to join the waitlist for fast mode is available.
Inference workloads are more variable and less predictable than training workloads. This makes them a natural fit for serverless computing. However, serverless computing only works if new replicas can be spun up as fast as demand changes. This article looks at how Modal took AI inference server scaling from multiple kiloseconds to just tens of seconds.
The AI infrastructure boom has driven up demand for analog and power semiconductors, with multilayer ceramic capacitors benefiting in particular after a period of oversupply and competition. Companies like Texas Instruments and NXP Semiconductors are avoiding capacity expansion, focusing instead on raising prices and improving profitability. The semiconductor supply chain that previously supported the EV and solar industries is now being leveraged for AI-related demand growth.
Parameter Golf attracted over 1,000 participants and 2,000 submissions focused on minimizing loss on a dataset within strict constraints. Participants leveraged a range of techniques, including careful tuning, quantization, and novel modeling ideas, with AI coding agents playing a significant role. This challenge revealed new talent and highlighted the evolving role of AI agents in research competitions.
With the ability to publish hundreds of CMS pages in a single click, operate at a global scale with seamless localization, and even host unified content across multiple domains, teams have never been able to ship faster. Trusted by companies like Miro, Bilt, and Perplexity. Launch your site today.
Researchers derived compression-aware neural scaling laws by training nearly 1,300 models, revealing how bytes per token affect compute allocation. This challenges the heuristic of training on 20 tokens per parameter, showing that the ratio is an artifact of specific tokenizers. The study suggests scaling should be measured in bytes, not tokens, for better compute efficiency across diverse languages.
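A small arithmetic sketch shows why a tokens-per-parameter rule depends on the tokenizer. Everything numeric here is an illustrative assumption, not a figure from the study: it supposes the 20 tokens per parameter heuristic was calibrated on a tokenizer averaging 4 bytes per token, and that the data budget stays fixed when expressed in bytes per parameter.

```python
def tokens_per_parameter(bytes_per_parameter, bytes_per_token):
    """Convert a byte-based data budget into a token-based one."""
    return bytes_per_parameter / bytes_per_token


# Assumption for illustration: the familiar heuristic of 20 tokens per
# parameter came from a tokenizer averaging 4 bytes per token.
REFERENCE_BYTES_PER_TOKEN = 4.0
BYTES_PER_PARAMETER = 20 * REFERENCE_BYTES_PER_TOKEN

# Hypothetical tokenizers with different compression rates.
reference_ratio = tokens_per_parameter(BYTES_PER_PARAMETER, 4.0)
low_compression_ratio = tokens_per_parameter(BYTES_PER_PARAMETER, 2.0)
high_compression_ratio = tokens_per_parameter(BYTES_PER_PARAMETER, 8.0)
```

Under these assumptions the same byte budget corresponds to twice as many tokens per parameter for a tokenizer that compresses half as well, and half as many for one that compresses twice as well, so a fixed token ratio cannot hold across tokenizers.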
The article discusses using reinforcement learning to fine-tune 4B models as recursive language models (RLMs) for production, achieving efficient task-specific behavior at a lower cost. By training a shared policy for both parent and child RLMs, this approach maintains task performance and reduces the need for multiple models. In tests, this method matches the performance of larger models like Claude Sonnet 4.6 at a significantly smaller size and lower cost.