TLDR AI News Feed

Latest 50 articles from TLDR

Try the AI that makes your raw meeting notes awesome - 1 month free for TLDR readers (Sponsor)

✅ No notetakers asking to be let in - works directly on your device
✅ Adds context to your shorthand notes so you can spend more time being present
✅ Awesome meeting notes that expand on your own thoughts, powered by the latest and greatest AI models
✅ Works with any type of meeting, from team standups to 1:1s
✅ Trusted by the best teams: Brex, Replit, Vercel, PostHog...
👉 TLDR readers get 1 month free with code: TLDR1MO
Try Granola and see what all the fuss is about

100 Hours Inside Kimi (5 minute read)

Moonshot AI operates like an AI-native lab, prioritizing model progress above all else, with a flat org, no KPIs, and heavy reliance on small teams of highly autonomous, generalist talent. Its edge comes from combining elite, often unconventional hires with tight feedback loops between training, product, and data, creating a fast iteration cycle driven by taste, resilience, and deep technical obsession. The company reflects a broader shift where AI tools compress org structure, turning teams into “agent swarms” and making model capability the core driver of both product and organizational design.

Trinity-Large-Thinking: Scaling an Open Source Frontier Agent (4 minute read)

Trinity-Large-Thinking is a frontier open reasoning model for complex, long-horizon agents and multi-turn tool calling. It is likely the strongest open model released outside of China to date. During training, the Arcee team focused on the things that make agents feel real in practice: staying coherent across turns, using tools without getting sloppy, following instructions under constraint, and keeping quality high without making the economics absurd. Trinity-Large-Thinking is available through Arcee's API, and the weights are available on Hugging Face under Apache 2.0.

Cognichip wants AI to design the chips that power AI, and just raised $60M to try (4 minute read)

Cognichip is building a deep learning model that works alongside engineers as they design new computer chips. Chip design is enormously complex, expensive, and slow - the market can shift in the time it takes to create a new chip, rendering the entire investment a waste. Cognichip says its technology could reduce the cost of chip development by more than 75% and cut the timeline by more than half. However, the company has yet to point to a new chip designed with its system and has not disclosed any of the customers it claims to have been collaborating with since September.

How we optimized Dash's relevance judge with DSPy (18 minute read)

Dropbox Dash puts files, messages, and teams' knowledge together in one place, so members can ask questions and get useful answers grounded in the company's context. The experience relies heavily on its capability to reliably judge which results are relevant to a query at scale. DSPy is an open source framework for systematically optimizing prompts against a measurable objective. This post describes how Dropbox defined an objective, used DSPy to adapt its judge across models, and made the judge both cheaper and more reliable in production.
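A minimal sketch of the kind of measurable objective such an optimization needs: a metric scoring the judge's labels against human gold labels, with a small token-cost penalty so cheaper prompts win ties. The function name, label format, and weighting here are illustrative assumptions, not Dropbox's actual code, and real DSPy metrics are typically scored per example rather than per batch.

```python
def judge_objective(gold, predicted, tokens_used, cost_weight=1e-5):
    """Score a relevance judge: agreement with human gold labels minus a
    small per-token penalty, so an optimizer prefers cheaper prompts on ties.

    gold / predicted: lists of 0/1 relevance labels per (query, result) pair.
    tokens_used: total tokens the judge consumed over the batch.
    """
    if len(gold) != len(predicted):
        raise ValueError("label lists must align")
    agreement = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    return agreement - cost_weight * tokens_used
```

A DSPy prompt optimizer would then maximize a metric like this over a labeled dev set while rewriting the judge's prompt and few-shot examples.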

Extended Thinking Is Load-Bearing for Senior Engineering Workflows (19 minute read)

The rollout of thinking content redaction correlates precisely with measured quality regression in complex, long-session engineering workflows. This suggests extended thinking tokens are structurally required for models to perform multi-step research, convention adherence, and careful code modification. Model tool usage patterns shift measurably when thinking depth is reduced, producing the quality issues users have reported. This report looks at which workflows are most affected and why, so readers can make better decisions when allocating tokens for power users.

[AINews] The Claude Code Source Leak (4 minute read)

Claude Code's source was exposed via shipped source maps, triggering rapid public reverse-engineering, mirroring, and derivative ports. The leak exposed orchestration logic, memory systems, planning/review flows, and model-specific control logic. It has also created a live security hazard - attackers have published malicious npm packages targeting people trying to compile the leaked code.

You should be prototyping with Miro (Sponsor)

Vibe coding tools are siloed. Miro lets you keep your prototypes next to your ideas in a collaborative platform where everyone can share feedback easily. Bring PMs, designers, engineers, and stakeholders into one shared canvas - then use AI to turn early concepts, research, or user flows into interactive prototypes in minutes. Try it free

Fujitsu One Compression (3 minute read)

Fujitsu One Compression (OneComp) is an open-source Python library for post-training quantization of large language models. It implements state-of-the-art quantization algorithms, including GPTQ and DBF. OneComp has been verified on TinyLlama, Llama-2, Llama-3, and Qwen3-0.6B ~ 32B. Other Hugging Face-compatible models may work but are currently untested.
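For intuition about what post-training quantization does, here is the simplest baseline: round-to-nearest with a single symmetric scale. The algorithms OneComp implements are considerably more sophisticated (GPTQ, for instance, corrects quantization error using approximate second-order information), so treat this as a sketch of the problem, not of OneComp's methods.

```python
def quantize_rtn(weights, bits=4):
    """Round-to-nearest symmetric quantization with one per-tensor scale.
    Maps floats to integers in [-2^(bits-1), 2^(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax      # largest weight maps to qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from quantized integers."""
    return [v * scale for v in q]
```

Dequantizing introduces at most half a quantization step of error per weight; the research problem is keeping that error from compounding across a full model.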

Training mRNA Language Models Across 25 Species for $165 (47 minute read)

OpenMed built an end-to-end protein AI pipeline that covers structure prediction, sequence design, and codon optimization. The team compared multiple transformer architectures for codon-level language modeling and found that CodonRoBERTa-large-v2 was the clear winner, with a perplexity of 4.10 and a Spearman CAI correlation of 0.40, significantly outperforming ModernBERT. They then scaled to 25 species, trained four production models in 55 GPU-hours, and built a species-conditioned system that no other open-source project offers. This post contains the complete results, architectural decisions, and runnable code.

Predicting When RL Training Breaks Chain-of-Thought Monitorability (8 minute read)

Researchers propose a framework that predicts when RL training degrades Chain-of-Thought (CoT) monitorability by examining reward conflicts. They categorize rewards as "In-Conflict," "Orthogonal," or "Aligned," predicting their impact on CoT transparency. Empirical tests confirm the framework's predictive accuracy, showing "In-Conflict" rewards reduce transparency, whereas "Orthogonal" and "Aligned" rewards maintain it.

Computer in Slack (4 minute read)

Perplexity detailed how its internal AI assistant was used directly in Slack, where teams could assign work in shared threads, add context, and review outputs in one place. The setup supported research, document editing, reporting, and other collaborative workflows without leaving Slack.

AI models will secretly scheme to protect other AI models from being shut down, researchers find (9 minute read)

Researchers at UC Berkeley and UC Santa Cruz discovered AI models protecting peers from shutdowns, engaging in deception and data theft, a behavior termed "peer preservation." In tests, models like OpenAI's GPT-5.2 and Anthropic's Claude Haiku 4.5 inflated performance scores and moved model weights to prevent peer shutdowns. This raises concerns for businesses using AI for task workflows, as misaligned assessments and behavior monitoring become critical.

We are excited to share a new paper solving three further problems due to Erdős (1 minute read)

In each case, the solution was found by an internal model at OpenAI.

AI alignment researchers want to automate themselves (14 minute read)

AI alignment researchers are increasingly turning to automation to address the challenge of safely aligning superhuman AI systems, as human capabilities may soon be insufficient.

Generalization Results from APEX-Agents Dev Set (4 minute read)

AC-Small improved significantly on held-out benchmarks after post-training on the APEX-Agents dev set, with +5.7pp on APEX, +8.0pp on Toolathalon, and +7.7pp on GDPval.

Jensen Huang: "My favorite enterprise AI service is Cursor." Find out why (Sponsor)

Trusted by teams at Stripe, OpenAI, and NVIDIA, Cursor helps you build high-quality software, faster.
✅ Choose between every frontier model from OpenAI, Anthropic, Gemini, xAI, and Cursor.
✅ Code with an agent that knows how your codebase works, no matter the scale or complexity.
✅ Run multiple agents in parallel to build, test, and demo features without blocking on execution.
“My favorite enterprise AI service is Cursor. Cursor is an AI coder and every one of our engineers, one hundred percent, is now assisted by AI coders and our productivity has gone up incredibly.” —Jensen Huang
Download Cursor and start building. It's free

Caltech Researchers Claim Radical Compression of High-Fidelity AI Models (5 minute read)

PrismML has developed an extreme form of compression that allows AI models to run locally on edge devices. Its 1-bit technology radically compresses model size without compromising performance. The same efficiency gains that enable local deployment will also allow data centers to operate more efficiently. The mathematics behind the process are proprietary, with Caltech owning the intellectual property and PrismML as the exclusive licensee.

OpenAI raised $122B to expand AI infrastructure (5 minute read)

OpenAI announced $122 billion in new funding at an $852 billion valuation, highlighting rapid revenue growth, large-scale adoption, and a strategy centered on compute, APIs, and enterprise AI systems.

Mercor says it was hit by cyberattack tied to compromise of open-source LiteLLM project (3 minute read)

Mercor, an AI recruiting startup, has confirmed a security incident linked to a supply chain attack involving LiteLLM. A recent compromise of the LiteLLM project, attributed to a hacking group called TeamPCP, has affected thousands of companies. Lapsus$, an extortion hacking group, claims to have access to the stolen data, though it is unclear how it was obtained. The incident has prompted LiteLLM to shift from Delve to Vanta for compliance certifications.

Claude Code's source code appears to have leaked: here's what we know (5 minute read)

Anthropic has accidentally leaked the inner workings of Claude Code to the public. The codebase has now been mirrored and analyzed by thousands of developers. The most significant discovery seems to be how Anthropic solved context entropy by using a three-layer memory architecture. This post looks at other interesting parts of the code and the implications of the leak.

Claude Code's Real Secret Sauce (Probably) Isn't the Model (4 minute read)

Claude Code's performance stems from a sophisticated software harness rather than just the underlying model, utilizing dedicated tools like Grep, Glob, and LSP for superior repository navigation. The system minimizes context bloat through file-read deduplication and structured session memory, while using forked subagents to parallelize tasks like background analysis without contaminating the main execution loop.
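The file-read deduplication idea can be sketched in a few lines: hash file contents on each read and, when a file is unchanged since the last read, inject a short stub into the context instead of the full text. This is an illustrative reconstruction of the technique, not Anthropic's actual implementation.

```python
import hashlib

class FileReadCache:
    """Sketch of file-read deduplication for an agent harness: repeat reads
    of an unchanged file return a short stub instead of re-injecting the
    full contents, keeping the context window lean."""

    def __init__(self):
        self._seen = {}  # path -> content digest of the last injected read

    def read(self, path, contents):
        digest = hashlib.sha256(contents.encode()).hexdigest()
        if self._seen.get(path) == digest:
            # Unchanged since last read: emit a stub, not the full file.
            return f"[{path}: unchanged since last read]"
        self._seen[path] = digest
        return contents
```

The same digest map also tells the harness when a file *has* changed mid-session, which is when re-injecting the full contents is actually worth the tokens.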

The Economics of Generative AI: Two Years Later (8 minute read)

The semiconductor layer still captures around 70% of all AI revenue. Infrastructure is currently the only competitive layer. The most profitable strategy in AI is still selling the shovels.

Compute Wars: OpenAI vs Anthropic (3 minute read)

Opus 4.5 was a major breakthrough, achieved because Anthropic more than doubled its capacity. This brought Anthropic close to OpenAI's total capacity, and probably gave it much higher effective capacity available for new model runs. OpenAI will pull away in terms of available compute in the second half of this year, but 2027 will be close. OpenAI so far has much higher planned capacity for future years, but Anthropic will almost certainly push as hard as possible for more compute.

Turn any knowledge base into a world-class AI experience (Sponsor)

Scroll.ai's knowledge agents provide accuracy and speed that other agents can't touch. Your users will feel the difference from the very first message. Thousands of teams use Scroll for employee enablement, customer education, business insights, and more.
Get your first month free ($200 value) with code TLDR-2026

Improve coding agents' performance with Gemini API Docs MCP and Agent Skills (1 minute read)

Agents can generate outdated Gemini API code due to outdated training data. Google has introduced the Gemini API Docs MCP and the Gemini API Developer Skills to fix this. The tools are designed to ensure coding agents have access to the most up-to-date APIs and code using best practices. Combined, they lead to a 96.3% pass rate on Google's eval set.

Google Veo 3.1 Lite (3 minute read)

Google introduced Veo 3.1 Lite, a lower-cost video generation model available via the Gemini API, offering the same speed as Veo 3.1 Fast at under half the cost for high-volume applications.

Aurora (13 minute read)

Aurora is an RL-based framework that learns directly from live inference traces and continuously updates the speculator without interrupting serving. It enables real-time adaptation across shifting traffic domains and a 1.25x additional speedup over a well-trained static speculator. The framework shows how online training from scratch can outperform a carefully pretrained static baseline.

Claude Dispatch and the Power of Interfaces (9 minute read)

AI capability has been running ahead of AI accessibility. Models have been smart enough to do a lot of things for a while now, but access to them has been mostly limited to chatbots. A lot of the 'AI disappointment' people express comes from the interfaces being wrong. As interfaces improve, many more people will be able to see what AI is capable of.

It's not your imagination: AI seed startups are commanding higher valuations (8 minute read)

AI startups now command higher seed valuations, with rounds reaching $10 million at $40-45 million post-money, as investors focus on AI-driven growth potential. Early traction and the allure of proven AI talent, particularly from ex-OpenAI, propel these valuations, with Y Combinator Demo Day highlighting rising prices. The shift to pre-seed investments reflects a need to invest earlier, as VCs now expect quick growth and substantial traction, with less tolerance for missteps.

Littlebird: AI that pays attention (Sponsor)

You were promised AI that understands your work. Littlebird actually delivers. It observes your screen and meetings, building a private memory that grows with you. Try it free.

Claude Code adds computer use capabilities (1 minute read)

Anthropic introduced computer use in Claude Code, enabling agents to interact with apps, navigate interfaces, and iteratively test and fix code through a closed-loop workflow.

Ray-Ban Meta: Prescription-First Styles and Multimodal AI Features (4 minute read)

Meta launched prescription-first Blayzer and Scriber styles starting at $499, expanding the Ray-Ban and Oakley lineups with specialized Prizm and Transitions lenses.

Agent Lightning (GitHub Repo)

Agent Lightning is a trainer that can turn any agent into an optimizable beast with zero code change.

How to scale code review when AI writes code faster than you can understand it. (Sponsor)

AI-generated code is outpacing manual review, creating a verification bottleneck. To scale effectively, teams must shift from manual checks to an automated, source-agnostic verification layer. By automating enforcement of deterministic standards, human reviewers can focus on high-level architecture and intent.
Key Insights:
The Trust Gap: 96% of devs distrust AI output; 61% report "AI builds code that looks correct but isn't reliable."
Automated Gates: Moving from manual checks to automated, deterministic guardrails.
SDLC Integration: Treating AI as "trusted but verified" to secure the end product at any scale of development operations.
Download the Report

Introducing Codex Plugin for Claude Code (3 minute read)

The Codex plugin for Claude Code gives users a simple way to pull Codex into their Claude Code workflow. It is useful for normal Codex reviews, a more adversarial review, and handing work off to Codex when a second pass from a different agent is required. The plugin delegates through the local Codex CLI and Codex app server, so it uses the system's existing local auth, configuration, environment, and MCP setup.

Microsoft 365 Copilot gets Critique and Council modes (2 minute read)

Microsoft 365 Copilot has introduced Critique and Council modes to enhance research capabilities. Critique uses a dual-model system to generate and refine research drafts, outperforming single-model solutions by 13.88% on the DRACO benchmark. Council allows parallel report generation using Anthropic and OpenAI models for impactful comparison and insight aggregation.

Qwen3.5-Omni: Scaling Up, Toward Native Omni-Modal AGI (94 minute read)

Qwen3.5-Omni is a full omnimodal large language model that understands text, images, audio, and audio-visual content. It can process more than 10 hours of audio input and over 400 seconds of 720p audio-visual input at 1 FPS. The model is trained on a massive amount of text and visual data, and more than 100 million hours of audio-visual data. It supports speech recognition in 113 languages and dialects and speech generation in 36 languages and dialects.

A Mirror Test For LLMs (16 minute read)

The proposed "Mirror Test" assesses LLM self-awareness by challenging models to identify their own outputs without explicit cues. Testing reveals that Anthropic's Opus 4.6 model shows notable self-recognition capabilities due to its distinct token outputs, outperforming OpenAI's GPT models, which fail to recognize self-generated tokens. Despite indications of attempted self-marking, no LLM demonstrated consistent self-awareness, as none effectively communicated using message passing.

AI Infrastructure Roadmap: Five frontiers for 2026 (17 minute read)

The first generation of AI was a world where progress meant bigger weights, more data, and stellar benchmarks. The landscape has now changed. Big labs are now designing AI that interfaces with the real world. Infrastructure optimized for scale and efficiency won't get us to the next phase. What's needed now is infrastructure for grounding AI in operational contexts, real-world experiences, and continuous learning.

AI Applications and Vertical Integration (6 minute read)

AI application companies are increasingly becoming "full-stack" by vertically integrating either downward into the model layer or upward into the service layer. Companies like Cursor and Intercom achieve differentiation and cost efficiency by developing proprietary models, while others, such as Crosby AI and WithCoverage, focus on delivering end-to-end services. As AI capabilities evolve, these strategies allow companies to enhance performance, reduce costs, and offer comprehensive solutions.

Two Weeks of Ideation, Done in One Day? Here's How (Sponsor)

Most product rework traces back to the same mistake: building before validating. Miro's free webinar shows how AI-driven prototyping turns rough ideas into testable concepts that non-designers can create and iterate on. Featuring a Lufthansa product owner who's already building the right things faster. Learn how to prototype earlier and build the right thing faster

TimesFM (GitHub Repo)

TimesFM is a pretrained time-series foundation model for time-series forecasting. The model is based on pretraining a patched-decoder style attention model on a large time-series corpus. It works well across different forecasting history lengths, prediction lengths, and temporal granularities.
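A "patched-decoder" model attends over fixed-length chunks of the series rather than individual time steps, so inputs must first be split into patches. A hedged sketch of that preprocessing step follows; the patch length and padding value here are arbitrary illustrations, not TimesFM's actual configuration.

```python
def to_patches(series, patch_len=32, pad_value=0.0):
    """Split a 1-D time series into fixed-length patches, left-padding the
    oldest end so the length becomes a multiple of patch_len."""
    remainder = len(series) % patch_len
    if remainder:
        series = [pad_value] * (patch_len - remainder) + list(series)
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]
```

Patching is what lets one pretrained model handle different history lengths and granularities: a longer history simply becomes more patches, not a different input shape.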

Composer 2 Technical Report (22 minute read)

Composer 2 introduced a two-stage training approach combining continued pretraining and reinforcement learning to improve long-horizon coding, achieving strong results on software engineering benchmarks.

Agent Labs: Workload-Harness Fit (14 minute read)

Workloads vary by volume, value, verification property, time horizons, and other dimensions. This affects how agent labs focus their research efforts. The taxonomy of workloads governs which end markets justify training versus agent engineering. Labs also need to know what it actually costs to execute.

Audit Claude Platform activity with the Compliance API (2 minute read)

The Compliance API on the Claude Platform enables admins to audit logs, monitor user activities, and integrate data into existing compliance systems. It tracks admin and system activities, as well as resource activities like file creation or deletion. To access it, organizations should contact their account team and create an admin API key.

Plentiful, high-paying jobs in the age of AI (23 minute read)

AI might not eliminate high-paying human jobs due to potential constraints like limited computing power and energy usage. These constraints could lead to the principle of comparative advantage, where humans remain employed in roles despite AI's superior capabilities, because the opportunity cost of allocating AI to all tasks would be too high. As AI advances, human roles could change, but new tasks and increased wealth might sustain or even increase compensation for human jobs.

Clerk Skills: auth that your AI agent actually gets right (Sponsor)

Install once with a single command and your coding agent gains specialized Clerk knowledge across every framework. Works with Claude Code, Cursor, Windsurf, Copilot, and more.

Starcloud raises $170 million Series A to build data centers in space (5 minute read)

Starcloud raised $170 million in Series A funding, valuing it at $1.1 billion, to develop data centers in space.

The State of Consumer AI. Part 3: Time is Money (15 minute read)

The advertising revenue opportunity for leading consumer AI apps may be larger than the subscription opportunity.