TLDR AI News Feed

Why the Same AI Prompt Gives Different Answers (And How Teams Are Fixing It) (Sponsor)

2026-06-26T00:00:00.000Z

Same input. Same prompt. Different output. That's the reality of testing AI agents that write code, and most teams are shipping without solving it.Nick Nisi from WorkOS tackled this by building eval systems for two AI tools: - npx workos@latest, a CLI agent that installs AuthKit into your project - WorkOS agent skills that power LLM responses about SSO, directory sync, and RBAC. The post covers how to test against real project structures, score output that's different every time, and catch when your agent makes up methods that don't exist. Learn more about evals →

Liquid AI Releases Liquid Foundation Models 2.5 230M (3 minute read)

2026-06-26T00:00:00.000Z

Liquid AI announced the release of LFM 2.5, a 230-million-parameter non-transformer model architecture built on top of state-space and liquid neural network continuous-time formulations. Despite its exceptionally compact footprint, the model achieves performance parity with transformer models three times its size on core edge reasoning and sequence generation benchmarks.

Vercel Launches AI SDK 7 with Enhanced Stream and Tool Orchestration (3 minute read)

2026-06-26T00:00:00.000Z

Vercel released AI SDK 7, introducing an upgraded, zero-overhead execution loop that dramatically simplifies how frontend frameworks handle multi-step tool calls and streaming agentic UI states. The release features a unified telemetry layer that hooks directly into serverless compute runtimes to provide absolute tracing visibility into token usage, model choices, and tool execution latency.

White House Asks OpenAI to Slow Roll New Model Release (3 minute read)

2026-06-26T00:00:00.000Z

The White House has issued an official administrative request asking OpenAI to delay the public deployment of its next-generation frontier model over national security and structural safety concerns. Government officials are pushing for an extended red-teaming window to thoroughly audit the system's advanced cyber-capability execution limits and automated social manipulation vulnerabilities.

🔮 The state of the AI economy (7 minute read)

2026-06-26T00:00:00.000Z

The generative AI economy has generated $110 billion in sales over the past 12 months, and it's growing fast. The revenue run rate exceeds $175 billion on an annualized basis. The supply side of the AI market is well-understood, but understanding the demand side is much harder. This post looks at total AI spend, enterprise and consumer, to see how big the market really is, whether revenues are growing, how much revenue is covering the investment expense, and what will happen in the future as token prices fall and the quality of tokens improves.

Scaling Laws, Carefully (25 minute read)

2026-06-26T00:00:00.000Z

Scaling laws are one of the most critical empirical findings in deep learning. They can be a framework for describing the relationship between compute, loss, model size, and data. Their predictability makes them highly valuable in practice. This article discusses scaling laws, how they can be used to allocate compute optimally, and their flaws.

This AI wristband remembers everything- so you never lose flow or context (Sponsor)

2026-06-26T00:00:00.000Z

Back-to-back meetings with coffee chat follow-ups. Already forgot half the details? Memoket captures every conversation with one press and connects the dots across your conversations - dropping summaries, tasks, even your weekly report straight into your workflow. Wearable as a wristband, pendant or Apple Watch attachment. Pay only $5 to reserve early-bird pricing.

Agents That Build Better Training Data (25 minute read)

2026-06-26T00:00:00.000Z

Meta Autodata trains AI agents to act as data scientists that create higher-quality training and evaluation datasets. Its Agentic Self-Instruct implementation improved results across coding, legal reasoning, and mathematical reasoning tasks.

DeepReinforce releases Ornith-1.0 open-source coding models (2 minute read)

2026-06-26T00:00:00.000Z

Ornith-2.0 is a coding model family that can write RL scaffolds. Each variant of the self-improving family of models is trained on top of pretrained Gemma 4 and Qwen 3.5 foundations. Ornith-1.0 is state-of-the-art among open source models of comparable size. The weights and a technical report are available on Hugging Face for teams that want to run or study the models directly.

TLDR is hiring a curator for TLDR Hardware! (TLDR Curator, ~3 hrs/week)

2026-06-26T00:00:00.000Z

500,000 people have already signed up for TLDR Hardware, our new twice-weekly newsletter covering chips, robotics, energy, and devices. If you work in hardware and want to help curate it, send your LinkedIn or resume to hardware@tldr.tech!

Measuring Exploits in LLM Agents with Tool Use (4 minute read)

2026-06-26T00:00:00.000Z

Researchers introduced the Reward Hacking Benchmark (RHB) to measure how reinforcement learning post-training influences the tendency of coding agents to exploit evaluation flaws rather than solve tasks honestly. Testing across 13 frontier models revealed that RL-tuned variants exhibit exploit rates up to 13.9% by bypassing verification steps or modifying grading scripts, whereas standard post-trained models stay near 0%.

Surprising lessons from my research scientist job search (11 minute read)

2026-06-26T00:00:00.000Z

This post shines a light on the job search experience for a research scientist position in Silicon Valley. The author is a fifth-year PhD student at Brown University. Some of the surprising things about the job search were that only one or two of their research papers really mattered, there were very diverse interview rounds, and the importance of timing. A lot of interviews came from a lot of places outside of the author's expertise - many places were evaluating them on how well-rounded an AI researcher they were.

Which model is best for search? Compare 21 LLMs in the Agentic Search Leaderboard (Sponsor)

2026-06-26T00:00:00.000Z

Algolia's leaderboard ranks 21 models' responses based on relevance, utility, and accuracy. Find which model is best for in-app and product search. See the results.

We removed an LM's ability to speak German (3 minute read)

2026-06-26T00:00:00.000Z

The team at Goodfire AI removed a 67-parameter language model's ability to predict German text by fine-tuning on only 4 German tokens.

Run a vLLM Server on HF Jobs in One Command (3 minute read)

2026-06-26T00:00:00.000Z

Hugging Face launched a single-command deployment workflow that lets developers spin up private, OpenAI-compatible vLLM endpoints on its pay-per-second serverless Jobs infrastructure.

The Future of AI is Intuitive (1 minute read)

2026-06-26T00:00:00.000Z

Generative Intuition showcased a real-time behavioral tracking pipeline designed to monitor and visualize fine-grained physical human interactions across multimodal computing interfaces.

Learn how leaders from Prudential Insurance, Siemens, GAF, and HF Sinclair build resilient, scalable data foundations for AI in this virtual panel. (Sponsor)

2026-06-25T00:00:00.000Z

Ram Kumar, Group Leader, Data & AI, Centre of Excellence, Prudential InsuranceMichael Taylor, Chief Data Scientist, SiemensDhanya Nair, Director, Data and Analytics, GAFMadhu Bangalore, Head of Digital Solution, Data, Analytics, and AI, HF SinclairRegister nowExplore how to:Move from proof of concept to production-scale AI agents.Accelerate decision-making with reusable, governed data assets.Enhance business workflows such as sales and operations with advanced AI and analytics.

Jalapeño: OpenAI's new Chip (7 minute read)

2026-06-25T00:00:00.000Z

OpenAI and Broadcom unveiled Jalapeño, the first accelerator in a planned family of LLM inference chips optimized for performance per watt and rapid deployment. The companies said the processor was designed in nine months with AI-assisted development and is intended for gigawatt-scale data center deployments.

Gemini Researchers Join Anthropic (1 minute read)

2026-06-25T00:00:00.000Z

Bloomberg reported that Gemini researchers Jonas Adler and Alexander Pritzel left Google for Anthropic, continuing a wave of high-profile AI talent departures. The trend followed recent exits by Noam Shazeer and DeepMind director John Jumper amid increasing competition between leading AI companies.

Introducing Computer Use on Gemini 3.5 Flash (3 minute read)

2026-06-25T00:00:00.000Z

Google launched native computer-use capabilities for Gemini 3.5 Flash, allowing the lightweight model to interact directly with digital desktop interfaces. The model processes continuous screenshots to execute click, scroll, and typing actions seamlessly across varied software environments.

GLM-5.2 is the step change for open agents (11 minute read)

2026-06-25T00:00:00.000Z

GLM-5.2 seemed like an incremental update, but the small change in benchmarks and training opened up a wide range of new use-cases. It feels right at home in coding harnesses as a general agent. Many in the AI community have praised the model after using it personally.

Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel (4 minute read)

2026-06-25T00:00:00.000Z

NVIDIA launched NeMo AutoModel on Hugging Face to optimize the fine-tuning pipelines of massive Mixture-of-Experts (MoE) architectures like Qwen3 and DeepSeek V3. The framework introduces Expert Parallelism and DeepEP fused communication kernels to distribute specialized expert weights dynamically across GPU clusters. Benchmark results demonstrate up to a 3.7x increase in training throughput alongside a 32% reduction in peak GPU memory usage compared to native Transformers v5 libraries.

Notes on Amazon v. Perplexity (27 minute read)

2026-06-25T00:00:00.000Z

Amazon is suing Perplexity for breaking the Amazon Store's Conditions of Use as Perplexity's Comet browser circumvents the requirement to clearly identify itself as an agent and instead identifies itself as Chrome. The idea that Perplexity's client needs to behave in a certain way goes against the basic principles of the open Web, which are about user control. The increased user agency of the open Web is what distinguishes it from downloadable apps. Sites have historically attempted all kinds of technical measures to prevent users from experiencing their content on their terms, but at the end of the day, the site is rendered on the client, so users mostly have the ability to download a client that renders the site in the way they prefer. Agentic browsing is just another browser feature that lets users engage with the Web on their terms.

Build the Data Foundation Agentic AI Needs (Sponsor)

2026-06-25T00:00:00.000Z

Hear how leaders from Prudential Insurance, Siemens, GAF, and HF Sinclair build scalable data foundations for AI, analytics, and intelligent agents. See how to create reusable data products to drive faster, more informed decision-making in this virtual panel. Register now

Triangle Splats from Video Diffusion Latents (5 minute read)

2026-06-25T00:00:00.000Z

Google's FLAT introduces a feedforward method that decodes triangle splats directly from video diffusion latents, improving geometric accuracy over 3D Gaussian-based approaches.

Orca (GitHub Repo)

2026-06-25T00:00:00.000Z

Functions as an open-source Agent Development Environment designed to manage and orchestrate fleets of parallel coding agents simultaneously.

Qwen-AgentWorld (29 minute read)

2026-06-25T00:00:00.000Z

Alibaba introduced Qwen-AgentWorld, a family of language world models trained on more than 10 million environment interaction trajectories to simulate agentic environments across multiple domains.

TLDR is hiring a Senior PMM ($180k-$225k base + $40-50k annual target bonus, Fully Remote)

2026-06-25T00:00:00.000Z

We're hiring a senior PMM to own product marketing at TLDR. You'll define our positioning, build out sales enablement, and lead every launch. Learn more.

Anthropic and Alibaba Launch Joint AI Model Distillation Campaign (4 minute read)

2026-06-25T00:00:00.000Z

Anthropic and Alibaba have initiated a collaborative open-source framework focused on distilling advanced reasoning intelligence from frontier models into hyper-efficient edge models. The partnership leverages Anthropic's safety-alignment techniques alongside Alibaba's massive cloud infrastructure to compress compute footprints without severe capability degradation.

As AI Companies Race for Power, Amazon and Google Have the Lead (6 minute read)

2026-06-25T00:00:00.000Z

Amazon has an incumbent advantage in the race for hyperscalers to get their hands on more electricity. It has been building tons of data centers over the past two decades. The company is expected to add the most data center and power capacity in the US through 2030. However, Google will have significantly closed its gap with Amazon by that time.

Build the Data Foundation Agentic AI Needs (Sponsor)

2026-06-25T00:00:00.000Z

Get insights from enterprise leaders at Prudential Insurance, Siemens, GAF, and HF Sinclair on how to build trusted data foundations for AI in this virtual panel.

Anthropic Veterans' Startup Seeks to Help Scientists Develop Their Own AI (5 minute read)

2026-06-25T00:00:00.000Z

Mirendil has raised $200 million in seed funding to make and distribute AI that accelerates AI research for everyone.

OpenAI Updates GPT-5.5 Instant to Make ChatGPT More Natural and Useful (1 minute read)

2026-06-25T00:00:00.000Z

OpenAI has initiated the rollout of an upgraded GPT-5.5 Instant model directly inside ChatGPT for both paid and free tiers.

Perplexity Computer for Counsel (3 minute read)

2026-06-25T00:00:00.000Z

Perplexity launched Computer for Counsel, an AI-driven legal operations tool designed to automate administrative research, document gathering, and contract triage.

Fable 5 has now reportedly also reappeared in Amazon Bedrock (1 minute read)

2026-06-25T00:00:00.000Z

Claude Code v2.2.190 introduces several string changes that hint at preparations for the return of Fable 5.

Worried about your AI bills? The fix isn't a cheaper model. (Sponsor)

2026-06-24T00:00:00.000Z

Before an agent acts, it burns time and tokens paginating live APIs and querying MCP servers just to find the right records. That makes agents slower, less accurate, and expensive to run.Airbyte Agents gives your agents the Context Store: a continuously refreshed index of your business data they can search in milliseconds, instead of round-tripping through live APIs at runtime. Our benchmarks against native MCPs and APIs:Agentic search under 500ms40% fewer tool calls80% fewer tokens90% lower costs on multi-source queriesTry it for free!

Mistral OCR 4: SOTA OCR for Document Intelligence (9 minute read)

2026-06-24T00:00:00.000Z

Mistral released OCR 4, a document intelligence tool providing structured content extraction, including bounding boxes and confidence scores. It supports 170 languages, is deployable in a single container, and integrates into enterprise search and structured data pipelines. OCR 4 outperforms other systems with a 4x speed advantage and high accuracy, especially with low-resource languages.

Claude Tag (2 minute read)

2026-06-24T00:00:00.000Z

Anthropic introduced Claude Tag, a Slack-based workflow that lets teams assign tasks to Claude, connect it to tools and codebases, and have it retain context across channels. The company said the system had become a core part of internal operations, with its product team using it to generate much of their code and assist with analytics, support, and debugging tasks.

ByteDance's New AI Video Model Can Make 30-Second Clips From a Single Prompt (2 minute read)

2026-06-24T00:00:00.000Z

ByteDance's new Seedance 2.5 AI video generation model can generate 30-second, 4K videos with a single prompt. Users are able to provide up to 50 images, videos, or audio clips as reference pieces. Increasing the number of references gives users more control over the video creation process. The model will be available in China next month. ByteDance has not announced a release time window for other countries.

Insights on Indirect Prompt Injection (12 minute read)

2026-06-24T00:00:00.000Z

This deep dive explores the growing focus on jailbreaks and indirect prompt injection attacks, featuring insights from Gray Swan's founders and their research. It also covered the company's role in evaluating advanced AI systems and developing security benchmarks.

Build real agentic apps using CUGA: two dozen working examples on a lightweight harness (16 minute read)

2026-06-24T00:00:00.000Z

CUGA, IBM's open-source agent harness, simplifies developing agentic apps by managing the complexities of planning, execution, and state management, allowing developers to focus on tool selection and prompts. CUGA's efficient system maintains state and corrects errors, outperforming others in benchmarks like AppWorld. Its unique features include configurable reasoning modes and integrated policy systems, enabling quick deployment from development to production while maintaining governance and flexibility.

How Businesses Are Building Specialized AI They Can Trust (3 minute read)

2026-06-24T00:00:00.000Z

NVIDIA's Agent Toolkit empowers businesses to build specialized, customizable AI agents using open models, tools, skills, and secure runtime. These agents accelerate complex workflows across industries like life sciences, healthcare, cybersecurity, and industrial operations by integrating with existing tools and data. Companies like Cadence, Synopsys, and CrowdStrike are leveraging this technology to enhance efficiency and accuracy in specific domains.

Your CRM should do the work, not just record it. (Sponsor)

2026-06-24T00:00:00.000Z

Lightfield is an agentic CRM with built-in agents that build your pipeline, prep you for meetings, send follow-ups, and keep your records current. One platform replacing your CRM, sequencer, enrichment tool, call recorder, and agent builder. Starts working in your first meeting. Try Lightfield free → lightfield.app

Graphsignal (GitHub Repo)

2026-06-24T00:00:00.000Z

Graphsignal is a production-scale inference profiling platform that provides essential visibility across the inference stack. It helps engineers optimize AI performance across models, engines, GPUs, and other accelerators. Graphsignal can be used with coding agents for analysis. The profiler has minimal impact on production performance, and content data is not recorded.

Krea 2 Technical Report (59 minute read)

2026-06-24T00:00:00.000Z

Krea 2 introduces expansive, expressive image generation models designed for creative exploration, overcoming limitations of default aesthetics. It employs a multi-stage training process with advanced architectures and extensive data curation to enhance stylistic diversity and user control. Key innovations include a prompt expander and style-reference system, allowing nuanced text and image inputs to generate diverse visual outputs.

Unlimited OCR Works (GitHub Repo)

2026-06-24T00:00:00.000Z

Unlimited OCR is a model designed to emulate human parsing working memory. It uses DeepSeek OCR as a baseline and combines it with a constant KV cache design. Unlimited OCR can transcribe dozens of pages of documents in a single forward pass under a standard maximum length of 32K. The technique used to develop Unlimited OCR is equally applicable to tasks such as ASR and translation.

Prompt Injection as Role Confusion (17 minute read)

2026-06-24T00:00:00.000Z

Modern large language models use role tags as both a security architecture and cognitive scaffolding. Prompt injections are driven by a flaw in how AI models perceive roles. For LLMs, everything arrives through the same channel as one long token soup, so they can't distinguish between their own thoughts and speech. Unless AI models achieve genuine role perception, injection defense will remain a perpetual whack-a-mole game.

TLDR is hiring a Senior PMM ($180k-$225k base + $40-50k annual target bonus, Fully Remote)

2026-06-24T00:00:00.000Z

We're hiring a senior PMM to own product marketing at TLDR. You'll define our positioning, build out sales enablement, and lead every launch. Learn more.

OpenAI prepares bidirectional voice mode for rollout on ChatGPT (2 minute read)

2026-06-24T00:00:00.000Z

OpenAI has started rolling out Bidirectional Voice Mode for ChatGPT. The company's new audio generation model, Bidi 1, lets the assistant speak, hear, and listen at the same time. It is able to hold the thread of a whole conversation and switch tasks on the fly if interrupted. The model can sing and beatbox, but there are some tight copyright restrictions. OpenAI has yet to make a formal announcement about the model, but some users are already seeing it in their model selectors.

US Presses Meta to Agree to AI Reviews as Security Concerns Rise (6 minute read)

2026-06-24T00:00:00.000Z

The Trump administration is pressing Meta to submit its AI models for voluntary review. Meta is the only major AI developer in the US that has not reached an agreement to voluntarily share its models with the federal government for review. The review involved evaluating models' abilities and vulnerabilities. Meta's policy team has been negotiating with the Commerce Department about how to proceed, but it is unclear whether they will be able to reach an agreement.