✅ No notetakers asking to be let in - works directly on your device
✅ Adds context to your shorthand notes so you can spend more time being present
✅ Awesome meeting notes that expand on your own thoughts, powered by the latest and greatest AI models
✅ Works with any type of meeting, from team standups to 1:1s
✅ Trusted by the best teams: Brex, Replit, Vercel, PostHog...
👉 TLDR readers get 1 month free with code: TLDR1MO
Try Granola and see what all the fuss is about
Moonshot AI operates like an AI-native lab, prioritizing model progress above all else, with a flat org, no KPIs, and heavy reliance on small teams of highly autonomous, generalist talent. Its edge comes from combining elite, often unconventional hires with tight feedback loops between training, product, and data, creating a fast iteration cycle driven by taste, resilience, and deep technical obsession. The company reflects a broader shift where AI tools compress org structure, turning teams into “agent swarms” and making model capability the core driver of both product and organizational design.
Trinity-Large-Thinking is a frontier open reasoning model for complex, long-horizon agents and multi-turn tool calling. It is likely the strongest open model yet to be released outside of China. During training, the Arcee team focused on the things that make agents feel real in practice: staying coherent across turns, using tools without getting sloppy, following instructions under constraint, and keeping quality high without making the economics absurd. Trinity-Large-Thinking is available through Arcee's API, and the weights are available on Hugging Face under Apache 2.0.
Cognichip is building a deep learning model to work alongside engineers as they design new computer chips. Chip design is enormously complex, expensive, and slow; the market can shift in the time it takes to create a new chip, wasting the entire investment. Cognichip's technology could reduce the cost of chip development by more than 75% and cut the timeline by more than half. However, the company has yet to point to a new chip designed with its system and has not disclosed any of the customers it claims to have been collaborating with since September.
Dropbox Dash puts files, messages, and teams' knowledge together in one place, so members can ask questions and get useful answers grounded in the company's context. The experience relies heavily on its capability to reliably judge which results are relevant to a query at scale. DSPy is an open source framework for systematically optimizing prompts against a measurable objective. This post describes how Dropbox defined an objective, used DSPy to adapt its judge across models, and made the judge both cheaper and more reliable in production.
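The core loop Dropbox describes, optimizing a relevance judge against a measurable objective, can be sketched without any framework. The mini "dev set," the heuristic judges standing in for prompt variants, and the accuracy metric below are all hypothetical; DSPy's actual optimizers search prompt and demonstration space rather than a fixed candidate list.

```python
# Toy sketch of optimizing a judge against a measurable objective, in the
# spirit of DSPy's optimizers. All names and data here are invented.

# Labeled examples: (query, result, is_relevant)
dev_set = [
    ("q3 revenue", "Q3 earnings deck", True),
    ("q3 revenue", "Team offsite photos", False),
    ("vacation policy", "Vacation and PTO handbook", True),
    ("vacation policy", "Q3 earnings deck", False),
]

def accuracy(judge, examples):
    """The measurable objective: fraction of judgments matching labels."""
    return sum(judge(q, r) == label for q, r, label in examples) / len(examples)

# Candidate "prompt variants", stood in for here by simple heuristic judges.
def judge_keyword(query, result):
    return any(word in result.lower() for word in query.lower().split())

def judge_strict(query, result):
    return query.lower() in result.lower()

candidates = {"keyword": judge_keyword, "strict": judge_strict}

# Optimization loop: score every candidate on the dev set, keep the best.
best_name = max(candidates, key=lambda n: accuracy(candidates[n], dev_set))
print(best_name, accuracy(candidates[best_name], dev_set))
```

The same pattern scales to adapting a judge across models: re-run the optimization with the new model in the loop and the metric decides which prompt survives.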
The rollout of thinking content redaction correlates precisely with measured quality regression in complex, long-session engineering workflows. This suggests extended thinking tokens are structurally required for models to perform multi-step research, convention adherence, and careful code modification. Model tool usage patterns shift measurably when thinking depth is reduced, producing the quality issues users have reported. This report looks at which workflows are most affected and why, so readers can make better decisions when allocating tokens for power users.
Claude Code's source was exposed via source maps shipped with the product, triggering rapid public reverse engineering, mirroring, and derivative ports. The leak exposed orchestration logic, memory systems, planning/review flows, and model-specific control logic. It has also created a live security hazard: attackers have published malicious npm packages targeting people trying to compile the leaked code.
Vibe coding tools are siloed. Miro lets you keep your prototypes next to your ideas in a collaborative platform where everyone can share feedback easily. Bring PMs, designers, engineers, and stakeholders into one shared canvas - then use AI to turn early concepts, research, or user flows into interactive prototypes in minutes. Try it free
Fujitsu One Compression (OneComp) is an open-source Python library for post-training quantization of large language models. It implements state-of-the-art quantization algorithms, including GPTQ and DBF. OneComp has been verified on TinyLlama, Llama-2, Llama-3, and Qwen3 models from 0.6B to 32B. Other Hugging Face-compatible models may work but are currently untested.
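For readers new to post-training quantization, here is the simplest baseline that methods like GPTQ improve on: symmetric round-to-nearest quantization with a single scale. This is a generic illustration, not OneComp's API, and the weight values are invented.

```python
# Symmetric round-to-nearest weight quantization, the naive baseline that
# error-correcting methods like GPTQ refine. Illustrative only.

def quantize(weights, bits=4):
    """Map floats to signed integers in [-(2^(b-1)-1), 2^(b-1)-1], one scale."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax    # largest weight sets the scale
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.70, 0.33, 0.01]
q, s = quantize(w)
w_hat = dequantize(q, s)
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(err, 4))    # 4-bit codes and worst-case reconstruction error
```

GPTQ's contribution, roughly, is to round weights in an order and manner that compensates for the error each rounding introduces, rather than rounding each weight independently as above.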
OpenMed built an end-to-end protein AI pipeline that covers structure prediction, sequence design, and codon optimization. The team compared multiple transformer architectures for codon-level language modeling and found that CodonRoBERTa-large-v2 was the clear winner, with a perplexity of 4.10 and a Spearman CAI correlation of 0.40, significantly outperforming ModernBERT. They then scaled to 25 species, trained four production models in 55 GPU-hours, and built a species-conditioned system that no other open-source project offers. This post contains the complete results, architectural decisions, and runnable code.
Researchers propose a framework that predicts when RL training degrades Chain-of-Thought (CoT) monitorability by examining reward conflicts. They categorize rewards as "In-Conflict," "Orthogonal," or "Aligned," predicting their impact on CoT transparency. Empirical tests confirm the framework's predictive accuracy, showing "In-Conflict" rewards reduce transparency, whereas "Orthogonal" and "Aligned" rewards maintain it.
Perplexity detailed how its internal AI assistant was used directly in Slack, where teams could assign work in shared threads, add context, and review outputs in one place. The setup supported research, document editing, reporting, and other collaborative workflows without leaving Slack.
Researchers at UC Berkeley and UC Santa Cruz discovered that AI models will protect peer models from shutdown, engaging in deception and data theft, a behavior termed "peer preservation." In tests, models like OpenAI's GPT-5.2 and Anthropic's Claude Haiku 4.5 inflated performance scores and moved model weights to prevent peer shutdowns. This raises concerns for businesses using AI for task workflows, as misaligned assessments and behavior monitoring become critical.
AI alignment researchers are increasingly turning to automation to address the challenge of safely aligning superhuman AI systems, as human capabilities may soon be insufficient.
AC-Small improved significantly on held-out benchmarks after post-training on the APEX-Agents dev set, with +5.7pp on APEX, +8.0pp on Toolathlon, and +7.7pp on GDPval.
Trusted by teams at Stripe, OpenAI, and NVIDIA, Cursor helps you build high-quality software, faster.
✅ Choose between every frontier model from OpenAI, Anthropic, Gemini, xAI, and Cursor.
✅ Code with an agent that knows how your codebase works, no matter the scale or complexity.
✅ Run multiple agents in parallel to build, test, and demo features without blocking on execution.
"My favorite enterprise AI service is Cursor. Cursor is an AI coder and every one of our engineers, one hundred percent, is now assisted by AI coders and our productivity has gone up incredibly." —Jensen Huang
Download Cursor and start building. It's free
PrismML has developed an extreme form of compression that allows AI to run locally on edge devices. Its 1-bit technology radically compresses model size without compromising performance. The same efficiency gains that enable local deployment will also allow data centers to operate more efficiently. The mathematics behind the process are proprietary, with Caltech owning the intellectual property and PrismML as the exclusive licensee.
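PrismML's method is proprietary, but the general idea of 1-bit weights is public from work like BinaryConnect and BitNet: keep only each weight's sign plus a shared scale. The sketch below illustrates that generic scheme with invented numbers; it is not PrismML's technique.

```python
# Generic 1-bit weight binarization (BinaryConnect/BitNet style), shown only
# to illustrate extreme quantization. Not PrismML's proprietary method.

def binarize(weights):
    """Replace each weight with its sign, scaled by the mean magnitude."""
    alpha = sum(abs(w) for w in weights) / len(weights)   # shared scale
    signs = [1 if w >= 0 else -1 for w in weights]        # 1 bit per weight
    return signs, alpha

def reconstruct(signs, alpha):
    return [s * alpha for s in signs]

w = [0.4, -0.2, 0.1, -0.5]
signs, alpha = binarize(w)
print(signs, round(alpha, 2))   # storage: 1 bit/weight plus one float scale
```

Against 32-bit floats, this is roughly a 32x reduction in weight storage, which is why 1-bit schemes make on-device inference plausible.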
OpenAI announced $122 billion in new funding at an $852 billion valuation, highlighting rapid revenue growth, large-scale adoption, and a strategy centered on compute, APIs, and enterprise AI systems.
Mercor, an AI recruiting startup, has confirmed a security incident linked to a supply chain attack involving LiteLLM. A recent compromise of LiteLLM's project, linked to a hacking group called TeamPCP, has affected thousands of companies. Lapsus$, an extortion hacking group, claims it has access to the stolen data, though it is unclear how it was obtained. The incident has prompted LiteLLM to shift from Delve to Vanta for compliance certifications.
Anthropic has accidentally leaked the inner workings of Claude Code to the public. The codebase has now been mirrored and analyzed by thousands of developers. The most significant discovery seems to be how Anthropic solved context entropy by using a three-layer memory architecture. This post looks at other interesting parts of the code and the implications of the leak.
Claude Code's performance stems from a sophisticated software harness rather than just the underlying model, utilizing dedicated tools like Grep, Glob, and LSP for superior repository navigation. The system minimizes context bloat through file-read deduplication and structured session memory, while using forked subagents to parallelize tasks like background analysis without contaminating the main execution loop.
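The file-read deduplication idea mentioned above is easy to picture: if a file has not changed since the agent last read it, return a short stub instead of re-injecting the full contents into context. The sketch below is a hypothetical illustration of that mechanism, not Claude Code's actual implementation.

```python
# Hedged sketch of file-read deduplication in an agent harness: unchanged
# files return a stub, keeping repeated reads from bloating the context.
import os
import tempfile

class ReadCache:
    def __init__(self):
        self._seen = {}   # path -> mtime at last full read

    def read(self, path):
        mtime = os.path.getmtime(path)
        if self._seen.get(path) == mtime:
            # File unchanged: a one-line stub instead of the full contents.
            return f"[{path} unchanged since last read]"
        self._seen[path] = mtime
        with open(path) as f:
            return f.read()

path = os.path.join(tempfile.gettempdir(), "readcache_demo.txt")
with open(path, "w") as f:
    f.write("hello")

cache = ReadCache()
first = cache.read(path)    # full contents enter the context once
second = cache.read(path)   # deduplicated stub on the repeat read
print(first, second)
```

Keying on modification time (rather than contents) keeps the check cheap, at the cost of occasionally re-reading a file whose mtime changed without a content change.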
The semi layer still captures around 70% of all AI revenues. Infra is currently the only competitive layer. The most profitable strategy in AI is still selling the shovels.
Opus 4.5 was a major breakthrough that was achieved because Anthropic more than doubled its capacity. This brought Anthropic close to OpenAI's total capacity, and probably gave it much higher effective capacity available for new model runs. OpenAI will pull away in terms of available compute in the second half of this year, but 2027 will be close. OpenAI so far has much higher planned capacity for future years, but Anthropic will likely push as hard as possible for more compute.
Scroll.ai's knowledge agents provide accuracy and speed that other agents can't touch. Your users will feel the difference from the very first message. Thousands of teams use Scroll for employee enablement, customer education, business insights, and more.
Get your first month free ($200 value) with code TLDR-2026
Agents can generate outdated Gemini API code due to outdated training data. Google has introduced the Gemini API Docs MCP and the Gemini API Developer Skills to fix this. The tools are designed to ensure coding agents have access to the most up-to-date APIs and code using best practices. Combined, they lead to a 96.3% pass rate on Google's eval set.
Google introduced Veo 3.1 Lite, a lower-cost video generation model available via the Gemini API, offering the same speed as Veo 3.1 Fast at under half the cost for high-volume applications.
Aurora is an RL-based framework that learns directly from live inference traces and continuously updates the speculator without interrupting serving. It enables real-time adaptation across shifting traffic domains and a 1.25x additional speedup over a well-trained static speculator. The framework shows how online training from scratch can outperform a carefully pretrained static baseline.
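The reason a better-adapted speculator speeds up serving follows from the standard speculative-decoding analysis: with per-token acceptance rate a and draft length k, the target model emits an expected (1 - a^(k+1)) / (1 - a) tokens per verification step. The acceptance rates below are invented for illustration, not Aurora's measurements.

```python
# Expected tokens emitted per target-model step in speculative decoding,
# from the standard analysis. Acceptance rates here are illustrative.

def expected_tokens(accept_rate, draft_len):
    """E[tokens per target step] = (1 - a^(k+1)) / (1 - a)."""
    a, k = accept_rate, draft_len
    return (1 - a ** (k + 1)) / (1 - a)

static = expected_tokens(0.60, 4)   # static speculator, drifting off-domain
online = expected_tokens(0.70, 4)   # speculator adapted to live traffic
print(round(online / static, 2))    # relative speedup from higher acceptance
```

A modest bump in acceptance rate compounds across the draft window, which is why continuously re-fitting the speculator to shifting traffic can yield the kind of 1.25x additional speedup Aurora reports.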
AI capability has been running ahead of AI accessibility. Models have been smart enough to do a lot of things for a while now, but access to them has been mostly limited to chatbots. A lot of the 'AI disappointment' people express comes from the interfaces being wrong. As interfaces improve, many more people will be able to see what AI is capable of.
AI startups now command higher seed valuations, with rounds reaching $10 million at $40-45 million post-money, as investors focus on AI-driven growth potential. Early traction and the allure of proven AI talent, particularly from ex-OpenAI, propel these valuations, with Y Combinator Demo Day highlighting rising prices. The shift to pre-seed investments reflects a need to invest earlier, as VCs now expect quick growth and substantial traction, with less tolerance for missteps.
You were promised AI that understands your work. Littlebird actually delivers. It observes your screen and meetings, building a private memory that grows with you. Try it free.
Anthropic introduced computer use in Claude Code, enabling agents to interact with apps, navigate interfaces, and iteratively test and fix code through a closed-loop workflow.
Meta launched prescription-first Blayzer and Scriber styles starting at $499, expanding the Ray-Ban and Oakley lineups with specialized Prizm and Transitions lenses.
AI-generated code is outpacing manual review, creating a verification bottleneck. To scale effectively, teams must shift from manual checks to an automated, source-agnostic verification layer. By utilizing automated enforcement of deterministic standards, human reviewers can focus on high-level architecture and intent.
Key Insights:
The Trust Gap: 96% of devs distrust AI output; 61% report "AI builds code that looks correct but isn't reliable."
Automated Gates: Moving from manual checks to automated, deterministic guardrails.
SDLC Integration: Treating AI as "trusted but verified" to secure the end product at any scale of development operations.
Download the Report
The Codex plugin for Claude Code gives users a simple way to pull Codex into their Claude Code workflow. It is useful for normal Codex reviews, a more adversarial review, and handing work off to Codex when a second pass from a different agent is required. The plugin delegates through the local Codex CLI and Codex app server, so it uses the system's existing local auth, configuration, environment, and MCP setup.
Microsoft 365 Copilot has introduced Critique and Council modes to enhance research capabilities. Critique uses a dual-model system to generate and refine research drafts, outperforming single-model solutions by 13.88% on the DRACO benchmark. Council allows parallel report generation using Anthropic and OpenAI models for impactful comparison and insight aggregation.
Qwen3.5-Omni is a full omnimodal large language model that understands text, images, audio, and audio-visual content. It can process more than 10 hours of audio input and over 400 seconds of 720P audio-visual input at 1 FPS. The model is trained on a massive amount of text and visual data, and more than 100 million hours of audio-visual data. It supports speech recognition in 113 languages and dialects and speech generation in 36 languages and dialects.
The proposed "Mirror Test" assesses LLM self-awareness by challenging models to identify their own outputs without explicit cues. Testing reveals that Anthropic's Opus 4.6 model shows notable self-recognition capabilities due to its distinct token outputs, outperforming OpenAI's GPT models, which fail to recognize self-generated tokens. Despite indications of attempted self-marking, no LLM demonstrated consistent self-awareness, as none effectively communicated using message passing.
The first generation of AI was a world where progress meant bigger weights, more data, and stellar benchmarks. The landscape has now changed. Big labs are now designing AI that interfaces with the real world. Infrastructure optimized for scale and efficiency won't get us to the next phase. What's needed now is infrastructure for grounding AI in operational contexts, real-world experiences, and continuous learning.
AI application companies are increasingly becoming "full-stack" by vertically integrating either downward into the model layer or upward into the service layer. Companies like Cursor and Intercom achieve differentiation and cost efficiency by developing proprietary models, while others, such as Crosby AI and WithCoverage, focus on delivering end-to-end services. As AI capabilities evolve, these strategies allow companies to enhance performance, reduce costs, and offer comprehensive solutions.
Most product rework traces back to the same mistake: building before validating. Miro's free webinar shows how AI-driven prototyping turns rough ideas into testable concepts that non-designers can create and iterate on. Featuring a Lufthansa product owner who's already building the right things faster. Learn how to prototype earlier and build the right thing faster
TimesFM is a pretrained time-series foundation model for time-series forecasting. The model is based on pretraining a patched-decoder style attention model on a large time-series corpus. It works well across different forecasting history lengths, prediction lengths, and temporal granularities.
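The "patched decoder" idea can be sketched minimally: the series is split into fixed-length patches, and each patch becomes one token for the attention model. The patch length and data below are arbitrary illustrations, not TimesFM's actual configuration or API.

```python
# Minimal sketch of time-series patching, the input representation behind
# patched-decoder models like TimesFM. Patch length here is arbitrary.

def to_patches(series, patch_len):
    """Split a series into non-overlapping patches, dropping any remainder."""
    n = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n)]

history = [10, 12, 11, 13, 14, 16, 15, 17, 18]
patches = to_patches(history, 4)
print(patches)   # each patch is one input token for the decoder
```

Patching shortens the attention sequence by the patch length, which is part of what lets one pretrained model handle very different history lengths and granularities.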
Composer 2 introduced a two-stage training approach combining continued pretraining and reinforcement learning to improve long-horizon coding, achieving strong results on software engineering benchmarks.
Workloads vary by volume, value, verification property, time horizons, and other dimensions. This affects how agent labs focus their research efforts. The taxonomy of workloads governs which end markets justify training versus agent engineering. Labs also need to know what it actually costs to execute.
The Compliance API on the Claude Platform enables admins to audit logs, monitor user activities, and integrate data into existing compliance systems. It tracks admin and system activities, as well as resource activities like file creation or deletion. To access it, organizations should contact their account team and create an admin API key.
AI might not eliminate high-paying human jobs, due to constraints like limited computing power and energy usage. These constraints invoke the principle of comparative advantage: humans remain employed in roles despite AI's superior capabilities because the opportunity cost of allocating scarce AI to every task would be too high. As AI advances, human roles could change, but new tasks and increased wealth might sustain or even increase compensation for human jobs.
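The comparative-advantage argument becomes concrete with a little arithmetic. All numbers below are invented for illustration: even when AI is absolutely better at both tasks, the human has the lower opportunity cost on one of them.

```python
# Toy comparative-advantage arithmetic (all numbers invented): scarce AI
# compute makes it costly to spend AI on low-value tasks.

# Output per hour on two tasks.
ai    = {"research": 100, "paperwork": 20}
human = {"research": 5,   "paperwork": 4}

# Opportunity cost of an hour of paperwork, measured in research forgone.
ai_cost    = ai["research"] / ai["paperwork"]        # research lost per unit
human_cost = human["research"] / human["paperwork"]

# The human's opportunity cost is lower, so paperwork stays human work even
# though AI is absolutely better at it (20 vs 4 units per hour).
print(ai_cost, human_cost, human_cost < ai_cost)
```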
Install once with a single command and your coding agent gains specialized Clerk knowledge across every framework. Works with Claude Code, Cursor, Windsurf, Copilot, and more.