Seroter's Daily Reading — #743 (March 17, 2026)

Source
Daily Reading List — March 17, 2026 (#743)
Articles Covered
- Bringing the power of Personal Intelligence to more people — Google
- Subagents — Simon Willison
- Giving you more transparency and control over your Gemini API costs — Google
- Google Workspace’s New AI Features Seem Genuinely Useful — Lifehacker
- The State of AI in the Enterprise — Deloitte
- Measuring progress toward AGI: A cognitive framework — Google DeepMind
- Banks struggle to scale AI as legacy tech devours IT budgets — CIO Dive
- Introducing multi-cluster GKE Inference Gateway — Google Cloud
- State of Open Source on Hugging Face: Spring 2026 — Hugging Face
- Developer Guide: Nano Banana 2 with the Gemini Interactions API — Philipp Schmid
- Agent Protocols — MCP, A2A, A2UI, AG-UI — Mete Atamel
- Announcing the Colab MCP Server — Google Developers
- Durable AI agent with Gemini and Temporal — Google AI
Transcript
Welcome back to another episode covering Richard Seroter’s Daily Reading List. This is number 743, posted March 17th, 2026. Thirteen articles today — a big list — and the dominant thread is agents everywhere: how they communicate, how they’re governed, and the infrastructure that makes them possible. Let’s get into it.
Google is expanding what they call Personal Intelligence across Search, the Gemini app, and Gemini in Chrome. The idea is that Gemini can now connect to your Google apps — Gmail, Photos, Calendar — and give you responses that are specifically tailored to your data. The examples they give are practical: you ask about sneakers you bought last year and it finds them in your purchase history, or you have a layover and it calculates restaurant options based on your gate, your food preferences, and how much time you actually have between flights. It’s available now in the US for free-tier users. The privacy angle is worth noting: Google says they don’t train on your Gmail inbox or Photos library directly, though they do train on prompts and responses within the Personal Intelligence features. Whether you trust that distinction is up to you.
Simon Willison published a guide on subagents as part of his ongoing series on agentic engineering patterns. This is one of the clearest explanations I’ve seen of how subagents actually work in practice. The core idea is simple: LLMs have finite context windows, so instead of cramming everything into one session, you dispatch a fresh copy of the agent to handle a specific subtask. It gets its own context, does the work, and reports back. Simon uses Claude Code’s Explore subagent as his example — when you give Claude Code a task in an existing codebase, it first sends a subagent to explore the repo and figure out what’s relevant. The parent agent stays clean while the subagent burns through tokens reading files and searching for patterns. He also covers parallel subagents for speed, and specialist subagents for tasks like code review or test running. The key insight is that the primary value of subagents isn’t delegation — it’s context management. You’re protecting the parent’s working memory.
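The pattern can be sketched in a few lines of Python. Everything here is illustrative, not Claude Code's actual implementation: `fake_llm` stands in for a real model call, and the function names are invented. The point is structural: the token-heavy exploration happens inside the subagent's own context, and only a short report flows back to the parent.

```python
# Minimal sketch of the subagent pattern: the parent delegates an
# exploration task to a fresh agent with its own context, and only the
# subagent's short report enters the parent's history.

def fake_llm(context: list[str]) -> str:
    # Stand-in for a model call; a real agent would send `context`
    # to an LLM and get a response back.
    return f"summary of {len(context)} context item(s)"

def run_subagent(task: str, files: list[str]) -> str:
    """Fresh context: the subagent reads everything it needs,
    then compresses its findings into one report."""
    context = [task] + files     # token-heavy exploration lives here
    return fake_llm(context)     # only this summary escapes

def parent_agent(goal: str, repo_files: list[str]) -> list[str]:
    history = [goal]
    # Delegate exploration; the file contents never touch `history`.
    report = run_subagent(f"explore repo for: {goal}", repo_files)
    history.append(report)
    return history

history = parent_agent("add retry logic", ["a.py", "b.py", "c.py"])
print(len(history))  # parent context stays small regardless of repo size
```

However many files the subagent reads, the parent's history grows by exactly one entry, which is Simon's point about protecting working memory.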
Google announced project spend caps and revamped usage tiers for the Gemini API. This solves a real problem — API costs can spiral quickly with agentic workloads, especially if you have runaway loops or unexpected traffic. You can now set a hard monthly dollar limit per project in Google AI Studio. They’ve also restructured usage tiers with lower spend qualifications and automatic upgrades, so you get higher rate limits as your usage grows without having to request them. The most interesting detail: each tier now has a system-defined cap on total monthly spend across your entire billing account, separate from the custom caps you set yourself. For anyone running agents in production, cost controls like this aren’t optional — they’re essential.
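Google's caps are enforced server-side in AI Studio, but the failure mode they guard against is easy to see with a client-side sketch. The class, prices, and threshold below are all made up for illustration; the idea is simply that a hard cap turns a runaway loop into an exception instead of a bill.

```python
# Illustrative client-side budget guard (not the Gemini API's actual
# spend-cap mechanism, which is enforced server-side). Names and
# per-token prices here are invented for the sketch.

class BudgetExceeded(RuntimeError):
    pass

class SpendGuard:
    def __init__(self, monthly_cap_usd: float):
        self.cap = monthly_cap_usd
        self.spent = 0.0

    def charge(self, tokens: int, usd_per_1k_tokens: float = 0.002) -> None:
        cost = tokens / 1000 * usd_per_1k_tokens
        if self.spent + cost > self.cap:
            # Fail closed: a runaway agent loop stops here instead of
            # accumulating an unbounded bill.
            raise BudgetExceeded(f"cap ${self.cap:.2f} would be exceeded")
        self.spent += cost

guard = SpendGuard(monthly_cap_usd=1.00)
guard.charge(tokens=100_000)              # $0.20, fine
try:
    guard.charge(tokens=500_000_000)      # would blow past the cap
except BudgetExceeded as e:
    print("blocked:", e)
```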
Lifehacker covered Google Workspace’s latest AI features, and the review is surprisingly positive. Gemini now has a much more prominent interface in Docs, Sheets, and Slides — a persistent bar at the bottom of the screen. The standout features are the ability to pull data from Gmail, Google Chat, and Drive directly into a document, and a style-matching feature where Gemini can generate text that matches the writing style of an existing Google Doc. The reviewer tested it with their own writing and said it was recognizably similar, if still a bit stilted. The real value seems to be in data aggregation — asking Gemini to summarize everything you’ve written this year, or to build a spreadsheet from meeting notes and emails. It’s the kind of feature that’s actually useful rather than just impressive.
Deloitte released their State of AI in the Enterprise 2026 report — and unusually for a Deloitte report, it’s completely ungated. Some key numbers: worker access to AI rose by 50 percent in 2025. Two-thirds of organizations report productivity and efficiency gains. But only 20 percent are actually growing revenue through AI, while another 74 percent aspire to revenue growth but haven’t gotten there yet. Only one-third of organizations are using AI to truly reimagine their business; the other two-thirds are just optimizing existing processes. On agents specifically, usage is expected to surge in the next two years, but only one in five companies has mature governance for autonomous agents. The skills gap is the number one barrier to integration, and most companies are addressing it through education rather than actual role redesign. The overall picture: lots of adoption, still mostly surface-level.
Google DeepMind published a paper on measuring progress toward AGI through a cognitive framework. Instead of treating AGI as a binary achievement, they propose evaluating AI systems across ten cognitive abilities: perception, generation, attention, learning, memory, reasoning, metacognition, executive functions, problem solving, and social cognition. The evaluation protocol benchmarks AI against human baselines from a representative sample of adults. They’ve launched a Kaggle hackathon with a $200,000 prize pool to crowd-source evaluations for the five abilities where the measurement gap is largest: learning, metacognition, attention, executive functions, and social cognition. Whether or not you think AGI is a useful concept, having rigorous benchmarks for cognitive abilities is valuable — it moves the conversation from hand-waving to measurement.
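The framing is easy to make concrete: instead of a single pass/fail AGI judgment, you get a per-ability profile relative to human baselines. The ability list below comes from the paper; the scores and the normalization choice are invented for this sketch.

```python
# Illustrative sketch of the paper's framing: score a system on each
# cognitive ability relative to a human baseline. The ten abilities are
# from the paper; the numbers and normalization are made up here.

ABILITIES = [
    "perception", "generation", "attention", "learning", "memory",
    "reasoning", "metacognition", "executive_functions",
    "problem_solving", "social_cognition",
]

def profile(system_scores: dict[str, float],
            human_baseline: dict[str, float]) -> dict[str, float]:
    """Express each score as a fraction of the human baseline (1.0 = human-level)."""
    return {a: system_scores[a] / human_baseline[a] for a in ABILITIES}

baseline = {a: 100.0 for a in ABILITIES}
system = dict(baseline, learning=40.0, metacognition=25.0)
p = profile(system, baseline)
below_human = [a for a, v in p.items() if v < 1.0]
print(below_human)  # an uneven profile, not a binary verdict
```

The output is a jagged profile, which is exactly what the framework argues for: a system can be superhuman on generation while lagging badly on metacognition.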
CIO Dive reported on banks struggling to scale AI because legacy tech is devouring their IT budgets. The core tension: financial institutions want to invest in AI, but their existing infrastructure — decades-old core banking systems, mainframes, compliance layers — eats up so much budget that there’s little left for innovation. It’s the same pattern we saw with the SAP article in last week’s list. Until the foundational infrastructure is modernized or at least stabilized, AI adoption stays at the pilot stage. Results need to be there too — you can’t just spend indefinitely on experiments.
Google announced multi-cluster GKE Inference Gateway, which lets you run inference workloads across multiple Kubernetes clusters, even across different regions. The key feature is model-aware load balancing — the gateway can route requests based on real-time signals like KV cache utilization, not just generic health checks. If one cluster’s GPU capacity is saturated, traffic shifts to another. If a region goes down, routing adjusts automatically. For anyone running large-scale inference in production, this is a significant piece of infrastructure. It turns multi-region GPU deployment from a custom engineering project into a configuration problem.
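The routing logic described is simple to sketch: choose a cluster by a live signal (KV cache utilization) rather than a generic health check. The data structures and saturation threshold below are illustrative, not the actual GKE Inference Gateway API.

```python
# Sketch of model-aware routing: pick the least-loaded healthy cluster
# by reported KV cache utilization. Illustrative only; not the real
# GKE Inference Gateway configuration surface.

from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    region: str
    healthy: bool
    kv_cache_util: float  # 0.0-1.0, reported by the model servers

def route(clusters: list[Cluster], saturation: float = 0.9) -> Cluster:
    candidates = [c for c in clusters
                  if c.healthy and c.kv_cache_util < saturation]
    if not candidates:
        raise RuntimeError("no cluster has spare KV cache capacity")
    # Least-loaded wins; an unhealthy region simply drops out.
    return min(candidates, key=lambda c: c.kv_cache_util)

clusters = [
    Cluster("gke-us-a", "us-central1",  healthy=True,  kv_cache_util=0.95),
    Cluster("gke-eu-b", "europe-west4", healthy=True,  kv_cache_util=0.40),
    Cluster("gke-us-c", "us-east5",     healthy=False, kv_cache_util=0.10),
]
print(route(clusters).name)  # saturated and unhealthy clusters are skipped
```

Notice that the cheapest cluster by utilization (us-c) is skipped because it's down, and us-a is skipped because its KV cache is saturated: the two failure modes the gateway handles.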
Hugging Face published their State of Open Source report for Spring 2026, and the data is fascinating. The platform has grown to 11 million users and over 2 million public models. But the most striking finding is geographic: China has surpassed the United States in monthly downloads and accounts for 41 percent of all model downloads. The shift happened fast — after DeepSeek’s R1 release in January 2025, Chinese organizations went all-in on open source. Baidu went from zero releases on the Hub in 2024 to over 100 in 2025. ByteDance and Tencent each increased releases by eight to nine times. Meanwhile, independent developers — individuals not affiliated with any company — now account for 39 percent of all downloads, up from 17 percent before 2022. They’re doing the unglamorous work of quantizing, adapting, and redistributing base models. The ecosystem is less centralized than it looks from the headlines.
Philipp Schmid published a developer guide for Nano Banana 2, Google’s latest image generation model, using the new Interactions API. What makes this interesting is the search grounding feature. The model can retrieve real images from Google Image Search and use them as visual references during generation. So if you ask it to create a poster of Fushimi Inari shrine, it first looks up actual photos of the shrine, then generates an image that’s visually accurate to the real location. You can also provide a reference photo of a person, and the model will preserve their appearance while placing them into a search-grounded scene. It supports up to 14 reference images and 5 characters. The Interactions API itself is Google’s next-generation unified interface for Gemini — worth keeping an eye on.
There’s a solid overview article on agent protocols — MCP, A2A, A2UI, and AG-UI — from Mete Atamel at Google. If you’ve been confused about which protocol does what, this is the clarifier. MCP gives agents tools — it’s about connecting models to external data sources and APIs. A2A, the Agent-to-Agent protocol, handles communication between agents running on different frameworks. AG-UI connects agent backends to user-facing frontends — it’s how you build UIs that stream agent output in real time. And A2UI is a generative UI spec where agents can return rich interactive widgets rather than just text. The summary is clean: MCP is agent-to-tool, A2A is agent-to-agent, AG-UI is agent-to-frontend, and A2UI is agent-generates-UI. Different layers of the same stack.
Google released an open-source Colab MCP Server that lets any MCP-compatible agent control Google Colab programmatically. This means an agent like Claude Code or Gemini CLI can create notebooks, write and execute code cells, install dependencies, and manage the entire notebook lifecycle — all in Colab’s cloud environment with its GPUs and compute resources. The practical value is treating Colab as a sandboxed compute environment for agents. Instead of running untrusted code locally, you dispatch it to Colab. The agent gets cloud compute; you get isolation. It’s open source and works with any MCP-compatible agent.
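Under the hood, MCP is JSON-RPC 2.0, so "any MCP-compatible agent" means anything that can produce messages like the one below. The `tools/call` method and its `name`/`arguments` params are from the MCP spec; the specific tool name here is hypothetical, since the Colab server advertises its own tools via `tools/list`.

```python
# Building the JSON-RPC 2.0 `tools/call` message an MCP client sends.
# The message shape is per the MCP spec; the tool name "execute_cell"
# is a hypothetical example, not necessarily what the Colab server exposes.

import json

def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(msg)

wire = tools_call(1, "execute_cell", {"code": "print(2 + 2)"})
print(wire)
```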
Finally, there’s a tutorial on building durable AI agents with Gemini and Temporal. The problem Temporal solves is persistence and fault tolerance in agentic loops. If your agent is partway through a multi-step workflow and the process crashes, Temporal can resume exactly where it left off — state, context, everything. It records each step as a checkpoint, so failures at step seven don’t mean re-running steps one through six. For production agent deployments where reliability matters, this is the kind of infrastructure that turns demos into real systems.
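The checkpoint-and-replay idea is worth seeing in miniature. This is not the `temporalio` SDK, just a bare sketch of the concept: completed steps are journaled, so a restart replays the journal and resumes at the first unfinished step.

```python
# Generic sketch of durable execution (NOT the Temporal SDK): each
# completed step is recorded in a journal, so a crash mid-workflow
# resumes at the next unfinished step instead of re-running everything.

def run_workflow(steps, journal: list):
    """Completed steps are replayed from `journal`; new steps execute
    and checkpoint their result before the workflow moves on."""
    results = []
    for i, step in enumerate(steps):
        if i < len(journal):
            results.append(journal[i])   # replay: already done
        else:
            out = step()                 # execute and checkpoint
            journal.append(out)
            results.append(out)
    return results

calls = []
def make_step(n):
    def step():
        calls.append(n)
        return f"step-{n}"
    return step

steps = [make_step(i) for i in range(5)]
journal: list = []
run_workflow(steps[:3], journal)   # simulate a crash after step 3
run_workflow(steps, journal)       # restart: steps 0-2 replay, 3-4 run
print(calls)                       # each step executed exactly once
```

Temporal does this with workflow histories and deterministic replay rather than a simple list, but the contract is the same: failing at step seven never means re-running steps one through six.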
That’s the list for number 743. Heavy on Google today — nine of the thirteen articles are Google-authored or Google-adjacent. The throughline across all of them is agents moving from concept to infrastructure: protocols for how they communicate, gateways for how they scale, spend caps for how they’re governed, and durability frameworks for how they survive failures. The plumbing is getting serious. Thanks for listening.