Seroter's Daily Reading — #756 (April 3, 2026)
Listen: https://blossom.nostr.xyz/acb946d89837f9dd41369959faa9569917de414387378687801e14cb1ee9facb.mpga
Source: Seroter’s Original Post
Seroter’s Daily Reading, episode 756, April 3, 2026.
It’s almost the weekend, and Seroter mentions he’s looking forward to some downtime and a great Easter with the family. Let’s get into it.
First up, a piece from Cote on The New Bottleneck That Will Chill Your AI Vibes. The argument here is pretty compelling. AI coding tools are exploding in adoption — the 2025 DORA Report found that 95% of developers now use AI and over 80% say it’s made them more productive. But here’s the problem: that’s just developers. Once every employee in a company — marketing, operations, HR, finance — starts using agentic AI to build and deploy their own applications, you’re looking at a flood of software hitting your infrastructure that today’s CI/CD pipelines simply aren’t built to handle. Cote points out that 56% of companies report getting very little out of AI right now, and he thinks a big part of why is that without a real application platform, everything bottlenecks at deployment and runtime. The solution isn’t more developers — it’s a platform that can turn that flood into a manageable flow, handling builds, deployments, security scanning, compliance checks, all the enterprise-y stuff that keeps AI-generated apps from becoming just more sludge. It’s a strong argument for why platform engineering is only getting more important as AI coding becomes ubiquitous.
Next, we have a talk from Adrian Cockcroft at InfoQ titled Directing a Swarm of Agents for Fun and Profit. Cockcroft has a really practical, grounded perspective here. He’s been playing around with AI coding agents, and his analogy is that using them is like being a director-level manager of developers — you don’t watch everything they do, you just nag them to run the tests and do the things you asked for. The agents behave a lot like human developer teams, doing several days’ work in 15 minutes, and then you have to figure out whether they actually built what you wanted. That’s the directing part. What I found most useful were his tips for making it work: start with Python because it almost always works, use behavior-driven development to give agents more structure, maintain a context block at the top of files so agents don’t have to reverse-engineer everything from scratch, and then spend time tidying up after generation — do a little coding, spend the rest of the day cleaning it up. He also makes a good point that code reviews work better than code generation — it’s much easier to criticize something than to build it, and the agents are good at both. The bigger picture he paints is interesting: as AI agents do more of the application development, your human engineers shift to building and maintaining the platform that makes agents effective. We’re moving from hiring developers to building a development service, a platform that orchestrates AI agents on demand.
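Cockcroft's "context block" tip is easy to picture. Here is one hypothetical shape it might take at the top of a Python module — the section names and file are my own invention, not his; the idea is just that purpose, invariants, and scope are stated up front so an agent doesn't have to reverse-engineer them:

```python
"""user_billing.py — context block for AI coding agents (and humans).

Purpose:    Compute monthly invoice totals from metered usage events.
Inputs:     Line items as (user_id, units, unit_price_cents) tuples.
Invariants:
  - All money is integer cents; never use floats for currency.
  - Every public function has a matching test in test_user_billing.py.
Out of scope: tax calculation (handled by a separate billing-tax service).
"""

def invoice_total(line_items):
    """Sum line items in integer cents to avoid float rounding error."""
    return sum(units * unit_price for _, units, unit_price in line_items)
```

A reader — human or agent — gets the constraints from the first screenful instead of inferring them from the code below.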
Speaking of platforms, let’s talk about Backstage is dead. This is a sharp piece arguing that while Backstage pioneered the internal developer portal category, the world has moved on. The problems with Backstage are real: you have to be a React developer to extend it, you can’t bring your own data model, plugin data stays siloed so you can’t ask cross-cutting questions like which of my services had incidents that correlated with deployments, and the cost is brutal — six to twelve months to stand up, two to five full-time engineers to maintain, true cost around 150K per twenty developers. Spotify reports 99% internal adoption but their own VP of Engineering acknowledged external orgs average around 10%. But the bigger point is that AI has changed what you should expect from an internal developer platform. In 2020, building a portal UI was genuinely hard and Backstage gave you a head start. In 2026, AI builds that UI in minutes. The value isn’t in the UI anymore — it’s in the context lake, the agent registry, the guardrails, the governance, the audit trails, the orchestration layer that agents and humans both need. Backstage is adding AI features at the edges but the center hasn’t moved. Getting from where they are to where they need to be isn’t a version bump — it’s a re-architecture.
Moving on, we have something more practical from the Google Developers Blog about Supporting Google Account username change in your app. US users can now change their Gmail usernames, and if your app identifies users by email address instead of subject ID, this is going to cause problems. When a user changes their username and revokes an OAuth grant, Google will give you the new email address on subsequent logins, which means your app might fail to recognize the user, potentially creating duplicate accounts and losing access to existing data. The recommendation is straightforward: use subject ID as your primary user identifier, provide email-based account recovery, and enable email updates for users who signed up with traditional email and password accounts. This is a small but important thing to check in your auth setup.
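As a sketch of that recommendation — assuming you have already verified the Google ID token and hold its claims as a dict, and using plain dicts as hypothetical stand-ins for your user store — the lookup keyed on the stable `sub` claim might look like this:

```python
def resolve_user(claims: dict, users_by_sub: dict, users_by_email: dict):
    """Return the existing user record for a verified ID token's claims.

    `sub` is Google's stable subject identifier; `email` can change when
    a user renames their Gmail username, so email is only a recovery hint
    for legacy rows, never the primary key.
    """
    user = users_by_sub.get(claims["sub"])
    if user is not None:
        # Known user: refresh the stored email in case it changed.
        user["email"] = claims.get("email", user["email"])
        return user
    # Fall back to email only to migrate accounts created before
    # subject-ID keying; backfill the stable key on first match.
    legacy = users_by_email.get(claims.get("email"))
    if legacy is not None:
        legacy["sub"] = claims["sub"]
        users_by_sub[claims["sub"]] = legacy
    return legacy
```

Once the `sub` backfill has happened, a later login with a renamed email still resolves to the same account, which is exactly the failure mode the post warns about.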
Now for something more philosophical. Developer relations after the cheat code machine takes a look at why educational content and course sales seem softer even though more code is being produced than ever. The author’s theory is that people weren’t really buying courses because they wanted to learn APIs in the narrow sense — they were buying a model for how to do the job: how to structure things, how to debug, how to choose between options, how to ship. AI coding tools have crashed into that layer. If you can point a coding agent at the docs and get working code, paying for a course starts to look less appealing. But the author thinks the object of learning has moved up a layer. What people want now isn’t just explanation — they want to see how someone actually navigates uncertainty, where they pause, what they inspect closely, what they ignore, what they decide is good enough. They want witnessed practice, judgment in motion. The author draws an analogy to learning by sitting next to an experienced engineer and watching how they work — the tacit stuff too small and too implicit to package as a course. The argument is that taste is becoming more important because code is getting cheaper. If an agent can generate ten plausible solutions, the scarce skill is being able to tell which one is brittle, which one will be miserable to maintain, which one is actually the right tradeoff for the moment you’re in. DevRel shifts from explaining features to showing how a thoughtful person uses them in a real workflow.
Let’s shift to something more hands-on. Creating a Wikipedia MCP Server in Java in a Few Prompts with Skills shows how quickly you can build useful tooling with modern AI. The author used Gemini CLI to build a complete Wikipedia MCP server in just a few prompts — exploring the Wikimedia REST API, finding an HTML-to-Markdown converter, scaffolding the code with JBang and LangChain4j. The resulting Java code worked out of the box with no manual adjustments. The whole thing took less than five minutes, and the author notes it took more time to write the article than to build the actual tool. This is a good demonstration of how the combination of AI coding tools, skills, and modern runtimes like JBang makes building tooling fluid and fast.
Google released Gemma 4 this week, and the Hugging Face blog post is excellent. Welcome Gemma 4: Frontier multimodal intelligence on device. Gemma 4 is a family of multimodal models — they handle image, text, and audio inputs — with sizes ranging from 2.3 billion to 31 billion parameters. The larger models support up to 256K context, and the architecture includes some interesting technical work like per-layer embeddings that give each decoder layer its own signal channel without adding much parameter overhead, and shared KV cache that reuses key-value states from earlier layers to save compute and memory. The benchmarks are impressive — the 31B dense model achieves an estimated LM Arena score of 1452, and the 26B mixture-of-experts model reaches 1441 with just 4 billion active parameters. But what really stands out is the multimodal performance out of the box. These models can do object detection and bounding box identification, GUI element detection, video understanding, image captioning, and audio question answering — all without special fine-tuning. The E2B and E4B models also support audio input, which is notable for small models. The whole family ships under Apache 2.0 license, which is genuinely open.
The Android Developers Blog has a companion piece on Gemma 4 and its impact for Android developers. Gemma 4: The new standard for local agentic intelligence on Android. Android Studio gets local AI code assistance with Gemma 4, trained on Android development and designed with Agent Mode in mind, so you can leverage the full suite of agentic capabilities for refactoring, building features, applying fixes iteratively. On the device side, Gemma 4 is the base model for the next generation of Gemini Nano — Gemini Nano 4 — which is up to 4x faster than the previous version and uses up to 60% less battery. The AICore Developer Preview lets you prototype with Gemma 4 E2B and E4B models directly on supported devices. It’s a nice showcase of local-first agentic intelligence under an open license.
Now for something a little uncomfortable. We Trained 47 Models and Lost the Best One. Then We Found Vertex AI Experiments. This piece covers the reproducibility problem in machine learning. A 2025 survey found that 90% of researchers acknowledge a reproducibility crisis, and an estimated 63% of ML project failures trace back to irreproducible experiments. The author walks through the progression most teams go through — starting with Jupyter notebooks, moving to spreadsheets, then naming conventions, then the “we should use MLflow” discussion that never really gets implemented, and finally the audit where nobody can answer what model is in production or why. The solution is structured experiment tracking: log parameters, metrics, artifacts, metadata, and environment for every run. The piece shows a minimal example with Vertex AI Experiments — just six extra lines around existing training code. Then you can sort all your runs by any metric with a single query, and register the best model with full lineage back to the experiment run that produced it. Teams that implement proper experiment tracking report 60 to 80% reduction in experiment debugging time.
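The Vertex AI SDK calls themselves are in the post; to show the shape of the pattern without needing a GCP project, here's a minimal local stand-in in plain Python — explicitly not the Vertex AI API, just the same idea: every run logs its params and metrics, and "the best run" becomes a query instead of a memory:

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    name: str
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)

class Experiment:
    """Tiny local analogue of structured experiment tracking."""
    def __init__(self):
        self.runs = []

    def start_run(self, name):
        run = Run(name)
        self.runs.append(run)
        return run

    def best_run(self, metric, maximize=True):
        # Any metric becomes a one-line query across all runs.
        scored = [r for r in self.runs if metric in r.metrics]
        key = lambda r: r.metrics[metric]
        return max(scored, key=key) if maximize else min(scored, key=key)

exp = Experiment()
for i, lr in enumerate([1e-2, 1e-3, 1e-4]):
    run = exp.start_run(f"run-{i}")
    run.params["lr"] = lr
    run.metrics["val_auc"] = [0.81, 0.91, 0.86][i]  # stand-in for training

best = exp.best_run("val_auc")
```

With real Vertex AI Experiments the flow is the same shape: initialize an experiment, start a run, log params and metrics during training, then query runs by metric and register the winner with its lineage.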
Next up is a security story: The Axios hack exposes AI-coding’s dependency problem. If you missed this, attackers compromised the popular JavaScript library Axios by breaching a maintainer’s npm account, injecting malicious code that harvested sensitive developer data before being pulled. This came just days after a similar attack on LiteLLM’s PyPI package. The breach underscores how deeply embedded open-source dependencies have become in the AI development process, and how much risk that creates. The concern is that vibe coders — people who aren’t trained developers — are using AI coding tools to build production software without understanding the dependency chains being installed. AI tools have a tendency to over-engineer solutions with more dependencies than needed, which expands the attack surface. Attackers have recognized the power they can wield by compromising popular packages — if you can get into something like Axios, you potentially reach hundreds of millions of downloads. Defenders, for their part, don’t yet have fundamentally new protections to lean on. The answer is a combination of better funding for critical open source projects, training developers to check twice and deploy once, and implementing security guardrails that prevent the installation of malicious code even when your dependencies pull it in transitively.
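One cheap guardrail in that spirit — sketched here as a hypothetical pre-install check, not anything the article prescribes — is refusing a Python requirements file unless every dependency is pinned to an exact version; real setups would layer hash pinning and a vetted allowlist on top:

```python
import re

# Exact pins only: "name==version". Ranges like ">=" are rejected.
PIN = re.compile(r"^[A-Za-z0-9._-]+==[A-Za-z0-9.+!_-]+$")

def unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that aren't pinned with '=='.
    Comments and blank lines are ignored."""
    bad = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line and not PIN.match(line):
            bad.append(line)
    return bad
```

Run in CI before `pip install`, a non-empty result fails the build — which at least forces a human decision before an AI-suggested, floating dependency lands in production.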
Finally, we have Run real-time and async inference on the same infrastructure with GKE Inference Gateway. This addresses a real operational pain point: enterprises typically face a binary choice between optimizing for high-concurrency low-latency real-time requests or for high-throughput async processing. Traditionally these require separate, siloed GPU and TPU clusters, leading to idle capacity during off-peak hours and fragmented resource management. GKE Inference Gateway treats accelerator capacity as a single fluid resource pool that can serve both patterns. For real-time inference, it does latency-aware scheduling based on real-time metrics like KV cache status to minimize time-to-first-token. For async workloads, it integrates with Cloud Pub/Sub through a Batch Processing Agent that treats batch tasks as filler, using idle accelerator capacity between real-time spikes. Real-time traffic always takes precedence, and the integration is plug-and-play — you just configure a Pub/Sub topic and the agent handles the routing. It’s a practical solution for teams that need to serve both patterns without maintaining separate infrastructure.
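The "batch as filler" idea is easy to model. This is a toy priority queue, not the gateway's implementation: real-time requests always dequeue ahead of batch tasks, so batch work only runs when no real-time work is waiting:

```python
import heapq
import itertools

REALTIME, BATCH = 0, 1          # lower number = higher priority
_counter = itertools.count()    # FIFO tie-break within each class

class FillerScheduler:
    """Toy model of real-time-first scheduling: batch tasks are
    'filler' that only runs when no real-time request is queued."""
    def __init__(self):
        self._heap = []

    def submit(self, task, priority):
        heapq.heappush(self._heap, (priority, next(_counter), task))

    def next_task(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

sched = FillerScheduler()
sched.submit("batch-embed-corpus", BATCH)
sched.submit("chat-request-1", REALTIME)
sched.submit("batch-summarize", BATCH)
sched.submit("chat-request-2", REALTIME)

order = [sched.next_task() for _ in range(4)]
```

The gateway layers real signals on top of this basic precedence — KV cache status, time-to-first-token — and pulls its filler tasks from a Pub/Sub topic rather than an in-process queue, but the scheduling intuition is the same.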
That brings us to the end of episode 756. A few threads running through this set: platform engineering keeps showing up as the critical layer — whether it’s handling AI deployment velocity, serving as the context layer for agents, or unifying inference patterns. There’s also a consistent theme around the growing importance of judgment and taste as code generation gets cheaper. And the supply chain security question — particularly around AI-generated code pulling in heavy dependency chains — is one worth keeping on your radar. Hope everyone has a good weekend and a great Easter.
Sources:
- The New Bottleneck That Will Chill Your AI Vibes — Cote
- Directing a Swarm of Agents for Fun and Profit — InfoQ / Adrian Cockcroft
- Backstage is dead — Port newsletter
- Supporting Google Account username change in your app — Google Developers Blog
- Developer relations after the cheat code machine — Sunil Pai
- Creating a Wikipedia MCP Server in Java in a Few Prompts with Skills — Guillaume Laforge
- Welcome Gemma 4: Frontier multimodal intelligence on device — Hugging Face
- Gemma 4: The new standard for local agentic intelligence on Android — Android Developers Blog
- We Trained 47 Models and Lost the Best One. Then We Found Vertex AI Experiments — Medium / Sid
- The Axios hack exposes AI-coding’s dependency problem — LeadDev
- Run real-time and async inference on the same infrastructure with GKE Inference Gateway — Google Cloud Blog