Node.js · Episode 5

The Node.js Architecture Playbook: APIs, Workers, Real-Time Systems, and Production Decisions Today

A long-form Node.js podcast episode about architecture decisions today: when to use Node.js for APIs, workers, real-time systems, serverless functions, CLIs, and backend platforms. The episode covers LTS planning, Node.js 24, native TypeScript execution, node:test, permissions, Web APIs, performance, observability, security, AI-assisted development, and how teams should decide what belongs inside a Node.js service.

Host: Mykhailo D., Senior Full-Stack Engineer (React, Node and Modern Frameworks)

Guest: Omar Siddiqui, Principal Backend Architect, RelayStack Engineering

Original editorial from Softaims, published in a podcast-style layout—details, show notes, timestamps, and transcript—so the guidance is easy to scan and reference. The host is a developer from our verified network with experience in this stack; the full text is reviewed and edited for accuracy and clarity before it goes live.

Details

This is episode 5 of the Node.js podcast category.

The conversation focuses on how teams should make backend architecture decisions with Node.js today.

The episode discusses APIs, workers, real-time systems, queues, serverless functions, CLIs, dependency discipline, observability, performance, and security.

The transcript is edited as a conversational episode of about 55 minutes.

Show notes

  • Why Node.js architecture matters more than runtime hype
  • How to decide what belongs in a Node.js service
  • Node.js for APIs, workers, real-time systems, CLIs, and serverless
  • Why LTS strategy is an architecture concern
  • Node.js 24 and the stronger native platform direction
  • Native TypeScript execution and why type checking still belongs in CI
  • Testing architecture with node:test
  • Permission Model and safer service boundaries
  • Web APIs, fetch, URLPattern, WebSocket, streams, and AbortController
  • Performance tradeoffs: event loop, CPU work, queues, and databases
  • Observability: AsyncLocalStorage, logs, traces, and metrics
  • Security, npm supply-chain risk, and dependency review
  • AI-assisted architecture work and why humans still own decisions
  • When Node.js is a strong architecture choice
  • When Node.js is not the right tool
  • A practical architecture checklist for Node.js teams

Timestamps

  • 0:00 Cold open: Node.js architecture is not just choosing a framework
  • 3:30 Why this episode focuses on architecture decisions
  • 7:00 What Node.js is best at today
  • 11:30 APIs, backend-for-frontend layers, and service boundaries
  • 16:00 Workers, queues, and background processing
  • 20:30 Real-time systems, WebSocket clients, and event-driven products
  • 25:00 LTS planning, Node.js 24, and runtime decisions
  • 29:30 TypeScript type stripping, node:test, and native tooling
  • 34:00 Permission Model, security boundaries, and dependency discipline
  • 39:00 Performance architecture: event loop, CPU, memory, and database pressure
  • 44:00 Observability architecture with logs, traces, metrics, and async context
  • 49:00 AI-assisted architecture and code generation
  • 52:30 Final architecture checklist for Node.js teams
  • 55:00 End

Transcript

[0:00]Mykhailo: Welcome back to the Node.js stack podcast. This is episode five, and today we are looking at Node.js from a different angle. We are not asking only what is new in Node.js. We are asking how teams should make architecture decisions with Node.js today.

[0:38]Mykhailo: Because choosing Node.js is not an architecture by itself. Choosing Express, Fastify, Nest, a queue, a database, a serverless platform, or a monorepo is not automatically an architecture either. Those are tools. Architecture is the set of decisions that explain where work belongs, how systems communicate, how failures behave, how teams deploy, and how the product survives growth.

[1:30]Mykhailo: That distinction matters because Node.js makes it very easy to start. You can create an API quickly. You can write a worker quickly. You can build a CLI quickly. You can wire up a queue, call an external API, ship a webhook handler, or create a real-time endpoint quickly. And that speed is useful. But speed can also hide weak boundaries.

[2:20]Mykhailo: A Node.js system can look simple at the beginning and become confusing later. Routes start calling databases directly. Background jobs start sharing code with API handlers in strange ways. Retry logic appears in five different places. Logs become inconsistent. Error handling depends on who wrote the endpoint. And suddenly the team is not building a backend anymore. They are maintaining a pile of decisions nobody remembers making.

[3:30]Mykhailo: So today we are building a Node.js architecture playbook. Not a perfect universal blueprint, because that does not exist. But a practical way to think about APIs, workers, real-time systems, serverless functions, native Node features, testing, permissions, performance, observability, and AI-assisted development.

[4:05]Mykhailo: To help us walk through it, I am joined by Omar Siddiqui, principal backend architect at RelayStack Engineering. Omar works with teams designing Node.js systems for SaaS products, developer platforms, internal tools, event-driven workflows, and real-time applications. Omar, welcome.

[4:35]Omar Siddiqui: Thanks for having me. I like this topic because a lot of Node.js conversations focus on frameworks or features, but architecture is where the real cost shows up. A bad architecture can make a good runtime feel bad. A good architecture can make Node.js feel extremely productive and reliable.

[5:12]Mykhailo: What is the most common architecture mistake you see in Node.js teams?

[5:20]Omar Siddiqui: The biggest one is treating the first working version as the final shape. A team builds an API quickly, and because it works, they keep layering features onto that same shape. But the first shape is often optimized for speed, not clarity. Later, they need background jobs, webhooks, reporting, audit logs, retry behavior, permissions, and multi-tenant logic. If the original boundaries were weak, everything starts leaking into everything else.

[6:10]Mykhailo: So the issue is not that Node.js scales badly. The issue is that unclear decisions scale badly.

[6:18]Omar Siddiqui: Exactly. Node can scale very well for the right workloads. But unclear ownership, unbounded dependencies, weak tests, and hidden coupling will scale badly in any language. Node just lets you reach that point quickly because development is fast.

[7:00]Mykhailo: Let us start with what Node.js is best at. When you look at the runtime today, where does it shine?

[7:12]Omar Siddiqui: Node shines in I/O-heavy systems. APIs, backend-for-frontend layers, real-time coordination, serverless handlers, webhook processors, automation tools, CLIs, developer platforms, internal services, and queue workers are all strong fits. It is especially powerful when the team already works heavily in JavaScript or TypeScript.

[7:58]Mykhailo: Why I/O-heavy specifically?

[8:04]Omar Siddiqui: Because Node's event-driven model is very good at managing many concurrent operations where the service spends a lot of time waiting on databases, caches, HTTP calls, queues, file systems, or sockets. It can keep many things moving without dedicating a thread to each request.

[8:48]Mykhailo: And where should teams be more careful?

[8:53]Omar Siddiqui: Heavy CPU work. Large synchronous computation. Image processing. Video processing. Huge data transformations. Expensive cryptography. Massive JSON parsing. PDF generation at scale. Machine learning inference inside the same process. Node can do some of this with worker threads or external services, but you need to design for it. You should not casually put CPU-heavy work into the main request path.

[9:50]Mykhailo: So Node.js can still be part of those systems, but maybe not the part doing the heavy computation.

[9:58]Omar Siddiqui: Correct. Node can orchestrate. It can receive the request, validate it, store metadata, enqueue work, stream results, notify users, and call specialized services. But sometimes the actual heavy computation belongs somewhere else. Architecture is knowing where work belongs.

[10:48]Mykhailo: That is a good sentence: architecture is knowing where work belongs.

[10:55]Omar Siddiqui: Yes, and in Node.js that question matters a lot because the runtime is flexible enough to let you put everything in one place. Flexibility is helpful only when you also have judgment.

[11:30]Mykhailo: Let us talk about APIs. Node.js is probably most commonly associated with building APIs. What does a good Node API architecture look like today?

[11:45]Omar Siddiqui: A good Node API has clear layers, but not too many layers. You want routing, validation, authentication, authorization, business logic, data access, and external service calls to be understandable. The route handler should not become a dumping ground. It should coordinate, not contain every decision.

[12:35]Mykhailo: What should not live inside the route handler?

[12:40]Omar Siddiqui: Complex business rules, raw SQL scattered everywhere, direct calls to multiple external systems without boundaries, retry logic copied across endpoints, authorization checks hidden deep in random helper functions, and huge response-shaping logic that nobody can test independently.

[13:25]Mykhailo: So what should the route handler do?

[13:30]Omar Siddiqui: It should accept the request, rely on shared middleware or utilities for cross-cutting concerns, call a clear application service or use-case function, and return a response. It can do some orchestration, but it should not be the entire application.
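
A minimal sketch of the thin-handler shape described here, written without any framework. The `HttpRequest`, `HttpResponse`, `createNote`, and `handleCreateNote` names are hypothetical, and the in-memory array is only a stand-in for a real data layer.

```typescript
// Hypothetical, framework-free request and response shapes.
type HttpRequest = { userId: string | null; body: unknown };
type HttpResponse = { status: number; body: unknown };

type Note = { id: number; ownerId: string; text: string };

// Use-case function: owns the business rule and knows nothing about HTTP.
function createNote(notes: Note[], ownerId: string, text: string): Note {
  const trimmed = text.trim();
  if (trimmed.length === 0) throw new RangeError("note text must not be empty");
  const note: Note = { id: notes.length + 1, ownerId, text: trimmed };
  notes.push(note);
  return note;
}

// Route handler: checks the caller, checks the input shape, delegates, maps the result.
function handleCreateNote(req: HttpRequest, notes: Note[]): HttpResponse {
  if (req.userId === null) {
    return { status: 401, body: { error: "unauthenticated" } };
  }

  const text = (req.body as { text?: unknown } | null)?.text;
  if (typeof text !== "string") {
    return { status: 400, body: { error: "text must be a string" } };
  }

  try {
    return { status: 201, body: createNote(notes, req.userId, text) };
  } catch (err) {
    if (err instanceof RangeError) {
      return { status: 400, body: { error: err.message } };
    }
    throw err; // unexpected failures are left to shared error handling
  }
}
```

Because `createNote` takes plain arguments, it can be tested directly, and the handler stays small enough to read in one pass.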

[14:12]Mykhailo: That sounds like classic backend architecture, not Node-specific.

[14:17]Omar Siddiqui: Exactly. Good backend design is not language-specific. But Node teams sometimes skip it because the framework makes the first endpoint so easy. The temptation is to keep adding logic where the endpoint started.

[15:00]Mykhailo: Where do backend-for-frontend layers fit?

[15:06]Omar Siddiqui: Node is excellent for backend-for-frontend, especially when frontend and backend teams share TypeScript knowledge. A BFF can shape data for a specific client, handle session-aware logic, call internal APIs, and reduce frontend complexity. But it should not become a hidden monolith where every business rule gets duplicated.

[15:55]Mykhailo: So the BFF is helpful when it adapts data, dangerous when it becomes the unofficial core backend.

[15:58]Omar Siddiqui: Exactly. It should protect the client experience without stealing ownership from the domain services.

[16:00]Mykhailo: Now workers and queues. Many Node systems eventually add background processing. What should teams understand before doing that?

[16:20]Omar Siddiqui: They should understand that a queue is not a magic reliability button. A queue changes the shape of failure. Instead of failing immediately in the API request, work may fail later in a worker. That can be great, but now you need retries, dead-letter queues, idempotency, visibility, monitoring, and clear ownership.
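
To make the "shape of failure" concrete, here is a small sketch of bounded retries with a dead-letter list. It is synchronous and in-memory on purpose. `Job`, `Outcome`, and `runJob` are illustrative names rather than any queue library's API, and a real worker would normally also wait between attempts.

```typescript
type Job = { id: string; payload: string };

type Outcome =
  | { status: "done"; attempts: number }
  | { status: "dead-lettered"; attempts: number; lastError: string };

// Try a job up to `maxAttempts` times; park it in `deadLetters` if every attempt fails.
function runJob(
  job: Job,
  handler: (job: Job) => void,
  maxAttempts: number,
  deadLetters: Job[],
): Outcome {
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(job);
      return { status: "done", attempts: attempt };
    } catch (err) {
      // Remember why the attempt failed so the dead-letter record is debuggable.
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  deadLetters.push(job);
  return { status: "dead-lettered", attempts: maxAttempts, lastError };
}
```

The point of the dead-letter list is that a job which keeps failing ends up somewhere a person can see it, instead of retrying forever or disappearing.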

[17:10]Mykhailo: Idempotency comes up a lot. Explain it simply.

[17:16]Omar Siddiqui: Idempotency means the same operation can happen more than once without causing duplicate damage. If a payment webhook is delivered twice, you should not charge twice. If an email job retries, you should not send ten copies. If a data import resumes, it should not duplicate records. In distributed systems, duplicate messages happen. Your architecture needs to expect that.
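
The duplicate-webhook case can be sketched as a handler that remembers which event IDs it has already applied. The `WebhookEvent` shape and `IdempotentProcessor` class are assumptions for illustration. The in-memory `Set` stands in for durable storage, such as a database table with a unique constraint on the event ID.

```typescript
type WebhookEvent = { id: string; amountCents: number };

class IdempotentProcessor {
  // In-memory record of applied event IDs (a real system needs durable storage).
  private readonly seen = new Set<string>();
  private totalCents = 0;

  // Returns true if the event was applied, false if it was a duplicate delivery.
  handle(event: WebhookEvent): boolean {
    if (this.seen.has(event.id)) return false;
    this.seen.add(event.id);
    this.totalCents += event.amountCents;
    return true;
  }

  get total(): number {
    return this.totalCents;
  }
}
```

Delivering the same event twice leaves `total` unchanged after the first call, which is the property the retry and redelivery paths rely on.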

[18:15]Mykhailo: What makes Node.js a good fit for workers?

[18:20]Omar Siddiqui: Node is good for workers that coordinate I/O: reading jobs, calling APIs, writing to databases, sending notifications, moving data between systems, processing webhooks, generating small reports, or managing workflow steps. It is productive, easy to deploy, and works well with TypeScript.

[19:05]Mykhailo: And where should teams be careful with workers?

[19:10]Omar Siddiqui: CPU-heavy jobs, huge file processing, memory-heavy exports, and unbounded concurrency. If a worker pulls too many jobs at once, you can overload your database or external APIs. Worker architecture needs backpressure. It needs concurrency limits. It needs monitoring.
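
One way to express a concurrency limit is a fixed number of "lanes" that each pull the next job only after finishing the previous one. This is a sketch under simple assumptions: `runWithLimit` is a hypothetical helper, not a library function, and it stops at the first job that rejects.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function runWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane claims the next unprocessed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

With `limit` set to match what the database or downstream API can absorb, a burst of jobs queues up inside the worker instead of hitting the dependency all at once.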

[20:00]Mykhailo: So even background work has production design.

[20:05]Omar Siddiqui: Absolutely. Background does not mean invisible. It means users may not be watching directly, which actually makes observability more important.

[20:30]Mykhailo: Real-time systems are another Node.js strength. Chat apps, dashboards, notifications, collaboration, live logs, trading screens, multiplayer-style coordination. What should teams consider here?

[20:48]Omar Siddiqui: Real-time architecture is where people underestimate state. Opening a WebSocket is easy. Operating a real-time system is not. You need connection management, authentication, authorization, presence, reconnect behavior, backpressure, message ordering, fan-out, rate limits, and horizontal scaling.

[21:40]Mykhailo: Why is horizontal scaling tricky?

[21:45]Omar Siddiqui: Because a user's connection lives on one process or one instance, but events may be produced elsewhere. If you run multiple Node instances, you need a way to route or broadcast events across them. That might involve Redis pub/sub, a message broker, a dedicated real-time platform, or a custom event backbone.
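
The routing problem can be sketched with two in-process "instances" sharing an event bus. Everything here is a stand-in: `EventBus` plays the role of Redis pub/sub or a broker, each inbox array plays the role of an open WebSocket, and the class names are hypothetical.

```typescript
type Listener = (userId: string, message: string) => void;

// In-memory stand-in for Redis pub/sub or a message broker.
class EventBus {
  private readonly listeners: Listener[] = [];

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  publish(userId: string, message: string): void {
    for (const listener of this.listeners) listener(userId, message);
  }
}

// One Node process. Each inbox array stands in for an open WebSocket.
class RealtimeInstance {
  private readonly bus: EventBus;
  private readonly inboxes = new Map<string, string[]>();

  constructor(bus: EventBus) {
    this.bus = bus;
    // Every instance hears every event and delivers only to its own connections.
    bus.subscribe((userId, message) => {
      this.inboxes.get(userId)?.push(message);
    });
  }

  connect(userId: string): string[] {
    const inbox: string[] = [];
    this.inboxes.set(userId, inbox);
    return inbox;
  }

  // The producer does not need to know which instance holds the connection.
  notify(userId: string, message: string): void {
    this.bus.publish(userId, message);
  }
}
```

A user connected to one instance still receives an event produced on another, because delivery goes through the shared bus rather than through local state.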

[22:35]Mykhailo: So real-time is not just WebSocket code.

[22:40]Omar Siddiqui: Exactly. WebSocket is the transport. The architecture is everything around it.

[23:15]Mykhailo: Node has also been improving native Web APIs, including WebSocket-related support and browser-aligned primitives. How does that change things?

[23:25]Omar Siddiqui: It helps because developers get more standard primitives. Native fetch, URL handling, URLPattern, streams, AbortController, and WebSocket support reduce the need for small wrappers in simple cases. But again, primitives are not architecture. They make good architecture easier; they do not create it automatically.

[24:20]Mykhailo: That theme keeps coming back: the runtime is stronger, but the team still needs judgment.

[24:28]Omar Siddiqui: Yes. Modern Node gives you better building blocks. It does not decide the building design.

[25:00]Mykhailo: Let us talk runtime strategy. Node.js 24 is the LTS line right now, and Node.js 25 is Current. How should architecture teams treat versions?

[25:15]Omar Siddiqui: They should treat Node version choice as part of architecture, not just tooling. The runtime affects security support, performance behavior, native APIs, dependency compatibility, container images, serverless support, and developer workflow. Production systems should normally live on supported LTS lines.

[26:05]Mykhailo: What does a bad version strategy look like?

[26:10]Omar Siddiqui: A bad strategy is whatever version happened to be installed on one developer's machine when the project started. Or staying on an old version because nobody wants to touch the upgrade. Or jumping to Current in production without understanding support timelines and dependency behavior.

[27:00]Mykhailo: What does a good strategy look like?

[27:05]Omar Siddiqui: Pin versions clearly. Use an active LTS line for production. Track support windows. Test new LTS lines before you are forced to upgrade. Keep CI and container images aligned. Watch security releases. And make runtime upgrades boring.
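
Pinning usually lives in configuration, but a startup guard can make a mismatch fail loudly. The helper below is a hypothetical sketch. `isSupportedNode` and the minimum major of 24 are assumptions for illustration, not an official policy.

```typescript
// True when `version` (for example "24.1.0" or "v24.1.0") is at least `minMajor`.
function isSupportedNode(version: string, minMajor: number): boolean {
  const majorText = version.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorText, 10);
  // Unparseable versions yield NaN, which is treated as unsupported.
  return Number.isInteger(major) && major >= minMajor;
}
```

A service could call this once at boot with `process.versions.node` and exit with a clear message when it returns false, so a drifted container image or CI runner is caught before traffic arrives.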

[27:55]Mykhailo: "Make upgrades boring" is a good goal.

[28:00]Omar Siddiqui: Yes. Boring upgrades mean the team has tests, observability, deployment safety, and dependency discipline. Painful upgrades usually reveal missing engineering foundations.

[28:45]Mykhailo: Node.js also announced a release schedule change starting with 27.x, moving toward one major release per year. Does that affect architecture?

[28:55]Omar Siddiqui: Indirectly, yes. A clearer release rhythm helps planning. But it does not remove responsibility. Teams still need upgrade windows, compatibility testing, and awareness of which release line is supported. Runtime lifecycle is part of system lifecycle.

[29:30]Mykhailo: Native tooling is next. TypeScript type stripping, node:test, and other built-in capabilities are changing how teams build Node projects. Start with TypeScript.

[29:45]Omar Siddiqui: Native TypeScript type stripping is useful because Node can execute TypeScript files that contain erasable TypeScript syntax without a heavy transpilation step. That helps scripts, CLIs, migrations, simple services, examples, and test helpers. But it does not perform type checking.

[30:35]Mykhailo: So type stripping is execution convenience, not correctness.

[30:42]Omar Siddiqui: Exactly. Serious teams should still run type checking in CI. Also, TypeScript does not validate runtime data. A request body, queue message, webhook payload, or database record can still be wrong at runtime. You still need validation.
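
A short sketch of the gap between static types and runtime data: the type guard below checks a queue message before the rest of the code trusts it. The `EmailJob` shape and both function names are hypothetical. Many teams use a schema library for this, and the hand-written guard is only meant to show the idea.

```typescript
type EmailJob = { to: string; subject: string };

// Runtime check that narrows `unknown` to `EmailJob` only when the shape matches.
function isEmailJob(value: unknown): value is EmailJob {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["to"] === "string" && typeof record["subject"] === "string";
}

// Parse a raw message; the type annotation alone would not reject a bad payload.
function parseEmailJob(raw: string): EmailJob {
  const parsed: unknown = JSON.parse(raw);
  if (!isEmailJob(parsed)) throw new TypeError("invalid email job payload");
  return parsed;
}
```

The code uses only type aliases and annotations, which is the kind of erasable syntax that type stripping can remove without a separate build step.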

[31:30]Mykhailo: What about node:test?

[31:35]Omar Siddiqui: The built-in test runner is a strong default for many backend projects. It reduces dependency weight and gives teams a native testing path. For new APIs, workers, utilities, CLIs, and internal tools, it is worth considering before reaching for a larger framework.

[32:20]Mykhailo: Would you replace every existing test stack with node:test?

[32:25]Omar Siddiqui: No. Migration should have a reason. If a Jest or Vitest suite works well, keep it. But for new services, start simple. Add complexity when it solves a real problem.

[33:05]Mykhailo: What should architecture tests prove?

[33:10]Omar Siddiqui: They should prove boundaries. Can the API validate input? Does authorization work? Do workers handle duplicate messages? Are retries safe? Does the service behave correctly when a dependency fails? Can the business logic be tested without spinning up the entire app? These are architecture questions, not just test questions.
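
As an illustration of testing business logic without starting the application, here is a small `node:test` file. The order-cancellation rule and its types are invented for the example; only `test` from `node:test` and `strictEqual` from `node:assert` are Node APIs.

```typescript
import { test } from "node:test";
import { strictEqual } from "node:assert";

type Order = { status: "pending" | "shipped" | "cancelled" };
type Actor = { role: "customer" | "admin" };

// Hypothetical business rule kept free of HTTP, database, and framework code.
function canCancelOrder(order: Order, actor: Actor): boolean {
  if (order.status === "cancelled") return false; // nothing left to cancel
  if (order.status === "shipped") return actor.role === "admin";
  return true;
}

test("customers can cancel pending orders", () => {
  strictEqual(canCancelOrder({ status: "pending" }, { role: "customer" }), true);
});

test("only admins can cancel shipped orders", () => {
  strictEqual(canCancelOrder({ status: "shipped" }, { role: "customer" }), false);
  strictEqual(canCancelOrder({ status: "shipped" }, { role: "admin" }), true);
});

test("cancelled orders cannot be cancelled again", () => {
  strictEqual(canCancelOrder({ status: "cancelled" }, { role: "admin" }), false);
});
```

Saved under a name such as `order-rules.test.ts`, the file could be run with `node --test order-rules.test.ts` on a Node release where type stripping is enabled. In a real project the rule would live in its own module and be imported by the test.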

[34:00]Mykhailo: Permissions and security boundaries are next. Node's Permission Model lets teams control what resources a process can access. How should architects think about that?

[34:15]Omar Siddiqui: They should think in terms of least privilege. A process should only have the access it needs. That is true for cloud roles, databases, file systems, environment variables, child processes, and network access. The Permission Model gives Node teams another way to express that boundary.

[35:05]Mykhailo: Where is the best first use case?

[35:10]Omar Siddiqui: Scripts and tools. Build scripts, migration scripts, plugin runners, local automation, data import tools, report generators, and CI jobs. These often run with too much access. If a tool only needs one directory, restrict it. If it does not need child process access, block it.
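
For a script, the restriction is applied when the process is launched, for example with `node --permission --allow-fs-read=./data import.js` (older releases spelled the first flag `--experimental-permission`, so check the documentation for your version). Inside the process, `process.permission.has()` reports what was granted. The sketch below wraps that call; `PermissionApi` is a hand-written type and `canReadPath` is a hypothetical helper.

```typescript
// Hand-written shape of the one method used here, so the sketch does not
// depend on a particular version of the Node type definitions.
type PermissionApi = { has(scope: string, reference?: string): boolean };

function permissionApi(): PermissionApi | undefined {
  const host = globalThis as unknown as {
    process?: { permission?: PermissionApi };
  };
  return host.process?.permission;
}

// True when the process may read `path`.
function canReadPath(path: string): boolean {
  const permission = permissionApi();
  // Assumption: when the Permission Model is not enabled, the API is absent
  // and reads are not restricted by Node.
  if (permission === undefined) return true;
  return permission.has("fs.read", path);
}
```

A tool could call `canReadPath` before touching a directory and print a clear message about the missing grant, rather than failing partway through with a permission error.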

[36:00]Mykhailo: What about production APIs?

[36:05]Omar Siddiqui: It can help there too, but you need to test carefully because production services often have legitimate access needs. Start with smaller surfaces, learn the behavior, and then apply it where the boundary is clear. Security features should be adopted thoughtfully, not blindly.

[36:55]Mykhailo: Dependency discipline also belongs in architecture, right?

[37:00]Omar Siddiqui: Absolutely. Dependencies shape architecture. A framework shapes how the team structures code. An ORM shapes data access. A queue library shapes background processing. An auth package shapes security behavior. Even small utilities add maintenance and risk. npm is powerful, but dependency choices are architecture choices.

[37:55]Mykhailo: What should a dependency review ask?

[38:00]Omar Siddiqui: Does Node already provide this capability? Is the package maintained? How many transitive dependencies does it bring? Does it run install scripts? Is the license acceptable? Does it touch security-sensitive code? If it disappears tomorrow, how painful is the replacement?

[38:50]Mykhailo: That last one is harsh but useful.

[38:55]Omar Siddiqui: It is useful because packages are not free. Even good packages have lifecycle cost.

[39:00]Mykhailo: Performance architecture. What are the big Node.js performance decisions teams need to make?

[39:15]Omar Siddiqui: The first decision is what work belongs in the request path. Keep the request path focused. If work is slow, expensive, or unreliable, consider moving it to a queue or separate service. The second decision is how you handle CPU-heavy work. Do not block the event loop casually.

[40:00]Mykhailo: What are common event loop problems?

[40:05]Omar Siddiqui: Large JSON parsing, synchronous file operations, heavy validation, compression, encryption, image processing, PDF generation, huge data transformations, and excessive logging. These can delay other requests in the same process.

[40:55]Mykhailo: How should teams fix that?

[41:00]Omar Siddiqui: Measure first. Then choose the right tool: streaming, batching, caching, worker threads, job queues, separate services, database-side operations, or a different runtime for specialized computation. The wrong answer is pretending the event loop does not matter.

[41:50]Mykhailo: Database pressure is usually bigger than runtime pressure, right?

[41:55]Omar Siddiqui: Very often. Many slow Node services are actually slow database services. Missing indexes, N+1 queries, poor connection pool settings, chatty service calls, huge result sets, and weak caching cause more pain than raw JavaScript speed.

[42:45]Mykhailo: What metrics matter?

[42:50]Omar Siddiqui: p95 latency, p99 latency, event loop delay, memory usage, garbage collection pressure, CPU, database query time, external API latency, queue depth, retry volume, error rate, and saturation signals like connection pool usage.
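
For readers less familiar with p95 and p99, this is a small nearest-rank percentile calculation over latency samples. The `percentile` function is an illustrative helper. Metrics systems typically compute percentiles from histograms rather than from raw samples like this.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new RangeError("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}
```

With 100 samples, p95 is the 95th value in sorted order. That means 5 percent of requests were at least that slow, which an average would hide.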

[43:40]Mykhailo: So performance architecture is mostly about visibility and boundaries.

[43:48]Omar Siddiqui: Yes. If you cannot see where time goes, you cannot fix performance intelligently. And if boundaries are unclear, slow work spreads everywhere.

[44:00]Mykhailo: Observability architecture. What does a Node system need so teams can debug it under pressure?

[44:12]Omar Siddiqui: It needs structured logs, metrics, traces, request IDs, operation names, useful errors, health checks, and alerts that point to action. Observability should answer questions quickly: what failed, where, for whom, how often, and whether it is still happening.

[45:00]Mykhailo: How does AsyncLocalStorage help?

[45:05]Omar Siddiqui: AsyncLocalStorage helps carry context across asynchronous operations. A request might go through authentication, validation, business logic, database calls, cache calls, and external APIs. You want the same request ID or trace context attached across that whole path.
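
A compact sketch of that idea: the request ID is set once at the edge and read later by a log helper, without being passed through every function. `AsyncLocalStorage` is the Node API; `RequestContext`, `logLine`, `loadUser`, and `handleRequest` are hypothetical names, and the awaited promise stands in for a database call.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

type RequestContext = { requestId: string };

const requestContext = new AsyncLocalStorage<RequestContext>();

// Builds a structured log line that picks up the current request ID, if any.
function logLine(message: string): string {
  const requestId = requestContext.getStore()?.requestId ?? "no-request";
  return JSON.stringify({ requestId, message });
}

// Deeper layer: no requestId parameter, yet the log line still carries it.
async function loadUser(userId: string): Promise<string> {
  await Promise.resolve(); // stand-in for a database or HTTP call
  return logLine(`loaded user ${userId}`);
}

// Edge of the request: establish the context once, then run the rest inside it.
function handleRequest(requestId: string, userId: string): Promise<string> {
  return requestContext.run({ requestId }, () => loadUser(userId));
}
```

Calling `logLine` outside of `handleRequest` falls back to `"no-request"`, which makes it easy to spot log lines that were emitted without request context.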

[45:55]Mykhailo: Without that, logs become disconnected.

[46:00]Omar Siddiqui: Exactly. Disconnected logs make incidents slower. You do not want to search through thousands of log lines trying to reconstruct one request manually.

[46:45]Mykhailo: What should teams avoid?

[46:50]Omar Siddiqui: Avoid logging secrets, tokens, passwords, authorization headers, full request bodies, payment data, and sensitive personal data. Observability should increase clarity without creating a security problem.
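
One simple guard is to redact known-sensitive keys before anything reaches the logger. The key list and the `redact` helper below are assumptions for illustration. Key-name matching is only a first line of defense, since secrets can also appear inside free-text values.

```typescript
// Hypothetical list of key names that should never be logged (compared in lowercase).
const SENSITIVE_KEYS = new Set(["password", "token", "authorization", "secret", "cardnumber"]);

// Returns a copy of `value` with sensitive fields replaced, recursing into objects and arrays.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (typeof value !== "object" || value === null) return value;

  const out: Record<string, unknown> = {};
  for (const [key, inner] of Object.entries(value)) {
    out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
  }
  return out;
}
```

Running log payloads through a function like this at the logging boundary means individual call sites do not each have to remember which fields are sensitive.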

[47:40]Mykhailo: How should traces and metrics work together?

[47:45]Omar Siddiqui: Metrics tell you something is happening. Traces help show where time went. Logs explain details. You need all three, but you do not need infinite data. Good observability is about useful signal, not maximum noise.

[48:35]Mykhailo: That matters because teams often add tools but still cannot answer production questions.

[48:42]Omar Siddiqui: Exactly. Observability is not buying a dashboard. It is designing the system so the dashboard tells the truth.

[49:00]Mykhailo: AI-assisted architecture and development. How should Node.js teams use AI without damaging the system?

[49:15]Omar Siddiqui: Use AI for acceleration, not authority. It can help draft tests, explain old code, create migration checklists, generate examples, convert CommonJS to ESM, suggest refactors, and outline service boundaries. But it should not make final architecture decisions.

[50:05]Mykhailo: Where is AI most dangerous in Node projects?

[50:10]Omar Siddiqui: Security-sensitive code, authentication, authorization, encryption, payment logic, database migrations, concurrency behavior, retry logic, and architecture boundaries. AI may produce code that looks clean but ignores failure modes.

[50:58]Mykhailo: What should reviewers ask when AI-generated code appears?

[51:05]Omar Siddiqui: Does the developer understand it? Does it match our architecture? Does it validate input? Does it leak data? Does it add unnecessary dependencies? Does it block the event loop? Does it handle retries safely? Does it have meaningful tests? What happens when the database or external API fails?

[51:58]Mykhailo: So the standard is not who wrote the code. The standard is whether the team can own it.

[52:05]Omar Siddiqui: Exactly. Production code needs ownership. AI can help write it, but humans still operate it.

[52:30]Mykhailo: Let us end with a practical architecture checklist. A team is building or modernizing Node.js services today. What should they do?

[52:45]Omar Siddiqui: First, define what each service owns. Do not let APIs, workers, webhooks, and scheduled jobs all mutate the same domain state without clear rules. Second, standardize runtime versions on supported LTS lines. Third, keep Node upgrades boring through tests, CI, staging, and observability.

[53:20]Omar Siddiqui: Fourth, use native platform features where they fit: fetch, streams, URLPattern, node:test, TypeScript type stripping, and permissions. Fifth, keep TypeScript type checking in CI. Sixth, validate runtime data from users, queues, webhooks, and external APIs.

[53:55]Omar Siddiqui: Seventh, treat dependency choices as architecture choices. Eighth, move slow or unreliable work out of the request path when needed. Ninth, design workers with retries, idempotency, dead-letter handling, and monitoring.

[54:25]Omar Siddiqui: Tenth, build observability from the beginning. Request IDs, structured logs, metrics, traces, event loop delay, database timing, and queue depth are not extras. They are how you run the system.

[54:45]Mykhailo: Final sentence: what is the Node.js architecture mindset for today?

[54:50]Omar Siddiqui: Use Node.js for what it does well, respect what it should not do alone, and make every boundary clear before growth makes it expensive.

[54:56]Mykhailo: Omar Siddiqui, thanks for joining us.

[54:58]Omar Siddiqui: Thanks for having me.

[55:00]Mykhailo: End.