Reference Architecture for B2B SaaS Platforms: Boundaries, APIs, and Data Flow
A B2B SaaS reference architecture defines how clients, APIs, identity, domain services, data stores, and async messaging connect—with explicit tenancy isolation and failure modes—so new features reuse stable boundaries instead of reinventing the stack each time.
A B2B SaaS reference architecture is an opinionated template for how web and mobile clients, APIs, identity, workflows, and data stores fit together so teams can ship predictably without redrawing the whole system on every feature. This guide gives a production-minded baseline: where to draw boundaries, when to call synchronously versus publish events, how to isolate tenants, and which failure modes to design for first. It distills patterns we apply at Baaz on product builds from MVP through scale-up—aligned with widely cited operational guidance from Google's Site Reliability Engineering practice on managing reliability via clear service boundaries and measured risk (Google SRE books, O'Reilly).
Key takeaways
Start with a modular monolith or small set of services until team size and deployment pain justify finer splits; premature microservices add latency, data consistency work, and operational cost without proportional benefit.
Treat identity (IdP), billing, notifications, and reporting as explicit bounded contexts—either modules with clear interfaces or separate deployables—so policy and compliance changes do not ripple unpredictably.
Prefer async integration (events/outbox) for cross-context side effects; reserve synchronous HTTP/gRPC for user-facing read paths and operations that must complete in a single request for correctness.
Define per-tenant isolation early (row-level security, schema-per-tenant, or cluster-per-tenant) and document recovery: backup scope, RPO/RTO targets, and how you detect cross-tenant leakage.
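As a concrete illustration of the tenant-isolation takeaway, here is a minimal Python sketch (SQLite stands in for the OLTP store; the table and tenant names are invented) where every query is tenant-scoped and an assertion proves one tenant cannot read another's rows:

```python
import sqlite3

# In-memory SQLite stands in for the OLTP store; schema is illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE projects (id INTEGER, tenant_id TEXT, name TEXT)")
db.executemany("INSERT INTO projects VALUES (?, ?, ?)",
               [(1, "tenant-a", "Alpha"), (2, "tenant-b", "Beta")])

def list_projects(tenant_id: str) -> list[tuple]:
    # Every query is scoped by tenant_id. In Postgres you would also
    # enforce this at the database layer with row-level security
    # (CREATE POLICY) rather than trusting application code alone.
    return db.execute(
        "SELECT id, name FROM projects WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()

# Isolation check: tenant A must never see tenant B's rows.
assert list_projects("tenant-a") == [(1, "Alpha")]
assert all(name != "Beta" for _, name in list_projects("tenant-a"))
```

With database-enforced row-level security, a forgotten WHERE clause fails closed instead of leaking rows across tenants.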
What belongs in the core platform layer
The platform layer typically owns authentication federation (OIDC/OAuth2 against your IdP), coarse authorization (roles, org membership), API gateway concerns (rate limits, WAF, request validation), audit logging of security-relevant actions, feature flags, and shared infrastructure such as observability agents. Google's SRE material emphasizes instrumenting golden signals (latency, traffic, errors, saturation) per service so operators can reason about user impact—plan for these hooks in the platform from week one.
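To make the golden-signals point concrete, here is a small stdlib-only sketch (the `metrics` dict and `observed` decorator are stand-ins for a real Prometheus or OpenTelemetry client) that records traffic, errors, and latency per endpoint; saturation typically comes from host and runtime gauges instead:

```python
import time
from collections import defaultdict

# In-process stand-in for a metrics client; names are illustrative.
metrics = {"traffic": defaultdict(int), "errors": defaultdict(int),
           "latency_ms": defaultdict(list)}

def observed(endpoint):
    """Record traffic, errors, and latency per endpoint (three of the
    four golden signals)."""
    def wrap(fn):
        def inner(*args, **kwargs):
            metrics["traffic"][endpoint] += 1
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                metrics["errors"][endpoint] += 1
                raise
            finally:
                metrics["latency_ms"][endpoint].append(
                    (time.perf_counter() - start) * 1000)
        return inner
    return wrap

@observed("GET /projects")
def get_projects():
    return ["Alpha"]

get_projects()
```

Wiring these hooks into the platform layer from the start means every new domain service inherits instrumentation instead of adding it later under incident pressure.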
Keep domain business rules inside domain services (orders, projects, workflows specific to your product) and out of the edge gateway. The gateway should terminate TLS, enforce authn, and route—not embed business branching.
Comparison: modular monolith vs microservices (when each wins)
Modular monolith: single deploy, in-process calls, one operational surface—best for <~15 engineers, rapid iteration, and when domains are still moving. Draw module boundaries with package rules and enforce with code ownership.
Microservices: independent deploy and scaling—best when different components have 10x different load profiles, compliance needs physical isolation, or teams are large enough to own full services end-to-end. Expect 25–150 ms extra latency per hop (order-of-magnitude; workload-dependent) and invest in tracing (OpenTelemetry), contract tests, and idempotent consumers.
Table-style summary
Team size: modular monolith favors <15 people; microservices often appear above ~20–30 with mature platform teams.
Deploy risk: a monolith couples releases; services isolate blast radius but multiply integration failures.
Data: a monolith uses one primary database with transactions; services push you toward sagas, outbox, and eventual consistency.
Data stores: OLTP, search, cache, analytics
Use a relational OLTP database (PostgreSQL is common) as the system of record for transactional state. Add Redis (or similar) for sessions, rate limiting, and hot-read caching with explicit TTLs and cache-aside patterns—never as the only copy of financial or contractual truth.
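The cache-aside pattern mentioned above can be sketched as follows (an in-process dict stands in for Redis, and `fetch_from_db` is a placeholder for the OLTP read):

```python
import time

cache: dict[str, tuple[float, object]] = {}  # key -> (expires_at, value)
TTL_SECONDS = 60.0

def fetch_from_db(key: str) -> str:
    # Stand-in for the OLTP read; always the source of truth.
    return f"row-for-{key}"

def get(key: str) -> str:
    """Cache-aside read: try the cache, fall back to the database,
    then populate the cache with an explicit TTL."""
    entry = cache.get(key)
    if entry and entry[0] > time.monotonic():
        return entry[1]            # cache hit, still fresh
    value = fetch_from_db(key)     # cache miss or expired entry
    cache[key] = (time.monotonic() + TTL_SECONDS, value)
    return value

assert get("42") == "row-for-42"   # first read misses and populates
assert "42" in cache               # subsequent reads hit the cache
```

Because the database is always consulted on a miss, an evicted or expired cache entry costs latency but never correctness.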
Introduce Elasticsearch or OpenSearch when full-text search or complex filters exceed what indexes on Postgres can support at your scale; synchronize via CDC or application-level indexing workers.
Route heavy reporting and BI to a warehouse (Snowflake, BigQuery, Redshift) fed by CDC or batch ELT so analytical queries do not destabilize OLTP. The goal is failure isolation: a bad analyst query should not raise API p99 latency.
API and event boundaries
Expose versioned HTTP APIs (or gRPC internally) at context boundaries. Version in the URL or the Accept header, and never ship silent breaking changes to external integrators.
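A minimal sketch of header-based version negotiation, assuming a vendor media type of the form `application/vnd.example.vN+json` (the vendor name and default are illustrative):

```python
import re

def negotiate_version(accept: str, default: int = 1) -> int:
    """Extract the API version from a vendor media type such as
    'application/vnd.example.v2+json'; fall back to the default when
    the Accept header carries no version."""
    match = re.search(r"vnd\.example\.v(\d+)\+json", accept)
    return int(match.group(1)) if match else default

assert negotiate_version("application/vnd.example.v2+json") == 2
assert negotiate_version("application/json") == 1
```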
For cross-context effects (“order placed” → notify billing, search index, CRM), publish domain events from an outbox table in the same transaction as the state change so you avoid dual-write bugs. Consumers must be idempotent (natural keys or idempotency keys) because events are delivered at-least-once in every major broker (Kafka, SNS+SQS, Pub/Sub).
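The outbox-plus-idempotent-consumer flow can be sketched end to end, with SQLite standing in for both the OLTP store and the broker's delivery (table names and the `evt-` key scheme are invented):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (event_id TEXT PRIMARY KEY, payload TEXT);
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
""")

def place_order(order_id: str) -> None:
    # State change and event row commit in ONE transaction, so either
    # both exist or neither does -- no dual-write gap.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        db.execute("INSERT INTO outbox VALUES (?, ?)",
                   (f"evt-{order_id}", json.dumps({"order_id": order_id})))

def handle(event_id: str, payload: str) -> bool:
    """Idempotent consumer: dedupe on event_id so at-least-once
    delivery is safe. Returns True only when the event is first applied."""
    with db:
        cur = db.execute(
            "INSERT OR IGNORE INTO processed_events VALUES (?)", (event_id,))
        if cur.rowcount == 0:
            return False  # duplicate delivery; side effect already done
        # ... perform the side effect (notify billing, index, etc.) ...
        return True

place_order("o-1")
event_id, payload = db.execute("SELECT * FROM outbox").fetchone()
assert handle(event_id, payload) is True    # first delivery applies
assert handle(event_id, payload) is False   # redelivery is a no-op
```

A separate relay process would poll (or tail) the outbox table and publish its rows to the broker, deleting or marking them once acknowledged.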
Document ordering guarantees per topic: global order is expensive; partition by aggregate or tenant when strict ordering is required.
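Partition-by-tenant assignment is a small, stable function; a sketch assuming 12 partitions (use a stable hash, not Python's per-process-randomized built-in `hash`):

```python
import hashlib

NUM_PARTITIONS = 12  # illustrative; match your topic's partition count

def partition_for(tenant_id: str) -> int:
    """Stable partition assignment: hashing the tenant (or aggregate) ID
    keeps all of one tenant's events on one partition, giving per-tenant
    ordering without paying for global order."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# The same tenant always maps to the same partition across processes.
assert partition_for("tenant-a") == partition_for("tenant-a")
```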
Failure modes to design for first
Dependency timeouts: every outbound call gets a deadline and bounded retries with jitter; whether a dependency failure fails open or closed should be an explicit product decision.
Partial outages: where possible, degrade the features that depend on optional services instead of returning a 500 for the whole page.
Data corruption & rollback: test restores from backups quarterly; if you cannot restore, you do not have a backup.
Tenant isolation bugs: add integration tests that prove tenant A cannot read tenant B's IDs; log tenant_id on every request in structured logs.
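The timeout-and-retry item above can be sketched as bounded retries with full jitter and an overall deadline (the `call_with_retries` helper and its defaults are illustrative):

```python
import random
import time

def call_with_retries(fn, *, attempts: int = 3, base_delay: float = 0.05,
                      deadline: float = 1.0):
    """Bounded retries with full jitter: cap both the attempt count and
    the total time spent, then surface the last error to the caller,
    who decides whether to fail open or closed."""
    start = time.monotonic()
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            if time.monotonic() - start > deadline:
                break  # overall deadline exceeded; stop retrying
            # Full jitter: sleep a random slice of an exponential backoff.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise last_exc

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

assert call_with_retries(flaky) == "ok"  # succeeds on the third attempt
```

The jitter matters: synchronized retries from many clients after a blip can themselves look like a traffic spike to the recovering dependency.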
Limitations of this reference model
This architecture assumes a cloud or colocated deployment with staffed operations or a strong managed-service strategy; regulated on-prem or air-gapped environments need different network and key-management patterns.
Ultra-high-scale consumer products may shard earlier and adopt cell-based architectures—this B2B-oriented baseline intentionally trades some scalability headroom for simpler operations.
Numbers (latency per hop, team thresholds) are rules of thumb, not guarantees; measure on your stack.
Explore our Product Strategy, Custom Software, and AI Development services, or get in touch to discuss your project.