> Microservices was always a solution to the organisational problem of getting more developers working on a system at once.
I don't really agree. It's probably the most agreed upon benefit, but there are others. Fundamentally, microservices allow you to isolate state. I can take two services and split them up - and now I've split their state spaces. I can put a queue service between them and now I've sliced out the state of their communication.
This slicing up of state has a ton of benefits if done right.
1. Isolation of state is basically the key to having scalable concurrency. Watch the WhatsApp talk on Erlang, where they say "Isolation, Isolation, Isolation".
2. Isolation of services is great for security. You can split up permissions across your services, limit their access in a more granular way, etc.
Those are pretty nice wins. They're achievable through discipline in a monolith (heavy use of module boundaries, heavy use of immutability - something most mainstream languages don't encourage) but a network boundary really forces these things and makes unintentional stateful coupling a lot more painful.
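To make the isolation point concrete, here's a minimal sketch (not tied to any particular framework) of two "services" in a single process that share no mutable state and communicate only via a queue - the same shape a network boundary forces on you:

```python
import queue
import threading

# Two "services" in one process: each owns its state privately and
# communicates only via messages on a queue - no shared mutable state.
def counter_service(inbox: queue.Queue, outbox: queue.Queue) -> None:
    count = 0  # private state; nothing outside this function can mutate it
    while True:
        msg = inbox.get()
        if msg == "stop":
            break
        count += msg
        outbox.put(count)

inbox: queue.Queue = queue.Queue()
outbox: queue.Queue = queue.Queue()
worker = threading.Thread(target=counter_service, args=(inbox, outbox))
worker.start()

inbox.put(5)
inbox.put(3)
observed = [outbox.get(), outbox.get()]  # replies arrive in causal order
inbox.put("stop")
worker.join()
print(observed)  # [5, 8]
```

Nothing here enforces the discipline except convention; a network boundary makes the same structure physically unavoidable.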
> It is easier to start with a monolith, find the right design and then split on the boundaries than it is to make the correct microservices to begin with.
I disagree. Starting with a bad set of microservices can be fixed by merging. Merging two codebases is trivial compared to splitting. Again, if I have isolation between the services, even if the slicing up was done badly, even if they're coupled, I can just remove the layer between them.
Splitting has to start from a place of coupling and then try to decouple - this is especially hard with languages that encourage encapsulated mutable state (most of them).
> I disagree. Starting with a bad set of microservices can be fixed by merging. Merging two codebases is trivial compared to splitting. Again, if I have isolation between the services, even if the slicing up was done badly, even if they're coupled, I can just remove the layer between them.
There is still coupling in microservices, it has just shifted to messaging, networking, and queuing. If you get any of those parts wrong, you have a worse mess to untangle with less mature debugging/logging tooling than a monolith enjoys, all the while likely dealing with eventual consistency (depending on the design). I'm not saying don't start with a microservice, but it likely wouldn't be the very first tool I would reach for when starting out if a monolith would do the job effectively. Most things will never be hyperscale and won't benefit from the increased concurrency. You can go a very long way with a "majestic monolith" and a bit of care.
> There is still coupling in microservices, it has just shifted to messaging, networking, and queuing.
Sure, in the sense that your service is "coupled" to a queue and if you don't abstract that away it's hard to change that queue implementation. But in the sense of two services you wrote being coupled, they aren't, in terms of shared state. That gets pulled out. There is no way for one service to mutate the memory of another - it has to send a message to it.
That can be TCP or it can be over some queue or stream or whatever.
> If you get any of those parts wrong, you have a worse mess to untangle with less mature debugging/logging tooling than a monolith enjoys
This is the case with any concurrent system. The fact that so many languages lack concurrency primitives is probably why people don't run into this more often. If you use concurrency primitives in your language, you already have this.
> all the while likely dealing with eventual consistency (depending on the design)
There's nothing eventually consistent about this system. It fundamentally has causal consistency (since messages from a service must come after messages to that service that triggered them), and it's perfectly capable of leveraging transactions.
> I'm not saying don't start with a microservice, but it likely wouldn't be the very first tool I would reach for when starting out if a monolith would do the job effectively.
To each their own. I much prefer it. It's far simpler to maintain "good" design since the network boundary creates a hard line in the sand that you physically can not violate.
They are coupled by the queue itself (you accounted for your queue going down and out of order/delayed messages right?), the network (i.e. what happens if some microservices go offline?), and most importantly the event message abstraction. Nothing is for free, and the event message abstraction/format is the new shared state in microservices. It's easy to get the event messaging abstraction wrong in green field projects, since you likely don't understand the domain as well as you would like. If that goes wrong, it can be very painful to fix after the fact. Again, not slamming microservices, but we should go in with eyes wide open about the well-known benefits vs. the tradeoffs they offer. I refer to the high quality (and partially free!) course [1] taught by Udi Dahan from Particular that reviews many of the tradeoffs with distributed system design.
> The fact that so many languages lack concurrency primitives is probably why people don't run into this more often. If you use concurrency primitives in your language, you already have this.
The difference is that with a monolith, the entire application state is in one place, but with microservices its state is distributed. This makes logging and debugging more difficult along several dimensions. Finally, there are decades worth of tooling development at your disposal to debug and monitor your monolith (even concurrency issues). The tooling around debugging and troubleshooting microservices pales in comparison.
How is that any different to calling a function on a class? That's technically not class A modifying class B's memory either. B modifies its own memory in response to a message (function parameters) from A. The message going over a network doesn't make that fundamentally different.
There are many solutions, certainly. A network is one option, which I personally prefer, but as I said elsewhere it's a "choose the right tool for the job" kind of situation.
I disagree. If you are able to merge them, you already spent the work to split them in the first place, so starting with microservices is more work overall. It goes back to agile: the easy solution is to have a monolith and figure out later how it can be well split.
> Fundamentally, microservices allow you to isolate state. I can take two services and split them up - and now I've split their state spaces. I can put a queue service between them and now I've sliced out the state of their communication.
Dr. Alan Kay would like a word. This is literally the premise behind OO:
> "OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things."—Dr. Alan Kay
Outside of "extreme late-binding," which is a fascinating topic in its own right, isolating state is exactly the point of OOP. If we need microservices to accomplish isolation of state, that suggests we got OOP wrong, very wrong.
I'm extremely aware of Alan Kay's statement, as well as the foundations of the actor model. The reality is that today that is not what OOP has become, and Alan Kay would agree.
> that suggests we got OOP wrong, very wrong.
Alan Kay very clearly states that people "got it wrong" and that OOP was supposed to be about messaging. ie: He intended for it to be one thing, but it isn't that thing.
> I'm sorry that I long ago coined the term "objects" for this topic because it gets many people to focus on the lesser idea.
The big idea is "messaging"
> The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be.
[..]
I agree that these are Kay's thoughts and also agree with his take on what it should be. But I think the reality is more complicated than it simply being something that evolved away from his grand dream. It's more that there was a soup of ideas floating around during that time that came together as OOP and the combination that became dominant was something else. For instance Simula was already using inheritance prior to Kay's message passing proposal.
I can agree with all of that, I wasn't trying to imply otherwise, more just explaining that stating that OOP means what Kay wanted it to mean is ignoring history, consensus, and Kay himself.
I’m very aware that we evolved things Kay may not have intended, and that doesn’t make it wrong.
He may have coined the term, but he doesn’t own it, nor should we feel beholden to his vision dating back to 1972.
What I said was if the primary reason for microservices is hiding of state, then we got OO wrong, because OO, even the much maligned J2EE style of OO, can do that for us if we want hiding of state and message passing.
Another possibility is that microservices do much more for us than hiding of state and limiting communication to message-passing.
At my 9-5, we use Elixir to write our services and have a few Actor-based Scala services too, so my feeling is that we actually are doing OO fine, and that there’s something else that makes microservices compelling at scale.
> If we need microservices to accomplish isolation of state, that suggests we got OOP wrong, very wrong.
That’s the last sentence, which summarizes my point.
As for Dr. Kay, his exact words were saying that to him at the time OO was certain concepts and nothing more.
I have never interpreted that to mean that languages or systems that do more than hiding state and message passing are wrong, just that if we say something like “OO requires inheritance,” he would disagree with our definition of OO.
After all… Smalltalk itself has a lot more than hiding of state and message passing. Would anyone claim that Dr. Alan Kay would say Dr. Alan Kay was doing OO wrong?
I think there are good reasons to design microservice architectures, but if the argument is “Let’s break up our monolith so we can hide state,” I’d say that we can go ahead and just use our existing OO tools to achieve that.
It does, but that's because there are different flavors of OOP. Alan Kay's original take on OO was closer to the actor model than what grew into the mainstream spin on OOP with inheritance and the rest.
If you take 10 steps back and squint, microservices & the actor model start to look pretty similar.
Arguably, we did. In large codebases worked on by multiple teams, it's not unusual to see teams drilling holes in OO walls because they "just want to get their work done" and view the abstractions as barriers. Take that together with the unfortunate fact that the majority of engineers suck at decomposing things into objects, and the result is people preferring to move stuff out of process to keep the code simple and make the encapsulation more effective. I don't really think that's a good thing, but that's been my observation.
Erlang/Elixir feels like it strikes a middle ground, where every process behaves like one of Kay's objects, including the emphasis on message-passing rather than methods that behave like procedure calls.
The Erlang programming language is AFAIK the only programming language used in production that does what Alan Kay is talking about. So it is probably the only Object Oriented programming language used in production today.
That seems to be equating it with message passing, but messaging is already the first item mentioned by Alan (messaging, local retention ...) so I thought it might mean something else.
Network boundaries cause far more problems than they solve, and you've just shifted the complexity to now securing the network, usually with even more additional services, proxies, services meshes, firewalls, etc.
> Network boundaries cause far more problems than they solve,
They cause exactly 0 extra problems. A call from function A to function B can fail due to B having a bug. A call from service A to service B can fail due to B having a bug or a network failure. Either way, failure is possible and has to be handled - the network only makes that more obvious.
Further, a call between functions can cause mutated shared state - not the case across a boundary, they physically do not share mutable state.
> and you've just shifted the complexity to now securing the network
Not really. Fundamentally you have split your service capabilities up - now you can apply least privilege as you desire.
A failure is obvious all by itself. Network boundaries just turn it into a much bigger failure. And network failures are far more common and harder to test, handle and recover from.
> "Further, a call between functions can cause mutated shared state - not the case across a boundary, they physically do not share mutable state."
This is false as state is not tied to your process nor does it require a network leap to add isolation.
> "now you can apply least privilege as you desire."
How exactly? It's not magic, you still have to apply them, and now it requires more strategies and effort to accomplish.
I'm ignoring the first two points since I'm tired of explaining these things to people - you can read the papers and watch the talks I've linked.
> How exactly? It's not magic, you still have to apply them, and now it requires more strategies and effort to accomplish.
Yes, we have tons of tooling for process isolation. Splitting a service into two services means you can isolate two processes instead of one, which means you break up the capabilities unique to each.
I used the word "apply" so I don't know why you're saying "you still have to apply them"... it's literally what I just said.
The only potential benefit there was for security, specifically because of your security-based product and the async nature of its processing. And even that was just relying on the ephemeral nature of lambdas instead of other security constructs or simply resetting instances of a monolith to accomplish exactly the same thing.
Nothing in that article explained a clear need or benefit of microservices.
Nothing in the article has to do with the product or the fact that it's security related, other than to provide a motivating use case.
> The only potential benefit there was for security
And performance.
> even that was just relying on the ephemeral nature of lambdas
I think you've failed to understand the article, which may be my fault, I haven't read it in a long time. The key is isolation. Ephemerality gives you a sort of temporal isolation. Splitting your messaging from your data storage gives you a capability based isolation. And so on.
It also means we can scale to the limits of S3/SQS - each service is itself stateless, the majority of state is managed in SQS, which could be quite loose about its consistency since every service is idempotent - arguably a form of temporal isolation.
What I've described in this article is effectively the actor model. I feel like I don't have to really justify the benefits of the actor model with regards to scale?
What part of microservices (split functionality with completely separate runtime artifact deployed to separate servers) is needed for actors? You can have actors in a monolith.
With a monolith you can put everything inside a database transaction and have an entire request's worth of logic succeed or fail together. That's a lot easier to manage than having parts of the logic spread over multiple systems succeed and other parts fail.
>It is pointed out that faults in production software are often soft (transient) and that a transaction mechanism combined with persistent process-pairs provides fault-tolerant execution -- the key to software fault-tolerance.
So distributed transactions with two-phase commit, XA and all that. Guess what, microservices hipsters tried that and it turned out too slow and cumbersome, so they invented "sagas", which are still immensely more complex than a single transaction in a single database.
You seem confused. If you want a transaction use a transaction. If you don't want a transaction don't use a transaction. If you need a transaction across services, that sounds like you've run across a microservice antipattern. Monolith/Microservice changes nothing here - you can have the same issue in a monolith where two different functions are managing transactions and now you want a single transaction.
Well you can't use database transactions across multiple connections, so presumably this would involve you implementing your own transaction and rollback system. That's a lot more complexity than using a system that just works out of the box.
I don't really understand. If I have a database, and a service is talking to it, it can open a transaction. If I then want to talk to other services, and rollback that transaction based on what happens with those, I can do that.
Microservices changes nothing about this. If you want to remove transactions by splitting up your logic such that it operates in terms of sequences or something, you can do that, but that's just a choice like any other.
Well... don't do that? This is where microservices comes in. In a SOA architecture nothing really tells you when it's a good idea to split things up. Microservices is a methodology to help you avoid this exact situation.
You'd have the same problem in a monolith if you have two different modules working on the same db.
Assume you have two tables A and B on the same DB. They are sort of seen as unrelated. Suddenly a feature request requires that A and B are mutated together consistently.
If it is in one service you just use a common DB transaction and get it done.
If it is in one microservice for A and one microservice for B then you have to somehow implement this transaction yourself. This is possible but more work.
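The single-service case can be sketched with sqlite3 (tables `a` and `b` here are hypothetical stand-ins for the A and B above): both writes commit or roll back together, with no coordination code at all.

```python
import sqlite3

# One service, one DB: writes to tables A and B commit or roll back
# together inside a single transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE b (id INTEGER, val TEXT)")

with conn:  # transaction: commits on success
    conn.execute("INSERT INTO a VALUES (1, 'x')")
    conn.execute("INSERT INTO b VALUES (1, 'y')")

try:
    with conn:  # transaction: rolls back if anything inside raises
        conn.execute("INSERT INTO a VALUES (2, 'x2')")
        raise RuntimeError("simulated failure before B was updated")
except RuntimeError:
    pass

rows_a = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
rows_b = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]
print(rows_a, rows_b)  # 1 1 - the failed write to A was rolled back
```

Split A and B across two services with their own databases and this atomicity disappears; you then have to rebuild it yourself (sagas, compensating actions, etc.).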
OK imagine that you have two different modules that manage transactions to a database. Now suddenly you need there to be consistent mutations between those functions.
Do you see my point? Microservices do nothing here - you have run into antipatterns that are universal, and that microservice architecture addresses explicitly as an antipattern.
I do not see your point. Sometimes consistent mutations between modules are wanted. Monoliths let you do it. Perhaps you discover your module boundaries were wrong, so you create a supermodule to encapsulate both and coordinate the joint transaction, then split up a different way later, and so on.
Module boundaries are refactorable.
Importantly, what if the workarounds you end up with to achieve the same things with microservices cause a ball of mud of services, reimplementing transaction logic that belongs in your DB in your homegrown network protocols?
You seem to say that some things are not possible with microservices and therefore this leads to cleaner code. My retort is that the kind of things one sometimes comes up with as workarounds to make things still work for microservices are so complex that the cure is worse than the disease you wanted to cure in the first place.
Why are microservices not refactorable? This is the same exact issue in both cases. You designed something for a use case, the use case changed, now your old design isn't working. So maybe you merge those two services, or merge those two modules, or whatever else you want to do.
> reimplementing transaction logic that belong in your DB in your homegrown network protocols
Don't do that? I mean, again, this issue of "I wrote something the wrong way and now I have to fix that" is not any better or worse in microservices.
> My retort is that the kind of things one sometimes comes up with as workarounds to make things still work for microservices
That doesn't sound like microservices. In fact, even the idea of having a database shared across services doesn't sound like microservices - it's an explicit antipattern. So it sounds like a bad SOA design. The point of microservices is to take SOA and add patterns and guidance to avoid the issues you're talking about.
Microservices get owned by different teams, teams get cemented, politics get in the way of refactoring. Game over.
Sure, if you are a single team working on 10 microservices you can probably refactor with abandon without spending 70% of your working days in meetings talking about migrations and trying to sync strategies...
You may have had the experience that microservices are as easy to refactor as monoliths; in my experience it is orders of magnitude harder...
I think there is a bit of a "No true scotsman" fallacy at play here. You see something you do not like then it is "not microservices done properly".
How about you state all the things you don't like about monoliths, then I say "that is not monoliths done properly" or "don't do that" for each one?
I think both monoliths and microservices can lead to good code or balls of mud depending on the organization and developers involved.
The real question isn't whether "microservices done right" is better. The question is: does a decision to do microservices reduce the chances of a ball of mud, when that decision is then implemented by imperfect developers in an imperfect organization?
PS I always meant that each microservice had their own DB above, we agree on that and I never dreamt otherwise.
What I was getting at is that when you go distributed, sometimes quite complex patterns must be applied to compensate.
You may say the architecture is then "better", but on what metric? It is certainly more work up front -- so you start out in the negative, and for the system to come out ahead, the system and organization need to reach a certain scale where the savings exceed the hours you invested.
In many scenarios the cost in developer-months needed up front is just as important as other factors in evaluating the best architecture. E.g. a scrappy startup simply should not do it IMO. Corporations... perhaps, but I have seen it go badly. (I guess it was just not "done right" then? See above.)
PS I think microservices excel in making people FEEL productive (doing work that is not directly benefiting the company).
I have personal experience with the same product built twice, once as a monolith by a small team that worked really well and once as lots of services.
The feature set and development speed are about the same, but the many-services version requires 10x as many people.
However by splitting into many services everyone feels productive doing auxiliary and incidental work. Only those of us who worked on the first system are able to see that the total output of the company is the same but 10x as expensive.
> Microservices get owned by different teams, teams get cemented, politics get in the way of refactoring. Game over.
I don't understand how microservices make this worse in any way. Modules get owned by different teams all the time.
> You may have had the experience that microservices are as easy to refactor as monoliths; in my experience it is orders of magnitude harder...
Yes, I have said before that I believe merging is fundamentally simpler than splitting. If we're just talking about merging a module vs a service, I don't believe either is harder than the other - I mean... nothing about microservices prevents you from using modules, and indeed I would highly recommend it.
> I think there is a bit of a "No true scotsman" fallacy at play here. You see something you do not like then it is "not microservices done properly".
For sure, and that's a failing of microservices. People think microservices means "SOA", or "write a lot of services". If you want to criticize SOA or whatever, sure, the argument of "don't do that" goes away.
> How about state all the things you don't like about monoliths , then I say "that is not monoliths done properly", "don't do that" for each one?
I probably could state a bunch of things that are pretty fundamental, but I don't think it's important - I don't know that I've actually said anywhere that microservices are better than monoliths, what I've instead said are the benefits of microservices that I see, which others have taken to mean that I somehow think monoliths or modules are bad.
> You may say the architecture is then "better"
I honestly don't think I've said that anywhere, or even made a judgment anywhere.
I think I can summarize, again, what I've said.
1. Network boundaries provide a physical layer that enforces isolation of state and the use of message passing
2. Isolation of state makes scaling a system easier
3. Isolation of capabilities makes securing a system easier
4. SOA inherently leverages the network boundary
5. Microservice Architecture is similar to SOA but with a bunch of patterns, guidance, and concepts that you leverage in your design
What I've received in response is a hodgepodge of:
1. "Modules can isolate state" - only true in some languages, and even then there's no physical barrier enforcing it, you're relying on developers to maintain that isolation.
2. "But what if you do anti-patterns that microservices tell you not to" - ok, that's why microservice architecture has books and documentation about what not to do. If you do those things, I'm not going to blame you, it's a failing of all methodologies when users have a hard time understanding them.
But so far the anti-patterns mentioned aren't really compelling or specific to microservices. You wrote code to satisfy a domain, the domain changed, now you need to change that code so that it satisfies the new domain. That happens all the time, merging services isn't any harder than merging modules.
3. General misunderstandings about state, security, etc.
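On the first point, in most mainstream languages module and class privacy is a convention, not a physical barrier. A Python sketch (the `Account` class is hypothetical, just for illustration):

```python
# "Private" state in most mainstream languages is a convention, not a
# physical barrier - nothing stops a determined caller.
class Account:
    def __init__(self):
        self.__balance = 0  # name-mangled, "private" by convention only

acct = Account()
# A caller can still reach behind the API via the mangled name:
acct._Account__balance = 1_000_000
print(acct._Account__balance)  # 1000000
```

Across a network boundary there is no equivalent of the mangled-name backdoor: the only way in is the message protocol the service exposes.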
> What I was getting at is that when you go distributed,
I'm not really convinced that "distributed" is the right word here. People talk about distributed systems being complex, and I think they're confused - what's complex is consensus, but splitting one service from another service shouldn't impact consensus, and the fact that they're now located on two different assets does not necessarily make things more complex.
Those services may be more complex, if your application was quite trivial - a totally stateless system with no external connections, for example. I see no reason to rewrite 'grep' as a microservice, and I would never recommend that.
Those services may now be more error-prone because you have things like dns, tcp, etc involved. If you don't want to make that tradeoff, that's OK, you could be right in that case. Again, no need to make all software be microservices.
(Going to respond to your other message here)
> PS I think microservices excel in making people FEEL productive (doing work that is not directly benefiting the company).
Maybe, I don't really know. It isn't my experience, but that's just me. Most developers seem to be pretty bad at their jobs so I imagine that all sorts of issues can be experienced. Certainly the idea of rewriting a monolith as a microservice seems like a red flag unless there were very specific needs.
At some point, you can't gracefully handle bugs in other people's code. If a function you call causes a SEGFAULT, in the vast majority of software, you're not expected to handle that. That's an invariant error, and you probably want some way to detect that it happened so you can fix it, but it's not reasonable to ask every caller of every function to handle it (in the same way we don't consider "the earth blew up" to be a reasonable thing to protect against, even though it is technically possible). There's simply not enough time and money to protect against every possible edge case in most software (NASA projects aside).
The argument here is that network issues are exceedingly common in microservice environments and so aren't actually an edge failure case, so you actually have to worry about them way more than you would worry about a function in a different module causing a SEGFAULT.
The point is not to handle individual bugs, it is to handle all failures. This is the difference between a "defensive programming" approach and the "let it crash"/ "zen of erlang" approach. Actors are designed such that they have failure isolation, which means they can react to errors in other actors without worrying about their own state. They then have two options based on one of two bug classes - transient and persistent.
Persistent errors are propagated to the supervisor. Transient errors are either retried or propagated.
It doesn't matter if it's a network error, a disk error, a timeout, a crash, a cosmic radiation bit flip - your approach is always one of those two. So adding more failure cases doesn't "matter" in terms of your error handling, although you may want to adopt helpful patterns in the nuances of "retry".
The frequency of errors will obviously increase over a network (arguably very, very little), but the pattern is fundamental to resiliency.
If your network is truly so unreliable that you can not pay that cost, don't do it. I don't think most people are developing on networks that fail for long periods of time frequently.
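The transient-vs-persistent handling described above can be sketched in a few lines (the `supervise` helper and `PersistentError` class are hypothetical names, not from any particular actor library):

```python
import time

class PersistentError(Exception):
    """A persistent/invariant error - retrying will not help."""

def supervise(operation, retries=3, delay=0.01):
    # Transient errors: retry a few times, then propagate.
    # Persistent errors: propagate to the supervisor immediately.
    for attempt in range(retries):
        try:
            return operation()
        except PersistentError:
            raise
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

# A flaky operation that fails twice, then succeeds. Whether the failure
# was a network error, a disk error, or a timeout is irrelevant to the
# handling strategy - it is retried or propagated either way.
attempts = {"count": 0}
def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = supervise(flaky)
print(result, attempts["count"])  # ok 3
```

Adding the network as a failure source changes the frequency of retries, not the shape of the handler.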
I'm not sure what you're talking about. What automatic handling of network exceptions? What safe failures? BEAM has lots of great features, no question, but they have very little to do with the implementation of actors - BEAM primarily provides names and linking as useful primitives.
Maybe? I can't compare 50% to 100 billion. If your computer crashed every 100 billion instructions that would be a problem. If it crashed every other instruction, that would be very slightly more (or the same amount) of a problem.
The point is that if you have a function call, you have the opportunity for a bug/ failure. Networks don't change that - you have the opportunity for a bug/ failure. The major difference is that services have stronger failure isolation.
I didn't say it removes state. I said it split the state up and isolated it. That's critically important - you physically can not mutate state across a network, you have to pass messages from one system to the other over a boundary, either via some protocol like TCP or via intermediary systems like message brokers.
That timestamp is rough, I just found a related section of the talk.
> And the network-defined state is a hell of a lot harder to trace and debug.
There's no such thing as network-defined state. I assume you're saying that it's harder to debug bugs that span systems, which is true, but not interesting since that's fundamental to concurrent systems and not to microservices.
I think you have a very narrow idea about what "mutating state" really means. You seem to talk about DMA access only. But you can manipulate the state of an application by writing to a shared data store, by calling an API, and countless other ways. It is really more of a concept for us humans to define where an application begins and ends.
Let's take an example. If we have two services that want to keep the full name of a logged-in user for some reason, that piece of state can be said to be shared between the applications. Should one service want to change that piece of data (perhaps we had it wrong and the user wanted to set it right), the service must now mutate the shared state. It does not matter whether it is done by evicting a shared cache or if we write the updated data to the service directly; we still speak of a shared state that is updated.
Now we can stipulate that the more of these things we have, the more coupled two pieces of software are, which generally makes reasoning about the system harder. It is not as black and white as one type of coupling being acceptable and another not, but some types are easier to reason about than others. Joe really thought hard about these things and it really shows in the software he wrote.
We all share state in that we all exist within the same universe. But the universe has laws of causality, and Joe advocated that software should always maintain causal consistency.
A database is not needed for your example. You could replace it with an actor holding onto its own memory. But all mutations to that actor, which the other actors hold references to via their mailbox, are causally consistent and observable.
That is the premise of the talk I linked elsewhere.
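The actor idea above can be sketched in Python with a thread and a queue standing in for an Erlang-style process and mailbox (all names here are made up for illustration). Because every read and write goes through the one mailbox, observers see mutations in the single order the actor processed them:

```python
import queue
import threading


class NameActor:
    """Toy actor: private state, mutated only via mailbox messages."""

    def __init__(self, name):
        self._name = name                # touched only by the actor thread
        self._mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Process one message at a time - this serialization is what
        # gives observers a causally consistent view of the state.
        while True:
            msg, arg, reply = self._mailbox.get()
            if msg == "set":
                self._name = arg
                reply.put(None)
            elif msg == "get":
                reply.put(self._name)

    def _call(self, msg, arg=None):
        reply = queue.Queue()
        self._mailbox.put((msg, arg, reply))
        return reply.get()               # block for the actor's reply

    def set_name(self, name):
        return self._call("set", name)

    def get_name(self):
        return self._call("get")


actor = NameActor("Jon Doe")
actor.set_name("Jane Doe")   # the mutation is serialized via the mailbox
print(actor.get_name())      # Jane Doe
```

No database is involved; the mailbox alone orders the updates.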
> Fundamentally, microservices allow you to isolate state.
I think it's not really state isolation: your state is now spread across multiple separate services and a queue, which is objectively more complicated. To me, it's more the extreme version of things like dunder methods in Python or opaque structs in C: it prevents a specific type of programmer behavior. But honestly, it feels easier to solve this in code review.
Like, I agree it's bad to reach behind the public API of something, but microservices aren't immune to this. I've never worked on a microservice architecture that didn't have weird APIs added just to support specific use cases, or a bunch of WONTFIX bugs kept because other services depended on the buggy behavior. That's not fundamentally different from "this super important program calls .__use_me_and_get_fired__": you have an external program dictating the behavior and architecture of your own.
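To make the dunder comparison concrete: Python's name mangling (attributes named `__x` inside a class) hides an attribute, but a determined caller can still reach behind it. The boundary is advisory, enforced by code review rather than the runtime, which is the parallel being drawn to leaky service APIs. (`Service` and its attribute names are invented for this sketch.)

```python
class Service:
    def __init__(self):
        # Name mangling rewrites this to _Service__internal.
        self.__internal = "do not touch"

    def public_api(self):
        return f"value: {self.__internal}"


s = Service()
print(s.public_api())        # value: do not touch

# Reaching behind the public API still works - nothing stops the
# "super important program" from depending on this:
print(s._Service__internal)  # do not touch
```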
And you get multiple other layers of complexity here: networks, distributed transactions, separate dependency graphs, securing inter-server communications, authentication/authorization.
I don't think you're entirely wrong: there's a lot of history of looking at state as a series of immutable updates (Git, Redux), and I think it is harder to "cheat" this way with microservices. I just think it's far from a clear win.
> Fundamentally, microservices allow you to isolate state.
You can have logical dependencies between those "isolated" states anyway, so I don't really see it as a benefit compared to, say, private fields in Java OOP.
Re immutability: I would say a well-written backend in any language would (probably?) throw away its entire state between requests. It's possible to introduce state, sure, but why and how does that happen? For very many backends the only natural thing is to code them stateless: keep all the state in the database, and each new request starts in a fresh world.
I see two common sources of state in any backend (monolithic or not):
1) Caching, whether of resources, flags, or whitelists.
2) Connection pools.
If there are ever any issues with those, they can be segmented inside a monolith for a fraction of the cost of going to microservices, either along the same boundaries you would have used to split into microservices, or along other ones, like one set of caches/connection pools per endpoint handler.
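The segmentation idea can be sketched as giving each endpoint its own cache instance inside one process, along the same boundaries a microservice split would have used, but without the network hop. All names here (`Cache`, `get_user`, `get_order`) are invented for the sketch:

```python
class Cache:
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


# One cache per boundary, rather than one shared cache for everything.
user_cache = Cache()
order_cache = Cache()


def get_user(user_id):
    if (u := user_cache.get(user_id)) is None:
        u = {"id": user_id}           # pretend this is a DB fetch
        user_cache.put(user_id, u)
    return u


def get_order(order_id):
    if (o := order_cache.get(order_id)) is None:
        o = {"id": order_id}          # pretend this is a DB fetch
        order_cache.put(order_id, o)
    return o


# Flushing (or corrupting) one segment can't touch the other boundary.
get_user(1)
get_order(9)
order_cache._data.clear()
print(user_cache.get(1))  # {'id': 1} - unaffected
```

The same trick applies to connection pools: one pool per segment gives the isolation without a service split.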
So I agree with the OP that the social aspect and the development process are the only real rationale for microservices.
Otherwise, just scale the monolith horizontally to the same number of instances and you have strictly more ways to partition state; microservices only give you one way to partition state that may not even be the best one.