So, dear reader, I hope you enjoyed your time in the land of rainbows, unicorns and
all things shiny, because it’s about to get ugly as we get slapped around the face by
the wet fish of reality.
There’s a lot to like about Serverless architectures, and I wouldn’t have spent time
writing about them if I didn’t think there was a lot of promise in them, but they
come with significant trade-offs. Some of these are inherent in the concepts - they
can’t be entirely fixed by progress and are always going to need to be considered.
Others are down to the current implementations, and with time we could expect to
see those resolved.
Vendor control

With any outsourcing strategy you are giving up control of some of your system to a
third-party vendor. Such lack of control may manifest as system downtime, unexpected
limits, cost changes, loss of functionality, forced API upgrades, and more. Charity
Majors, who I referenced earlier, explains this problem in much more detail in the
Tradeoffs section of her article:
[The Vendor service] if it is smart, will put strong constraints on how you
are able to use it, so they are more likely to deliver on their reliability
goals. When users have flexibility and options it creates chaos and
unreliability. If the platform has to choose between your happiness vs thousands
of other customers’ happiness, they will choose the many over the one every time
— as they should.
-- Charity Majors
Multitenancy problems

Multitenancy refers to the situation
where multiple running instances of software for several different customers (or
tenants) are run on the same machine, and possibly within the same hosting
application. It's a strategy to achieve the economy of scale benefits
we mentioned earlier. Service
vendors try their darndest to make each customer feel that they are the only ones
using their system, and typically good service vendors do a great job of that. But
no-one’s perfect and sometimes multitenant solutions can have problems with security
(one customer being able to see another’s data), robustness (an error in one
customer’s software causing a failure in a different customer’s software) and
performance (a high load customer causing another to slow down.)
These problems are not unique to Serverless systems - they exist in many other
service offerings that use multitenancy - but since many Serverless systems are new
we may expect to see more problems of this type now than we will once these systems
mature.
Vendor lock-in

Here’s a third problem related to Serverless vendors: lock-in. It’s very likely
that whatever Serverless features you’re using from one vendor will be implemented
differently by another vendor. If you want to switch vendors you’ll
almost certainly need to update your operational tools (deployment, monitoring,
etc.), you’ll probably need to change your code (e.g. to satisfy a different FaaS
interface), and you may even need to change your design or architecture if there are
differences to how competing vendor implementations behave.
Even if you manage to do this for one part of your ecosystem you may
be locked in by another architectural component. For instance, say you’re using AWS
Lambda to respond to events on an AWS Kinesis message bus. The differences between
AWS Lambda, Google Cloud Functions and Microsoft Azure Functions may be relatively
small, but you’re still not going to be able to hook up the latter two vendors’
implementations directly to your AWS Kinesis stream. This means that moving, or
porting, your code from one solution to another isn’t going to be possible without
also moving other chunks of your infrastructure.
And finally, even if you figure out a way to reimplement your system with a
different vendor’s capabilities, you’re still going to have a migration process
dependent on what your vendor offers you. For example, if you’re switching from one
BaaS database to another, do the export and import features of the original and
target vendors do what you want? And even if they do, at what cost and effort?
One possible mitigation to some of this could be an emerging general
abstraction of multiple Serverless vendors, and we’ll discuss that further later.
Security concerns

This really deserves an article in and of itself, but embracing a Serverless
approach opens you up to a large number of security questions. Two of these
are as follows, but there are many others that you should consider.
- Each Serverless vendor that you use increases the number of different
security implementations embraced by your ecosystem. This
increases your surface area for malicious intent and the
likelihood for a successful attack.
- If using a BaaS Database directly from your mobile platforms you
are losing the protective barrier a server-side application provides
in a traditional application. While this is not a dealbreaker it
does require significant care in designing and developing your
application.
Repetition of logic across client platforms
With a ‘full BaaS’ architecture no custom logic is written on the server-side -
it’s all in the client. This may be fine for your first client platform but as soon
as you need your next platform you’re going to need to repeat the implementation
of a subset of that logic that you wouldn’t have done in a more traditional
architecture. For instance if using a BaaS database in this kind of system all your
client apps (perhaps Web, native iOS and native Android) are now going to need to be
able to communicate with your vendor database, and will need to understand how to
map from your database schema to application logic.
Furthermore, if you want to migrate to a new database at any point, you’re going to
need to replicate that coding / coordination change across all your different
client implementations.
Loss of Server optimizations
Again with a ‘full BaaS’ architecture there is no opportunity to optimize your
server design for client performance. The
‘Backend For Frontend’ pattern exists to abstract certain
underlying aspects of your whole system within the server, partly so that the client
can perform operations more quickly and use less battery power in the case of mobile
applications. Such a pattern is not available for 'full BaaS'.
I’ve made it clear that both this and the previous drawback exist for ‘full BaaS’
architectures where all custom logic is in the client and the only backend services
are vendor supplied. A mitigation of both of these is to embrace FaaS, or some
other kind of lightweight server-side pattern, to move certain logic to the server.
No in-server state for Serverless FaaS
After a couple of BaaS-specific drawbacks let’s talk about FaaS for a moment. As I
mentioned earlier, FaaS functions have significant restrictions when it comes to
local state.
You should assume that for any given invocation of a function none of the in-process
or host state that you create will be available to any subsequent invocation.
I also said that the alternative to this was to follow factor number 6 of the
‘Twelve Factor App’ which is to embrace this very constraint:
Twelve-factor processes are stateless and share-nothing. Any data that needs to
persist must be stored in a stateful backing service, typically a database.
-- The Twelve-Factor App
Heroku recommends this way of thinking but you can bend the rules when running on
their PaaS. With FaaS there’s no bending the rules.
So where does your state go with FaaS if you can’t keep it in memory? The quote
above refers to using a database and in many cases a fast NoSQL Database,
out-of-process cache (e.g. Redis) or an external file store (e.g. S3) will be some of
your options. But these are all a lot slower than in-memory or on-machine persistence.
You’ll need to consider whether your application is a good fit for this.
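To make the constraint concrete, here’s a minimal sketch of the stateless style: a
handler that would traditionally keep a counter in process memory instead pushes it
to an external store. The `store` argument stands in for something like a Redis
client with an atomic `incr`; the interface and `StubStore` below are illustrative,
not a vendor API.

```python
def handle_request(event, store):
    """Each invocation may land on a fresh instance; only `store` survives."""
    count = store.incr("requests:" + event["path"])
    return {"path": event["path"], "seen": count}

class StubStore:
    """In-memory stand-in for an external key-value store (e.g. Redis)."""
    def __init__(self):
        self._data = {}

    def incr(self, key):
        # Atomic in a real store; a plain dict is fine for illustration.
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]
```

Because all durable state lives behind `store`, it doesn’t matter which instance
(or how many instances) of the function handle successive requests.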
Another concern in this regard is in-memory caches. Many apps that are reading from
a large data set stored externally will keep an in-memory cache of part of that data
set. You may be reading from ‘reference data’ tables in a database and use something
like Ehcache. Alternatively you may be reading from an HTTP
service that specifies cache headers, in which case your in-memory HTTP client can
provide a local cache. With a FaaS implementation you can have this code in your app
but your cache is rarely, if ever, going to be of much benefit. As soon as your cache
is ‘warmed up’ on the first usage it is likely to be thrown away as the FaaS instance
is torn down.
A mitigation to this is to no longer assume an in-process cache, and to use a
low-latency external cache like Redis or Memcached instead, but this (a) requires
extra work and (b) may be prohibitively slow depending on your use case.
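A read-through cache along these lines might look like the sketch below: reads go
through an external store so the warmed cache survives instance teardown. The
`get`/`set` interface and `DictStore` class are illustrative stand-ins for a real
client such as redis-py, and `load_from_db` represents the slow, authoritative
data source.

```python
class DictStore:
    """In-memory stand-in for an external cache client (e.g. redis.Redis)."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

def lookup_reference_data(key, store, load_from_db):
    """Serve from the shared external cache, falling back to the database."""
    cached = store.get(key)
    if cached is not None:
        return cached
    value = load_from_db(key)  # slow path: the authoritative data set
    store.set(key, value)      # warm the cache for every function instance
    return value
```

The trade-off is exactly as described above: every cache hit is now a network
round-trip, so whether this is acceptable depends on your latency budget.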
Implementation Drawbacks

The previously described drawbacks are likely always going to exist with
Serverless. We’ll see improvements in mitigating solutions, but they’re always going
to be there.
The remaining drawbacks, however, are down purely to the current state of the art.
With inclination and investment on the part of vendors and/or a heroic community
these can all be wiped out. But for right now there are some doozies...
Configuration

AWS Lambda functions offer no configuration. None. Not even an environment
variable. How do you have the same deployment artifact run with different
characteristics according to the specific nature of the environment? You can’t. You
have to redefine the deployment artifact, perhaps with a different embedded config
file. This is an ugly hack. The Serverless framework
can abstract this hack for you, but it’s still a hack.
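The shape of the hack is roughly the following: rebuild the same code once per
environment, baking a different config file into each artifact. This is a sketch
under assumptions - the helper name, the `config.json` convention, and the
settings are all illustrative, not an AWS API.

```python
import json
import os
import shutil

def build_artifact(env_name, settings, src_dir, out_dir):
    """Copy the shared code, bake in per-environment config, and zip it."""
    staging = os.path.join(out_dir, env_name)
    shutil.copytree(src_dir, staging)
    with open(os.path.join(staging, "config.json"), "w") as f:
        json.dump(settings, f)
    # One zip per environment - the same code, differing only in config.json.
    return shutil.make_archive(
        os.path.join(out_dir, env_name + "-lambda"), "zip", staging)
```

The ugliness is plain to see: your “environments” now differ at build time rather
than deploy time, which is exactly what environment variables would avoid.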
I have reason to believe that Amazon are fixing this (and probably pretty soon), and
I don’t know whether other vendors have the same problem, but I mention it right at
the top as an example of why a lot of this stuff is on the bleeding edge right now.
DoS yourself

Here’s another fun example of why caveat emptor is a key phrase whenever you’re
dealing with FaaS at the moment. AWS Lambda, for now, limits how many concurrent
executions you can run across all your lambdas. Say that this limit is 1000; that
means that at any one time you are allowed to be executing 1000 functions.
If something causes you to need to go above that you may start getting exceptions,
queueing, and/or general slow down.
The problem here is that this limit is across your whole AWS account. Some
organizations use the same AWS account for both production and testing. That means if
someone, somewhere, in your organization does a new type of load test and starts
trying to execute 1000 concurrent Lambda functions you’ll accidentally
DoS your production applications. Oops.
Even if you use different AWS accounts for production and development one
overloaded production lambda (e.g. processing a batch upload from a customer) could
cause your separate real-time lambda-backed production API to become unresponsive.
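You can’t raise the shared limit from client code, but you can at least degrade
gracefully when throttling starts. Here’s a hedged sketch of retry-with-backoff
around an invocation: `invoke` is any callable (e.g. a wrapped Lambda call) and
`is_throttle` decides which exceptions are worth retrying - both are assumptions
of this sketch, not a vendor SDK.

```python
import random
import time

def invoke_with_backoff(invoke, max_attempts=5, base_delay=0.5,
                        is_throttle=lambda exc: True):
    """Retry throttled invocations, backing off exponentially with jitter."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except Exception as exc:
            # Re-raise immediately for non-throttle errors or on the last try.
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

This only smooths over transient throttling; it does nothing about the underlying
account-wide limit, which is the real problem.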
Other types of AWS resources can be separated by context of environment
and application area through various security and firewalling concepts. Lambda needs
the same thing, and I’ve no doubt it will have it before too long. But for now,
again, caveat emptor.
Execution duration

Earlier in the article I mentioned that AWS Lambda functions are aborted
if they run for longer than 5 minutes. That’s a limitation which I would expect
to be removed later, but it will be interesting to see how AWS approach it.
Startup latency

Another concern I mentioned before was how long it may take a FaaS function to
respond, which is especially a concern for occasionally used JVM-implemented
functions on AWS. If you have such a Lambda function it may take on the order
of tens of seconds to start up.
I expect AWS will implement various mitigations to improve this over time, but
for now it may be a deal-breaker for using JVM Lambdas under certain use cases.
OK, that’s enough picking on AWS Lambda specifically. I’m sure the other vendors
also have some pretty ugly skeletons barely in their closets.
Testing

Unit testing Serverless Apps is fairly simple for reasons I’ve talked about earlier
- any code that you write is ‘just code’, and there aren’t for the most part a whole
bunch of custom libraries you have to use or interfaces that you have to implement.
Integration testing Serverless Apps on the other hand is hard. In the BaaS world
you’re deliberately relying on externally provided systems rather than (for instance)
your own database. So should your integration tests use the external systems too? If
yes, how amenable are those systems to testing scenarios? Can you easily tear up /
tear down state? Can your vendor give you a different billing strategy for load
testing?
If you want to stub those external systems for integration testing does the vendor
provide a local stub simulation? If so how good is the fidelity of the stub? If the
vendor doesn’t supply a stub how will you implement one yourself?
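Where no vendor stub exists, a hand-rolled one for your own narrow slice of the
vendor API is often enough for local tests. The sketch below is illustrative - the
`db` client interface, `save_order`, and `StubDb` are all hypothetical names, and a
real vendor client would have a much larger surface.

```python
def save_order(db, order):
    """Handler logic under test: validate, then persist via the vendor client."""
    if not order.get("items"):
        raise ValueError("empty order")
    return db.put("orders", order)

class StubDb:
    """Hand-rolled stand-in for the slice of the vendor's database client we use."""
    def __init__(self):
        self.tables = {}

    def put(self, table, record):
        self.tables.setdefault(table, []).append(record)
        return {"status": "ok"}
```

The fidelity question from above still applies: a stub this simple tells you
nothing about the real service’s error modes, latency, or consistency behavior.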
The same kinds of problems exist in FaaS-land. At present most of the vendors do
not provide a local implementation that you can use so you’re forced to use the
regular production implementation. But this means deploying remotely and testing using
remote systems for all your integration / acceptance tests. Even worse the kinds of
problems I just described (no configuration, cross-account execution limits) are going
to have an impact on how you do testing.
Part of the reason that this is a big deal is that our units of integration with
Serverless FaaS (i.e. each function) are a lot smaller than with other architectures
and therefore we rely on integration testing a lot more than we may do
with other architectural styles.
Tim Wagner (general manager of AWS Lambda) made a brief reference at the recent
Serverless Conference that they were tackling testing, but it sounded like it was
going to rely heavily on testing in the cloud. This is probably just a brave new
world, but I’ll miss being able to fully test my system from my laptop, offline.
Deployment / packaging / versioning
This is a FaaS specific problem. Right now we’re missing good patterns of bundling
up a set of functions into an application. This is a problem for a few reasons:
- You may need to deploy a FaaS artifact separately for every function in your entire
logical application. If (say) your application is implemented on the JVM and you have
20 FaaS functions that means deploying your JAR 20 times.
- It also means you can’t atomically deploy a group of functions. You may need to
turn off whatever event source is triggering the functions, deploy the whole group,
and then turn the event source back on. This is a problem for zero-downtime
deployments.
- And finally it means there’s no concept of versioned applications so atomic
rollback isn’t an option.
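The first bullet is easy to make concrete. Absent application-level bundling, a
deploy script ends up enumerating one upload per function, even when they all share
an artifact - something like the sketch below, where the function names are made up
(the `aws lambda update-function-code` CLI shape is real, but generating commands
this way is just an illustration of the repetition).

```python
def plan_deploys(functions, artifact):
    """Return one deploy command per function - 20 functions, 20 uploads."""
    return [
        ["aws", "lambda", "update-function-code",
         "--function-name", name, "--zip-file", "fileb://" + artifact]
        for name in functions
    ]
```

Note there is nothing here tying the 20 uploads together: no shared version, no
transaction, and no way to roll them all back as one unit.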
Again there are open source workarounds to help with some of this, however it can
only be properly resolved with vendor support. AWS announced a new initiative named
‘Flourish’ to address some of these concerns at the recent Serverless Conference, but
have released no significant details as of yet.
Discovery

Similarly to the configuration and packaging points there are no well-defined
patterns for discovery across FaaS functions. While some of this is by no means FaaS
specific the problem is exacerbated by the granular nature of FaaS functions and the
lack of application / versioning definition.
Monitoring / Debugging
At present you are stuck on the monitoring and debugging side with whatever the
vendor gives you. This may be fine in some cases but for AWS Lambda at least it is
very basic. What we really need in this area are open APIs and the ability for third
party services to help out.
API Gateway definition, and over-ambitious API Gateways
A recent ThoughtWorks Technology Radar
discussed over-ambitious API Gateways.
While the link refers to API Gateways in
general it can definitely apply to FaaS API Gateways specifically, as I mentioned
earlier. The problem is that API Gateways offer the opportunity to perform much
application-specific logic within their own configuration / definition domain. This
logic is typically hard to test and version control, and often even hard to define.
Far better is for such logic to remain in program code like the rest of the
application.
With Amazon’s API Gateway at present you are forced into using many
Gateway-specific concepts and configuration areas even for the most simple of
applications. This is partly why open source projects like the
Serverless framework and Claudia.js exist,
to abstract the developer away from implementation-specific concepts
and allow them to use regular code.
While it is likely that there will always be the opportunity to over-complicate
your API gateway, in time we should expect to see tooling that helps you avoid
doing so, and recommended patterns to steer you away from such pitfalls.
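The “regular code” alternative looks something like this: validation and
request-shaping done in the function itself, where it can be unit tested and
version controlled. The event shape loosely follows what a gateway might pass to a
function, but is purely illustrative.

```python
def handler(event):
    """Do request validation in code instead of gateway mapping templates."""
    params = event.get("queryStringParameters") or {}
    if "user_id" not in params:
        return {"statusCode": 400, "body": "user_id is required"}
    return {"statusCode": 200, "body": "hello " + params["user_id"]}
```

The gateway then stays a thin pass-through, and the logic above travels with the
rest of the application through tests, review, and rollback.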
Deferring of operations
I mentioned earlier that Serverless is not ‘No Ops’ - there’s still plenty to do
from a monitoring, architectural scaling, security, networking, etc. point of view.
But the fact that some people (ahem, possibly me, mea culpa) have described Serverless
as ‘No Ops’ comes from the fact that it is so easy to ignore operations when you’re
getting started - “Look ma - no operating system!” The danger here is getting lulled
into a false sense of security. Maybe you have your app up and running but it
unexpectedly appears on Hacker News, and suddenly you have 10 times the amount of
traffic to deal with and oops - you’re accidentally DoS’ed and have no idea how to
deal with it.
The fix here, like part of the API Gateway point above, is education. Teams using
Serverless systems need to consider operational activities early, and it is on
vendors and the community to provide the teaching to help them understand what this
means.