Hancock Bank, a small local bank on Mississippi's Gulf Coast, gives a master class in survivability. In the days after Katrina it gave out $50M in cash. It used folding tables on the sidewalk as branches. The IOUs were handwritten on sticky notes. It's a great story - more here.
The story is a case study in what Howard Lipson calls survivability - "the ability of a system to fulfill its mission, in a timely manner, in the presence of attacks, failures, or accidents."
Howard Lipson's fundamental goal of survivability:
The mission must survive
Not any individual component
Not even the system itself
What's interesting about this story is the ability to provide essential services outside of the system itself. This isn't something that can be solved by adding to the existing system. Upgrading access control doesn't solve it; carving out other ways to deliver essential services does.
Howard Lipson defines the 3 R's of survivability - Resistance, Recognition and Recovery. Recovery is the "ability to restore essential services during attack, and recover full services after attack." Most people focus on the second part, recovering full services. But to even get to that point, what matters is the first part: the "ability to restore essential services during attack." eBay had a similar concept for scalability called "limp mode." This middle state is the essence of survivability to me, whether it's limp mode or handing out $50M in cash from folding tables. The job is to assume failure, in most cases assume that centralized management structures will be replaced with decentralized ones, and figure out a way to survive.
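For readers who think in code, here is a minimal sketch of that middle state. It is not how Hancock Bank or eBay actually built anything; the names (Service, deliver, core_banking_down) and the example services are hypothetical, made up for illustration. The idea is only that an essential service carries a fallback path that does not depend on the central system, while a non-essential one is allowed to fail.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ServiceUnavailable(Exception):
    """Raised when a service cannot be delivered in any mode."""


@dataclass
class Service:
    name: str
    essential: bool                               # part of the mission, must survive
    primary: Callable[[], str]                    # normal path through the full system
    fallback: Optional[Callable[[], str]] = None  # path that works outside the system


def deliver(service: Service) -> str:
    """Try the normal path; if it fails, essential services drop to limp mode."""
    try:
        return service.primary()
    except Exception as exc:
        if service.essential and service.fallback is not None:
            return service.fallback()
        # Non-essential services (or ones with no fallback) are allowed to fail.
        raise ServiceUnavailable(service.name) from exc


def core_banking_down() -> str:
    # Hypothetical stand-in for the central system being unreachable.
    raise ConnectionError("central system unreachable")


# Hypothetical services: one essential with a fallback, one non-essential.
cash = Service(
    "cash withdrawal",
    essential=True,
    primary=core_banking_down,
    fallback=lambda: "cash issued against handwritten IOU",
)
statements = Service("online statements", essential=False, primary=core_banking_down)
```

With the central system down, deliver(cash) still returns a result through the fallback, while deliver(statements) raises ServiceUnavailable. Deciding up front which services get a fallback is the part that can't be bolted on later.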