GRC - To Be or To Do

GRC (or Governance, Risk Management, and Compliance for the uninitiated) is all the rage, but I have to say I think that, once again, Infosec has the wrong focus. My problem with making GRC the central part of Infosec programs is best summed up by Charles Harris' annual letter to shareholders. Charles Harris is the CEO of Harris and Harris (TINY), which is a venture capital firm that invests in nanotech companies. Its portfolio is made up of nanotech startups doing fascinating things. Some examples:

Cambrios Technologies Corporation develops electronic materials for the display industry. The company’s first product, a directly patternable, wet-processable transparent conductive film, is designed as a replacement for indium tin oxide (ITO), which is the current industry-standard transparent conductor material.

Nanosys, Inc., is developing nanotechnology-enabled systems incorporating novel and patent-protected zero and one-dimensional nanometer-scale materials such as nanowires, nanotubes and nanodots (quantum dots).

Solazyme, Inc., is a biotechnology company focused on synthetic biology for the renewable bioproduction of fuels, industrial oleochemicals, and health and wellness ingredients from marine microbes.

But Harris & Harris itself remains a small VC company whose focus is on finding the companies building the next great thing. Nanotech is a pretty amazing space, with truly awe-inspiring potential. Now here is where we get to my beef with a GRC focus, from the Harris & Harris annual shareholder letter:

Another reason that our Company needs to continue growing its assets is to drive down cash expenses as a percentage of net assets. In recent years, the expenses that a publicly traded business development company must incur to meet regulatory requirements have escalated dramatically, pursuant to the Sarbanes-Oxley Act of 2002, Rule 38a-1 for investment companies, expanded compensation disclosure and analysis requirements, FASB Statement No. 157 for the valuation of assets, etc.

In 2002, we had fewer investments and one office instead of two, but otherwise our business was the same as it is now. We got along fine with one internal accountant, a single outside accounting firm, no corporate compliance consultants, and no internal lawyers. Our business structure is very simple – no inventories, no receivables, no off-balance sheet entities, no debt, no preferred stock, one wholly-owned subsidiary, essentially all of our assets held by one custodian – yet our independent registered public accounting firm charged us approximately $290,000 in 2007 and will charge us up to an estimated $340,000 in 2008. The same firm charged us $55,500 in 2002. Today, in order to fulfill our regulatory requirements, we find ourselves having to employ two internal accountants; three accounting firms, including our independent registered public accounting firm; three law firms for counsel unrelated to our investment activities; a compensation consulting firm; a compliance consultant; an asset-valuation consulting firm; and two internal lawyers; and we now have to hold many more Board committee meetings. In 2007, our directors' and officers' liability insurance premium expenses were $521,884, versus $68,216 in 2002. In 2007, our legal expenses were $323,366, versus $149,954 in 2002. To put all of this corporate-governance overhead into perspective, we have only 13 full-time employees!

Wow. Those 13 employees should spend their time analyzing the incredibly complex nanotech space, finding opportunities, shepherding companies, and so on. Instead, they are dealing with a massive, ballooning focus called compliance.

I realize that normal information security programs are not focused on protecting VCs; they are focused on banks, insurance, manufacturing and so on. But here is the point: Harris and Harris' core business is nanotech investing, not compliance checkbox Olympics. Normal information security programs have been underfunded for a very long time, and when the wave of compliance regulations hit, they scrambled to align their programs with the new regulations. Of course you have to deal with regulations, but doing so does nothing to secure your core business.

In the Security Architecture Blueprint that I built, we start with stakeholder goals, and those are translated into security architecture, security policy & standards, and a set of risk management actions. Compliance is important, but it's a subset of risk management. The top level goal is a security architecture which, in James McGovern's words, "enables the strategic intent of the business."

From a security standpoint we enable the stakeholder's goals through delivering an effective, scalable security architecture communicated through real world policy & implementable standards; and further providing guidance on making informed risk management decisions.

So while compliance is important and there are a lot of investment dollars there (because large vendors have realized they can sell Provisioning suites, which are really very basic Tomcat apps with a couple of hooks to arcane directories, for 7 figures because it's under the rubric of compliance!), this wave of investment and attention should not distract information security from the real issues: building security into systems, dealing with threats and vulnerabilities, and protecting assets.

Overfocus on compliance for short-term gains, or work to build security into your company? It is a classic To Be or To Do situation:

"One day you will come to a fork in the road. And you're going to have to make a decision about what direction you want to go." [Boyd] raised his hand and pointed. "If you go that way you can be somebody. You will have to make compromises and you will have to turn your back on your friends. But you will be a member of the club and you will get promoted and you will get good assignments." Then Boyd raised the other hand and pointed another direction. "Or you can go that way and you can do something - something for your country and for your Air Force and for yourself. If you decide to do something, you may not get promoted and you may not get the good assignments and you certainly will not be a favorite of your superiors. But you won't have to compromise yourself. You will be true to your friends and to yourself. And your work might make a difference." He paused and stared. "To be somebody or to do something. In life there is often a roll call. That's when you will have to make a decision. To be or to do? Which way will you go?"

Personally, I am happy sticking to classic infosec knitting: delivering confidentiality, integrity, and availability through authentication, authorization, and auditing. But if you are looking for a next generation conceptual horse to bet on, I don't think GRC is it. I would look at information survivability. Hoff's information survivability primer is a great starting point for learning about survivability.

Survivability is more valuable over the long haul than GRC because it is focused on assets: not on giving an auditor what they need, but on giving the business what it needs.

From the seminal paper on survivability by Lipson, et al.: "survivability solutions are best understood as risk management strategies that first depend on an intimate knowledge of the mission being protected." Make a difference - asset focus, not auditor focus.

Book Review: Brave New War

John Robb's Brave New War provides an excellent summary of the major security issues that military, governments and businesses have to deal with. Robb explores the asymmetries in information, technology, intelligence, and agility that can give a small, disgruntled band of people certain advantages over very large and powerful systems. His excellent blog is rife with these examples.

Due to a number of technological factors, small groups of people can bring very powerful weapons to bear on large systems. Thanks to our hypermedia age, Robb's so-called Global Guerillas can learn from each other in an open source type way. Robb gives examples of the Iraq IED marketplace, where IED entrepreneurs learn how to improve techniques from each other.

There are many parallels between the above and computer security. In computer security, enterprises have to defend thousands of machines and connections. An attacker need only find one exploit. It is very likely that the attacker knows far more about the security vulnerabilities in your operating system, app server, web server, and database than the person who is administering it. This is an information asymmetry that can be (and is) exploited. In the computer security world we typically think of things in white hat and black hat ways. I tend to think of Robb as the physical world's uber Black Hat and Thomas Barnett as the White Hat (heck, he even advocates for a sys admin approach).

Sadly, another parallel is investment in security. While the US military fights guerillas, the Pentagon invests in more battleships and submarines. While enterprise IT connects millions of customers and partners throughout their systems, IT security buys firewalls and network security gear. This is not just fighting the last war, this is fighting in the last century.

The last part of the book, "Rethinking Security," was the most interesting for me. Robb points out that you cannot really expect to deal with all the threats. Attacks evolve. As Pete Lindstrom says, there are three reasons for this:

1. Intelligent adversary
2. Intelligent adversary
3. Intelligent adversary

So instead of assuming the naive "patch and pray" approach, Robb advocates for survivability as the centerpiece for a 21st century approach to security. This was quite a nice surprise to find at the end of an already enjoyable book. One of my favorite people to work with, Howard Lipson, has been beating the drum for computer security to deal with survivability for a while. Howard's three R's for survivability are:

Resistance - ability of a system to repel attacks

Recognition - ability to recognize attacks and the extent of the damage

Recovery - ability to restore essential services during attack, and recover full services after attack

Of course, as I blogged yesterday, the Anasazi were pretty good at this stuff a few hundred years ago. Wonder when computer scientists will catch up?

Design for Failure

ACM's interview of Bruce Lindsay by Steve Bourne is a classic. Designing for failure is something a lot of people talk about, but not with much specificity. Not so with this interview:

SB Are you really thinking of system failures as opposed to user errors?

BL I don’t think of user errors, such as improper input kinds of things, as “failures.” Those are normal occurrences. I don’t think of a compiler saying you misspelled goto as really being an error. That’s expected.

If you look at the OWASP Top Ten for example, or other lists like SANS, so many of the so-called security vulnerabilities go back to input validation issues. But bad inputs are normal, and even input that is not intended to be malicious, for example due to poor training, causes faults that can compromise the system.

SB One thing we could explore here is what techniques we have in our toolkit for error detection.

BL Fault handling always begins with the detection of the fault—most often by use of some kind of redundancy in the system, whether it be parity, sanity checks in the code where we may spend 10 percent of the lines of code checking the consistency of our state to see if we should go into error handling. Timeout is not exactly a redundancy but that’s another way of detecting errors.

The key point here is that if you go to sea with only one clock, you can’t tell whether it’s telling you the right time. You need to have some way to check. For example, if you read a message from a network, you might want to check the header to see if it is really a message that you were expecting—that is, look for some bits and some position that says, “Aha, this seems like I should go further.”

Or if you read from the disk, you might want to check a label on the disk block to see if it was the block you thought you were asking for. It’s always some kind of redundancy in the state that allows you to detect the occurrence of an error. If you hadn’t thought about failures, why would you put the address of a disk block into the disk block?

SB So, really what you’re trying to do is establish confidence in your belief about what’s going on in the system?

BL In a large sense, that’s right. And to validate, as you go, that the data or state upon which you’re going to operate is self-consistent.

Self-consistency echoes an assurance goal also advocated for by Brian Snow, where he asks: how can we safely use security gear that we cannot trust?
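Lindsay's disk block example can be sketched in a few lines. This is only an illustration under my own assumptions: the block layout, the names `encode_block` and `decode_block`, and the 64-byte payload size are all hypothetical and do not come from the interview. The point is the redundancy: each block carries its own address plus a checksum, so the reader can tell when it got the wrong block or a damaged one.

```python
import struct
import zlib

BLOCK_PAYLOAD_SIZE = 64          # illustrative payload size in bytes (assumption)
_HEADER = struct.Struct(">QI")   # 8-byte block address, 4-byte CRC32 of payload


class CorruptBlockError(Exception):
    """Raised when a block fails its self-consistency checks."""


def encode_block(address: int, payload: bytes) -> bytes:
    """Prefix the zero-padded payload with its own address and a CRC32."""
    if len(payload) > BLOCK_PAYLOAD_SIZE:
        raise ValueError("payload too large for one block")
    padded = payload.ljust(BLOCK_PAYLOAD_SIZE, b"\x00")
    return _HEADER.pack(address, zlib.crc32(padded)) + padded


def decode_block(expected_address: int, raw: bytes) -> bytes:
    """Return the padded payload, or raise if the redundant fields disagree."""
    if len(raw) != _HEADER.size + BLOCK_PAYLOAD_SIZE:
        raise CorruptBlockError("unexpected block length")
    stored_address, stored_crc = _HEADER.unpack_from(raw)
    payload = raw[_HEADER.size:]
    # Redundancy check 1: is this the block we thought we asked for?
    if stored_address != expected_address:
        raise CorruptBlockError(
            f"asked for block {expected_address}, got block {stored_address}"
        )
    # Redundancy check 2: is the payload still what was written?
    if zlib.crc32(payload) != stored_crc:
        raise CorruptBlockError("checksum mismatch")
    return payload
```

Without the stored address and checksum, a misdirected read or a flipped bit would look like perfectly good data, which is exactly the "one clock at sea" problem.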

SB Once you’ve detected the error, now what? You can report it, but the question is who do you report it back to and what do you report back?

BL There are two classes of detection. One is that I looked at my own guts and they didn’t look right, and so I say this is an error situation. The other is I called some other component that failed to perform as requested. In either case, I’m faced with a detected error. The first thing to do is fold your tent—that is, put the state back so that the state that you manage is coherent. Then you report to the guy who called you, possibly making some dumps along the way, or you can attempt alternate logic to circumvent the exception.

In our database projects, what typically happens is it gets reported up, up, up the chain until you get to some very high level that then says, “Oh, I see this as one of those really bad ones. I’m going to initiate the massive dumping now.” When you report an error, you should classify it. You should give it a name. If you’re a component that reports errors, there should be an exhaustive list of the errors that you would report.

That’s one of the real problems in today’s programming language architecture for exception handling. Each component should list the exceptions that were raised: typically if I call you and you say that you can raise A, B, and C, but you can call Joe who can raise D, E, and F, and you ignore D, E, and F, then I’m suddenly faced with D, E, and F at my level and there’s nothing in your interface that said D, E, and F errors were things you caused. That seems to be ubiquitous in the programming and the language facilities. You are never required to say these are all the errors that might escape from a call to me. And that’s because you’re allowed to ignore errors. I’ve sometimes advocated that, no, you’re not allowed to ignore any error. You can reclassify an error and report it back up, but you’ve got to get it in the loop.
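Here is a rough sketch of what "you can reclassify an error and report it back up" might look like when you have to build it yourself. Everything in it is hypothetical: the `StorageError` hierarchy, the `declared_errors` decorator, and `load_record` are names I made up for illustration. The idea is that a component publishes an exhaustive set of error classes, and anything a callee raises outside that set gets reclassified at the boundary instead of leaking through.

```python
import functools


class StorageError(Exception):
    """Base class: every error this component lets escape derives from it."""


class RecordNotFound(StorageError):
    """The requested key does not exist."""


class StorageUnavailable(StorageError):
    """The backing store could not be reached in time."""


class UnclassifiedStorageFault(StorageError):
    """Catch-all so that unexpected callee errors are still named."""


def declared_errors(func):
    """Let only StorageError subclasses escape; reclassify everything else."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except StorageError:
            raise  # already one of the declared errors
        except (ConnectionError, TimeoutError) as exc:
            raise StorageUnavailable(str(exc)) from exc
        except Exception as exc:
            # Never ignore it: give it a name and keep the cause attached.
            raise UnclassifiedStorageFault(type(exc).__name__) from exc
    return wrapper


@declared_errors
def load_record(table: dict, key: str) -> str:
    if key not in table:
        raise RecordNotFound(key)
    return table[key].upper()  # raises AttributeError if the value is not a str
```

A caller of `load_record` now only has to reason about `StorageError` and its subclasses, and the original exception is still reachable through `__cause__` for whoever does the dumping.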

Security exceptions in many cases require special handling, so that they may be softened before returning any data to a client or log file. The programming language support for error detection, softening, and reporting is pathetic. Programmers must literally grow their own wheat to make bread in this case.
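As a small example of growing your own wheat, here is one way softening might be done by hand. The names (`AuthenticationFailure`, `soften`) and the response shape are assumptions of mine, not part of any framework. The detailed reason stays in an internal log record, and the client gets a generic message plus an incident id that support staff can correlate.

```python
import logging
import uuid

logger = logging.getLogger("security")


class AuthenticationFailure(Exception):
    """Internal, detailed failure; never shown to the client verbatim."""

    def __init__(self, username: str, reason: str):
        super().__init__(f"{username}: {reason}")
        self.username = username
        self.reason = reason


def soften(exc: AuthenticationFailure) -> dict:
    """Log the detail internally and return a generic response for the client."""
    incident_id = uuid.uuid4().hex
    # %r escapes control characters, so a hostile username cannot forge log lines.
    logger.warning(
        "auth failure incident=%s user=%r reason=%s",
        incident_id, exc.username, exc.reason,
    )
    # The client learns that authentication failed, not why or for whom.
    return {"error": "authentication failed", "incident": incident_id}
```

Returning the same message for "unknown user" and "bad password" keeps the error channel from telling an attacker which usernames exist.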

SB One of the interesting aspects of this is the trade-off between how long it takes to detect something and how much time you really have to recover in the system.

BL And what it means to remove the failed component, because there is a split brain problem that I think you’re out and you think I’m out. Who’s in charge?

SB Right, and while they’re arguing about it, nothing is happening.

BL That’s also possible, although some of these systems can continue service. It’s rare that a distributed system needs the participation of all active members to perform a single action.

There is also the issue of dealing out the failed members. If there are five of us and four of us think that you’re dead, the next thing to do is make sure you’re dead by putting three more bullets in you.

We see that particularly in the area of shared management of disks, where you have two processors, two systems, connected to the same set of storage. The problem is that if one system is going to take over the storage for the other system, then the first system better not be using the storage anymore. We actually find architectural facilities in the storage subsystems for freezing out participants—so-called fencing facilities.

So if I think you’re dead and I want to take over your use and responsibility for the storage, the first thing I want to do is tell the storage, “Pay no attention to him anymore. I’m in charge now.” If you were to continue to use the storage while I blithely go forward and think I’m the one who’s in charge of it, terrible things can happen.

There are two aspects of collaboration in distributed systems. One is figure out who’s playing, and the second one is, if someone now is considered not playing, make damn sure somehow that they’re not playing.
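The fencing idea can be shown with a toy model. This is a sketch under my own assumptions: real fencing facilities live in the storage subsystem itself, and the `SharedStorage` class and its epoch counter are hypothetical stand-ins. The storage hands out a new epoch number on every takeover and refuses writes from anyone holding an older one.

```python
class FencedOut(Exception):
    """Raised when a writer with a stale epoch touches the storage."""


class SharedStorage:
    """Toy storage that honors only the most recently granted epoch."""

    def __init__(self):
        self._current_epoch = 0
        self._blocks = {}

    def take_over(self) -> int:
        """Grant ownership to a new writer and fence out all earlier ones."""
        self._current_epoch += 1
        return self._current_epoch

    def write(self, epoch: int, address: int, data: bytes) -> None:
        # "Pay no attention to him anymore": stale owners are rejected here.
        if epoch != self._current_epoch:
            raise FencedOut(
                f"epoch {epoch} is stale; current is {self._current_epoch}"
            )
        self._blocks[address] = data

    def read(self, address: int) -> bytes:
        return self._blocks[address]


storage = SharedStorage()
node_a = storage.take_over()           # A is in charge
storage.write(node_a, 0, b"from A")
node_b = storage.take_over()           # B decides A is dead and takes over
storage.write(node_b, 0, b"from B")
try:
    storage.write(node_a, 0, b"late write from A")
except FencedOut:
    pass                               # A was only slow, not dead, but is still shut out
```

The important design choice is that the check sits in the shared resource, not in the nodes. A node that has been declared dead cannot be trusted to notice that fact on its own.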

This lesson applies to a lot of security in distributed systems, including authentication systems, SE(I)Ms, XML gateways, and a lot more. Part of the solution is to use tools, for example an STS, that let "security" function not as some dualistic boolean but as a composable domain with its own rules, behavior, and logic for adapting to these situations, since the languages and servers cannot do so without a tremendous amount of fu. Remember that survivability has three R's: Resistance, Recognition, and Recovery. The Anasazi certainly did.

 

The Road to Assurance

Brian Snow of the NSA released an informative and insightful paper, "We Need Assurance!"
The paper starts with this question, which captures the balancing act between security and functionality.

When will we be secure? Nobody knows for sure - but it cannot happen before commercial security products and services possess not only enough functionality to satisfy customers' stated needs, but also sufficient assurance of quality, reliability, safety, and appropriateness for use.

This underscores why architecture is crucial to security, because architectural tradeoff analysis resolves conflicts and constraints from disparate domains.

Throughout these years, my mantra has been, "Managers are responsible for doing things right: Technical Directors are responsible for finding the right things to do."

Absolutely, the primary focus of these activities requires different mindsets.

In discussing failures of security products:

First, too many of these products are still designed and developed using methodologies assuming random failure as the model of the deployment environment rather than assuming malice. There is a world of difference!

Second, users often fail to characterize the nature of the threat they need to counter. Are they subject only to a generic threat of an opponent seeking some weak system to beat on, not necessarily theirs, or are they subject to a targeted attack, where the opponent wants something specific of theirs and is willing to focus his resources on getting it?

This underscores the hazards of the dualistic "trusted and untrusted" view that many organizations have. Instead of the perimeter focus such as in networks, each architectural layer - physical, network, host, identity, application, and data - needs its own perimeter and control gradient.

The differentiation that Brian Snow points out is a chief design consideration: systems need one level of security to deal with generic door-rattling threats and a deeper level to deal with targeted attacks. The latter requires asset valuation to determine which areas need this level.

The paper also brings up an important point that questions the old standby that security somehow impedes business. Many car manufacturers have differentiated their products through superior quality, reliability and safety. Why is this so hard for software manufacturers to grasp?

Assurance is best addressed during the initial design and engineering of security systems

Build Security In, anyone?

In addressing the current posture:

We will be in a truly dangerous stance: we will think we are secure (and act accordingly) when in fact we are not secure.

Sadly this bifurcation of perception and reality drives a lot of the software market.

During the next several years, we need major pushes and advances in three areas: Scalability, Interoperability, and Assurance. I believe that market pressures will provide the first two, but not the last one - assurance.

Another perceptive distinction. I am inclined to agree, with one caveat: the increased interop we are starting to see with SAML, Federation, WS-*, et al. brings some incremental improvements for assurance.

Assurance is more formally defined:

1. The system's security policy is internally consistent and reflects the requirements of the organization.
2. There are sufficient security functions to support the security policy
3. The system functions meet a desired set of properties and only those properties
4. The functions are implemented correctly, and
5. The assurances hold up through the manufacturing, delivery, and life cycle of the system.

The paper then addresses six areas to target: OS, software modules, hardware, system engineering, third-party testing, and legal constraints. I will blog/annotate this in a future post.

I really have to applaud Brian Snow for sharing this analysis with the open community. Collaborative knowledge sharing should help the open community if we are prepared to listen, learn, and *implement* these lessons.

Update: Adam, the lead saxophone player in the Emergent Chaos Jazz Combo, has placed Brian Snow's assurance view in the context of a hacker view and an economic view of software security. This is hugely instructive. None of these views, by itself, is adequate. The combination of horizontal and vertical views is what yields the most accurate picture. Obviously, iteration is the only way to work towards that. Adam's brilliant suggestion? OODA Loops.

Update 2: Part 2 analysis of Brian Snow's paper.

Geer, Massachusetts, and Monoculture

Dan Geer has an editorial, "Massachusetts Assaults Monoculture," on CNet about Massachusetts' move to the OpenDocument format:

As a matter of logic alone: If you care about the security of the commonwealth, then you care about the risk of a computing monoculture. If you care about the risk of a computing monoculture, then you care about barriers to diversification. If you care about barriers to diversification, then you care about user-level lock-in. And if you care about user-level lock-in, then you must break the proprietary format stranglehold on the commonwealth. Until that is done, the user-level lock-in will preclude diversification and the monoculture bomb keeps ticking.

The Massachusetts Department of Administration and Finance does care, and its Enterprise Technical Reference Model specifies OpenDocument Format. That standard is precisely what is needed and not a moment too soon.

This relates directly to yesterday's post on survivability: diversity is a key factor in ensuring survivability.

Howard Lipson on Survivability

I recently got to work on a project with Howard Lipson. One interesting concept that he has been working on for a while is survivability in software systems. Survivability encompasses a number of domains I have seen in deployment and operations, but it brings them together into a unifying structure. Howard's slides from a recent presentation, "Cyber Security and Control System Survivability," are online.

The problem survivability addresses:

Traditional computer security is not adequate to keep highly distributed systems running in the face of cyber attacks. Survivability is an emerging discipline - a risk-management-based security paradigm.

Survivability is defined as:


the ability of a system to fulfill its mission, in a timely manner, in the presence of attacks, failures, or accidents.

The 3 R's of Survivability

Resistance - ability of a system to repel attacks

Recognition - ability to recognize attacks and the extent of the damage

Recovery - ability to restore essential services during attack, and recover full services after attack

And the fundamental goal of survivability:


The mission must survive
  • Not any individual component
  • Not even the system itself

This concept reminds me of something I heard Guy Kawasaki say about building software start ups. Everyone, he said, focuses on what it takes to get the plane in the air, but not many people focus on what it takes to keep the plane in the air. In reality, the plane should spend much more time in the air than taking off.

I agree with Howard that this is an emerging discipline. There are numerous issues to consider, and I blogged about an interview with Bruce Lindsay that discusses some of them. Not the least of these is the total lack of programming language support for error detection. Bruce Lindsay:

In fact, there is zero language support for detection. What we see in the languages are facilities for dealing with the error once it has been discovered. Throwing an exception in the language is something the logic of the program does.

Most of the scripting languages, for example, have very little support at the language and semantic level for dealing with exceptions. And at the end of the day, most of what’s in the languages is stuff that you could have coded yourself.

There are some fairly dangerous features in languages—in particular, the raise error or throw exception and the handlers. How does that relate to the stack of procedure calls? What we see in some early approaches to language-supported error handling is that the stack is peeled back without doing anything until you find some level in the stack that has declared that it’s interested in handling the particular exception that’s in the error at the moment.

In general, folding a procedure or subroutine activation—method activation—without cleaning up the mess that may have already been made, the partially completed state transformation of that function, is very dangerous. If there have been memory allocations, for example, and you just peel back the stack entry, those memory allocations are likely not to be undone.

So it’s very important that at every level of the procedure activation, those procedures be given a chance to fold up their tent neatly, even if they can’t deal with the exception.

Programming languages and frameworks need to be built to support the non-functional qualities of a system, not just the functional ones.
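Lindsay's "fold up their tent" advice can be sketched like this. The `Ledger` class and the order functions are hypothetical, made up purely for illustration. The middle level cannot handle the shipping failure, but it still undoes its own partial state change before letting the exception continue up the stack.

```python
class Ledger:
    """Toy in-memory state with a reservation that must be undone on failure."""

    def __init__(self, balance: int):
        self.balance = balance
        self.reserved = 0


def reserve(ledger: Ledger, amount: int) -> None:
    if amount > ledger.balance:
        raise ValueError("insufficient funds")
    ledger.balance -= amount
    ledger.reserved += amount


def release(ledger: Ledger, amount: int) -> None:
    ledger.reserved -= amount
    ledger.balance += amount


def ship(order_ok: bool) -> None:
    if not order_ok:
        raise RuntimeError("shipping failed")


def place_order(ledger: Ledger, amount: int, order_ok: bool) -> None:
    reserve(ledger, amount)          # partial state transformation starts here
    try:
        ship(order_ok)
    except BaseException:
        # This level cannot deal with the error, but it can fold its tent:
        # put its own state back, then let the exception keep travelling.
        release(ledger, amount)
        raise
    ledger.reserved -= amount        # success: the reservation becomes a charge
```

Python's `try`/`finally` and `with` blocks give each stack frame this chance during unwinding, but nothing forces a programmer to use them, which is Lindsay's complaint in a nutshell.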
