Some security topics require more than a tweet. To that end, today on Security > 140 we talk with T.Rob Wyatt (@tdotrob), an independent consultant specializing in security of the IBM MQ Messaging family of products. He enjoys being a father and grandfather now and then. In his spare time he does pretty much exactly the same things he does on the job, which is a windfall for his clients but somewhat distressing for his family who occasionally bust down his office door to stage an intervention. He blogs about MQ security at https://t-rob.net where you can also find links to his presentations and articles.
Gunnar: You’ve spent a lot of years on middleware security with things like MQ Series. We always hear that security is supposed to be risk focused, but the majority of security budgets go to things like network controls and endpoints. These are some of the least valuable parts of the enterprise. Meanwhile messaging systems like MQ literally run the most critical transactions, the most critical data and applications. Yet these middleware systems get short shrift, no way they even get 10% of the focus, budget and effort of perimeters and endpoints, right? Why is that and how do you cope with that?
T. Rob: There are some concrete steps to cope with it that have been effective. The "why" part is a bit speculative but I'll take a shot.
People who work on Internet-facing servers, web admins, firewall admins, all the folks who deal with the perimeter, know all too well their technologies live in a hostile environment. Their public-facing servers are under constant attack and the technologies to deal with that are well known and mature. When we decide the internal network is also hostile, applying the familiar controls is not difficult. The delta in administrative overhead can be so small that in some cases it is easier to apply strong controls everywhere than to manage multiple sets of baseline configurations.
But middleware administrators ran their networks wide open for many years while web technologies were maturing. In the case of IBM MQ (formerly WebSphere MQ, formerly MQSeries) the product has had TLS connections for many years but only recently have users been implementing it in earnest. With little demand and even less actual implementation, there was almost no feedback to IBM as to what worked well, what needed fixing, and what was missing.
The perimeter admins can show logs and other evidence of constant threat. Meanwhile, there have not been any publicly confirmed breaches of MQ. We've had confirmed interceptions of data in transit on the intranet but even so it is harder to convince anyone of the threat to middleware on the intranet than to the perimeter.
The "what do you do" bit is a bit easier and more optimistic. When I joined IBM in 2006 they were gracious enough to let me write articles and speak at conferences with a message that wasn't entirely flattering: MQ is delivered wide open by default, it's tough to lock down, and you (the users) aren't doing it right. I also have written and delivered education sessions for auditors and PCI assessors to let them know just how vulnerable many of these middleware networks are and how to properly assess them.
I've been beating that drum for almost 10 years now and it's moved the needle. Enough MQ customers began enabling security that many of the latent potholes were mapped and repaired. For example, I used to tell customers to stop making everyone an MQ admin and also to enable and monitor authorization events. The first customer to actually do so immediately thought they were under attack when they started seeing tens of thousands of authorization errors a day. It turned out that one MQ system queue could only be displayed by an admin. When any low-privileged user asked MQ to enumerate the queue names - in other words they opened MQ Explorer to the Queues screen - it threw the auths error.
In a network with hundreds of users and MQ nodes that result is obvious in hindsight once you know how it works, but literally nobody had ever done that before. Thankfully we seem to be well past that stage now and it is much more common when I go to a new client to see the network running TLS channels and the connections authenticated. One of my conference presentations from MQTC last week reflects that progress. It is called "Beyond Intrusion Prevention" and asks MQ admins to consider intrusion detection, mitigation, recovery, and forensic analysis. (http://t-rob.net/links)
Now that more customers are attempting to use MQ's security features for routine intranet links and IBM has lots of real-world feedback, momentum has been building. The last three releases of MQ have featured major security enhancements, including that it is now shipped secure by default. The MQ admin must explicitly provision *any* remote access, and a higher level of provisioning is required if that access is to be administrative. As you noted, much of the world's critical infrastructure runs on MQ so this is a major win for, well, everyone.
GP: A big challenge for organizations is something I call the “it’s perfect or it’s broken” mindset. If we think about the externally facing DMZ, that usually represents about the best that an organization can do in terms of security. But many places have a hard time pivoting to the question, “OK, what does my ‘internal’ security model look like?” In other words, if I cannot have all the dials turned up to 11 like I do on the DMZ, do I even have a security model? As you say, it used to be a total afterthought, but it’s started to get better. I think one trap to avoid is that perfect-or-broken binary mindset. I noted that one of the goals in some of your design patterns is ‘containing the blast radius’. Can you describe that a bit and how it applies to middleware specifically?
T. Rob: I don't run into the perfect-or-broken mindset quite so much but again it may be because the middleware crowd haven't considered the intranet a hostile environment. A client once said to me with a straight face "It's the trusted internal network. That's why we call it that." Normally I have the opposite problem and have to overcome a belief that security ramps up as a function of the amount of effort expended on it rather than as a function of having addressed a specific set of threats. Case in point - MQ networks today commonly have TLS configured on their channels but do not authenticate the certificates. If the customer uses the same CA that I do, sometimes I demo the problem by connecting to their MQ node using my own personal certificate. Whoops.
Because of this you may hear me from time to time telling audiences that having a little security is like being a little pregnant. If people compared my advice with yours they might conclude we were saying opposite things. I think we are actually nudging our respective communities toward the same goal, but working with communities of practice who come at the problem from very different starting points.
The blast radius analogy is useful because it is memorable and perfect for the principle I use it to illustrate. I included one slide showing instances where we use controlled explosions for specific useful purposes - nail gun, internal combustion engine, fireworks - and another slide with an X-ray of my own hand after an explosion took a finger off of it. If we think of a breach as having a "blast radius" then enabling the controls already in the product that might contain the blast seems obvious.
That slide deck makes the argument to the middleware community that perimeter security, which many shops have now spent some time on, is not the end of our journey. I want to set the expectation that there's more work to be done so I ask the audience to consider blast containment and several other topics posed as questions.
If your MQ network was breached what is the probability that:
* The incident would be detected by the monitoring in place today?
* The controls in place today would prevent the breach from compromising adjacent nodes?
* There is an effective security incident plan in place and the key players are prepared to execute it?
* The scope and impact of the breach could be reliably, accurately and quickly identified?
* Affected business applications could be resumed quickly and safely?
* You would have sufficient forensic data to perform post-breach analysis to successfully identify the source?
Many shops now routinely use TLS to protect the MQ network from unauthorized connections but continue to allow administrative access from all adjacent legitimate nodes. Compromising one node on such a network grants administrative access to the entire network. Configuring an MQ queue manager to refuse administrative commands from an adjacent node is quite easy to do and is an effective means of "containing the blast radius." Rather than treat this as an afterthought, for which they need to spin up a project and perhaps get funding, I want MQ admins to consider it early and include it in their security baseline configuration for MQ.
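As a rough illustration of the point above (not part of the original presentation), the "blast radius" can be treated as a reachability question: starting from one compromised node, which other nodes will accept administrative commands along the chain? The sketch below is a minimal Python model under stated assumptions. The queue manager names, the topology, and the `blast_radius` helper are all hypothetical, and the model says nothing about how the restriction is actually configured in MQ.

```python
from collections import deque


def blast_radius(admin_links, compromised):
    """Return every node an attacker can administer starting from `compromised`.

    `admin_links` maps a node to the set of adjacent nodes that accept
    administrative commands from it (a hypothetical model, not an MQ API).
    """
    reached = {compromised}
    pending = deque([compromised])
    while pending:
        node = pending.popleft()
        for neighbor in admin_links.get(node, ()):
            if neighbor not in reached:
                reached.add(neighbor)
                pending.append(neighbor)
    return reached


# Hypothetical four-node network in which every node accepts
# administrative commands from each of its legitimate neighbors.
wide_open = {
    "QM_WEB": {"QM_HUB"},
    "QM_HUB": {"QM_WEB", "QM_PAY", "QM_HR"},
    "QM_PAY": {"QM_HUB"},
    "QM_HR": {"QM_HUB"},
}

# Same nodes, but no queue manager accepts administrative commands
# from an adjacent node.
contained = {name: set() for name in wide_open}

print(sorted(blast_radius(wide_open, "QM_WEB")))
print(sorted(blast_radius(contained, "QM_WEB")))
```

In the wide-open model the compromise of `QM_WEB` reaches all four nodes; in the contained model it stops at `QM_WEB` itself. The links still carry application messages in both cases; only the administrative path is modeled here.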
I spend the most time in the presentation on breach mitigation because it is closely related to perimeter security MQ admins are familiar with and an area where we are likely to be able to make a lot of progress quickly. But I also queue up several other security topics I hope will stay on the radar.
Obviously, some of the questions I pose describe goals that would be challenging for administrators of any technology and I don't pretend we have mature solutions for them in MQ. My goal here is simply to get people thinking about how they might accomplish some of these goals and which of the existing controls might be used to do that. Once we start that conversation as a community and start playing with the tools we have today we'll be able to see better where the gaps are and work with IBM to address them.
As an example, the more the community discussed archiving error logs over the years, the more functionality IBM provided. We've gone from patchy support for tuning log sizes to formally defined, consistent controls over the last few releases. Enough people now use this functionality that there's demand for more fully realized log management. At last week's conference IBM's product architect mentioned they are already working on more log enhancements for the next release. Momentum is great.
GP: Middleware systems are in place because they need to bridge environments. When you try to implement access control in them, you typically have to deal with a minimum of 2-3 different security domains and credential types. This is one factor that’s led people to punt to a lowest common denominator, a “Well, the mainframe supports 6-character passwords so I guess that’s what I’ll use on the front end web server” kind of thing. What’s a good way to find the highest common denominator for the service requesters and service providers?
T. Rob: Try to contain your laughter to a dull roar when I tell you that I have no freaking idea. I have been working with IBM MQ and its family of related middleware products exclusively for twenty years. Almost everything I know about security was learned in that context.
So what is that context? The base product has no message hashing or integrity assurance features. The message context, which includes a user ID and a few other fields, is used to pass identity around, but the thing consuming the message has to trust the provenance. There's an add-on product called Advanced Message Security or AMS for short that provides signing and encryption of messages, but even this product doesn't provide assurance of identity carried in the message headers. The identity assured by AMS is carried in the payload's AMS signature which has no relation whatsoever to the value of the ID in the message header.
So you see, it isn't just MQ users who have concentrated on the perimeter. The security model of the product itself is that of a hard crunchy perimeter with a soft gooey center - what I like to call the Catbox Crispy Security Model. We trust the message headers because we have to, and we trust the perimeter security to keep the headers trustworthy. If the assumption of the strong MQ perimeter is bad, then all bets are off. That's the soft gooey center.
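The gap described above, a signed payload traveling beside an unprotected header identity, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: an HMAC with a shared demo key stands in for the certificate-based signatures a product like AMS would use, and the message layout and field names (`header`, `user_id`, `payload`, `signature`) are hypothetical rather than MQ's actual message descriptor.

```python
import hashlib
import hmac

# Hypothetical shared key for the sketch only; a real deployment would
# use certificate-based signatures rather than a hard-coded secret.
SECRET = b"demo-key-not-for-real-use"


def sign(data: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over `data`."""
    return hmac.new(SECRET, data, hashlib.sha256).hexdigest()


def make_message(user_id: str, payload: bytes) -> dict:
    # Only the payload is signed. The header user ID rides along unprotected.
    return {
        "header": {"user_id": user_id},
        "payload": payload,
        "signature": sign(payload),
    }


def payload_is_intact(message: dict) -> bool:
    """Check the signature against the payload; the header is not covered."""
    return hmac.compare_digest(message["signature"], sign(message["payload"]))


message = make_message("appuser", b"transfer 100 to account 42")

# Someone inside the perimeter rewrites the header identity in transit.
message["header"]["user_id"] = "mqadmin"

print(payload_is_intact(message))    # the payload check still passes
print(message["header"]["user_id"])  # yet the claimed identity has changed
```

The payload check succeeds even though the identity in the header was rewritten, which is why a consumer that authorizes on the header value is relying entirely on the perimeter.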
From a pure MQ standpoint the community is very pleased with the progress in the product version-to-version. It really does keep getting better, and new security features arrive generally at a pace faster than the MQ community is prepared to adopt. Within our little island, MQ security is the best it's ever been and it's all very well received. However from the outside looking in, and especially when someone who administers an Internet-facing technology is doing the looking, this all seems very rudimentary.
It's not just MQ, by the way. The JMS specification makes provision for identity fields but none for controls to assure the integrity of those fields. There is no JMS-defined class of service, for example, that signs a message's headers. From a middleware design standpoint, identity and policy are related to the connection and not the message. There is some attempt to connect the two but not with assurance or auditability.
There are of course several IBM middleware technologies. MQ Extended Reach implements the MQTT protocol. MQ Light implements AMQP. WebSphere App Server has an all-Java implementation of JMS that interoperates with MQ. Integration Bus is the Universal Translator of the bunch and can communicate in a huge number of protocols. But there is no common identity or policy repository across these platforms. If you want to propagate identity through the middleware, you will need to do it yourself and most likely do it in the message payload.
To be fair, as of v8.0 which was released in June of last year, MQ will validate a user ID and password provided to it in a connection request, and can do that against the local operating system, LDAP or Active Directory. If MQ authenticates against the same identity repository as, say, the web application server, then we at least have the same identity repository. We still do not have assurance that the value in the message header identifies the entity who put the message, or that it hasn't been changed, and there is no formal mechanism for reliable identity propagation and no common repository for authorization and policy. But ID and password validation is progress and I'll take it happily.
All of which makes your question absurdly easy to answer in an MQ context, although perhaps not terribly comforting. The highest common denominator for identity propagation is that the user's ID and password can be validated against Active Directory or LDAP. Assuming that's where the Enterprise accounts are held, and that the perimeter security is sufficient to assure integrity of the ID in the message header end to end, then we have a winner.
Meanwhile, the highest common denominator for authorization policy is zero. MQ access control entries are managed by MQ's Object Authority Manager. These are not even common across an MQ network. The administrator is obliged to define access control entries on a per-queue manager basis, even when those objects share a common namespace in the MQ cluster.
As you can imagine, I stay quite busy trying to show my clients how to manage this at scale. I suppose I could stop trying so hard to get new features implemented and just enjoy the work, but I'm the one who has to explain all this and then I'm the one who gets beat up over it. The steady work is great but I'd like it a lot more without the beatings.
GP: For the companies running TLS, do they usually have an internal CA setup, or do they do those custom for the MQ project? Do you run into any certificate lifecycle management challenges when you go operational?
T. Rob: OK, now you are intentionally provoking me, right? Cert management is the biggest single issue in MQ-land that I see. My recent conference presentations are on certificate management in one way or another and I wrote them as a response to this specific issue.
The latest one is about the use of commercially signed certificates with MQ. Most CAs have tutorials for a plethora of different types of servers but nothing specifically about MQ. Even an admin who manages certs for other technologies won't find much useful guidance for MQ. That little validation step browsers do to check the URL against the CN and SAN? MQ doesn't do it. So what *are* the requirements?
My "Managing CA Certs for MQ" presentation explains the different types of validation (any domain validated cert will do) and granularity options (don't bother with wildcards or UCC for MQ). At the end I also provide a step-by-step implementation procedure to cut down drastically on setup issues.
The most popular presentation though is "How To Build Your Own Certificate Management Center of Mediocrity." The title is a play on the too common approach to security of implementing it gradually across the environment rather than implementing comprehensive security models against smaller subsets of the estate. Rather than aspire to "Center of Excellence" level of security management, there's an assumption that it can be ramped up gradually, and that the effective level of security ramps up proportionally.
Unfortunately, the level of effective security achieved doesn't ramp up in proportion to effort, for a variety of reasons. Obviously, leaving out critical controls is one issue, but "anti-lock brake syndrome" plays a big part too. When anti-lock brakes were first introduced they did not result in the crash reduction that had been forecast because people who had them felt safe enough to reduce their driving safety margins and engaged in riskier behavior.
Something similar can happen in shops after the MQ security implementation is complete. At that point it is assumed to be secure enough for any data, including PII, PHI, PCI, etc. But on inspection it is common to find applications and users running with full admin rights, little or no application isolation, the ability for legitimate users and apps to spoof one another's IDs and even anonymous remote administration.
A strong indicator of trouble is when that event is referred to as "*the* security implementation" as in "oh, we addressed that during the MQ security project." The wording reflects an understanding that this was a once-and-done event.
The "How To Build Your Own Certificate Management Center of Mediocrity" presentation is basically a retelling of some of the worst horror stories. The one I always tell is the large credit union where I asked whether they use CA signed or self-signed certs. The admin told me self-signed so I started explaining how it works. When he interrupted me to explain he needed signing requests to send to their internal CA we paused while I explained what self-signed means.
When I asked how long it takes to turn around a signing request I was told it just takes a few minutes but we'd have to wait until "the guy" came back from lunch.
"The guy," I asked? "This is the internal CA that supplies ALL certificates used by databases, app servers, backup jobs, Active Directory credentials, and that the business is literally betting its continued existence on and it all depends on one person?"
"No, there are about 5 people who can run the internal CA. But he has the laptop with the root cert with him."
Great. No doubt it's in the food court at the mall right about now and logged onto the open Wi-Fi.
All my presentations are linked from the same index page, by the way:
https://t-rob.net/links/
Over the years I've pitched the idea of lightweight certificate management to IBM, including during my stint as an MQ Product Manager, and to various vendors in the MQ 3rd party ecosystem. None have taken me up on it so now I have a business partner writing the code, I'm doing specification and design and we are using Avada's IR-360 as the integration platform. We do not intend to provide a complete PKI, but we will manage the public certificates centrally, manage the personal certs in situ, and provide expiry reports and lifecycle management. While we are at it we will go a bit outside the cert management space and monitor MQ configuration settings using IR-360's native MQ management and reporting capabilities.
So, yeah, I run into a LOT of cert lifecycle management issues. The tools available are either Enterprise-grade PKI or stone knives and bear skins. In between is a barren wasteland right now but I believe that with a decent product in that space we would see a lot more MQ security deployed in the field. My goal is to bring the cost down so low that people will secure the non-Prod environments.
GP: You say your goal is to see security applied in non-Prod environments. How do you see this integrating into the Software Development Lifecycle?
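To make the idea of an expiry report concrete, here is a minimal Python sketch. It is not the product described above; the inventory, the certificate labels, the 30-day warning window, and the `expiry_report` helper are all hypothetical, and a real tool would read expiry dates from the key stores rather than from a hard-coded list.

```python
from datetime import date, timedelta


def expiry_report(certs, today, warn_days=30):
    """Sort certificates by expiry date and flag expired or soon-to-expire ones.

    `certs` is a list of (queue_manager, label, not_after) tuples, where
    `not_after` is a datetime.date. All names here are illustrative.
    """
    cutoff = today + timedelta(days=warn_days)
    report = []
    for qmgr, label, not_after in sorted(certs, key=lambda cert: cert[2]):
        if not_after < today:
            status = "EXPIRED"
        elif not_after <= cutoff:
            status = "EXPIRING"
        else:
            status = "OK"
        report.append((qmgr, label, not_after.isoformat(), status))
    return report


# Hypothetical inventory spanning non-Prod and Prod queue managers.
inventory = [
    ("QM_PROD1", "qm_prod1-personal", date(2016, 6, 30)),
    ("QM_DEV1", "qm_dev1-personal", date(2015, 9, 30)),
    ("QM_TEST1", "qm_test1-personal", date(2015, 11, 1)),
]

for row in expiry_report(inventory, today=date(2015, 10, 15)):
    print(row)
```

Passing `today` in explicitly keeps the report reproducible, which matters if the same check is going to run unattended in every environment from Dev upward.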
T. Rob: Let me put it in perspective of how we develop business software applications today. Typically the environments include some version of Development, Integration Test, User Acceptance Testing, QA, Performance Test, and eventually Production. The idea is to systematically identify and resolve software defects through a well-defined process that minimizes risk of a variety of negative impacts. Business application software is revenue producing and customer-facing so this rigorous and disciplined process is a no-brainer.
Now consider how we approach the infrastructure management, monitoring and security tools. There is no question that these are required in Production so we put them there. But they are not revenue producing and not customer facing so the non-Production environments have less risk and we tend to deploy fewer tools the further we get from Prod. Many shops deploy infrastructure and security tools only in Prod. To the extent we do deploy into non-Prod, it tends to look like an extremely shallow inverted pyramid. There are many consequences of this but there are two in particular with which I am concerned.
The first of these, and we've all seen it, is that the highly visible project deployment fails due to security errors. Rather than back it out, we disable the security controls over which it is tripping and deploy. There are many things we are willing to miss a deployment date over but security isn't one of them. This is never more true than with the highly visible, mission-critical projects that ironically are the ones we really are betting the business on.
If we currently have no certificates in Dev then adding them reduces our risk, even if we automate the management and use self-signed certificates in that environment. I believe that if the incremental cost of a single certificate at a single queue manager approached zero, it would be hard to argue for not using them. I want security on in all environments, beginning with Dev.
The second consequence of this Prod-down approach to security is a little more abstract but a lot more significant. The infrastructure management and security tools are not revenue-producing applications, but they *are* applications. Because they ensure the integrity and availability of the environments that run the business applications, they are just as critical as the most critical of the business applications. So if the first time the MQ admin turns on TLS channels, CHLAUTH rules, cluster security, or granular authorization is in Production then there has been no opportunity to test these things beforehand.
The same business case that says we can't afford monitoring agents, certificates and granular authorization in non-Prod environments also guarantees that any defects with our implementation or configuration of these things occurs in the Prod environments, and they are not trivial to administer.
The Prod-down strategy is a deliberate choice to bypass the rigor and discipline of the software development lifecycle and bear all the risk of infrastructure and security defects in the one environment where any defect is customer-facing, reputation-risking, cash hemorrhaging, and potentially business-ending.
Described in these terms, the average person wonders why security isn't turned on in all environments all the time. It takes a special kind of logic to justify not doing so and when we write that logic down we call it a business case. Without a revenue stream, there is nothing against which to offset infrastructure and security expenditures. They are perceived almost entirely as "stuff that reduces profit" and deployed with great reluctance. My goal of getting the incremental cost of a single cert on a single queue manager down to near zero is my attempt to address this problem head-on.
"I want to move the risk of infrastructure and security defects out of the one environment where any defect is customer-facing, reputation-risking, cash hemorrhaging, and potentially business-ending."
"We do that today? Seriously? Never mind. What's that going to cost?"
"With the degree of automation I want to deploy, there's a year-over-year operational cost per certificate of a bit more than zero."
"I'm sorry, how many zeros?"
"Just the one."
Hopefully that business case is a bit easier to make than what we do today. Of course automation doesn't relieve anyone of the responsibilities of a competent security architect. But if we offload the bulk of the grunt work to automation, that person is freed up to design and update the security model, manage the policies and validate that everything is working the way it should. For example, if you need to update a signer certificate across a few thousand endpoints doing it by hand is time consuming, error prone, and expensive. I propose not only to automate that, but to do so in Dev so the business case to deploy certificates there becomes a slam dunk.