Mark O'Neill is to Web services security as Paul Westerberg is to music in the 1980s. Yesterday I posted that Schema Validation is an underutilized security tool because it's cheap and easy to distribute validation logic. I made the pro case, but of course it's never that simple. Here is Mark's response, Downside of Schemas Validation:
Schemas are indeed useful, but here are some downsides:
- The availability of a schema helps in a plaintext-guessing attack against encrypted data, since an attacker knows what the structure of the unencrypted data is, names of elements and attributes, and maybe default values also.
This brings up a good point; I have talked about this issue in regard to plaintext schemas in XML encryption. If you rely on anything for validation, you are putting it on the critical path, so it must be protected, and customarily schemas are not always protected. That raises questions of schema storage, protection models and a host of other concerns, including:
- Applications which are coded to validate all incoming XML can be diverted to a malicious Schema using the SchemaLocation attribute. The malicious Schema could include very complex checks which would choke a parser. This behavior can be turned off on some platforms, for example here is how it's done in .NET: http://msdn.microsoft.com/en-us/library/ms763691(VS.85).aspx . I should note that Gunnar's code is not vulnerable to this attack since he specifies the Schema in the "schemaFile" variable. But many applications are. It is a neat way to turn a security measure against itself.
For better or worse, the parser and (if used) the schema are now on the front lines. They were never ever designed for network operations, but now they are shunted to and fro and have to deal with Internet threat models. Rots o ruck.
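To make the SchemaLocation point concrete, here is a minimal sketch, in Python with only the standard library, of one defensive option: strip any schema hints the sender put in the document before it reaches a validator, so that the only schema ever used is the one the receiver pinned locally. The function name `strip_schema_hints` and the sample message are hypothetical, and this is not how any particular platform or gateway does it. It also does nothing about other parser-level attacks; it only removes the sender's ability to pick the schema.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Namespace of the XML Schema instance attributes (xsi:*).
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_HINTS = (
    f"{{{XSI_NS}}}schemaLocation",
    f"{{{XSI_NS}}}noNamespaceSchemaLocation",
)


def strip_schema_hints(xml_text: str) -> Tuple[str, int]:
    """Remove sender-supplied schema hints from every element.

    Returns the cleaned document and the number of hints removed.
    Hypothetical helper, for illustration only.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    for element in root.iter():
        for hint in SCHEMA_HINTS:
            if hint in element.attrib:
                del element.attrib[hint]
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Made-up message that tries to point the validator at an attacker's schema.
incoming = (
    '<order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="urn:orders http://attacker.example/evil.xsd">'
    "<item>widget</item></order>"
)
cleaned, removed = strip_schema_hints(incoming)
```

A stricter variant would reject the message outright when `removed` is greater than zero, on the theory that a well-behaved client has no reason to tell the server which schema to use.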
- As Gunnar says, Schema validation is only as good as the Schema itself. Most Schemas are just about the data-types ("this is a string, this is a string, this is also a string", etc). That is not useful for security purposes.
And many schemas don't even do proper enforcement, but my point is that it's still a cost-effective place to reduce attack surface.
- Schemas define what should be in an XML document, but are not useful for defining what should not be in an XML document. That is where threat scanning for attack signatures (e.g. SQL Injection) comes in.
Schema validation is not a complete solution. My $0.02 is that it's best for types and structures, but not so good at semantics (even though it's technically possible), so yeah, there will still be other validation to be done.
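As a rough illustration of the gap between "this is a string" and "this string is safe", here is a small Python sketch of the kind of signature scan Mark describes. Both sample messages would pass a schema that only types the `user` element as a string, yet only one is benign. The three patterns are a hypothetical, deliberately tiny list and not a real rule set; a production threat scanner would use a far larger and better tested one.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical signature list for illustration; trivially incomplete.
SIGNATURES = [
    re.compile(r"('|\")\s*or\s+\d+\s*=\s*\d+", re.IGNORECASE),
    re.compile(r"\bunion\s+select\b", re.IGNORECASE),
    re.compile(r";\s*drop\s+table\b", re.IGNORECASE),
]


def find_suspicious_values(xml_text: str) -> List[str]:
    """Return element text and attribute values matching any signature.

    Text that follows a child element (the "tail") is not checked here.
    """
    hits = []
    for element in ET.fromstring(xml_text).iter():
        values = [element.text or ""] + list(element.attrib.values())
        for value in values:
            if any(sig.search(value) for sig in SIGNATURES):
                hits.append(value)
    return hits


# Both documents have the same structure and the same data types.
benign = "<login><user>alice</user></login>"
hostile = "<login><user>alice' OR 1=1 --</user></login>"

benign_hits = find_suspicious_values(benign)
hostile_hits = find_suspicious_values(hostile)
```

The two checks are complementary: the schema says what may be present, and the scan looks for what must not be.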
- Schema validation does not apply to RPC-encoded SOAP messages (partly because type information is included in each element that appears within the SOAP message). Unfortunately, RPC-encoded SOAP still exists in the wild. However, here at Vordel we do the seemingly impossible in our XML Gateway: allowing RPC-encoded SOAP messages to be validated against Schemas. If you need to validate RPC-encoded SOAP messages, check out the Vordel XML Gateway.
- And the biggie: Performance. Without an XML Acceleration system such as VXA in place, Schema validation can add significant latency to message throughput. In fact, that is one of the big reasons why Schema Validation is often skipped in Java and .NET apps (a bad idea!).
I'm not discouraging Schema usage, just saying that some caveats have to be kept in mind.
Standards are great, but they only get us so far. The software development and security communities should not expect to turn the global software architecture inside out with no repercussions and no added security capabilities. Publishing legacy data and functionality on the web through web services without revisiting the threat model is a recipe for vulnerabilities. There are standards to help mitigate some of these issues, but we shouldn't be surprised that a plain vanilla, naive SOAP stack is not sufficient to protect a mainframe hooked up to the web. Instead we need security mediators to build, enforce and manage our 21st century security containers, rules and strategies.
So, for known-plaintext attacks against the encrypted data, XML is already well-formed with lots of <> in it. If what you're trying to do is validate data using keys for any reasonable encryption algorithm, I'm not sure that knowing the schema really impacts the crypto strength. If you pick the right key, you're going to be awfully clear that you got it right when you pop an XML doc out the other side, schema correct or not, right?
Not being a crypto expert, take the above with a grain of salt, or perhaps an even larger quantity.
Posted by: Andy steingruebl | July 08, 2009 at 11:12 AM
Let me go two steps further than Andy here. (1) If schema validation causes a cryptosystem weakness to be exposed, then Mark is absolutely correct. You should drop everything else security-related and upgrade to a cryptosystem designed in the last 20 years.
(2) If you're using XML+a schema, then there's lots of known plaintext. I'm not sure what the problem is.
I'd comment on Mark's blog, but it's locked.
Posted by: Adam | July 08, 2009 at 12:40 PM
Crypto-wise, there are two different things here:
1) The brute-force approach of trying many keys and looking for "interesting" content in the resulting plaintext. You're right that XML has characteristic content (angle brackets, etc) which makes detection of an "interesting" match simple, even if you don't have the Schema.
2) More sophisticated cryptanalysis attacks based on knowledge of part of the plaintext. This is where the Schema could be more useful. You could use a Schema to create a relatively small number of possible plaintext results. If you had access to many encrypted "specimens", perhaps one would be a match for one of the guessed plaintext examples.
All quite theoretical. But in cryptanalysis, anything which can narrow down the problem set is useful.
Posted by: Mark O Neill | July 08, 2009 at 12:52 PM
Mark, I'm sorry, but:
1) If your cryptosystem is going to fall to brute-force attacks before the sun blows up, you're doing it wrong.
2) If your cryptosystem is substantially impacted by known-plaintext, you're doing it wrong. Current cryptosystems are subjected to a stronger set of attacks, like adaptive chosen plaintext attacks. That's a stronger attack--you get to choose the plaintext and vary it to learn more faster.
The rest of your points are very good. But here I think you need to dig into modern crypto more deeply.
Posted by: Adam | July 08, 2009 at 09:31 PM
So re-reading my comment, let me say: I'm doing it wrong.
I should have taken the time to read up on modern XML crypto standards, and made a positive suggestion rather than saying "You're wrong." I apologize for that.
If your cryptosystem is AES (or built on top of AES), then you're not vulnerable to brute-force or known-plaintext attacks. If your cryptosystem is not built on AES, you should look very carefully at why, and probably plan to upgrade.
Posted by: Adam | July 09, 2009 at 04:41 PM
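To put rough numbers on the brute-force point in the thread above, here is a back-of-the-envelope sketch in Python. The guessing rate of one trillion keys per second is an arbitrary assumption, not a measurement, and the Sun's remaining lifetime is the commonly cited ballpark of about five billion years. The 56-bit case is included only for contrast with a much older key size (DES uses 56-bit keys).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
GUESSES_PER_SECOND = 10**12  # assumed attacker speed, purely illustrative
SUN_REMAINING_YEARS = 5e9    # commonly cited rough estimate


def years_to_search(key_bits: int, rate: float = GUESSES_PER_SECOND) -> float:
    """Expected years for exhaustive key search.

    On average the right key turns up after trying half the keyspace.
    """
    expected_guesses = 2 ** key_bits / 2
    return expected_guesses / rate / SECONDS_PER_YEAR


aes_128_years = years_to_search(128)  # AES with a 128-bit key
old_56_years = years_to_search(56)    # 56-bit key, for contrast

# True if the Sun is expected to give out before the key search finishes.
outlasts_the_sun = aes_128_years > SUN_REMAINING_YEARS
```

Under these assumptions the 128-bit search outlasts the Sun by many orders of magnitude, while the 56-bit search finishes in well under a year. Knowing the schema does not change either number.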