Tuesday, December 2, 2008

Un-fucking-believable

The potential evacuation of Kiwis trapped by the protests was thrown into disarray after the air force's two Boeing 757s were declared out of action.

I have no words.

"I am sure the Government is trying to do its best, but I was rather surprised to hear that there are no contingency plans," Mr Goff said.

Shut up, Labour boy. I voted for you, but there is no way you get to slime your way out of this by blaming the newbies. Contingency plans should have been there a long time ago. Some time in, say, the last 9 years.

Somewhere, there's a flight engineer banging his head against a fuselage going "I FUCKING TOLD THEM AGAIN AND AGAIN AND AGAIN...". He will go home and rant to his poor, long-suffering wife, and not a single one of the morons who let this happen will ever be held to account.

Time and testing

A common problem encountered when creating test suites against stateful libraries (particularly database-backed ones) is the presence of time and time-dependent states and actions.

In general, this gets even nastier if you're using NOW() or similar in your SQL statements. So, a few pointers:

Where practical, supply the time to the query explicitly as a parameter. I know it's annoying, but there are a few good reasons for this:

1. Supplying it means that a sequence of actions in a transaction will all get the exact same time, which helps if you're trying to reconstruct things later on.
2. PostgreSQL specific: a NOW() call is *not cached* as a constant. That is, if you put NOW() in a subselect, it may be evaluated once for every single execution of the subselect. Those repeated calls add up performance wise and can hurt your query if you've got a big dataset, whereas a supplied parameter is just a constant.
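As a minimal sketch of supplying the time yourself, here is the idea in Python, using an in-memory SQLite database as a stand-in for the real one. The `sessions` table and the `expire_sessions` function are made up for illustration; the point is only that the query takes "now" as a parameter, so a test can pick whatever time it likes.

```python
import sqlite3
from datetime import datetime, timedelta


def expire_sessions(conn, now):
    """Delete sessions whose expiry is at or before the supplied time."""
    cur = conn.execute(
        "DELETE FROM sessions WHERE expires_at <= ?",
        (now.isoformat(),),
    )
    return cur.rowcount


def demo():
    # Hypothetical schema, SQLite standing in for the production database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, expires_at TEXT)")
    start = datetime(2008, 12, 2, 9, 0, 0)
    # Three sessions expiring 1, 2 and 3 hours after the start time.
    conn.executemany(
        "INSERT INTO sessions (expires_at) VALUES (?)",
        [((start + timedelta(hours=h)).isoformat(),) for h in (1, 2, 3)],
    )
    # "Move time around" by choosing the value passed in; the system
    # clock is never consulted.
    removed_early = expire_sessions(conn, start + timedelta(minutes=90))
    removed_late = expire_sessions(conn, start + timedelta(days=1))
    return removed_early, removed_late
```

Calling `demo()` expires one session at the 90 minute mark and the remaining two a day later, without anyone touching the system clock.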

If you need to use a database call for the current time (or just want to; sometimes it's more elegant, in defaults etc), I recommend wrapping NOW() in a stored function. When you're building up your database for testing, you can then replace the wrapper with one that returns a fixed time, allowing you to shift time around as you see fit without mucking with the real system time (which is always painful, especially if it's your desktop).
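A rough analogue of the wrapper idea, again sketched in Python against SQLite rather than PostgreSQL: instead of a stored function, the connection registers a user-defined SQL function (called `app_now` here, a name invented for this example) and the test setup swaps in a fixed clock. Treat it as an illustration of the pattern, not of PostgreSQL syntax.

```python
import sqlite3
from datetime import datetime, timezone


def connect(clock=None):
    """Open a database whose app_now() SQL function is backed by `clock`."""
    if clock is None:
        # Production behaviour: the real current time.
        clock = lambda: datetime.now(timezone.utc)
    conn = sqlite3.connect(":memory:")
    # app_now is a hypothetical wrapper name; queries call it instead of
    # the built-in time functions.
    conn.create_function("app_now", 0, lambda: clock().isoformat())
    return conn


# Test setup: freeze time at a date well in the future.
fixed = datetime(2030, 1, 1, tzinfo=timezone.utc)
conn = connect(clock=lambda: fixed)
row = conn.execute("SELECT app_now()").fetchone()
```

Every query that goes through `app_now()` now sees the frozen time, and production code that calls `connect()` with no arguments gets the real clock.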

Isolating time like this allows you to test the full gamut of scenarios within your library and ensure that everything will happen as expected in the future - a situation all too many test suites fail to take account of.

Monday, December 1, 2008

I hate TV

Seriously, the absolute last thing they should be doing is putting that kid on TV. Quite frankly, all it's going to do is encourage all the other script kiddies. Despite the constant assertion of clueless TV reporters to the contrary, creating and managing botnets is neither difficult nor a sign of some kind of amazing insight into information technology. The software to do it is readily available and barely more difficult than installing iTunes. Just about any teenager who spends all their time in front of a computer could do it trivially, let alone people with actual training in the relevant areas.

The only difference between them and this kid is that they don't, because it's *wrong*.

There are people on the internet, many of them suffering from one form of social dysfunction or another, who are unable to empathise with others, and thus are happy to take advantage of them. This is not news, nor is it confined to the internet. Possibly the only news here in fact is that law enforcement managed to catch him. The people who are actually good at this aren't on TV, because they hide behind myriad layers of fakes, crypto and one-way lines of control, with control relays and cutouts in countries across the globe, often in uncooperative jurisdictions and in organisations with no IT staff. Catching them is extremely difficult to coordinate. Fortunately for all of us he got greedy before he got better at it, and no doubt a money trail provided both better information and more motivation for investigators.

And will you all stop asking when he'll be offered a job? He didn't do anything that would make him more valuable than the risk posed by his clear ethical deficit. It's not like we (the IT profession) don't know how people like him achieve what they do. It'd be like asking when the police were going to hire someone who performed a standard smash-and-grab for his insight into how it's done. It's a smash-and-grab; the police *know* how it's done. The difficulty is simply that it's not practical to secure everything against it - we rely on the fact that we catch most of them, eventually, and that the remainder of the population has some sense that it is cruel and unfair to do this kind of thing to others.

Personally, I wouldn't hire him over most half-decent coders his age - at least with the others there's a reasonable chance they won't think they're clever trying to install backdoors in your systems when you're not looking. Experience suggests it takes these fools another ten years minimum before they grow up and start to understand the kind of impact their idiocy has.

SHDH coming up

New SHDH coming up on the 7th. Yours truly will be doing a workshop on improving your website for mobile browsers. The idea, basically, is that the majority of websites (that have a reasonably CSS-heavy design element at least) tend to display fairly poorly on browsers like Mobile Safari - they're there, you can read them, but they're not simple to use. The addition of a mobile stylesheet is often all that's necessary to dramatically improve the usability, especially where touch-screen devices are concerned.

The workshop will involve a short rundown on strategies to achieve improvements, and then a collaborative attempt to improve some sites. There will be no mocking of existing designs so you should bring a laptop with a checkout of your site on it so you can play.

The motivation for this is mostly selfish - I have a mobile browser and I'd love it if more websites sorted their CSS out to make my life easier.

Other secrets

Occasionally within web applications we have to generate secrets that aren't passwords. While I've covered passwords in general before, these secrets often have a different context.

A classic example is a coupon code. A coupon code is distinctly different from a user password:

1. It is normally one-shot, and often linked in an email, so remembering it isn't such a big deal
2. It rarely has a "key", in the sense that if your keyspace for your coupon code is 100,000, and you have 10,000 coupon codes active, an attacker only needs to guess 10 times on average to hit the jackpot - they don't need the "user name" to go with it.

In this case, you always want a bigger keyspace than a regular password. In addition, you want something that works well when printed and, of course, doesn't contain any naughty words.
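The arithmetic behind that is short enough to write down. This is only a back-of-envelope sketch: `expected_guesses` is a name made up here, and it assumes the attacker guesses uniformly at random with nothing to slow them down.

```python
def expected_guesses(keyspace, active_codes):
    """Average number of random guesses before hitting any active code."""
    return keyspace / active_codes


# The example from the text: 100,000 possible codes, 10,000 of them live.
small = expected_guesses(100_000, 10_000)

# A hypothetical larger keyspace with the same number of live codes.
larger = expected_guesses(10 ** 12, 10_000)
```

With the numbers from the text `small` comes out at 10 guesses. Every extra character in the code multiplies the attacker's work, which is why length is cheap insurance here.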

One of the simplest ways to make this happen is the following:

1. 12 characters, upper case letters and digits only
2. Remove the confusing characters (I, L, J, 1, 0, O, U, V, 5, S) from the list.

This gives you 26 nice readable characters and a keyspace of 26^12, or 95,428,956,661,682,176, in size. Then, to get rid of all the naughty words, a trivial trick:

3. Remove all the remaining vowels

You can't make dodgy words without vowels. Not ones people can reasonably take offense to anyway. It's simple, and avoids having big long useless blacklists.

And out of it, you get:

HF8DDHNRRPKQ

If you're sending this in an email and it's likely to be a phone-in, remember to give a phonetic representation as well (Hotel-Foxtrot-8-Delta-Delta etc etc). This saves your users coming up with embarrassing phonetics of their own.
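Here's a minimal sketch of the whole recipe in Python. The names are mine, and I've assumed Y counts as a consonant (drop it too if you're nervous), which leaves 24 characters to pick from. It uses the standard `secrets` module rather than `random`, since these codes are worth money.

```python
import secrets
import string

CONFUSING = set("ILJ10OUV5S")   # step 2: easily misread characters
VOWELS = set("AEIOU")           # step 3: no vowels, no naughty words

# Upper case letters and digits, minus everything excluded above.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits
    if c not in CONFUSING and c not in VOWELS
)

# NATO phonetic names for the letters that survive; digits read as themselves.
NATO = {
    "B": "Bravo", "C": "Charlie", "D": "Delta", "F": "Foxtrot",
    "G": "Golf", "H": "Hotel", "K": "Kilo", "M": "Mike",
    "N": "November", "P": "Papa", "Q": "Quebec", "R": "Romeo",
    "T": "Tango", "W": "Whiskey", "X": "X-ray", "Y": "Yankee",
    "Z": "Zulu",
}


def coupon_code(length=12):
    """Return a random code drawn from the reduced alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def phonetic(code):
    """Spell a code out for reading over the phone."""
    return "-".join(NATO.get(c, c) for c in code)
```

Feeding the example code above through `phonetic` gives the Hotel-Foxtrot-8-Delta-Delta spelling, so you can drop both forms straight into the email template.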

Saturday, November 29, 2008

Passwords

Ok, so, there you are designing your web application and you've done the sensible thing, which is to simply pick up an off-the-shelf registration/user management system and plug it in, right?

No, of course you didn't. Nobody I know does that. We all like reinventing that wheel too much.

Ok, that's a bit harsh. The truth is that user management isn't something that's just..fixed. A well designed application customises the security to the purpose and data.

A bank application really deserves decent passwords and 2-factor authentication (don't you dare get excited about the fact that I used that phrase). On the other hand, your michelin.com account tracking the size of your tires..well...if someone breaks in, all they're going to discover is that I don't own a car.

So the first thing you're going to do when thinking about your new app is decide what level of security is really required. This influences a number of things:

1. Do you let users set their own passwords?
1a. What level of complexity do you require from user-provided passwords?
2. How do you handle password resets?
3. Do you need password-authorised actions?

In this case I'm just going to talk about 1. People seem to swing back and forth on this one, from aggressively user-provided (User gives password on signup, reset lets them set new password) to mid (User might give password on signup, reset provides new auto-generated password), to aggressively not-user-provided (Signup generates password, reset provides new generated password, user cannot set password, only reset).

There are arguments for each approach, but in general unless you have specific security requirements you should think twice before taking control out of the user's hands. The problem is that if the user can't use their usual junk password on your site, they're going to forget what it was. If your site isn't essential to them, they probably won't bother going through the reset process, so you'll be securely protecting a complete lack of interesting data.

What kind of situations would get you to reject user control? The simplest scenario is one where users spend a lot of time in a "compromised domain". That is, for example, an intranet web server where everyone uses the same username and password. If the users don't have any other sites they use, there is an excellent chance all of them will use the same shared password, leading to a miserable failure in isolating identities on the new service you're building for them.

If you find yourself having to do this, for god's sake remember that people need to be able to *remember* these passwords. 8is62jks0-_ is fantastically hard to brute-force or guess, but it's also stupidly hard to remember. People's minds do not take that kind of thing into storage well in general.

It's remarkably simple to generate passwords that are easy to remember. The most trivial system is this: follow every random consonant with a random vowel. Try it; pretty much every time you get something you can say to yourself in your head. The consonant-vowel formulation is easy to pronounce (even when it's nonsense) and the rhythm makes it simple to remember.
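A minimal sketch of that generator in Python follows. The function name and the choice to treat y as a consonant (21 consonants, 5 vowels) are mine; `secrets` is the standard library's source of cryptographically strong randomness.

```python
import secrets

CONSONANTS = "bcdfghjklmnpqrstvwxyz"   # 21 letters, y counted as a consonant
VOWELS = "aeiou"


def babytalk_password(pairs=4):
    """Return `pairs` random consonant-vowel pairs, e.g. 8 characters for 4 pairs."""
    return "".join(
        secrets.choice(CONSONANTS) + secrets.choice(VOWELS)
        for _ in range(pairs)
    )
```

Call it a few times and read the results out loud; they come out as nonsense, but nonsense you can say.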

There are downsides: the password becomes rather more predictable. But it has a couple of weird benefits as well.

In English, the consonant-vowel formula is associated strongly with baby talk. That is, despite it being so easy to remember, it is quite rare that you get a valid word out of it at any length beyond about 4 characters. Think of any swear word - almost none of them will ever be generated by this scheme.

A common source of confusion in passwords is the letters i and j, or 1 and l. Again this scheme completely dodges the bullet by virtue of context - if it's an even-numbered character it's going to be a vowel. The user knows this at a level below consciousness, so they will automatically select the correct character most of the time (not that it's bad to remove these for safety's sake if you can handle the decrease in keyspace).

Finally, the pronounceable part means that it's easy to tell someone over the phone. While I highly recommend including phonetic representations of any passwords that are likely to be phoned out as a matter of courtesy, it still helps when someone is trying to tell someone on their cellphone from the supermarket.

It turns out that the numerical weakness isn't as scary as it would seem at first. It's not anything like as good as a truly random password of decent length, but remember that it is still random - and this makes it a significant step up on normal user passwords that are almost always derived from something.

If we decide to take "A-Za-z0-9_-+ ,." as a reasonably secure character space (68 characters), with a minimum length of 6, we get a minimum keyspace of 98,867,482,624. That's a hell of a lot, especially since your secure application will start rejecting login attempts at 6 failures a day and you've salted all your stored passwords (right?). If you take a length of 8 for the scheme above, you get 121,550,625 (four consonant-vowel pairs, or 105^4). That's 813 times more guessable than the best case, but it's in the same ballpark as the common space of 6 chars a-z (309mil).

What this means is that it's still perfectly secure against anything except the hash of the password being stolen, and quite frankly you might be screwed if that happens regardless.

Bruce Schneier talks about password selection here. He indicates that with a standard guesser, 24% of user-provided passwords can be retrieved within 100,000 guesses. Even if the guesser was designed specifically for the algorithm above, there is no pattern that will enable that kind of thing. The first 100k guesses will only get a tiny fraction (0.08%) of the passwords in the db (if any). Conversely, the first 121,550,625 guesses will get every single password - you get a better baseline level of resistance at the cost of losing everything if someone is determined enough.
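If you want to check the numbers quoted above for yourself, the arithmetic is a few lines of Python. The variable names are mine, and the babytalk figure assumes 21 consonants and 5 vowels.

```python
import string

# "A-Za-z0-9_-+ ,." spelled out: 62 alphanumerics plus 6 punctuation characters.
charset = string.ascii_letters + string.digits + "_-+ ,."

random_6 = len(charset) ** 6        # 6 random characters from the full set
babytalk_8 = (21 * 5) ** 4          # 4 consonant-vowel pairs

# How much smaller the babytalk space is than the random one.
ratio = random_6 / babytalk_8

# Share of the babytalk space covered by the first 100,000 guesses.
fraction_100k = 100_000 / babytalk_8

print(f"{random_6:,} vs {babytalk_8:,} (ratio {ratio:.0f}x)")
print(f"first 100k guesses cover {fraction_100k:.2%}")
```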

Assuming you're using a decently difficult hash algorithm and you've salted it properly, it's going to take almost anyone a fair amount of time to get through that lot, but in the end they probably can if they have your database and they want it enough. Assuming the value of the data is reasonably low, however, your real concern is the vulnerability at the first N attempts, where N goes from 6 (casual attempt to break in) to 2200 (spends all year trying to get in via the website at 6 attempts a day). In this case, the random babytalk method is considerably more secure than letting users select their own passwords (go go 'password1').

I guess I'm not particularly recommending this method, so much as saying that if you need to generate passwords, pay attention to making it memorable (so people don't sticky it to their monitor), and this is a simple way that isn't likely to be the weak link in the security chain.

Friday, November 28, 2008

Dealing with deleted users

Someone happened to ask about this in relation to a project of mine so I thought I'd share a few notes.

When writing any user-based app, one of the problems that is often missed is "what do we do about deleted users?". The reason is mostly that we're all too busy focusing on trying to create users, and in general most of us have dozens of accounts around the 'net that are still there and we haven't deleted, so we're not that concerned about it.

When it does become an issue is when you actually *need* to delete a user for some reason - they're abusive, or it's a screwed up account or they've demanded to be removed, or even that reports are being generated and they're contributing to useless noise.

The typical thought pattern is:

1. We'll delete the user record
2. Oh shit, there's like a hundred other tables with FK dependencies on the user table, erm, how about a deleted flag?
3. Ack! Deleted flag works but now new users can't be created with the username because of the unique constraint! Let's add "(deleted)" to the end of the username!
4. ACK! Can't delete the same username a second time because then we get a conflict with the unique constraint again..erm..let's do (deleted 1), (deleted 2) etc.
5. SON OF A B***H I have to undelete a user oh woe oh woe hack hack hack.

Right. So, if you've already made it to stage 5 you're screwed, you've done the work, you'll have to live with it. But how to avoid this nonsense in the future?

The essence of the problem is that we tend to create a single table for a user. We are overloading this single table with a number of roles.

1. Unique authentication credentials
2. Ownership
3. Identity

Authentication must be unique, always. Moreover, a deleted user should not have any authentication at all, it's gone, no-loggy-inny.

Ownership must remain, as long as any entity within the system has a claim to it. If you have any kind of historical data or a forum or some other conversation that would make no sense if one of the participants mysteriously disappeared, you cannot remove the ownership relation.

Identity depends a bit on the application. In some rare circumstances it might need to go away when deleted (Owned items may be removed, or simply resolve as "Anonymous"). In most cases, it needs to hang around, at least in a limited form, with a flag marking it as inactive.

Now that we understand the roles, we can design a table layout that properly resolves these issues.

In most cases, Ownership and Identity can be combined together. The target of ownership is almost always going to be the user's identity. Authentication on the other hand should be separated out to avoid all the uniqueness nightmares.

CREATE TABLE identity (id INT PRIMARY KEY, email VARCHAR, display_name VARCHAR ...);

CREATE TABLE authentication (username VARCHAR PRIMARY KEY, password CHAR(32), identity_id INT REFERENCES identity (id));

You can either place a flag explicitly in identity for is_deleted, or alternatively simply create a view which checks for a null on join with authentication to see whether the identity has authentication ability.

In either case, deleting a user simply involves removing the entry from the authentication table. The identity entry almost certainly remains intact, providing the historical reference for forum posts, logs etc. In the event that you need to restore the user, you can do so by restoring their authentication entry with a new username and password, without having to touch the identity details. An event log of the delete, or storage fields within identity, could be used to determine the previous username. If that username has since been taken, it's easy to manipulate it in some fashion to one that is available.
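To see the layout behave, here is a small sketch in Python using an in-memory SQLite database. The `post` table, the `identity_status` view, the sample rows and the placeholder password value are all invented for the demonstration; only the identity/authentication split comes from the design above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE identity (id INTEGER PRIMARY KEY, email TEXT, display_name TEXT);
CREATE TABLE authentication (
    username TEXT PRIMARY KEY,
    password TEXT,
    identity_id INTEGER REFERENCES identity (id)
);
-- Hypothetical owned data: posts point at the identity, never at the login.
CREATE TABLE post (
    id INTEGER PRIMARY KEY,
    author_id INTEGER REFERENCES identity (id),
    body TEXT
);
-- An identity with no authentication row counts as deleted.
CREATE VIEW identity_status AS
    SELECT i.id, i.display_name, a.username IS NULL AS is_deleted
    FROM identity i LEFT JOIN authentication a ON a.identity_id = i.id;
""")

conn.execute("INSERT INTO identity VALUES (1, 'alice@example.com', 'Alice')")
conn.execute("INSERT INTO authentication VALUES ('alice', 'placeholder-hash', 1)")
conn.execute("INSERT INTO post VALUES (1, 1, 'First!')")

# "Deleting" the user removes only the credentials.
conn.execute("DELETE FROM authentication WHERE username = ?", ("alice",))

deleted_flag = conn.execute(
    "SELECT is_deleted FROM identity_status WHERE id = 1"
).fetchone()[0]
post_author = conn.execute(
    "SELECT i.display_name FROM post p JOIN identity i ON i.id = p.author_id"
).fetchone()[0]

# The username is free again, so a newcomer can claim it without any hacks.
conn.execute("INSERT INTO identity VALUES (2, 'other@example.com', 'Other Alice')")
conn.execute("INSERT INTO authentication VALUES ('alice', 'placeholder-hash', 2)")
```

After the delete, the old post still resolves to its author, the view reports the identity as deleted, and the username can be reused with no "(deleted 1)" suffixes anywhere.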

In the final analysis, this is a perfect example of why it pays to think about the true purpose of tables and their roles, rather than simply following what everyone else is always doing - the one-table-per-user thing is endemic, so much so that it rarely occurs to any of us that it's simply a bad idea.