Daily Archives: 27 June, 2014

Designing Software

Software design is hard.

I’m going to frame this in the context of developing a game, but honestly…. this is about all software projects.

The easiest way to do something is only sometimes the right way to do it. That’s the hard lesson. That’s the thing that you don’t immediately realize when you sit down to write code and you’re working on one bit of something that is eventually going to have to be a large, complex program.

Because you can write the bit. You can make something work. And then you can realize that you’ve made a wrong choice – perhaps because of some constraints of a data structure you’re using, the bit you’ve just written has made something else with a conflicting requirement very hard to write, or forced it to be written in a very inefficient way.

This was what happened to my first effort to implement a roguelike game. It was a few years ago; I adopted it as a kind of project to make myself intimately familiar with the C standard libraries and, yes, software design in C. I had written C code before, and actually like the language because of its old-fashioned hardcore-ness. It’s very useful to know as the fundamental language of most unix utilities, as the effective ABI for writing anything that interfaces with Unix, and as a language that’s so simple, unconstrained, and formless that it allows you to write something in damn well any way it needs to be written – so you can make the structure of the program EXACTLY fit the requirement.

But that last thing? That’s actually a two-edged sword. Because being able to write something in exactly the way it needs to be written implies that you also have the freedom to get it wrong. And doing it wrong, as ever, is easy. That is what software design is supposed to try to save you from. And we rarely realize until we’ve already done something wrong that we really needed to sit down and do the design work.

Here is a chunk of wisdom; The more freedom a programming language or framework gives you, the better you can make something exactly right. But conversely, the same freedom means you have to think hard about how to avoid all the other choices you could make that would result in getting it exactly wrong.

C is a notoriously sharp-edged language, specifically in terms of memory management; you have to do all of your memory management yourself. If you try to refer to something you’ve already freed, you crash. If you try to follow a NULL pointer, you crash. If you follow a pointer without first setting it to point to something you’ve allocated, random things happen, often leading to a crash in an unrelated and non-obvious part of the program. If you fail to free something, memory leaks occur and, eventually, when too much memory has leaked, you crash. If you free something twice, random things happen, usually causing the program to eventually crash in some unrelated and non-obvious part of the code. If you use a pointer to refer to something on the stack, you have to be sure that you don’t use that pointer after the routine that allocated that bit of stack returns – because then random things happen, sometimes but not always causing you to crash, which is way worse than just crashing immediately because it’s harder to figure out what you did wrong. Those … well, those are just the rules. If you do your own memory management, that’s what happens when you fail. C++, and most low-level systems languages, are the same way.

So, every time you allocate something, you’re committing to deallocate that thing in the future, and deallocate it exactly once, and never follow a reference to it ever again.

There isn’t anything inherently hard about this; standard algorithms for all the most often-used dynamic data structures are simple and correct to express in C. But when you’re implementing a game, you are trying to express many interrelationships between dynamically allocated things. And that means that, at least in theory, all these things can be used to refer to each other. The most obvious way to do this is to give all of them pointers to each other. But that way lies madness. I’ll explain why.

You have to be able to find an actor (like a monster, or the player’s character) when the schedule says it’s time for that actor to move. So you give the schedule a pointer to that actor. So you follow the schedule’s link to the actor, and then you have to know where the actor is in order to figure out what move it ought to make. So given the actor you have to be able to find its location. So let’s say you give the actor a pointer to its cell of the map. So, your actor decides to attack the creature next to it? All right, you go to the actor’s location, then you consult the map, and given the map location you have to be able to find the creature standing on that location so you know which one the actor’s attacking. So the map location has another pointer back at an actor that’s standing on it. Awesome. Now, on the following turn, when the actor who’s just been attacked is deciding what to do, maybe it should know exactly who just smacked it, so it can decide whom it ought to aggro at? No problem, you can stash a pointer at its most recent attacker in an actor.

So, you implement this deal…. you get a nice map, a couple competent-seeming monsters, a good combat sequence, and one of your actors kills the other. Awesome, right? You got something working! Now deallocate the dead monster without leaving any references to it hanging around in places that will cause a later crash when you follow the pointer to the dead monster.

Um. Wait. How exactly do you do that? Thats…. going to be complicated, isn’t it?

Now, it’s important to realize two things. The first thing you should realize is that, yes you can do it. You can implement this in a way that works, traversing all your data structures, checking all the places that pointers to that actor could be, and carefully excising them when you delete the allocated structure for the actor. And realizing this first thing but not the second, many a naive designer has gone down the road of building on an initial wrong design, making their own future exponentially more difficult and usually getting armpits-deep into a project that becomes unbelievably hard and makes them give up on it.

But the second thing you should realize is that you should not do it. The complexity of the problem of deallocating this actor is caused by an underlying design flaw that will also make many other things more complex before your project is done. You got too happy about using pointers, and you implemented something in a way that made it easy to do the first part you implemented. But in the process you committed to a design that would make the whole project much harder to complete, update, and maintain, and you don’t ever want to be in the position of completing, updating, and maintaining the final project as it would be if you continued using that design, because doing anything with it would take forever and you’d never be able to find all the bugs.

So you should back up, figure out a better design that doesn’t have this basic problem, and you should instead spend your effort on fixing the design flaw. This means reimplementing all the stuff you just implemented using a different software design. That’s okay. You’ve just discovered that your first implementation was crap anyway, so you shouldn’t be upset that you’re throwing it out. If you’re the sort of person who cannot get your head around the idea that something you have already spent time and effort creating, and were happy with while you were creating it, is now revealed to be crap, then you should not be writing code. In fact, if you’re that sort of person you should not be managing people who write code. This is harsh, but fundamental.

This whole experience should be seen in a different light; You haven’t just produced a crap code design. You’ve hopefully learned something about designing code. As you get more coding experience, and you know more about designing code, you’ll do some design first – often without even thinking about it – in order to avoid producing the crap in the first place.

But no matter how much we learn, sometimes we learn more instead of heading off the production of more crap. We have to keep in mind that both outcomes are good. Learning more about design makes us better coders, and heading off the production of crap saves effort. Another gods-damned learning experience can be frustrating, but recognizing it as a learning experience will allow you the serenity to do the correct thing with the crap you’ve produced. And as with other kinds of crap, once it’s produced, no matter how much effort you put into producing it, or what you learned from producing it, the correct thing to do is still to flush it.

Now that you’ve made the crucial second realization, and learned something about designing code, sit down and really, really think about what your software has to do and the rest of the stuff you know about designing code. You’re writing a game which is essentially a simulation, and it has to manage time, space, creatures, items, and actions/events. So, here are things you need to manage and what managing each of them means.

Time: You must have cold solutions for these 4 problems, because a change in any of them will invalidate most of the code you’ve written before the change.

  1. Determining whose turn it is.
  2. Keeping track of future events and making them happen on schedule.
  3. Handling effects with intervals denoted in absolute time, player turns, or monster turns.
  4. Making sure things still work right when player or monsters can have their speed change or be frozen/reanimated in time.

Space: You must have cold solutions for the first four of these “space” problems. You must have a cold solution for the fifth if your game is ever going to do it. Six and seven should have standard solutions, but can be swapped in and out usually without too much trouble or effect on the rest of your code. And eight is something you should have a standard method and pattern for doing, including adding new ways to do it.

  1. Representation of your map.
  2. Saving and restoration of your map.
  3. Ranged effects, aiming, and obstructions.
  4. Area effects and how they interact with restricted areas.
  5. Player and/or monster gametime alterations of the map.
  6. Planning and pathfinding.
  7. Illumination, field of view, line of sight.
  8. Dungeon generation.

Creatures and Items: You must have the following standard functions that can be applied to items by creatures, and this will influence how you can represent items and creatures. And given the way roguelike-game magic works, creatures are subject to effects from carrying or wielding particular items, so each of these actions has implications that will make changes in the way the game treats individual creatures.

  1. Picking it up.
  2. Putting it down.
  3. Wearing and wielding.
  4. Using as a weapon and/or ammo.
  5. Eating and drinking.
  6. Putting it in a container.
  7. Using it as a container.
  8. Attempting to break it (if you will allow that).
  9. Transforming one thing into another.

So, now it’s time to move on to the second step. Subject to all of those entities and the operations you need to perform on them, what goals are you trying to achieve with your implementation?

  1. Writing code for new item/monster behavior (your infrastructure should have known, standard places already to put that code and known, standard ways to make sure calls to it happen at appropriate moments).
  2. Determining time, mana, fatigue, damage, etc. effects of doing things (and your infrastructure should have known, standard places already to put the values for amounts, and known, standard routines for actually inflicting the effects).
  3. Determining how items and monsters will be represented in live game data (without tangling up your references so pointers to an item or a monster exist at more than one place and one can be left dangling when it is killed/destroyed! And your infrastructure should already have known, standard places and methodologies for such representation, ideally facilitating save/restore without you having to do anything more).
  4. Determining how items and monsters will be represented in saved games (and your infrastructure should handle saving and restoring easily and well, ideally by traversing “standardized” data structures for items automatically).
  5. Okay, that’s a list of basic requirements. The structures you choose to represent the simulated world must allow all of those things your game is going to do. But we’re not done. Moving on to the third step, let’s think for a few minutes about what kind of basic traversals and lookups we will have to do to make all of that stuff work. In other words, how do our basic entity representations (which we identified in the first step) have to be linked together in order to implement our basic operations(which we identified in the second step) in order to achieve our goals (which we identified in our third step)? These are your data design requirements. Being able to do or not do these things will affect what utility your application can have (or in the case of a game, what kind of interactions it can model).

    Here is a list of data design requirements for roguelike games. 1,2,3,4 are universal to all roguelikes. 5 is required if monsters can use passive or reactive items. 6 is required if a game has containers. 7 is required for game designs where outside influences can alter the timing of a creature’s next turn. 8,9 are required if your game has proactive items. 10 is required if there is a way that the timing of proactive items can be interfered with. 11 is required for a game with both proactive items and containers. You’ll recognize the first few of these as the things that the initial (learning experience!) design implemented using pointers.

    1. You must be able to find the monster when it is that monster’s turn to act. (schedule->monster mapping)
    2. You must be able to find the monster when you know only that monster’s location. (location->monster mapping)
    3. You must be able to find an item when you know only that item’s location (location->item mapping).
    4. Monsters cause effects that are local to the monster, which implies that it must be possible to find and affect a location given the monster. (monster->location mapping)
    5. For reactive or passive items which are common in roguelikes, Given a monster, you need to be able to find the things it’s carrying (monster->item mapping, carrier->carried).
    6. If the game has containers, it must be possible to find the contained items given the container (item->item mapping, container->contained).
    7. If the time of a monster’s next turn can be changed by another monster or the player, then it must be possible to access and change the monster’s turn given the monster (monster->turn mapping).
    8. If your game has proactive items, you must be able to access items when it’s the turn of those items to act. (turn->item mapping)
    9. If proactive items do things to or for the monster that’s carrying them you need to be able to find the monster carrying the item given the item (Item->monster mapping, carried->carrier).
    10. If items are proactive, and outside influences can change the time of their next turn, then it must be possible to access the item’s turn given the item (item->turn mapping).
    11. If your game has proactive items and containers, then certain effects such as locating an item or a contained thing having an effect on the container must be able to find the container given the contained item (item->item mapping, contained->container).

    Degenerate (player-only) solutions to 5 or 5,7,8 are still required if the player is the only creature that can use items.

    Good games can be made on the basis of 1,2,3,4, degenerate 5. I think Moria and early Angbands had 1,2,3,4, degenerate 5, 7. A full implementation of 5 allows a quantum leap in game design. Nethack has 1,2,3,4,5,6,7 at least, and may have 8,9,10,11: I’d have to source-dive to know for sure.

    The design problems of 8,9,10 are simplified dramatically if you start with an “actor” type that can be either a monster or an item. That essentially reduces them from four points to two, or possibly even to one.

    So you have to pick your design requirements based on the game you want to write, then find a good way to arrange data so that all of these searches work efficiently and you can still deallocate something without leaving dangling pointers to it lying around somewhere. I don’t think it’s possible without having some redundant information stored somewhere. But you get to pick what redundant information you store and can choose to only store things that don’t cause danger of wild pointer references.

    Anyway what I’ve outlined here is about designing the implementation of a roguelike game, but more broadly it’s about designing software. Here’s my basic sketch of the software data design process.

    First, you figure out what basic kinds of entities your software has to deal with. In our dungeon simulation, that was time, space, creatures, items, and events. For a different application it would be a different set of entities, but that doesn’t change the fact that the first thing to do is figure out what entities you need to deal with.

    Second, you figure out what you need to be able to do with those things. In the game, this is a series of actions such as wear, wield, contain, carry, etc, each of which has implications and effects on the simulation’s future treatment of the entities involved. But no matter what kind of system you’re writing, this is the basic functionality that your application needs to do in order to handle those things competently.

    Third, you figure out your design goals and constraints. You need to implement all of the stuff you listed in the second step, without making it so that it gets particularly hard or painful to also do any of these other things which you’re identifying as important. This is where you list things you want your system to be able to do that may distinguish it from other systems, or may provide good features that the designs others have already committed them to won’t allow them to achieve. Put one way, this is where you outline what competitive advantages you want your system to achieve. Put another way, it is where you list the things you don’t want to prevent yourself from doing in the future. If you have something on this list, a later design decision that doesn’t allow you to achieve it is revealed as a boneheaded mistake (and therefore another learning experience!).

    Finally, having done steps one through three, step four is coming up with your list of design requirements.

    And after all this, we still haven’t said anything about the actual solution; Our step four, or a set of design requirements, is still no list of instructions for achieving them. It’s just a set of things that we have to be able to do in order to get the entities and basic operations implemented without violating our goals and constraints.

    But this is the design process. This is the list of considerations you need to have in front of you when you’re designing the data structures of a new application. This is what you have to be looking over every time you make a decision, and saying, “nope, can’t do that or we blow goal #5”. or “nope, if we do it that way we can’t implement basic operation #4 efficiently”, or “hum, yes, I think that can work…..”

    And it’s the design process that distinguishes the code monkey who works on designs that others create from a software architect who creates designs that he (and often others) then has to work on.

    Props and respect for code monkeys everywhere; without all the work we do in the trenches of implementation we would never become people who can really understand the problems of design. But if you want to be better, if you want to become a designer, you have to look at all the worst crap you’ve ever worked on, even the stuff that you personally designed, and recast every piece of it as a learning experience. Understand where the most horrible designs you’ve ever worked on came from, what the designer had to achieve and what he thought he had to achieve (and often they aren’t the same things at all), and try to figure out what mistakes he really made. Instead of just thinking of how horrible the bit of it you had to work on was, try to think of a design that could have made it better without making a bunch of other stuff worse.

    That … well, that’s like Zen. It’s a path, not a destination. You will never become so good that you can’t learn anything more, unless you first become so arrogant that you won’t learn anything more.