[2008-12-26] One is the most important number
I'm sure any half-decent software developer has had this drilled into their minds many times over, but perhaps it's worth repeating (otherwise, feel free to skip this post): don't repeat yourself!
Now, I'm probably a statistical outlier in that most of my programming education (and computer education in general) has been of an informal nature: I taught myself. So when trying to get points across to others I tend to refer to my experiences for examples of what can go wrong. So here's how I ended up learning the "don't repeat yourself" rule:
Way back in 2005, I was trying to figure out how I could standardize my PHP applications' layouts on disk. I'd started in early 2004 with a single firm idea: that includes went into a separate subdirectory, where any script in the application could reference them. From there, I got the idea of doing the same for images, stylesheets, and so on. By early 2005, I'd set up a simple template system I still use today, consisting of text files with placeholders for variable data. For example, this would be a valid template text:
Hello, {{UserName}}, and welcome to {{ProgName}}! Where do you want to foo today?
The bits between double curly brackets are the variables, and they can get filled in with basically any text you want. For instance, set "UserName" to "Narc", and "ProgName" to "CMBlog", and you get:
Hello, Narc, and welcome to CMBlog! Where do you want to foo today?
Simple, clean, and surprisingly fast, at least in my use case.
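The substitution described above can be sketched in a few lines. This is only an illustration in Python, not the PHP code the post is talking about; the function name render, the regular expression, and the choice to leave unknown placeholders untouched are all assumptions.

```python
import re

# Matches placeholders such as {{UserName}}. Restricting names to
# word characters is an assumption made for this sketch.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def render(template, values):
    """Replace each {{Name}} with values[Name]; unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        # Fall back to the original placeholder text when no value is given.
        return str(values.get(name, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)

text = "Hello, {{UserName}}, and welcome to {{ProgName}}! Where do you want to foo today?"
print(render(text, {"UserName": "Narc", "ProgName": "CMBlog"}))
```

Because the whole thing lives in one function, a bug fix or a new placeholder feature only has to be made once.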
I liked this so much that I decided to use it in all my personal projects. And that's where I started running into trouble.
You see, what I ended up doing was copying the code from one project to the next, but then I'd find a bug in it and have to update all the copies. Can you say, oops? But wait, it got worse: I had some ideas for improvements, extra features that would be handy, like automagically including other templates into the current one by using a special version of the curly brackets placeholder. But whenever I thought of one, I either added the improvement to only one copy of the code (the one in whichever project I'd worked on most recently), or I didn't add it at all because of this duplication tax.
Now, clearly, the tax is not that high; after all, I only needed to copy some files from one place to the next. Maybe I'm just too lazy for my own good (though there is some belief that laziness is a good feature for programmers, but that's a topic for another post), but the multiple-copy model wasn't working for me.
So I decided, instead of making these multiple copies, I would keep all the interesting, reusable bits of code in one place, and just include them as needed. But I also knew that reading in all those bits of code would be a performance hit if they weren't needed, so they had to be split up into segments of functionality, like the template system I described above, or a set of functions that make database interactions less painful.
What I ended up creating was a very interesting (to me) system of includes driven by a bunch of configuration variables. You set a variable (a key of a well-defined and standardized array) and the bits of code you need get included. You can then use those functions without including any of the dozens of others (the current count is 176, if you're curious), and thus without wasting CPU time waiting for unnecessary resources to load from disk. That system is now called NeoFW and I still love it very much. It lets me make changes in one place, and have them show up instantly in all the places it's used.
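To make the idea of configuration-driven includes concrete, here is a rough sketch, again in Python rather than PHP. It is not how NeoFW is actually written: the names FEATURE_MODULES and load_features, the shape of the config, and the standard-library modules standing in for the shared code segments are all made up for illustration.

```python
import importlib

# Hypothetical mapping from a feature name to the module implementing it.
# Standard-library modules stand in for the shared code segments.
FEATURE_MODULES = {
    "templates": "string",
    "database": "sqlite3",
}

def load_features(config):
    """Import only the modules whose feature is switched on in the config."""
    loaded = {}
    for feature, enabled in config.get("features", {}).items():
        if enabled:
            # Nothing is imported for features that are switched off.
            loaded[feature] = importlib.import_module(FEATURE_MODULES[feature])
    return loaded

config = {"features": {"templates": True, "database": False}}
features = load_features(config)
print(sorted(features))
```

Each application keeps only its own small config, while the code behind every feature exists in exactly one place.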
Let me restate that, in case I haven't bashed you over the head with it enough: I make one change in one place, and it shows up everywhere!
On top of that, this code provides an interface for user-handling functions, meaning I can have a user log in only one time in one application, and have them logged into all of them at the same time. This is a wonderful change from the previous implementation where I had to be careful to use the same session variables across all applications, and the same session storage method, and heck, the same code, for that matter. All of which made it a perfect fit for this system.
Generalizing from this example, we find that it's a very good idea not to duplicate code when you can help it. Don't repeat yourself. When you repeat yourself, one of two things (or a combination thereof) is bound to happen: either you'll end up with synchronization issues (some copies of your code work differently from others), or you won't want to make any more changes to your code because it's such a hassle to keep track of where else that same code exists.
If I'd started with a Linux server, I'd probably have ended up using a lot of symlinks, and then I wouldn't have made NeoFW, which would be a shame, because I've learned many useful things while putting it together. But I couldn't have failed to learn this lesson: that the best number of repetitions of your code is one. Not two, not four, but one. And, of course, five is right out, as Monty Python taught us.
This is the second of hopefully many "personal experience" posts, where I detail the things I've learned and how I came to learn them. The previous post was Recognizing Failures, about not biting off more than you can chew. The next is Blogging For The Perennial Lurker, a slightly meta expression of my experience as a habitual information consumer, as applied to the inherently productive medium of blogging.
Why did I rescue this?
Because it's kind of hilarious and makes a good point. The hilarious part is that this article about not repeating yourself repeats itself quite a bit. The good point is that copying code is an invitation to losing track of the latest version, and the differences between them.
Overall, it's good shit.