Random Thoughts: 2007

Tuesday, November 13, 2007

war haiku

two soldiers each see
a cruel and mindless killer.
young face peers through door.

Monday, October 29, 2007

DHTML is the technique of combining JavaScript and CSS to create dynamic page elements that are not possible with static HTML. For example, it's easy to dynamically show or hide page elements, or move them.

Given the differences in browsers, it is by nature a bit difficult to write clean DHTML. Worse, most free DHTML code examples on the internet are written poorly: they are full of little browser checks and conditional code branches like:


    if(isIE){
        //do something one way
    } else if(isNetscape6){
        //do something else another way
    } else if(isNetscape4){
        //do something else yet another way
        //...
    }

This is poor design, as it is brittle, hard to maintain, and often breaks when new browsers are released or change. And likely, any browser not in the checks will not be supported correctly either. I also see a lot of newer DHTML scripts doing more with css in place of javascript, although I think css support across browsers is still a bit flaky.

A better way is to design something with these simple principles:

1. sniff for functionality rather than browser name/version
2. create a single wrapper function that encapsulates browser-specific code
3. page should "degrade gracefully" if possible (meaning the page will still be readable and functional if javascript or dhtml is not supported)

Here is a small example designed with these principles in mind (modified from Apple Developer's site):


    // cross-browser dhtml utilities
    // wrap blocks of html in <div> tags with unique id's

    // try to get a style object given its id
    function getStyleObject( objectId ) {
        if ( document.getElementById && 
                document.getElementById( objectId ) ) {
            // W3C DOM
            return document.getElementById( objectId ).style;

        } else if ( document.all && document.all( objectId ) ) {
            // MSIE 4 DOM
            return document.all( objectId ).style;

        } else if ( document.layers && 
               document.layers[ objectId ] ) {
            // NN 4 DOM.. note: this won't find nested layers
            return document.layers[ objectId ];

        } else {
            return false;
        }
    }

    // a template function for setting two-state style properties
    function setStyleBoolean( objectId, booleanValue, 
          propertyName, valueOn, valueOff ) {
        var styleObject = getStyleObject( objectId );

        if ( styleObject ) {
            if ( booleanValue ) {
                styleObject[ propertyName ] = valueOn;

            } else {
                styleObject[ propertyName ] = valueOff;
            }

            return true;

        } else {
            return false;
        }
    }

    // try to show/hide object.  an empty visual space will remain in place
    function setObjectVisibility( objectId, booleanValue ) {
        return setStyleBoolean( objectId, booleanValue, 
            'visibility', 'visible', 'hidden' );
    }

    // try to insert/remove object from display.  page will redraw and no space will remain in place
    function setObjectDisplay( objectId, booleanValue ) {
        return setStyleBoolean( objectId, booleanValue, 
           'display', '', 'none' );
    }

    // try to move object
    function moveObject( objectId, newXCoordinate, newYCoordinate ) {
        var styleObject = getStyleObject( objectId );

        if ( styleObject ) {
            styleObject.left = newXCoordinate;
            styleObject.top = newYCoordinate;
            return true;

        } else {
            return false;
        }
    }

This code works in all javascript-enabled browsers I've tested (Firefox 1+, IE 4+, Netscape 4+, Opera), despite having no checks for any specific browser. Also, all code that interfaces directly with the browser api is isolated, so it is very easy to extend or debug if there ever is a problem.
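The same idea carries over to other browser apis. As a rough sketch (this is not part of the Apple code, and the name addEvent is just something I made up for illustration), here is how attaching an event handler could be wrapped by sniffing for the method rather than the browser:

```javascript
// cross-browser event wrapper (a sketch, not part of the original file)
// sniff for the method, never for the browser name
function addEvent( target, eventName, handler ) {
    if ( target.addEventListener ) {
        // W3C DOM
        target.addEventListener( eventName, handler, false );
        return true;

    } else if ( target.attachEvent ) {
        // older MSIE event model, which expects an "on" prefix
        return target.attachEvent( 'on' + eventName, handler );

    } else {
        // no known way to attach the handler
        return false;
    }
}
```

Like the functions above, it returns false instead of throwing when nothing is supported, so the calling code can decide what to do.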

To use the code above, for example, save it to a file named "dhtmlutil.js" and use it like:


    <html>
    <head> 
        <script language="javascript" 
               src="dhtmlutil.js"></script>
    </head> 
    <body> 
     
    <h2>Clean DHTML example</h2>
     
    <div id="test1"> 
        this is a block of html that can be hidden or removed
    </div>
     
    <p>
        you can change the properties of the 
        block above with these buttons:

    <p> set display
    <input type=button 
        onclick="setObjectDisplay('test1', false);" 
        value = "off">
    <input type=button 
        onclick="setObjectDisplay('test1', true);" 
        value = "on">

    <p> set visibility

    <input type=button 
        onclick="setObjectVisibility('test1', false);" 
        value = "off">
    <input type=button 
        onclick="setObjectVisibility('test1', true);" 
        value = "on">

    </body>
    </html>

If you add more functions, put them all together in the head with the included file.

To use hyperlinks, use a dead link with an onclick handler like:


    <A HREF="javascript:void(0)" onClick="....">

DEGRADING GRACEFULLY

A fundamental problem with DHTML is figuring out how to design the page so that it degrades gracefully. In other words, if javascript or css aren't supported, the user can still use the page. This is especially difficult for menus, and other page entities that are hidden/collapsed by default. A menu is the most important element on the screen, and should be viewable on the lowest common denominator. Even if old browsers are obsolete, there is always a chance the page won't be compatible with future browser releases/bugs (for example, the Netscape 6 to 7 fiasco). The general rule is:

4. Any hidden entity should be visible if DHTML is broken.
5. No core functionality should ever depend on DHTML.

The best way to achieve this is to wrap javascript around css elements in the header, such as:


    //write styles via javascript, to degrade gracefully
    var idx = 'yourid';
    if (document.getElementById || 
           document.all || document.layers ){
        document.write('<style type="text/css">')
        document.write('.switchcontent{display:none;}')
        document.write('<\/style>')
        document.write('<style type="text/css">')
        document.write('#' + idx + '{display:block;}')
        document.write('<\/style>')
    }

This code, for instance, will hide all members of the class "switchcontent," except for the one whose id is stored in idx (set idx to 'sc1' to match the markup below). So, if javascript or css is broken, all these entities will be visible. Note that the two rules conflict, but the id selector is more specific than the class selector, so it takes precedence. Corresponding div tags for this code might look like:


    <div id="sc1" class="switchcontent">
        something visible by default
    </div>
    <div id="sc2" class="switchcontent">
        something invisible 
    </div>
    <div id="sc3" class="switchcontent">
        something invisible 
    </div>
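An alternative I'd consider, as a sketch only: instead of writing style tags from javascript, have the script add a marker class to the root element, and put a rule like `.dhtml .switchcontent { display: none; }` in the normal stylesheet. The class name "dhtml" and the helper addMarkerClass are made up for this example, and I'm assuming the browser exposes document.documentElement.

```javascript
// add a marker class to an element without disturbing existing classes
function addMarkerClass( element, marker ) {
    var classes = element.className ?
        element.className.split( /\s+/ ).filter( Boolean ) : [];

    if ( classes.indexOf( marker ) === -1 ) {
        classes.push( marker );
    }

    element.className = classes.join( ' ' );
    return element.className;
}

// only touch the page when a DOM is actually available;
// without javascript the class is never added and nothing is hidden
if ( typeof document !== 'undefined' && document.documentElement ) {
    addMarkerClass( document.documentElement, 'dhtml' );
}
```

If the script never runs, the hiding rule never matches, so every block stays visible.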

Calling command line tools from a web application

Occasionally, it's convenient (or practically necessary) to have a web application call a command line tool. If not done wisely, it can open up a huge security risk. For example, suppose we are using a simple command line tool that executes a command like:

./simpletool $username

Now suppose someone creates a new user named "bill;rm -rf /" and passes it to a command line utility which executes:

./simpletool bill; rm -rf /

This will of course wipe everything out under /. Not good! It's not hard, if these interfaces have not been thought through, to find a string that does anything we want. Quotes, single quotes, and other special chars are easy to slip through and create major security holes. Buffer overflows could also be a problem... the command line can only accept a string of some finite length.

But while it's not ideal, it's not a huge security risk as long as there are some special safety considerations when handling user input. Here's a short list of security principles when using system calls:

After authentication and authorization is handled:

1. for all data, limit strings to a known set of characters
2. limit all strings to known sizes
3. limit all hashes of user input to known fields
4. avoid passing any data to the command line which isn't programmatically generated by you. If possible write user data to file, and pass your file names around. Otherwise, escape the data properly ... especially non-alpha-numeric chars like " ' ; \ ` $() which have meaning on the command line! Or better yet, just don't pass it on the command line.
5. escape all data that has special meaning (or could have special meaning) in the context in which it is used. For example, in a tab-delimited file, tab chars have special meaning.
6. limit file permissions to bare minimum ... nothing can be executable.
7. keep a log of all interactions (leave the created files around for reference)
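To make the first few items concrete, here is a minimal sketch assuming the web application happens to run on Node.js (the post itself doesn't say what the server side is). The function names and the exact username rule are my own assumptions. The key point is that execFile takes its arguments as an array and does not go through a shell, so the string is never parsed for ; or quotes.

```javascript
// sketch only: whitelist the input, then run the tool without a shell
const { execFile } = require('child_process');

// assumption: usernames are 1-32 chars, start with a letter or digit,
// and otherwise contain only letters, digits, '_' or '-'
const USERNAME_PATTERN = /^[A-Za-z0-9][A-Za-z0-9_-]{0,31}$/;

function isSafeUsername(username) {
    return typeof username === 'string' && USERNAME_PATTERN.test(username);
}

// hypothetical wrapper around the "./simpletool" command from the text
function runSimpleTool(username, callback) {
    if (!isSafeUsername(username)) {
        callback(new Error('rejected username'));
        return;
    }
    // no shell is spawned, so "bill;rm -rf /" could never become two
    // commands (and it is rejected above anyway)
    execFile('./simpletool', [username], { timeout: 5000 }, callback);
}
```

A call would look like runSimpleTool('bill', function (err, stdout) { ... }). The leading-character rule is there so a name can't start with '-' and be mistaken for an option by the tool.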

Friday, October 12, 2007

Using an SSL certificate for IIS under JBoss/Tomcat

If your certificate was requested for IIS, it will likely throw a pop-up warning on pages if you install it under JBoss, although there is a way to fix this.

For different servers, often the CA will use a different chain to sign your SSL cert. The only trick is to explicitly chain these together when building a keystore. This takes a little guesswork, but there are a limited number of intermediate and root certs.

To test out a configuration, it's probably best to add an entry in the local hosts file on the machine, and use a local (or development) server that can be started and stopped quickly.

For example, if it's a Go Daddy certificate, google for "go daddy root ca". You might get this page:

https://certificates.godaddy.com/Repository.go

Download them all for testing.

In my case, these worked:
intermediate: gd_intermediate.crt
root: gd-class2-root.cer

But overall, use the same instructions as below:

Setting up SSL in JBoss/Tomcat with an Intermediate SSL CA

Verisign has switched to using an "intermediate" certificate authority (CA). These are a little trickier to install. The trick is getting the appropriate intermediate and root CA files from Verisign, and "chaining" them together into one file. Basically just concatenating them... one after the other, each with its header and footer.

-----
General explanation ... Assuming you have these files:

* server.key - your certificate's private key
* server.crt - your certificate
* inter.crt - the intermediate CA that signed your certificate
* root.crt - the root CA that signed the intermediate CA

First, concatenate the CA certs. Be sure the intermediate CA goes first:

$ cat inter.crt root.crt > chain.crt

Next, export the pkcs12 file:

$ openssl pkcs12 -export -chain -inkey server.key -in server.crt \
    -name "server" -CAfile chain.crt -out server.p12

You'll be prompted for a password ... this is the one referenced by the server.xml config.
Enter something ... don't leave it empty.

Now, you can use keytool to verify:

$ keytool -list -v -storetype pkcs12 -keystore server.p12

Requires the password entered above... Then you should see a line like:

....
Certificate chain length: 3
....

That's it. Since this is complex, here's a simple script that should work, with minimal editing:

#!/bin/sh

# this script creates keystores for domains

# notes on script setup

# server_key - The file name of the private key for server.

# server_crt - The file name of the certificate for server.

# cert_name - Something unique, can be anything, just
# a friendly name.

# inter_crt - The file name of the intermediate
# certificate authority (CA)
# The intermediate CA can be downloaded from Verisign.
# although it might take a bit of googling to figure out where
# it is. Try searching for "Verisign intermediate certificate"
# for example.

# root_crt - The file name of the root certificate (from verisign)
# which signed the intermediate CA
# This either needs to be .cer or b64 format.
# The only difference is a header footer
# which may need to be added (see below echo's).
# The package of roots can be downloaded from verisign.
# Good luck guessing which one it is! :)
# Usually the error message ...
# or the certificate information in the browser will give a clue.
# Again, google for this. Try searching for (for example):
# "Verisign root certificate"

# chain_crt - The file name where intermediate and root
# is concatenated. Can be anything really.
# Well almost anything ...
# just don't overwrite some important system file :)

# out_file - The file name where the keystore file is saved

#set up your file paths and variables ...

server_key=yoursite.net.key
server_crt=yoursite.net.cert.2007
cert_name=yoursite_net
inter_crt=verisign_int_ca.cert
root_crt=../roots/VeriSign_Roots/Pca3ss_v4.b64
chain_crt=chain.crt
out_file=keystore_yoursite_2007.pkcs12

echo "creating $out_file ..."
echo "remember"
echo "1. Change owner and permissions"
echo "2. move to jboss/server/default/conf"
echo "3. update jboss/server/default/deploy/jbossweb-tomcat55.sar/server.xml"
echo "4. restart jboss"

echo ""
echo "on errors check $chain_crt ... certs should appear one after the other with header and footer."
echo "the second most likely problem is just having the wrong intermediate and root certs"

## clean up
# rm $out_file
# rm $chain_crt

# build chain certificate

cat "$inter_crt" > "$chain_crt"

# may need headers if simply b64
echo "-----BEGIN CERTIFICATE-----" >> "$chain_crt"

# might need extra line feed ... check $chain_crt
# echo "" >> "$chain_crt"

cat "$root_crt" >> "$chain_crt"

# may need footer if simply b64
echo "-----END CERTIFICATE-----" >> "$chain_crt"

# create keystore

openssl pkcs12 -export -chain \
-inkey "$server_key" \
-in "$server_crt" \
-name "$cert_name" \
-CAfile "$chain_crt" \
-caname root \
-out "$out_file"

Setting up SSL in JBoss/Tomcat

For whatever reason, I think setting up SSL in JBoss is a major pain. It's hard to find good documentation, and the steps that are required seem over-complicated.

For now, JBoss / Tomcat only supports JKS (the format Java had to invent) and PKCS12 format (which is somewhat of the general standard). JKS is better supported in Java tools, PKCS12 is better supported generally outside Java. So, probably painful either way. I'd recommend PKCS12.

REQUIRED TOOLS:
openssl -- for Windows users, I installed this tool under cygwin

REQUIRED FILES:
cert and key files -- these will need to be converted into a keystore file for jboss/tomcat.

HOWTO:

1. convert these two files into a pkcs12 file ... this will prompt for a password.

openssl pkcs12 -export \
-in YOUR_CERT_FILE.cert \
-inkey YOUR_KEY_FILE.key \
-out keystore.pkcs12

2. config jboss/tomcat.

you'll need to edit this file (tomcat 5)

jboss/server/default/deploy/jbossweb-tomcat50.sar/server.xml

and uncomment the lines that refer to SSL.
note, by default, jboss is configured for port 8443.
by default, https: goes over 443, similar to http over 80.
The configured settings might look like:

<Connector port="443" address="${jboss.bind.address}"
maxThreads="100" minSpareThreads="5" maxSpareThreads="15"
scheme="https" secure="true" clientAuth="false"
keystoreFile="${jboss.server.home.dir}/conf/keystore.pkcs12"
keystoreType="PKCS12" keystorePass="" sslProtocol = "TLS" />

3. meta security

save any copies of the key and cert as root:root, with 400 permissions.

the keystore file should have as few permissions as needed to run.

In Linux, as a security precaution built into the kernel, only the root user can bind to a port below 1024. This means that you either,

A. leave the port set at 8443. Of course, since this is not the default port, it will need to be specified in all URL requests. or,
B. configure the server for 443, and run the server as root (not recommended for a production server), or
C. leave the server on 8443 and use iptables to redirect incoming traffic on port 443 to 8443.

FOR MORE EXAMPLES SEE:
http://mortbay.com/jetty/faq?s=400-Security&t=ssl
http://tomcat.apache.org/tomcat-5.0-doc/ssl-howto.html
http://mail-archives.apache.org/mod_mbox/tomcat-users/200409.mbox/%3C4150BB34.3010905@ddai.net%3E

Tuesday, October 2, 2007

Principles of Software Development

Here's a list of principles that I've found helpful over the years (from experience, friends, or books). Some are perhaps so obvious that it hardly seems worthwhile to mention them (although they weren't obvious to me as a beginner). Others are more subjective and can be disputed. Usually, though, when I run into problems, it's because I (or someone else) didn't follow these rules of thumb. Hopefully not too much redundancy or bad advice here... :)

* Take all principles in moderation. There's an exception to every rule. Aim for a balance, not an all-or-nothing approach. There's often not a clear right or wrong way to design something, but a choice between several trade-offs. For example, what's more important to you ... price, quality, or customer support? What's the context in which something will be used?

* Understand a problem. Perhaps there is one, perhaps not. Do what's best for the customer ... not what will necessarily make the most money. Sometimes, I think problems are created artificially, or needlessly. People are trained to see things *as* a problem, or get locked into a particular approach, and decide that the mountain must move to the person.

Trying to make a problem where there isn't one creates problems in the long run. So, rushing into "fixing a problem" is generally not a good idea on new projects. Often the problem vanishes, or becomes irrelevant in light of new developments.

* At the bottom, a "problem" is just the effect of someone not liking something, or wanting something to be different. Not every problem is a technical problem. Some are best solved through social engineering. Remember that there's a person at the other end of every problem :)

* Client's ideas: If something seems like a bad idea, it probably is. Do NOT do anything that seems outright silly. If you do everything the client asks for, often, they won't even remember asking for it, and will be more unhappy when it needs to be changed. Rather, partly ignore what the client says, with the realization that they don't always know what they want. Or in other words, look at the spirit of the law rather than the letter of the law. If something is a major change, let the client know. Otherwise, doing so might only cause confusion (many ideas are hard to explain), or cause the client to dig their heels in on some otherwise pointless issue.

* Clients and bugs. If there is a bug, the client will often use the bug fix as a chance to creep on the scope of the project, or to introduce new functionality that the bug drew attention to. Also, a few minutes invested into doing something "the best way" is usually better than trying to fix it after the fact. Building something and fixing something are two different processes. Or perhaps there's a psychological factor here ... if something hits your car, you stop and inspect it, then noticing all sorts of unrelated scratches and bumps that were previously not seen. The threshold of perception and pickiness changes on focus.

* Responsibility is passed by contact. If you are the last person to touch a project, you will become responsible for any problems it has. Evaluate the existing code to determine if you really want that responsibility, under the given payment.

* On "the mythical man month" (see book): Just because a woman can create a baby in nine months, it doesn't follow that nine women can create a baby in one month. Adding more people to a project can easily slow it down. If something is a "one-person-job" adding two people could actually make it a disaster. More people can work together only to the degree that the project can be split into mutually independent units. Ideally, it's probably best if one person designed everything.

* Note Assumptions: what is known (has been done at least once) and unknown (supposed to be easy, but hasn't ever actually been tested)? Don't be surprised that a seemingly simple task ends up being extremely difficult, or even impossible (not often but sometimes happens).

* Research. It's possible to save weeks' worth of work with one or two days' worth of planning. Even a week's worth of thought might save a month... Even if you are in a rush, do not short-change the planning. For example, large components might really be unneeded, or already written, or available as open source. Always write up a rough blueprint. On a side note, many times when I've started doing planning, the client changes their mind several times, and might end up reverting to what already was in place.

* IMO, it's better to use free and open source solutions over proprietary solutions, when possible. Proprietary solutions will lock you into using more proprietary solutions. Note: free solutions are not really "free" since there is usually more of an upfront investment of configuration, or understanding on how the software works. However, they are cheaper in the long run I've found. The real advantage comes though, in free access to source code. Owning something is not necessarily a good thing. Release your source code as open source as possible .. and let other people fix your bugs.

If you're using open source tools, sometimes it's best to avoid the very newest release. Use the second newest, and let everyone else find the new problems first. Or, make sure it's at least six months old before trying it. You probably want to stick with larger names, even if the product is not as good ... because you may also get better long term support.

* When evaluating new technology, beware of the popular terms like 'legacy' and other buzzwords. In some part, old technology is good, and that is why it is old and not dead. Those that sell software, and developer tools use the term 'legacy' and other buzzwords to create artificial churn, to sell 'new' solutions. After all, if all the problems have been solved, what is there to sell? :) Most new and cool software really isn't necessary, and doesn't make the problem easier. Nor does it make the job easier (as often promised) since it's just one more thing to learn from scratch. A lot of dialog in the form "X is better than Y" is mostly just marketing hype or evangelical statements.

* Swim downstream. If you are using a server that has one well-supported platform, use it. Don't install something else, if you will be the only person using it. For one, the existing system is probably workable, just not ideal. Moreover, you will be responsible for this platform, and possibly be the only person in a large group that cares if there are issues/bugs that need to be worked out. Otherwise, if your ship is floating out in a different direction than the rest of the fleet, you will be the one to hit every iceberg that comes along. Unless you can get an entire team/unit to go with a platform, stick with what is best supported.

The same goes for programming approach. If an application is written in a less-than ideal style, but is otherwise useable, continue with the existing style. Don't create a Franken-program that looks like it's hacked together from parts of each contributing author. There's something to be said for large-scale consistency, which is above and beyond the elegance of any one particular component. If you are going to change the style, change it globally.

* Define the project. This also means defining the lifetime and length of support. Don't forget that support costs money, and things don't support themselves. You either need some kind of cutoff limit, or a monthly fee. If you are using 3rd party software, what is the cost of that? Note assumptions and limitations of software, if known.

* Dump your short term memory. Keep a scratchpad of notes, or a simple text file. When you figure out how to do something, note it, or explain it to yourself so that 6 months later you can search and find it. It is your hard disk.

* Write comments! It amazes me how many people fail to do such a simple thing. Not only will you thank yourself in six months, but on the off-chance that someone else is forced to maintain your code, they will not hunt you down and shoot you. I think the law should be: if a person does not add comments to code, they must maintain it for life. :)

* Keep a todo list. Mine is usually a simple text file like this:

[ ] configure server for bla bla bla
[ ] setup bla bla bla
[ ] asdfasdkljasdlkjads

Then you can grep the list for '[ ]', and see what's on your plate.

When I finish, I just mark these off as resolved.

[r] asdfasdkljasdlkjads

In vi, I just search for these with / ] and press 'r' twice to mark them off. I work from the top of the list to the bottom. I'm sure there's some nuclear-powered-clam-digger that will organize your day for you. But in my opinion, most apps could be replaced with:

1. a good text editor, and
2. a little thought put into how the data is organized/tagged.

* Take breaks. Trying to rush something, or working while you are tired will usually only end up wasting more time. If you are tired, or rushing your work, you will probably spend an equivalent amount of time the next day debugging the code, as you would have spent working at a more reasonable pace. Never underestimate the power of "doing nothing" ... you'd be amazed at the good ideas that might come to you. Because even when you sleep, your brain is still working on the problem. If you don't have a good solution to a problem, don't waste time coding a bad solution. You won't save time with a bad solution.

* Software engineering is basically comprised of three simple tasks:

1. *moving information* from one place to another
2. *adding* two numbers together
3. *scheduling* the two tasks above.

anything above and beyond this is just abstraction...

* What the program does, from a user's perspective, is just an illusion. Don't assume that the data or structure of the underlying software needs to somehow mimic the end appearance. Sometimes it's more convenient to operate on two separate classes of things: the programmer's view of the data, and the user's view of things. Often, many things are viewed as starting point constraints, when in reality they are only end points. Also, you can mark up data any way you like, as long as you strip this out in the end. You can always "render" one view into another, or pull a better view out of thin air.

concrete example:

problem: Write a program that dynamically manipulates c/c++ "make" files.

solution: In C, "make" files are a strange, brittle text format and tricky to manipulate dynamically. So perhaps a better approach is to create an xml format that is rendered to a make file (through xslt). Then manipulate the xml (as the libraries for doing this are plentiful). The make file is always overwritten. [optional] if at some point you need to use an existing make file, then write a converter that parses a make file into your xml format.
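As a rough sketch of the "render one view into another" idea, here is the same approach with a plain javascript array standing in for the xml, and a function standing in for the xslt. The data layout and the name renderMakefile are assumptions made up for the example, not a real tool.

```javascript
// programmer's view: each target has a name, dependencies, and commands
// render it into the user's view: make file text
function renderMakefile(targets) {
    return targets.map(function (target) {
        var header = target.name + ':' +
            (target.deps.length ? ' ' + target.deps.join(' ') : '');
        // make requires each command line to start with a tab
        var body = target.commands.map(function (command) {
            return '\t' + command;
        });
        return [header].concat(body).join('\n');
    }).join('\n\n') + '\n';
}

var buildModel = [
    { name: 'app', deps: ['main.o'], commands: ['cc -o app $^'] },
    { name: 'main.o', deps: ['main.c'], commands: ['cc -c main.c'] }
];

// manipulate the model, never the text, then re-render (overwrite) the file
buildModel[0].deps.push('util.o');
var makefileText = renderMakefile(buildModel);
```

The generated text is thrown away and rebuilt on every change, so nothing ever has to parse it.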

* Look at software components in terms of function (purpose, use). These should be split out into clean, distinct units. If two functions are similar, collapse them into a single generic unit. But don't have one unit do two different types of functions. Remove specifics from function if possible. For example, rather than building a function that adds 1 to a number, build it to add n to a number. Rather than storing two types of data in a field, split these up, or add a flag that distinguishes a type.

This rule applies to both small and large scale of the software. In terms of overall software design, the MVC pattern is a clean approach:

the model: the data, or application state
the view: what the user sees / formatting
the controller: changes the model and selects views for user
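A toy sketch of that split, with names I made up for illustration. It also shows the "add n rather than add 1" point from above, since the amount is a parameter of the action:

```javascript
// model: application state only, no formatting
function createCounterModel() {
    return { count: 0 };
}

// view: formats the model for the user, never changes it
function renderCounterView(model) {
    return 'Count: ' + model.count;
}

// controller: changes the model in response to an action,
// then selects the view to show
function handleAction(model, action) {
    if (action.type === 'add') {
        // generic "add n", not a hard-coded "add 1"
        model.count += action.amount;
    }
    return renderCounterView(model);
}

var counterModel = createCounterModel();
```

Calling handleAction(counterModel, { type: 'add', amount: 3 }) updates the model and hands back the rendered view.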

* Simplicity (especially on interfaces to a unit of functionality): If you can get 90 percent of the desired effect with 10 percent of the work, use the simpler solution. It's better to have a simple or slightly incomplete implementation than a "perfect" interface. I tend to favor the so called "worse is better" approach over the "MIT" approach. One advantage, for example, is that needs of the software will evolve, and the simpler solution will be more flexible than an extremely complex framework that is perfectly fit to one solution. More abstraction is not necessarily better. Eventually, long lived software ends up doing things it was never intended or designed to do from the beginning.

* Avoid special cases. If they crop up, rethink the entire approach so they can be handled as "just another case". Special cases tend to introduce special bugs, and will be cursed over and over. They will also require more and more special cases to be added since every new function may need special cases to be added. This will spread through the code like a cancer. Remove them early. A special case is probably the worst wart an application can have.

* If something is ugly, it's probably wrong. Code should be elegant, with symmetry, deeper structure, and pattern running through it. Ugly code is harder to maintain, and naturally more error prone. Ugly solutions force more ugliness down the road. This is subjective of course, but don't discount your instincts. Don't bolt a bad solution on top of another bad solution ... go for the root of the problem.

* Assume that there will be changes. It's equally important to design something that both works and has the ability to change in the future. Isolate your interactions with systems, third party tools, databases, etc into one section of code for example, so that these things can theoretically be swapped out in the future. Avoid peppering the code with something that can't easily be tracked down and changed. It's better to package these things into a "black box" type interface, that hides all the specific details about the implementation.
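For example, here is a minimal sketch of a "black box" around storage. Everything here is hypothetical: the in-memory backend stands in for a real database driver, and the function names are mine. The rest of the application would only ever call saveUser and findUser.

```javascript
// hypothetical in-memory backend standing in for a real database driver
function createMemoryBackend() {
    var rows = {};
    return {
        put: function (key, value) { rows[key] = value; },
        get: function (key) {
            return Object.prototype.hasOwnProperty.call(rows, key) ?
                rows[key] : null;
        }
    };
}

// the only code that knows how users are keyed and which backend is in use
function createUserStore(backend) {
    return {
        saveUser: function (user) { backend.put('user:' + user.id, user); },
        findUser: function (id) { return backend.get('user:' + id); }
    };
}

var store = createUserStore(createMemoryBackend());
store.saveUser({ id: 7, name: 'bill' });
```

Swapping the database later means writing another object with put and get, and changing one line.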

* Try to eliminate redundancy if there are more than two occurrences of a block of code (eg, more than 2 lines of complex code). Two instances of the same things is possibly okay (not good, but not always terrible). Generalize several related problems into one problem. Redundancy will help introduce bugs, and make things harder to maintain.

* In general, you should consider pulling a block of code into a function (or class of functions) if:

* it's used repeatedly in several places in the code
* it's very important
* it could possibly change
* it's "low-level" code relative to the rest of the application
* it interfaces with another system
* it can conceal many quirks/exceptions within one layer of abstraction
* it does something that can be labeled with a different verb/noun
* the code block is getting fairly long (more than 2 pages)
* the level of nested indent is getting deep (eg. more than 4 indents)

Factoring redundant code is similar to factoring a number into basic, building blocks relative to some operation. Eg: 12 = 2*2*3.

* Centralize the requirements and components of the project. The more you manage yourself, in one place, with one tool, on one machine, the better.

For example, if you have many clients and one server, it is generally more reliable to do something on the server side than client side.

Also for example, any time you use another vendor, or remote set of tools, you introduce yet another layer of bugs and problems. External vendors never really care as much to get something to work as you would. Putting clients in the middle will make the problems exponentially worse. If you do use external vendors, make sure there's a way to get away from that system if needed.

Unless you create something yourself, beware of assumptions. Also, beware of data that someone else gives you. For example, "unique keys" might not be unique, or may change. If you get a list, there's a good chance it's not 100% complete or correct.

Avoid distributing your application, or application logic, across multiple distinct entities, multiple authors, or divisions of a company. Of course, for performance, security, or technical reasons, avoiding this may be a luxury you don't always have.

* Some things are better not centralized. Things you trust can be centralized. Things you do not trust can (or possibly should) be decentralized, with added redundancy. Redundancy is usually only good if it's intentionally added to the design, either to protect against the entire system failing or to increase performance. In databases particularly, some carefully added redundancy (on keys, flags, data from other tables, data that could be calculated) might speed up queries significantly.

* If something is decentralized, minimize the requirements of the decentralized systems. Have one part of the system as the central controller ... the brain of the entire system.

* If you need to design a death star, decentralize everything, so that if one part is destroyed, the rest can self-heal. :) The same applies to hostile environments.

* Use source control. git and svn are popular (git probably has more power, whereas svn is possibly easier for beginners). Arguably the most useful features are: two or more people can work on the same project if needed; if something gets screwed up, you can always revert to a previous version; and you can always diff the working copy against the committed copy to see what last changed (which is the most likely suspect for anything broken).

* Make it break big (MIBB). This is a counter-intuitive principle my father noted. If something is a mistake, you ideally want the program to break at the largest scale possible, so that it's easily and quickly seen. In the best case, code with a bug should never even compile. Not all bugs are equal, and many times you can control the type of bugs that crop up through good design.

For instance, simply putting the non-writable value (the constant) on the left side of a comparison will catch the common mistake of typing a '=' instead of '==':

// c -style code
if( some_variable == 1 ) {} // correct

if( some_variable = 1 ) {} // common mistake: assigns instead of testing
// equality. Will compile.

if( 1 = some_variable ) {} // common mistake that won't compile

Bugs, ranked from best to worst:

(1) break at compile time (THE BEST)
(2) cause the entire program to fail instantly
(3) cause the particular function to fail instantly
(4) function fails in most cases
(5) function randomly fails in some cases,
but does not kill entire application (self healing).
(6) randomly cause entire program to fail with no detectable pattern.
I've seen one bug (I didn't write) that took months to reappear...
they are hard to solve. (THE WORST CASE!)

When writing software, think: if there WERE a bug in this code, how would it fail? Then aim for (1). This is one reason that statically typed languages are popular, despite imposing more of a constraint on programming. It allows more compile time checking. Although, one can often write unit tests that find bugs (2) - (5).
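Here is a small sketch of moving a bug up that ranking. Both functions are hypothetical: the first swallows bad input and fails quietly much later (somewhere around case 5 or 6), while the second makes the same mistake break immediately (case 3):

```python
def parse_port_quietly(text):
    # Hides the mistake: a typo silently becomes port 80 and surfaces much later.
    try:
        return int(text)
    except ValueError:
        return 80


def parse_port_loudly(text):
    # Breaks big: bad input stops the caller immediately with a clear message.
    port = int(text)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %r" % (text,))
    return port
```

With a typo like "8o80", the quiet version happily returns 80, and the loud version raises right at the source of the problem.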

* Of course, if there is a failure, the software should ideally take an active role in reporting the failure and trying to repair the problem (by resetting, trying again, etc).
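One possible shape for "report, then try again" is sketched below. The helper name `call_with_retry` and its parameters are assumptions for illustration, and a real system would probably send alerts rather than only log:

```python
import logging
import time

log = logging.getLogger("retry_sketch")


def call_with_retry(operation, attempts=3, delay_seconds=0.0):
    """Run operation(); on failure, report it and try again up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
            log.warning("attempt %d of %d failed: %s", attempt, attempts, error)
            time.sleep(delay_seconds)
    # Every attempt failed: surface the last error instead of hiding it.
    raise last_error
```

Note that the final failure is still raised, so a persistent problem breaks big instead of being retried forever.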

* Plan on finding/replacing text. Decorate the code in a way that allows easy find/replace. It's very hard to do a find and replace on a short variable name like 'o', so if a name is important, decorate it in a verbose manner that can be uniquely identified (like SomeTypeOfThing). Remember that if the word occurs inside a lot of other words/lines, textual changes become more of a pain. Also, for example, rather than splitting something up into several lines, keep blocks of related functions grouped together ... or collapse them to a method call on a single line. Add comment markers consistently, or use verbose names. Pay attention to spaces and keep them consistent. Align code vertically for column-wise editing (e.g. in vim). Some tools exist for refactoring, although they don't really help with changing the large-scale symmetry of the program (or what it should be).

Sometimes, as an extension of this rule, you might want to introduce variables and functions for sheer semantic value for someone reading the program. For example, consider the clarity of a call like:

Util->load_file('magicalfile');

It's pretty much unclear where on earth this magical file is to someone *reading* this. This forces someone to track down the Util class, for example, then possibly locate many config files, ultimately wasting a lot of time on unneeded details ... which should have been encapsulated by the Util class (whatever it is). In the worst case I've had to scour the entire project, or large swaths of a machine, to find the file I wanted. Now consider if this was written as:

Util->load_file("lib/template/admin/magicalfile.txt");

This shows most or all of the information needed in the call. Even though it's redundant, it makes the transaction more self-contained. It might not always be a good idea, but keep it in mind and ask whether you are just shrinking a program or shrinking AND obfuscating it.

But note that in this example I wouldn't use absolute paths, since this anchors the program to a specific configuration ... moving it will break it. So a relative path, from the base of the project, is a good middle ground. It doesn't hardcode too much information, nor omit too much. There's a difference between making a unit more generic and hardcoding information into it.

If the same strings are used over and over, put them in a global config file. But don't put some information in a Utility class, and some in a config file ... splitting it into several places. Classes should be generic enough to work if copied anywhere, and ideally should be absent of localized information.

The same principle applies to variable names: it's better to avoid abbreviations for the same reason. The same name can be abbreviated many different ways, and abbreviations are often not even consistent within the same program. The only legitimate use of abbreviations would be for performance reasons, for example in a transmission of data, on an extremely feeble processor, or for a variable with a very short life. But it's a bad habit that programmers have developed, perhaps a throwback to slow typists and feeble machines.

* View data in a context. If data is placed in another context, be sure it's escaped properly, so that no special characters can break the second context. If writing something that calls a second app, restrict the data so that only known characters (within a range) are passed, and so that it has a known size.
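A minimal sketch of context-specific escaping, using Python's standard `shlex.quote` and `html.escape` (the sample input string is invented): the same piece of data needs different treatment depending on whether it lands in a shell command or in an HTML page.

```python
import html
import shlex

user_input = "report; rm -rf ~ <b>&</b>"

# Shell context: quote the value so it stays a single, inert argument.
shell_command = "ls -- " + shlex.quote(user_input)

# HTML context: escape markup characters so the value renders as plain text.
html_fragment = "<p>" + html.escape(user_input) + "</p>"

print(shell_command)
print(html_fragment)
```

Escaping for one context does nothing for the other, so it's probably safest to escape at the moment the data crosses into each new context.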

Also, try to keep data and (a context for) presentation separated...

* porting code. Beware of porting code between syntactically similar languages. Just because two languages look alike, that doesn't mean they are alike under the surface. For example, order of operations might be different. Some things could be cast differently, leading to subtle bugs.

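For example:

(float) int/int

Whether this produces a true quotient or a truncated one depends on whether the cast applies to the first operand or to the result of the integer division. In both Java and C# the cast binds first, so the division is done in floating point, whereas (float)(int/int) truncates before converting. Verify rules like this in each language rather than assuming they carry over.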

A subtle bug is worse than a full-on breakdown. It's possibly safer to port between unlike languages, because any breakage is larger and more noticeable.
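One porting trap of this kind that I'm sure of: integer division of a negative number. C99, Java, and C# truncate toward zero, while Python's `//` rounds toward negative infinity, so a line that looks identical gives a different answer. A quick sketch:

```python
import math

a, b = -7, 2

floored = a // b               # Python rounds toward negative infinity
truncated = math.trunc(a / b)  # matches C99, Java, and C# integer division
true_quotient = a / b          # Python 3 "/" always gives a float

print(floored, truncated, true_quotient)
```

Nothing crashes here, which is exactly the problem: the ported code runs and is quietly off by one for negative inputs.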

* interfaces for remote/external systems:

If someone else controls/uses the other end of a system interface, coordinate with them to verify that data is being exchanged as expected. Set up a meeting just for this. Involving the other person in testing also serves to minimize the unhappiness if there is a bug. Run through all cases, or types of cases.

When a system is first deployed, log every single byte sent and received from the remote interface. After a few months, you can scale back the logging to something less verbose. This will at least leave a trail to recover from, in case there are any subtle bugs. Otherwise, if it's not possible to log everything,

If there is any monetary transaction exchanged, explicitly test each case of the transaction. Even small amounts (literally, a penny) will cause the customers a lot of anxiety. A small bug in a general application won't cause the same anxiety as one where money is involved. The anxiety is not proportionate to the amount of money involved. So, for example, check the pennies. Check how the money is being rounded.
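To make "check the pennies" concrete, here is a small sketch using Python's standard `decimal` module. The price and tax rate are invented numbers, and half-up rounding is only one possible policy; the point is that the rounding rule is stated explicitly instead of being left to binary floating point:

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent most decimal fractions exactly.
float_total = 0.1 + 0.2            # not exactly 0.3

price = Decimal("19.99")
tax_rate = Decimal("0.0825")       # hypothetical rate for illustration
tax = (price * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
total = price + tax

print(float_total == 0.3, tax, total)
```

Whatever rounding rule the other side of the interface uses, it's worth testing that both ends agree on it to the penny.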

on the service side, add a test method to indicate that the system is up

require all users/programs to identify themselves (for debugging, security)

all methods acknowledge success/failure

return a status for every operation with:
a status code
some short text string that has meaning to you, but not too revealing

can handle failure and reset state
can handle multiple hits/hiccups
log all success
log all failure verbosely <<<<< VERY IMPORTANT!
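A rough sketch of "every operation returns a status" might look like this. The `Status` type, the error codes, and the `withdraw` operation are all hypothetical; the idea is just that no code path returns without an explicit acknowledgement:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    code: int      # 0 means success; other values are hypothetical error codes
    message: str   # short text, meaningful to the maintainer but not revealing


OK = Status(0, "ok")


def withdraw(balances, account, amount_cents):
    """Every path returns a Status, so callers always get an acknowledgement."""
    if account not in balances:
        return Status(101, "unknown account")
    if amount_cents <= 0:
        return Status(102, "invalid amount")
    if balances[account] < amount_cents:
        return Status(103, "insufficient funds")
    balances[account] -= amount_cents
    return OK
```

Failed calls leave the balances untouched, which also makes it easier to handle repeated hits and reset state.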

try to make the interface as generic as possible, so that features can be added in line with what already exists. avoid specifics if possible.

although simple interfaces are easier to use, it's better to have a simple implementation than a simple interface.

Is there any way to upgrade/deprecate the interface? How do you know what all will be calling it?

* time bombs:

what will the program do in the future?
will the data get larger?
is it bound by a known constraint?
will it become cluttered and slow down (cleanup)?
what happens if a calling program (hacker) sends unbounded input?

* whenever interacting with the operating system, make sure that:

all data is within known sizes
all characters within known set
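As a sketch of both checks at once, the function below accepts a filename only if it fits an allow-list of characters and a maximum length. The policy itself (which characters, 64 characters max) is an assumption for illustration and would need to match the real system:

```python
import re

# Assumed policy for this sketch: letters, digits, dot, dash, underscore; 1-64 chars.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]{1,64}")


def checked_filename(name):
    """Return name unchanged if it fits the known set and size, else raise."""
    if not SAFE_NAME.fullmatch(name) or name in (".", ".."):
        raise ValueError("rejected filename: %r" % (name,))
    return name
```

Anything outside the known set is rejected outright instead of being "cleaned up", so bad input breaks big at the boundary.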

* a couple fields to consider adding for every important database table:

a uuid for the id
a read/write state (dirty/updating/written/etc)
a date of when last accessed
a date of when last modified
a date of when created
a version number
an active/inactive state
an ip, or some physical identification
a username, or what the user claims to be
a generic "type" int field

Note that in some cases, old data should never be updated. Instead, each update should really be an insert, with an increment of the version number.
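Here is a minimal sketch of the insert-instead-of-update idea, using Python's built-in `sqlite3` with an in-memory database. The `prices` table and its columns are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices ("
    " item TEXT NOT NULL,"
    " version INTEGER NOT NULL,"
    " cents INTEGER NOT NULL,"
    " created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
    " PRIMARY KEY (item, version))"
)


def set_price(item, cents):
    # Never UPDATE: insert a new row with the next version number.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO prices (item, version, cents)"
            " SELECT ?, COALESCE(MAX(version), 0) + 1, ? FROM prices WHERE item = ?",
            (item, cents, item),
        )


def current_price(item):
    # The highest version is the current value; older rows remain as history.
    row = conn.execute(
        "SELECT cents FROM prices WHERE item = ? ORDER BY version DESC LIMIT 1",
        (item,),
    ).fetchone()
    return row[0] if row else None
```

The old rows double as an audit trail, at the cost of a table that only ever grows (see the time bombs above).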

* beware of "perfect server" assumptions:

write a simple monitor that reports if the server is up/down and send alerts

also, each client should check that the server is up before trying to use it.

also, can the system be taken off line if needed?
how will this affect users?
is there a development/test environment?
how will it be tested, upgraded?

for example:

a global flag could be set indicating that the system is shutting down.
new processes/users are locked out.
after old processes/users finish,
or after some time frame (say 30 min)
the system stops

is there a way to notify users currently on the system?

is there a way to see who is using the server?

* smokey the bear: leave things in the same state you found them in. Or provide a method that allows this. If you are working with an old piece of software, try to leave its outward appearance (API) the same. That way, if some external program uses it, that program is less likely to break. If you are working with data, keep your copy of the data clean. If you need to do something to the data (append, or substitute), create another variable and work with that. Avoid changing the data in place, or do all formatting in one separate place. That way, if someone else comes along, they aren't using a variable that you messed with, thinking it's normal data. For the same reason, avoid functions that morph their arguments (unless maybe they are explicitly noted as output variables).
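A tiny sketch of the difference (both function names are hypothetical): the first function morphs its argument, the second leaves the caller's data as it found it.

```python
def with_footer_in_place(lines):
    # Morphs the caller's list: everyone holding it now sees the extra line.
    lines.append("-- end --")
    return lines


def with_footer(lines):
    # Leaves the argument untouched and returns a new list instead.
    return list(lines) + ["-- end --"]


original = ["line one", "line two"]
decorated = with_footer(original)
print(len(original), len(decorated))
```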

* atomic transactions: if possible try to create atomic transactions that either work or do not work. Consider what happens if something fails in the middle of a process. For example, rather than writing a huge file out (replacing an old) ... write it to a temp location then rename it to the old. This makes the window of failure very small, and minimizes the chance of the file becoming corrupted. A transaction context should be at the level of business logic, or higher logic of the application. A single transaction may include a number of steps. Ideally if something fails, you should be able to notify all steps and rollback to a previous state, or commit the transaction.
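The temp-file-then-rename trick could be sketched like this. The function name `write_atomically` is made up; `tempfile.mkstemp` and `os.replace` are standard library calls, and the rename is atomic on POSIX when both paths are on the same filesystem, which is why the temp file is created next to the target:

```python
import os
import tempfile


def write_atomically(path, text):
    """Write text to a temp file in the same directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(temp_path, path)    # readers see the old file or the new one
    except BaseException:
        # Clean up the partial temp file; the old file is still intact.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

If the process dies halfway through the write, the worst case should be a stray temp file, not a half-written copy of the real one.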

* enforced security: if security is required, you need to wonder about the security of the security system. Sure, you have a system in place, but how do you know it's not being, effectively, disabled by the end users?

For a username/password system, for example, you might add checks that test and require stronger passwords. People are sometimes lazy, and will use common words (like 'password', their username, 1234, etc). Even if the system is not worth hacking, you still have the problem of people logging into the wrong account, and screwing things up by accident. Just because you put the framework for a security system in place doesn't mean people are actually using that system.

* Don't design anything that depends on people remembering to do something in the future. Surely, someone will forget. In such cases, the software should actively remind people that something should be done, before it breaks. When possible, send out emails, pages, or make bleeping noises. Or smoke signals. There's no guarantee anyone will look at the log unless something breaks (or even that the person who wrote the software is still around to answer questions).

* old carpenter's rule: measure twice, cut once. Do not cut twice. There's a temptation from ordinary life to "do something twice just to be safe." In testing, this is useful. Outside of testing it usually creates more problems than it solves ... and is inherently UNSAFE. Do things correctly, once and only once.

* Meta-programming: Write tools that do your job: If you find yourself doing the same type of thing over and over again, write tools to do the job (or custom macros). Computers are made to do grunt work. Doing something by hand is tedious and error prone. Ideally, the goal is to write a robot monkey that does everything for you...

* On refactoring: When finished, look over code to see if things can be organized better or cleaned up. Investing a little time here can greatly improve the code. This step is often overlooked.

* Note oddities. Often I'll see something and think "hmm? that's different" ... something that seems fairly minor, but not what you'd expect. These things are often the tip of an iceberg. :) Even in cases where things seem very trivial and would not affect function (like spacing). If it doesn't make sense, there's probably a bug ... assume it's yours.

* Optimization: make the project work first, work correctly second, and work fast third. Don't try to optimize something before the problem is basically solved. A good algorithm is worth more than secondary optimization.

Small scale optimization (like tweaking the assembly code) is usually not as worthwhile as large scale optimization ... picking the best overall algorithms and logic for the software.

It's important to generally understand the relative speeds of operations, like memory access, file access, database access, system access, and computations. Then look for potential bottlenecks. Things done repeatedly and things accessing remote systems can slow everything down.

On large sets of data, try to compute as much as possible in advance. For example, suppose you need a fuzzier pattern matching algorithm. Compute the fuzzy form when the data is first input, and save a key or additional data side by side with the original data.
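As a sketch of precomputing at input time, the code below stores a normalized "fuzzy key" next to each record. The normalization rule (strip accents, case, and punctuation) is only an assumed stand-in for whatever fuzzy matching the application really needs:

```python
import unicodedata


def fuzzy_key(name):
    """Hypothetical normalization: strip accents, case, and non-alphanumerics."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


index = {}  # precomputed key -> list of original records


def add_record(name):
    # Pay the computation cost once, when the data is first input.
    index.setdefault(fuzzy_key(name), []).append(name)


def find(name):
    # Each search normalizes only the query, then does one dictionary lookup.
    return index.get(fuzzy_key(name), [])
```

The stored records are never re-normalized at search time, no matter how many there are.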

Lookups (in memory) are cheap compared to a computation. When moving data, only move the small aspects that are truly needed. Cut out redundant operations, and use caching to speed up repeated operations.

Indexes are a good example of using both precomputation and redundancy for the sake of speed. Both the search is precomputed, and data trimmed down to allow faster navigation ... separate from the main store of data.

Caching is another example of precomputed redundancy ... saving a local copy of something that, for example, might come from a slow remote location. The improvement is speed; a cost may be that the data is slightly out of date, or you may use more memory/disk.

* Testing is 90% of programming. With hands on testing, you must try to break your code. Although the thing to keep in mind is while designing the application, split things into groups or entities that allow easy testing (for example with junit or nunit). Write one section of code that handles all cases, then test that section of code.

Some testing can be automated, if you design the components so that individual units can be hit with a battery of tests (or test scripts) that will detect problems with a change. This way, individual components can be treated as black boxes, and tested in smaller units. junit and nunit, for example, are useful for running tests on something to see if any new change had any undesirable effect. Might write a single class that generates random test data for objects.
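The same idea works with Python's built-in `unittest`, which plays the role that junit and nunit play for Java and C#. In this sketch `clamp` is an invented unit under test, hit with both known cases and a batch of random data:

```python
import random
import unittest


def clamp(value, low, high):
    """Unit under test: limit value to the range [low, high]."""
    return max(low, min(high, value))


class ClampTest(unittest.TestCase):
    def test_known_cases(self):
        self.assertEqual(clamp(5, 0, 10), 5)
        self.assertEqual(clamp(-3, 0, 10), 0)
        self.assertEqual(clamp(99, 0, 10), 10)

    def test_random_inputs_stay_in_range(self):
        rng = random.Random(2007)  # fixed seed keeps failures reproducible
        for _ in range(1000):
            value = rng.randint(-1000, 1000)
            self.assertTrue(0 <= clamp(value, 0, 10) <= 10)
```

Saved in a file, this can be run with `python -m unittest` after every change to see if anything broke.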

Though generally there are three dimensions to a product: quality, price, and service. Sometimes, for simpler projects, I will have the customer do more of the testing, and end up with fewer hours in development.

* On maintaining (future edits): If something is not broken, DO NOT touch it or fiddle with it in any way. If something is ugly, leave it. Given the law of unintended consequences, the likelihood of something breaking in a complex system is high. On the other hand, if something is broken, fix it, and fix it right. Rewrite entire parts of the application if needed. Do not add a hack.

* write documentation. That way you don't have to walk around with all this in your memory. It's easy to forget something after 6 months.

* Upgrade: Remember that software (including the language and platform it depends on) will need to be upgraded eventually ... good software is never finished. So, there needs to be a way to test and make changes to the existing system without breaking it. It's probably best to set up a mirror of the production site (where all code is identical, except perhaps for the configuration settings). Keep in mind, when designing something, or designing two components that interact: each should have a development copy, or at minimum a development mode that can be tested against without actually screwing up anything. Isolate differences in config files, xml files, implemented objects, etc.

Hello There

Hello there, I'm fixing some peas!
