C++ Programming (Style) Guidelines
plus some Tips & Tricks

Original location of this document:
cgvr.informatik.uni-bremen.de/progr/codingGuidelinesEN.shtml

Gabriel Zachmann
Universität Bremen, Fachbereich Mathematik/Informatik, Computergraphik und virtuelle Realität
Linzer Straße 9a, OAS
28359 Bremen
zach at informatik.uni-bremen.de

Version 2.3.2, Oct 2013


Contents

Part 1 - A Short Version (for Experts)

Part 2 - For Non-Experts


Intro
Coding-Style
    The n Commandments for Programmers
    The Python Way
Naming Conventions
    Intro
    Meta-Naming (C++)
    C and C++
    C
    C++
Commentaries
Structure and File-Layout
    Layout inside a Class
    Indentation and Spaces
Maintainability
    Readability
Good and Bad Programming
    C++
    Round-off errors
    Hard-Coded Paths
    Arrays
    Magic numbers
    Macros
    Assumptions
    Input-Parameters
    "Can't happen"-Cases
    Misc
    Beginner-Bugs
    Minor Issues
    Optimization
Object-oriented Design
    General Guidelines
    Liskov's Substitution Principle
    Open/Closed Principle
    Class or Algorithm?
Robustness
Working Methods
    "There is no later"
    How to Steal Code?
    How to Find Bugs?
    RTFM
    Tools
    Bibliography, more Guidelines


Part 1 - A Short Version (for Experts)

This part is meant for experienced programmers.

Naming Conventions

Here are some examples that show our conventions. You can find more details below. (GoTo Meta-Naming.)

The most important part of a name is, of course, its meaning b) to the user. (You can recognize truly great programmers by their good object and method names.)

Type                                Example                            Explanation
Local Variables                     mylocalvar, my_local_var           lower case only
Instance Variables                  m_var                              m = "member"
Class Names                         MyCoolClass                        begin with upper case, InCaps
Exception-Classes                   XThrewUp                           like classes, begin with X
Class Variables                     M_MyClassVar                       combination of class name and instance variables
Constants                           const int MaxNum = 10;             similar to class variables
Methods                             calcSomeCoolValue()                begin with lower case, inCaps (more details)
Template-Parameters                 <MyTypeT>                          like class names with T at the end
Getter Function                     getVar(..) const                   retrieves Variable _var
Setter Function                     setVar(..)
Properties                          isProperty()/hasProperty() const
typedef                             SomeThingT                         like class names, with 'T'
Defines                             #define DEBUG_ON                   all-caps with underscores
enum-Types                          myEnumTypeE                        Suffix E; more about enums & Co.
enum-Members                        COL_TRUE, COL_FALSE                like defines
Pointer- or Reference-Declarations  String *str;                       * is next to the variable, not the type
b) "Nomen sit omen".

Commentaries

Use the templates in template-comments.cpp.

More about commenting.

Structure and File-Layout

Use the templates template.cpp
and template.h. (see details)

Use template.c for C files.
C++ files have the suffix .cpp, header files have the suffix .h, implemented templates have the suffix .hh.

Every file in the project must have a unique name.

Only a few classes per file. If there is more than one class in a file then they should have to do with each other or be part of a common package (e.g. small helper-classes that would otherwise be implemented as a class in a class).

Brackets are aligned vertically. For example:

int myFunction( .. )
{
    if ( .. )
    {
        for ( .. )
        {
            ..
        }
    }
    else
    {
        ..
    }
}

The indentation is 4 spaces per level!
K&R-Style is prohibited!
Structure Files by using spaces.
More about indentation and spaces.

Larger coherent parts of the code (or the header) should be separated by a blank line (like a paragraph).

Dos and Don'ts

Read Scott Meyers' "Effective C++" and "More Effective C++".

Always initialize in the constructor. Use initialization instead of assignment in the constructor (whenever possible). Reason: performance.

Always call the constructor of the base-class if you are in the constructor (in the initialization-list of course).
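A minimal sketch of these two rules, using the hypothetical classes Shape and Circle (invented for illustration, not part of our code base): the members and the base class are set up in the initialization-list instead of being assigned in the constructor body.

```cpp
#include <string>

// Hypothetical base class, for illustration only.
class Shape
{
public:
    explicit Shape( const std::string &name )
    :   m_name( name )              // initialized here, not assigned in the body
    {
    }

    const std::string & getName() const
    {
        return m_name;
    }

private:
    std::string m_name;
};

// Hypothetical subclass: calls the base-class constructor explicitly.
class Circle : public Shape
{
public:
    explicit Circle( double radius )
    :   Shape( "circle" ),          // base-class constructor comes first
        m_radius( radius )
    {
    }

    double getRadius() const
    {
        return m_radius;
    }

private:
    double m_radius;
};
```

With an assignment in the constructor body, m_name would first be default-constructed and then overwritten; the initialization-list avoids that extra step.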

You should always declare a copy-constructor and an assignment operator. If the class doesn't need them, make them private (Reason). Implement the copy-constructor (or a conversion-constructor, which takes a const B & instead) like this:

A::A( const A &source )
{
    *this = source;
}

Constructors with only one parameter (called conversion-constructors) must be made explicit (Reason).
Except the copy-constructor! (Reason: return-by-value and pass-by-value would cease to work. The compiler may optimize the copy away, and usually does so, but the standard still calls for the copy-constructor, probably to make the program work on compilers that don't support that optimization.)
The same is true for constructors where all but one argument have default values.
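A small sketch of what explicit buys you. The class Meters and the function twice are hypothetical and only serve as an illustration:

```cpp
// Hypothetical class, for illustration only.
class Meters
{
public:
    explicit Meters( double value )     // conversion-constructor: explicit
    :   m_value( value )
    {
    }

    // The copy-constructor (here the one generated by the compiler)
    // is not explicit, so pass-by-value keeps working.

    double getValue() const
    {
        return m_value;
    }

private:
    double m_value;
};

double twice( Meters m )                // pass-by-value uses the copy-constructor
{
    return 2.0 * m.getValue();
}
```

With this declaration, "Meters m( 1.5 ); twice( m );" compiles, whereas "twice( 1.5 )" is rejected by the compiler, because the double would have to be converted to Meters silently.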

Use const or simple functions instead of #define.
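As a sketch (the macro names in the comments and the function square are made up for this example):

```cpp
// Instead of:  #define MAX_NUM 10
const int MaxNum = 10;

// Instead of:  #define SQUARE(x) ((x)*(x))
// A function is type-checked and evaluates its argument exactly once.
int square( int x )
{
    return x * x;
}
```

The macro version would expand SQUARE(i++) to ((i++)*(i++)), which modifies i twice in one expression; the function has no such trap.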

Favor new over malloc (see alloca regarding the performance).

No public instance- or class-variables. (Unless they are const)

If class A is used only by pointer or reference in the header file of class B, then use a forward declaration; do not include the header-file of class A. Example:

class A;
class B
{
private:
    A *a;
};

Do not change an operator's "meaning" and pay attention to its semantic counterpart. (Example)

If possible, implement binary operators globally (not as instance methods). (Reason: see "Effective C++", Item 19. In the end it comes down to this: if it's not absolutely necessary for a function to be in the class, then define it outside of the class, i.e. as a global function. Scott Meyers about this: "I've been battling programmers who've taken to heart the lesson that being object-oriented means putting functions inside the classes containing the data on which the functions operate. After all, they tell me, that's what encapsulation is all about. They are mistaken." A detailed explanation by Scott Meyers is available (source).)

Reference parameters are always declared as const; parameters that the function can alter are passed as pointers (Reason).
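A sketch of this convention; the function calcRange and its parameter names are hypothetical:

```cpp
#include <vector>

// Hypothetical function, for illustration only.
// 'values' is input only (const reference); 'minOut' and 'maxOut' are
// altered by the function (pointers).
// Returns false if 'values' is empty; the outputs are then left untouched.
bool calcRange( const std::vector<int> &values, int *minOut, int *maxOut )
{
    if ( values.empty() )
    {
        return false;
    }

    *minOut = values[0];
    *maxOut = values[0];
    for ( unsigned int i = 1; i < values.size(); i++ )
    {
        if ( values[i] < *minOut )
        {
            *minOut = values[i];
        }
        if ( values[i] > *maxOut )
        {
            *maxOut = values[i];
        }
    }
    return true;
}
```

At the call site, calcRange( v, &lo, &hi ) makes it visible that lo and hi may change, while v cannot.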

Use namespaces.
Never write using namespace my_package; in a header-file!

If methods have to be overloaded, they also have to be virtual. (ARVIKA 15.1 -- does someone know why?)
Virtual methods must also be declared as virtual in subclasses (if they are overridden) (Reason).

Never redefine an inherited default parameter.

Private methods can not be made public in a subclass.

Avoid casts. If you do have to use them, then use the new C++-Casts (Reason and Example).

Be careful with the definition of cast-operators! Here, too, unwanted things can happen unnoticed. (Example)

Write downcasts only with dynamic_cast (Reason).
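A minimal sketch of a checked downcast. The hierarchy Node, Leaf and Group is hypothetical and only meant as an illustration:

```cpp
// Hypothetical class hierarchy, for illustration only.
class Node
{
public:
    virtual ~Node()
    {
    }
};

class Leaf : public Node
{
public:
    int getValue() const
    {
        return 42;
    }
};

class Group : public Node
{
};

// Returns the leaf's value, or -1 if 'node' does not point to a Leaf.
int valueOf( const Node *node )
{
    const Leaf *leaf = dynamic_cast<const Leaf *>( node );   // checked downcast
    if ( leaf == 0 )
    {
        return -1;
    }
    return leaf->getValue();
}
```

Unlike a C-style cast or static_cast, dynamic_cast yields a null pointer if the object is not of the requested type, so the error can be handled instead of producing undefined behavior.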

Don't do inlining by hand. (Compiler-options, constructors)

Header-files with template-classes shouldn't contain any other code. For templates, the "code" goes in a separate file with the suffix .hh.

Use the C++-features RTTI (typeid and dynamic_cast) and exceptions.

Avoid nested classes.

Avoid multiple inheritance.

A variable that is declared in a for()-construct is only valid in that loop:

for (int i=0; i<10; i ++ )
{
    // i is valid
}
// i is not valid any more

Set the option -LANG:ansi-for-init-scope when compiling on SGI.


Set the option -Qoption,cpp,--new_for_init when compiling on Intel's compiler.

Activate the warnings of the compiler and write warning free code.

No temporary objects in function calls (Reason).

Instance variables for which there is no get-method may be protected.

Don't do any "real" work in constructors, or else use exceptions (Reason).

Document null-statements in the source (Reason).

Write const-correct code. (This takes effort in the beginning ;-) ) Write your code const-correct right from the start! (It is practically impossible to do it later on.) And don't forget the const at the "right" side of a function.

Use the assert() macro often. (Set -DNDEBUG in the release!)
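For example, preconditions of a function can be checked like this (calcMean is a hypothetical function, used here only for illustration):

```cpp
#include <cassert>

// Hypothetical function, for illustration only.
// Precondition: values != 0 and n > 0.
double calcMean( const double *values, unsigned int n )
{
    assert( values != 0 );
    assert( n > 0 );

    double sum = 0.0;
    for ( unsigned int i = 0; i < n; i++ )
    {
        sum += values[i];
    }
    return sum / n;
}
```

With -DNDEBUG the asserts compile to nothing, so an assert must never contain code that the function relies on.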

Do not use malloc directly, but the malloc-wrapper from defs.h.

Use unsigned int, if you don't need the negative values. This is usually the case in loops. You should think about this for every prototype that gets int.
Activate the warnings for this (-Wsign-compare in gcc).

Don't count on the order in which global or static members are initialized!
(The standard defines this order within a translation-unit, but you just have to switch the order of two definitions and your code might break!)

One function, one task!
Bad examples: Stack::Pop() and e.g. EvaluateSalaryAndReturnName().
(Reason: well arranged, easy to make exception-safe.)
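My reading of the Stack::Pop() example is that such a function does two things at once: it removes the top element and returns it. A sketch of the alternative with one task per function (the class IntStack is hypothetical):

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal stack, for illustration only.
class IntStack
{
public:
    void push( int value )
    {
        m_elems.push_back( value );
    }

    int top() const             // task 1: look at the top element
    {
        assert( ! m_elems.empty() );
        return m_elems.back();
    }

    void pop()                  // task 2: remove the top element
    {
        assert( ! m_elems.empty() );
        m_elems.pop_back();
    }

    bool isEmpty() const
    {
        return m_elems.empty();
    }

private:
    std::vector<int> m_elems;
};
```

The caller first reads the element with top() and then removes it with pop(); if copying the element fails, the stack is still unchanged.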

Code-Example

For those in a hurry we have an example with annotated code, that contains some of these guidelines.

Compiler-Options

The following options should always be set:

For g++ :
-ansi -fno-default-inline -ffor-scope -Wall -W -Wpointer-arith -Wcast-qual -Wcast-align -Wconversion -Woverloaded-virtual -Wsign-compare -Wnon-virtual-dtor -Wfloat-equal -Wmissing-prototypes -Wunreachable-code -Wno-reorder -D_GNU_SOURCE

For Intel under Windows :
-Qansi -MDd -Gi- -GR -GX- -Qrestrict -Qoption,cpp,--new_for_init -TP

Teamwork

We want to reuse as much code as possible.
Therefore we should do the following:

Ask!

You should ask on the mailing-list if there is already a piece of code that you can use before you start to code one yourself.

See how to steal code.

Tell!

Send a short mail to the mailing-list once you have implemented a new function, a new feature, a new action/event, etc...


Someone might profit from this and can reuse your code / your feature later on.




Part 2 - For Non-Experts

Intro

These guidelines and tips are meant for all those who are relatively new in the department and do not yet have decades of coding experience in C/C++ (i.e. student assistants and graduate students).

These guidelines should help you to develop a good programming style. I also put some general programming tips together that (hopefully) help you to complete your program faster, to produce fewer bugs and to find bugs faster.

Now I hear some of you groaning: "Now I can't even program the way I want!", and: "Do I really have to read through all this?". I thought the same when I'd been given guidelines for the first time.

From my years of experience I can assure you that: yes, it has to be if you are working in a team. And even if you are not working in a team, some basic rules and a good programming style are useful because they help you. In any case you'll get on your colleagues' bad side if you have a bad style! :-)

All in all I tried to have as few rules and restrictions in the guidelines as possible and only as many as necessary: obviously there are several good programming styles 1), and everyone should and must develop his own style. But it is also obvious that there are many more bad styles than good ones ...

1) Mathematically speaking: on the set of programming styles, there is only a partial order, not a total one :)

Generally speaking, these guidelines may be broken if (and only if) the source code becomes more readable, more robust, or better to maintain.

By the way, it goes without saying that you won't write perfect code just after reading these guidelines. No master has yet fallen from the sky. As with man-pages, you have to look into the guidelines every once in a while. Even I am still working on my style :) .

These guidelines can only give the roughest advice; the subtleties are too many and too individual to be listed in a guideline. It is better if you run a "style-daemon" in your head that checks your style while you are programming and that constantly expands its database. ;-)

The second half of these guidelines contains some tips and tricks for Unix, C and other stuff that you need in daily life as a programmer or that you can make use of.

Good points, bad points

This section is from "C++ Coding Standard", but I simply copy it here because it summarizes quite well what guidelines are good for:

Good Points

When a project tries to adhere to common standards a few good things happen:

Bad Points

Now the bad:

Coding-Style

Syntactic and "semantic" practices are a part of this. I claim that it is not possible to write good code without a good coding-style (in terms of robustness, maintainability, efficiency, elegance). I also claim that one can only program efficiently in the long term with a good programming style! (Because: bad style -> more bugs or bad design -> longer bug search or more redesigns -> more time in total.)

The n Commandments for Programmers

Frame these and put them next to your bathroom mirror!
:-)

I never want to hear: "I know that I should do X, but I'll do it later once everything works"
Believe me: You will not do it later.
Only elegant code is good code.
Comment! (It's basically in the code, but no one likes reverse engineering!)
If you have to break the rules, comment why (and not that you broke them).
Choose the names of your functions, variables and methods carefully!
Choose descriptive names ("labeling names") and a consistent naming convention.
Pay attention to clear indentation and spacing!
Always ask yourself when writing "what if ..."! (Complete case analysis)
Never write code with side effects!
But if you must do so, comment them in detail and highly visibly!
Learn to use your tools to their full extent.

The Python Way

Here is a small list of programming rules I've picked up from the net.

  1. Beautiful is better than ugly.
  2. Explicit is better than implicit.
  3. Simple is better than complex.
    (From Big Ball of Mud: "A complex architecture may be an accurate reflection of our immature understanding of a complex system or problem.")
  4. Complex is better than complicated.
  5. Flat is better than nested.
  6. Sparse is better than dense.
  7. Readability counts.
  8. Special cases aren't special enough to break the rules.
  9. Although practicality beats purity.
  10. Errors should never pass silently.
  11. Unless explicitly silenced.
  12. In the face of ambiguity, refuse the temptation to guess.
  13. There should be one -- and preferably only one -- obvious way to do it.
  14. Now is better than never.
  15. Although never is often better than right now.
  16. If the implementation is hard to explain, it's a bad idea.
  17. If the implementation is easy to explain, it may be a good idea.
  18. Namespaces are one honking great idea --- let's do more of those!

Naming Conventions

Intro

It may seem absurd or annoying to some of you that we place such high value on names a). But the naming of variables, functions, methods and classes is actually the most important criterion for a good programmer (and a good design)! This is even more important in object-oriented programming than in pure C. Bad naming can make a library almost unusable.

a) And yet Goethe said: "Name is but sound and smoke.". (Martha's Garden)

When choosing a name for a class, an object, a variable or a type you should think about what another programmer, who sees your code for the first time and knows nothing about it, can discern from that name. It is best if he can get the meaning from the name. Longer names are usually better for understanding than short ones (but they can annoy the users of your code if they are too long :)). For example, ParameterUnavailException is much more understandable than parmunavlex.

Names are a good indication of a flawed object-oriented design: if they get too long, make no sense from a global perspective, or if every function is called doIt, make, or thing, then it's high time to check the design! If class names consist of more than 3 words, then that is a sign that you are mixing up several entities of your system.

Meta-Naming (C++)

The terms member function etc. that Stroustrup introduced are preposterous! They are called "class methods" (static member functions) and "instance methods" (non-static member functions). Analogously for variables.

C and C++

Functions/methods usually do something, so their names should be composed of a verb + noun (inCaps-notation). Here is an example of our convention: calcDistance. (Other conventions are: SetParam, or create_stripe. I personally think the underscore-notation is not so nice.)

If a function returns a property, then it is better to compose the name of "is" or "has" + adjective; e.g. isFlat or hasColor.

Sometimes a suffix does help, e.g. Max, Cnt, Key, Node, Action, etc.

Use the common conventions for temporary variables, i.e. i, j, k, etc., for integers (especially loop-variables and indices), s, s1 for string-variables, ch, ch1 for chars, etc.

If several methods/functions in the same module/class have parameters with similar meaning, then these parameters should have the same (or at least a similar) name. This is especially true for overloaded functions.

Names that are used for conditional compiling should be "all caps" (e.g. #ifdef DEBUG_ON).

The names of enum-types should indicate that it is such a type. They should therefore end with an E, e.g.: renVisibilityE. The names of the "members" of an enum are composed like a define:

typedef enum                 // Comment about my great enum
{
    XYZ_RESULT_MIN,          // invalid value (parameter check)
    XYZ_RESULT_SENSIBLE,     // blub blub
    XYZ_RESULT_SILLY,        // bla bla
    XYZ_RESULT_STONED,       // lall
    XYZ_RESULT_MAX           // invalid value (parameter check)
} xyzResultE;

Enums that are in a class do not need a prefix. The member-names should indicate to which enum they belong.

See the "enum-problem in C++".

Struct-, union- or pointer-types do not need a label because the type is clear from the context. If you want to, you can consider suffixes analogous to the enum-convention. Some possibilities are: renViewpointS, or renViewpoint_s for struct-Types; objPolyhedronP for pointer. Other conventions are possible; I think the convention "cap-suffix" is the best (and the fastest to type :)).

Furthermore I'd like it if you could think of conventions that show the semantic meaning of a variable/object in its name; e.g. begin all vectors with a v, end all matrix-names with mat, begin all exception-objects with ex, etc.

C

It is the same as above for C++; although functions must have a prefix (2-4 letters) which represents the module in which they are defined: prefix + verb + noun. E.g.: plhCalcDistance, INTOsetButtonParam, or pf_create_stripe. The same is true for functions that return a property: prefix + "Is" + adjective; e.g. plhIsFlat.

C++

Classes use the same naming conventions as functions with the exception that class names begin with a capital letter. It is not necessary to add a capital C as an additional prefix or suffix to class names (redundant). Class-variables/-functions and instance-variables/-functions are composed with the same naming conventions.

Functions that are used to set instance-variables should begin with set. Functions that return the value of an instance-variable should have the same name as the instance-variable (or begin with get). The variable itself begins with an underscore.

If several classes together form a library, then it can be very useful if the class names have a prefix (e.g. pf for Performer). It is not necessary to add the prefix to the function or variable-names of those classes. The files of a class have the same name as the class (e.g. the file matrix.cc contains the class libMatrix).

All functions begin with a lower-case letter. If it is not too much of a change for you, then use the inCapsNotation (i.e. getBla or setNewWonderfulBlub).

Avoid redundancy in the names. In the following line:

myWindow->setWindowVisibility( libWindow::WINDOW_VISIBLE);

we had to type 4× "window" and 2× "visible". This is just as good and understandable:

myWindow->setVisibility( true );

(Everyone knows from the name that you can only pass boolean values. Therefore a plain true is ok as a parameter in this case.)

The Enum-Problem in C++

In C, it was easy to use enums where several options are ORed and passed as a parameter. E.g.:

typedef enum                    // renderer options
{
    renWithWindow        = 1,   // create window
    renStereo            = 2,   // stereo window
    renWindowDecorations = 4    // window has decorations
} renOptionsE;
void renderInit( renOptionsE options );

which you could invoke like this:

renderInit( renWithWindow | renStereo );

In C++ this does not work any more. I therefore propose the following "detour" (Alex' idea):

enum
{
    ...
};
typedef int myEnumE;
void foo( myEnumE options );

Unfortunately you can't tell the usual documentation extraction tools that they should document the unnamed enum under the different name myEnumE!

(The only tool that I know of, and that might be able to do it, is Perceps.)


All the other variants to solve the enum-problem can't be processed "transparently" by the documentation tools either.
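The reason why the C version no longer works is that in C++ the result of | on two enum values is an int, which is not converted back to the enum type implicitly. Here is a minimal sketch of the detour; the names REN_..., renOptionsE and hasOption are only assumptions for this example:

```cpp
// Hypothetical renderer options; the values are powers of two so that
// they can be combined with |.
enum
{
    REN_WITH_WINDOW        = 1,     // create window
    REN_STEREO             = 2,     // stereo window
    REN_WINDOW_DECORATIONS = 4      // window has decorations
};
typedef int renOptionsE;            // the "detour": an int carries the ORed options

// Returns true if 'flag' is set in 'options'.
bool hasOption( renOptionsE options, renOptionsE flag )
{
    return ( options & flag ) != 0;
}
```

A call such as hasOption( REN_WITH_WINDOW | REN_STEREO, REN_STEREO ) compiles in C++ because the parameters are plain ints. The price is that the compiler no longer checks that only enum members are passed.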

Commentaries

Generally speaking: as few comments as possible, as many comments as necessary.

On the one hand, comments help to organize your thoughts (and therefore to program better); on the other hand, they help you if you have to change something in the code after a year (or if someone else has to do it) --- don't tell me that you can remember all of it, or that everything is self explanatory! :) .

There are four types of comments:

  1. An overview of the "big picture" at the beginning of a class, the functions of the file (the class), used "compile flags" (conditional compilation via ifdef).
  2. Before each function a description of what it does, its in- and output-parameters, pre- and post-conditions, side-effects, caveats, bugs, etc. (see the assert-macro for pre- and post-conditions)
  3. In the code itself for larger blocks of lines.
  4. For variables (all global or class variables and some local), members of struct-typedefs and the like.

Comments should have a consistent form in the entire module (see the commentary-template). Comments for functions should be made with the block from the template, so that they can be extracted by an automated tool. You can write comments in German or English, whatever you are more comfortable with.

The comment must make the meaning of the parameters clear, what class-variables (or static variables) are used (as few as possible), what is returned, the conditions that must be met by the caller. A description of the function has to be included of course.

If some parameters are return-parameters, then this has to be made clear without ambiguity! (param2 in the example.)

Here is an example for the comments of a function:

/**  Do something (single line)
 *
 * @param param1    blubber (in)
 * @param param2    bla (out)
 *
 * @return
 *   Description of the return values, e.g.
 *   -1 if it failed, 0 if it's ok.
 *
 * Detailed, multi-line description, e.g.
 * This function calculates ...
 * Based on the algorithm of ...
 *
 * @throw Exception
 *   Detailed description of the exceptions and their conditions, e.g.
 *   XOutOfMemory, if no more memory is available.
 *   XCoffee, if the coffee is gone.
 *
 * @warning
 *   Description of input parameter values or conditions,
 *   that cause errors or crashes which are not intercepted.
 *   (Should only occur very rarely!)
 *
 * @pre
 *   Detailed description of the preconditions, e.g.
 *   It is expected that the init() function has been called.
 *   Param1 has to be calculated by the function blub().
 *
 * @sideeffects
 *   Side effects, global variables that are modified, .., e.g.
 *   @arg The global variable @c M_Interest will be modified.
 *
 * @todo
 *   Make faster.
 *
 * @bug
 *   Produces a core dump, if it is @a param1 = 0.0 .
 *
 * @internal
 *
 * @see
 *   Cross-references to other functions, e.g., because they calculate something similar,
 *   or because there is any relationship between the two.  Example:
 *   anotherFunction()
 *
 **/

In my experience, it's fastest if you write the comment to a function when it is "half" done, because then everything is still fresh. Once it is completely finished, you should look over it again, and see if the comment is still correct.

Sometimes it is helpful if you write the comment partially even before you start coding! E.g. a few lines about what the function should do, and a list of parameters can help a lot to order one's thoughts.

He who writes a complete module without any comments --- "I write the comment once everything is finished" ---, will never write the comment at all. (Because it is just too much at once, and because the subtleties like caveats are no longer in memory.)

It is important that you get an overview of the function and its "workings" by skimming through the comments in its body. An in-line comment should only tell what happens in the code block, not how it happens (e.g.: "calculate mean" is better than "sum up and divide by n").

If there are important conditions that must be met, or loop invariants, then it makes sense to record them in a comment so that they are not violated by mistake if you (or maybe someone else!) modify the code later on. One example is at the top, here is another one:

// do the following *after* ... !

Here is an example of bad comments:

a = malloc( 100 * sizeof(int) );    // gimme more memory
x = glob( ... );                    // do file completion
// now sort the elements
qsort( e, n, sizeof(elemT), compfunc );

Do not comment newly found library functions: everyone can look into the man pages themselves.

Comments of variables and types could be formatted like this:

#define MAX_REC_DEPTH 1000                // max depth of a boxtree
static int RecursionDepth = 0;            // used in bxtConstructGraph

typedef struct                            // General comment for the struct
{
    vmmPointP x, y;                       // Comment for the individual members
    int a,                                // Comment ....
        b;                                // .. for the individual members
} MyStructS;

Structure and File-Layout

A template for C and C++ files can be found in the CVS in internal/Templates/class.{cpp,h}. (Replace CLASSNAME with the name of the class, or, better yet, let the editor do it while generating the files.)

The .cpp-files do not contain any CVS-keywords except an id-keyword. This is meant for the product version and will only be expanded for that version (to identify the separate versions that the system is made up of).

That means, that everyone has to put the following lines in their ~/.cvsrc:

status -v
update -P -ko
add -ko
checkout -P -ko
diff -ko -b -B -d
cvs -z 9
edit -a none
tag -c

Thus, unnecessary "diffs", that arise only due to various expansions of the CVS keywords, are avoided. (For the product version, cvs must then be invoked with the option -r.)

(Besides, everyone should have this file ~/.cvsignore in their home.)

Use 4-Space-Indentation!
Lines of code should not be longer than 80 characters. (There are exceptions.)

There should be only one statement per line 2) (there are legitimate exceptions).

2) A psychological evaluation has shown that programmers think in lines, i.e. the smallest unit in which programmers read and understand code is a single line.

Every C-file includes stdlib and stdio (unless this has already been done in a "global" defs.h).

Pay attention to a "nice" formatting of the function prototypes, the declaration block of local variables, etc. Your colleagues will wrinkle their noses if it looks like shit.

Here you can find an annotated example of what source code should look like.

Layout inside a Class

Classes have several sections, analogous to C files, which should be ordered as follows:

  1. public constants, public types
  2. public variables (if permitted)
  3. public functions
  4. protected stuff (same order)
  5. private "parts" :-) (same order)

Header-Files

The structure of header files is similar to that of C-files.

The "content" of header files has to be protected against multiple inclusion with ifndef, as has already been done in template.h. (The pragma-line does the same as the ifndef-bracket and is more efficient at compilation, but it is not available on all platforms.)
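In case template.h is not at hand, such an ifndef-bracket looks roughly like this. The file name Foo.h, the guard name FOO_H and the class Foo are hypothetical:

```cpp
// Foo.h (hypothetical header, for illustration only)
#ifndef FOO_H
#define FOO_H

class Foo
{
public:
    int getAnswer() const
    {
        return 42;
    }
};

#endif  // FOO_H
```

If the header is included a second time in the same translation unit, FOO_H is already defined and the preprocessor skips everything up to the #endif, so the class is not defined twice.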

The name of a header-file is the same as that of the corresponding C-file (i.e. Foo.h and Foo.cpp). One should avoid names that are already assigned in /usr/include to standard header files (such as math.h or gl.h).

Pure C

Only what a user of the lib or the object-file really has to know belongs in a header file -- everything else goes in the corresponding C-files or in "internal" header files which do not get included in the "user-header-file". (This can be non-trivial for big projects! :) ) Normal applications usually do not have extra header files!
This is not so easy/elegant in C++ because unfortunately you also have to declare the private-methods in the header file.

Make C-header-files compatible with C and C++. That means that you have to encase C-header-files with an extern "C" {}:

#ifdef __cplusplus
extern "C" {
#endif

....

#ifdef __cplusplus
}
#endif

So they can be included both in C files and C++ files.

Indentation and Spaces

We use the following convention:

for ( ... )
{
    if ( .. )
    {
        ..
    }
    else
    {
        ...
    }
}

Thus, the relationship between opening and closing brackets can be seen most easily.

The K&R-style is forbidden because it is difficult to read (The goal of this style is to make the code as "tight" as possible):

for ( ... ) {
    if ( .. ) {
        ..
    } else {
        ...
    }
}

It's not a must, but it's nice if variables and comments are tabulated so that they begin in the same column:

int              x, y;             // dominant coord planes of polygon
int              xturns, yturns;   // # turns of xslope, yslope
objPolyhedronP   p1, p2;
int              i;

is much nicer than

int x, y;    // dominant coord planes of polygon
int xturns, yturns;    // # turns of xslope, yslope
objPolyhedronP p1, p2;
int i;

At least the comments should be aligned the same (unless they won't fit in the line otherwise).

Spaces within a line are just as important:

for(i=obj->begin();i<obj->l()&&obj->M(i)!=-1;i++){
     obj->M(i)=-1;
}

ismuchworsetoreadthan

for ( i = obj->begin(); i < obj->l() && obj->M(i) != -1; i++ )
{
    obj->M(i) = -1;
}

Maintainability

Maintenance in general consists of slight modifications to the source. Those who do this maintenance are almost never the ones who originally wrote the code (for various reasons). Even if the person who modifies the code is the original author: if you're not constantly looking at the code, then after a year you're likely to have forgotten the details or even the "big picture".

This applies to modifications of the code by the author himself a few months after the code has been created (best case), as well as modifications 1-2 years later by someone who has no idea of the "big picture" (worst case).

You can't establish many specific rules to make code easily maintainable -- you have to develop a sense of what constructs in the code are difficult to maintain. Though you can heed the following general tips:

  1. If a function can be implemented using only the public interface of a class, then this function should not be a member function! This increases the encapsulation. (See [12])
  2. With comments, you can give notes to others.
    • In the comment before the function: "Assumptions" and "Caution". These notes are filled out while writing the function/method! Otherwise after only one week not even the author knows what has to be considered when calling or even changing the function!
    • In-line-comment in the function's body for conditions, e.g.: /* now nelems = 2 */
      or /* MUST be done before ... because ... */
      So in one year, if you have to change something, you still know that this condition has to be valid in the entire function. It is also for your own control, to see whether you considered that this condition holds for the rest of the function! (Who remembers wp- or Hoare logic?)
    • If a particular section of code is very sensitive to changes you should comment that.
      If making changes in a piece of code make changes in another piece of code necessary you should comment that (in the former!).
  3. Factoring: If you have two functions
    foo( a, b )
    {
        ....
    }
    bar( x, y, z )
    {
        ,,,     // additional code
        ...     // same code as in foo
        ,,,     // additional code
    }
        
    then you have to transform bar into:
    bar( a, b, c )
    {
        ...
        foo( a, b )
        ...
    }
        
    Even if you discover such a possibility for factoring later on!

    Candidates that almost always need factoring are multiple constructors of a class, increment-/decrement-operators, the copy-constructor and the assignment operator, the == operator and !=, etc.

    Another example: This is really bad

    if ( ... )
    {
        blabla
        ...
    }
    else
    {
        same blabla as before
        different code ...
    }
        

    This is very hard to keep an overview of. And if you have to change the code blabla in one year, it is easy to forget one of the two branches! Then you'll be hunting for a bug for hours, and that's if you're lucky and it occurs right away and not after 2 months, when you've forgotten that you changed anything at all...

  4. Avoid code that can be "misunderstood" later on; e.g. assignments in conditions:
    if ( a = b )
        
    this will definitely be "repaired" by someone who is hunting for a bug in the function, so that it will look like this
    if ( a == b )
        

    Or: does this code really mean

    for ( c = s; c < ...; ... )
    c = f(...);
        
    a for-loop without a body (in that case the semicolon was forgotten), or is it just badly indented? But if you write
    for ( c = s; c < ...; ... ) {};
    c = f(...);
        
    or (depending on what was meant)
    for ( c = s; c < ...; ... )
        c = f(...);
        
    then it is clear.

Readability

The primary objective in this context is the good readability of the source code for others! (Especially for me.) He who programmes with the motto, "it was hard to code, so it should be hard to read", is just acting uncooperatively.

To achieve good readability you also need a good structure of the entire file (see Structure and File-Layout), reasonable modularisation, structure in single lines, and comments (see Commentaries).

Good and Bad Programming

Or: "It's not a feature - it's a bug."

All bugs listed here did actually happen! Most of them took hours to locate.

Conclusion: if you think about your code for two minutes after a programming session to see if you have covered all cases, then you can save hours of frustrating bug-searching later on! "What happens if that pointer is NULL?", "What happens if that string is longer than 100 characters after all because e.g. a file-path is very long?", "What happens if this number of ... is 0?", "What happens if the graphical object changes? If it moves? If it changes its form? Or its colour?", "What happens if the object does not reside at the root of the scene graph?".

I even know of cases of extremely bad code (bugs, bad modularity, horrible modifiability, etc.) that took three people one week each(!) to maintain, adjust and debug. The code was (unfortunately) hacked/copied together in just one week. -- If one had invested one more week to implement it correctly, then a total of two man-weeks could have been saved.

C++

Inheritance

Before you declare one class B as a subclass of A, you should ask yourself if they really have the relation "B is an A" or rather the relation

Especially multiple inheritance is often mistakenly used when an object actually contains pointers to several other objects (multiple "uses" relationship)!

Constructors and Destructors

Always write a virtual destructor, even if the class does not need one! (Problem: delete on a pointer to the base class.) The only exceptions are very small classes (in terms of memory) for which the extra memory for the vtable pointer is unacceptable. In such a case it must be noted in the class comment!
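
The "delete on pointer to base class" problem can be made visible with a minimal sketch. The classes Shape and Circle and the global log string g_log are hypothetical and exist only to record which destructors run:

```cpp
#include <string>

// Hypothetical log; records the order of destructor calls.
static std::string g_log;

class Shape
{
public:
    // virtual: deleting through a Shape* also runs the subclass destructor
    virtual ~Shape() { g_log += "~Shape "; }
};

class Circle : public Shape
{
public:
    ~Circle() { g_log += "~Circle "; }
};

// Deletes a derived object through a pointer to the base class.
void destroyThroughBase()
{
    Shape *s = new Circle();
    delete s;           // runs ~Circle first, then ~Shape
}
```

Without the virtual keyword in Shape, the same delete would have undefined behaviour; in practice ~Circle is typically skipped.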

If a class does not have/need a constructor, then declare a private default constructor without an implementation in the C-file (with the comment "not implemented"). This prevents the compiler from creating one that might be faulty. Always declare a copy-constructor and an assignment-operator. If the class does not need them, make them private without an implementation.
The problem with C++ is, that you can't see from the code when the copy-constructor and the assignment-operator are called. See this example.

Constructors cannot return error codes. There are two solutions for that:

  1. Constructors may only contain code that is guaranteed not to fail.
    Always use an init-function instead to do the "real" initialization of an object (which may fail):
            Class *o = new Class();
            if ( o->init() < 0 )
            {
                error ...
            }
            
    Problem: Exceptions can still be thrown (e.g. new can fail).
  2. Use exceptions.

Always initialise all instance-variables in the constructor. Do not use global variables in the constructor.

Do not call virtual methods in a constructor.
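
Why this rule exists can be seen in a small sketch (the classes Base and Derived and the method names are made up for illustration): while the Base constructor runs, the object is still only a Base, so the call does not reach the override in Derived.

```cpp
#include <string>

class Base
{
public:
    // The virtual call below resolves to Base::name(), even when a
    // Derived object is being constructed.
    Base() { m_name = name(); }
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
    std::string nameSeenInCtor() const { return m_name; }

private:
    std::string m_name;
};

class Derived : public Base
{
public:
    virtual std::string name() const { return "Derived"; }
};
```

For a Derived object, nameSeenInCtor() yields "Base" although name() yields "Derived" once construction has finished.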

Use, if possible, initialisation instead of assignment. By using assignments in the constructor many temporary instances might be created - which leads to bad performance. Basetypes (int, float, etc.) can be initialized in the constructor via assignment.
Here is an expensive example with assignments:

class String
{
    public:

    String(void);                           // make 0-length string
    String( const char *s );                // construct from a C string
    String& operator=( const String &s );

    private:

    ...
};

class Name
{
    public:

    Name( const char *t )
    { s = t; }

    private:

    String s;
};

int main( void )
{
    // how expensive is the following ??
    Name neighbor = "Joe";
    return 0;
}

The following happens:

  1. Name::Name is called with parameter "Joe"
  2. neighbor.s is constructed with the default constructor String::String. This creates a 1-Byte long block of memory for the symbol '\0'.
  3. A temporary String "Joe" is created from the parameter t by using the constructor String::String( const char * ) (another malloc).
  4. The assignment is executed (via the =-operator).
    To this end the old String in s gets deleted, a new one is created via new, and then a strcpy is executed.
  5. The temporary String gets deleted (delete/free).
In total: 3 news, 2 strcpys and 2 deletes.

And here the better alternative with initialisation in the constructor. The only difference to the above code is the constructor of Name:

Name::Name( const char *t ) :
   s(t)
{}
  1. Name::Name gets called with the parameter "Joe"
  2. s gets initialized by t via String::String( const char *)
  3. String::String("Joe") does a new and a strcpy.
In total: 1 new, 1 strcpy. (No temporary objects, no delete!)

If you don't make constructors with exactly one parameter (conversion constructors) explicit, then they might be used at some "invisible" places, e.g.:

class A
{
public:
    A(float);
};

void foo(A a) { ..  }

foo( 1.0 );     // the compiler converts 1.0 automatically to an A!

Sometimes this can be desirable, but generally it's hard to find performance problems in such code (which is especially important in computer vision).
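
A minimal sketch of the remedy, using a hypothetical class Meters and function twice(): with explicit, the conversion has to be spelled out at the call site.

```cpp
// Hypothetical wrapper type with an explicit one-parameter constructor.
class Meters
{
public:
    explicit Meters( float v ) : m_value(v) {}
    float value() const { return m_value; }

private:
    float m_value;
};

float twice( Meters m )
{
    return 2.0f * m.value();
}

// twice( 1.5f );            // does not compile: no implicit float -> Meters
// twice( Meters(1.5f) );    // ok: the conversion is visible at the call site
```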

Casts

Use the new casts:

Self defined cast-operators can have very shady effects (like 1-parameter-constructors).
Example:

class A
{
public:
    A() { .. };
    explicit A( char* ) { .. };
    ~A () {};
};

class B
{
public:
    B() { .. };
    ~B () {};
    operator char *() { .. };
};

void foo(void)
{
    B b;
    A a(b);         // this works because b can be converted to char* !
}

Thus: use them sparingly.

Exceptions

Pay attention to "exception-safety": If an exception is thrown the object must still be consistent and may not leak resources (like memory) and the program has to be in a "reasonable" state so that the execution can be continued.
That means that the programmer has to consider for every line of code that an exception can be thrown.

Do not throw exceptions in a destructor!!

Derive all exception-classes from std::exception (#include<stdexcept>). It might make sense to use one of the subclasses of exception and derive from that (logic_error, runtime_error, domain_error, invalid_argument, length_error, out_of_range, bad_cast, bad_typeid, range_error, overflow_error, bad_alloc).

Catch by reference (catch (XClass &x)), never catch by value (catch (XClass x)). (Reason: the exception that comes in might be a subclass, and catching by value would slice it.) If you do not need the exception object, simply write catch (XClass &).
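
The slicing effect can be demonstrated with a small sketch. The exception class XFileMissing is hypothetical, and the code assumes a C++11 compiler (for noexcept):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception class that overrides what().
class XFileMissing : public std::runtime_error
{
public:
    XFileMissing() : std::runtime_error("file missing") {}
    virtual const char *what() const noexcept { return "XFileMissing"; }
};

std::string caughtByReference()
{
    try { throw XFileMissing(); }
    catch ( std::runtime_error &x ) { return x.what(); }   // dynamic type preserved
}

std::string caughtByValue()
{
    try { throw XFileMissing(); }
    catch ( std::runtime_error x ) { return x.what(); }    // sliced copy of the base part
}
```

caughtByReference() sees the override of the subclass, whereas caughtByValue() only sees a copy of the runtime_error part. Many compilers also warn about the second form.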

If possible make the try-block big.

C-style callbacks (e.g. callbacks to C-libraries) should always be declared as "no-throw":

    void myCallback( ) throw ()

Obey the idiom "Resource Acquisition is Initialization". Avoid new in the constructor or nest it in try (because the destructor will not be called if an exception is thrown). Maybe use auto_ptr from the stdlib (it provides a simple mechanism to free memory automatically). Maybe use "strong pointers" (see the Official Resource Management Page).

Always perform unmanaged resource acquisition in the constructor body, never in initializer lists. In other words, either use "resource acquisition is initialization" (thereby avoiding unmanaged resources entirely) or else perform the resource acquisition in the constructor body.
For example, say T was char and t_ was a plain old char* that was new[]'d in the initializer-list; then in the handler there would be no way to delete[] it. The fix would be to instead either wrap the dynamically allocated memory resource (e.g., change char* to string) or new[] it in the constructor body where it can be safely cleaned up using a local try-block or otherwise.
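
As a rough sketch of the idiom, here is a hypothetical wrapper class FileHandle that owns a FILE*. The acquisition happens in the constructor body, the release in the destructor, and copying is disabled in the style recommended earlier (private, not implemented):

```cpp
#include <cstdio>

// Hypothetical RAII wrapper around a C stdio file.
class FileHandle
{
public:
    FileHandle( const char *path, const char *mode )
        : m_file( 0 )
    {
        m_file = std::fopen( path, mode );      // acquire in the constructor body
    }

    ~FileHandle()
    {
        if ( m_file )
            std::fclose( m_file );              // released on every exit path
    }

    bool isOpen() const { return m_file != 0; }
    std::FILE *get() const { return m_file; }

private:
    FileHandle( const FileHandle & );               // not implemented
    FileHandle &operator=( const FileHandle & );    // not implemented

    std::FILE *m_file;
};
```

A function that creates a FileHandle on the stack cannot forget the fclose, no matter whether it leaves via return or via an exception.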

Do not use exceptions if it makes the code more complicated. A normal return-code-scheme is probably better in that case.

Do not declare an exception-specification (throw( X, Y )); document the potential exceptions in the function's comment instead.
Reason:

Also see Exception-specification rationale of the Boost-Library.

Methods, Functions and Operators

If a method is declared as virtual somewhere in the inheritance hierarchy then it should be declared virtual in the entire hierarchy.
This documents more clearly that a method can be overridden, i.e. that the corresponding method in the superclass is actually virtual. While the standard does say "once virtual, always virtual", it is "better to be explicit than implicit".

Inline-methods can be: access- (to instance-variables) and forwarding-methods (that do nothing but call another method). The inline-declaration is not really necessary any more because the compiler does that by itself as long as optimization is enabled (see Optimizations).
Warning: the following functions should never be inline!

A method that should not change the instance by design should be declared const. This prevents code that changes something from being inserted later on by mistake. It is also the only kind of method that can be called for const-instances.

Warning: the default assignment-operator creates only a "shallow copy"!

The assignment-operator shall return void. (Reason: something like if ( a = b ) can never happen then.)

Use operator-overloading seldom and consistently. An operator should always have the same "meaning". (This is also true for function-overloading.) Every user expects that, e.g., the ++-operator "increases" some internal state and that the *-operator is an arithmetic multiplication.
Always implement the semantic opposite of an operator too. If there is the operator == then everyone expects that there is also != and if there is ++ then -- should also be there (sometimes this is not possible, e.g. an iterator over a singly-linked list).
If there is the operator < then there should also be >, <= and >=. If there is +, then -, += and -= should also be there.
These "balanced" operators are good candidates for factoring!
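
A minimal sketch of such factoring, using a hypothetical class Counter: only ==, < and the pre-increment contain "real" code, all other operators are expressed through them.

```cpp
// Hypothetical value class; illustrates factoring of balanced operators.
class Counter
{
public:
    explicit Counter( int v = 0 ) : m_value(v) {}
    int value() const { return m_value; }

    // The only two comparisons with real logic:
    bool operator==( const Counter &o ) const { return m_value == o.m_value; }
    bool operator<( const Counter &o ) const  { return m_value < o.m_value; }

    // Everything else is factored onto == and <
    bool operator!=( const Counter &o ) const { return !(*this == o); }
    bool operator>( const Counter &o ) const  { return o < *this; }
    bool operator<=( const Counter &o ) const { return !(o < *this); }
    bool operator>=( const Counter &o ) const { return !(*this < o); }

    // Pre-increment does the work; post-increment reuses it.
    Counter &operator++()   { ++m_value; return *this; }
    Counter operator++(int) { Counter old(*this); ++(*this); return old; }

private:
    int m_value;
};
```

If the meaning of "equal" or "less" changes later, there is exactly one place to edit for each.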

Never return a pointer to an instance-variable! If it really has to be, then only as a const pointer or reference!

Avoid call-by-value-passing of objects as arguments of a function.

Pointer or Reference?

The Problem: When looking at function-parameters that are declared as references you cannot see at the call site that they are not call-by-value 5)! If you want to use references as formal parameters in functions, then only as const type &parameter!
Reason: If you agree upon the convention that a pointer-parameter can be changed and a reference-parameter cannot, then you can just declare it const so that the compiler also "knows" about this convention (and can therefore optimize better).

5) In my humble opinion, references are not a good "improvement" in C++!

Another reason why references should always be declared as const, which should also convince pragmatists: temporary objects cannot be bound to non-const references. So you cannot write something like this if the formal parameter is a non-const reference:

foo( A() );

Misc

Instance- or class-variables should never be public. Use get- and set-functions instead. (Otherwise "data hiding" is compromised.)
(Exception: const public variables. These can only be set in the initialisation-list of the constructors.)

Offset pointer to members have to be justified very well! They usually are a sign that something is wrong with the design (e.g. wrong identification of which objects are best suited for the problem or wrong distribution of functionality).

Do not use temporary objects in function calls. Unless you really know when they will be deleted (do you know it?)

// Ugly!!
setColor( &(Color(black)) );

// This is nice
Color color(black);
setColor( &color );

Initialising instances via assignment is forbidden!
Instead of

A a = A();      // forbidden

write this

A a;
a = A();        // ok (although unnecessary)

if it has to be.
Reason: the first variant shows different behaviour on different compilers and can lead to the destructor being called once more than the constructor (SGI's compiler-bug).

Floating-Point Arithmetic and Round-Off Errors

All variables in these examples are floats.

wrong:

ca = Dotprod(v1, v2) / (Len(v1) * Len(v2));
sa = sqrtf( 1 - ca*ca );

right:

h = vmmLen(v1) * vmmLen(v2);
if ( h < epsilon )
{
    /* fallback stuff */
}
else
{
    ca = Dotprod(v1, v2) / h;
    if ( ca >= 1.0 )
        ca = 1.0;
    if ( ca <= -1.0 )
        ca = -1.0;
    sa = sqrtf( 1 - ca*ca );
}

totally wrong:

if ( x == a )
    ...

still wrong:

if ( x < a )
    ...
else
if ( x > a )
    ...
else
    ...

better:

if ( x > a-epsilon && x < a+epsilon )
    ...

right:

#include <math.h>
if ( fabs(x - a) <= epsilon * fabs(a) )
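
The relative comparison can be wrapped in a small helper. This is only a sketch: the function name nearlyEqual, the default epsilon, and the fallback to an absolute tolerance for |a| < 1 are assumptions added here (the plain relative test above can never succeed for a == 0 unless x is exactly 0).

```cpp
#include <cmath>

// Relative comparison of two floats.
// Assumption: for |a| < 1 the tolerance is treated as absolute.
bool nearlyEqual( float x, float a, float epsilon = 1e-5f )
{
    float diff  = std::fabs( x - a );
    float scale = std::fabs( a );
    if ( scale < 1.0f )
        scale = 1.0f;
    return diff <= epsilon * scale;
}
```

The right value for epsilon depends on the magnitude of the accumulated round-off error in the concrete computation.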

Hard-Coded Paths

Very lousy:

file = fopen("/igd/a4/home/mies/bla", "r");       //hard-coded file-name!
fscanf(file, ...);                                // can result in core-dump!

just slightly better:

#define BlaFile "/igd/a4/home/mies/bla"
file = fopen(BlaFile, "r");                       // still hard-coded!
if ( ! file )
{
   fprintf(stderr, "couldn't open ...");
   exit(1);                                       // exit is always bad!
                                                  // it should always - as meaningful
                                                  // as possible - continue

a bit better:

file = fopen( getenv("BLAFILE"), "r" );           // can core-dump again!
if ( ! file )
{
   fprintf(stderr, "couldn't open ...");
   ...

the best:

blafileenv = getenv("BLAFILE");
if ( ! blafileenv )
{
    fprintf(stderr, "env.var BLAFILE not set - using default %s\n",
            BLAFILEDEFAULT );
    blafileenv = BLAFILEDEFAULT;
}
file = fopen( blafileenv, "r" );
if ( ! file )
{
    perror("open");
    fprintf(stderr, "couldn't open %s!\n",
            blafileenv );
    ......                                        // set some meaningful
    return;                                       // default-values here
}
fscanf(file, ...);

About system(): As I said earlier, there are always exceptions. The system call is one of them --- here you have to use hard-coded paths! The problem: otherwise you depend on the user's environment (PATH).

Example: you want to start a remote shell via system. Wrong is:

system( "rsh machine ..." );

because the command rsh might not be in the user's PATH, and if it is, then the restricted shell might come first in the PATH, and not the remote shell!

Therefore: always specify the full path of the command when using system (specify with #define at the beginning of the program). It's best to test beforehand with stat if the command really exists. So as an example:

#define RSH_PROG "/usr/bsd/rsh"
...
err = stat( RSH_PROG, &statbuf );          // on some Unixes rsh is not
if ( err )                                 // where you expect it!
    ...
err = system( RSH_PROG " machine ..." );

Arrays

Hard Array-Sizes

Consider: "Hard" array-sizes have created many security-holes in Unix (this is the class of "buffer overflow security leaks")! (e.g. rsh with 100k argument-string, or >1000 telnet connections per sec.)

Array-Indexing

In C the index of an array begins with 0! It never begins with 1 (although it is done like that in Fortran). If you do it anyway you will confuse everyone else and will definitely produce an off-by-one bug directly or indirectly.

Magic numbers

Numeric constants are often called "magic numbers". They make the code unmaintainable and incomprehensible at the very least, and will also cause one bug or another.

Very wrong:

void foo( int bla )
{
    if ( bla == 1 )
        ..
    else
    if ( bla == 2 )
        ..

Problem: You never know who else will call foo()! What happens if you have to change the meaning of bla at some point?

Better:

typedef enum
{
    Case1, Case2, ...
} FooCasesE;

void foo( FooCasesE bla )
{
    ...
}

If you want to OR several cases then you have to use const int (at least in C++).
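
A minimal sketch of such OR-able cases; the constant names and the helper hasOption() are made up for illustration. Each constant occupies its own bit:

```cpp
// Hypothetical option flags, one bit each.
const int OptWireframe = 1 << 0;
const int OptLighting  = 1 << 1;
const int OptTextures  = 1 << 2;

// True if the given flag is set in the OR-ed option word.
bool hasOption( int options, int flag )
{
    return (options & flag) != 0;
}
```

A caller would then pass something like OptWireframe | OptTextures instead of a bare number.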

Macros

Thanks to the automatic inlining of modern compilers, macros have become mostly obsolete. Debugging macros is also very tedious; inlining of functions, on the other hand, can be disabled. Use macros only if it is not possible to do the same with a function.

You have to be careful with macros, both when using them and also when defining them! Because: macros and their parameters can have side effects! You should write macros so that they follow the principle of least surprise. For this reason we have a naming convention (all-caps) for macros that reveals them as such.

Multiple evaluation of arguments: If foo() is a macro then you should never pass arguments that have side effects, e.g.

foo( i++ )

is strictly forbidden!


What if the macro gets expanded to the following, where arg is evaluated twice?!

if ( arg < Max )
    x = arg;

Even worse things can happen in such a case if the argument is a function:

foo( bar(x) )

How often is bar(x) called?! What if the macro foo does also appear in the recursive function bar() itself?!

And to add insult to injury: the program gets horribly slooooowww even if no bug occurs because bar() gets called way too often --- and you can't even detect that any more!!
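
The effect can be reproduced with a deliberately naive macro. CLAMP_TO_MAX, the call counter g_calls and the functions below are hypothetical and serve only to count how often bar() runs:

```cpp
// Deliberately naive macro: ARG appears twice in the expansion.
#define CLAMP_TO_MAX( ARG, MAXVAL )  ( (ARG) < (MAXVAL) ? (ARG) : (MAXVAL) )

static int g_calls = 0;         // counts calls of bar()

int bar( int x )                // stands in for an expensive function
{
    ++g_calls;
    return x;
}

int clampWithMacro( int x, int maxval )
{
    // bar(x) is evaluated in the comparison and again in the result
    return CLAMP_TO_MAX( bar(x), maxval );
}

inline int clampToMax( int arg, int maxval )
{
    // a function evaluates its argument exactly once
    return arg < maxval ? arg : maxval;
}
```

With clampWithMacro( 1, 10 ) the function bar() runs twice; with clampToMax( bar(1), 10 ) it runs once.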

Variables in macros must definitely be chosen so that they can never have the same name as actual variables. Such a bug is also almost impossible to find! (In general, the compiler will not even issue a warning!) Therefore you should always use upper-case for macro-variables; it's best if you use doubled upper-case letters or something similar.

When defining macros you have to take all possible cases and contexts of how the macro can be used into account. Two typical errors are:

You should also try to define macros in such a way that arguments are used just once (which obviously does not always work). E.g. instead of this

#define blub( X )                \
    bla = X;                     \
    blub = malloc( X * ... );
you can write
#define blub( X )                \
    bla = X;                     \
    blub = malloc( bla * ... );

Assumptions

Never rely on functions to behave like you think they do unless it is written explicitly in the man-page! Some examples:

Input-Parameters

A lot of bugs occur because parameters are not checked for validity and plausibility. 7)

7) A quote from the net: An ounce of prevention is worth a ton of code. (Anonymous).

Functions that are not called more than about 100x per frame should always check if their parameters are in a valid range of values! This can easily double the amount of code --- but: it is worth it!

"Can't happen"-Cases

Custom Functions. If you use your own functions there will be lots of cases where certain parameter-combinations could happen according to the code but which you know never to happen because the function is only called with specific parameters.

But trust an experienced (and sorely afflicted) programmer: it will happen! (Assuming your code exceeds a certain "threshold" of about 5.000 lines.)

Therefore: every switch needs a default: and most ifs need an else for the "can't happen"-case! This must at least ensure that the program prints a noticeable error message and continues without a core dump.
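
A minimal sketch of such a default branch; the enum values and the function handleCase() are hypothetical. The function reports the impossible value and returns an error code instead of crashing:

```cpp
#include <cstdio>

enum FooCasesE { Case1, Case2 };

// Returns 0 on success, -1 for a "can't happen" value.
int handleCase( int bla )
{
    switch ( bla )
    {
        case Case1:
            return 0;
        case Case2:
            return 0;
        default:
            // "can't happen" -- but report it and carry on anyway
            std::fprintf( stderr, "handleCase: unexpected case %d\n", bla );
            return -1;
    }
}
```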

System calls. Even system calls (e.g. malloc() or open or fork/sproc) can go wrong! Even when it can't happen. (E.g. it can always happen that the memory or the i-node table is full.)

Therefore an fopen always looks like this:

f = fopen("bla", "r");
if ( ! f )
{
    perror("fopen");
    fprintf(stderr, "module: Failed to open file ...");
    ...      // do something sensible instead
}
and every malloc like this:
m = malloc( n * sizeof(type) );
if ( ! m )
{
    fprintf(stderr, "module: malloc failed!\n");
    ...      // do something sensible instead
}

Of course you could write wrapper-macros for that. My experience, however, is that they often look bad in the code and you only save some typing that a good editor can reduce anyway.

Misc

Never use the same file names multiple times in a software-system! Neither for header-files nor for C-files.

The assert-macro (see man assert) can help to increase the maintainability and also helps to find bugs faster (even if you weren't looking for one at that position).
In addition the assert-macro explicitly highlights conditions in the code, e.g. loop-invariants or pre- and postconditions.
Warning: make sure that this macro is not enabled in the production version! (-DNDEBUG)

Do not use paths when including (e.g. #include<../mydefs.h>)! Use the -I-option of the compiler instead (then you can re-organise the libraries later on much easier without having to change all source files).
Use #include <...> for standard-header-files (usually in /usr/include) and #include "..." for all the others (small speed up when compiling).

The header-file should always be included in the C-file, in which the corresponding functions or variables are actually defined. Then, the compiler can check that the declaration still matches the definition.

Use isascii before you use one of the other ctype.h-macros. E.g.

if ( isascii(*c) && isdigit(*c) )

scanf can stop before it has scanned all parameters. Check the return-value!
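
A short sketch of such a check, shown with sscanf so that it does not depend on standard input. The function name parsePoint is made up; the return value of the scanf family is the number of successfully converted items (or EOF):

```cpp
#include <cstdio>

// Parses "x y z"; returns true only if all three values were converted.
bool parsePoint( const char *text, float *x, float *y, float *z )
{
    int n = std::sscanf( text, "%f %f %f", x, y, z );
    return n == 3;      // fewer conversions (or EOF) means the input was incomplete
}
```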

Use a malloc-wrapper of the Form

#define xmalloc( PTR, SIZE, ACTION )            \
{                                               \
    PTR = malloc( SIZE );                       \
    if ( ! PTR )                                \
        ACTION;                                 \
}

This forces one to actually give some thought to the case that there is no free space available.

If you use the fall-through feature of a case-statement then you have to comment on that! The default-case of a switch must always be present. Avoid embedded statements (e.g. assignments inside expressions); ++ and -- also count.
Although they can make the code more readable sometimes, e.g.

while ( (c = getchar()) != EOF )
{
    /* process the character */
}

Beginner-Bugs

Every beginner makes the following bugs --- so don't worry if they happen to you too :) (they happened to me too):

Minor Issues

If you have to typedef structs in the "front" and "back", you should choose the same name:

typedef struct blubT
{
    ...
} blubT;

Avoid excessive "typedef"-eritis! It makes no sense to introduce a type intReturnType or myFloatT or typedef int bool or uint!

Unary operators are usually written without spaces, binary operators (except "." and "->") have spaces on the left and right. For complex expressions, you have to decide on a case by case basis.

If a for-loop contains long sections, then you should write each of them on their own line, e.g.:

for ( i = 0;
      i < plhGetNFaces(o)*2 + plhGetNPoints(o);
      i += n/2 + (empty ? 1 : 2)
    )

Usage of break and continue in the same loop should be avoided.

Write ANSI-C! (complete prototypes)

RTTI is allowed (it costs no performance any more). But don't use it instead of virtual methods.

Optimizations

Generally speaking, first optimize the algorithm and not the implementation with a few "tricks"! The compiler knows the CPU way better than you do.

Before you "tune" (optimize) the implementation you should ask yourself if the implementation is already sophisticated enough that this makes sense! 9)

9) "Premature Optimization is the Root of All Evil" -- Donald E. Knuth.

If you want to optimize, then only after a profiling! You'll be amazed where time gets lost.

First, let the compiler optimize. This is done with the following compile-/link-options:

  1. cc -n32 -O ...
    
  2. For C++-code Inlining can get you a bit of speed, if you have lots of small get- and set-functions. You do not have to write source in the header-file! This is done with the right compiler options:
    cc -n32 -O -INLINE:=ON
    
    turns on inlining for individual files, i.e. functions are inlined within this file.

    For C-code this only works, if you know that you have small(!) functions that are called a few 1000 times.

    Inlining across multiple files is done like this

    cc -O -IPA:inline=ON
        
    -IPA must be specified on the compile-line and on the link-line.

    If you want to know what's actually going on you do

    cc -O -INLINE:=ON:list=ON
        
    Then you get output on stderr what is being inlined.

    In general, my experience is: the compiler knows very well when it's worth it! If you still want to make sure that a particular function is inlined you do

    cc -O -INLINE:=ON:must=foo,bar
        
    (For C++ you have to specify the "mangled names" of course.)

    For inlining from libraries you can use -IPA if it is a .a-Lib (not .so) and if this library was also generated using -IPA. Otherwise you have to use -INLINE:library=. One should also set -IPA:plimit=192, otherwise the code gets too large (claimed someone in the newsgroup).

  3. The very severe compiler-options are:
    cc -n32 -O3 
    -OPT:alias=typed -OPT:fast_sqrt=ON:fast_exp=ON:IEEE_arithmetic=3 
    -OPT:ptr_opt=ON:Olimit=3000 -OPT:unroll_times_max=6 
    -LNO:opt=1:gather_scatter=2 
    -IPA:alias=ON:addressing=ON:aggr_cprop=ON 
    -IPA:inline=ON -INLINE:must=foo,bar
        

If you want to know more about inlining and other compiler options for optimization, see man 5 ipa or the Insight book "MIPSpro Compiling and Performance Tuning Guide".

Examples of pseudo-optimization

while ( *i++ = *j++ ) ;

is not faster (it is even slower) than

while ( *j )
    *i = *j , i ++ , j ++;

(Even better in this case is strcpy or memcpy :))
Because: if you use side effects (*i++) you take away ways to optimize from the compiler! (E.g. by swapping assembler-lines.) Also strcpy() is coded very carefully in assembler.

With register short i; instead of int i; you just force the compiler to give up its optimized register-allocation in order to follow your register command! (If it doesn't ignore the command altogether.)

Inlining a function only yields benefits if it consists of 1-2 lines! (In all other cases only the code size explodes.)

It is analogous with the factorization of functions: he who puts everything in a single function or converts every function into a macro 10) should quickly look into CPU benchmarks! (You can look up how expensive a function call really is in there.)

10) OK, I admit that we did that in the Y too --- in our defence I have to say that the compilers couldn't optimise as well (no inlining) back then (1994) and that we simply didn't know how fast a function call really is!

One's own pointer arithmetic is usually not worth it:

for ( p = array + n - 1; p >= array; p -- )
{
    p->item = ...
    or
    *p = ...
}
is just as efficient as
for ( i = n-1; i >= 0; i -- )
    array[i] = ...
The second version is faster by a factor of 10 (Because the compiler has more freedom to optimize)!

It can be extremely embarrassing if a computer scientist does not know sixth-form mathematics. It has happened that people have calculated the expression 1 + q + q^2 + ... + q^n in a loop! (Geometric series)
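
For q != 1 the sum has the closed form (q^(n+1) - 1) / (q - 1). A minimal sketch follows; the function name geometricSum and the tolerance used to detect q == 1 are assumptions:

```cpp
#include <cmath>

// Sum 1 + q + q^2 + ... + q^n via the closed form, without a loop.
double geometricSum( double q, int n )
{
    if ( std::fabs( q - 1.0 ) < 1e-12 )
        return n + 1.0;                 // q == 1: n+1 terms, each equal to 1
    return ( std::pow( q, n + 1 ) - 1.0 ) / ( q - 1.0 );
}
```

For q very close to (but not equal to) 1 the closed form suffers from cancellation, so a more careful treatment may be needed there.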

Clumsy Coding

Avoid FPEs (floating-point exceptions). While they are usually ignored they still cost time because the exception is still being generated and handled. FPEs can occur among other things by calculating with uninitialized variables or NaNs.

Clumsy coding can ruin even the most effective algorithm! An example:
An algorithm processes a String of length N and has a complexity of O(N*log(N)). One intermediate step is the concatenation of k substrings of a total length of N. Neat implementation:

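char *substring[k];
char wholestring[N+1];                  // +1 for the terminating '\0'
char *wholeend = wholestring;
char *charptr;
for ( i = 0; i < k; i ++ )
{
    charptr = substring[i];
    while ( (*wholeend++ = *charptr++) )
        ;
    wholeend --;                        // step back onto the '\0' so that it gets overwritten
}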

here the cost is exactly a*N. Less good:

for ( i = 0; i < k; i ++ )
{
    strcpy( wholeend, substring[i] );
    wholeend += strlen( substring[i] );
}

here the cost is exactly a*2N. (Because every substring[i] is passed twice.)


Lousy:

for ( i = 0; i < k; i ++ )
    strcat( wholestring, substring[i] );

here the cost is on the order of a*N^2 (strcat searches for the end of wholestring from the start on every call)!

Another "bad" example:

length = sqrt( pow( point1[0] - point2[0], 2) +
               pow( point1[1] - point2[1], 2) +
               pow( point1[2] - point2[2], 2)   );

If this code is run often then the performance is down the drain! Apart from that, it is just very ugly to compute the square of a number with pow instead of x*x. This also shows that the programmer does not understand the system that his code should become a part of --- for every graphical system provides quite a lot of features for any possible vector-matrix arithmetic.
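
If no library routine is at hand, the same distance can at least be computed with plain multiplications. A minimal sketch; the function name distance3 is made up:

```cpp
#include <cmath>

// Euclidean distance between two 3D points, without pow().
float distance3( const float p1[3], const float p2[3] )
{
    const float dx = p1[0] - p2[0];
    const float dy = p1[1] - p2[1];
    const float dz = p1[2] - p2[2];
    return std::sqrt( dx*dx + dy*dy + dz*dz );
}
```

Where only comparisons of distances are needed, the sqrt can often be dropped as well by comparing squared distances.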

Use alloca(), if you need temporary memory that isn't needed any more after the end of the function. This is faster, the risk of memory leaks is smaller and it avoids memory clutter. Do not use alloca if you might need lots of memory because Unix will kill the program if there isn't enough memory left on the stack.

Nobody still programs a String-Copy, Quicksort, Hashtable, Lists, dynamic Array, etc.! There are good, efficient, established standard-libraries for that! (see RTFM) --- implementing it yourself takes too much time and is never faster than the standard functions because those are well written in Assembler and are tuned for speed.

Object-oriented Design

Some basic rules of object-oriented (oo) design (OOD) are described in this chapter. Most of them are language independent (you can even implement an OOD in Assembler).

Real Optimizations

Align arrays whose size is of the same order as a cache-line on the same memory-boundaries. (E.g.: one cache-line is 64 bytes long (Pentium 4), so you should align arrays of roughly that size also on 64-byte-boundaries.)

Use pre-increment instead of post-increment.
Reason: with post-increment the compiler has to make a copy of the object first (Copy-Ctor!), then call the method of the object and then throw that copy away. It is much harder for the compiler to detect that the first Copy-Ctor can be eliminated.
This is much easier with pre-increment.

General Guidelines

Here are some general guidelines for every module or library:

  1. Simplicity.
    Both the interface and the implementation have to be simple. The simplicity of the interface has priority over the simplicity of the implementation. Nevertheless, a simple implementation is still important, especially for maintainability.
  2. Correctness.
  3. Consistency.
    This criterion is probably the hardest to measure objectively. Correctness is still just as important as simplicity. It might be necessary to sacrifice a bit of simplicity to achieve correctness, but never the other way around!
  4. Completeness.
    The design / the library / the module has to cover all kinds of situations and then some more. It is very hard to anticipate every one of them.
    If simplicity would suffer badly, it is better to do without completeness.

It is also important to find the right relationship ("is-a", "uses", "is-like") between classes. It is wrong to think of inheritance first.

There is an interesting thought about completeness, simplicity and consistency in "Worse is better": sometimes it is better to burden the caller with a bit of the work of preserving consistency or completeness, namely when this is much easier to achieve for the caller than for the implementation in the library.
The only problem is "just" to find the right balance!

Liskov's Substitution Principle

Let B be a subclass of A, and let a piece of code be given that works with objects of class A.
Then the code must still behave exactly the same if one replaces the objects of type A with objects of type B.
This must also hold for all other subclasses B' of A.

The idea is that a user of the subclasses of A can always expect the same behaviour as long as only features of class A are used. Beyond that, the behaviour should at least be similar.
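
A minimal sketch of the principle. Counter plays the role of A, AuditedCounter the role of B, and countTo is client code that knows only A; all three names are invented for this illustration.

```cpp
#include <cassert>

// Class A. Contract: every call of increment() raises value() by exactly 1.
class Counter
{
public:
    virtual ~Counter() {}
    virtual void increment() { ++m_value; }
    int value() const { return m_value; }

private:
    int m_value = 0;
};

// Class B. Adds bookkeeping of its own but keeps A's contract intact.
class AuditedCounter : public Counter
{
public:
    void increment() override
    {
        Counter::increment();
        ++m_calls;
    }
    int calls() const { return m_calls; }

private:
    int m_calls = 0;
};

// Client code written against A only; it relies on A's contract.
inline int countTo(Counter& c, int n)
{
    const int start = c.value();
    for (int i = 0; i < n; ++i)
        c.increment();
    assert(c.value() == start + n);  // must hold for every subclass of Counter
    return c.value();
}
```

A subclass whose increment() added 2, or did nothing at all, would still compile, but it would break countTo and thereby violate the principle.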

Open/Closed Principle

This principle demands that classes are both open and closed in the following sense:

Closed: a class shouldn't be changed any more once it has been judged to be good by reviews, tests and practice. This aims for stability.

Open: however, the class should be designed in such a way that it can be extended in case additional features are needed.
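
One common way to get both properties, sketched under the assumption that extension happens through subclasses of an abstract interface. Filter, applyAll, Gain and Clamp are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Closed: this interface and applyAll() stay untouched once they are reviewed.
class Filter
{
public:
    virtual ~Filter() {}
    virtual int apply(int sample) const = 0;
};

// Run a sample through a chain of filters, in order.
inline int applyAll(const std::vector<const Filter*>& chain, int sample)
{
    for (std::size_t i = 0; i < chain.size(); ++i) {
        assert(chain[i] != nullptr);
        sample = chain[i]->apply(sample);
    }
    return sample;
}

// Open: new features arrive as new subclasses, not as edits to the code above.
class Gain : public Filter
{
public:
    explicit Gain(int factor) : m_factor(factor) {}
    int apply(int sample) const override { return sample * m_factor; }

private:
    int m_factor;
};

class Clamp : public Filter
{
public:
    Clamp(int lo, int hi) : m_lo(lo), m_hi(hi) {}
    int apply(int sample) const override
    {
        return std::min(std::max(sample, m_lo), m_hi);
    }

private:
    int m_lo;
    int m_hi;
};
```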

Class or Algorithm?

Some people go too far with objectification. They make everything an object. Maybe they think that this makes them especially "object-oriented" (and therefore "in").

But as Alexander Stepanov (the mastermind behind the STL) said: "It is nonsense to make everything an object (a class) -- a sorting algorithm is not an object."

It is sometimes better to build the design around global algorithms that are implemented as templates and gain an additional level of abstraction via iterators.
That is exactly the approach the STL uses.
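
A sketch of such a global algorithm in the style of the STL. longestRun is an invented example, not a standard function; it only requires forward iterators and an element type with operator==.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest run of equal consecutive elements in [first, last).
template <class ForwardIt>
std::size_t longestRun(ForwardIt first, ForwardIt last)
{
    std::size_t best = 0;
    while (first != last) {
        // Walk to the end of the run that starts at 'first'.
        ForwardIt runEnd = first;
        std::size_t length = 1;
        for (++runEnd; runEnd != last && *runEnd == *first; ++runEnd)
            ++length;
        if (length > best)
            best = length;
        first = runEnd;
    }
    return best;
}

// The same template works on unrelated containers; no class hierarchy needed.
inline void demoLongestRun()
{
    const std::vector<int> numbers = {1, 1, 2, 2, 2, 3};
    const std::string text = "aabbbbc";
    assert(longestRun(numbers.begin(), numbers.end()) == 3u);
    assert(longestRun(text.begin(), text.end()) == 4u);
}
```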

Robustness

This refers to two things:

  1. Graceful Degradation: The program has to continue to run as well as possible, even under "harsh" conditions. E.g., if a constructor or a malloc() failed because there was not enough memory, or if a file that should be there is missing, or if the program does not get something that it needs, it has to continue without a core dump!
  2. Robustness of the source code against application bugs: if someone uses your functions but didn't read the documentation very well (there is one, isn't there?!), then the functions must not crash or output total nonsense.

Basically, Murphy's law applies: if something can go wrong, it will go wrong, and only after it has been delivered to the customer!

By the way: robustness is closely connected with stability (see bug tracking) and with continuous plausibility checks of the input values.
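
Both aspects can be sketched in a few lines. firstLineOr degrades gracefully when a file is missing, and setVolume checks the plausibility of its input; both functions and their parameters are hypothetical examples.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Graceful degradation: read the first line of a file, and fall back to a
// default value instead of crashing if the file is missing or empty.
inline std::string firstLineOr(const std::string& path,
                               const std::string& fallback)
{
    std::ifstream in(path.c_str());
    std::string line;
    if (!in || !std::getline(in, line) || line.empty())
        return fallback;
    return line;
}

// Plausibility check: reject nonsense input instead of working with it.
// Returns false and leaves 'volume' unchanged if 'percent' is out of range.
inline bool setVolume(int percent, int& volume)
{
    if (percent < 0 || percent > 100)
        return false;
    volume = percent;
    assert(volume >= 0 && volume <= 100);
    return true;
}
```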

Methods of Operation

It is clear that everyone has their own style and methods of working, like everyone has their own style of programming. However, it is imperative that you accustom yourself to a meticulous style of working.

"There is no Later"

Meticulous programming (seemingly) takes more time, but it is more effective in the end.

Every bug will get you sooner or later. (There are even bugs that emerge only after a year!) Most of the time this happens when you have absolutely no time to fix it (because of a demo, a delivery, etc.).

Even worse are bugs and fragile software that frustrate the customer! 1 frustrated customer = 10 lost new customers.

How to steal Code?

A lot of programmers want to reinvent all the wheels. This is called the "not invented here" syndrome, or "not invented by myself" syndrome.

Why is it so wrong to reinvent the wheel?

You must never reimplement a function that is already part of Unix or of a library supplied by the computer manufacturer! This includes:

See also RTFM.

The (so called) reasons for not using foreign code are usually:

Conclusion: you need really good reasons if you want to reinvent the wheel. It is almost always faster, more robust and more future-proof to use or improve an existing wheel.

How do you find code for your problem? First look into the man-pages via man -k keyword ("apropos" button in xman). Then search the online books (insight and infosearch). After that, ask your colleagues. You can often find useful code with a short internet search or an inquiry in the appropriate newsgroup, e.g. comp.*.{source,software}.* (read the FAQ first!).

What kinds of code-theft (this is called code re-use in software engineering) are there?

How to hunt bugs?

Unfortunately there is no strategy that always works and gets you there in the fastest way possible. Here are a few rules of thumb:

Tools for the bug-hunt:

  1. The debugger is still the most important tool (not printf). Therefore: learn to use your debugger efficiently! In my opinion, dbx is still the tool with which you can debug the fastest (in 99% of all cases) 12), even though it doesn't have any fancy icons and you have to remember a few more commands.
    12) Apart from the fact that it is present (in slight variations) on all Unixes.

    You start dbx with the dbx program. The most important commands:

    r options              start the program with options as parameters. Once you have specified the options, you only have to type r.
    t                      show stack trace.
    W                      show location in source (if source is present).
    stop in function       set a breakpoint at the start of function.
    stop at [&]function    set a breakpoint before the first instruction of function. The prologue of the function is still executed in the case of stop. This way you can reliably find out what values the argument registers have.
    stop at number         set a breakpoint at line number in the current file.
    file "name"            switch the current file to name.
    c                      continue.
    p C-expression         show the value of the expression.
    dump                   print the values of all local variables of a function.
    <return>               repeat the last command.
    help [topic]           online help.

    It is best if you copy .dbxinit from my home into your own home.

    You can get online help with help, help most_used, help cplusplus_names.

  2. Code-instrumenter: purify often helps if you suspect a memory-bug. (Low-cost-alternative: electric fence.)

    ctrace is an open-source code-instrumenter which modifies a C-file in such a way that each line is displayed with the modified variables while the program runs. (Does not work for C++, I think. Is there a PD-tool?) Documentation: see man-pages.

  3. Compiler: different compiler-options (SGI).
    • -trapuv ensures that local variables are initialized with a value which guarantees a core-dump, if these variables are used before they are set.
    • -TENV:large_stack sometimes helps if the stack gets corrupted or overflows.
    • -fullwarn recognizes a lot of bad/nasty code that is okay in principle but is usually a bug or bad programming style (e.g. wrong type of parameter).
      Some warnings are unnecessary, but you can disable them globally with -woff.
      If a warning is pointless only in a few places, you shouldn't disable it globally but in a targeted way via #pragma woff.
      By the way, you can avoid the warning "Parameter not used" (or something like that) if you simply omit the name of the parameter in the parameter list of the function definition.
    • -DEBUG:trap_uninitialized:div_check=3:subscript_check -DEBUG:varargs_interface_check:varargs_prototypes -DEBUG:verbose_runtime checks various runtime-errors with additional code.
  4. Run-time Loader (rld): at the start of a program, this ensures that all libraries are linked and all references are resolved.
    Useful debugging flag: setenv _RLD_ARGS "-clearstack". If your program works after this, then you're not initializing some variables!
  5. Libraries (so-called intercept layers): malloc_cv is very good for finding memory corruption. It is quickly linked into the program, no instrumentation is necessary, and it is sufficiently powerful. (See the man-page for more info.)
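
As a sketch of the "Parameter not used" tip from the compiler item above: the callback type EventHandler, the constant NumKnownEvents and the handler onEvent are made up for this example. The second parameter is required by the callback signature but not needed by this handler, so its name is left out (here kept as a comment for the reader).

```cpp
#include <cassert>

// Callback signature imposed by some (hypothetical) framework.
typedef int (*EventHandler)(int eventId, void* userData);

// Hypothetical number of event ids this handler knows about.
const int NumKnownEvents = 10;

// This handler does not need userData, so the parameter stays unnamed in the
// definition; there is no unused name left for the compiler to warn about.
// Returns 1 if the event was handled, 0 otherwise.
inline int onEvent(int eventId, void* /* userData */)
{
    assert(eventId >= 0);
    return eventId < NumKnownEvents ? 1 : 0;
}
```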

Apropos "pondering": the right mixture of intuition, a quick start of the debugger and a few targeted breakpoints is most often the fastest way. It takes a long time to develop this intuition, but it is worth it and is what makes a good programmer. However, you need to have the "big picture" of the software involved for this to work.

RTFM

As a programmer you have to get used to reading the documentation. This can be man-pages, insight-books, white-papers or whatever! You also have to get used to reading them thoroughly but still fast.

Reading man-pages requires a bit of practice, but it's not that hard once you get the idea, and afterwards you can find what you need very quickly. And before you start to curse about the "damn man-pages": wait until you have to write documentation yourself!

Man-pages that you should know as a Unix-user:
ls, cp, ln, tar,
vi (or another editor), find, the man-page of your shell, ed (the part about regular-expressions), grep,

Man-pages that you should know as a C-programmer:
string, bstring,
printf, scanf, atoi, fputs, fgets, putc, putchar, getchar,
math, stdarg, stdio, stdlib, malloc, alloca, memcpy,
open, fopen, read, write, writev, readv,
isalpha, isdigit, isspace,
intro(2), environ(5)

dbx or cvd, cc, ld, nm, make or pmake, rcs or sccs

Man-pages that you have to re-read every now and then, but especially once a new version of your operating system has been released: cc (Beware: there are many! some are out of date!), ld, rld, dbx, ipa(5)

Standard-libs and -functions that you should know as a programmer under Unix (at least you should know that they exist):

As an "advanced programmer" you should know:

Tools

A little word about the tools you will need as a programmer in daily life. You need to master four tools as a programmer: your editor (whichever), the compiler, the debugger and makefiles. (After all, you also expect every good craftsman to know and be proficient with his tools.)

The most important tool of all for a programmer is the editor. Your editor is the tool that should help you program fast and efficiently. In my opinion it should have the following features:

  1. It should have sufficiently powerful features: search-and-replace with regular expressions, over the entire text and also over parts of it only; macros; automatic indenting and outdenting; repeating of commands, and command and search histories; support for tags; it should be possible to reach the declaration of an identifier with a few keystrokes or mouse clicks when the cursor is on that identifier (regardless of whether it is a type, variable or function); calling macros from the editor and support for handling the error list of the compiler; syntax highlighting is nice to have (not only for C); key mapping and abbreviations.
  2. It should be available on every platform. (Remember that you will most likely not work on SGI forever.)
  3. It should also work without graphics, i.e. have a purely text-based mode that you can control. (Sooner or later you will be in the situation that you sit in front of a vt100 emulator and have to edit something remotely...)

In my opinion only vim (the upwards-compatible successor of vi) and emacs fulfil these conditions. Vim also has the advantage that vi is guaranteed to be present on every Unix system. And emacs has the disadvantage that most people can't even use the pure text-mode. 14)

14) Apart from these two advantages/disadvantages, the choice of the "right" editor is nothing but a question of "character": nedit and xemacs are good for people who want to start typing right away and have no problem with holding down the control and/or alt key in order to execute a command. Vim is good for people who want to type as little as possible in order to give a command to the computer but don't have a problem with typing "i" or "o" before they can start typing.

Another important tool is a man-page-reader. I recommend xman or, even better, tkman.

Every now and then you should run purify over your code. This is a tool to find memory-leaks and memory-corruption. (Low-cost-alternative: electric fence.)

References

[1]   Henry Spencer: How to Steal Code, or, Inventing the Wheel only Once. how-to-steal-code.ps

[2]   Ian Darwin: Can't happen, or, Real Programs Dump Core. SoftQuad, Inc., 1984-1985. canthappen.ps

[3]   L. W. Cannon et al. Recommended C Style and Coding Standards. Bell Labs, 1990. cstyle.ps

[4]   Mike Haley: Writing C++ Source Code in the Medical Visualization Group. Fraunhofer Center for Research in Computer Graphics, Inc.

[5]   Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides: Design Patterns. Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading, MA.

[6]   Ellemtel Telecommunications Systems Laboratories: Programming in C++ -- Rules and Recommendations. Älvsjö, Sweden. C++rules.ps

[7]   Richard P. Gabriel: The Rise of "Worse is Better". http://www.kde.org/food/worse_is_better.html, http://opera.cit.gu.edu.au/essays/wib.html

[8]   ?: Programmierrichtlinien ARVIKA, ARVIKA Konsortium.

[9]   tmh@possibility.com: C++ Coding Standard, 1999-05-12, http://www.possibility.com/Tmh/.

[10]   David Williams: C++ portability guide, version 0.7, http://www.mozilla.org/hacking/portable-cpp.html

[11]   Geotechnical Software Services: C++ Programming Style Guidelines, http://www.geosoft.no/style.html

[12]   Scott Meyers: How Non-Member Functions Improve Encapsulation, http://www.cuj.com/archive/1802/feature.html

[13]   Peter Schröder: Some Programming Style Suggestions, http://mrl.nyu.edu/~dzorin/intro-graphics/handouts/style.html



Gabriel Zachmann
Last modified: Tue Feb 11 16:44:06 MET 2014