C++ Question

StGabe · May 13, 2010, 6:57pm

Agreed on always using {}'s. I inadvertently created a lot of bugs before I finally settled on that discipline. I don’t force my engineers to do that but I strongly encourage it.

Rather than asserts: run your debugger! If you just ran a debugger through that code and looked at each value for one run through each loop you’d quickly realize your mistake. I know a programmer that literally runs every single line of code he writes through the debugger just to verify that it’s doing what he thinks it should. That’s a bit extreme for me but I do find that this practice is very useful for finding the sorts of logical errors that, for whatever reason, you’re just not able to see with code inspection.

Jason_McCullough · May 13, 2010, 7:00pm

Programmers exist who don’t run everything through a debugger?!?!?

XPav · May 13, 2010, 7:04pm

I’m pretty good at C++ and the STL. I’ve got mission-critical code in important places, and I’m proud to say that I’ve developed an overall architecture that (while not perfect) is robust and relatively easy to program for, in general.

However, looking back after ten years of programming, I’ve decided that I really don’t like C++ very much anymore. It works, but the moment someone thought that it would be a great idea to do template metaprogramming in C++ was when everyone should have stopped and taken a step back.

Classes and Object Oriented design? Great. Exceptions? Yeah, ok, do it right and it makes code more readable than the nested if-statements from hell. Templates? Err… if you must. Templates of templates? Ummm… Functors and bind1st as you attempt to bodge advanced concepts into what’s still the C language? How about “no”.

The toolchain is just so primitive compared to newer languages, but the C++ language and syntax itself defeats the attempts to make the toolchain better. You read hilarious take-down articles on C++ that are nearly a decade old, and it’s just so sad to realize that nothing has changed in that goddamn language.

C# has gone through how many revisions? Objective-C made a comeback, for Christ’s sake, and C++ still sucks.

Jason_McCullough · May 13, 2010, 7:14pm

That’s because they haven’t changed the language a bit, I’m not sure they really can all that much. All the innovations have been in areas like ScopeGuard, Boost, Loki, and what have you.

I haven’t looked into C++0x much though.

mouselock · May 13, 2010, 8:15pm

The indentation generally lets me know, but then I know that I’m careful to match indentation with the current hierarchical level, and others may not make the same assumption. (I find such things vitally important in things like Fortran code, where you’ll have loops spanning 10 or more screens.)

I’ve had issues with being unable to set proper breakpoints near braceless for loops as well.

This, however, is a very valid point and sufficient to convince me you’re right now, while it’s still early enough to change my habits. It does give you a useful additional location to set a break point (post inner loop, pre-fallout to an enclosing outer loop) that’s simply not there if I don’t include the bracket. Idea sold!

mouselock · May 13, 2010, 8:18pm

That’s all well and good until you care about performance. C# could easily be a better language, but I need stuff that runs at fortran speeds (or as near it as reasonably possible). So if my choice is Fortran, C, or C++; well at that point C++ is gorgeous. I can do so much more with C++ providing easy encapsulation and all sorts of language-standard utilities that are guaranteed to be portable and don’t require me to roll my own everything.

mouselock · May 13, 2010, 8:22pm

I presume the inner loops are what you find opaque? I’d be all over ideas about how to make things less opaque, particularly if they don’t trade off performance.

Jasper_Phillips · May 13, 2010, 9:29pm

The inner loops are somewhat opaque, and I get a vague gut intuition that I’d do it entirely differently, but it’s hard to tell since I’m not quite sure what the code is supposed to do…

What really gets me are all the obtuse variable names, especially for stuff defined elsewhere (e.g. cInd1, cInd2, rInd1, rInd2, eToS, t3_s). Hell, simply having the code in a function/method with a good name would help. What are the different Matrices even for?

Much of it likely stems from my primarily using Python now; I’m not even sure I understand some of the syntax you’re using anymore! I don’t think I’ve even written an old-fashioned C-style for-loop in years…

Anyway, if you’re really interested (and I can totally see why you’d be leery of internet advice!), and can give me a run down on what the heck the code is supposed to do, I’d happily show you my take on it. I work for myself, and it’s nice just to BS about someone else’s code for a change. ;-)

Coca_Cola_Zero · May 13, 2010, 9:34pm

I’m pretty sure what StGabe is saying is that he knows a guy who single-steps through his code in a debugger while watching variable values… for every line of code. Which… is quite a bit more than just running the code in debug mode and also pretty rare, and honestly seems like a waste of time to me if you’re not writing life-or-death code to run a nuclear reactor or a mission critical medical device or something.

Re: warnings in C++

I don’t do much C/C++ coding these days (and like others above, I’m thankful, though I did love coding in them at one time), but it is probably worth looking into some decent static analysis tools. These were getting really good back when I was writing a lot of C/C++ so I’d assume they are even better now.

In the meantime, gcc has a LOT of command line options that’ll make it more verbose with the warnings. If you had passed -Wshadow to it, it would have warned you about this. There’s also (the poorly named) -Wall, -Wextra and -Weffc++ and hundreds of others (many of which are switched on if you use one of all/extra/effc++). Of course, it can be difficult to find a balance between being overwhelmed with trivial warnings and getting useful ones, so you might have to play around with the options to find something that suits your code.
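
For what it's worth, here is a minimal sketch of the kind of shadowing that -Wshadow flags. The function and variable names are made up for illustration and are not taken from the code discussed in this thread; note that -Wall on its own does not turn -Wshadow on.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: the inner declaration of `total` shadows the outer one.
// Compiling with `g++ -Wall -Wextra -Wshadow` reports the inner declaration.
int sum_rows_buggy(const std::vector<std::vector<int> >& rows) {
    int total = 0;
    for (std::size_t r = 0; r < rows.size(); ++r) {
        int total = 0;  // shadows the outer `total`, which is never updated
        for (std::size_t c = 0; c < rows[r].size(); ++c) {
            total += rows[r][c];
        }
    }
    return total;  // returns the outer `total`, which is still 0
}

// Same loop with the stray inner declaration removed.
int sum_rows_fixed(const std::vector<std::vector<int> >& rows) {
    int total = 0;
    for (std::size_t r = 0; r < rows.size(); ++r) {
        for (std::size_t c = 0; c < rows[r].size(); ++c) {
            total += rows[r][c];
        }
    }
    return total;
}
```

Both versions compile cleanly without the extra flag, which is why this sort of bug is easy to stare past in a code inspection.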

Coca_Cola_Zero · May 13, 2010, 9:46pm

Most of the variable names are relatively self-explanatory to someone who has done a lot of linear algebra related code, though as Jonathan mentioned, the pseudo-Hungarian notation kind of makes them look funny, and just because I can deduce what they mean doesn’t exactly mean the names are good… I do prefer longer, more descriptive variable/function names than most other coders tend to use. Intellisense ftw.

The bigger issue for me is the overloaded indexing. Not that I don’t logically understand what is happening there after he informed us of it, but I just dislike it so much that my brain gets angry when I look at it. I often have this reaction to operator overloading.

StGabe · May 13, 2010, 9:54pm

Yup.

And I agree. When I first heard that I thought he was crazy. Yet he’s an extremely talented and successful programmer … which at least makes me think again. Personally it just encouraged me to be a little bit more liberal with debugger use when something seems amiss in my code.

mouselock · May 13, 2010, 10:10pm

The code implements a specific equation in the process of calculating the elastic constants of an arbitrary 3 dimensional cell from the fluctuations in the cell. The cell definition is contained in a 3x3 matrix. I read in a data file containing many lines, where each line holds the necessary information to construct the cell matrix (The cell matrix is upper triangular, so it’s only 6 data points). The average cell matrix is calculated (as you’d expect, more or less), then there’s a matrix equation which defines some necessary quantities for the calculation of the relative fluctuation of the matrices. At that point I calculate the strain on the matrix which is an outer matrix product (and hence promotes me from using a 3x3 matrix - a rank 2 tensor to a 3x3x3x3 4th rank tensor). The code I’ve posted is simply the contraction of that 4th rank tensor (due to symmetry) into a more amenable form. It turns out that due to the symmetry of real world physical systems, the 4th rank tensor can be contracted into a 6x6 matrix, which is naturally composed, due to symmetries, of 4 3x3 matrices (I excised the calculations of cInd1, cInd2, rInd1, and rInd2, but they have symmetric forms which index into the 4th rank tensor as part of this contraction). You can see some of this symmetry in the layout of the variables, especially if you know that the form is:


X(6x6) = |M1(3x3) M2(3x3)|
         |M3(3x3) M4(3x3)|

So the loops I’ve posted are basically constructing the M1, M2, M3 and M4 matrices via the appropriate tensor contraction (which consists of index permutations to generate cInd1, cInd2, rInd1, and rInd2, and in some cases addition operators).
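
To make the shape of that contraction concrete for anyone following along, here is a rough sketch of a plain Voigt-style index mapping from a 3x3x3x3 tensor to a 6x6 matrix. It is not the actual code: the Rank4 type, the pair ordering, and the function name are assumptions, and it leaves out the additions and any dimensional factors mentioned above.

```cpp
#include <vector>

// Hypothetical rank-4 tensor, 3x3x3x3, stored flat in row-major order.
struct Rank4 {
    std::vector<double> data;
    Rank4() : data(81, 0.0) {}
    double& operator()(int i, int j, int k, int l) {
        return data[((i * 3 + j) * 3 + k) * 3 + l];
    }
    double operator()(int i, int j, int k, int l) const {
        return data[((i * 3 + j) * 3 + k) * 3 + l];
    }
};

// Assumed pair ordering: reduced indices 0..2 are the diagonal pairs
// (0,0), (1,1), (2,2); 3..5 are the off-diagonal pairs (1,2), (0,2), (0,1).
static const int kPairI[6] = {0, 1, 2, 1, 0, 0};
static const int kPairJ[6] = {0, 1, 2, 2, 2, 1};

// Fills a 6x6 row-major array with out[a*6 + b] = t(i_a, j_a, i_b, j_b).
// Rows and columns 0..2 form the upper-left 3x3 block, 3..5 the lower-right.
std::vector<double> contract(const Rank4& t) {
    std::vector<double> out(36, 0.0);
    for (int a = 0; a < 6; ++a) {
        for (int b = 0; b < 6; ++b) {
            out[a * 6 + b] = t(kPairI[a], kPairJ[a], kPairI[b], kPairJ[b]);
        }
    }
    return out;
}
```

With this layout the four 3x3 blocks fall out of the row and column ranges, so they do not need to be stored as separate matrices unless that is more convenient elsewhere.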

eToS is the conversion factor from strain (e) to compliance (S) – i.e. the dimensional constants. (Of course in trying to be more succinct I cut out the calculation of eToS where I had variables defined like kdBoltzmannSGI which is a double precision constant that defines Boltzmann’s constant in SGI units, using such obtusely named variables as dVolAvg for the average volume of the system. ;) )

Much of it likely stems from my primarily using Python now; I’m not even sure I understand some of the syntax you’re using anymore! I don’t think I’ve even written an old-fashioned C-style for-loop in years…

Perhaps. I suspect much of it is that I’ve named variables for a very specific set of knowledge which you’d neither guess nor likely be familiar with if you did guess. It comes from converting standard notation equations directly to code. I could use longer descriptive names, but frankly I shudder at the thought of N column wide code like so:


ReducedComplianceMatrix_UpperLeftSubmatrix[CurrentWindowIndex](Row,Column)=ComplianceTensor(TensorIndexI,TensorIndexJ,ReducedRowIndex1,ReducedRowIndex2)+ComplianceTensor(TensorIndexI,TensorIndexJ,ReducedRowIndex2,ReducedRowIndex1);

Is something like that really easier to read? Is there a better intermediate that optimizes readability? It has more characters, but is it magically more meaningful if you don’t know what a ReducedComplianceMatrix is or what a ReducedRowIndex1 is? (In fact the latter seems more confusing to me; how do you define the “row” in a 4 dimensional data structure?)

I’m not trying to be argumentative or throw out tons of minutiae here to push you off; I’m genuinely curious if this information helps, because if it does I’ll happily code in a style that makes things more understandable. I just never really considered the thought that this stuff would be understandable anyway if you weren’t sitting with a paper reference in one hand to compare to, and at that point my variables are named pretty concurrently with the literature source they’re taken from.

mouselock · May 13, 2010, 10:16pm

What’s your preference? Implementing a function like:

Tensor.GetElement(i,j,k,l)?

I get that it’s a bit messy with the whole indexing the vector then indexing the tensor differently, so if that’s your complaint I understand. If not, is the above really easier to read than:

Tensor(i,j,k,l)?

Maybe I should do:

TensorRef = &Tensor[CurrentObject];
TensorRef(i,j,k,l);

Can I use a reference that way to basically get rid of one of the levels of indexing?

(If it’s not obvious, I’m flying without a net here really. I had a discussion with my bosses at work earlier involving calling fortran functions with actual arguments, rather than simply passing everything in a globally included common block, where I was informed it was too confusing trying to get the arguments right in both places when you can just shove everything into a common block and pass it through as a global variable. So, y’know, relatively I think I’m batting 1000 here! ;) )

XPav · May 13, 2010, 10:36pm

Your reference syntax isn’t right, but yes, you can use them that way to avoid repeated code.
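
A minimal sketch of the corrected form, assuming the tensors live in a std::vector and the class overloads operator() for indexing. Tensor4 and set_element are stand-ins, not the actual code from the thread. The change is to declare the reference with its type and initialize it from the object itself rather than from its address.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 3x3x3x3 tensor with overloaded call-style indexing.
class Tensor4 {
public:
    Tensor4() : data_(81, 0.0) {}
    double& operator()(int i, int j, int k, int l) {
        return data_[((i * 3 + j) * 3 + k) * 3 + l];
    }
    double operator()(int i, int j, int k, int l) const {
        return data_[((i * 3 + j) * 3 + k) * 3 + l];
    }
private:
    std::vector<double> data_;
};

// Writes one element of the tensor at position `current` through a reference.
void set_element(std::vector<Tensor4>& tensors, std::size_t current, double value) {
    Tensor4& t = tensors[current];  // bind once; no `&` on the right-hand side
    t(0, 1, 2, 0) = value;          // same as tensors[current](0, 1, 2, 0) = value
}
```

The reference is just another name for tensors[current], so it costs nothing at run time and removes one level of indexing from every line that follows.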

Scientists are the worst when it comes to coding, by the way. They brute force through every problem, end up with a mess of code that works solely because they beat on it for months, then refuse to change anything because it totally breaks if you look at it cross-eyed.

Then they dump the thing onto a grad student or other junior person and the cycle continues, and that’s why we still have FORTRAN.

BaconTastesGood · May 13, 2010, 10:46pm

A lot of my coding style is dictated by debugger usability. For example, in C/C++ I would tend to do this:


if ( x )
   y = z;
else
   y = w;

instead of:


if ( x ) y = z;
else y = w;

For no other reason than I can set a breakpoint in the first example depending on the case.

It’s also the reason I used to avoid the ternary operator:


y = x ? z : w;

My most recent coding style is pretty, uh, controversial with a lot of coders, in that I basically try to make C/C++ feel like a functional language in terms of constness and side-effects. I’ll make local functions just so I can do:


const float x = local::compute(a,b,c,d);

instead of a bunch of code.
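
C++ has no nested functions, so one common way to get this effect is a local struct with a static member function. The sketch below assumes that is what local::compute refers to; the function body and the name blend are invented for illustration.

```cpp
// Hypothetical caller: the helper is visible only inside this function.
float blend(float a, float b, float c, float d) {
    struct local {
        // Pure helper: the result depends only on the arguments,
        // and nothing outside the function is modified.
        static float compute(float p, float q, float r, float s) {
            return (p + q) * (r - s);
        }
    };

    // x is initialized exactly once and cannot be reassigned afterwards.
    const float x = local::compute(a, b, c, d);
    return x;
}
```

A local class cannot see the enclosing function's local variables, so everything the helper needs has to be passed in explicitly, which is part of what keeps it free of side effects.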

I also inline all code that is called only once, which means I have some functions that are 5000 lines long. =)

XPav · May 13, 2010, 10:57pm

Local functions? 5000 line functions?

I can understand the controversy.

Jason_McCullough · May 13, 2010, 11:06pm

Dear god, yes. There’s nothing forcing you to put it all on a single line though; that’d be my recommended next step.

As to why I think it’s clearer - the code when read should match the intended goal of the operation as much as possible. Unless I’m really missing something, your goal isn’t “loop through a pile of things”, which is the semantics of for loops - it’s “multiply and add these matrices”, so I personally find the intent way clearer with the example you gave. By contrast, I have no idea what the other one does on first read.

If you need to do it for performance reasons, or you need to make the bottom layer a for loop or something, sure, but semantically I think it’s confusing.

By the way, have you actually looked at some of the matrix libraries out there?

mouselock · May 13, 2010, 11:06pm

Alright, a quick perusal of wikipedia, the C++ FAQ Lite, and Stroustrup didn’t answer this question, so I’ll ask here:

Is the syntax:


type& Reference1 = R1, Reference2=R2;

or


type &Reference1 = R1, &Reference2=R2;

If I want to define multiple references on a line? I’d specifically like to do:


Mat3& rM1 = M1[iCurrWin], rM2 = M2[iCurrWin];

It seems like the reference should work like pointer declarations (int *a, *b; ) but it doesn’t “read” right (Mat3& rM1 = M1[iCurrWin] reads as “A reference to Mat3, named rM1, is equivalent to M1[iCurrWin]”, Mat3 &rM1 = M1[iCurrWin] seems more like it should be read “the reference, rM1, of type Mat3, is assigned M1[iCurrWin]”. That’s probably me just not really understanding references all that deeply or something.)
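
For reference, the & attaches to each declarator rather than to the type, the same way * does for pointers, so the second form is the one that declares two references. In the first form only the first name is a reference; the second is a plain object initialized by copy. A small sketch, with Mat3 reduced to a placeholder struct and the function names made up:

```cpp
// Placeholder standing in for the real Mat3 class.
struct Mat3 {
    double v;
};

// Returns true if writes through both names reach the original objects.
bool both_are_references() {
    Mat3 a = {1.0}, b = {2.0};
    Mat3 &ra = a, &rb = b;  // both ra and rb are references
    ra.v = 10.0;
    rb.v = 20.0;
    return a.v == 10.0 && b.v == 20.0;
}

// Returns true if the second name turns out to be an independent copy.
bool second_is_a_copy() {
    Mat3 a = {1.0}, b = {2.0};
    Mat3& ra = a, copy = b;  // ra is a reference, copy is a separate Mat3
    ra.v = 10.0;
    copy.v = 20.0;           // modifies the copy only
    return a.v == 10.0 && b.v == 2.0;
}
```

Because the first form compiles without complaint and silently copies, declaring one reference per line is the safer habit.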

Scientists are the worst when it comes to coding, by the way. They brute force through every problem, end up with a mess of code that works solely because they beat on it for months, then refuse to change anything because it totally breaks if you look at it cross-eyed.

Then they dump the thing onto a grad student or other junior person and the cycle continues, and that’s why we still have FORTRAN.

Some of us try to do what we can to make things better, but yes, this is true. The problem is that programming stuff isn’t easy, programming math and the like is harder still, and that’s ultimately not what 99% of the scientists get paid for: The programming is simply a small, sometimes relatively minor step to actually doing the science. But in general as a scientist who programs, I’m with you on the above statement. Science inflicts some horrible code on the world.

Rimbo · May 13, 2010, 11:49pm

I don’t see any problem with inlining code that’s only called once.

Coca_Cola_Zero · May 14, 2010, 2:01am

Said like that, I don’t see a problem with it either and I also don’t like putting things into functions when they aren’t going to be reused, but if it results in 5000 line functions, well, uh, that’s pretty extreme. When a single function starts closing in on 100 lines or so I start to get really bothered by it and I have to start breaking it up a bit.