Louis Kessler’s Behold Blog

Fixing a Bug in Only (!?) Six Hours - Sun, 20 Mar 2011

This final phase of getting Behold ready for release is mostly bug fixes and ensuring everything works. A bit more of this is left, followed by the new help files and then the release.

I’ve been getting pretty good at fixing bugs. Most take only a few minutes to track down, and then only a few more minutes to figure out the fix, make the changes and test. If it takes 15 minutes, I feel really good.

But then an agonizing monster bug comes along. It keeps hiding on you. It won’t let you pin it down, and when you think you’ve figured it out … well, think again.

I just finished one of those today. When Behold crashes while reading a GEDCOM file and building its internal data, it exits the loop and says the file is not a valid GEDCOM file. But here I had a file that used to load and, in my current version of Behold, no longer did. And it had to be the Good-Engle-Hanks GEDCOM, one of the biggest test files I have: a monster at 324,738 KB with 741,968 people in it. (That’s as many people as live in Winnipeg!) It takes Behold about 97 seconds to load this monster of a file, and here I was trying to find out where this bug was, somewhere in those trillions of lines executed.

I have a standard procedure for tracking down this sort of error. I narrow down where it is happening with the Delphi debugger by putting in breakpoints after reading so many GEDCOM records. I stop it after 10,000, 20,000, … records until I find the group of ten thousand it is in. In this case it was between the 110,000th and 120,000th record. Then I start at 110,000 and narrow it down by 1,000s. Then by 100s. Then by 10s. Then line by line until I find it.
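The narrowing procedure can be sketched in a few lines. This is only an illustration in Python, not the Delphi debugger workflow itself: `find_failing_record` and `error_seen_after` are hypothetical names, and `error_seen_after(n)` stands in for "run the load with a breakpoint after n records and check whether the error has already happened". The sketch assumes the error shows up at the same record on every run.

```python
def find_failing_record(error_seen_after, total_records, coarse_step=10_000):
    """Return the number of the first record by which the error has occurred.

    error_seen_after(n) is a hypothetical probe: True if the error has
    already happened once n records have been read. Returns None if the
    whole file loads cleanly.
    """
    if not error_seen_after(total_records):
        return None

    # Invariant: no error after `low` records, error seen by `high` records.
    low, high = 0, total_records
    step = coarse_step
    while True:
        probe = low + step
        while probe < high:
            if error_seen_after(probe):
                high = probe      # error is somewhere in (low, probe]
                break
            low = probe           # still clean, keep walking forward
            probe = low + step
        if step == 1:
            break
        step = max(1, step // 10)  # 10,000s, then 1,000s, 100s, 10s, 1s
    return high


# Pretend the error always strikes at record 119,771 of a 200,000-record file.
print(find_failing_record(lambda n: n >= 119_771, 200_000))  # prints 119771
```

Each probe is one full run under the debugger, which is why the number of trials matters so much on a file this size. The sketch only works if the probe gives the same answer every time it is asked about the same record.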

This process took about 30 trials to find the location. The file is so big, and debug tracing slows the processing down by an order of magnitude, so each attempt to narrow it down took a few minutes. But I’ve got it, or so I thought. I open the GEDCOM in a viewer to look at the corresponding GEDCOM lines, and there doesn’t appear to be anything out of the ordinary. I really didn’t expect there to be, because this file used to work.

Now, to figure out what is wrong with the actual code involved, I have to go back to that point in the program and inspect the values of all the variables. So I put a breakpoint at the 119,771st record again. But when I get there, I find that this time the error has already occurred. Yikes. Believing I made a mistake, I reset the breakpoint and get there again. Yes, the error has already happened.

So now I have to track it down again. A little more carefully this time. I find it’s happening at record 119,686. Now I’ve got it. Back to record 119,686, and while there I find the exact line of code involved. The values all look okay. Not sure what it is. So I’ll trace back there and test some things at that point and … uh-oh. Once again, the same number of records does not take me to the error. Again it has slipped to a different location.

Five hours have passed. But the line itself gives me a clue. It’s a memory issue. That file is so big, it must be causing an overflow. A couple more tests, and I find I can get to the exact same point only IF I don’t do anything else on the computer in the meantime. Then I look at the line again, realizing it’s a memory overflow. And I fix it. I test it, and it is fixed.

In case you are wondering what the fix was, I changed:

TotSizeInput := TotSizeInput + LineStart - RecStart;

to:

TotSizeInput := TotSizeInput + (LineStart - RecStart);

That’s 2 characters of code for 6 hours of work. Fortunately, this type of bug is the exception rather than the rule. It is so frustrating to get one of these, but it is doubly satisfying to solve it.
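Why two parentheses can matter is easier to see with numbers. The post does not give the types or values involved, so what follows is one plausible reading offered as an assumption: TotSizeInput is a 32-bit signed integer, arithmetic is overflow-checked, and LineStart and RecStart are large numbers that are close together (for example, values derived from memory addresses, which would also explain why the failure point moved between runs). The sketch below is a Python model of that situation, not Delphi, and every helper name and value in it is hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


class IntOverflow(ArithmeticError):
    """Stand-in for the error a checked 32-bit signed operation would raise."""


def checked(value):
    # Model of overflow checking: every intermediate result must fit in 32 bits.
    if not INT32_MIN <= value <= INT32_MAX:
        raise IntOverflow(f"{value} does not fit in a 32-bit signed integer")
    return value


def add(a, b):
    return checked(a + b)


def sub(a, b):
    return checked(a - b)


def update_unparenthesized(tot, line_start, rec_start):
    # tot + line_start - rec_start, evaluated left to right:
    # the large sum is formed first and can overflow.
    return sub(add(tot, line_start), rec_start)


def update_parenthesized(tot, line_start, rec_start):
    # tot + (line_start - rec_start): the small difference is formed first.
    return add(tot, sub(line_start, rec_start))


# Hypothetical values: two large, nearby numbers and a growing running total.
rec_start = 2_000_000_000
line_start = rec_start + 42

print(update_unparenthesized(1_000, line_start, rec_start))       # prints 1042
print(update_parenthesized(150_000_000, line_start, rec_start))   # prints 150000042
try:
    update_unparenthesized(150_000_000, line_start, rec_start)
except IntOverflow as err:
    print("overflow:", err)
```

Under these assumptions the unparenthesized form works while the running total is small and fails only once the total has grown enough, which fits a bug that appears only on a very large file. Both forms are mathematically equal; only the size of the intermediate result differs.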
