Anthony Ballo of Musician's Friend, an Ecometry user in Medford, Oregon, noticed that Suprtool appeared to be taking too long to read the last 5% of the CUSTOMERS master dataset. The progress messages, which display every five percent, showed quick progress through most of the dataset, then Suprtool seemed to hang for fifteen minutes after the 95% level. The reason for the delay was that the dataset was less than half full, and Suprtool was scanning a huge empty space looking for the last few records.
Picture the dataset as a huge parking lot, in this case with room for 8 million cars (records):
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|...

(Sounds like the rental lot at the airport...)
There were only 2.9 million cars in Anthony's lot, most of them parked together in a cluster near the beginning. Suprtool has no clue where the cars are going to be located, so it has to look at all 8 million parking spaces for records. It takes a certain amount of time to look in the last 5 million spaces, even if they're empty. (In fact it takes the same time whether they're empty or full.) In this case it was taking 15 minutes.
Usually cars are parked randomly and are approximately evenly spaced throughout the lot. In Anthony's case they were all parked in one area, with the rest of the lot empty. That is why the progress count went up quickly for a time (during the cluster) and seemed stalled the rest of the time (during the empty area). When the cars are randomly spread out the progress counter will go up evenly throughout the whole lot.
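To put rough numbers on this, here is a small Python sketch of a scaled-down lot (8,000 spaces standing in for 8 million). It assumes the progress percentage is based on records found out of the expected entry count, which is what the behavior described above suggests. The function name and the evenly spaced layout are invented for illustration; this is not Suprtool code.

```python
CAPACITY = 8000   # scaled-down stand-in for 8 million parking spaces
ENTRIES = 2900    # scaled-down stand-in for 2.9 million cars

def slots_scanned_at(lot, percent):
    """Return how many spaces a serial scan has visited by the time it has
    seen `percent` percent of the cars in the lot."""
    target = sum(lot) * percent // 100
    seen = 0
    for visited, occupied in enumerate(lot, start=1):
        seen += occupied
        if seen >= target:
            return visited
    return len(lot)

# Anthony's lot: every car parked in one cluster at the front.
clustered = [slot < ENTRIES for slot in range(CAPACITY)]

# A typical lot, approximated here by evenly spaced cars.
spread = [
    (slot + 1) * ENTRIES // CAPACITY != slot * ENTRIES // CAPACITY
    for slot in range(CAPACITY)
]

for name, lot in (("clustered", clustered), ("spread", spread)):
    visited = slots_scanned_at(lot, 95)
    print(f"{name}: 95% of cars seen after {visited} of {CAPACITY} spaces")
```

In the clustered lot the 95% message appears after only about a third of the spaces have been checked, so roughly two thirds of the scan is still to come. In the evenly spread lot the 95% message appears 95% of the way through.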
Why were all the records clustered together? Because the dataset uses an integer key, and the key values are assigned sequentially. If the dataset had used a character key, the records would have been more evenly distributed, though the scan would not have been any faster.
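The difference between the two key types can be sketched in Python. This is a simplified model under stated assumptions: it treats an integer key as being placed at its value modulo the capacity, and it uses MD5 purely as a stand-in for whatever hashing the database applies to character keys. The function names and the sample customer numbers are hypothetical.

```python
import hashlib

CAPACITY = 8000  # scaled-down lot size, for illustration only

def integer_key_slot(key, capacity=CAPACITY):
    """Assumed placement for integer keys: the value modulo the capacity."""
    return key % capacity

def character_key_slot(key, capacity=CAPACITY):
    """Assumed placement for character keys: a hash of the key's bytes.
    MD5 is only a stand-in here, not the database's real algorithm."""
    digest = hashlib.md5(key.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % capacity

# Ten sequentially assigned customer numbers (made-up values).
customer_numbers = range(100000, 100010)

int_slots = [integer_key_slot(n) for n in customer_numbers]
char_slots = [character_key_slot(str(n)) for n in customer_numbers]

print("integer keys  :", int_slots)   # consecutive spaces, one cluster
print("character keys:", char_slots)  # scattered around the lot
```

Under these assumptions, sequential integer keys land in consecutive spaces and form a cluster, while the same values stored as character keys end up scattered. Either way the scan still has to visit every space.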
Question: Why didn't Suprtool stop when it saw the 2.9 millionth car? Answer: To be absolutely sure that Suprtool didn't miss anything. What if someone parks a new car (adds a record to the dataset) just as Suprtool starts running, and it gets located somewhere near the beginning of the lot? Suprtool would count that car in the 2.9 million it is looking for, and would stop before seeing the car that has been parked in location number 8 million all this time. To prevent this possibility, Suprtool always reads master datasets to the end-of-file.
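The scenario can be acted out with a ten-space lot in Python. This is only a sketch of the reasoning: the function names, the lot layout, and the callback that parks a car mid-scan are all invented for illustration.

```python
def make_lot():
    """Ten spaces with three cars; car C sits in the very last space."""
    lot = [None] * 10
    lot[1], lot[2], lot[9] = "A", "B", "C"
    return lot

def park_new_car(slot, lot):
    """Simulate another user adding a record just as the scan starts."""
    if slot == 0:
        lot[4] = "NEW"

def serial_scan(lot, stop_after=None, before_each_slot=None):
    """Visit spaces in order. If stop_after is given, quit once that many
    cars have been seen; otherwise read to the end of the lot."""
    found = []
    for slot in range(len(lot)):
        if before_each_slot is not None:
            before_each_slot(slot, lot)
        if lot[slot] is not None:
            found.append(lot[slot])
            if stop_after is not None and len(found) >= stop_after:
                break
    return found

# The lot held 3 cars when the scan began, so an early-stop scan quits at 3.
early = serial_scan(make_lot(), stop_after=3, before_each_slot=park_new_car)
full = serial_scan(make_lot(), before_each_slot=park_new_car)

print("stop at expected count:", early)
print("read to end-of-file   :", full)
```

The early-stop scan counts the newly parked car as its third and quits, never reaching car C in the last space. The scan that reads to the end sees all four.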
The solution in Anthony's case was to make the dataset smaller, thus removing a lot of the empty entries. This saved a ton of disc space, and let Suprtool run a whole lot faster. (Warning: Anthony should not reduce the size of his dataset too much or it might degrade performance for the online application. Turns out that when there are too few empty slots in the parking lot, cars start colliding -- but that's a topic for another story.)