Venting IV

by zenquaker

This is not really a venting post; it’s more of a worrying post, but it fits in with the previous venting posts, so I’m going to roll with it. As you might have guessed, it’s about the new Sharp Floor system at work. I found a new bug with it today. Actually, I found two new bugs with it today, but one of them is so much more worrisome that I’ve forgotten what the other one is.

We have a problem with the system that we’ve known about for some time. Sharp Floor is a web application. God knows why they chose a web application for a broad and complicated system that is critical to the functioning of our organization, but there it is. As a web application, it creates temporary files. It stores form data and cookies and other things. I don’t totally understand all of the temporary files it stores. While I am a programmer, web programming is not my territory. What I do understand is that these temporary files can build up and clog the system. It just stops working; it obviously fails to do what you have asked it to do.

The contractors have said that this is just not a solvable problem. Web programs create temporary files, and there is no way for the program itself to determine which ones to clear and when. Which brings us back to the question of why a web application was chosen in the first place, but we’re never going to get an answer to that question, so we’ll just move on. The contractors’ workaround for this problem is to have us manually clear our web cache, deleting all of the temporary files. This is a bit of a pain, because it deletes all your temporary files, including your history and cookies. You have to fully retype every web address, because you made your browser forget which pages you’ve been to; and you have to sign back in to all of your websites, because you made your browser get rid of all the cookies that indicated you’d already signed in. Not a huge deal, but as I said, a bit of a pain.

That brings us to today’s fun and games. Yesterday I reviewed some reports. One of them I didn’t think needed any further action, but I was talking to some other people today and they said “Yeah, we’d like to look at that.” I knew the report involved a particular object. So I did a search for the broad category that the object belongs to, plus a keyword I knew was in the report. The report didn’t come up. Three reports that involved that same sort of object came up, but not the one I was looking for. I reformulated the search several different ways. The report still didn’t come up. Eventually I found the report because I knew I had performed a task on it that was tracked by the system, so I went through all of the reports that had that task performed yesterday, one by one. That confirmed that the report hadn’t been deleted, that it was coded under the broad category I was searching on, and that it contained the keyword I was searching for.

I’m sure you can all see where I am going, but I’m going to drag you along because misery loves company. I swear, I was beating my head against this for an hour or more this morning. I went to a coworker (who gets the code name Stiff Man) and asked him to do the search. I intentionally phrased the request vaguely, so as not to bias him toward my failed attempts. He searched on just the broad category, and found the report. So I went back and searched on just the broad category. The report didn’t come up. That’s when I remembered the cache problem. I cleared my cache, redid the search, and the report came up.

This is a monumentally bigger problem than the one we had before. Before, the cache problem caused an obvious failure of the system. This time I got three reports back from my failed search. If I hadn’t been looking for a particular report that didn’t come up, I wouldn’t have realized the results were wrong. While it was an incorrect result, it was a plausible one. A third coworker (code name Soft Prose) and I have tracked down two other problems that are likely caused by the cache messing up searches. A common part of Soft Prose’s job is to assign investigations. But since we can get multiple reports of the same incident, she has to make sure the incident hasn’t already been investigated. So she searches for reports similar to the ones she’s about to assign. Recently, she has started to miss the previous reports in her searches, and has assigned many investigations that were later found to have already been done. This is not a major problem, but it is not common for Soft Prose to mess this sort of thing up. The other possibly related problem is that I am writing a program to search the new data directly. A week ago I was having problems because the data from my direct pull did not match the test set I got from a search using the web application. I redid the search for the test set this week, and the problem went away.

That leaves us in a situation where we can’t trust the searches we’re running with the web application unless we clear the cache before every search we do. Again, it’s not a big deal, but on the other hand it’s not something you commonly do when surfing the web. My problem is that I don’t think we can trust the whole organization to remember to do this before every search. If we can’t, then erroneous data will start creeping into the decision-making process. This decision-making process involves things that can and do kill people. That’s not the sort of decision-making process that I am comfortable allowing erroneous data into.

But what am I going to do? I’m near the low end of the totem pole (or actually the high end; apparently the important guys were put on the bottom of the totem pole). I trust my immediate management to understand the problem and its consequences. But middle management has shown a marked tendency not to be proactive with problems involving the Sharp Floor system. If they don’t push a solution to this problem, I will be faced with either letting it slide or bringing it directly to the attention of upper management. The problem with upper management is that I am almost certain that bringing it to their attention will unleash a meteorological feces event that will be neither fun nor a solution.

I have been worrying about this sort of thing for a while. So far I have been willing to let the problems with Sharp Floor slide. I’m not sure I can let this problem slide, but I’m not sure what to do if I decide not to. I am tempted by the Gordian solution of just leaving and finding another job, but that is not a fun option to contemplate. However, I am comforted by a quote from my favorite author, Lois McMaster Bujold: “… tests are a gift. And great tests are a great gift. To fail the test is a misfortune. But to refuse the test is to refuse the gift, and is something worse, more irrevocable, than misfortune.”