Posted by: Peter Quirk | July 6, 2008

In search of elusive stability

I spent a good portion of this long weekend trying to get to the root cause of instability with realXtend. I have eliminated some problems, but I can’t say that I really understand what is going on yet.

The first problem that I have eliminated is a resource starvation problem that caused the system to fail to upload images after an hour or more of apparently normal operation. I traced this to an internet backup product that I installed a month ago. Apparently it stopped working about 10 days ago when my IT organization pushed out a new rule for the Cisco Security Agent (CSA) installed on our corporate laptops. IT never announces new rules, presumably thinking that it is just like adding virus signatures to an antivirus product. The problem is that the rule silently blocked the backup product’s Windows services from running. This in turn caused some resource leakage, because the hooks that the product installs in various parts of the kernel didn’t deallocate resources until the services completed some action. To compound the problem, a notification from the vendor that the new rule was negatively affecting operation was caught by my junk mail filter and lay dormant until I reviewed my junk mail over the weekend.

Uninstalling the product was another challenge because the CSA rules prevented the uninstall routines from working properly. I had to violate security procedures to shut down the CSA, just so I could complete the uninstall.

The second problem I’ve been investigating is the apparent lack of stability when running the server processes on a separate machine with a SQL Server database. I built a new remote server with the standard SQLite database and found that it exhibited many of the same instabilities, especially with the authentication server failing whenever a user logs off. So for now, SQL Server is exonerated, and remote services are questionable.

Supporting this line of reasoning, I discovered a problem loading a complex mesh. Simple meshes load OK on the remote server, but this complex mesh doesn’t. Needless to say, it loads OK when the server processes are on the same machine as the client. It’s starting to smell like some kind of timeout problem. I carefully executed a sequence of loading the small mesh, followed by the large mesh, in both a remote and a local server configuration, and submitted the log files to the dev team for analysis.

For now I’ll continue to work with the sandbox configuration and hope that someone on the rexdeveloper forum or the realXtend group finds an answer.
