Tuesday, March 27, 2012

Aborting a thread CRASHES Sql Server 2000!

Hi,

I'm creating a new thread and executing a database import operation using a transaction. I have had several problems that I cannot explain, the most serious of them being that aborting this thread sometimes crashes the instance of Sql Server to which I was connected!

The application is a Windows Forms app, and when I launch the import operation I display a form that allows me to abort the import while it is in progress. If the user decides to abort the import in the middle of things, I call the Abort() method of the thread executing the import.

I'm catching any exceptions in the method on the bottom of the call stack (the ThreadStart delegate) and logging the information, and from the stack trace I can see that the ThreadAbortException happened to occur a few levels into the internals of the SqlCommand.ExecuteNonQuery method. This is where it is most likely to occur, because this is where the thread spends most of it's time, as the import consists of executing a bunch of large script files (often many megabytes of script text in a single round trip).

So far so good, but now things get a bit strange. The connection state is Closed when I catch the ThreadAbortException, indicating that ADO.NET code (probably ExecuteNonQuery) caught the excpetion, closed the connection and rethrew it. I would expect in this case that Sql Server would rollback the pending transaction - despite the application not issuing an explicit rollback command, since (local) transactions cannot span across connections anyway.

Instead, this completely CRASHES the Sql Server instance! It's not just a matter of locks aquired during the transaction not being released; it is no longer possible to connect to another *catalog* using Query Analyzer, or to view "current activity" using Enterprise Manager. Nor is it possible to shut down Sql Server using the management console - it just changes status to say that shutdown is in progress (I have the French version, the wording might be slightly different in the English user interface) and then nothing happens. I went to lunch to give Sql Server plenty of time to recover, but nothing changed.

In the end, I had to REBOOT the server in order to bring Sql Server back to life.

I'm using SqlClient with .NET 1.1 and Sql Server 2000 on Windows Server 2003 with all service packs and critical updates.

I should probably mention that I am NOT using SqlTransaction but instead sending "BEGIN TRANSACTION" and the corresponding commit/rollback commands to Sql Server using SqlCommand.ExecuteNonQuery. But please, do not allow this to take the focus away from the question of how to avoid Sql Server crashes, because this is NOT correct behavior on SQL Server's part (or possibly SqlClient) regardless of my application code! There is actually a reason I don't use SqlTransaction: I've no idea why the behavior is not the same, but when I used SqlTransaction and the size of the transaction becomes really large, SqlTransaciton.Rollback() invariably fails with an exception complaining that "the server did not respond". I don't know why it works when I just use text commands instead, but for the moment at least I think it is more important to focus on the crashing of Sql Server.

For now, I've had to change the abort logic so that instead of calling Thread.Abort() I just set a flag, then the thread executing the import will check this flag after each round-trip to the database and throw an exception in the event that abort has been requested. This works, i.e. everything is rolled back and sql server stays alive and all, but it does mean that the user might have to wait quite a while after requesting Abort until the program actually stops executing and rolls back the transaction.

I hope someone can give me some answers with this as I'm beginning to really lose faith in Sql Server as a reliable backend database for anything a bit demanding; completely crashing the server just because I abort a thread seems a tad fragile to be honest.

Here's the stack trace for the ThreadAbortException (it's the French-resourced version of the Framework, but this obviously shouldn't make any difference):

System.Threading.ThreadAbortException: Le thread a t abandonn.
à SNINativeMethodWrapper.SNIPacketGetConnection(IntPtr packet)
à System.Data.SqlClient.TdsParserStateObject.ProcessSniPacket(IntPtr packet, UInt32 error)
à System.Data.SqlClient.TdsParserStateObject.ReadSni(DbAsyncResult asyncResult, TdsParserStateObject stateObj)
à System.Data.SqlClient.TdsParserStateObject.ReadPacket(Int32 bytesExpected)
à System.Data.SqlClient.TdsParserStateObject.ReadBuffer()
à System.Data.SqlClient.TdsParserStateObject.ReadByte()
à System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
à System.Data.SqlClient.SqlCommand.RunExecuteNonQueryTds(String methodName, Boolean async)
à System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(DbAsyncResult result, String methodName, Boolean sendToPipe)
à System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
à Import.Worker.executeSql(String sql, Boolean log)

My program catches the exception, logs it (as you can see) and continues executing (not the aborted thread!) normally, but Sql Server is dead and apparently can be revived only by rebooting the host computer...!

|||

Have you applied the lastes patch for sql server 2000? It might be a bug in the server. The following KB article might be related to your problem.

http://support.microsoft.com/kb/914298/

|||

Moving the thread to the SQL Server Engine.

|||

i saw a similar problem when aborting a backup,

the system particulars are 16P Itanium2, W2K3, SQL2000 sp3 + hotfix 1027, fiber mode.

the problem did not appear w/o fiber mode or on sp4 and later 2000 level builds

some aspects of this problem did not replicate consistently between difference systems,

No comments:

Post a Comment