Timeout exception

Feb 16, 2016 at 1:12 PM
I am trying to store more data at once. During executing "SaveChanges" method I obtain "Timeout expired" exception .
Probably it is exceeded "CommandTimeout" property in one of batch command.
I believed that I can use "Batch Update" feature of VITA to avoid to write stored procedures in the Sql Server.
Stored procedures are not problem, I can write them but I wonder whether exist a way how to prolong command execution time in this case.
I hope that this can help me to solve problem.
Coordinator
Feb 16, 2016 at 6:53 PM
I don't quite understand what stored procs are you talking about. VITA creates CRUD stored procs automatically. Then, with batch execution (default behavior for MS SQL, and available only with CRUD stored procs) - the fwk does the following. For multiple updates found in SaveChanges, it builds one batch text consisting of calls to CRUD stored procs that update individual records.
Command timeout - never happened to me, even for huge batches. In MS SQL there's limitation of 2300 parameters, so a huge batch might be broken into several commands, each with smaller batch, and still these huge batches usually execute within fraction of a second.
We need to find out what's going on. First, the exception thrown should contain info about the command; just catch it, do ex.ToLogString() and printout the result, look at command text there. Let me see the entire exception text, what it says
Feb 23, 2016 at 8:41 AM
I found out that exception was caused by transaction log in the moment when allocates disk space during save changes. Small changes in database configuration resolved issue described above.

Now I have another problem: Batch updates failed with message: "Must declare the scalar variable "@Pid1""..., followed with long list another undeclared variables.. It seems that currently executed batch is still referenced to the variable declared in the previous batch.

Currently I have no idea how to resolve this issue. Any help is appreciated.
Coordinator
Feb 23, 2016 at 5:23 PM
The first thing to try is disable batch execution (in DbSettings at startup) and see if it works. If it works, then let's see what's going wrong. I need some more info.
Can you please provide some more details? pls share the printout of error message, it should contain the batch text and parameter printout.
Mar 1, 2016 at 8:35 AM
I performed tests with both options. Saving changes with disabled batch updates was performed successful.
Saving with enabled batch updates was performed successful only when I used small object, i.e. object that generated small save batch. When I used large object that provided thousands records I got into problem described previous.

Here is link to the Google Drive where I put two files. I hope it is accessible. Let me know when no: https://drive.google.com/open?id=0B2PEwkOHH9M-WF9sRkttb0tVX1E

The "png" file shows part of our database scheme that is related to the problem and second "txt" contains batches, including error messages, that I caught during saving.
There are also recorded entries to the operation log for completeness sake

As you can see, the problematic part of the script is last part when StructureAnnotationTopologicalIndex and StructureAnnotationData are inserted into database. Both are referencing to previous batch where StructureAnnotation records (@Pid1-@Pid12) are inserted into Database.

I think right insert/update order should be: StructureAnnotation, folllowed by StructureAnnotationTopologicalIndex and StructureAnnotationData.

But this only partially solves the issue because we got exception in the middle of script.

After inserting annotation data (StructureAnnotation, TextAnnotation, IsotopicFormulaAnnotation) the save continues with AnnotationHeader table which is connector between Spectrum and one of above mentioned annotation data. Spectrum was saved in the one-two batch before and now there is no reference to the variable that keeps it primary key value. I think this scenario is likely, even if I never got to him.


I have no idea how to solve this situation. Therefore, I appeal to you with the hope that you will find some general solution to this problem. Otherwise, I will save data related to the spectrum using custom (written by me) stored procedures. What is not a problem but I lose a great advantage of VITA Framework's batch updates.


And I have one question: Do you know how to compare database's binary data using LINQ without download them to the client and use SequenceEqual (c#)?
Coordinator
Mar 1, 2016 at 4:47 PM
aaaaah... I see now. It's a fatal combination of identity PKs, foreign keys and super-large batch size. My overlook - generated identities should be carried between batch commands. I think all other problems are related to this. I will look into this, give me a few days. For now, can you live with no-batch mode?
Is it too late to ask to consider Guids for PKs?
Mar 2, 2016 at 6:38 AM
Yes, I can. I will work on communication between client and server (service).

I wondered about the use GUID as the primary key, but I decided to int, because our desktop application is based on SQL Server Express edition, which has a size of data files limited to 10GB. With the arrival of high resolution mass spectrometry is also an increasing amount of data. It is difference store 16 bytes and 4 bytes. I need to compare databases with and without Guid as primary keys, because my concern in the size are possible unfounded. While thus I remain in them, but if that is the only way to solve the problem of saving large batches, I begin to think seriously about them.
Coordinator
Mar 2, 2016 at 6:01 PM
Ok, this big batch trouble with identities should be fixed now, I've pushed fix this morning
Guid PKs. First, size (16 bytes vs 4) is irrelevant. Even if it gives any space savings, it would be in a few percent range. Which means - your customer will hit 10 Gb limit a month earlier with Guids than with ints. Does it make any difference? (With batch sizes in you example, they'll hit it pretty soon I guess) You still have a problem. Maybe time to explore alternatives - like Postgres or MySql? With VITA the switch is realistic (not too easy, but not too much).
Back to Guids vs Ints. I think the biggest advantage of Guids is - no problem with data set merges - like download from server a big pack of relational data. With GUIDs you can just dump records into local db, all PKs/FKs are OK. With Ints and identities PKs in incoming data you'll have to rewire the whole pack (assign local PKs/FKs from local identities). Lot of trouble. My guess is what you're about to do now? I've looked at your site some time ago, remember you offer 'libraries' of known spectrum patterns - which I guess are packs of relational data user can import into local db from remote server - and then you have a problem of merging.
So, if you are that early in the project so switching to GUIDs is doable - my strong advice is to switch. Identities are things of the past, today's best practice is to stay away from them.
Roman
Mar 7, 2016 at 10:24 AM
Great work. Batch updates are Ok. Thank you very much.

But I regularly get timeout exception when trying to save large data set.

I think that simplest workaround should be increase command timeout. I do not know say what value will be sufficient. I would appreciate it if I could set this value before performing data storage. I can have stored this value in configuration file and then I can change it when this problem occurs in user computer.
What do you think? It is possible or you have another idea?

I am looking for cause of this issue but without success yet.

Last days I thought about GUID and their impact for our software. I have also talked with my colleague which is responsible for developing algorithms for processing data.

We decided to stay with integer PK's because introducing GUID came us back in the time and we have very short time to release our software. Maybe in the future we return to this and will again discuss about it.

Of course, you're right in everything you said about them. I agree with you.
Coordinator
Mar 7, 2016 at 4:49 PM
with command timeout - I will see how it can be done, to add ability to alter it. For now, you can limit batch size by limiting number of parameters in the batch, the same way I do in TestIdentityInLargeBatch unit test. By default, limit for MS SQL is 2100; change it to smth like 500. In general, I would suggest to reduce batch sizes rather than increasing timeout; anything requiring more than a few seconds is too much, puts too big load on the server at one moment.
As for Guids - I understand, time constraints. But consider that once the app is out with identities, to change it to Guids you'll face incredibly challenging task of converting customer databases to new schema. Better do it before first release.