Comments and questions

Mar 1, 2015 at 4:59 PM
Hi again.

I've accumulated some comments and questions while reworking my initial implementation of a simple database (SQLite, two tables, one referencing the other) with Vita; it previously used the PetaPoco mini-ORM.
  • Logging
    My first goal was to print to our log file every command sent to the database: not just the normal CRUD operations, but any maintenance too (begin/end transaction, model update, whatever). I spent quite some time finding a solution. Initially I went with overriding SQLiteDbDriver.ExecuteCommand, but it doesn't get called for maintenance operations. Then I tried enabling logging within SQLite itself, but it turned out that this can only be done after the connection has been opened, and Vita provides no hook into that. The information available online suggested adding an Operation Log module, but that adds a log table to the database, which is not my goal. Finally, after studying Vita's code, I found a solution: register my own implementation of IOperationLogService in my app. I hope this is the correct solution? If so, it would be nice to have it mentioned somewhere.
  • Mass inserts
    Inserting lots of data into a table is a typical operation that can easily become a bottleneck. Vita does this inside a transaction, which helps SQLite a lot. However, a new DbCommand is created for each row instead of reusing the same command and just changing the parameter values. I haven't measured yet whether reuse would give a significant speed improvement (my test data is small and runs fast), but the fact that lots of unneeded objects are created and left for the GC to collect makes me uneasy (this matters especially in a mobile environment, where it is always a good idea to avoid creating too much garbage). On the other hand, reusing the same command would conflict with the current solution of using SQLiteDbDriver.CommandExecuted to disconnect each executed command, so that it disposes the underlying statement and doesn't keep the SQLite database file locked.
  • The order in which objects get inserted into a table is not the same as the order of the calls to NewEntity(). This seems to be caused by the reordering of inserts to satisfy references. It's not fatal, of course, from the point of view of database correctness, but preserving it could help debugging a lot. For example, I insert rows ordered by some Name field and then look at the contents of the table to check that all went well; that would be much easier to judge if the rows appeared in the same order. It would be nice if there were a way to make the (I suppose some kind of topological) sort try to preserve the original order in which the entities were created.
  • I'm having a problem with cascade delete that I cannot track down yet. The unit test in Vita works correctly, and even running a delete command from SQLite Explorer on my database works correctly; only the DELETE command sent by Vita from my app doesn't trigger a cascaded delete. For now this is not fatal, because I just delete the dependent objects manually instead, but I hope I'll be able to track it down and solve it one day.
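To illustrate the mass-inserts point outside of Vita, here is a minimal sketch using Python's sqlite3 (not Vita code, just a language-neutral illustration of the same idea): all inserts go into one explicit transaction, and a single parameterized statement is reused for every row instead of building a fresh command each time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(i, f"Book {i}") for i in range(1000)]

# One transaction around all inserts; executemany binds new values to
# the same parameterized statement rather than recreating it per row.
with conn:
    conn.executemany("INSERT INTO book (id, name) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
print(count)  # 1000
```

The transaction is what saves SQLite from syncing the file on every row; the statement reuse is the part that avoids the per-row object churn described above.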
Thanks,
Ivan
Mar 1, 2015 at 5:40 PM
Hi
thank you for the feedback.

Logging
Did you try just setting the App.LogPath property? This dumps everything to a file, as in SetupHelper.cs in the extended unit tests:
    BooksApp.LogPath = LogFilePath;
When you run the unit tests, the log file appears in the bin folder and contains everything.
If a file log is not what you want, you can also do the following: instead of creating your own log service, hook into the existing service's Saving event. See LogFileWriter.cs; that's how logging to a file is implemented.
About documentation - agreed, I will try to improve this and mention it explicitly somewhere.
(to be continued)
Mar 1, 2015 at 7:45 PM
About Mass Inserts.
I'm aware of the problem, but I see it from a slightly different angle. Surprisingly, I went down the same path in the past and tried to keep the DbCommand object and reuse it, for the same reason you mention: to avoid creating and disposing too many objects. But after some thinking, I dropped it. If you look at command execution, followed by materializing the returned entities and all the activity that goes on around it, it's obvious that a lot of objects are created and dropped in the process, so a single DbCommand, even counting its parameters, is almost nothing in the big pile. Reusing the DbCommand also caused other complications; for example, saving the last command in the session for debugging purposes was no longer simple, since the command could be reused by another thread, etc.
As for improving performance of bulk update/insert commands, the main strategy in VITA is batched commands. Here is how it works in MS SQL: when generating sample data, a hundred insert commands (calls to CRUD stored procedures) are put together into one SQL, enclosed in BeginTrans/Commit, and sent to the server in a single call. This brings a 5-fold performance improvement. However, it was designed to work only with stored procedures; if you don't use stored procs, batch processing is automatically off.
The reasoning was:
1) Using stored proc calls allows caching of query plans (they are compiled with the stored procs and reused on each call), while with plain SQLs glued together it seemed it would not work at all: the server would not recognize the SQL shape and reuse the cached plan.
2) The only provider I was playing with at the time that did not have stored procedures was SQL Compact Edition, and SQL CE does not allow multiple SQL statements batched together (unlike SQLite).
So I just added code that automatically turns batch mode off when CRUD stored procedures are not used. It is off for SQLite in VITA.
However, I recently found a few articles about performance in SQLite, and they all point out that one of the most effective techniques is batching multiple statements into one batch enclosed in begin/commit trans. So it looks like I need to enable it; that's the plan. But it is not trivial, due to the original design: CRUD SQLs are stored as plain strings, with parameter references embedded directly in the SQL text, like @P1, @P2, etc. So if you just concatenate two SQLs, you get two instances of @P1 meaning different things. I need to refactor this at some point soon, though not immediately.
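The parameter clash can be sketched in a few lines. This is not VITA's code, just an illustration of one possible fix: rename each statement's parameters with a per-statement suffix before concatenating, so @P1 in the first statement and @P1 in the second no longer collide.

```python
import re

def renumber_params(sql: str, suffix: int) -> str:
    """Append a per-statement suffix to every @Pn parameter name,
    so multiple statements can be concatenated into one batch."""
    return re.sub(r"@P(\d+)", lambda m: f"@P{m.group(1)}_{suffix}", sql)

stmts = [
    "INSERT INTO book (id, name) VALUES (@P1, @P2);",
    "INSERT INTO author (id, name) VALUES (@P1, @P2);",
]
# Batch enclosed in begin/commit, with unique parameter names per statement.
batch = "BEGIN;\n" + "\n".join(
    renumber_params(s, i) for i, s in enumerate(stmts)
) + "\nCOMMIT;"
print(batch)
```

Each original @P1/@P2 becomes @P1_0/@P2_0 in the first statement and @P1_1/@P2_1 in the second, so the combined batch has no ambiguous parameter names.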
But it will be there, no doubt; the gain seems worth the effort.
Roman
Mar 1, 2015 at 8:03 PM
Ordering of inserts - I don't think it's worth fighting for. Even if SQLite happens to return rows in a table in the order of inserts, I think it would all break at bigger volumes, once things start involving different pages or reusing empty space inside pages. The ordering in VITA is done using an SCC algorithm, amazingly short and efficient code; it is probably not a 'stable' sort (one that preserves the original order), but it is not worth messing it up for such a questionable gain.
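For what it's worth, a dependency sort that prefers creation order is possible in principle. Here is a sketch (not VITA's SCC code) of a Kahn-style topological sort that breaks ties by original creation order, so independent entities keep the order they were created in:

```python
from collections import deque

def ordered_topo_sort(items, deps):
    """Kahn's algorithm with a FIFO queue seeded in creation order.
    deps maps an item to the items it references (which must come first).
    Independent items keep their original relative order."""
    indegree = {x: 0 for x in items}
    dependents = {x: [] for x in items}
    for item in items:
        for ref in deps.get(item, ()):
            indegree[item] += 1
            dependents[ref].append(item)
    queue = deque(x for x in items if indegree[x] == 0)  # creation order
    out = []
    while queue:
        x = queue.popleft()
        out.append(x)
        for d in dependents[x]:
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    return out

# 'publisher' was created last but is referenced by both books,
# so it must be inserted first; the books keep their relative order.
items = ["book_a", "book_b", "publisher"]
deps = {"book_a": ["publisher"], "book_b": ["publisher"]}
print(ordered_topo_sort(items, deps))  # ['publisher', 'book_a', 'book_b']
```

Whether this is worth retrofitting into an SCC-based implementation (which also has to handle cycles) is a separate question, as noted above.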
Cascade delete problem - I've noticed that for some servers, including SQLite, you need to give the server some time to digest schema changes, or even significant data updates, just by calling Thread.Sleep(). So if you are running your deletes quickly after doing schema updates, it might be this effect. Just a wild guess.
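One more thing worth checking with SQLite specifically (a general SQLite fact, not something this thread confirms as the cause): SQLite enforces foreign-key actions, including ON DELETE CASCADE, only when PRAGMA foreign_keys = ON has been issued on that connection; it is off by default. A tool like SQLite Explorer may enable it on its own connection while the application's connection does not, which would match the symptom. A minimal sqlite3 illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE
    );
    INSERT INTO parent VALUES (1);
    INSERT INTO child VALUES (10, 1);
""")
conn.execute("DELETE FROM parent WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
print(remaining)  # 0: the cascade fired because the PRAGMA was set
```

Without the PRAGMA line, the same DELETE leaves the child row in place (no error, no cascade), which is easy to mistake for a broken cascade definition.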
Mar 1, 2015 at 8:50 PM
Yes, I forgot to mention the Saving event of the default IOperationLogService; I think it was the first thing I found and tried. I don't remember now what the problem with it was: it either dumped the whole log at once, or did it at a later time (on a separate thread). My goal is to intercept each call to the database at the moment it happens, so it can be properly interleaved with all the other events that occur in the application. The only way to do that was to provide my own IOperationLogService.AddEntry.

Yes, I see how reusing the DbCommand can be a problem. I imagined you had a reason to implement it that way, and I admit I haven't seen any actual problem so far; it's just a feeling, based on previous experience, that it might become one in some situations. But I could be wrong, or at least may not run into such a situation in the near future :)

Yes, if preserving the order of inserts is hard to achieve, it wouldn't be worth the effort. I just thought it might not be too hard. Actually, I take care to create the entities in the right order (first the referenced entity, then the entities that refer to it), so the order could be left unchanged entirely and just verified for correctness.

OK, I'll take your advice about the schema changes into account; maybe there's some way to flush them or wait for them to settle.