soc progress 1

I started on the real work a few days back. So what gives? I have branched darcs and started porting the relevant bits over to hashed-storage. Along the way, hashed-storage has received some improvements. For the most part, these were darcs compatibility improvements (in the darcs hashed pristine code) and improvements in the tree diffing department. The tree diff is now fully symmetrical, which is required for --look-for-adds. Efficiency has suffered a little, but I don’t expect this to show up in profiles.
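To illustrate what “symmetrical” means here, a minimal sketch in Python. This is not the hashed-storage code (which is Haskell); the flat path-to-hash dictionaries and the name diff_trees are assumptions made up for this example. The point is that paths present on only one side are reported for both sides, so swapping the arguments just swaps the two “only” lists. A diff that walked only the pristine side would never see a new file in the working copy, which is exactly what --look-for-adds needs to find.

```python
def diff_trees(left, right):
    """Compare two flat trees given as {path: content_hash} dicts.

    Returns (only_left, only_right, changed), each a sorted list of paths.
    Hypothetical helper for illustration, not a hashed-storage function.
    """
    only_left = sorted(p for p in left if p not in right)
    only_right = sorted(p for p in right if p not in left)
    changed = sorted(p for p in left if p in right and left[p] != right[p])
    return only_left, only_right, changed


# Made-up example trees: the working copy has one edited and one new file.
pristine = {"README": "h1", "src/Main.hs": "h2"}
working = {"README": "h1", "src/Main.hs": "h3", "src/New.hs": "h4"}

only_pristine, only_working, changed = diff_trees(pristine, working)
```

Here only_working holds the candidate for --look-for-adds, and calling diff_trees(working, pristine) gives the same answer with the first two lists swapped.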

In darcs, I have mostly implemented safe index manipulation, i.e. not allowing the index to get out of date with regard to tracked files. (The nature of the index requires that each tracked file is present in the index, so that we don’t need to read the actual working or pristine directory contents.)
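A rough sketch of why that invariant matters, again in Python and under loud assumptions: the entry layout (size, modification time, hash), the use of SHA-256 and the names index_entry and current_hash are all invented for this example and do not describe the real index format. The idea is only that a lookup for a tracked file must always succeed, and that a cheap stat is enough to decide whether the cached hash can be reused without reading the file.

```python
import hashlib
import os
import tempfile


def hash_file(path):
    # Reads the whole file; this is the expensive step the index avoids.
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def index_entry(path):
    # Assumed entry layout: (size, mtime in ns, content hash).
    info = os.stat(path)
    return (info.st_size, info.st_mtime_ns, hash_file(path))


def current_hash(index, path):
    """Return (hash, reread) for a tracked path.

    A missing entry is treated as a broken invariant, not as a cache miss.
    """
    if path not in index:
        raise KeyError("tracked file missing from index: " + path)
    size, mtime_ns, cached = index[path]
    info = os.stat(path)
    if (info.st_size, info.st_mtime_ns) == (size, mtime_ns):
        return cached, False  # metadata unchanged: reuse the cached hash
    index[path] = index_entry(path)  # stale entry: reread and refresh
    return index[path][2], True


with tempfile.TemporaryDirectory() as root:
    tracked = os.path.join(root, "tracked.txt")
    with open(tracked, "wb") as handle:
        handle.write(b"one\n")
    index = {tracked: index_entry(tracked)}
    first = current_hash(index, tracked)   # nothing changed yet
    with open(tracked, "wb") as handle:
        handle.write(b"one\ntwo\n")        # size changes, so stat notices
    second = current_hash(index, tracked)
```

In this sketch the first lookup reuses the cached hash and the second one rereads the file, because its size no longer matches the entry.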

Unfortunately, the index still doesn’t work very well with paths that have spaces in them (which is weird, since the index doesn’t particularly care about what is stored in the path, but I’ll investigate that later). This also means that I can’t test on ghc-hashed, but I can test on ghc-testsuite, which seems to be the more interesting of the two anyway. The numbers (with hot cache in both cases):

darcs wh  0,87s user 0,12s system 94% cpu 1,046 total
darcs-hs wh  0,06s user 0,03s system 84% cpu 0,100 total

That gives about a tenfold speedup for whatsnew on hashed repositories. This also fixes the infamous “timestamps get out of sync all the time” bug, which usually manifests as darcs taking an extraordinarily long time “reading pristine”. Branching the ghc-testsuite repo, I get (in the newly created branch, which has broken timestamps wherever hardlinks work; hot cache again):

darcs wh  5,91s user 0,56s system 91% cpu 7,033 total

To get back to darcs-hs: it seems that, at least on my machine, it manages to pass the darcs testsuite (although it took some tweaking to get there). Nevertheless, I have discovered some further issues that the suite does not cover. Still, at least for now, it should be safe to use darcs-hs, as the code is “read-only”: it is only used for whatsnew, never for creating patches.

Next week, I’ll work some more on getting record to use the new (index-based) diffing code. I have already started, but I’m still failing a bunch of tests and they are not trivial to fix. Also, I should look into getting back the optimised version of the filepath-restricted diff: I had to disable it since it’s not clear how to make it work with pending renames (the original darcs approach doesn’t apply to my version, sadly).

That’s it. I’m attaching a summary of changes in the individual repositories. The first one is hashed-storage (get it from http://repos.mornfall.net/hashed-storage):

The other one is darcs-hs, from http://repos.mornfall.net/darcs/darcs-hs: