It is half-time for me at DebCamp/DebConf11 in Banja Luka, Bosnia-Herzegovina, and that is a good time to write about my work so far. I started with some finger exercises, such as the HTML copying for gnome-terminal (as blogged previously) and cosmetic changes to the buildd.debian.org/status website. After that, I began my main project: A SAT-based solver for the testing transition problem.
In Debian, new software first enters a repository called unstable, where it is tested. Once some requirements are fulfilled (package has been in unstable for usually at least 10 days and has no new release critical bugs), it is entitled to enter the repository called testing, which will eventually form a new stable Debian release. But some packages need to migrate together, usually because either requires the new version of the other. Also, no packages in testing ought to become uninstallable by some seemingly unrelated change.
The software that decides these things is called britney. It is sufficiently good in making sure nothing bad happens, but not smart enough to figure out what to do in case more than two packages need to migrate simultaneously. My goal is to improve this. Now, the the various requirements can all be expressed as predicative formulas, and a lot of research has been going into writing good solvers for such problems, called SAT-solvers. Hence my plan was to only specify what we expect from a testing migration, but leave the search for a solution to such a general purpose and highly optimized program.
My initial progress was good and I had some result after two days of hacking, and the approach is promising. Because the data sets are pretty large (1.5GB of input data, the final SAT problem has 1.8 million clauses with 250.000 variables), so I learned quite a bit about profiling and optimizing Haskell programs, and that parsec is slower than working with BS.lines and BS.split to parse simple input. I also used the FFI to use dpkg code directly to compare Debian version numbers.
At some point I noticed that I actually want to solve PMAX-SAT problems: Given two set of clauses, hard clauses and soft clauses, find a variable assignment that fulfills the hard clauses and as many of the soft clauses as possible. Unfortunately, there are no fast Free PMAX-SAT solvers around. The ones that I found and that were fast enough that I could use them, msuncore by Joao Marques-Silva at the University College Dublin and MiniMaxSat by Federico Heras at the Universitat Politècnica de Catalunya, are only available as statically linked binaries. It is a shame that this is acceptable by the academic community; imagine mathematicians would stop including the proofs in their papers, and only share their theorems.
So if you happen to have written a PMAX-SAT solver and can solve this instance (in weighted DIMACS format) in less than five minutes, and want to brag that the Debian project is using your code, then please release it under a Free license (e.g. GPL or BSD) and tell me about it!
The code of my SAT-based solver is, of course, available, though slightly unpolished. Parts of it (notably the interface to picosat and the PMAX-SAT solvers) might be of general interest and I plan to put them on Hackage as a separate license.
Update: I have contacted the authors of the SAT solvers mentioned above, and they have reconfirmed that they have no intention of releasing the source. Now I put my hope in maxsatz2009 by Chumin LI, which is GPL but was not able to cope with my large instance directly; I think I need to change the memory management.
Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.
(I too am in DebConf)
I could try attacking them with ASP (answer set programming, aka stable models) solvers too. Many of them support weighted rules and minimize clauses.