In Haskell, lists are ubiquitous. In real-world Haskell, monads are ubiquitous. So it should be an easy task to fill a list with values obtained in a monad, e.g. a list of n random numbers (in the Rand monad), or a list of inputs read from the user until they choose to finish.
For now, assume a function getInput :: IO Int; we want to run it 10000 times and collect the returned values in a list. Sounds simple and straightforward, but it is actually surprisingly hard to get the run-time behaviour that we naively and optimistically expect.
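For concreteness (and so that the snippets below compile), here is one possible stand-in for getInput; any effectful action producing an Int would do, and randomRIO is just my choice, assuming the random package:

import System.Random (randomRIO)

-- One possible stand-in for the assumed getInput: any IO action
-- producing an Int would serve just as well.
getInput :: IO Int
getInput = randomRIO (0, 1000000)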
In an imperative setting, we would have a simple counting loop. In the loop, we run getInput, allocate a new cell for the linked list, and append it to the list. Clearly, this runs in linear time and constant extra space (besides the list itself), and the list is built directly in its final form.
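For concreteness, here is a direct transliteration of that imperative algorithm into Haskell, using IORefs as mutable list cells; this is purely my illustration and not part of the original post:

import Data.IORef

-- Mutable list cells: a cell holds a value and a pointer to the next cell.
data Cell = Nil | Cons Int (IORef Cell)

imperativeLoop :: Int -> IO Cell
imperativeLoop n = do
    start <- newIORef Nil              -- pointer to the (future) head
    let loop 0 _       = return ()
        loop k tailPtr = do
            x    <- getInput
            next <- newIORef Nil
            writeIORef tailPtr (Cons x next)   -- append a fresh cell at the end
            loop (k - 1) next                  -- keep a pointer to the new end
    loop n start
    readIORef start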
Can we achieve that with ordinary Haskell lists? Let’s look at a few alternatives (complete code available).
I’ll start with probably the worst attempt. Unfortunately, from what I have seen from grading Haskell exams, this is what beginners would write, especially if they have the above imperative algorithm in mind, and know about the “convert loops to recursion with accumulators” approach to solving Haskell problems:
accumAppend :: Int -> IO [Int]
accumAppend n = go n []
  where
    go 0 l = return l
    go n l = do {x <- getInput; go (n-1) (l++[x])}
To check the space usage of this, I pass -Kn
to the GHC runtime and see what limit is just enough to let this program calculate accumAppend 10000
and fully evaluate the resulting list. It turns out that we need 328336 bytes of stack space.
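As a rough sketch (this is not the author’s actual driver), the measurement can be reproduced with a small main compiled with -rtsopts:

-- Compile with `ghc -rtsopts Main.hs` and run with e.g.
--   ./Main +RTS -K328336 -RTS
-- lowering the -K value until the program only just still succeeds.
main :: IO ()
main = do
    l <- accumAppend 10000
    print (sum l)   -- forces the complete list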
So what about the second requirement, that the list is fully evaluated? Experienced Haskellers see it immediately: the accumulator is never evaluated inside the loop, so each iteration just wraps the previous value in another thunk that will append the next entry at the end. Only when the list is actually used are these thunks evaluated, each one traversing the whole list built so far, causing quadratic cost. We can verify this using ghc-vis:
Clearly (and not surprisingly) this is not the best solution.
Let’s try to fix the second issue by making sure that the list is always fully evaluated. We import Control.DeepSeq and write:
accumAppendSeq :: Int -> IO [Int]
accumAppendSeq n = go n []
  where
    go 0 l = return l
    go n l = do {x <- getInput; go (n-1) $!! (l++[x])}
As the accumulator gets fully evaluated before being passed to the next invocation of go, the result will not contain thunks. But note that this has also fixed the stack consumption: the code now runs with the smallest setting for -K, 8 bytes. This works because both go and deepseq are tail-recursive.
But this is still not the solution we are looking for, as the quadratic cost caused by using ++ is still there.
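To see where the quadratic cost comes from, recall that (++) is essentially defined like this (written out as append here to avoid clashing with the Prelude):

append :: [a] -> [a] -> [a]
append []     ys = ys
append (x:xs) ys = x : append xs ys
-- So l ++ [x] has to walk and rebuild all of l; doing that once per new
-- element makes the loop as a whole quadratic in the length of the list.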
Since we know that l++[x] is expensive, but x:l is cheap, we can simply use the latter to update the accumulator. This has the slightly annoying consequence that the entries of the list end up in reverse order, but we can fix that in the base case:
accumReverse :: Int -> IO [Int]
accumReverse n = go n []
  where
    go 0 l = return (reverse l)
    go n l = do {x <- getInput; go (n-1) (x:l)}
And indeed, we get noticeably faster code that still runs in 8 bytes of stack. Unfortunately, we still don’t construct the final list directly: reverse has to build a second list at the end.
Maybe we can do without an accumulator. The most naive such code would be
listRecurse :: Int -> IO [Int]
listRecurse n = go n
  where
    go 0 = return []
    go n = do {x <- getInput; r <- go (n-1); return (x:r)}
and I’d expect that most experienced Haskellers would write that code (if they are asked not to use a library function). And indeed, this code does create the final list directly. But it requires a large stack of 164616 bytes. The reason is that this is no longer tail-recursive: after the recursive call to go we still need to take its result and prepend x to it.
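To make that pending work explicit, here is the same function written with fmap (my reformulation, not from the post); the (x :) can only be applied after the recursive call has returned, so each iteration keeps a stack frame alive:

listRecurse' :: Int -> IO [Int]
listRecurse' n = go n
  where
    go 0 = return []
    go k = do
        x <- getInput
        fmap (x :) (go (k - 1))   -- work left to do after the recursive call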
Maybe we should simply not worry about the implementation, and use library functions? Clearly, the authors of such libraries have found the best solution. So how does
listReplicateM :: Int -> IO [Int]
listReplicateM n = Control.Monad.replicateM n getInput
fare? Unfortunately, not better than the naive recursion. In fact, the stack space requirement is the same, 164616 bytes. The same holds when using special libraries for generating and consuming lists, e.g. the Streams module from the vector package:
listStreams :: Int -> IO [Int]
listStreams n = MStream.toList $ MStream.replicateM n getInput
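This is not too surprising: replicateM in base is essentially defined as the very same non-tail-recursive recursion as listRecurse (paraphrased here from memory; details vary between base versions):

replicateM' :: Monad m => Int -> m a -> m [a]
replicateM' cnt act = go cnt
  where
    go k | k <= 0    = return []
         | otherwise = do {x <- act; xs <- go (k - 1); return (x:xs)}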
It is time to dig deeper into our box of tricks. What if we had lists with constant-time snoc (i.e. appending a single element at the end)? Then Attempt 3 would not require reversing the list. Such lists are provided by difference lists; there are libraries for that, but we can simply implement them using lambdas:
accumDList :: Int -> IO [Int]
accumDList n = go n id
  where
    go 0 l = return (l [])
    go n l = do {x <- getInput; go (n-1) (l . (x:))}
Indeed, no stack is required (8 bytes are enough). But we are cheating here: we are still not constructing the final list, but rather composing functions that each prepend one element. So when accumDList returns, what we have in memory is a thunk and a chain of partial function applications that get applied starting from the last element, and evaluating that thunk does essentially the same work as the reverse in Attempt 3. It might be a bit faster, but it is still not satisfying enough.
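For completeness, the dlist package wraps up the same trick; a version using it might look like this (a sketch assuming its empty, snoc and toList) and shares the characteristics of the hand-rolled version above:

import qualified Data.DList as DList

accumDList' :: Int -> IO [Int]
accumDList' n = go n DList.empty
  where
    go 0 d = return (DList.toList d)
    go k d = do
        x <- getInput
        go (k - 1) (DList.snoc d x)   -- constant-time append at the end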
Even deeper in the box of tricks, somewhere in the grit and dust where one usually should not reach, there is a function that can help us out here: unsafeInterleaveIO. This allows us to modify the naive recursion in a way that first creates the cons cell, and then continues the loop:
listUnsafeInterleave :: Int -> IO [Int]
listUnsafeInterleave n = do {l <- go n; return $!! l}
  where
    go 0 = return []
    go n = do
        x <- getInput
        r <- unsafeInterleaveIO (go (n-1))
        return (x:r)
The stack consumption is 8 bytes. It is worth noting that the call to deepseq (via $!!) is what actually drives the evaluation of go. It also guarantees that the unsafe effects of unsafeInterleaveIO are confined to the execution of listUnsafeInterleave, i.e. that its use here is actually safe.
I ran criterion over the functions (generating lists of length 1000) and compared their run times.
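For reference, a criterion driver along these lines would do (a sketch, not the post’s actual benchmark code; it assumes the attempt functions above are in scope):

import Criterion.Main (bench, defaultMain, nfIO)

main :: IO ()
main = defaultMain
    [ bench "accumAppend"          $ nfIO (accumAppend 1000)
    , bench "accumAppendSeq"       $ nfIO (accumAppendSeq 1000)
    , bench "accumReverse"         $ nfIO (accumReverse 1000)
    , bench "listRecurse"          $ nfIO (listRecurse 1000)
    , bench "listReplicateM"       $ nfIO (listReplicateM 1000)
    , bench "accumDList"           $ nfIO (accumDList 1000)
    , bench "listUnsafeInterleave" $ nfIO (listUnsafeInterleave 1000)
    ]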
There are many ways to write a list-constructing loop in Haskell, and none are perfect: they either fill the heap, or do not construct the list directly (usually requiring a second pass through it), or rely on IO-specific unsafe functions. The naive recursion performs the best, but may cause a stack overflow (note, though, that from GHC 7.8 the stack will be unlimited). The next best option seems to be the difference list solution.
I’m still wondering: What would be required from Haskell, GHC or the monads in question to have a fully satisfactory solution here, i.e. one that is as fast as the naive recursion, as stack efficient as the difference list solution, and allocates no thunks, only list cells?
Have something to say? You can post a comment by sending an e-mail to me at <mail@joachim-breitner.de>, and I will include it here.