linesIn [Bird-Wadler-1988], the function
> lines :: String -> [String]which decomposes text into lines not containing the newline character
'\n'
is specified as one-sided inverse to the function
> unlines :: [String] -> String > unlines = foldr1 concLinewith the following auxiliary function:
> concLine :: String -> String -> String > concLine xs ys = xs ++ '\n' : ysFrom this specification, a constructive definition of
lines
is then derived.
Most probably since nowadays it is considered bad style
to end a file with anything but a newline character,
the prelude definition of unlines now is the following:
> unlines = concatMap (++ "\n")In GHC and HBC this is implemented in the following way:
unlines [] = [] unlines (l:ls) = l ++ '\n' : unlines lsThis is equivalent to the following version that we shall stick to in the sequel:
> unlines :: [String] -> String > unlines = foldr concLine "" > concLine xs ys = xs ++ '\n' : ysThis reduces the difference between the version of [Bird-Wadler-1988] and that of the prelude to the
foldr1 having been replaced by a foldr,
and thus corresponds to exercise 4.3.3 in
[Bird-Wadler-1988], which we now solve:
We start from the original specification for lines:
< lines (unlines xss) = xss -- (spec)and adhere to the assumption that we can express
lines
via foldr:
> lines :: String -> [String] > lines = foldr f ewith suitably chosen auxiliary values
> e :: [String] > f :: Char -> [String] -> [String]We now just redo the development of [Bird-Wadler-1988], so we first derive the following laws about
lines and
unlines from the definition of foldr:
< unlines [] = [] -- (unlines.1) < unlines (xs : xss) = concLine xs (unlines xss) -- (unlines.2) < lines [] = e -- (lines.1) < lines (x : xs) = f x (lines xs) -- (lines.2)It is straightforward to derive
e:
< e = lines [] -- (lines.1) < = lines (unlines []) -- (unlines.1) < = [] -- (spec)For
f, we initially obtain:
< f x xss = f x (lines (unlines xss)) -- (spec) < = lines (x : unlines xss) -- (lines.2)Performing a case distinction on
x we first get:
< f '\n' xss = lines ('\n' : unlines xss)
< = lines ("" ++ '\n' : unlines xss) -- (++)
< = lines (concLine "" (unlines xss)) -- (concLine)
< = lines (unlines ("" : xss)) -- (unlines.2)
< = "" : xss -- (spec)
For the case that x /= '\n'
we again have to distinguish two cases,
let us first assume that xss is non-empty,
i.e., that xss = ys:yss
for some ys :: String and yss :: [String]:
< f x xss = lines (x : unlines (ys:yss)) < = lines (x : concLine ys (unlines yss)) -- (unlines.2) < = lines (x : (ys ++ '\n' : unlines yss)) -- (concLine) < = lines ((x:ys) ++ '\n' : unlines yss) -- (++) < = lines (concLine (x:ys) ++ '\n' : unlines yss) -- (concLine) < = lines (unlines ((x:ys):yss)) -- (unlines.2) < = (x:ys) : yss -- (spec)In the case that
xss is the empty list
we are in a dilemma since
non-empty strings not containing any newline characters
are outside the range of unlines.
We therefore have to decide on a reasonable totalisation,
and we extend the specification with:
< lines xs = [xs] , if not (null xs) && '\n' `notElem` xs -- (spec2)This allows us to finish the calculation:
< f x xss = lines (x : unlines []) < = lines (x : []) -- (unlines.1) < = lines [x] < = [[x]] -- (spec2)Altogether we have:
> lines = foldr f [] > where > f '\n' xss = "" : xss > f x [] = [[x]] > f x (ys:yss) = (x:ys) : yssComparing this with the equivalent prelude definition:
> lines :: String -> [String] > lines "" = [] > lines s = let (l, s') = break (== '\n') s > in l : case s' of > [] -> [] > (_:s'') -> lines s''we note that our newly derived definition for
lines
is not obviously more cryptic than the prelude definition;
it even turns out to be faster.
However, as pointed out by Koen Claessen, it is also less lazy, in that only complete lines are passed on.
Here are
interact (unlines . lines) on a 1.4Mb source file,
with compiled Haskell, and
lines "Science\nis true.\nDon't be misled\nby facts."
with Haskell interpreted by Hugs,
and a term graph translation of a
joint Haskell script animated by HOPS
There are also HOPS reduction sequences in PostScript for SpanLines2:
and for
NewLines:
The animation PostScript files are not designed for printing!
Holding down the space key in GhostView might give you a usable animation.
Experiments have been conducted with the
following variants of lines (those in italics are lazy within lines):
break (== '\n') replaced by span (/= '\n')
span modified to use an irrefutable pattern
span (/= '\n') replaced by span ('\n' /=)
span (/= '\n') partially evaluated
foldr expanded
f expanded
f expanded
f expanded
ghc-4.06 -O | hbc-0.9999.5b -O | HUGS | HOPS
real user | real user | red. cells | red. .ps .ps.gz .pdf
PrelLines: 1.813s 1.710s | 2.630s 2.430s | 1025 1579 | 479
SpanLines: 1.837s 1.700s | 2.610s 2.360s | 1105 1701 |
SpanLines2: 1.827s 1.720s | 2.435s 2.180s | 1105 1701 | 307 7882kB 641kB 4035kB
SpanLines3: 1.853s 1.660s | 2.659s 2.390s | 1063 1655 |
SpanLines4: 1.763s 1.650s | 2.057s 1.920s | 1101 1727 |
NewLines2a: 1.817s 1.660s | 2.229s 2.060s | 956 1543
NewLines2b: 1.745s 1.580s | 2.166s 1.990s | 917 1309
NewLines3a: 1.784s 1.640s | 2.303s 2.050s | 956 1582
NewLines3b: 1.700s 1.580s | 2.024s 1.850s | 959 1393
NewLines3c: 1.755s 1.620s | 2.176s 2.020s | 917 1348
And second the variants that are not lazy inside lines:
NewLines: 1.328s 1.210s | 1.440s 1.280s | 839 1231 | 257 5316kB 328kB 2912kB NewLines2: 1.213s 1.070s | 1.373s 1.230s | 839 1231 | 255 NewLines3: 1.222s 1.080s | 1.290s 1.130s | 878 1312 AccumLines: 1.205s 1.060s | 1.275s 1.120s | 880 1316 AccumLines2: 1.196s 1.060s | 1.520s 1.360s | 841 1234
words, too,
can be improved over the prelude version:
ghc-4.06 -O | hbc-0.9999.5b -O
real user | real user
PrelWords: 1.681s 1.550s | 2.379s 2.120s
NewWords: 1.162s 1.110s | 1.430s 1.290s
NewWords2: 1.216s 1.100s | 1.418s 1.260s
Here (in NewWords)
we have used the following directly programmed version:
> words' :: String -> [String] > words' [] = [] > words' (x:xs) | Char.isSpace x = words1 xs > | otherwise = case words' xs of > [] -> [[x]] > (ys:yss) -> (x:ys):yss > where > words1 [] = [] > words1 xs@(y:ys) | Char.isSpace y = words1 ys > | otherwise = [] : words' xs
spanLines; new timings
SpanLines2, SpanLines3, SpanLines4, HUGS, and HOPS
NewLines3, thanks to Ronny Wichers Schreur
AccumLines from Simon Marlow
AccumLines2, NewLines2a
NewLines2b, NewLines3b
NewLines3a, NewLines3c;
all lazy lines retimed