|Notify me of new responses|
I was doing a little operations research work with Dick Larson one semester. We had to sort 300,000 phone numbers which I figured would take a long time on my beloved Symbolics 3600. I said to myself "Why not use the 50 MIPS, 64-Mbyte RAM HP RISC machine on your desk for something other than reading mail?". I wrote a little program in MIT Scheme; it went off into never never land. I pulled out Rivest's book on Algorithms and wrote myself a radix sort. The program still just thrashed. I talked to all the Scheme wizards around me who gave me various incantations for running MIT Scheme with more heap. No answer.
Finally, Gerry Sussman said "Of course, you can't expect Lisp to do something like that; Lisp can't do things like this. If you want to deal with massive data sets, you have to use C. It is sad but true." I said, "Gerry, I think this would run on an old 1 MIPS 3600 with 8 Mbytes of RAM. I'd just rewrite this in Common Lisp. The code would be cleaner because of the generic functions for sequences and I'll throw out my private sort function and just use the one built into Common Lisp." Sussman said "If that is true then the whole Scheme project has been a waste and I'll shut it down."
"Don't say that, Gerry!" I admonished. "I've written a lot of Lisp Machine programs over the years and I'm pretty sure that it can do this." Sussman was insistent. If that creaking old LispM could sort the phone numbers, MIT Scheme was history.
It took me 20 minutes to convert the code, which ended up being about one third as many lines. It took 45 minutes to run on the 3600 whose paging bar was on continuously (reading a 30 Mbyte file into VM) but it terminated with the answer I needed.
Gerry was despondent until Bill Rozas, his grad student and one of the scheme implementors, showed up. "Of course MIT Scheme couldn't run it. That's because you were using READ to read numbers. You can't do that for 300,000 numbers. You have to write your own READ-NUM that is specialized just to read numbers. Then it will run fine." Sussman breathed a big sigh of relief. I switched to Macintosh Common Lisp.
-- Philip Greenspun, January 4, 1998
I wrote the Lisp Machine version of READ that Phil was using here during my first summer working as a UROP for the Lisp Machine project. Ironicly, at the time I was given a hard time because my new reader was <em>slower</em> than the less-featureful reader we had been using up until that time.
-- Alan Bawden, January 4, 1998
I can't believe that you'd even think of using C. Common Lisp or Scheme are so much superior to C/C++. Gifted Students use Scheme and MACSYMA was written in Lisp not C/C++ because Lisp is a better language.
-- Michael Israel, Lisp lover --, January 4, 1998
That damn reader is going to haunt me forever.
To this day, it is still not fixed.
I could try to defend it, but that would be a waste of time. When I wrote it, all I cared about was getting something that worked correctly. Once it did, I put it down and moved onto other things; it's just a case of laziness.
-- Chris Hanson, January 4, 1998
the real problem was that you were trying to sort number swith an alpha system. C is primarily an alpha system that cannot distinguish the characteristics of a number system based on the commonly used centurian system. That is why C cannot be used.
-- john fragle, January 5, 1999
Slow numeric reading has come up in a couple of projects I've done. In about 1978, I was working on a bootstrap compiler for Ada written in Simula-67 on a PDP-10. It was pretty slow overall (after all, it was just intended as a throwaway bootstrap), especially when reading numbers, and I was asked to look into why.
It turned out that the arbitrary-precision arithmetic routines were rather slow, and I tweaked them a bit (in Simula as in Scheme, everything, including stack frames, is allocated off the heap, so reducing consing and procedure calls are the main optimizations). But along the way, I noticed that the low-level lexing routines were extremely inefficient, costing many milliseconds per character read. I rewrote them so that the common cases would be handled efficiently, which sped up the compiler as a whole by a very noticeable multiple.
Bill Waite claims that lexing is in fact the slowest phase in many compilers, since after all it is the only one that must handle each character of the input.
-- Stavros Macrakis, September 16, 2002
Why on earth would anyone write code nowadays to sort 300,000 numbers? Unix sort does it all.
-- John Cowan, April 15, 2004
Amusingly, Red Hat 5.3 (IIRC) shipped for a year with a broken sort(1). Somebody at RH added a private option ("sort bolt sizes" or something) and didn't bother to test without. Most RH maintainers I teased about it said it was not such a bad thing to have shipped it broken for a year before releasing a patch.
A language that isn't good for sorting probably isn't good for much at all.
-- Nathan Myers, September 15, 2004
This a a thorny question to me. I program bash only so far, coming from a unix admin background. I want to learn programming and have a copy of ANSI common lisp after getting excited reading stuff on www.paulgraham.com. I find ANSI Common lisp hard to read, with undefined terms. I was pointed to SICP by the #scheme room on irc but they banned me for asking the differences between scheme and lisp as did the #lisp room. Does anyone know of a website that shows where scheme or clisp are different, and how they stack up against other popular tools like tcl python and ruby?
-- Rick James, April 28, 2005
Rick James: see e.g. http://c2.com/cgi/wiki?LispSchemeDifferences
(very old topic, so probably only Rick will see my response.)
Nathan Myers: Multiple linux releases (not just Redhat) had problems with sort for the simple reason that someone (reasonably, but incorrectly) set the default sort order based on internationalization/localization rather than to the C language sort order (and yes, it makes a difference).
John Cowan: (hi, long time) is correct; Unix/linux command line "sort" takes on the order of 1 second in 2005 for this kind of data (and still would have been very fast in 1998 when this issue was posted).
Philip Greenspun: the irony is clear, but nonetheless there is always a tradeoff; even C itself is ***vastly*** slower reading numbers using general facilities (e.g. scanf) than using a homebrew routine. Here we are talking about comparing optimized languages (apparently including Mac cl) against non-optimized. It's not an indictment of scheme that the number reader wasn't optimized (even in 1998); the same problem could happen in any language at all (speaking from long experience with many languages). This would be true even in the absence of the mea culpas from Chris and Alan.
-- Doug Merritt, May 6, 2005
Would you still reccomend clisp over scheme today Mr. G? -A novice programmer trying to decide what language to learn.
-- Rick James, September 21, 2005
I'm to understand that the software company ITA Software, which has a reputation for using Common Lisp heavily, uses C++ to read large data sets because Lisp is too slow.
-- Jeff Read, April 30, 2006
Isn't there a devastating irony in the fact that a language whose name is an abbreviation for "list processor" is comparatively weak at sorting long lists?
-- Chris Platt, July 15, 2020