Web Site Development

a modest proposal by Philip Greenspun for Web Tools Review

In Chapter 10 of Database Backed Web Sites, I laid out the following plan for the best way to develop an on-line service:
  1. publisher/editor decides what service he wants to offer, e.g., "automatic survey of camera reliability for readers of photo.net"
  2. programmer builds prototype with text-only user interface
  3. publisher/editor critiques prototype
  4. programmer refines user interface
  5. graphic designer comes in to add graphics/layout, but without changing user interface
I didn't think that a mostly static HTML site would need a similar plan. After all, a six-year-old with Claris Home Page or Netscape 4.0 can build a reasonable site, so why does it take an MIT nerd like me to pontificate about methodology?

Part of the reason that I thought like this is that I hadn't really ever worked with other people on a static site. People usually come to me with ambitious on-line service development projects. If they are going to blow $100,000 on graphics and HTML "programming" then they usually go straight to a designer like my friend Ulla. The only static site of which I really knew the cost was my personal one. It turns out that, after two decades of working as a software developer, I happen to think and work very formally. This means that the linked HTML structures I produce are reasonably navigable, have relatively few HTML syntax errors, and are oftentimes machine-parsable even when I've hand-produced them.

Now that I've coerced Ulla into joining me in ArsDigita, I have exposure to a broader range of sites. My first rude awakening was that even a seemingly trivial site project is fraught with peril.

First, it is very likely that the people you'll be working with are incapable of thinking formally or driving simple desktop tools. Why? Because the Web is six years old now (1997). It was the case in 1992 (I think) that a user of a NeXT computer could browse an HTML document, edit it without seeing the HTML tags, and press a button to publish it back to the server. By early 1995, someone with a Mac or a PC could do this in NaviPress (subsequently purchased by AOL and subjected to a humiliating name change).

So if a person had something to say and a modicum of ability to think formally, why in 1997 would they not already have a Web site? If you're working with them on their first Web site, ipso facto they either have nothing to say or are unable to think formally. Either way, you are in deep trouble.

[Note: there are of course exceptions to this rule. My friends Mark and Aimee were off in Tanzania when the Web boom hit. When they got back, they didn't have funds for scanning to PhotoCD and a server. So I got HP to pay for the scans and batch-converted their 550 images so that they wouldn't have to spend weeks in PhotoShop. I gave them HTTP PUT access to my server, and they went off and produced their Sukuma site by themselves. But this is not a validation of the "give someone who has never built a Web site Netscape 4.0" approach; it just means that sometimes one gets lucky.]

People who have never built a Web site have all kinds of mistaken ideas. First, they think that computers are somehow intelligent and magic. It would never occur to them that a computer can't handle a special case or that my batch conversion scripts need exactly one line per image and that an extra carriage return or left-out vertical bar will abort the process. Another mistaken idea that novice Web publishers have is that spending time on graphic design is a useful activity.

Somewhere during the process of building this site, the new publisher has to be educated into realizing (1) how stupid computers are and therefore the value of formal thinking, and (2) that there are a lot of difficult structural and content decisions that must be made before a graphic designer can be hired.

Here then is a site development plan.

Draw a Site Map

I wish that I had one to show you for my site. Macmillan drew some decent ones for Chapter 14 of Database Backed Web Sites. They show my personal Web site in its juvenile incarnations. But the final drawing shows the site as it was in early 1997 and it was already too big for a graphic. Instead we put in a simple outline, more or less like my current on-line map.

The goal of putting the site map down on paper is to communicate to others on the project what the ultimate goals are. Areas that are to grow should be identified and perhaps given their own directory in the Unix file system (remember that one must never relocate content as a site grows or it will break precious links from other sites).

Another goal of the site map is to think about navigation and expose flaws in the publisher's way of thinking about his new site. The most common flaw is CD-ROM fixation. A CD-ROM is something in which readers have a big investment. They go to the shop, they pay $50, they transport it home, they disable their computer by ejecting whatever useful CD-ROM was already in the drive. At this point, the user has a huge investment in the multimedia work and will be delighted with a 2-minute introduction complimenting him on what a smart purchase he has just made.

People who are laboring under the misconception that the Web is anything like a CD-ROM will betray it by making the homepage a useless welcome page that links only to another useless welcome page or two. Eventually they'll have a table of contents page like the one for photo.net, but before the reader gets there they'll have a few pages and graphics that will waste at least three minutes of a modem user's time (the "entry tunnel"). These pages reflect the publisher's belief that (1) his is one of a tiny handful of delightful sites on the Web, (2) users will be slavering with anticipation to see his priceless content, and (3) AltaVista does not exist and will not dump users directly into his site.

Another warning sign is that the site contains a "user's manual", even a subtle one such as "this site best viewed in Netscape 4.0". People go out and buy $500 printers and won't read the manual. Why does a new Web publisher with a tiny site imagine that a user is going to read his user's manual and then go out and spend a few hours tuning up his software installations?

This is the time for the Web nerd to remind the new publisher that Web users don't have any investment in particular sites, that they don't have much patience, and that AltaVista does not have any patience for entry tunnels.

This is also the time to make sure that the navigation strategy is sound. If a user got dumped by a search engine into a randomly chosen page on the map, would he be able to find his way back to the table of contents?
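
That question can be tested mechanically by treating the site map as a directed graph and checking that the table of contents is reachable from every page. The sketch below is only an illustration of the idea: the filenames and the `site_links` dictionary are invented, and on a real site you would have to extract the links from the HTML files first.

```python
from collections import deque

def pages_stranded_from(links, contents_page):
    """Return the pages from which contents_page cannot be reached.

    links maps each page to the list of pages it links to.
    """
    # Reverse the graph, then walk backwards from the contents page:
    # every page reached this way has some forward path to it.
    reverse = {page: [] for page in links}
    for page, targets in links.items():
        for target in targets:
            reverse.setdefault(target, []).append(page)

    reachable = {contents_page}
    queue = deque([contents_page])
    while queue:
        current = queue.popleft()
        for source in reverse.get(current, []):
            if source not in reachable:
                reachable.add(source)
                queue.append(source)

    all_pages = set(links) | set(reverse)
    return sorted(all_pages - reachable)

# Hypothetical site map; these filenames are made up for illustration.
site_links = {
    "index.html": ["cameras.html", "travel.html"],
    "cameras.html": ["index.html", "lenses.html"],
    "lenses.html": ["cameras.html"],
    "travel.html": ["alaska.html"],
    "alaska.html": [],  # dead end: no link back toward the contents
}

print(pages_stranded_from(site_links, "index.html"))
```

Any page in the returned list is one where a reader arriving from a search engine has no path back to the table of contents.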

Assemble and Structure Content

Once everyone is happy with the site map, the nodes should be given filenames, e.g., "foobar.html". Then people must create these files and stuff them with the content that they are ultimately to hold. If there are images or audio clips to be included, some way of organizing these should be devised. Naming conventions should be published to the team. (Personally, I do everything in a PhotoCD-centric manner.)

This is a good time to make sure that everyone is up to speed. If people say "I'm getting it, I'm a multimedia wizard now, look at this attractive background GIF" then you need only point out that the file system is not filling up with the filenames that were on the site map. The new publisher must understand that if files are not showing up then no progress is being made.
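
The "are the files showing up?" test is easy to automate. Here is a minimal sketch, assuming the site map has been reduced to a list of planned filenames; the function name and the example filenames are hypothetical.

```python
from pathlib import Path

def missing_pages(site_root, planned_filenames):
    """Return planned files that are absent or still empty under site_root."""
    root = Path(site_root)
    missing = []
    for name in planned_filenames:
        page = root / name
        # An empty placeholder file is not progress either.
        if not page.is_file() or page.stat().st_size == 0:
            missing.append(name)
    return missing

# Hypothetical filenames taken from an imaginary site map.
planned = ["index.html", "foobar.html", "travel/alaska.html"]
for name in missing_pages(".", planned):
    print("not written yet:", name)
```

Run against the site's root directory once a week, a report like this makes the absence of progress hard to argue with.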

This is a good time to show the site to a programmer. If there are a few dozen pages that have nearly identical structure then it might be much more cost-effective and reliable to have the content authors organize things into a flat file or RDBMS and have the programmer either (1) write a custom Perl script to grind out all the .html files, or (2) write a CGI or API script to pump out the .html files on-the-fly.
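
To make option (1) concrete, here is a rough sketch of a static page generator. It is written in Python rather than Perl, and everything specific in it is an assumption for illustration: the three-field `filename|title|body` line format, the template, and the sample records.

```python
import html

PAGE_TEMPLATE = """<html>
<head><title>{title}</title></head>
<body>
<h2>{title}</h2>
<p>{body}</p>
<hr>
<a href="index.html">table of contents</a>
</body>
</html>
"""

def parse_flat_file(text):
    """Parse lines of the form filename|title|body into a list of dicts."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split("|")
        # Be as unforgiving as the computer: exactly three fields per line.
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}"
            )
        filename, title, body = (field.strip() for field in fields)
        records.append({"filename": filename, "title": title, "body": body})
    return records

def render_pages(records):
    """Return a dict mapping each filename to its complete HTML text."""
    return {
        record["filename"]: PAGE_TEMPLATE.format(
            title=html.escape(record["title"]),
            body=html.escape(record["body"]),
        )
        for record in records
    }

# Hypothetical content, invented for illustration.
flat_file = (
    "nikon-f5.html|Nikon F5|Reader reports on the Nikon F5.\n"
    "canon-eos1.html|Canon EOS-1|Reader reports on the Canon EOS-1.\n"
)

pages = render_pages(parse_flat_file(flat_file))
for filename in pages:
    print(filename, len(pages[filename]), "characters")
```

Because every page comes from the same template, a change to the navigation means editing one string and re-running the script. Note that a blank line or a missing vertical bar stops the run with an error, which is the kind of strictness content authors need to be warned about.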

Make a text-only site

Assuming that the people assembling the content were completely incapable of "programming" HTML, one person should sit down and edit each content file to make legal HTML with navigation, page ownership signatures, titles, headlines, etc. When this is done, everyone will be able to get a feel for the site and holler before big money is invested in this particular site map.

This is also the time to add links to collaboration services (like the ones that I offer for free to other Web publishers at the top of Web Tools Review). Do you want readers to be able to discuss content? In one forum for the whole site or in sub-forums? Do you want to configure a comment server (like the one at the bottom of this page) to collect page-specific comments? Are you going to ask readers to join a mailing list? If it turns out that you want collaboration out the wazoo then maybe it is time to reconsider and go whole hog for an RDBMS-backed site.

Hire a graphic designer

This step is optional for personal sites though I guess it is becoming mandatory for corporate sites. Remember that sticking with plain text hasn't kept my photo.net server from getting 20 hits/second and that bad graphic design is far harder on the user than no graphic design. Also remember that nobody will laugh at your plain text site and say "look at these losers who spent $50,000 on design for a content-free site."

How to work with a designer? It is probably best to first make some decisions as to what kind of user queries you can afford to answer. If you can't afford to pay people to respond to email wondering about browser incompatibilities and crashes then you probably can't afford to publish JavaScript or Java, even if the initial programming were free. Here's an example of a set of requirements I might give to a designer:

Please don't construe the above as saying that designers shouldn't be given any scope. On the contrary, I think a graphic designer worthy of the name should be given maximum scope to develop user interface hints. For example, suppose you want older photos on your site to be distinguished from newer ones. You could tell the designer "come up with a way to stick a gold-leaf frame-type graphic around all the old photos." Personally, I would ask the designer to "come up with a graphical and/or text-y way to remind the reader that he is looking at an old photo rather than a new one."

Come up with a maintenance plan

A lot of big companies, at least for a while, will spend $millions every year paying graphic designers to perform essentially clerical functions (e.g., pasting new text into HTML templates). Oftentimes both publisher and typists get upset on a site like this. The publisher is frustrated because he has to holler into a telephone to get a page looking the way he wants. The graphic designer is frustrated because he is being used as a typist. If you don't have the budget and/or patience to maintain a Web service this way, then you need some plan for direct maintenance of the site by the publisher.

With a database-backed site this is straightforward. You build admin pages into which the publisher types text fragments. Your programs weave these together at run-time into complete HTML pages.
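
As a rough illustration of that weaving, here is a toy version using an in-memory SQLite database. The `fragments` table, the "title" and "body" slot names, and the function names are all invented for this sketch; a real service would sit behind a Web server and use whatever RDBMS the site already runs on.

```python
import html
import sqlite3

def make_database():
    """Create an in-memory database with one table of text fragments."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE fragments (page TEXT, slot TEXT, content TEXT, "
        "PRIMARY KEY (page, slot))"
    )
    return db

def save_fragment(db, page, slot, content):
    """What an admin form handler would call when the publisher submits."""
    db.execute(
        "INSERT OR REPLACE INTO fragments (page, slot, content) VALUES (?, ?, ?)",
        (page, slot, content),
    )
    db.commit()

def render_page(db, page):
    """Weave the stored fragments for one page into complete HTML."""
    rows = db.execute(
        "SELECT slot, content FROM fragments WHERE page = ?", (page,)
    ).fetchall()
    # Escape the publisher's text so stray < or & cannot break the page.
    fragments = {slot: html.escape(content) for slot, content in rows}
    title = fragments.get("title", "Untitled")
    body = fragments.get("body", "")
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h2>{title}</h2><p>{body}</p></body></html>"
    )

db = make_database()
save_fragment(db, "about", "title", "About this service")
save_fragment(db, "about", "body", "Surveys of camera reliability & more.")
print(render_page(db, "about"))
```

The publisher only ever types plain text into a form; the HTML structure lives in the program, so it cannot be broken by a careless edit.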

With a static site, you have to give the publisher some kind of tool, e.g., AOLpress, Adobe Site Mill, Claris Home Page, Microsoft Front Page, or Netscape 4.0, that can do HTTP PUT and support simple edits of a page. Alternatively, the publisher could spend a few hours learning enough HTML to simply edit the pages directly.

Right now I think that most desktop tools are so badly engineered and put in so much crud (e.g., "&nbsp;" all over the place) that they are more trouble than they are worth. The extra crud makes it very hard for a Web expert to ever edit the pages manually. You can use the preferences field for Netscape Composer (part of 4.0) to register some simple text editor as your preferred way of working with HTML. Then you just use Netscape 4.0 to do the HTTP GET and HTTP PUT when you're done editing. You can add a little item to your page that says "this site engineered with Notepad" or "this site engineered with SimpleText" :-)

Version/source control

Programmers and technical writers at large companies are familiar with the problem of lost updates when multiple people are editing the same document. But Web publishing is the first time that the average person has to confront the problems of version/source control. You have to educate contributors maintaining a site via HTTP PUT that, if they aren't careful to grab the page just before editing and then save it right back, there is a serious risk that they will be overwriting someone else's edits.

[Note for non-programmers: a lost update happens when Joe grabs Version A at 9 am from the Web site and spends a day editing it. Mary grabs Version A at noon and fixes a typo, writing it back at 12:05 pm (call this Version B). Joe finishes his edits at 5 pm and writes the document back to the server (call this Version C). Unfortunately, Version C is an edit of Version A and hence does not include Mary's typo fix. So Mary's update is lost. I wrote more on this topic in Chapter 11 of Database Backed Web Sites, though with a financial transaction example.]
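
The Joe and Mary scenario can be replayed in a few lines. This is a toy simulation, not a description of how any of the tools named here behave: `PageStore`, `put_if_unchanged`, and the version counter are invented to show one common remedy, which is to refuse a save that was based on a stale copy.

```python
class Conflict(Exception):
    """Raised when a save is based on a stale version of the page."""

class PageStore:
    """Toy stand-in for a Web server holding one page."""

    def __init__(self, text):
        self.text = text
        self.version = 1

    def get(self):
        # Hand back the text together with the version it came from.
        return self.text, self.version

    def put(self, text):
        # Plain overwrite: whoever writes last wins.
        self.text = text
        self.version += 1

    def put_if_unchanged(self, text, based_on_version):
        # Refuse the write if someone else saved since this copy was fetched.
        if based_on_version != self.version:
            raise Conflict("page changed since you fetched it")
        self.put(text)

# Plain overwrites: Mary's fix is silently lost.
plain = PageStore("teh cat")                      # Version A
joe_copy, joe_version = plain.get()               # 9 am
mary_copy, mary_version = plain.get()             # noon
plain.put(mary_copy.replace("teh", "the"))        # 12:05 pm, Version B
plain.put(joe_copy + " sat on the mat")           # 5 pm, Version C
print(plain.text)  # the typo is back

# Version check: Joe's stale save is rejected instead.
checked = PageStore("teh cat")
joe_copy, joe_version = checked.get()
mary_copy, mary_version = checked.get()
checked.put_if_unchanged(mary_copy.replace("teh", "the"), mary_version)
try:
    checked.put_if_unchanged(joe_copy + " sat on the mat", joe_version)
except Conflict as error:
    print("Joe must re-fetch and redo his edits:", error)
```

In the second run the server still holds Mary's corrected text, and Joe finds out about the conflict at save time rather than never.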

Standard programmer tools like Emacs and RCS on Unix provide all kinds of aids for version/source control. At the very least, Emacs will automatically warn you that "the file you're about to edit has been changed on disk, are you sure you want to edit the old version?" But magic Web tools are still catching up. The fact is that if everyone were using Emacs on a Unix box to edit the HTML, they'd have better support for version/source control!

Again, there is nothing to really do here except use email and telephone to synchronize people's efforts so that documents get checked out and checked back in.

Are you failing?

If the project is a failure and you haven't been able to educate contributors into doing things sufficiently formally, then you need to impose structure on them. Budget $20,000-50,000 to rebuild the site as a database-backed system where people can only contribute content via Web forms. The site's structure will then be a little rigid but at least you won't have broken links, illegal HTML, etc.



philg@mit.edu

Reader's Comments

I came to this page looking for pointers to tools that would be useful for a large company intranet development process that has gotten out of control. Didn't find any, so here are some in case others arrive similarly disappointed. Look at INSO's Dynabase, Interwoven's TeamSite, 2Bridge Software, NetObjects TeamFusion. Watch out for site management tools that put all the content into their own proprietary database-driven systems and generate the site dynamically; most dynamic database-driven sites don't need to be, and you lose performance and create a single point of failure. Also stay away from proprietary template systems like Allaire Cold Fusion, FrontPage extensions and Vignette StoryServer. They lock you into a single vendor application that loses the openness to new tools and techniques your web site deserves.

-- Dave Britton, June 8, 1998