Client/Server, the Internet, and WWW
By David J. Greer

Abstract

Much of the Internet was made possible by client/server computing. The World Wide Web (WWW) provides hypertext access to the Internet using client/server protocols. The WWW allows you to point at links to text, pictures, music, or video located on servers anywhere in the world and then play the files on your local client PC, workstation, or terminal (along with more links to related information). You never need to know where the information is located or learn any obscure commands to access it.

This presentation explains how the WWW client/server architecture works, how to set up your own WWW server for MPE or HP-UX, and how the various WWW clients differ. You will also receive useful tips about how to find information on the Web.

David Greer set up Robelle's WWW service and participates in the development of Lynx, the character-mode WWW client. David is the President of Robelle Consulting Ltd. and is in charge of Research and Development for its Qedit and Suprtool products.

Robelle Consulting Ltd.
Unit 201, 15399-102A Ave.
Surrey, B.C. Canada V3R 7K1
Toll-free: 1-800-561-8311
Phone: (604) 582-1700
Fax: (604) 582-1799
E-mail: david_greer@robelle.com
WWW: http://www.robelle.com

Copyright Robelle Consulting Ltd. 1995-1996. Permission is granted to reprint this document (but not for profit), provided that the copyright notice is given.

Overview

The World Wide Web (WWW) (http://www.w3.org) is a collection of servers distributed all over the world that respond to requests from various clients. The WWW allows you to click on links to text, pictures, music, or video located on these servers and then to play the selected files on your local client PC, workstation, or terminal, along with more links to related information.
You never need to know where the information is located or to learn any obscure commands to access it.

The on-line version of this paper is available as a linked set of files (http://www.robelle.com/www-paper/overview.html) or as a large single file (http://www.robelle.com/www-paper/paper.html). Downloading this paper as a single file may take some time, but it makes it convenient to save or print the entire paper with your Web browser.

To help you understand the World Wide Web, we have organized this paper into these major sections:

WWW Introduction (http://www.robelle.com/www-paper/intro.html)
    To understand the WWW, it helps if you understand some basic Web concepts. Fundamental to this understanding is the concept of client/server computing on a global scale.

The Language of the Web (http://www.robelle.com/www-paper/language.html)
    Whether you're reading WWW documents or creating your own, it helps if you understand the basic components of the WWW language.

WWW Clients (http://www.robelle.com/www-paper/clients.html)
    One powerful feature of the WWW is that the information you publish on your server can be read by many different clients. In this section, we provide a quick introduction to some of the popular WWW clients.

WWW Servers (http://www.robelle.com/www-paper/servers.html)
    If you want to make your own information available to WWW clients, you'll want to set up your own server. In this section, we discuss some common WWW server software and give our suggestions for how WWW server information should be designed.

Interesting Places to Visit (http://www.robelle.com/www-paper/links.html)
    The WWW is a big place. Here are a few pointers to some of the things that we have liked or found useful.

Summing it Up (http://www.robelle.com/www-paper/summary.html)
    These are our parting thoughts on client/server, WWW, and the Internet.
Bibliography (http://www.robelle.com/www-paper/bib.html)
    A short list of books that we have found very useful for learning more about the WWW.

Jump on board for a ride on the Web. We hope that you'll find enough information here to join us by publishing your own WWW information.

Introduction

The WWW is a new, and rather different, way of viewing information. If you are viewing this paper as a WWW document, you are using a browser and can follow its hypertext links immediately. If you are reading this on paper, you will see the links indicated in parentheses and in a different font.

Keep in mind that the WWW is constantly evolving. We have tried to pick stable links, but sites reorganize and sometimes even move. By the time you read the printed version of this paper, some WWW links may have changed.

The World Wide Web

The WWW project has the potential to do for the Internet what Graphical User Interfaces (GUIs) have done for personal computers: make the Net useful to end users. The Internet contains vast resources in many fields of study, not just computer and technical information. In the past, finding and using these resources has been difficult. The Web provides consistency: servers provide information in a consistent way, and clients show information in a consistent way. To add a further thread of consistency, many users view the Web through graphical browsers that work like the other windowed applications they use (Microsoft Windows, Macintosh, or X-Windows).

A principal feature of the Web is its links between one document and another. These links, described in the section on hypertext, allow you to move from one document to another. Hypertext links can point to any type of file on any server connected to the Internet. These links are what transform the Internet into a web.
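To see how links tie documents (and servers) together, here is a small sketch of our own, written in Python, which is not one of the tools discussed in this paper. It pulls the link targets out of a fragment of HTML and reports which server each one points to. The sample page is made up for illustration; the parsing relies on Python's standard html.parser and urllib.parse modules.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href target of every <a> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the link targets found in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# A made-up page whose links lead to different servers and protocols.
page = """<p>Visit <a href="http://www.robelle.com">Robelle</a>,
its <a href="ftp://ftp.robelle.com">FTP archive</a>, or
<a href="http://www.w3.org">the W3 Consortium</a>.</p>"""

for link in extract_links(page):
    parts = urlsplit(link)
    print(f"{parts.scheme:6} {parts.hostname}")
```

Each link carries everything a client needs to reach another machine, which is why one page can weave together information from servers all over the world.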
A History of the Web

The Web project was started by Tim Berners-Lee at the European Particle Physics Laboratory (CERN) in Geneva, Switzerland. Tim wanted to find a way for scientists doing projects at CERN to collaborate with each other on-line. He thought of hypertext as one possible method for this collaboration. Tim started the WWW project at CERN in March 1989. In January 1992, the first versions of the WWW software, which communicates using the Hypertext Transfer Protocol (HTTP), appeared on the Internet. By October 1993, 500 known HTTP servers were active. When Robelle joined the Internet in June 1994, we were about the 80,000th registered HTTP server. By the end of 1994, it was estimated that there were over 500,000 HTTP servers. Attempts to keep track of the number of HTTP servers on the Internet have not been successful: programs that try to count HTTP servers automatically never finish, because new servers are being added constantly.

On-Line versus Batch

This paper is available on the World Wide Web (on-line) or as a paper document (batch). If you are reading this via Robelle's WWW Service (http://www.robelle.com), you probably already know how to access the on-line version.

Much of the value of the Web lies in its links between one document and another. When you view this paper with a WWW browser, the link addresses are hidden from you; you see only the highlighted link text. When you read the text or printed version of this paper, you see the links in parentheses. Because links tend to be long, they do not format well in the text and printed versions. Since more than half the effort of writing this paper went into finding and testing the links, we have left them in the text and printed versions, despite their distracting appearance. We will describe what the links mean a little later.

What is Hypertext?

Hypertext provides the links between different documents and different document types.
If you have used the Microsoft Windows WinHelp system or the Macintosh (http://emu.mit.edu/mac_resource.html) HyperCard application, you likely know how to use hypertext. In a hypertext document, links from one place in the document to another are included with the text. By selecting a link, you can jump immediately to another part of the document or even to a different document. In the WWW, links can go not only from one document to another, but from one computer to another.

Client/Server Computing

The last few years have seen an explosion of information about client/server computing, yet for many people the definition of client/server is still unclear. We describe it as a method of distributing applications over one or more computers. A client is a process that requests services from another process, the server. These processes can be on different computers or on the same computer, and they communicate via a networking protocol.

People often think of client/server computing in terms of local area networks, PCs with graphical user interface capabilities, and servers with information that is needed by the PC clients. You do not have to implement client/server computing this way; it is possible for the same computer to be both the client and the server. The key point is that there is a communications protocol that allows two processes (often on different computers) to request services and to respond to those requests.

The Hypertext Transfer Protocol

When you use a WWW client, it communicates with a WWW server using the Hypertext Transfer Protocol (HTTP) (http://www.w3.org/pub/WWW/Protocols/). When you select a WWW link, the following things happen:

1. The client looks up the hostname and makes a connection with the WWW server.
2. The client sends a request for the document named in the link.
3. The HTTP software on the server responds to the client's request.
4. The client and the server close the connection.

Compare this with traditional terminal/host computing.
Users usually log on (connect) to the server and remain connected until they log off (disconnect). An HTTP connection, on the other hand, lasts only as long as it takes for the server to respond to a request. Once the request is completed, the client and the server are no longer in communication.

WWW clients use the same technique for other protocols. For example, if you request a directory at an anonymous FTP site (e.g., ftp://ftp.robelle.com), the WWW client makes an FTP connection, logs on as an anonymous user, switches to the directory, requests the directory contents, and then logs off the FTP server. If you then select a file, the WWW client once again makes an FTP connection, logs on again, changes directories, downloads the file, and then logs off. If you use an FTP client to do the same thing, you normally log on to the FTP server, change directories several times, and download one or more files. Only when you are finished do you log off.

The Internet

The Internet is the world's largest interconnected computer network. Computers on the Internet communicate using the Internet Protocol (IP) and the Transmission Control Protocol (TCP). You identify individual computers by their IP address. This address is a 32-bit number that is usually written as four decimal octets separated by periods (e.g., 192.40.254.0). Fortunately, you can usually refer to a computer by its name (e.g., www.robelle.com (http://www.robelle.com)).

If you can send network packets to one computer on the Internet, you can send network packets to any computer on the Internet. This feature is what makes the Internet so powerful; it is also what concerns system managers. If you can send packets to the Internet, it follows that anyone on the Internet can send packets to your computer, even to the PC on your desktop.

Accessing the Internet

If you are reading the text or paper version of this paper, you're probably wondering "How do I get started on the Internet?"
It is much easier to connect an individual PC and a modem to the Internet than it is to connect a server like an HP 3000 or HP 9000. We suggest that you find a local Internet access provider to connect your PC to the Net. Most access providers include everything you need to log on and start exploring. In addition, several books on connecting to the Internet include all the software you need to get started, along with the telephone numbers of Internet access providers.

Once you're connected to the Internet, you can begin investigating many of the sites described in this paper. You will also be able to download much of the software needed to create your own WWW application. As we discuss later, such an application can be useful even if you never plan to connect your servers to the Internet.

The Language of the Web

In order to use the WWW, you must know something about the language used to communicate on the Web. There are three main components to this language:

Uniform Resource Locators (URLs)
    URLs provide the hypertext links between one document and another. These links can access a variety of protocols (e.g., ftp, gopher, or http) on different machines (or your own machine).

Hypertext Markup Language (HTML)
    WWW documents contain a mixture of directives (markup) and text or graphics. The markup directives do such things as make a word appear in bold type. This is similar to the way UNIX users write nroff or troff documents, and MPE users write with Galley, TDP, or Prose. For PC users, this is completely different from WYSIWYG editing. However, a number of tools now on the market hide the actual HTML.

Common Gateway Interface (CGI)
    Servers use the CGI to execute local programs. CGI programs provide a gateway between the HTTP server software and the host machine.
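As a concrete illustration of the CGI idea, here is a minimal CGI program. We wrote it in Python for brevity; any language would do. It follows the usual CGI convention: the server passes request details in environment variables such as REQUEST_METHOD and QUERY_STRING, and the program writes a header, a blank line, and then the HTML document to standard output, which the server relays to the client. The form field "name" and the page text are hypothetical.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build the complete CGI output for one request.

    environ is a mapping of CGI environment variables, normally os.environ.
    The form field "name" is a hypothetical example.
    """
    method = environ.get("REQUEST_METHOD", "GET")
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    name = fields.get("name", ["visitor"])[0]

    body = (
        "<html><head><title>Hello</title></head><body>\n"
        f"<p>Hello, <b>{escape(name)}</b>. "
        f"This page was generated for a {escape(method)} request.</p>\n"
        "</body></html>\n"
    )
    # CGI output: header lines, an empty line, then the document itself.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The HTTP server runs this program and sends its standard output
    # back to the client.
    sys.stdout.write(cgi_response(os.environ))
```

Escaping the user-supplied value before inserting it into the page keeps characters such as < and & from being interpreted as markup.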
Uniform Resource Locators (URLs)

Uniform Resource Locators (http://www.w3.org/hypertext/WWW/Addressing/URL/Overview.html) (URLs) specify the access method (how), the server name (where), and the location (what) that a WWW client needs to find and access a WWW object. The general form of a URL is

    access-method://server-name[:port]/location

Access Methods

The three most popular access methods are:

http:
    This is the method provided by WWW servers. It includes hypertext linking, the hypertext markup language, and server scripts.

gopher:
    Gopher (gopher://gopher.micro.umn.edu) was developed at the University of Minnesota as a distributed campus information service. There are gopher servers everywhere; many of them provide campus-wide information systems. Gopher information is organized into menus. Because hypertext provides the same services as gopher and more, many sites are moving their information from gopher to the WWW.

ftp:
    The File Transfer Protocol is one of the oldest and most popular of all Internet services. Anonymous FTP archives give you access to millions of files: documentation, source code, and other useful objects. You can use a WWW browser to view and to retrieve information from FTP archives.

Server Name

The server name is an IP host name or an IP address. WWW server names often start with "www", as in www.robelle.com (http://www.robelle.com) or www.mayfield.hp.com (http://www.mayfield.hp.com).

The port number is usually not needed. If there are several servers on one machine (e.g., two different WWW servers on the same host), you use a port number to select one of them. By default, WWW servers are on port 80. Other protocols have different default ports (e.g., the default for FTP is 21). Most users never need to know about port numbers.

Welcome Page

Most WWW servers provide a welcome or home page. This is the document that you see if you specify a machine name, but not a document name (see all the examples above under "Server Name").
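The general URL form can be taken apart mechanically. The following Python sketch (our illustration, not code from any WWW client discussed here) uses the standard urllib.parse module to split a URL into access method, server name, port, and location. It fills in the default port when none is given and treats an empty location as a request for the server's welcome page. The default-port table is our own: 80 and 21 come from the text above, and 70 is the standard gopher port. The :8080 URL in the example is hypothetical.

```python
from urllib.parse import urlsplit

# Default ports for the three access methods discussed above.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def describe_url(url):
    """Return (access_method, server_name, port, location) for a URL.

    Any query string or fragment is ignored; location is the path only.
    """
    parts = urlsplit(url)
    # Use the explicit :port if present, otherwise the protocol's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    # No location means "give me the welcome page", i.e. the root "/".
    location = parts.path or "/"
    return parts.scheme, parts.hostname, port, location


for url in ("http://www.robelle.com",
            "http://www.robelle.com:8080/www-paper/intro.html",
            "ftp://ftp.robelle.com"):
    print(describe_url(url))
```

Which file the server actually returns for "/" depends on how its welcome page is configured.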
Good WWW welcome pages provide a short description of the information the WWW server provides, as well as links to all the other information available on the server. The welcome page must be explicitly configured for each WWW server. If you access a WWW server without giving a document name and receive the error message "no document found", try one of these common document names: welcome.html, index.html, or default.html.

Location

The location can be a filename, a directory, a directory and filename, a server-script name, or something specific to the access method. Filenames and directory structures often change, so don't be surprised if a URL that worked a few months ago no longer works.

Hypertext Markup Language (HTML)

When you write documents for the WWW, you use the Hypertext Markup Language (HTML) (http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimerP1.html). In a markup language, you mix your text with marks that indicate how formatting is to take place. Most WWW browsers have a "View Source" option that shows you the HTML for the document you are viewing.

Each WWW browser renders HTML in its own way. Character-mode browsers use terminal highlights (e.g., inverse video, dim, or underline) to show links, bold, italics, and so on. Graphical browsers use different typefaces, colors, and bold and italic formats to display different HTML marks. Writers have to remember that each browser in effect has its own HTML style sheet. For example, Lynx and Mosaic do not insert a blank line before unnumbered lists, but Netscape does. If you want to see how your browser handles standard and non-standard HTML, try the WWW Test Pattern (http://www.uark.edu/~wrg/). The test pattern shows the differences between your browser, standard HTML, and other browsers.

Creating HTML

Creating HTML is awkward, but not that difficult.
The most common method of creating HTML is to write the raw markup language using a standard text editor. If you are creating HTML yourself, we have found the chapter "Authoring for the Web" in the O'Reilly (http://www.ora.com) book "Managing Internet Information Services" to be an excellent resource. You might also find the HTML Quick Reference (http://kuhttp.cc.ukans.edu/lynx_help/HTML_quick.html) useful.

Bob Green, founder of Robelle, finds HTML Writer (http://lal.cs.byu.edu/people/nosack) useful for learning HTML. Instead of hiding the HTML tags, HTML Writer provides menus with all of the HTML elements and inserts these into a text window. To see how your documents look, you must use a separate Web browser.

If you don't want to deal directly with HTML, you can get a WYSIWYG HTML editor. On the PC, we have tried HoTMetaL and the Microsoft Word Internet add-on. HoTMetaL is produced by SoftQuad (http://www.sq.com). There is a free version, which we found somewhat unreliable, and a professional version. HoTMetaL probably works best if you are writing HTML documents from scratch (we tried to edit existing documents, some of which may have had invalid HTML).

Microsoft has produced a new add-on to Microsoft Word that produces HTML. The Internet Assistant (http://www.microsoft.com/msoffice/freestuf/msword/download/ia/default) is available from Microsoft at no charge. You will need to know the basic concepts of Microsoft Word to take advantage of the Internet Assistant. Since we are not experienced Microsoft Word users, we found that the Internet Assistant didn't help us much.

The HTML area of the WWW is changing quickly. Users do not want to go back to ASCII text editing after using WYSIWYG editors for the last several years. The Web itself carries a list of WYSIWYG HTML editors (http://www.yahoo.com/Computers/World_Wide_Web/HTML_Editors) for a variety of operating systems.
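Because HTML is plain text, a program can write raw markup just as easily as a person with a text editor. The Python sketch below is our own illustration; the function make_page and its arguments are hypothetical. It builds a small page with a title, paragraphs, and a list of links. The characters &, <, and > in ordinary text must be written as entities (&amp;, &lt;, &gt;) so that browsers do not mistake them for markup; Python's standard html.escape function takes care of that.

```python
from html import escape


def make_page(title, paragraphs, links):
    """Return a small HTML document as a string.

    title      -- plain-text title, used for <title> and the top heading
    paragraphs -- list of plain-text paragraphs
    links      -- list of (url, label) pairs shown as a bulleted list
    """
    lines = ["<html>", "<head>", f"<title>{escape(title)}</title>", "</head>",
             "<body>", f"<h1>{escape(title)}</h1>"]
    for text in paragraphs:
        # escape() turns &, <, > (and quotes) into entities.
        lines.append(f"<p>{escape(text)}</p>")
    if links:
        lines.append("<ul>")
        for url, label in links:
            lines.append(f'<li><a href="{escape(url)}">{escape(label)}</a></li>')
        lines.append("</ul>")
    lines += ["</body>", "</html>"]
    return "\n".join(lines) + "\n"


print(make_page("Robelle & Friends",
                ["Is 3 < 10? In HTML the < must be written as an entity."],
                [("http://www.robelle.com", "Robelle Consulting")]))
```

Viewing the result with a browser's "View Source" option shows exactly the markup the function wrote.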
Common Gateway Interface (CGI)

The Common Gateway Interface (CGI) (http://hoohoo.ncsa.uiuc.edu/cgi/overview.html) provides a method for WWW servers to invoke other programs. You can write these programs with any tool or language. They usually return HTML as their output. The Robelle WWW server statistics (http://www.robelle.com/server.html) are provided by a CGI script that runs the getstats program (http://www.eit.com/software/getstats/getstats.html).

Forms

The WWW supports simple forms (http://www.robelle.com/forms/comments.html) with text boxes, radio buttons, and pull-down lists. Forms are processed by CGI scripts.

WWW Clients

You will likely first experience the World Wide Web through a WWW client. In WWW terms, these are called browsers. Browsers are available for almost all major computer platforms; however, you also need the appropriate network infrastructure to make them work.

Network Infrastructure
    Which browser you use depends largely on how you are connected to the Internet. If you are using a terminal emulator and a serial connection, you will most likely use a character-mode browser. If you can send network packets from your computer to the Internet, you will probably use a graphical browser.

Character-Mode Browsers
    A popular character-mode browser is Lynx (http://www.cc.ukans.edu/about_lynx/about_lynx.html). You cannot use Lynx to display graphical images, but it does support forms, as well as all of HTML 2.0.

Graphical Browsers
    Three popular graphical browsers are Mosaic (http://www.ncsa.uiuc.edu), Netscape (http://www.mcom.com), and Microsoft Internet Explorer (http://www.microsoft.com/ie/msie.htm). Mosaic and Netscape are available for Microsoft Windows, X-Windows, and the Macintosh, while Microsoft's IE is available only for Microsoft Windows. Mosaic and Microsoft IE are free to anyone; Netscape is free to any not-for-profit institution.

Network Infrastructure

How you connect to the Internet affects how you view the WWW.
If you connect via a modem, you won't be able to view large WWW pages, images, sounds, or video in a reasonable time; if you have a T1 connection (1.544M bits/second), you will be able to enjoy these features. Some WWW pages assume that you have a fast connection to the Internet.

Local Area Networks

If your Local Area Network has a gateway to the Internet (there are several different ways to do this), you should be able to use a graphical browser on your own workstation to cruise the WWW. If you are using a PC with Microsoft Windows, you'll need to have a Winsock (http://www.microsoft.com/pages/developer/winsock/default.html) interface installed (in addition to the regular networking configuration). Macintosh users already have network support via MacTCP. UNIX workstation users should also have built-in support for networking.

Dial-in Access

There are two methods of dialing into a machine to get access to the Internet. If you dial in and log on as usual (on UNIX you see "login:" and then a shell prompt; on MPE you type "HELLO" and get a colon prompt), your computer is not directly connected to the Internet, so it cannot send network packets from your PC to the Internet. In this case, you will have to run a character-mode browser such as Lynx on the host you dialed into.

If you dial in using SLIP (Serial Line IP) or PPP (Point-to-Point Protocol), your computer becomes part of the Internet, which means it can send network packets to and from the Internet. In this case, you can use graphical browsers like Mosaic or Netscape to access the WWW. The Internet Adapter (http://marketplace.com/tia/tiahome.html) is supposed to allow users with only shell account access to obtain a SLIP connection. Shiva (http://www.shiva.com) and Livingston (http://www.livingston.com) provide products that allow users to dial into hosts using SLIP or PPP.

Character-Mode Browsers

While Lynx is not the only character-mode browser, it is one of the most powerful. Lynx (ftp://ftp2.cc.ukans.edu/pub/lynx) is available for many platforms.
You can obtain a pre-compiled version of Lynx for MPE/iX from http://jazz.external.hp.com/src/www_src/index.html. Some users are disappointed that Lynx's display is limited to text. What Lynx does demonstrate is that a single server can provide information to both character-mode and graphical clients. Still, to gain a full understanding of how powerful the client/server concept can be, you should compare Lynx's capabilities to those of graphical browsers such as Mosaic or Netscape.

Graphical Browsers

Mosaic is one of the tools that made the WWW so popular. With Mosaic, you can view in-line graphical images surrounded by proportional-font text in multiple colors. For an excellent introduction to Mosaic, see the O'Reilly book The Mosaic Handbook (http://www.ora.com). Three versions of the book are available (Windows, Macintosh, and X-Windows). The PC version of Mosaic requires the Win32s subsystem, which is described in the Mosaic readme file (ftp://ftp.ncsa.uiuc.edu/Web/Mosaic/Windows/README.TXT).

While Mosaic is popular, the newer Netscape browser is even more appealing, especially over slower network connections. Earlier versions of Mosaic did not display anything until an entire URL (and its associated graphical images) had been downloaded. Netscape, by contrast, starts displaying as soon as a screenful of information is available. As you page down through a document, Netscape barely pauses as it continues to download the URL in the background.

The newest graphical browser is Microsoft Internet Explorer (http://www.microsoft.com/ie/msie.htm). This browser is part of Microsoft's strategy to make the Internet an important part of all Microsoft products. Like Netscape, Microsoft IE performs network transfers in the background. We prefer Netscape over Microsoft IE because of Netscape's user interface and better reliability.

External Viewers

Neither Mosaic nor Netscape tries to handle all the data that can potentially be served up on the Web.
They both understand HTML, in-line graphics, and URLs. Netscape can display external GIF (Graphics Interchange Format) files, but Mosaic cannot. To view images, listen to sound, watch movies, or view spreadsheets, you must have external tools (http://www.ncsa.uiuc.edu/SDG/Software/WinMosaic/viewers.htm) that support these data formats. For Microsoft Windows users, a popular graphical viewer is LView (ftp://ftp.ncsa.uiuc.edu/PC/Windows/Mosaic/viewers). The Mosaic Handbook provides a good introduction to the external tools that you need to support full multimedia applications. Most of these tools also work with Netscape.

WWW Servers

WWW servers provide information to the Web. Server software is available for many computer platforms, but setting up a server isn't always easy.

Why Set Up a WWW Server?
    Even if you don't have an Internet connection, there are lots of uses for an internal WWW server.

WWW Server Design
    Setting up a server to provide information to the many different Internet clients requires extra thought, but the effort is worth it.

Setting Up Your WWW Server
    Server software exists for UNIX, MPE, Windows NT, Microsoft Windows, and even MS-DOS.

Maintaining Your WWW Server
    Like most applications, your WWW server will need a little help from time to time.

Why Set Up a WWW Server?

If you have a full-time Internet connection, you might want to set up a WWW server to provide information about your company, your division, your group, or yourself. Even if you are not connected to the Internet, you still might want to set up a server. Hypertext is a useful way to distribute information because it can contain mixed text and graphics (or more), as well as links to other documents. Using WWW servers, you can create sophisticated help systems without a lot of work. Once established, these systems become available to all users on your internal network who have suitable client software (browsers).
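To show how little is needed for a simple internal help server, and to make the connect, request, respond, and close steps of HTTP concrete, here is a Python sketch that uses only the standard http.server and http.client modules. It is not NCSA or CERN HTTPD; the handler, which returns the same hypothetical help page for every request, is our own. Both the server and the client run on one machine (127.0.0.1), which also illustrates that a single computer can be both client and server.

```python
import http.client
import http.server
import threading

HELP_PAGE = b"<html><body><h1>Internal Help</h1><p>Hypothetical help text.</p></body></html>\n"


class HelpPageHandler(http.server.BaseHTTPRequestHandler):
    """Answer every GET request with the same small HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(HELP_PAGE)))
        self.end_headers()
        self.wfile.write(HELP_PAGE)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines for this example


def start_server(port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = http.server.HTTPServer(("127.0.0.1", port), HelpPageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(host, port, path):
    """One HTTP exchange: connect, send a request, read the reply, close."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)        # connects, then sends the request
        response = conn.getresponse()    # the server's reply
        return response.status, response.read()
    finally:
        conn.close()                     # the connection ends here


if __name__ == "__main__":
    server = start_server()
    status, body = fetch("127.0.0.1", server.server_address[1], "/help.html")
    print(status, body.decode())
    server.shutdown()
    server.server_close()
```

A real internal server would serve files from a document directory rather than one fixed page, but the exchange between browser and server follows the same pattern.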
With CGI scripts and e-mail, you can automate forms that you now process by hand (e.g., expense reports, travel reports, or purchase requisitions). With some extra work, you could even have the forms processed directly into a database. You can also design scripts to look up information in your existing databases and display it for clients.

If your users are pushing for Microsoft Windows interfaces to all of their database data, you can use your WWW server as an intermediate solution. This way users get an immediate graphical interface, and managers can gain experience with the difficulties of managing client/server configurations.

WWW Server Design

When you set up a WWW server, keep in mind that many different clients will be accessing your server. If your server is available on the Internet, you should not assume that the clients will all have high-speed Internet connections and graphical browsers. Consider these things when designing your WWW server:

* Concentrate on your text. Well-written text conveys a lot of information. If you use text to convey essential information, then your server will be friendly to text-based clients like Lynx.

* Organize documents the way you would organize a book: gather information together into chapters, each describing a single idea or a set of related topics. Provide navigational tools (like previous or next chapter) and an overview with a table of contents. We have attempted to include all of these elements in this paper.

* Question each graphical image that you provide. Does the graphic add meaning to the text or is it just neat? Compare the size of the graphics file to the size of your text files. If the graphical image is much larger, does it really add enough necessary information to justify the download?

* Even if your WWW server is on a fast network, do all the clients have fast access to your server? You may have a T1 connection (1.544M bits/sec), but many WWW clients connect via 14.4K bits/sec modems.
Some commercial Internet providers even charge by the hour, which makes it more expensive for clients to download large files and graphics. If your clients have a fast connection to the Internet, you can provide more graphical information and larger text files without annoying them. Nevertheless, it's a good idea to keep these limitations in mind when you're developing your server.

* Try to keep files to a reasonable size (we suggest three to ten thousand bytes). When you convert existing documents to HTML, remember that they will often end up quite large (tens of thousands of bytes). Do clients want to download such a large file only to find that it is of no interest? The converse is also true: can clients download a single file with the complete text (e.g., this paper), without having to follow all the hypertext links?

* Hypertext does not mean disorganization. Provide an index or a table of contents for your web pages, so users can quickly find information. Provide summaries for long articles and files.

* Use graphic-design common sense. Use white space to increase readability. If you use special effects (bold, italics, underline, horizontal rules, etc.), use them sparingly to increase their effect.

* If your WWW server is available on the Internet, many visitors will access your server out of curiosity. Make your welcome page attractive, but clearly identify what information your WWW server provides. Of all the files you publish, be most careful about the size of your welcome page. It will likely be the most frequently accessed page.

We also suggest that you look at the W3 Style Guide (http://www.w3.org/hypertext/WWW/Provider/Style/Overview.html).

Setting Up A WWW Server

First, you need to decide which computer will host your WWW information (or you could pick several hosts). If your WWW server will make information available to many machines, the host must be connected to your network or the Internet.
While WWW server software is available for a variety of machines, each server software package runs only on certain operating systems. The server software you pick must be compatible with the host machine that provides the WWW service.

WWW Server Software

W3 maintains a good list of WWW server software (http://www.w3.org/hypertext/WWW/Daemon/Overview.html). Two of the most popular UNIX WWW server software packages are NCSA HTTPD (http://hoohoo.ncsa.uiuc.edu) and CERN HTTPD (http://www.w3.org/hypertext/WWW/Daemon/Status.html). A pre-compiled copy of the NCSA HTTPD software is available for MPE/iX (http://jazz.external.hp.com/src/www_src/index.html).

Windows NT is becoming more popular as a WWW server, largely due to its built-in networking support and its familiar Windows interface. Free Windows NT HTTP Server software (http://emwac.ed.ac.uk/html/internet_toolchest/https/contents.html) is available from the European Microsoft Windows NT Academic Center (http://emwac.ed.ac.uk). The Robelle Windows NT WWW Server (http://wwwnt.robelle.com) uses the O'Reilly WebSite (http://website.ora.com) software. WebSite comes with comprehensive documentation, something other server software lacks.

Configuration and management are different for each package. We found the O'Reilly book (http://www.ora.com) Managing Internet Information Services to be a valuable resource in setting up our WWW servers. The book is an excellent introduction to HTML, with many good examples of configurations. Unfortunately, the book covers only the configuration of the NCSA HTTPD software.

Security

The CERN and NCSA HTTPD packages allow the WWW administrator to configure security. By default, both packages allow anyone to connect to your WWW service. However, you can configure the servers to allow connections only from specific IP addresses (be sure to do this if your WWW service is for internal use only). You can also password protect individual files.
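The rule behind an IP-address restriction is simple: compare each client's 32-bit address against a list of permitted networks. The Python sketch below shows that check using the standard ipaddress module. It is not the configuration syntax of the CERN or NCSA servers (their documentation describes that), and the allowed networks listed are hypothetical examples.

```python
import ipaddress

# Hypothetical internal networks that may use the server.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.40.254.0/24"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def is_allowed(client_address):
    """Return True if the client's IP address lies in an allowed network."""
    address = ipaddress.ip_address(client_address)
    return any(address in network for network in ALLOWED_NETWORKS)


def as_32_bit_number(dotted_quad):
    """Show the 32-bit number behind a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(dotted_quad))


# Each octet occupies 8 bits: (192 << 24) + (40 << 16) + (254 << 8) + 0
print(as_32_bit_number("192.40.254.0"))
print(is_allowed("192.40.254.17"), is_allowed("198.51.100.7"))
```

Checking network membership rather than listing single addresses keeps the rule short even when an internal network has hundreds of machines.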
The MPE WWW Server (http://jazz.external.hp.com/demo.html) includes a demonstration of the NCSA security features.

By default, the CERN and NCSA server software lets individual users publish their own directories of hypertext files. If a URL contains a directory name starting with a tilde (~), the server software looks for a user with that name and then serves files from the public_html directory under that user's home directory.

Writing HTML

Once you have the WWW server software running, you need to create WWW information. WWW documents use the Hypertext Markup Language (HTML). See the HTML description (http://www.robelle.com/www-paper/language.html) earlier in this paper for suggestions and tools for writing HTML.

Be sure to test your files before adding them to your WWW server. We test with at least three different browsers (Lynx, Mosaic, and Netscape). We also run Weblint (http://www.khoros.unm.edu/staff/neilb/weblint.html) on all of our Web documents. Weblint checks for common errors in HTML. While Weblint isn't perfect, it does help produce HTML that is acceptable to the widest range of WWW browsers. Weblint is written in Perl (http://www.cis.ufl.edu/perl), so you must have a working copy of Perl to use it. Perl, short for "Practical Extraction and Report Language", is designed to be more powerful than the shell but easier to use than C.

Host Name

If your WWW server is available on the Internet, it's a good idea to create an alias for the actual computer that hosts your WWW service. Most people choose "www" as the alias name. The alias lets you move the service to a different host without affecting its users.

Robots

WWW servers on the Internet are often visited by robots (http://web.nexor.co.uk/mak/doc/robots/robots.html). Robots usually visit Web sites in order to create indexes of the information that you publish on your WWW server.
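By convention, a well-behaved robot first fetches a file named robots.txt from the top of a site and skips any path that the file disallows for it. The Python sketch below shows that check from the robot's side, using the standard library's urllib.robotparser module. The robots.txt contents (which keep robots out of images, CGI scripts, and forms) and the robot name ExampleBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot ("*") is asked to stay out of
# the image, CGI-script, and form directories; everything else is allowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /forms/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if robots.txt permits this agent to fetch url."""
    return parser.can_fetch(agent, url)
```

A real robot would instead call set_url() with the site's robots.txt address, then call read() to download and parse it before crawling.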
Since robots can cause problems for a WWW server (for example, by requesting many pages in rapid succession or by invoking CGI scripts), it's a good idea to create a robots.txt (http://web.nexor.co.uk/mak/doc/robots/norobots.html) file. This file tells well-behaved robots which parts of your WWW server they should stay out of. You might want to exclude graphical images, CGI scripts, and forms from a robot search, but include all other information about your WWW server.

Internal WWW Servers

If your WWW server will be available only on a Local Area Network, you have more flexibility in your design. Since users will have reasonably fast access to the server, you can make your HTML pages larger. You can also distribute more binary objects, such as graphics, word-processing documents, and spreadsheets. However, you do have to configure each client browser to handle each filename suffix (e.g., you might want to associate ".doc" with Microsoft Word). See the section on External Viewers in the Clients section of this paper for more information.

Maintaining Your WWW Server

Once you have your WWW server working, you need to keep maintaining it. The Web is changing rapidly, so make sure that you obtain newer versions of the HTTPD software from the original source.

All WWW server software can produce log files. If you enable log files (some packages enable them by default and others do not), they usually grow without bound. At Robelle, we make a copy of the current log files once a day and then empty them. We keep the daily copies for approximately 60 days. This lets us provide statistics (http://www.robelle.com/server.html) about our WWW service using the getstats program (http://www.eit.com/software/getstats/getstats.html).

Because more and more users are joining the Internet, you will likely want to keep improving and expanding your WWW information. This is a challenge, since the conversion and authoring tools are not yet well developed.
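The daily log routine described above (copy the current log, empty it, and keep the copies for about 60 days) could be scripted along the following lines. This is a minimal Python sketch, not Robelle's actual procedure. The file names, the date-stamped naming scheme, and the directory layout are all assumptions.

```python
import shutil
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import Optional

KEEP_DAYS = 60  # keep daily copies for approximately 60 days


def rotate_log(log_file: str, archive_dir: str, today: Optional[date] = None) -> Path:
    """Copy log_file to archive_dir as <name>.YYYYMMDD, then empty log_file."""
    today = today or date.today()
    src = Path(log_file)
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{src.name}.{today:%Y%m%d}"
    shutil.copy2(src, dest)
    # Truncate rather than delete: on UNIX, deleting a log the server still
    # has open would leave the server writing to a file with no name.
    with open(src, "r+b") as f:
        f.truncate(0)
    return dest


def prune_archives(archive_dir: str, log_name: str, today: Optional[date] = None) -> list[Path]:
    """Delete dated copies of log_name older than KEEP_DAYS; return them."""
    cutoff = (today or date.today()) - timedelta(days=KEEP_DAYS)
    removed = []
    for path in sorted(Path(archive_dir).glob(f"{log_name}.*")):
        stamp = path.name[len(log_name) + 1:]
        try:
            day = datetime.strptime(stamp, "%Y%m%d").date()
        except ValueError:
            continue  # not one of the dated copies
        if day < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Run rotate_log followed by prune_archives once a day (for example, from cron on HP-UX). Together they leave a rolling window of daily copies for a statistics program to read. Any lines the server writes between the copy and the truncate are lost, which is usually acceptable for statistics.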
At Robelle, we have tried to automate some of the production of our WWW information. For example, when new change notices for Qedit/MPE (http://www.robelle.com/ftp/changes/qeditmpe.txt), Qedit/UX (http://www.robelle.com/ftp/changes/qeditux.txt), Suprtool/MPE (http://www.robelle.com/ftp/changes/suprtool.txt), and Suprtool/UX (http://www.robelle.com/ftp/changes/suprux.txt) are released, they are automatically posted to the Robelle FTP Service (ftp://ftp.robelle.com).

Interesting Places to Visit

The WWW is a huge place. The following are a few personal recommendations for sites that we have found interesting or useful. Your mileage may vary.

* Virtual References. The Web contains links to everywhere. Here are a few sites with excellent reference material.
* Travel Resources. Finding good travel information is a challenge. Here are a few suggestions for WWW travel resources.
* Searching WWW. So much information is available via the WWW that finding the answer to a specific question can be hard. Here are some search engines that can help you search the Web.

Virtual References

Yahoo (http://www.yahoo.com) contains links to many Internet resources organized into subject categories.

If you have ever had trouble finding someone's e-mail address, try the Four11 Directory Services (http://www.four11.com) or WhoWhere? (http://www.whowhere.com). You can also add your own e-mail address and other information about yourself to the Four11 or WhoWhere? directories.

Travel Resources

Curious about a city, a region, or a country? Planning that big trip across Europe or Asia? You might first want to check out one of these travel resources.

We have found the Rec.Travel Library (http://www.solutions.mb.ca/rec-travel) to be useful. The travel library is based on discussions from the rec.travel newsgroup.

O'Reilly and Associates (http://www.ora.com) publishes technical books, especially about UNIX.
O'Reilly was one of the first companies to publish an on-line magazine, The Global Network Navigator (http://gnn.com/GNNhome.html). GNN includes the GNN Travel Center (http://gnn.com/meta/travel/index.html), with current travel information and links to many Internet travel resources.

Internet travel resources tend to be organized into major areas (e.g., Canada and the US, Europe, Asia). You often have to be patient when accessing their indexes, since they cover all countries and cities in an area. Keep in mind that England, Scotland, and Wales are usually indexed under United Kingdom, which comes at the end of any alphabetical listing for Europe.

Searching WWW

Users have invented robots (http://web.nexor.co.uk/mak/doc/robots/robots.html) to search the Web for documents. Since searching the Web takes a long time, these robots usually index everything they find into a database, and a search server provides the tools to query that database. InfoSeek (http://www2.infoseek.com/), Lycos (http://lycos.cs.cmu.edu/), Alta Vista from Digital (http://altavista.digital.com), WebCrawler Search Database (http://webcrawler.com/), and Architext Excite (http://www.excite.com/query.html) are all good examples.

Because these databases index the entire WWW, you usually have to qualify your searches in order to find what you are looking for. For example, if you search for "travel" you will likely get too many matches, but if you search for "travel Alaska" the list may be just what you want. Each database is different, so be sure to try two or three before giving up on your search. MetaCrawler (http://metacrawler.cs.washington.edu:8080/index.html) will search many of the popular search databases at once.

Summary

The World Wide Web demonstrates how powerful client/server computing can be. If you are thinking of implementing client/server computing in your organization, it wouldn't hurt to first take a look at the Web.

A WWW server is an application.
System managers must pay attention to the security and maintenance problems that go with any large application.

Creating Web documents is time-consuming. It took me at least twice as long as I expected to write this paper. I spent much of that time finding and checking the many WWW links, which was a slow process over our 9600-baud connection to the Internet. Tools for creating HTML are still in their infancy, and we expect many new HTML authoring tools to appear in the next year.

It's easy to waste time on the Web, but it is one of the largest and most up-to-date resources available anywhere in the world. Get an Internet connection and a WWW client program, and start surfing!

Bibliography

Here is a short list of books that we have found very useful in understanding the WWW and in creating our own WWW services.

Managing Internet Information Services
Cricket Liu, Jerry Peek, Russ Jones, Bryan Buus, and Adrian Nye
O'Reilly and Associates, Inc.
ISBN: 1-56592-051-1

If you are managing any Internet information services (e.g., ftp, gopher, or WWW), you should get this book. The book includes an excellent primer on writing HTML. There are lots of hints on how to set up your own WWW server and extensive documentation on the NCSA server software for UNIX. The book also includes examples of CGI scripts.

The Mosaic Handbook
Dale Dougherty and Richard Koman
O'Reilly and Associates, Inc.
ISBN: 1-56592-094-5

There are three versions of this book: MS Windows, Macintosh, and X Windows. The book includes a copy of Enhanced Mosaic and a good explanation of the WWW and how clients and servers work together. The chapter Using Mosaic for Multimedia includes a description of MIME types, how to configure them, and some suggestions for external viewers. This section of the book applies to any graphical browser.
Teach Yourself Web Publishing with HTML in a Week
Laura Lemay
SAMS
ISBN: 0-672-30667-0

This book really does what the title says. Here is the description from the author's home page:

"This book describes how to write, design, and publish information on the World Wide Web. In addition to describing the HTML language itself, it provides extensive information on using images, sounds, video, interactivity, gateway programs (CGI), forms, and imagemaps. Through the use of dozens of real-life examples, the book helps you not only learn the technical details of writing Web pages, but also teaches you how to communicate information effectively through the Web."

The Whole Internet User's Guide and Catalog
Ed Krol
O'Reilly and Associates, Inc.
ISBN: 1-56592-063-5

One of the best introductions to the Internet. Ed Krol covers most major Internet services (e.g., ftp and WWW). He also includes references to many useful Internet resources. The appendix Getting Connected to the Internet discusses the different grades of service and provides a list of suggested Internet connection providers.

WWW Pointers

These are the WWW pointers for these books:

* O'Reilly and Associates, Inc. (http://www.ora.com)
* SAMS (http://www.mcp.com/sams)
* Laura Lemay (http://slack.lne.com/lemay/theBook/index.html)
* The Whole Internet User's Guide and Catalog (http://gnn.com/gnn/wic/index.html)