The Language of the Web

In order to use the WWW, you must know something about the language used to communicate on the Web. There are three main components to this language:
Uniform Resource Locators (URLs)

URLs provide the hypertext links between one document and another. These links can access a variety of protocols (e.g., ftp, gopher, or http) on different machines (or your own machine).

Hypertext Markup Language (HTML)

WWW documents contain a mixture of directives (markup), and text or graphics. The markup directives do such things as make a word appear in bold type. This is similar to the way UNIX users write nroff or troff documents, and MPE users write with Galley, TDP, or Prose. For PC users, this is completely different from WYSIWYG editing. However, a number of tools are now available on the market that hide the actual HTML.

Common Gateway Interface (CGI)

Servers use CGI to execute local programs. CGI programs provide a gateway between the HTTP server software and the host machine.


Uniform Resource Locators (URLs)

Uniform Resource Locators (URLs) specify the access-method (how), the server name (where), and the location (what) needed for a WWW client to find and access a WWW object. The general form of a URL is
        access-method://server-name[:port]/location 
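To make the three parts concrete, here is a minimal Python sketch that splits a URL into access-method, server name, optional port, and location using the standard `urllib.parse.urlsplit` function. The helper name `url_parts` and the sample URL are hypothetical, chosen only for illustration; the sketch also ignores query strings and fragments, which some locations (such as server scripts) may carry.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Hypothetical helper: split a URL of the form
    access-method://server-name[:port]/location into its pieces."""
    parts = urlsplit(url)
    return {
        "access_method": parts.scheme,         # how: "http", "gopher", "ftp", ...
        "server_name": parts.hostname,         # where: IP host name or IP address
        "port": parts.port,                    # None when the URL gives no :port
        "location": parts.path.lstrip("/"),    # what: file, directory, or script
    }


if __name__ == "__main__":
    # Hypothetical URL with an explicit port and a document name.
    print(url_parts("http://www.robelle.com:8080/docs/intro.html"))
```

An empty location (for example `http://www.robelle.com/`) means no document name was given, which is the welcome-page case described below.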

Access Methods

The three most popular access methods are:
http:

This is the Hypertext Transfer Protocol, the method provided by WWW servers. It supports hypertext linking, documents written in the Hypertext Markup Language, and server scripts.

gopher:

Gopher was developed at the University of Minnesota as a distributed campus information service. There are gopher servers everywhere -- many of them provide campus-wide information systems. Gopher information is organized into menus. Because hypertext provides the same services as gopher and more, many sites are moving from gopher-supplied information to WWW-supplied information.

ftp:

The File Transfer Protocol is one of the oldest and most popular of all Internet services. You can access millions of files, documentation, source code, and other useful objects on anonymous FTP archives. You can use a WWW browser to view and to retrieve information from FTP archives.

Server Name

The server name is an IP host name or an IP address. WWW servers often start with the name "www" as in www.robelle.com or www.mayfield.hp.com. The port number is usually not needed. If there are many servers on one machine (e.g., two different WWW servers on the same host), you would use a port number to select one of them. By default, WWW servers are on port 80. Other protocols have different ports (e.g., the default for FTP is 21). Most users never need to know about port numbers.
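The port rule can be expressed in a few lines: an explicit `:port` in the URL wins, and otherwise the client falls back to the well-known default for the access method. This is a minimal sketch; the `DEFAULT_PORTS` table covers only the three access methods discussed here (gopher's standard port is 70), and the function name `effective_port` is hypothetical.

```python
from urllib.parse import urlsplit

# Well-known default ports for the access methods discussed above.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def effective_port(url):
    """Hypothetical helper: return the port a client would connect to."""
    parts = urlsplit(url)
    if parts.port is not None:
        # An explicit :port selects one of several servers on the same host.
        return parts.port
    # No port given: use the default for this access method
    # (raises KeyError for schemes not in the table).
    return DEFAULT_PORTS[parts.scheme]
```

For example, `effective_port("http://www.robelle.com/")` gives 80, while a hypothetical second server on the same host at `http://www.robelle.com:8001/` is reached on port 8001.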

Welcome Page

Most WWW servers provide a welcome or home page. This is the document that you see if you specify a machine name, but not a document name (see all the examples above under "Server Name"). Good WWW welcome pages provide a short description of the information the WWW server provides, as well as links to all the other information available on the server. The welcome page must be explicitly configured for each WWW server. If you access a WWW server without giving a document name, and receive the error message "no document found", you should try one of the following common document names: welcome.html, index.html, or default.html.
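On the server side, "configuring" a welcome page usually amounts to giving the server a list of document names to look for when a request names a directory. The sketch below shows that idea in Python under stated assumptions: the document root layout, the order of `WELCOME_NAMES`, and the function `resolve_request` are all hypothetical and do not describe any particular server's configuration. It also performs no checks for `..` in the location, which a real server must do.

```python
from pathlib import Path

# Hypothetical configuration: welcome-page names to try, in order.
WELCOME_NAMES = ("welcome.html", "index.html", "default.html")


def resolve_request(doc_root, location):
    """Map a URL location to a file under doc_root, falling back to a
    welcome page when the location names a directory. Returns None for
    the "no document found" case. (Sketch only: no '..' checks.)"""
    target = Path(doc_root) / location.lstrip("/")
    if target.is_dir():
        # No document name given: try each configured welcome name.
        for name in WELCOME_NAMES:
            candidate = target / name
            if candidate.is_file():
                return candidate
        return None
    return target if target.is_file() else None
```

A server whose list did not include the name you used for your welcome page would report "no document found", which is why trying the common names by hand often works.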

Location

The location can be a filename, a directory, a directory and filename, a server-script name, or something specific to the access-method. Filenames and directory structure often change, so don't be surprised if a URL that worked a few months ago no longer works now.


Hypertext Markup Language (HTML)

When you write documents for the WWW, you use the Hypertext Markup Language (HTML). In a markup language, you mix your text with markup tags that indicate how formatting is to take place. Most WWW browsers have a "View Source" option that shows you the HTML for the document you are currently viewing.
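To see the difference between markup and text, the following sketch feeds a small HTML fragment to Python's standard `html.parser.HTMLParser` and collects the tags and the text separately. The class name `MarkupSplitter` and the sample fragment are hypothetical; a browser does the same kind of separation before deciding how to display each piece.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Hypothetical helper: separate the markup (tags) from the text."""

    def __init__(self):
        super().__init__()
        self.tags = []   # names of opening tags, in document order
        self.text = []   # non-blank text runs, whitespace trimmed

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


if __name__ == "__main__":
    parser = MarkupSplitter()
    parser.feed("<p>This word is <b>bold</b>.</p>")
    parser.close()
    print(parser.tags)  # ['p', 'b']
    print(parser.text)  # ['This word is', 'bold', '.']
```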

Each WWW browser renders HTML in its own way. Character-mode browsers use terminal highlights (e.g., inverse video, dim, or underline) to show links, bold, italics, and so on. Graphical browsers use different typefaces, colors, and bold and italic formats to display different HTML marks. Writers have to remember that each browser in effect has its own HTML style sheet. For example, Lynx and Mosaic do not insert a blank line before unnumbered (bulleted) lists, but Netscape does.

If you want to see how your browser handles standard and non-standard HTML, try the WWW Test Pattern. The test pattern will show differences between your browser, standard HTML, and other browsers.

Creating HTML

Creating HTML is awkward, but not that difficult. The most common method of creating HTML is to write the raw markup language using a standard text editor. If you are creating HTML yourself, we have found the chapter Authoring for the Web in the O'Reilly book "Managing Internet Information Services" to be an excellent resource. You might also find the HTML Quick Reference to be useful.

Bob Green, founder of Robelle, finds HTML Writer to be useful for learning HTML. Instead of hiding the HTML tags, HTML Writer provides menus with all of the HTML elements and inserts these into a text window. To see how your documents look, you must use a separate Web browser.

If you don't want to deal directly with HTML, you can get a WYSIWYG HTML editor. On the PC, we have tried HoTMetal and the Microsoft Word Internet add-on. HoTMetal is produced by SoftQuad. There is a free version, which we found somewhat unreliable, and a professional version. HoTMetal probably works best if you are writing HTML documents from scratch (we tried to edit existing documents, some of which may have had invalid HTML).

Microsoft has produced a new add-on to Microsoft Word that produces HTML. The Internet Assistant is available from Microsoft at no charge. You will need to know the basic concepts of Microsoft Word to take advantage of the Internet Assistant. Since we are not experienced Microsoft Word users, we found that the Internet Assistant didn't help us much.

The HTML area of the WWW is changing quickly. Users do not want to go back to ASCII text editing after they've used WYSIWYG editors for the last several years. The Web itself carries a list of WYSIWYG HTML editors for a variety of operating systems.


Common Gateway Interface (CGI)

The Common Gateway Interface (CGI) provides a method for WWW servers to invoke other programs. You can write these programs with any tool or language. They usually return HTML as their output. The Robelle WWW server statistics are provided by a CGI script that runs the getstats program.
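As an illustration of the CGI contract, here is a minimal CGI program written in Python. It is a sketch, not the Robelle getstats script: the server passes request details in environment variables such as `REQUEST_METHOD` and `QUERY_STRING`, and the program writes a `Content-Type` header, a blank line, and then its HTML on standard output. The function name `render_page` and the page content are hypothetical.

```python
#!/usr/bin/env python3
import html
import os
import sys


def render_page(environ):
    """Hypothetical helper: build a complete CGI response (header + HTML)."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    body = (
        "<html><head><title>CGI example</title></head><body>\n"
        # Escape anything taken from the request before echoing it as HTML.
        f"<p>Request method: {html.escape(method)}</p>\n"
        f"<p>Query string: {html.escape(query)}</p>\n"
        "</body></html>\n"
    )
    # A CGI response starts with header lines, then a blank line.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The HTTP server sets the environment and relays stdout to the browser.
    sys.stdout.write(render_page(os.environ))
```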

Forms

The WWW supports simple forms with text boxes, radio buttons, and pull-down lists. Forms are processed by CGI scripts.
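When a user submits a form, the browser encodes the fields as `name=value` pairs. With the GET method they arrive in `QUERY_STRING`; with POST they arrive on standard input, with the byte count in `CONTENT_LENGTH`. The sketch below decodes either case with the standard `urllib.parse.parse_qs` function. The function name `read_form` and the field names in the example are hypothetical. Reading `CONTENT_LENGTH` characters from a text stream is safe here only because URL-encoded form data is plain ASCII.

```python
import os
import sys
from urllib.parse import parse_qs


def read_form(environ, stdin):
    """Hypothetical helper: return submitted form fields as
    a dict of field name -> list of values."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: the encoded fields are on stdin, CONTENT_LENGTH bytes long.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        encoded = stdin.read(length)
    else:
        # GET: the encoded fields follow the "?" in the URL.
        encoded = environ.get("QUERY_STRING", "")
    # Keep empty text boxes so the script can tell they were submitted.
    return parse_qs(encoded, keep_blank_values=True)


if __name__ == "__main__":
    print(read_form(os.environ, sys.stdin))
```

For example, a form with a text box `name`, a radio button group `os`, and an empty text box `comment` might arrive as `name=Jane+Doe&os=MPE&comment=`, which decodes to `{'name': ['Jane Doe'], 'os': ['MPE'], 'comment': ['']}`.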