Historical info about internet protocols

Curling Up to Universal Resource Locators
version 0.2 (7 January 1994)
by Eric S. Theise/verve@well.sf.ca.us


INTRODUCTION
"Yeah, you can get it from sumex."
                                               --well meaning net.friend

"When anonymous FTP is enabled, there is a special login name called
anonymous.  If you start ftp, connect to some remote computer, and give
anonymous as your login name, ftp will accept any string as your
password.  It is generally considered good form to use your electronic
mail address as the password ..."
                  --Ed Krol, *The Whole Internet User's Guide & Catalog*


Both of these statements are intended to direct an Internet user -- you
-- to a particular Internet resource.  The first is maddeningly vague.
There's no information about the access scheme, the hostname of the
Internet computer you're supposed to connect to, the directory path to
the file you want, and the name of the file itself.  Even if you know
what and where sumex is, the sheer number of files at that site can
effectively preclude you from finding what you're looking for.

The second statement is wonderful if you're new to the Internet.  It's
informative, patient, and clear, but once you've worked your way to an
intermediate level of experience, you'd like your pointers in more
distilled form.

Nothing extraneous, nothing omitted.  Readable by human or machine.
Simply put, that's the goal of the Uniform Resource Locator (URL), an
Internet resource specification currently under development.

The strongest and earliest push for URLs came from the World-Wide Web
initiative.  Since the Web is chiefly concerned with providing
high-level, automated access to a rich range of Internet services,
compacting the necessary information into something reliably read by a
machine was the driving force.  Those of us arriving on the scene today
are motivated by the need to catalogue online resources in a form that
will not conflict with emerging standards, yet is usable by humans,
often with unsophisticated access to the Internet, e.g., 2400bps dial-up
to a text-only, commercial service, such as Netcom, Delphi, Holonet, or
The WELL.

It's my hope that this document will evolve into something that will be
indispensable for anyone preparing their first URL, and that people
having to decode URLs will find much of value here, too.

Areas where I have questions for the URL community are marked off by
{Question: ...}.


NOTES ON FORMAT
In the following descriptions of URLs, items separated by the pipe
character, |, represent a choice, so that

  void | /path

means that either the field is left blank, or that the file path is
specified.  Items in square brackets, [ and ], are optional and are
specified only when necessary, so that

  [:port]

is replaced by a colon and a numeric value when the port of the resource
being catalogued is not the standard one.

In all cases, spaces have been included for clarity of exposition and
should not be included in the genuine URL.  Examples in each section
should serve as your guide.


FTP (port 21)
The file transfer protocol, ftp, remains the basic way of delivering
text or binary files over the Internet.  Since we're primarily
interested in cataloging publicly available information, we focus mainly
on files available through anonymous ftp, where the user logs in as
anonymous (or ftp) and gives their full e-mail address as the password;
on servers that allow access with the password guest; and on servers
that publish the login/password combination they require.

The URL does not currently address the issue of whether or not a file
is to be transferred as a binary file, leaving it up to the intelligence
of the human or machine to recognize binary file extensions (e.g., .Z,
.gz, .gif, .sea) and issue the appropriate command.
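Since the URL leaves the transfer mode to the client, a cataloguing tool has to guess it.  Here is a minimal Python sketch of that guess, assuming a simple extension lookup; the extension set holds only the four extensions named above and would need extending in practice, and the function name transfer_command is hypothetical.

```python
import posixpath

# Only the binary extensions mentioned in the text; a real tool needs more.
BINARY_EXTENSIONS = {".z", ".gz", ".gif", ".sea"}


def transfer_command(path: str) -> str:
    """Guess which ftp command ("binary" or "ascii") to issue before
    fetching `path`.  This is a heuristic; the URL itself says nothing
    about the transfer mode."""
    _, ext = posixpath.splitext(path)
    # Compare case-insensitively so .Z and .GIF are also caught.
    return "binary" if ext.lower() in BINARY_EXTENSIONS else "ascii"


print(transfer_command("/pub/et/alien.visitors"))  # ascii
print(transfer_command("/pub/mac/example.sea"))    # binary (hypothetical file)
```

A file with no recognised extension falls back to ascii, which is the safer mistake for text but will corrupt an unlabelled binary file.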

The full URL specification for an ftp session is:

  ftp:// [user [:password] @] host [:port] void | /path

Examples:
  You want to catalog the file folklore-faq located in the
  /pub/usenet-by-group/news.announce.newusers directory of rtfm.mit.edu,
  available via anonymous ftp.  Since this is a standard use of
  anonymous ftp (port 21, login: anonymous, password: e-mail-address),
  the URL is:

  ftp://rtfm.mit.edu/pub/usenet-by-group/news.announce.newusers/folklore-faq

  You want to catalog the file alien.visitors located in the
  /pub/et directory of martians.org.  Their systems administrators are
  new to the planet, and don't know about Internet conventions.  They
  run ftp on port 666, and require people to use login: humanoid,
  password: visitor.  The URL is:

  ftp://humanoid:visitor@martians.org:666/pub/et/alien.visitors

  Note that this is a completely fictitious example; I have never come
  across an anonymous ftp site running on an alternative port, or one
  that wishes its contents to be public while requiring some nonstandard
  login/password combination.
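The ftp specification above can be read as a recipe: add only the pieces that differ from the anonymous, port-21 defaults.  A minimal Python sketch under that reading follows; the function name ftp_url is hypothetical, and it does no %-escaping, so a password containing ":" or "@" would produce a broken URL.

```python
from typing import Optional


def ftp_url(host: str, path: str = "", user: Optional[str] = None,
            password: Optional[str] = None, port: int = 21) -> str:
    """Assemble ftp:// [user [:password] @] host [:port] void | /path.

    Anonymous access (user is None) and the standard port 21 are left
    implicit, as in the rtfm.mit.edu example.
    """
    url = "ftp://"
    if user is not None:
        url += user
        if password is not None:
            url += ":" + password
        url += "@"
    url += host
    if port != 21:  # only a nonstandard port is written out
        url += f":{port}"
    if path:        # "void" case: no path at all
        url += path if path.startswith("/") else "/" + path
    return url


print(ftp_url("martians.org", "/pub/et/alien.visitors",
              user="humanoid", password="visitor", port=666))
# ftp://humanoid:visitor@martians.org:666/pub/et/alien.visitors
```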


TELNET (port 23)
Telnet is the primary way of establishing connections to remote
computers over the Internet.  The standard port for telnet is 23,
although a number of specialized servers use different ports for
different services. Similarly, many hosts require well-publicized
login/password combinations to access special services.

The full URL specification for a telnet session is:

  telnet:// [user [:password] @] host [:port]

Examples:
  You want to catalog the InterNIC, a central repository of
  information about the Internet, offering whois, wais, gopher, x.500
  addressing, and other lookup services.  Since the InterNIC runs on the
  standard telnet port and requires no login, the URL is:

  telnet://rs.internic.net

  You want to catalog the University of Michigan's Weather
  Underground.  It requires no login, but it does run on the
  non-standard port 3000.  The URL is:

  telnet://madlab.sprl.umich.edu:3000

  You want to catalog the American Type Culture Collection.  It runs
  on the standard telnet port 23, but requires the login: search,
  password: common, before you can get in.  This information is widely
  known (e.g., it's listed in Scott Yanoff's Special Internet
  Connections).  The URL is:

  telnet://search:common@atcc.nih.gov
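Going the other way, a program reading these telnet URLs needs to recover the login, password, host, and port, falling back to port 23.  A minimal Python sketch, assuming the standard library's urllib.parse.urlsplit (which splits user:password@host:port for any scheme written with //); the function name telnet_target is hypothetical.

```python
from urllib.parse import urlsplit


def telnet_target(url: str):
    """Return (user, password, host, port) for a telnet: URL.

    Missing user or password come back as None; a missing port means the
    standard telnet port, 23.
    """
    parts = urlsplit(url)
    if parts.scheme != "telnet":
        raise ValueError("not a telnet URL: " + url)
    return parts.username, parts.password, parts.hostname, parts.port or 23


print(telnet_target("telnet://madlab.sprl.umich.edu:3000"))
# (None, None, 'madlab.sprl.umich.edu', 3000)
print(telnet_target("telnet://search:common@atcc.nih.gov"))
# ('search', 'common', 'atcc.nih.gov', 23)
```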


{Question: the URL Internet Draft has a Specific Scheme paragraph
devoted to telnet, rlogin, and tn3270.  I do not see a need to address
rlogin, but given the existence of numerous Internet Libraries requiring
tn3270 connections, I wonder if there should be a tn3270: URL?}


MAIL/SMTP (port 25)
E-mail is the most universal service provided by the Internet and other
systems in the Matrix (e.g., BITNET, FidoNet, uucp, and commercial
services such as CompuServe, Prodigy, America OnLine, MCIMail, GEnie,
and others).  Because mail is handled by the user's machine and not
through a client/server interaction with a remote machine, the // that
typically separates the scheme from the address is not included in the
mailto URL.

The full URL specification for a mail message is:

  mailto:user@host

Examples:

  You've come across this entry in the List of Lists.

  386USERS@UDEL.EDU

       A moderated list for Intel 80386 topics, including hardware and
       software questions, reviews, rumors, etc.  Open to owners, users,
       prospective users, and the merely curious.

       Archives are available via an electronic mail server.  Details about
       its use can be obtained by sending a request to
       386USERS-REQUEST@UDEL.EDU.  All requests to be added to or deleted
       from this list, problems, questions, etc., should be sent to
       386USERS-REQUEST@UDEL.EDU.

          List Maintainer: James Galvin  <galvin@udel.edu>
          List Moderator:  Bill Davidsen <davidsen@crdos1.uucp>

  The URL for subscribing to the list is:

  mailto:386users-request@udel.edu

  The URL for participating in list discussions is:

  mailto:386users@udel.edu

  The URL for the list moderator is:

  mailto:davidsen@crdos1.uucp

  and the URL for the list maintainer is:

  mailto:galvin@udel.edu
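The List of Lists entry follows the common Internet convention of a LISTNAME-request address for administrative mail.  A minimal Python sketch, assuming that convention holds (listservs use a different one, as the question below notes): it pulls the address out of a mailto: URL, which has no // after the colon, and derives the request URL.  The helper names are hypothetical.

```python
from urllib.parse import urlsplit


def mailto_address(url: str) -> str:
    """Return the user@host carried by a mailto: URL."""
    parts = urlsplit(url)
    # mailto: has no // and therefore no network-location part.
    if parts.scheme != "mailto" or parts.netloc:
        raise ValueError("expected mailto:user@host, got " + url)
    return parts.path


def request_url(list_url: str) -> str:
    """Hypothetical helper: the LISTNAME-request URL for a list."""
    user, host = mailto_address(list_url).split("@", 1)
    return f"mailto:{user}-request@{host}"


print(mailto_address("mailto:386users@udel.edu"))  # 386users@udel.edu
print(request_url("mailto:386users@udel.edu"))     # mailto:386users-request@udel.edu
```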

{Question: information is scarce on the mailto: URL.  Those of us in the
cataloging business would like to see ways to encode common instructions
for using mail services into the URL, such as the common

    SUB LISTNAME-L Firstname Lastname

convention for listservs.}


GOPHER (port 70)
Gopher is one of the simplest Internet services to use, but one of the
more complicated ones to create URLs for.  The good news is that current
gopher clients will display a resource's URL for you when you use their
"Get Information" command.

  Macintosh clients: command i
       unix clients: =
       NeXT clients: command i
         PC clients: choose "Item Inspector" from the menu

The full URL specification for a gopher session is:

  gopher://host [:port] [/gophertype [selector] ][? search]

The standard gopher port is 70.  You have to specify the port only if
the gopher is running on a nonstandard port.  However, if you
cut-and-paste your URLs, there is no need to delete the ":70" port
information.

Standard gophertypes include:

  0: text file
  1: directory
  2: CSO/qi phone book server
  3: error
  4: Macintosh .hqx/BinHex file
  5: DOS binary archive file
  6: uuencoded file
  7: index/search server
  8: telnet session; use the telnet URL described above
  9: binary file
  T: tn3270 session

Experimental gophertypes include:

  s: sound file
  g: GIF file
  M: MIME (multipurpose internet mail extensions) file
  h: html (hypertext markup language) file
  I: image file
  i: inline text type (used by panda; I don't know what this is)

The selector is the string used to give the path to a particular area of
a gopher.  It isn't needed if you're cataloging an entire gopher.  Note
that the selector and the sequence of menu choices usually bear some
resemblance to each other, but typically they are not the same.  The
selector is the definitive path to the resource.  Because the selector
itself usually begins with the gophertype character, and the URL puts
the gophertype in front of it, most gopher: URLs look like they repeat
themselves.

Examples:
  You want to catalog the University of Minnesota's Mother of all
  Gophers.  You want the whole thing, and it's a standard gopher.  The
  URL is:

  gopher://gopher.micro.umn.edu

  The longer

  gopher://gopher.micro.umn.edu:70

  is also perfectly acceptable.

  You want to catalog the Civic Nets, Community Nets, Free-Nets, and
  ToasterNets section of the WELLgopher.  It's a standard gopher, but
  you only want one section of it, so the selector is important.  The
  URL is:

  gopher://gopher.well.sf.ca.us/11/Community/communets

  Remember that the first '1' in the '11' is the directory gophertype,
  and the second '1' is actually part of the selector,
  1/Community/communets.

  You want to catalog the veronica server at the University of
  Manitoba.  It has the search type, 7, and runs on the nonstandard port
  2347.  The URL is:

  gopher://gopher.umanitoba.ca:2347/7
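To see how the gophertype and selector come apart, here is a minimal Python sketch of a gopher: URL parser.  It assumes the selector contains no literal "?" or "#" characters and no %-escapes (it decodes nothing); the type table is copied from the standard gophertypes listed above, and the names are hypothetical.

```python
from urllib.parse import urlsplit

# The standard gophertypes from the table above.
GOPHER_TYPES = {
    "0": "text file", "1": "directory", "2": "CSO/qi phone book server",
    "3": "error", "4": "Macintosh .hqx/BinHex file",
    "5": "DOS binary archive file", "6": "uuencoded file",
    "7": "index/search server", "8": "telnet session",
    "9": "binary file", "T": "tn3270 session",
}


def parse_gopher(url: str):
    """Split a gopher: URL into (host, port, gophertype, selector, search).

    The first character after the slash is the gophertype and the rest of
    the path is the selector.  With no path, the URL names the whole gopher:
    its top-level directory (type 1, empty selector).  Port defaults to 70.
    """
    parts = urlsplit(url)
    path = parts.path
    gtype = path[1] if len(path) > 1 else "1"
    selector = path[2:]
    return parts.hostname, parts.port or 70, gtype, selector, parts.query


print(parse_gopher("gopher://gopher.well.sf.ca.us/11/Community/communets"))
# ('gopher.well.sf.ca.us', 70, '1', '1/Community/communets', '')
print(GOPHER_TYPES[parse_gopher("gopher://gopher.umanitoba.ca:2347/7")[2]])
# index/search server
```

Because the port falls back to 70, gopher://gopher.micro.umn.edu and gopher://gopher.micro.umn.edu:70 parse to the same tuple.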

Here are two examples of gopher items whose correct URL is not a gopher
URL.

  Type=1
  Name=ANS CO+RE Systems, Inc. (US and Int'l)
  Path=ftp:ftp.ans.net@/pub/info/
  Host=gopher.cic.net
  Port=70
  URL: ftp://ftp.ans.net/pub/info/

  The definition of a gopher directory is broad enough that a gopher can
  point to an ftp site.  That's what is happening here, and the URL
  given by the "Get Information" command is correct as is.

  Type=8
  Name=NYSERNet (NY)
  Path=nysernet
  Host=nysernet.org
  Port=23
  URL: gopher://nysernet.org:23/8nysernet

  There are two things to note here.  Type=8 indicates this is a
  standard telnet session, and this is confirmed by the Port=23 line.  If
  you were to try this gopher item, you'd see the telnet indicator
  (<TEL> in the unix client), be told that you were leaving gopher, and
  that you should log in using the name "nysernet", which is taken from
  the Path=nysernet line.  The correct URL for this entry is:

  telnet://nysernet@nysernet.org
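The two conversions just described are mechanical enough to automate.  A minimal Python sketch, assuming the "Get Information" fields have been read into a dictionary of strings keyed Type, Path, Host, and Port; it handles only the two cases shown (an ftp:HOST@/DIR path on a directory item, and a type-8 telnet item) and otherwise builds an ordinary gopher: URL.  The function name is hypothetical.

```python
def item_to_url(item: dict) -> str:
    """Turn a gopher "Get Information" record into the URL to catalogue."""
    gtype, path = item["Type"], item["Path"]
    host, port = item["Host"], item["Port"]
    # Case 1: a gopher directory that points at an ftp site,
    # Path=ftp:HOST@/DIR  ->  ftp://HOST/DIR
    if gtype == "1" and path.startswith("ftp:") and "@" in path:
        ftp_host, ftp_dir = path[len("ftp:"):].split("@", 1)
        return f"ftp://{ftp_host}{ftp_dir}"
    # Case 2: a telnet item; Path holds the login name, if any.
    if gtype == "8":
        login = f"{path}@" if path else ""
        port_part = "" if port == "23" else f":{port}"
        return f"telnet://{login}{host}{port_part}"
    # Otherwise: an ordinary gopher item, gophertype then selector.
    port_part = "" if port == "70" else f":{port}"
    return f"gopher://{host}{port_part}/{gtype}{path}"


print(item_to_url({"Type": "8", "Path": "nysernet",
                   "Host": "nysernet.org", "Port": "23"}))
# telnet://nysernet@nysernet.org
```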


{Question: there is no finger: URL.  Because a fair amount of useful
information is only available this way, I think there should be one.

FINGER (port 79)
Although finger was originally intended as a way to access information
about users (time and date of last login, personal information) at local
or remote sites, many novel uses of finger have appeared on the
Internet.  These include election reports, random quizzes, earthquake
and weather information, and the infamous reports from appliances such
as vending machines.  When used without a userid, finger often supplies a
list of presently logged-in users.

At present there is no finger: URL.  If there were, the full URL
specification for a finger request might be:

  finger:// [user] @host [:port]

Example:
  You want to illustrate how quickly the Internet community responds in
  the face of network or personal emergency.  When Brendan Kehoe, author
  of *Zen and the Art of the Internet*, was critically injured in an auto
  accident, Cygnus Support made a finger address available for getting
  up-to-date information on his condition.  The URL would be:

  finger://brendan-news@cygnus.com
}
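Since the finger: URL is only a proposal, the following is a minimal Python sketch of what a client for the proposed form might do, not an existing convention.  It assumes the finger query format of RFC 1288 (the user name, or nothing, followed by CRLF, sent to port 79) and that the server closes the connection after replying.  The function names are hypothetical, and finger() needs network access.

```python
import socket
from urllib.parse import urlsplit


def parse_finger(url: str):
    """Split the proposed finger://[user]@host[:port] into (user, host, port)."""
    parts = urlsplit(url)
    if parts.scheme != "finger":
        raise ValueError("not a finger URL: " + url)
    # An empty user asks for the list of logged-in users.
    return parts.username or "", parts.hostname, parts.port or 79


def finger_request(user: str) -> bytes:
    """The query line a finger client sends: the user name, then CRLF."""
    return user.encode("ascii") + b"\r\n"


def finger(url: str, timeout: float = 10.0) -> str:
    """Send the query and read the reply until the server closes."""
    user, host, port = parse_finger(url)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(finger_request(user))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")


print(parse_finger("finger://brendan-news@cygnus.com"))
# ('brendan-news', 'cygnus.com', 79)
```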


HTTP (port 80)
Http stands for hypertext transfer protocol.  It is an increasingly
common Internet service due to the growing popularity of the World-Wide
Web and of Web clients such as Mosaic and Cello.

The full URL specification for an http session is:

  http://host [:port] [/path] [? search]

Example:
  You want to catalog the FBI's information page for the UNABOM
  unsolved bombing case.  It's a standard html (hypertext markup
  language) file on a server using the standard port.  The URL is:

  http://naic.nasa.gov/fbi/FBI_homepage.html
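To show how a client turns such a URL into a connection, here is a minimal Python sketch.  It assumes an HTTP/1.0-style request line (the form later standardized in RFC 1945); a real client would send the line followed by CRLF, any headers, and a blank line.  The function name is hypothetical.

```python
from urllib.parse import urlsplit


def http_request(url: str):
    """Map an http: URL to (host, port, request line).

    Port defaults to 80; an empty path becomes "/"; a search part is
    passed along after "?".
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.hostname, parts.port or 80, f"GET {target} HTTP/1.0"


print(http_request("http://naic.nasa.gov/fbi/FBI_homepage.html"))
# ('naic.nasa.gov', 80, 'GET /fbi/FBI_homepage.html HTTP/1.0')
```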


NEWS/NNTP (port 119)
Network news, aka USENET, is like e-mail in that it passes easily across
network boundaries.  Like mail, news is typically accessed from a local
rather than a remote server, and for this reason the // that usually
separates the scheme from the address is not included in the news URL.

Although the news URL allows you to specify an article's unique message
identifier, this is rarely used.  Two primary reasons for this are that
there is no established protocol for accessing archived articles, and
that many of the more important articles, such as FAQs, are reissued
periodically.

The full URL specification for news is:

  news: * | newsgroup | message_identifier@host

{Question: is "host" appropriate here?  The URL Internet Draft says that
"News host names are NOT part of news URLs." (p. 9)}

Example:
  You want to catalog the newsgroup, sci.virtual-worlds, which is one of
  the primary sources of information and discussion about all aspects of
  virtual reality.  The URL is:

  news:sci.virtual-worlds
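The three alternatives in the news: specification can be told apart by their shape alone.  A minimal Python sketch, assuming that only message identifiers contain "@" (newsgroup names do not); the function name and the message identifier in the usage line are hypothetical.

```python
def classify_news(url: str):
    """Classify a news: URL as ("all", None), ("article", id), or ("group", name)."""
    if not url.startswith("news:"):
        raise ValueError("not a news URL: " + url)
    rest = url[len("news:"):]  # no // follows the colon
    if rest == "*":
        return ("all", None)
    if "@" in rest:            # message identifiers contain "@"
        return ("article", rest)
    return ("group", rest)


print(classify_news("news:sci.virtual-worlds"))  # ('group', 'sci.virtual-worlds')
print(classify_news("news:1234@well.sf.ca.us"))  # ('article', '1234@well.sf.ca.us')
```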


PROSPERO (port 191)
The full URL specification for a prospero session is:

  prospero://host [:port] /path [% 0 0 version [attributes] ]

{Question: I have no direct experience with prospero; could someone
supply me with an illustrative example?}


WAIS/Z39.50 (port 210)
Wais stands for Wide Area Information Servers, and refers to a type of
distributed database search that has become popular on the Internet over
the past few years.

The full URL specification for a wais search is:

  wais://host [:port] /database [? search]

Example:
  You want to catalog the k-12-software archive offered for wais
  search.  The URL is:

  wais://info.curtin.edu.au/k-12-software.src

{Question: I always use wais via telnet to wais.com; can somebody verify
the format of this URL?}
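For whoever checks the wais syntax, here is a minimal Python sketch of reading it as written above: host, port (assumed to default to 210), database, and an optional search.  The ".src" suffix is kept exactly as in the example, since whether it belongs in the URL is part of the open question; the function name and the search term in the test are hypothetical.

```python
from urllib.parse import urlsplit


def parse_wais(url: str):
    """Split wais://host[:port]/database[?search] into its four parts."""
    parts = urlsplit(url)
    if parts.scheme != "wais":
        raise ValueError("not a wais URL: " + url)
    database = parts.path.lstrip("/")
    return parts.hostname, parts.port or 210, database, parts.query


print(parse_wais("wais://info.curtin.edu.au/k-12-software.src"))
# ('info.curtin.edu.au', 210, 'k-12-software.src', '')
```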

{Question: I have left the x500: and whois: URLs out of this version
since the Internet Draft highlights them as subjects for future study.
Does anyone have more current information?}


REFERENCES
Marc Andreessen, "A Beginner's Guide to URL's".
<ftp://ftp.ncsa.uiuc.edu/web/mosaic-papers/url-primer.ps.z>.


Tim Berners-Lee, "Uniform Resource Locators: a unifying syntax for the
expression of names and addresses of objects on the network", Internet
Draft Version 7 (14 October 1993).
<gopher://rusmv1.rus.uni-stuttgart.de/00/software/ftp_server/stgt/org/>.



DISCLAIMER
This document will hopefully undergo rapid change in the first few weeks
of its existence.  Even in its present, rickety form, I'd appreciate it
if you keep it intact.

This document has benefited from discussions with Dirk Herr-Hoyman,
David Robison, and Larry Masinter.  Errors and omissions are mine.
Let me know when you find them, and please suggest ways to make this
document more useful.

The definitive source for this document is
    gopher://gopher.well.sf.ca.us/00/matrix/internet/curling.up.02

--
  Eric S. Theise <verve@well.sf.ca.us>
  P.O. Box 460177, San Francisco, CA 94146.0177
  Internet Domain Editor, Millennium Whole Earth Catalog
  The WELL: internet, matrix, & news conference host + gophermeister
