This paper is a brief introduction to the new HTTP/1.1 protocol. It does not attempt to be comprehensive, but discusses some of the most notable features of the specification.
The original protocol, now known as HTTP/0.9, was extremely simple: the client sent a request such as GET /index.html to the server, and the server replied with the contents of the file, closing the connection at the end of the file.
This simple protocol had many drawbacks. For one, all documents were assumed to be HTML text; the server had no way of sending any metadata. It was quickly superseded by HTTP/1.0 [2], which introduced additional header fields in both the request and the reply. For example, a typical transaction might be:
    GET /index.html HTTP/1.0
    <blank line>
and the server would reply:
    HTTP/1.0 200 OK
    Date: Tue, 04 Feb 1997 23:11:26 GMT
    Content-Type: text/html
    Content-Length: 1331
    <blank line>
    ...contents of document...
    <connection closed>
Actually, servers and browsers implemented many of the features in HTTP/1.0 before it became a standard. This was due to the unprecedented growth in demand for the Web and Web services. The standardisation was done a posteriori.
The most important changes between HTTP/1.0 and HTTP/1.1 are explained in the sections that follow.
When HTTP was invented at CERN, nobody could have predicted the popularity the Web would achieve. In 1995, on the Merit network, HTTP became the leading IP protocol by request volume, overtaking both FTP and the combination of all other TCP/UDP protocols, including SMTP (mail). It is highly likely that these findings extrapolate well to the rest of the Internet. Therefore any improvements to HTTP will help ease the demand on network resources (file descriptors, buffers, control blocks, etc.) along the transmission path.
Many proposals were discussed in forums such as the WWW-talk list. The features implemented in HTTP/1.1 that are directed towards making Internet usage more efficient are persistent connections, chunked encoding, byte ranges and caching. In addition, the non-IP virtual hosting feature will ease the demand on IP addresses.
With persistent connections it becomes necessary to delimit each transfer, so the Content-Length header is no longer optional. However, there are circumstances where the server cannot know the length of the data ahead of time: the data may be dynamically generated and the server cannot, or does not wish to, buffer it to find the length; or the data may be of unbounded length, for example a stream of continuous data refreshing the display at regular intervals. In such cases the server can use the new chunked transfer encoding. This method sends the response in pieces, each chunk preceded by its size, with a zero-length chunk marking the end of the body.
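The chunked framing can be sketched in a few lines of Python. This is an illustrative sketch, not a production encoder; the function name and the sample pieces are invented for the example:

```python
def encode_chunked(pieces):
    """Encode an iterable of byte strings as an HTTP/1.1 chunked body.

    Each chunk is prefixed with its size in hexadecimal, followed by
    CRLF; a zero-length chunk marks the end of the body.
    """
    out = b""
    for piece in pieces:
        if piece:  # a zero-length piece would terminate the body early
            out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    out += b"0\r\n\r\n"  # terminating chunk (no trailers)
    return out

body = encode_chunked([b"Hello, ", b"world!"])
# -> b"7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n"
```

Because each chunk carries its own length, the server can begin transmitting before the total size is known.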
Persistent connections have another advantage. Clients can take advantage of pipelining, that is, sending the next request immediately after the previous one, without waiting for the previous reply to arrive in full. The server is still required to service the requests in order. This saves round-trip times and TCP packets, and is of great benefit over high-latency links. A more detailed discussion of persistent connections is in Section 3 of this paper.
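Pipelining amounts to concatenating several requests and writing them to one connection in a single burst. The following sketch (host name and paths are hypothetical) shows what the client would put on the wire:

```python
def pipelined_requests(host, paths):
    """Concatenate several GET requests so they can be written to one
    persistent connection without waiting for intermediate replies.
    The server must send the replies back in the same order."""
    reqs = []
    for path in paths:
        reqs.append(
            "GET %s HTTP/1.1\r\n"
            "Host: %s\r\n"
            "\r\n" % (path, host)
        )
    return "".join(reqs).encode("ascii")

data = pipelined_requests("www.example.com", ["/index.html", "/logo.gif"])
# The client writes `data` in one go, saving a round trip per request.
```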
Persistent connections are activated only when the client indicates that it can handle HTTP/1.1 replies by sending a HTTP/1.1 request.
HTTP/1.1 also allows a client to request only part of a resource, by specifying a byte range. Probably the most common use will be resuming interrupted transfers. However, clients can use ranges in any appropriate way, such as transferring parts of structured documents. For example, a client might first request the description section of a word-processing file, decide which pages are desired, and then request only those pages. Again, this feature reduces unnecessary traffic.
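Resuming an interrupted transfer reduces to asking for the bytes not yet received. A minimal sketch, with an invented function name and hypothetical host and path:

```python
def resume_request(host, path, bytes_received):
    """Build a request for the remainder of a partially transferred
    resource, using an open-ended byte range (first-byte-pos to end)."""
    return (
        "GET %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Range: bytes=%d-\r\n"
        "\r\n" % (path, host, bytes_received)
    ).encode("ascii")

# After receiving 1331 bytes before the connection dropped, ask for the rest:
req = resume_request("www.example.com", "/index.html", 1331)
```

A server that supports ranges answers with "206 Partial Content" and only the requested bytes.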
In HTTP/1.0 the client only sends a partial URL; the host portion is implicit in the host that the client connects to. There is one exception: when requesting from a proxy, the full URL is required. Virtual hosting could only be done by host aliasing, in which a host has more than one IP address bound to a given network interface. (Recall that IP addresses identify network interfaces, not hosts.) This allows the same hardware to run more than one Web server, but at the cost of requiring more IP addresses. Even though IPv6 (IP version 6 protocols, with a 128 bit IP address space) is on the horizon, we need to be parsimonious with IPv4 addresses, as IPv4 will continue to exist for many years to come. In addition, host aliasing is not supported by all operating systems.
HTTP/1.1 was designed to provide multiple virtual hosts without using additional IP addresses. This is done by requiring the client to provide the (virtual) hostname along with the URL, either in the new Host header field or by sending the full URL in the request. The drawback is that the full potential cannot be realised until a significant number of clients have been updated to use HTTP/1.1. Perhaps at some point, when there is a large body of HTTP/1.1 servers, old HTTP/1.0 clients will not be able to access virtual Web hosts.
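On the server side, virtual hosting is a lookup from the Host header to a per-site configuration. A sketch of the dispatch, with invented host names and document roots:

```python
# Hypothetical mapping of virtual host names to document roots.
VHOSTS = {
    "www.alpha.example": "/srv/alpha",
    "www.beta.example": "/srv/beta",
}

def document_root(headers, default="/srv/default"):
    """Pick a document root from the Host header of a parsed request.

    `headers` is a dict of lower-cased header names. An HTTP/1.0
    client that sends no Host header falls back to the default root,
    which is why old clients cannot reach the virtual hosts.
    """
    host = headers.get("host", "").split(":")[0].lower()
    return VHOSTS.get(host, default)
```

All sites share one IP address and one listening server; only the Host header differs between requests.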
For example:
    GET /index.html HTTP/1.0
    If-Modified-Since: Fri, 14 Feb 1997 19:43:31 GMT

means: if the document has changed since this time, send the contents; otherwise send nothing, and signal this with the "304 Not Modified" status code. Often, when a resource is fetched again, its contents have not changed. This does not avoid the need to connect to the server to ascertain the status of the resource, but it does avoid the data transfer, which can be significant for graphics (and graphics do not change often).
HTTP/1.1 defines other conditions: If-Unmodified-Since, If-Match, If-None-Match, and If-Range. If-Modified-Since and If-Unmodified-Since work on resource timestamps and are used to update cached information efficiently. If-Unmodified-Since will probably be used with the PUT method. If-Match and If-None-Match work on entity tags, opaque validators that identify a particular version of a resource. If-Range is only valid in conjunction with a Range header, and informally its meaning is: if the resource is unchanged, send me the part I am missing; otherwise send me the entire resource.
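The server-side decision for an If-Modified-Since request is a timestamp comparison. A sketch of that logic (function name invented; the date parsing uses the Python standard library):

```python
from email.utils import parsedate_to_datetime

def status_for_conditional_get(last_modified, if_modified_since):
    """Decide between a full "200" reply and "304 Not Modified" for a
    conditional GET; both arguments are HTTP-date strings."""
    if if_modified_since is None:
        return 200  # unconditional request: always send the body
    try:
        condition = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparseable date is ignored
    modified = parsedate_to_datetime(last_modified)
    return 200 if modified > condition else 304
```

A 304 reply carries headers only, so the saving is the entire response body.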
The status code "402 Payment required" is intriguing. The specification document only says: This code is reserved for future use. As far as the author knows there are no specifications of how this status code would trigger payment mechanisms; no doubt it is being worked on. Obviously this was designed with electronic commerce in mind.
The status code "409 Conflict" anticipates situations where updating a resource may conflict with another use, say a locked file or database. It is most likely to be triggered by database and publishing applications.
Two types of content negotiation are available in HTTP: server driven and agent driven. Server driven means that the server chooses the most appropriate resource, from a possible selection, based on information provided by the client. This information may come from headers that declare client capabilities, such as Accept-Charset, Accept-Encoding, Accept-Language and User-Agent. The server may however use information in other headers or even from extension header fields.
Here is an example:
    Accept-Language: da, en-gb;q=0.8, en;q=0.7

This says: I prefer Danish, but will accept British English and other types of English.
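A server choosing among representations must rank the client's preferences by their quality values, where an omitted q defaults to 1.0. A minimal parser sketch (function name invented):

```python
def parse_accept_language(value):
    """Parse an Accept-Language value into (language, q) pairs,
    sorted by descending quality; q defaults to 1.0."""
    prefs = []
    for item in value.split(","):
        parts = item.strip().split(";")
        lang = parts[0].strip()
        q = 1.0
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip() == "q":
                q = float(val)
        prefs.append((lang, q))
    return sorted(prefs, key=lambda p: p[1], reverse=True)

parse_accept_language("da, en-gb;q=0.8, en;q=0.7")
# -> [('da', 1.0), ('en-gb', 0.8), ('en', 0.7)]
```

The server then serves the highest-ranked language for which it has a representation.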
The drawbacks with server driven negotiation are: (1) it is impossible for the server to always know best, (2) it is inefficient for the client to describe all its capabilities and may violate privacy, (3) it complicates the implementation of a server and (4) it limits the ability of a cache to use the same response for multiple user requests.
In agent driven negotiation, the server sends a list of available representations of a resource, and the agent then selects from among these. The alternatives may be in the header field Alternatives or part of the resource body. The choice could be made automatically by the client (in the standard called the user agent), based on what the user agent knows about the user, or could be presented to the user, possibly as a hypertext menu.
The disadvantage of agent driven negotiation is that it requires an additional request to fetch the data. However caching will help reduce the impact of the second request.
A client can ask the server to switch to a different protocol by sending an Upgrade header, for example:

    Upgrade: SHTTP/1.3

If the server agrees, it returns a "101 Switching Protocols" response.
Security is a very large area; understandably the designers of HTTP/1.1 did not wish to burden the base protocol with voluminous specifications of authentication protocols. It may also have been a deliberate decision to "let the market decide" which security protocols are appropriate.
In the clarification, it was reaffirmed that communicants with the same major protocol version number (i.e. 1 for HTTP/1.0 and HTTP/1.1) must be interoperable, in the sense that the meaning of headers does not change between minor version numbers. Participants that receive a header they do not understand should ignore it. If the major protocol version number of a message is greater than that which the recipient is prepared to accept, the recipient must not accept the message. This is a dormant point at the moment, since it only arises in communication between HTTP/0.9 and HTTP/1.0 communicants, i.e. a HTTP/1.0 client should not (and in fact has no way to) send HTTP/1.0 headers to a HTTP/0.9 server. It will become important if HTTP/2.0 is developed.
The version number transmitted in the response from a server should be the version number that the server supports, not the version number in the request. Thus a HTTP/1.1 server should truthfully say that it supports HTTP/1.1 even to a HTTP/1.0 client request. There was in fact a proxy that mislabelled a HTTP/1.1 reply as invalid; this was quickly fixed.
The robustness principle [9] states that "an implementation must be conservative in its sending behaviour, and liberal in its receiving behaviour." A consequence of this is that a HTTP/1.1 server should not send HTTP/1.1 headers when it knows the client only understands HTTP/1.0. A client may send a lower request version if it has ascertained that the server is buggy for the higher request version. However a server may not downgrade a HTTP/1.1 or higher version request. Presumably this rule will force clients appearing on the market to implement the specifications correctly.
Another point is that the protocol level applies between two communicants in a hop-by-hop fashion, i.e. a browser and a proxy, or a proxy and a server. It does not apply over the whole transfer path. Therefore version numbers are not propagated. Proxies and gateways must be prepared for mismatches in version numbers and take the appropriate action: version upgrading of the message, tunnelling, respond with an error, etc. This also means that proxies must forward unknown headers, unless they are protected by a Connection header.
Non-IP based virtual hosting is of great interest to Web providers and Internet infrastructure organisations. It allows almost unlimited numbers of Web sites to be served from one server. For example a Web provider could enter one host alias per customer, each pointing to the same Internet host, into their Domain Name System (DNS) [10] tables and have each one activate a different virtual Web server, but with only one server running. In the long run this will conserve IP addresses.
Payment and security protocols go hand in hand, but HTTP/1.1 provides very little in the way of recommendations. This is likely to become a busy area of development.
We are likely to see greater use of HTTP in publishing mode, where information is uploaded to the server. Although the Range and Content-* headers provide additional semantics for the PUT method, this may not be sufficient for some users who want semantics approaching that of a remote file store. With the spate of interest in Network Computers, additional protocols may be brought into use, such as WebNFS [11].
It is also likely that future multimedia services will use HTTP to some extent, at least as the primary user access channel. We are likely to see in the future combinations of HTTP and other protocols, such as RTSP, a protocol for real-time audio and video data.
With an eye towards reducing the amount of standardisation work required to introduce new features in HTTP, and to allow server and clients to introduce new protocol features on a private use basis, work is being done on specifying a Protocol Extension Protocol (PEP). This is likely to appear in conjunction with HTTP/1.2.
The author wishes to acknowledge that this work was carried out within the Cooperative Research Centre for Research Data Networks established under the Australian Government's Cooperative Research Centre (CRC) Program and acknowledge the support of the Advanced Computational Systems CRC under which the work described in this paper is administered.