A Technical Overview of the New HTTP/1.1 Specification

Ken Yap
CSIRO Mathematics and Information Sciences
and ACSys CRC
Locked Bag 17
North Ryde 2113
ken.yap@cmis.csiro.au

Abstract

The HTTP/1.1 specification entered the Internet standards track as a proposed standard in January 1997, when it was published as Internet RFC (Request for Comments) 2068 [1]. This paper explains the major improvements of HTTP/1.1 over HTTP/1.0 and how they will affect Web software.

This paper is a brief introduction to the new protocol. It does not attempt to be comprehensive but discusses some of the most notable features of the specification.

Keywords

HTTP, protocols, standards, Internet, networking

Area: protocol development

1. In the Beginning

The first HTTP, now called HTTP/0.9, was very simple. The client sent the line
GET /index.html
to the server and the server replied with the contents of the file, closing the connection at the end of file.

This simple protocol had many drawbacks. For one, all documents were assumed to be HTML text; the server had no way of sending any metadata. It was quickly superseded by HTTP/1.0 [2], which introduced additional header fields, both in the request and in the reply. For example, a typical transaction might be:

GET /index.html HTTP/1.0
blank line

and the server would reply:

HTTP/1.0 200 OK
Date: Tue, 04 Feb 1997 23:11:26 GMT
Content-Type: text/html
Content-Length: 1331
blank line
...Contents of document
Close connection
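
The exchange above is simple enough to reproduce by hand. The following sketch (Python, with an invented host name) sends the HTTP/1.0 request verbatim and reads until the server closes the connection, since the close is what marks the end of the document:

    import socket

    # Minimal HTTP/1.0 fetch; the host name is hypothetical.
    sock = socket.create_connection(("www.example.com", 80))
    sock.sendall(b"GET /index.html HTTP/1.0\r\n\r\n")

    response = b""
    while True:
        data = sock.recv(4096)
        if not data:              # server closed the connection: end of document
            break
        response += data
    sock.close()

    headers, _, body = response.partition(b"\r\n\r\n")
    print(headers.decode("latin-1"))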

Actually, servers and browsers implemented many of the features in HTTP/1.0 before it became a standard. This was due to the unprecedented growth in demand for the Web and Web services. The standardisation was done a posteriori.

2. HTTP/1.1: The New Proposed Standard

Not wanting to be caught out a second time, leading Web experts started work on HTTP/1.1 while HTTP/1.0 was still on the standardisation track. HTTP/1.1 was designed to address some of the shortcomings of HTTP/1.0.

The most important changes between HTTP/1.0 and HTTP/1.1 are explained in the sections that follow.

HTTP and Internet Resources

A bit of background first: HTTP runs on top of TCP (Transmission Control Protocol), a stream protocol that in turn runs on top of IP (Internet Protocol). TCP is attractive because it guarantees that data is delivered, and delivered in order, or else notifies the sender that an exception has occurred. Servers and clients therefore do not have to implement their own loss detection and retransmission, which simplifies the code.

However, HTTP has no notion of a session as FTP (File Transfer Protocol) [3] does. In fact, HTTP has been succinctly described as a hit-and-run protocol: each Web page fetched involves establishing a connection to a server, transferring the data and tearing down the connection. It is actually worse than that, because a Web page may include graphics, and each graphic is a separate document. The average Web document also tends to be short; according to a survey by Tim Bray published in the World Wide Web Journal [4], the average size is around 7000 bytes. Thus the establishment and teardown of a TCP connection, which involve protocol handshakes [5] of several packets back and forth and hence a few round-trip times, become a significant overhead.
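
To make the overhead concrete, here is a back-of-envelope sketch in Python. All figures are illustrative assumptions except the 7000-byte document size from [4]:

    # Illustrative estimate of per-document TCP connection overhead.
    rtt = 0.1                    # assumed wide-area round-trip time, in seconds
    bandwidth = 128_000 / 8      # assumed 128 kbit/s path, in bytes per second
    doc = 7000                   # average document size in bytes [4]

    setup = rtt                  # TCP three-way handshake: one round trip before data
    request = rtt                # sending the request and awaiting the first byte
    transfer = doc / bandwidth   # time to ship the document body itself

    overhead = (setup + request) / (setup + request + transfer)
    print(f"connection overhead: {overhead:.0%}")   # roughly 30% with these figures

On a faster path the transfer time shrinks but the round trips do not, so the overhead fraction only grows.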

When HTTP was invented at CERN, nobody could have predicted the popularity the Web would achieve. In 1995, on the Merit network, HTTP became the leading IP protocol by request volume, overtaking FTP and then the combined traffic of all the other TCP/UDP protocols, which include SMTP (mail). It is highly likely that these findings extrapolate well to the rest of the Internet. Therefore any improvement to HTTP will help ease the demand on network resources (file descriptors, buffers, control blocks, and so on) along the transmission path.

Many proposals were discussed in forums such as the www-talk mailing list. The features implemented in HTTP/1.1 that are directed towards making Internet usage more efficient are persistent connections, chunked encoding, byte ranges and caching. In addition, the non-IP virtual hosting feature will ease the demand on IP addresses.

Persistent connections

In HTTP/1.0 the end of the document is signalled either by the client having read the number of bytes given in the Content-Length header or by the server closing the connection. In HTTP/1.1, by default, the server does not close the connection at the end of the document transfer but keeps it open, in case a new request comes in soon afterwards. This will be the case when a set of related pages is fetched from the same server by a single client. If no request arrives within a certain (configurable) period, the connection is closed. However, if the client sends the header field Connection: close, the connection is closed as soon as the request has been satisfied.
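
As a sketch (Python, with an invented host and paths), the following fetches two documents over a single persistent connection; the library speaks HTTP/1.1, so the server keeps the TCP connection open between the requests:

    import http.client

    # Two requests over one persistent connection (host and paths are hypothetical).
    conn = http.client.HTTPConnection("www.example.com")
    conn.request("GET", "/index.html")
    page = conn.getresponse().read()    # the reply must be read fully before reuse
    conn.request("GET", "/logo.gif")    # this reuses the same TCP connection
    image = conn.getresponse().read()
    conn.close()                        # equivalent to sending Connection: close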

With persistent connections it is necessary to delimit each transfer, so the Content-Length header is no longer optional. However, there are circumstances where the server cannot know the length of the data ahead of time: the data may be dynamically generated and the server cannot, or does not wish to, buffer it to find the length; or the data may be of unbounded length, for example a stream of continuous data refreshing the display at regular intervals. In such cases the server can use the new chunked transfer encoding. This method sends the response as a series of chunks, each preceded by its size in hexadecimal; a chunk of size zero marks the end of the response and may be followed by trailing MIME-style [6] headers.
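
For illustration, a chunked reply might look like the following, where each chunk is preceded by its length in hexadecimal and a zero-length chunk ends the response (the contents are invented for the example):

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
blank line
1a
abcdefghijklmnopqrstuvwxyz
10
1234567890abcdef
0
blank line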

Persistent connections have another advantage: clients can take advantage of pipelining, that is, sending the next request immediately after the previous one, without waiting for the reply to the previous request to arrive in full. The server is still required to service the requests in order. Pipelining saves round-trip times and TCP packets and is of great benefit over high-latency links. A more detailed discussion of the performance of persistent connections is in Section 3 of this paper.

Persistent connections are activated only when the client indicates, by sending an HTTP/1.1 request, that it can handle HTTP/1.1 replies.

Byte ranges

Byte ranges allow a part of a resource to be sent. The client specifies the byte offsets of the start and end of the piece or pieces of the resource required.

Probably the most common use will be resuming interrupted transfers, but clients may use byte ranges in any appropriate way, such as transferring parts of structured documents. For example, a client might first request the description section of a word processing file, decide which pages are desired and then request only those pages. Again, this feature reduces unnecessary traffic.
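
As a hypothetical example (invented file name and sizes), a client resuming an interrupted download of a 1000000-byte file, having already received the first 500000 bytes, would send:

GET /archive.zip HTTP/1.1
Host: www.example.com
Range: bytes=500000-
blank line

and the server would reply:

HTTP/1.1 206 Partial Content
Content-Range: bytes 500000-999999/1000000
Content-Length: 500000
blank line
...Remaining 500000 bytes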

Full URLs

Many people want to run more than one Web server on a host; this is called virtual hosting. For example, an Internet mall might have 1000 businesses, each wanting its own home page. For technical reasons it is better to have one server, or one set of servers pooling their resources, handle the requests for all the virtual hosts than to start a separate server for each virtual host. The server therefore needs a way of knowing which virtual host is being addressed when a request comes in.

In HTTP/1.0 the client sends only a partial URL; the host portion is implicit in the host that the client connects to. The one exception is a request to a proxy, where the full URL is required. Virtual hosting could only be done by host aliasing, in which more than one IP address is bound to a given network interface. (Recall that IP addresses identify network interfaces, not hosts.) This allows the same hardware to run more than one Web server, but at the cost of requiring more IP addresses. Even though IPv6 (IP version 6, with a 128-bit address space) is on the horizon, we need to be parsimonious with IPv4 addresses, as IPv4 will continue to exist for many more years. In addition, host aliasing is not supported by all operating systems.

HTTP/1.1 was designed to provide multiple virtual hosts without using additional IP addresses. It does so by requiring the client to supply the (virtual) hostname along with the URL, either in the new Host header field or by sending the full URL in the request. The drawback is that the full potential cannot be realised until a significant number of clients have been updated to HTTP/1.1. Perhaps at some point, when there is a large body of HTTP/1.1 servers, old HTTP/1.0 clients will no longer be able to access virtual Web hosts.
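
For example, a request for a page on a virtual host (invented names) would look like:

GET /index.html HTTP/1.1
Host: shop-one.example.com
blank line

or, equivalently, using the full URL:

GET http://shop-one.example.com/index.html HTTP/1.1
blank line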

More conditional qualifiers

HTTP/1.0 defined one way of making the execution of a method conditional: the If-Modified-Since header.

For example:

GET /index.html HTTP/1.0
If-Modified-Since: Fri, 14 Feb 1997 19:43:31 GMT
means: if the document has changed since this time, send the contents; otherwise send nothing and signal this with the "304 Not Modified" status code. Often, when a resource is fetched again, the contents have not changed. The conditional request does not avoid connecting to the server to ascertain the status of the resource, but it does avoid the data transfer, which can be significant for graphics (and graphics do not change often).

HTTP/1.1 defines further conditions: If-Unmodified-Since, If-Match, If-None-Match and If-Range. If-Modified-Since and If-Unmodified-Since work on resource timestamps and are used to update cached information efficiently. If-Unmodified-Since will probably be used with the PUT method. If-Match and If-None-Match work on entity tags, opaque identifiers that the server assigns to each version of a resource. If-Range is only valid in conjunction with a Range header, and informally its meaning is: if the resource is unchanged, send me the part I am missing; otherwise send me the entire resource.
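
To sketch how the entity tag conditions are used (the tag value is invented): suppose an earlier reply carried the header ETag: "v2". A cache can revalidate its copy with:

GET /index.html HTTP/1.1
Host: www.example.com
If-None-Match: "v2"
blank line

and the server replies "304 Not Modified" if the tag still matches, or sends the new entity (with a new tag) if it does not.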

More methods

HTTP/1.0 defined GET, HEAD and POST, with other methods such as PUT provided at the discretion of the implementation. HTTP/1.1 specifies, in addition, the standard methods PUT, DELETE, OPTIONS and TRACE. OPTIONS provides a means of determining the communication options available for a resource, or for the server as a whole, without initiating any retrieval action; the server itself is denoted by the resource "*" (asterisk). TRACE is essentially a way of invoking a loopback at the other end so that the client can see what is received by the server.
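
For example, a client can ask which methods a resource supports (a hypothetical exchange):

OPTIONS /index.html HTTP/1.1
Host: www.example.com
blank line

HTTP/1.1 200 OK
Allow: GET, HEAD, OPTIONS, TRACE
Content-Length: 0
blank line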

More status codes

The list of status codes has been greatly expanded. Two of the more interesting are 402 and 409.

The status code "402 Payment required" is intriguing. The specification document only says: This code is reserved for future use. As far as the author knows there are no specifications of how this status code would trigger payment mechanisms; no doubt it is being worked on. Obviously this was designed with electronic commerce in mind.

The status code "409 Conflict" anticipates situations where updating a resource may conflict with another use, say a locked file or database. It is most likely to be triggered by database and publishing applications.

Content negotiation

Content negotiation is a way of tailoring the document that is delivered to the user's preferences when a resource exists in several versions, in say, different languages or encodings.

Two types of content negotiation are available in HTTP: server driven and agent driven. Server driven means that the server chooses the most appropriate resource from the available selection, based on information provided by the client. This information usually comes from headers that declare the client's capabilities, such as Accept-Charset, Accept-Encoding, Accept-Language and User-Agent, but the server may also use information in other headers, or even in extension header fields.

Here is an example:

Accept-Language: da, en-gb;q=0.8, en;q=0.7
This says: I prefer Danish (quality factor 1, the default), but will accept British English (0.8) or any other variety of English (0.7). The q parameter is a quality factor between 0 and 1 expressing the strength of each preference.
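
Server driven negotiation then amounts to ranking the variants the server holds by the client's quality factors. The following Python sketch uses deliberately simplified parsing (real Accept-Language syntax is richer, and a language prefix such as en should also match en-us) to pick the best available variant:

    def best_language(accept_header, available):
        # Parse "da, en-gb;q=0.8, en;q=0.7" into {language: quality}.
        prefs = {}
        for item in accept_header.split(","):
            lang, _, params = item.strip().partition(";")
            q = 1.0                          # quality defaults to 1
            params = params.strip()
            if params.startswith("q="):
                q = float(params[2:])
            prefs[lang.strip().lower()] = q
        # Choose the available variant the client rates highest.
        best = max(available, key=lambda l: prefs.get(l.lower(), 0.0))
        return best if prefs.get(best.lower(), 0.0) > 0 else None

    print(best_language("da, en-gb;q=0.8, en;q=0.7", ["en", "en-gb"]))  # en-gb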

The drawbacks of server driven negotiation are: (1) the server cannot always know best; (2) it is inefficient for the client to describe all of its capabilities, and doing so may violate privacy; (3) it complicates the implementation of a server; and (4) it limits the ability of a cache to use the same response for multiple user requests.

In agent driven negotiation, the server sends a list of the available representations of a resource and the agent selects from among them. The alternatives may be carried in the Alternates header field or as part of the resource body. The choice could be made automatically by the client (called the user agent in the standard), based on what the user agent knows about the user, or could be presented to the user, possibly as a hypertext menu.

The disadvantage of agent driven negotiation is that it requires a second request to fetch the chosen representation. However, caching will help reduce the impact of the second request.

The Upgrade header

HTTP/1.1 provides a mechanism for the client to inform the server of any additional protocols it supports. For example, a client that also speaks the SHTTP/1.3 protocol would send the header:
Upgrade: SHTTP/1.3
If the server agrees, it returns a "101 Switching Protocols" response.

Authentication

Notable by its absence in HTTP/1.1 is any authentication scheme beyond the Basic and Digest schemes. The Digest Access Authentication scheme is the subject of a companion RFC, RFC 2069 [7], and mainly addresses the problem of transmitting a password in cleartext over the network. (The Base64 encoding used by the Basic scheme is trivial to decode.) It is intended only as a replacement for the Basic Authentication scheme; other authentication schemes are to be negotiated between the communicants.
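
As a quick demonstration of why Basic is weak, the canonical example credentials (user "Aladdin", password "open sesame") can be recovered with one library call:

    import base64

    # Basic credentials are merely encoded, not encrypted or hashed.
    print(base64.b64decode("QWxhZGRpbjpvcGVuIHNlc2FtZQ=="))  # b'Aladdin:open sesame'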

Security is a very large area; understandably the designers of HTTP/1.1 did not wish to burden the base protocol with voluminous specifications of authentication protocols. It may also have been a deliberate decision to "let the market decide" which security protocols are appropriate.

Enhanced support for caches

The HTTP/1.1 specification devotes some 23 pages to rules on the behaviour of caches so that they can perform efficiently and correctly. A complete discussion would merit its own tutorial; here it suffices to note that caches are required to attach warnings to a response if they know that the resource is not "fresh enough" according to the rules and heuristics. This allows the client to decide whether to use the information or to demand a first-hand copy.
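
As an illustration (the values are invented), the new Cache-Control header lets a server state explicitly how long a response remains fresh:

HTTP/1.1 200 OK
Cache-Control: max-age=3600
ETag: "v2"
Content-Type: text/html
blank line
...Contents of document

A cache may reuse this response for up to an hour; after that it must revalidate it (for example with If-None-Match) or attach a warning when serving the stale copy.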

3. Performance

The World Wide Web Consortium (W3C) has been measuring the performance of HTTP/1.1 vis-à-vis HTTP/1.0, and a note entitled Network Performance Effects of HTTP/1.1, CSS1 and PNG has been published. In their experiments, the authors set up a test Web site containing data taken from two heavily accessed home pages, with a total of 42 KB of HTML and 125 KB of inline GIF images. Jigsaw and Apache were used as the servers, while libwww was used as the client, under both HTTP/1.0 and HTTP/1.1. The tests were run in three environments: LAN (high bandwidth, low latency), WAN (high bandwidth, high latency) and a simulated modem line (low bandwidth, high latency). There were two retrieval tests: a first-time retrieval into an empty cache, and a cache validation, in which the cache already holds the data and the server is merely asked to confirm that it has not expired. The major finding of the paper was that HTTP/1.1 with pipelining outperformed HTTP/1.0 in every environment tested, even when the HTTP/1.0 client was allowed multiple connections in parallel.

4. Clarifications

The first HTTP/1.1 compliant servers, such as Apache, appeared at about the same time as the specification was published. In the initial period of interoperation with clients some misunderstandings arose, and a clarification was issued. That document does not modify RFC 1945 or RFC 2068 but clarifies the text there.

In the clarification it was reaffirmed that communicants with the same major protocol version number (i.e. 1 for both HTTP/1.0 and HTTP/1.1) must be interoperable, in the sense that the meaning of a header does not change between minor version numbers; participants that receive a header they do not understand should simply ignore it. If the major protocol version number of a message is greater than that which the recipient is prepared to accept, the recipient must not accept the message. This point is dormant at the moment, since it only arises in communication between HTTP/0.9 and HTTP/1.0 communicants, and an HTTP/1.0 client should not (and in fact has no way to) send HTTP/1.0 headers to an HTTP/0.9 server. It will become important if HTTP/2.0 is developed.

The version number transmitted in a response should be the version that the server supports, not the version number in the request. Thus an HTTP/1.1 server should truthfully say that it supports HTTP/1.1 even when answering an HTTP/1.0 request. There was in fact a proxy that treated such correctly labelled HTTP/1.1 replies as invalid; this was quickly fixed.
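
For example, an HTTP/1.1 server answering an HTTP/1.0 client labels its reply with its own version, while confining itself to HTTP/1.0 features in the reply itself:

GET /index.html HTTP/1.0
blank line

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1331
blank line
...Contents of document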

The robustness principle [9] states that "an implementation must be conservative in its sending behavior, and liberal in its receiving behavior". One consequence is that an HTTP/1.1 server should not send HTTP/1.1-specific headers when it knows that the client understands only HTTP/1.0. A client may send a request with a lower version number if it has ascertained that the server is buggy when handling the higher version; a server, however, may not downgrade an HTTP/1.1 or higher version request. Presumably this rule will force new clients appearing on the market to implement the specification correctly.

Another point is that the protocol version applies between two communicants in a hop-by-hop fashion, i.e. between a browser and a proxy, or between a proxy and a server; it does not apply over the whole transfer path, so version numbers are not propagated end to end. Proxies and gateways must be prepared for mismatches in version numbers and must take the appropriate action: upgrading the version of the message, tunnelling, responding with an error, and so on. This also means that proxies must forward headers they do not understand, unless those headers are listed in a Connection header (which marks them as applying to one hop only).

5. Implications for the World Wide Web

Probably the most important change in HTTP/1.1 is persistent connections. This should help in reducing the traffic congestion caused by HTTP. In addition, the specifications of cache behaviour in HTTP/1.1 will provide implementations with a standard to conform to, enhancing the interoperability of clients, caches and servers.

Non-IP based virtual hosting is of great interest to Web providers and Internet infrastructure organisations, since it allows an almost unlimited number of Web sites to be served from one server. For example, a Web provider could enter one host alias per customer into its Domain Name System (DNS) [10] tables, each alias pointing to the same Internet host, and have each alias activate a different virtual Web server, with only one server actually running. In the long run this will conserve IP addresses.
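
A sketch of the DNS side, with invented names (192.0.2.1 is a documentation address): each customer name is an alias for the provider's single host, and the Host header then selects the virtual server:

shop-one.example.com.      IN CNAME www.provider.example.com.
shop-two.example.com.      IN CNAME www.provider.example.com.
www.provider.example.com.  IN A     192.0.2.1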

Payment and security protocols go hand in hand, but HTTP/1.1 provides very little in the way of recommendations. This is likely to become a busy area of development.

We are likely to see greater use of HTTP in publishing mode, where information is uploaded to the server. Although the Range and Content-* headers provide additional semantics for the PUT method, this may not be sufficient for users who want semantics approaching those of a remote file store. With the spate of interest in Network Computers, additional protocols such as WebNFS [11] may be brought into use.

It is also likely that future multimedia services will use HTTP to some extent, at least as the primary user access channel. In the future we are likely to see combinations of HTTP and other protocols, such as RTSP, a protocol for real-time audio and video data.

6. Future

Implementors of servers and clients have had adequate time to prepare, as drafts of the HTTP/1.1 specification were in circulation for months prior to standardisation. Thus many features are already found in current software. As time goes on, discrepancies from the standard will be discovered and rectified. In time, the new crop of HTTP/1.1-aware applications will displace the old HTTP/1.0 applications and allow some of the economy features, such as non-IP virtual hosting, to kick in.

With an eye towards reducing the amount of standardisation work required to introduce new features into HTTP, and towards allowing servers and clients to introduce new protocol features on a private basis, work is being done on specifying a Protocol Extension Protocol (PEP). This is likely to appear in conjunction with HTTP/1.2.

7. Summary

We have made a short tour of the major new features of HTTP/1.1. HTTP/1.1 is a worthy successor to HTTP/1.0, and the RFC provides a solid definition against which implementations of Web-based applications can be checked. The future success of the Web as the preferred interface to the Internet for the general public seems assured.

8. Acknowledgements

Thanks to my colleagues Bill Simpson-Young and Graham Reynolds for reading drafts of this paper. Thanks also to the anonymous referees who provided good comments. I lay claim to any mistakes found.

The author acknowledges that this work was carried out within the Cooperative Research Centre for Research Data Networks, established under the Australian Government's Cooperative Research Centre (CRC) Program, and acknowledges the support of the Advanced Computational Systems CRC, under which the work described in this paper is administered.

9. Non-URL references

  1. Fielding, R., Gettys, J., Mogul, J., Frystyk, H. and Berners-Lee, T., "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, January 1997.
  2. Berners-Lee, T., Fielding, R. and Frystyk, H., "Hypertext Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.
  3. Postel, J. and Reynolds, J., "File Transfer Protocol (FTP)", STD 9, RFC 959, October 1985.
  4. Bray, T., "Measuring the Web", World Wide Web Journal, Issue 3, Summer 1996, O'Reilly & Associates. ISSN 1085-2301.
  5. Stevens, W. R., TCP/IP Illustrated, Volume 1: The Protocols, Addison-Wesley, 1994. ISBN 0-201-63346-9.
  6. Freed, N. and Borenstein, N., "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.
  7. Franks, J., Hallam-Baker, P., Hostetler, J., Leach, P., Luotonen, A., Sink, E. and Stewart, L., "An Extension to HTTP: Digest Access Authentication", RFC 2069, January 1997.
  8. Nagle, J., "Congestion Control in IP/TCP Internetworks", RFC 896, January 1984.
  9. Postel, J., "Internet Protocol", RFC 791, September 1981.
  10. Mockapetris, P., "Domain Names - Concepts and Facilities", STD 13, RFC 1034, November 1987.
  11. Callaghan, B., "WebNFS Client Specification", RFC 2054, October 1996.