Due : Friday, January 29, 1999. Midnight.
Note: The information about CGI has been updated. Please disregard the previous handout.
Important Notice: This is a two-part project. You should plan to finish the first part of the assignment by January 22 (or plan on suffering a lot next week). The second part of this project will be distributed later this week. The web-server you develop will also be the basis for project 3.
Please read this entire handout before you begin work.
You will run your web-server as follows:

    webserver docroot port

where "docroot" is the document root directory (containing all of your HTML documents and CGI scripts) and "port" is the TCP port number that clients will use to connect to the server. For example,

    % webserver /home/beazley/html 10000
    Web server listening on classes.cs.uchicago.edu port 10000

For testing purposes, you may want to run the server on a directory containing your own homepages. If you do not have a homepage you can test your server using the web pages available on classes. For example:

    webserver /opt/local/www/http/docs/roots/www.classes.cs.uchicago.edu 10000
To test your server, use any browser (Netscape, Internet Explorer, Lynx, etc...) by entering a URL containing the host and port number of your server as follows:
http://classes.cs.uchicago.edu:10000/
A typical request sent by a browser looks like this:

    GET /index.html HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/4.05 [en] (X11; I; Linux 2.0.35 i686)
    Host: localhost:10000
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8
    < blank line >

The first line of the request is the most important because it indicates the intended operation (GET), the document (index.html) and the HTTP protocol version (1.0). The rest of the request contains additional information about the client. In this case, the "User-Agent" indicates the type of browser being used (Netscape running on a Linux machine). "Host" indicates the hostname given to the browser. This can be used by a web-server for virtual hosting. The "Accept" properties indicate various capabilities of the browser including the types of images that are supported, the language preference, and character set. For this project, you won't have to worry about these properties too much. Finally, the GET request is terminated by a single blank line.
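To make the request format concrete, here is a minimal sketch of how a server might pick a request apart. It is written for Python 3 (an assumption; the sample code in this handout uses an older Python), and the function name parse_request is made up for illustration. It accepts both "\r\n" and bare "\n" line endings and ignores any data after the blank line.

```python
def parse_request(raw):
    """Split a raw request (bytes) into method, path, version and headers."""
    text = raw.decode("iso-8859-1")
    # Tolerate bare "\n" line endings as well as "\r\n"
    text = text.replace("\r\n", "\n")
    # The request header ends at the first blank line
    head = text.partition("\n\n")[0]
    lines = head.split("\n")
    # First line: operation, document, protocol version.
    # A malformed first line raises ValueError here.
    method, path, version = lines[0].split()
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are stored in lowercase for easy lookup
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers
```

A caller could catch the ValueError raised for a malformed first line and answer with a 400 Bad Request.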
There is one additional little detail to note about requests. If the filename requested by the client is really a directory like this,

    GET /yourpage/html/ HTTP/1.0
    User-Agent: Mozilla/4.05 [en] (X11; I; Linux 2.0.35 i686)
    Host: localhost:10000
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8

the web-server usually tries to return a file named 'index.html' in that directory. If the 'index.html' file does not exist, a server may choose to report an error back to the client or may generate a listing of the files in the requested directory (the precise behavior is actually determined by a server configuration file and can be set on a directory by directory basis). Your server should look for "index.html" and report an error if it isn't found.
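The directory rule can be sketched in a few lines. This is only an illustration in Python 3 (an assumption), and the helper name resolve_document is hypothetical. It does no security checking at all; it just maps a requested path to a file, or to None when the server should report an error.

```python
import os

def resolve_document(docroot, url_path):
    """Map a requested path to a filename, or None if nothing can be served."""
    filename = os.path.join(docroot, url_path.lstrip("/"))
    # A directory request is answered with the index.html inside it
    if os.path.isdir(filename):
        filename = os.path.join(filename, "index.html")
    # Missing file (including a missing index.html): the caller reports an error
    if not os.path.isfile(filename):
        return None
    return filename
```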
A typical response from the server looks like this:

    HTTP/1.0 200 OK
    Server: cs219/yourname
    Content-type: text/html
    Content-length: 1253

    <html>
    <head>
    <title> This is my HTML document </title>
    </head>
    <body>
    ...
    </body>
    </html>

The first line of the response provides the HTTP version (1.0), a response code, and a response message. The following list shows the possible response codes:

    2xx Successful (all 200-299 codes are success codes)
        200 OK
        201 Created
        202 Accepted
        204 No Content
    3xx Redirection (all 300-399 codes tell the client to go elsewhere)
        300 Multiple Choices
        301 Moved Permanently
        302 Moved Temporarily
        304 Not Modified
    4xx Client Error (all 400-499 indicate errors on client end)
        400 Bad Request
        401 Unauthorized
        403 Forbidden
        404 Not Found
    5xx Server Error (500-599 indicate errors with server)
        500 Internal Server Error
        501 Not Implemented
        502 Bad Gateway
        503 Service Unavailable

Your server will probably only generate a few of these codes (most notably, the 200, 400, and 404 codes).
After the response code, a number of attributes about the server can be given. These are optional, but a typical response might include the version of the server being used, file modification dates, and so forth.
The contents of the document are described by the "Content-type" and "Content-length" fields. The content type is specified as a MIME header type. Typical values are as follows:

    Content-type: text/plain         - A plain text document
    Content-type: text/html          - An HTML document
    Content-type: image/gif          - A GIF image
    Content-type: image/jpeg         - A JPEG image
    Content-type: application/java   - A Java class file

The "Content-length" field only needs to be specified for binary data such as images and Java applets (it can be omitted for HTML and textual data).
The response header is terminated by a single blank line. After the blank line, the actual document data is transmitted. In the case of HTML files, you will see the HTML text describing the page. In the case of images and other binary data, the raw binary data is written. When writing binary data, make sure the Content-length field exactly matches the actual length of the sent data. Also, be careful not to send any extraneous data in the header (such as an extra blank line).
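As a rough sketch of the rules above, the following Python 3 function (Python 3 and the name build_response are assumptions) assembles a complete response. It computes Content-length from the actual data and emits exactly one blank line between the header and the document. It ends each header line with "\r\n", which is what the HTTP specification calls for.

```python
def build_response(code, message, content_type, body):
    """Return a full HTTP/1.0 response (bytes) for the given body (bytes)."""
    header = (
        "HTTP/1.0 %d %s\r\n"
        "Server: cs219/yourname\r\n"
        "Content-type: %s\r\n"
        "Content-length: %d\r\n"
        "\r\n" % (code, message, content_type, len(body))
    )
    # Header text is plain ASCII; the body is sent untouched
    return header.encode("ascii") + body
```

For example, build_response(404, "Not Found", "text/html", b"<h1> File Not Found</h1>") produces an error page whose length field matches the data that follows.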
    # Simple Python web-server
    import sys
    import string
    from socket import *

    # Check the command line arguments
    if len(sys.argv) != 3:
        print "Usage : webserver docroot port"
        sys.exit(0)

    # Set the document root and port
    docroot = sys.argv[1]
    port = string.atoi(sys.argv[2])

    # Open up a socket
    serversock = socket(AF_INET, SOCK_STREAM)
    serversock.bind(("", port))               # bind() takes one (host, port) tuple
    serversock.listen(5)
    print "Web-server listening on port %s " % (port,)

    while 1:
        (conn,addr) = serversock.accept()     # Get a connection
        print "Connection from %s" % (addr,)

        # Read the first line of the request, one character at a time
        request = ""
        c = conn.recv(1)
        while c != "\n":
            request = request+c
            c = conn.recv(1)
        request = string.split(request," ")
        method = request[0]
        document = request[1]

        if (method == "GET"):
            # Form the full filename
            file = docroot + document
            # Try opening the file
            try:
                f = open(file, "rb")          # Binary mode so images are sent intact
                conn.send("HTTP/1.0 200 OK\n")
                # Figure out the content type
                if (file[-5:] == ".html"):
                    conn.send("Content-type: text/html\n")
                elif (file[-4:] == ".gif"):
                    conn.send("Content-type: image/gif\n")
                elif (file[-4:] == ".jpg"):
                    conn.send("Content-type: image/jpeg\n")
                else:
                    conn.send("Content-type: text/plain\n")
                # Read the file and send it
                data = f.read()
                conn.send("Content-length: %d\n" % (len(data),))
                conn.send("\n")
                conn.send(data)
            except:
                conn.send("HTTP/1.0 404 Not Found\n")
                conn.send("Content-type: text/html\n\n")
                conn.send("<h1> File Not Found</h1>")
        else:
            conn.send("HTTP/1.0 501 Not Implemented\n")
            conn.send("Content-type: text/html\n\n")
            conn.send("<h1> Unimplemented request type</h1>")
        conn.close()
Your web server must support the CGI interface. This section describes a few of the more tricky implementation details.
CGI requests come in two formats. The first is a GET request that looks like this:

    GET /cgi-bin/foo.cgi?cmd=lookup&name=dave HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/4.05 [en] (X11; I; Linux 2.0.35 i686)
    Host: localhost:10000
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8
    < blank line >

With GET requests, the CGI query string is appended to the filename following a question mark. Thus, the above request tells the server to run the CGI program "foo.cgi" with a query string of "cmd=lookup&name=dave".
The second format of CGI requests is a POST request that looks like this:

    POST /cgi-bin/foo.cgi HTTP/1.0
    User-Agent: Mozilla/4.05 [en] (X11; I; Linux 2.0.35 i686)
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8

    cmd=lookup&name=dave

A POST request is almost identical to the GET request except that the query string is passed as data following the request header (separated by a blank line). POST requests are most commonly used with forms that are transmitting large amounts of data to the server (since there is a limited amount of data that can be sent with a GET request).
The query string contains the form data entered by the user (which has been heavily "munged" by the browser). It is up to the CGI program to decode and interpret the query string. Thus, all your server has to do is capture the query string and make sure it gets properly passed to the CGI program.
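Capturing the query string might look something like the sketch below (Python 3 is assumed, and both function names are made up for illustration). For GET, the string is whatever follows the first question mark. For POST, this sketch assumes the client sends a "Content-length" header saying how many bytes of data follow the blank line, and that the headers have already been collected into a dictionary with lowercase names.

```python
def split_query(url):
    """Split '/cgi-bin/foo.cgi?a=b' into ('/cgi-bin/foo.cgi', 'a=b')."""
    path, _, query = url.partition("?")
    return path, query

def read_post_data(stream, headers):
    """Read the POST data that follows the request header.

    'stream' is any file-like object positioned just after the blank line;
    'headers' maps lowercase header names to values (an assumption).
    """
    length = int(headers.get("content-length", 0))
    return stream.read(length)
```

Note that neither function decodes anything. The query string is handed to the CGI program exactly as it arrived.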
Since the server is going to be running a separate program, extreme care needs to be given to the contents and use of the cgi-bin directory. In particular, a request such as the following must not be allowed to run a program outside of cgi-bin:

    GET /cgi-bin/../../../../../../usr/bin/rm?-rf+/ HTTP/1.0
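One possible way to reject a request like the one above is to resolve the requested path completely and then check that the result is still inside the directory you meant to serve. The following is a minimal sketch in Python 3 (an assumption), with a hypothetical helper name, and it assumes a Unix-style filesystem. The query string should be stripped from the path before calling it.

```python
import os

def is_within(base, url_path):
    """Return True if url_path, resolved relative to base, stays inside base."""
    base = os.path.realpath(base)
    # realpath() collapses ".." components and follows symbolic links
    target = os.path.realpath(os.path.join(base, url_path.lstrip("/")))
    return os.path.commonpath([base, target]) == base
```

The same check works for ordinary documents under the document root, so a single helper can guard both kinds of request.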
    # -----------------------------------------------------------------------------
    # Executes another process using pipes
    # -----------------------------------------------------------------------------
    import os

    # Here is some data
    data = "Hi there, this is a test of pipes"

    # Create a pipe
    (pinput,coutput) = os.pipe()       # Create a pipe from child to parent
    (cinput,poutput) = os.pipe()       # Create a pipe from parent to child

    # Run 'wc' on the data above
    pid = os.fork()
    if pid != 0:
        # I'm the parent
        os.close(coutput)
        os.close(cinput)
        os.write(poutput,data)
        os.close(poutput)
        response = os.read(pinput,1000)
        print "py:", response
        os.waitpid(pid,0)              # Wait for child to exit
    else:
        # I'm the child
        os.close(poutput)
        os.close(pinput)
        os.dup2(cinput,0)
        os.dup2(coutput,1)
        e = {}                         # Environment variables
        e['FOO'] = 'BAR'
        os.execve("/usr/bin/wc",["wc"],e)     # Run 'wc'

Here is the same code written in Java:

    // Disclaimer : This is a gross hack. You can probably do better
    import java.lang.*;
    import java.io.*;

    public class Pipe {
        public static void main(String[] args) {
            try {
                String data = "Hi there, this is a test of exec";
                Runtime r = Runtime.getRuntime();

                // Create the command array
                // Note : Additional arguments would go in cmd[1],cmd[2], etc...
                String [] cmd = new String[1];
                cmd[0] = "/usr/bin/wc";

                // Create some environment variables
                String [] env = new String[2];
                env[0] = "FOO=BAR";
                env[1] = "SPAM=YES";

                // Run the command
                Process p = r.exec(cmd,env);

                // Grab the input and output streams
                OutputStream o = p.getOutputStream();
                InputStream i = p.getInputStream();

                // Send data to the process
                o.write(data.getBytes());

                // For a program reading from stdin, we need to close
                // the OutStream to generate an end of file
                o.close();

                // Get the results back. This is a hack. You would
                // Probably want to use BufferedReader or some other
                // input method.
                byte[] result = new byte[16000];     // Ugh.
                i.read(result);

                // Wait for the child process to finish
                try {
                    p.waitFor();
                } catch (InterruptedException e) { }

                // Print out the result
                System.out.println(new String(result));
            } catch (IOException e) {
                System.out.println("An error occurred.");
            }
        }
    }

This sample code can be found in :
Your server should set the following environment variables:

    GATEWAY_INTERFACE    CGI version number
    SERVER_NAME          Hostname of the server
    SERVER_SOFTWARE      Name of server program
    SERVER_PROTOCOL      HTTP protocol the server is using
    SERVER_PORT          Port number the server is using
    REQUEST_METHOD       The HTTP request method (GET, POST, etc..)
    PATH_INFO            Additional Path information
    SCRIPT_NAME          Name of the CGI script (/cgi-bin/program.cgi)
    DOCUMENT_ROOT        Top of the web document tree
    QUERY_STRING         Query information passed in the URL
    REMOTE_HOST          Hostname of the client
    REMOTE_ADDR          Remote IP address of the client
    REMOTE_IDENT         User making the request (may be unavailable)
    CONTENT_TYPE         Mime type of the query data
    CONTENT_LENGTH       Length of the query data
    HTTP_FROM            E-mail of user making request
    HTTP_ACCEPT          List of MIME types client can support
    HTTP_USER_AGENT      The browser the client is using
    HTTP_REFERER         The URL of the document the client points to
                         before accessing the CGI program.
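Most of these variables come straight from information the server already has. Here is one way the environment might be assembled, as a sketch in Python 3 (an assumption). The function name and its parameters are hypothetical, "CGI/1.1" is an assumed version string, and 'headers' is assumed to be a dictionary of request headers with lowercase names. PATH_INFO, REMOTE_HOST and REMOTE_IDENT are left out to keep the sketch short.

```python
import socket

def cgi_environment(method, script_name, query, headers,
                    client_addr, docroot, port):
    """Build the environment dictionary for a CGI program (a sketch)."""
    env = {
        "GATEWAY_INTERFACE": "CGI/1.1",        # assumed CGI version
        "SERVER_NAME": socket.gethostname(),
        "SERVER_SOFTWARE": "cs219/yourname",
        "SERVER_PROTOCOL": "HTTP/1.0",
        "SERVER_PORT": str(port),
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": script_name,
        "DOCUMENT_ROOT": docroot,
        "QUERY_STRING": query,
        "REMOTE_ADDR": client_addr,
    }
    # Variables copied from request headers, set only if the client sent them
    header_vars = {
        "content-type": "CONTENT_TYPE",
        "content-length": "CONTENT_LENGTH",
        "from": "HTTP_FROM",
        "accept": "HTTP_ACCEPT",
        "user-agent": "HTTP_USER_AGENT",
        "referer": "HTTP_REFERER",
    }
    for name, variable in header_vars.items():
        if name in headers:
            env[variable] = headers[name]
    return env
```

The resulting dictionary has the same shape as the 'e' dictionary passed to os.execve() in the pipe example.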
    MIME type            File Suffix
    -----------------------------------------------------
    text/html            .html, .htm
    image/gif            .gif
    image/jpeg           .jpg, .jpeg
    application/java     .class
    text/plain           anything that's not listed above

Any file-type that can't be determined (the file has an unknown suffix) should be returned as type text/plain.
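The table above translates naturally into a dictionary lookup. The following is a small sketch in Python 3 (an assumption) with made-up names. Treating suffixes as case-insensitive is also an assumption, since the table does not say either way.

```python
import os

# Suffix-to-type table, taken from the list above
MIME_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".class": "application/java",
}

def content_type(filename):
    """Return the MIME type for a filename, defaulting to text/plain."""
    suffix = os.path.splitext(filename)[1].lower()   # lower(): an assumption
    return MIME_TYPES.get(suffix, "text/plain")
```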
Once you have your basic server up and running, modify it to run concurrently by using fork() or threads to handle multiple client connections.
Add a few security checks to make sure the browser can't request bogus files or do anything weird. For example, your server probably shouldn't allow a user to download the system password file.
Once you have your server working pretty well with all of the above, add CGI support. Make your server look for the /cgi-bin/ directory and set it up to execute programs in that directory. This is the trickiest part of the server to write, but it should not involve a huge amount of code.
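For the concurrency step mentioned above, one option is to start a new thread for every accepted connection. The sketch below is in Python 3 (an assumption) and uses threads in place of fork(). The handler shown is only a placeholder that ignores the request and always sends the same reply; a real server would do its request processing there.

```python
import socket
import threading

def handle(conn):
    """Placeholder handler (hypothetical): always answers with a short reply."""
    try:
        conn.recv(4096)                    # read (and ignore) the request
        conn.sendall(b"HTTP/1.0 200 OK\r\n"
                     b"Content-type: text/plain\r\n"
                     b"\r\n"
                     b"hello\n")
    finally:
        conn.close()

def serve(serversock, handler=handle):
    """Accept connections forever, giving each one its own thread."""
    while True:
        conn, addr = serversock.accept()
        # Daemon threads do not keep the program alive at exit
        worker = threading.Thread(target=handler, args=(conn,), daemon=True)
        worker.start()
```

Because each connection gets its own thread, a slow client no longer blocks the accept loop.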
As always, don't hesitate to send me email or stop by if you have any questions.