Information for Assignment 2
FAQ for Assignment 2
- Q. Should my server only handle HTML documents?
A. No, your server should also be able to handle the other standard MIME formats (e.g. .gif, .jpeg, .txt).
- Q. When I connect to an HTML document through my proxy-cache server, what happens to the images?
A. The images are referenced in the HTML document via the <IMG> tag. The client sees this tag and requests the images from the server (in this case your proxy server), which must handle them in the same way as it handles the original HTML document.
- Q. In the proxy server assignment, do we have to use sockets and do all the
programming ourselves, or can we use some of the other classes like
HttpURLConnection to make the connections?
A. You must use TCP connection-oriented sockets to provide the connection between your cache-server and the web clients and servers. You may use any of the classes that are available to do this.
- Q. Writing a browser as an extension seems to be too difficult. Are there other extensions I could write?
A. Besides the ones listed on the assignment sheet, you could also
a) include log files for your proxy server. These log files could i) record every cache object written to disk, or ii) keep records of how long files are kept in the cache including an analysis of max. time, min. time and average time for files to be stored in cache. You could also determine a ratio of hits/misses.
b) You could prefetch "popular" files during slow times and store them in the cache.
- Q. Do I need to worry about reading in images?
A. Yes. When you read an HTML document, more than likely that document will reference one or more images. The client (browser) will parse the HTML and send a GET for each of the image files identified by an IMG tag. Your proxy needs to be able to handle this.
- Q. Can I read in images and textfiles the same way?
A. Probably not. I found that it works better to read in text files and binary files in different ways.
- Q. I have been looking through the HTTP documentation (RFC 2616) and noticed all the commands other than GET that are supported by the protocol. Do we have to support all of them or just GET?
A. For the assignment, you only need to implement GET. The others are beyond the scope of this assignment and are typically not supported by proxy-cache servers.
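A minimal sketch of accepting only GET, assuming the proxy has already read the first line of the request as a string. The names `RequestLine`, `parse`, `isGet` and `notImplementedResponse` are hypothetical, and answering other methods with 501 Not Implemented is one reasonable choice, not a requirement of the assignment.

```java
// Hypothetical helper: splits an HTTP request line and checks for GET.
final class RequestLine {
    final String method;
    final String target;
    final String version;

    private RequestLine(String method, String target, String version) {
        this.method = method;
        this.target = target;
        this.version = version;
    }

    /** Parses e.g. "GET http://host/path HTTP/1.0"; throws if malformed. */
    static RequestLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    /** HTTP method names are case-sensitive, so compare exactly. */
    boolean isGet() {
        return "GET".equals(method);
    }

    /** A response the proxy could send back for anything other than GET. */
    static String notImplementedResponse() {
        return "HTTP/1.0 501 Not Implemented\r\n\r\n";
    }
}
```

The proxy would call `RequestLine.parse` on the first line it reads from the client and only continue to the cache lookup when `isGet()` is true.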
- Q. Can I read a binary file all in at one time?
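A. Probably not. A single call to InputStream.read with an array of bytes is not guaranteed to fill the array; it may return after reading only part of the data. Keep calling read in a loop until it returns -1.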
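Because one call to read may hand back only part of the data, a common approach is to read in a loop until end of stream. The sketch below assumes the sender closes the connection when the file is complete (as HTTP/1.0 servers typically do); the class name `StreamBytes`, the method name `readFully` and the 4096-byte buffer size are illustrative choices.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for reading a whole binary (or text) body.
final class StreamBytes {
    /** Reads until end of stream, tolerating reads that return partial chunks. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        // read returns the number of bytes actually read, or -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // copy only the bytes that were read
        }
        return out.toByteArray();
    }
}
```

The same loop works for text files, since it never interprets the bytes.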
- Q. What MIME content types do I need to consider?
A. Minimally, your program should be able to handle the text and image content types. Audio, video and application can typically be handled the same as image files. You should NOT cache cgi-bin programs, since you don't want anything to run on your proxy-server. The same applies to .asp files, which can only run on the origin server.
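One way to keep cgi-bin programs and .asp pages out of the cache is a simple check on the URL before storing a response. This is only a heuristic sketch: the names `CachePolicy` and `isCacheable` are hypothetical, and the check looks at the URL text alone, not at what the origin server actually runs.

```java
import java.util.Locale;

// Hypothetical policy check applied before writing a response to the cache.
final class CachePolicy {
    /** Returns false for URLs that look like server-side programs. */
    static boolean isCacheable(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        // Drop any query string so "page.asp?id=1" is still recognised.
        int query = lower.indexOf('?');
        String path = (query >= 0) ? lower.substring(0, query) : lower;
        return !path.contains("/cgi-bin/") && !path.endsWith(".asp");
    }
}
```

Responses that fail the check would still be fetched from the origin server and passed to the client, just not written to disk.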
- Q. What should I do with the parameters for the GET?
A. Your program should minimally handle the GET without parameters. Expand on this to include parameters if you wish.
- Q. Which protocols should I write?
A. You need to discuss (using BNF) what the client sends to your program, what your program sends to the origin server, what the origin server returns to your program and what your program returns to the client.
- Q. How do I make the proxy server access the cache (how do I let the proxy server know the directory)?
A. You will need to code this. Your program will need to manage the cache.
- Q. I thought the pages in the cache are a list of files. How can the proxy server know which file is wanted?
A. That's something you will have to code.
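As one possible starting point (not the required design), the proxy could derive a cache file name from the requested URL, so that the same URL always maps to the same file in the cache directory. The sketch below hashes the URL with SHA-256; the names `CacheNames` and `fileNameFor` are hypothetical, and any scheme that maps each URL to a unique, valid file name would do.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical mapping from a requested URL to a file name in the cache directory.
final class CacheNames {
    /** Returns a 64-character hex name that is stable for a given URL. */
    static String fileNameFor(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

A cache hit is then a matter of checking whether a file with that name exists in the directory your program was told to use.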
- Q. How do I read the image files?
A. Remember that an image file is just a collection of bytes.
- Q. I'm having trouble connecting to some pages on http://www.cs.waikato.ac.nz/Teaching/Part3/
A. Many of these pages are redirected and seem to cause problems with your proxy-cache server.
- Q. Can I use URLConnection or HttpURLConnection?
A. Yes.
- Q. Before I changed "wwwcache..." to my proxy, I typed a URL into the Location text box in the Netscape web browser, e.g. "http://www.cs.waikato.ac.nz/~312". The browser or proxy server would immediately recognize that it is a directory and add a slash at the end of the URL, i.e. "http://www.cs.waikato.ac.nz/~312/".
However, after I changed it, the browser or proxy server no longer recognizes that it is a directory. I have to add a slash at the end of the URL manually.
A. As you have observed, appending a / to a URL address for a directory is not
done by the browser, so it is your server's responsibility to handle this.
- Q. In the FAQ you said it's better to read in text and image files in different ways.
On a cache miss, the proxy is supposed to fetch the missing data and put it in the cache first,
then return that data from the cache to the client. Do you mean we need to read data in different ways when reading from both web servers and the cache? Or only when reading from web servers, while using a single way when reading from cache files?
A. Yes, the latter. The problem seems to be with reading from the origin server. You should be
able to move the files from your cache to the client in the same way for
both text and image files.
- Q. What I want to know is which cases you will test our program with. Do we need to
handle a badly formatted port number?
A. If you are reading in the port number from the command line, then you will need to handle the format.
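A minimal sketch of validating a port number taken from the command line, assuming it arrives as a single string argument. The names `PortArgs` and `parsePort` are hypothetical, and rejecting port 0 is a choice made here rather than something the assignment states.

```java
// Hypothetical command-line validation for the proxy's listening port.
final class PortArgs {
    /** Returns the port as an int, or throws with a readable message. */
    static int parsePort(String arg) {
        int port;
        try {
            port = Integer.parseInt(arg.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Port is not a number: " + arg);
        }
        // Valid TCP ports fit in 16 bits; 0 is excluded here as a design choice.
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range 1-65535: " + port);
        }
        return port;
    }
}
```

In `main`, the program could catch the `IllegalArgumentException`, print its message along with a usage line, and exit instead of crashing with a stack trace.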
- Q. In Assignment 2, what is the meaning of the ratio of hits/misses you mentioned in the FAQ, and what is its use in my proxy server?
A. The ratio of hits to misses refers to how many times the requested
page was in your cache compared to how many times you had to retrieve the page
from the origin server.
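The ratio could be tracked with two counters, as in the sketch below. The names `CacheStats`, `recordHit`, `recordMiss` and `hitsToMisses` are hypothetical, and how to report the ratio when there have been no misses is a choice made here (0.0 before any requests, otherwise infinity).

```java
// Hypothetical counters for cache hits and misses.
final class CacheStats {
    private long hits;
    private long misses;

    /** Call when the requested page was found in the cache. */
    synchronized void recordHit() {
        hits++;
    }

    /** Call when the page had to be fetched from the origin server. */
    synchronized void recordMiss() {
        misses++;
    }

    /** Returns hits divided by misses, avoiding a division by zero. */
    synchronized double hitsToMisses() {
        if (misses == 0) {
            return hits == 0 ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return (double) hits / misses;
    }
}
```

The methods are synchronized on the assumption that the proxy serves several clients from different threads.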
- Q. Are there any sites on campus that use cgi scripts, so we can test our
programs?
A. Yes, there are many including one on the www.waikato.ac.nz page.
Wayne Summers