Cache and WebRequest
The WebRequest object is fantastic for downloading content from the web, but it has some strange behaviour. Today I implemented caching in a component of mine. It needs to monitor a series of web pages, and clearly it is not useful to download the full content of a page if the page itself has not changed. The solution is to use a cache.
First you need to know how caching works for web pages. I'm not an expert, but it is quite simple: you need to check the response for a couple of headers. Here are the ones returned by www.nablasoft.com
Last-Modified: Mon, 23 Apr 2007 14:54:38 GMT
ETag: "f4bf850b785c71:ceb9"
The ETag is used by the web server to handle caching; it is an opaque value that you need to pass back in the request header. Here is the standard situation: I download the page the first time and store the ETag value above, along with the full page content and the URL, in a database (actually I use an ICache interface managed by Castle.Windsor). At each request I check whether the URL has content associated with a date; if I have a match in the cache, I set the request headers accordingly.
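A minimal sketch of the header-setting step, assuming the cached ETag and Last-Modified values are already at hand. This is not the original code of the component; the names `ConditionalGet` and `ApplyValidators` are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Net;

public static class ConditionalGet
{
    // Hypothetical helper (not from the original component): copies the
    // cached validators onto an outgoing request so that the server can
    // answer "304 Not Modified" instead of sending the whole page again.
    public static void ApplyValidators(HttpWebRequest request, string etag, DateTime? lastModified)
    {
        if (!string.IsNullOrEmpty(etag))
        {
            // If-None-Match is not a restricted header, so it can be
            // written straight into the header collection.
            request.Headers["If-None-Match"] = etag;
        }

        if (lastModified.HasValue)
        {
            // If-Modified-Since is a restricted header and has to be set
            // through the dedicated property.
            request.IfModifiedSince = lastModified.Value;
        }
    }
}
```

With the values shown above, the call would pass the string `"f4bf850b785c71:ceb9"` (quotes included, since they are part of the ETag) and the date 23 Apr 2007 14:54:38 GMT on a request created with `WebRequest.Create`.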
The structure of the cache is simple: I first check if I have the ETag cached, then I check if the page is still in the cache (I do not know if the full data is still in the cache). The real work is where I set the cache headers on the request. The really strange thing happens when I try to get the response: if the cache is still valid, the server returns a 304 HTTP code (HTTP/1.1 304 Not Modified), which is a valid response but is treated as an exception by the framework, so the check for whether the cache is still valid has to deal with that exception.
I left out some lines of code, but the important thing is that when the server responds with 304, .NET throws an exception of type WebException. The question is “how can I find the code?”, and the answer is “in the message of the exception”. This is not the best thing to do; at least it surprised me ;). I can accept the exception, mainly because with a 304 the server has closed the connection, so there is no stream to read the page from, but the WebException object should expose an Int32 property giving back the HTTP/1.1 status code; that would be much better.
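As a possible alternative to parsing the exception message, here is a hedged sketch. It relies on the `WebException.Response` property, which for HTTP protocol errors can be cast to `HttpWebResponse` and then exposes `StatusCode`. The class and method names (`NotModifiedCheck`, `IsNotModified`, `IsCacheStillValid`) are hypothetical, and the sketch assumes the classic behaviour where `GetResponse` throws on a 304.

```csharp
using System;
using System.Net;

public static class NotModifiedCheck
{
    // True when the exception wraps an HTTP response whose status is 304.
    // Response is null for failures that never produced an HTTP response
    // (DNS errors, timeouts and so on), so those return false.
    public static bool IsNotModified(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        return response != null && response.StatusCode == HttpStatusCode.NotModified;
    }

    // Sends the request (which should already carry the cache headers) and
    // reports whether the cached copy can still be used.
    public static bool IsCacheStillValid(HttpWebRequest request)
    {
        try
        {
            using (WebResponse response = request.GetResponse())
            {
                // A normal response means the page changed; a real
                // component would read the new content and ETag here.
                return false;
            }
        }
        catch (WebException ex)
        {
            if (IsNotModified(ex))
            {
                return true;
            }
            throw; // any other failure is a real error
        }
    }
}
```

Casting `StatusCode` to `int` gives the numeric value (304), which is close to the Int32 property wished for above.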
Alk.
Tags: WebResponse HTTP Cache