clerk.house.gov too smart for my own good?

The House web site has been pretty proactive in moving to modern XML-based formats for lots of data, but you might not be able to see it unless the site correctly identifies your browser.

Here's a little test I just did:

curl -o /tmp/309 \
  'http://clerk.house.gov/cgi-bin/vote.asp?year=2007&rollnumber=309'

curl -o /tmp/309f \
  --user-agent "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; ja-JP-mac; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3" \
  'http://clerk.house.gov/cgi-bin/vote.asp?year=2007&rollnumber=309'

If you run this and compare the two output files, /tmp/309 and /tmp/309f, you'll see that the website returns totally different results for different browsers (if you can call 'curl' a browser :-)
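
For anyone who wants to check this without eyeballing the files, here is a minimal sketch of the comparison step. It assumes both downloads succeeded and were saved to /tmp/309 and /tmp/309f as above. The function name compare_responses and its output words are made up for this example, not anything the House site or curl provides.

```shell
#!/bin/sh
# Hypothetical helper: report whether two saved responses are
# byte-for-byte identical. cmp -s compares silently and signals the
# result through its exit status only.
compare_responses() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Usage, after running the two curl commands:
#   compare_responses /tmp/309 /tmp/309f
#   diff /tmp/309 /tmp/309f | head -n 20   # peek at the first differences
```

If the site really is switching on the User-Agent header, the helper should print "different", and the diff output shows which representation each request received.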

I can see how this could easily come from good intentions: trying to give each user the most browser-appropriate representation. The problem is that if I want to download the XML version for other processing, I might never find it if my browser isn't correctly recognized. I think the right answer is to make sure the XML is independently indexed and always findable through some other page for people interested in the data. The current interface is probably still about the right thing for the casual browser of clerk.house.gov.

Just an observation on web tech and our slowly more visible Congress.



