Use lynx -dump to retrieve the contents of your Web site. Just hardcode all the page URLs. Redirect all the content to flat files, then use grep to look for patterns in your content. Start by looking for mistakes you commonly make. Save your greps in a file.
Wow, now this brings back some memories :).
I first loaded up a lynx browser back in 1993, and this was my introduction to what the non-graphical World Wide Web looked like. Truth be told, I fairly quickly abandoned lynx as an everyday platform when NCSA Mosaic and the first version of Netscape came out, but there is indeed value in using lynx. It’s a nice tool to add to accessibility tests, so that you can see what your super pretty graphical page looks like to those who don’t have that option. For those curious… it looks like this (well, mine looks like this):
[Screenshot: Yep, that’s what the Web looked like in 1993. Cool, huh?]
lynx -dump does exactly what it sounds like.
Here’s an example from my own little site project:
lynx -dump http://127.0.0.1/web/orchestra/index.php
This prints the page’s rendered text to the screen. Adding a redirect (‘>’) sends that output to a file for us instead. Repeat for each page, and you can pull down a text copy of every page in your site.
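Here’s a quick sketch of that loop. The URLs beyond index.php are hypothetical stand-ins for whatever pages your own site has, and the lynx call is guarded so the script doesn’t fall over on a machine without lynx installed:

```shell
#!/bin/sh
# Hardcoded list of page URLs -- swap in your own site's pages.
urls="
http://127.0.0.1/web/orchestra/index.php
http://127.0.0.1/web/orchestra/about.php
"

for url in $urls; do
    # Derive a flat-file name from the last path component: index.php -> index.txt
    out="$(basename "$url" .php).txt"
    echo "dumping $url -> $out"
    # Only call lynx if it is actually available on this machine
    if command -v lynx >/dev/null 2>&1; then
        lynx -dump "$url" > "$out"
    fi
done
```

Run that once and you have a directory of flat .txt files, one per page, ready for grep.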
OK, cool, that’s interesting, but what does that do for us? It allows us to go through and pull out data that we’d want to analyze. Granted, the site as it exists right now isn’t all that spectacular, but it does give us a basis for how we can construct some simple greps.
For those not familiar with this tool, “grep” is an old UNIX standby. The term comes from the syntax of the “ed” editor, where the command was g/re/p (or “globally search for a regular expression and print it to stdout”). Those of you with Windows machines can download Grep for Windows at http://gnuwin32.sourceforge.net/packages/grep.htm, or you can find a variety of fun and interesting versions. For me, since my system is in a virtual environment, I’m just going to save the files to my shared folder space and play with grep on my Mac :).
The main benefit to using grep is to look for things that show up in your pages that you may find interesting, or things that might be errors. Searching for basic strings in files can surface a lot of interesting details in the content of the pages. As a quick set of examples, I recommend poking around on this page for 15 command examples of ways you can use grep to get interesting data.
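A few of the grep flags I reach for most often, shown here against a throwaway sample file so the examples are self-contained (the file contents are made up for illustration):

```shell
#!/bin/sh
# Create a small stand-in for a dumped page.
cat > sample.txt <<'EOF'
Welcome to the Orchestra site
Rehearsal schedule: TBD
Contact: webmaster@example.com
rehearsal notes go here
EOF

# Case-insensitive match: finds both "Rehearsal" and "rehearsal"
grep -i "rehearsal" sample.txt

# Count matching lines instead of printing them
grep -c -i "rehearsal" sample.txt

# Show line numbers with each match
grep -n "Contact" sample.txt

# Invert the match: every line that does NOT mention rehearsal
grep -v -i "rehearsal" sample.txt
```

Point the same commands at your dumped .txt files (grep takes multiple file names, or a * glob) and you can sweep the whole site in one shot.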
Once you find a few greps that you find useful, it’s a good idea to save them in a file so that you can run them over and over again as you add content and have more information to mine from your site.
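One simple way to do that is to drop your saved greps into a little shell script. The two checks below are placeholders; yours would be the patterns for the mistakes you actually make:

```shell
#!/bin/sh
# Write the saved greps to a reusable script.
cat > site-checks.sh <<'EOF'
#!/bin/sh
# Re-runnable content checks against the dumped .txt files.
echo "-- pages still showing TODO markers --"
grep -l -i "todo" *.txt

echo "-- pages with a doubled word ('the the') --"
grep -n "the the" *.txt
EOF
chmod +x site-checks.sh
```

Then running ./site-checks.sh after each fresh dump gives you the same report every time, and adding a new check is just one more grep line in the file.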
This is meant to be a really basic first step in getting into the details of what your pages show, and to help get you away from using the browser as your main point of interaction. Yes, there’s a lot that can be done just with the files and the content in them. How you choose to look at them, and what interesting details they show, will be my focus for next week.