Archive of Posts from November 2006

Gift Idea for new digital photographers

O'Reilly just published a digital media gift guide via press release, and it includes a bunch of great books for the MP3 and digital photo crowd. But they left out one item, imho. (Warning: what follows is blatant self-promotion, but I still think it's a good idea.) If someone you know is getting a new digital camera (or receiving a digital camera or cameraphone for the first time), give them the ultimate accessories: a Flickr Pro account and a copy of Flickr Hacks. Digital cameras should ship with some sort of pipeline to Flickr. Until they do, the account will encourage them to share, and the book will get them up to speed on almost everything Flickr can do. (This is my own digital media press release.)

And speaking of Flickr Hacks, co-author Jim Bumgardner recently joined Yahoo. He'll be doing his brand of audio and visual Flash hackery for Yahoo Music. Congrats, Jim! (And Yahoo!)

Update (12/18): Flickr added a Flickr Gifts page, and Paul Stamatiou put together a Flickr Gift Guide for the Flickrist in your life. (Alas, no mention of Flickr Hacks, but you know better!)

RAID Mirroring on a Mac Pro

A couple weeks ago I walked into my office, turned on my monitor, and saw a gray screen. Odd, I thought. Rebooted. Same gray screen. Rebooted. Nothing but gray.

I have one of those new-fangled Intel Mac Pros, so I had to look up how to boot into a different mode. I fired up my PC and found the handy document "My Mac Won't Start!" As I went through all of the options, I kept getting the dreaded gray screen. The contents of my hard drive flashed before my eyes. I was sure I'd lost my data, and I started making a list of things I would need to reconstruct. After exhausting my options, I called Apple Support, and they confirmed that my Mac would need professional help.

To make a long story short, my hard drive was fine. The internal power unit had failed, and two weeks later I got my Mac Pro back in working order with a new power unit. But the event gave me backup religion, and I decided to look into backup options so I wouldn't have to worry about losing recent work and the thousands of files, emails, and media I've gathered over the years.

The Mac Pros have RAID software built in, so I decided to move from my default 250 gig drive to two mirrored 500 gig drives. RAID stands for redundant array of inexpensive disks, and the idea behind mirroring (aka RAID 1) is that you have two or more disks with exactly the same data. If one disk fails, you can keep working from one of the others until you get a replacement, so a drive failure won't mean lost data. I think of it as an insurance policy for my virtual stuff.
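To make the mirroring idea concrete, here's a toy model in Python. It's only a sketch of the RAID 1 concept and not how Apple's RAID software actually works. The `MirroredSet` class and its methods are made up for illustration, and each "disk" is just a dictionary of block numbers. Every write goes to all healthy members, and any surviving member can answer a read.

```python
class MirroredSet:
    """Toy model of a RAID 1 mirror (hypothetical); each disk is a dict of blocks."""

    def __init__(self, members=2):
        self.disks = [{} for _ in range(members)]
        self.failed = set()

    def _healthy(self):
        return [d for i, d in enumerate(self.disks) if i not in self.failed]

    def write(self, block, data):
        # Every healthy member gets its own full copy of the data.
        for disk in self._healthy():
            disk[block] = data

    def read(self, block):
        # Any healthy member can answer, because they all hold the same data.
        healthy = self._healthy()
        if not healthy:
            raise OSError("every member of the mirror has failed")
        return healthy[0][block]

    def fail(self, member):
        self.failed.add(member)


raid = MirroredSet()
raid.write(0, b"thesis draft")
raid.fail(0)                   # the first drive dies...
print(raid.read(0))            # ...but the second drive still has the data
```

The model also hints at the trade-offs: every write has to land on both drives, and two 500 gig drives in a mirror give you 500 gigs of usable space, not 1,000.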

The process of setting up a RAID set was fairly simple, but I thought I'd step through the process here to share my experience. Here's what you'll need to move from a single drive to RAID mirroring on a Mac Pro:
  • Mac OS X (10.4+)
  • Two 500 gig drives
  • External hard drive case
I looked for quiet drives and eventually settled on Western Digital Caviar SE16 drives. I found them as OEM drives (plastic bag packaging only) at Newegg for $169.99 each. I didn't have a SATA external drive case, so I picked one up (a Macally PHR-100SU) at my local Mac store for about $75. You can probably find external cases cheaper by looking around a bit, though.

Once you have everything you need, you want to initialize your new drives, set up a RAID mirroring set, and copy your current drive to your new RAID set. It goes like this:
  1. Remove the current drive.
  2. Install the two 500 gig drives.
  3. Install the current drive into the external drive case.
  4. Plug in the external drive and boot.
The Mac Pro will boot from an external USB device. You just need to be patient. Eventually the Mac will automatically detect that your OS is now on an external drive. (You could probably also boot from the CD that came with your computer and skip the whole external case thing, but I went the case route.) Now that you've booted into your current drive, start up Disk Utility in the Applications/Utilities folder, and continue:
  1. Partition the new drives.
  2. Click the RAID tab, choose Mirrored RAID Set as the RAID type, drag both new drives into the box, and click Create.
  3. Click the Restore tab.
  4. Drag the current drive (external) icon to the Source field.
  5. Drag the new mirrored set icon to the Destination field.
  6. Click Restore.
Once this process starts you might want to go out for a coffee. And a movie. And maybe some shopping. You're copying your old disk to your new RAID set, and it'll take several hours (depending, of course, on how much data you have on your current disk).

Once the restore is complete, shut down. Unplug your external drive, and fire up your computer. Your Mac might sit on a gray screen while it looks for an OS to boot into, but you should soon find yourself back on your computer as if you'd never left.

So what did I get out of all this work and the $450 I spent? Well, I now have two hard drives storing my data. In my case, I also got double the hard drive space (I went from 250 gigs available to 500 gigs). I didn't get a performance boost by moving to RAID. With mirroring, the OS has to write everything to two drives, but I haven't noticed any change. Sometimes the system seems slower, but it's probably my imagination. Either way, any performance trade-off is slight enough that protected data is worth it. And to top it off, I still have my original 250 gig drive, which I can leave in the external case. I might also install the drive internally and use it for more non-mirrored storage.

The bottom line is that Apple has made RAID disk mirroring extremely easy to set up, and I can sleep a little easier at night knowing that if I wake up to a gray screen (or worse) it won't be the end of my data world.
  • Great tips for managing URLs and redirection with Apache's cryptic but powerful .htaccess files. [via mathowie]
    filed under: software, security

Corvallis Update

I'd like to talk with my fellow Corvallians for a minute. Everyone else gone? Good, ok.

We get so few new places to eat around here that I thought I'd mention a couple of new options. I finally had a chance to eat at the new McMenamins on Monroe last night. It's a nice space with the same good McMenamins food. The plumbing art on the wall is fun, but I'm not sure about the giant, giant TV screen. I suppose it'd be good for watching OSU games, but it seems a little out of place when you're there for dinner.

Be sure to check out Thanh-Hien, the new Vietnamese place by Winco. (It replaced TCBY.) I've been there a couple times for lunch, and it's been surprisingly great both times. They've done a nice job redecorating the place. Food places don't last long over there, so try it while you can. I hope they do well.

Did you know that OSU has a nuclear reactor? Neither did I. Somehow they forgot to mention that in those "safest places to live" surveys.

If you don't already subscribe to Paul Turner's email list (he runs the Avalon and Darkside), go subscribe now! His movie synopses alone are worth the price of admission, and commentary about Corvallis and his experiences running the theaters sometimes creeps into the emails. Here's part of his recent rant about the Whiteside hullabaloo:
At the risk of not being silly, I must tell you that the Whiteside is simply not financially viable as a movie theater...The memories of exploring one's heterosexuality on the Whiteside balcony while Indiana Jones dodges Nazis will not pay the bills, and will not excite investors to pay them, either. Frankly, I will be a wreck when they start whatever transformation takes place at the Whiteside...But how I feel about the old queen does not change the way business works. And that sucks.
I wish Paul Turner had a blog so I could link to the whole thing.

Hope your leaf pickup is going well. Let's have a good Civil War this Friday—Go Beavs!
  • Flash upload progress thingy for web apps. [via mathowie]
    filed under: design, development, flash, programming
  • A quick, straightforward explanation of data portability and why companies like Google should support it. [via battelle]
    filed under: amazon, google, internet, privacy
  • Flickr applies for a patent on "interestingness" as a way of determining which media objects are getting the most attention from users. [via kottke]
    filed under: flickr, future, law, tagging

The Readability of Blogs

You need 11.9 years of formal education to easily understand this site. Well, that's if you believe a readability test called the Gunning-Fog Index. The Gunning-Fog Index is basically an algorithm that analyzes text for average sentence length and the percentage of complex words (words with three or more syllables). After crunching the numbers, it comes up with a readability score that is supposed to predict how easily people will be able to digest the text. The Wikipedia article for the Gunning-Fog Index mentions that comic book text typically has a score around six, Reader's Digest typically scores around eight, Newsweek scores around ten, and so on. This puts onfocus.com on par with the readability of Time magazine.
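The formula behind the score is short: 0.4 * (average words per sentence + percentage of complex words). Here's a rough Python sketch of it. It assumes a crude vowel-group syllable counter (`count_syllables`, a hypothetical helper) and naive sentence splitting on periods, question marks, and exclamation points. The full definition also leaves some words out of the complex count, such as proper nouns and words that reach three syllables only through suffixes like -ed or -es. This sketch skips those refinements, so its numbers won't match other tools exactly.

```python
import re


def count_syllables(word):
    """Rough guess: count runs of vowels, minus a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text):
    """Gunning-Fog Index: 0.4 * (words per sentence + percent complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("need at least one sentence of text")
    # "Complex" words are those with three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    words_per_sentence = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (words_per_sentence + percent_complex)
```

For example, `gunning_fog("The cat sat on the mat.")` comes out to roughly 2.4: six words in one sentence, none of them complex.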

The first time I ran into a demo of readability tests was at this page: Juicy Studio Readability Test. You can plug in a URL, and get back a Gunning-Fog Index score, and some other scores. I thought it was interesting and moved on. But for some reason it's been sticking in the back of my mind.

I'm bringing this up because I've been thinking quite a bit about the ways we measure blogs. And most of our measurement tools are fairly blunt. If you ask blog-measurement site Technorati what it "thinks" about your favorite blogs, you'll get machine answers like the number of inbound and outbound links. You'll get some info about traffic over time and Technorati's computed rank compared to other blogs. You'll see post-frequency and a list of common topics culled from RSS categories and Technorati tagging.

On the other hand, if I were to ask you some questions about your favorite blogs, you could probably tell me exactly why you like them. And it wouldn't have anything to do with inbound links or the other machine-based metrics. I'm guessing most of your answers would involve the writing style, tone, the topics the author covers, the fact that everyone else reads it, or maybe your personal relationship with the author.

You can't quantify something like tone, so you can't put computers to work analyzing tone. (I'd love to have a snark score for blogs.) But readability scores are a step toward a more human-style metric, and the scores can be crunched, analyzed, graphed, and averaged by computers. And I like the idea that the readability scores are lying there dormant within the sentences themselves, waiting to be tapped.

I'm not a linguist, so I don't know how accurately these scores reflect readability. But I was interested enough in readability as a metric to do some digging around. A search on CPAN turned up the module Lingua::EN::Fathom, which accepts arbitrary text and returns the Gunning-Fog Index score along with several other scores, including the Flesch Reading Ease score and the Flesch-Kincaid grade level. I thought it might be fun to plug in the top ten or so English-language blogs as reported on Technorati Popular to see if there's a "sweet spot" reading level among the most popular blogs. Of course many factors go into a blog's success, but I thought readability could be one reason some blogs hit the top of the tail and others don't. If nothing else, I figured I could find out whether blog readers are more of a Reader's Digest sort of audience or more of a Time magazine sort of audience.
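Both Flesch scores are built from the same two ratios, words per sentence and syllables per word, each with its own published weights. Here's a minimal Python sketch of the standard formulas. It assumes you've already counted words, sentences, and syllables. Lingua::EN::Fathom does its own counting, so its results may differ slightly from anything you count by hand.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Roughly 0 to 100; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Approximate U.S. school grade level needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

For a sample of 100 words in 5 sentences with 150 syllables, these give a Reading Ease of about 59.6 and a grade level of about 9.9. Longer sentences and longer words push Reading Ease down and the grade level up.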

So I cooked up a little Perl script that takes a list of RSS feeds, loops through the posts, strips out HTML, and calculates readability scores. If you want to run it yourself, you can grab the code here:

reading_levels.pl

In addition to the Lingua::EN::Fathom module, you'll need LWP::Simple for fetching feeds, XML::RSS::Parser for parsing them, and Math::Round::Var for rounding the scores. Add a list of feed URLs you want to analyze to the top of this file, and then run it on the command line, like this:

perl reading_levels.pl > reading_levels.txt

Once finished, the file reading_levels.txt will have a report with the individual reading levels for the sites, and an average for the group.

Caveats: this isn't a very robust feed parser; some feeds only have excerpts rather than full posts; and some feeds simply don't work with this script. I used the full-post feed if multiple feeds were available, and I skipped any sites that didn't parse.
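The original script is Perl, built on LWP::Simple, XML::RSS::Parser, and Lingua::EN::Fathom. If you'd rather not install those modules, here's a rough Python analogue that uses only the standard library. It's a sketch under several assumptions and not a port of reading_levels.pl. It expects RSS 2.0 feeds with post bodies in `<description>`. It scores all of a feed's posts together as one block of text. It replaces Fathom with a crude vowel-group syllable count and the Flesch-Kincaid grade formula. It also skips any feed that fails to download or parse. All function names are hypothetical.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags and discard the markup itself."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def post_texts(rss_xml):
    """Yield the plain text of each <item><description> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        # ElementTree unescapes &lt;p&gt; into <p>, so strip the tags next.
        yield strip_html(item.findtext("description") or "")


def syllables(word):
    # Crude heuristic: count runs of vowels. Fathom counts differently.
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)


def fk_grade(text):
    """Flesch-Kincaid grade level, or None if there is no text to score."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    syl = sum(syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syl / len(words) - 15.59


def feed_grade(rss_xml):
    # Score every post in the feed together as one block of text.
    return fk_grade(" ".join(post_texts(rss_xml)))


def report(feed_urls):
    scores = {}
    for url in feed_urls:
        try:
            with urllib.request.urlopen(url) as resp:
                scores[url] = feed_grade(resp.read())
        except (OSError, ET.ParseError) as err:
            print(f"skipped {url}: {err}")  # feeds that won't load or parse
    for url, score in scores.items():
        shown = "n/a" if score is None else f"{score:.2f}"
        print(f"{url}\t{shown}")
    valid = [s for s in scores.values() if s is not None]
    if valid:
        print(f"average\t{sum(valid) / len(valid):.2f}")
    return scores
```

Calling `report(["https://example.com/feed.xml"])`, with your own feed URLs in place of the placeholder, prints one tab-separated line per feed and then the group average.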

So, what did I find? Well, here's the report for the top several English-language blogs as reported today by Technorati:

reading_levels.txt

(I skipped PostSecret because there's not much text to analyze.) The average Gunning-Fog Index score was off the "wide audience" charts at 14. That means the average person would need about 14 years of formal education to understand these blogs easily. The average Flesch Reading Ease score was 46.9 on a 100-point scale, where higher scores mean easier text. That's on par with state insurance form requirements. (Seriously!) And the average Flesch-Kincaid grade level was 11.8, meaning the text suits high school seniors. The most "ideal" site for a wide audience was Daily Kos, with a low Flesch-Kincaid grade level (9.05) and an above-average Flesch Reading Ease score of 56.48.

So, what does this mean? I have no idea. My prediction that the most popular blogs would have very good readability scores didn't quite hold up. I can't pinpoint a "sweet spot", but maybe blog readers enjoy more densely layered text. (Think Time instead of Newsweek, but not quite Harvard Law Review.) I might take a look at sentence length and percentage of complex words next and see how those measure up.

I still think measuring readability has promise. Earlier today Anil was talking about TL;DR syndrome, and I think the popular blogs capitalize on this with short, frequent posts. But I also wonder if text density plays a role. So in addition to "too long; didn't read," I think there's the possibility of "too dense; didn't read." (Insert joke here.)