OTFG Step 4: Running the Import Script

At long last, here's the import script I used: import-flickr-photos.php.

If you want to try it out, be sure to add all of the personalized information I mentioned in the previous steps (1, 2, 3) to the top of the script.

Here's what the script does:
  • Authenticates the person running the script at Flickr via the browser. (You'll have to give your script permission to read all of your photos.)
  • Requests the total number of photos for the authenticated account from the Flickr API (to help with looping).
  • Requests all of the account's photos, asking for standard info (title, FlickrID, whether or not it's public) and some extra details (relevant dates, and the longitude and latitude).
  • Loops through every photo, adding the photo information to the database if it isn't already there.
  • Downloads the original photo file from Flickr and saves it locally if it isn't already there.
  • Requests the description (caption) and tags for the particular photo from the Flickr API. (Requires a separate API call, unfortunately.)
  • Adds the file location, description, and tags to the db.
  • Finally, the script sleeps for one second before doing anything else. (Seemed like the polite thing to do so the script doesn't hammer the API.)

In my last post I mentioned that Flickr Backup was clunky to use, but this script is a thousand times clunkier. If Flickr Backup is a Ford Taurus, then running this script is like taking your covered wagon out on the Oregon Trail: no shocks, no rubber tires, no paved roads, and you'll likely die of cholera before it's finished. I'm kidding on that last part, but the script probably will die before all of your photos are saved locally. Not to worry, you can run the script multiple times without duplicating files or data, because it checks for existing records and files before taking any action. I have 513 photos at Flickr (which isn't too many in the scheme of things), and I still needed to run the script a couple of times to get all of them. (I set a ridiculously high timeout at the top of the script, but the script seemed to die anyway.)
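The "safe to run again" behavior can be sketched roughly like this. It's a minimal illustration, not the code from import-flickr-photos.php: the function names pageCount() and importPhoto() are made up, an in-memory array stands in for the photos table, and the download is passed in as a callable so the example runs without touching Flickr.

```php
<?php
// Hypothetical sketch: these names do not come from import-flickr-photos.php.

/**
 * Number of result pages needed to cover $total photos.
 */
function pageCount($total, $perPage)
{
    return (int) ceil($total / $perPage);
}

/**
 * Import one photo unless it is already recorded and already saved.
 *
 * $known is an in-memory stand-in for the photos table, keyed by Flickr ID.
 * $download is any callable that returns the image bytes for a photo.
 * Returns true if any work was done, false if the photo was skipped.
 */
function importPhoto(array $photo, array &$known, $path, $download)
{
    $didWork = false;

    // Add the record only if it isn't already there.
    if (!isset($known[$photo['id']])) {
        $known[$photo['id']] = $photo['title'];
        $didWork = true;
    }

    // Download the file only if it isn't already on disk.
    if (!file_exists($path)) {
        file_put_contents($path, call_user_func($download, $photo));
        $didWork = true;
    }

    return $didWork;
}
```

Because both checks happen before any work, a second run over the same photo does nothing, which is what makes restarting a dead script harmless. A real loop would also pause between photos (for example with sleep(1)) to stay polite to the API.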

To run the script, open it in your browser. The URL should be something like:

http://example.com/import-flickr-photos.php

The script will redirect you to Flickr where you'll need to log in and/or authorize the script. From there, the magic starts. The script will try to give you some info about what's happening, but if you don't see anything but a blank page, don't worry, it's probably working. If you can, log into your server and check out the photos directory you set up. You should see folders and files appearing. Another way to check progress is by firing up MySQL and running some counts on your table. Something like this:

SELECT COUNT(PhotoID) FROM photos;

If the script is working, the count will be higher than zero, and it should keep climbing as more photos come in.

It's important to note that the import script is grabbing every photo you uploaded to Flickr, even those marked as friends and family only. This is exactly what I wanted to happen, but if you want something different, check out the documentation for the flickr.photos.search method and tweak line 65 of the script. You can set a privacy_filter argument in the call to get a list of only photos that are public, for example.
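As a rough sketch of that tweak, here's one way the arguments for flickr.photos.search could be assembled. The helper buildSearchArgs() is hypothetical (it isn't in the import script or in phpFlickr), and the extras list is only a guess at what the script asks for. The argument names themselves (user_id, extras, per_page, page, privacy_filter) are from the Flickr API documentation, where privacy_filter=1 means public photos only.

```php
<?php
// Sketch only: buildSearchArgs() is a hypothetical helper, not part of
// import-flickr-photos.php or phpFlickr.

/**
 * Build the argument list for a flickr.photos.search call.
 *
 * $page is the 1-based page of results to request.
 * $privacyFilter is null for no filter, or a Flickr privacy_filter
 * value such as 1 (public photos only).
 */
function buildSearchArgs($page, $privacyFilter = null)
{
    $args = array(
        'user_id'  => 'me',  // the authenticated account
        'extras'   => 'date_upload,date_taken,geo,original_format',
        'per_page' => 500,   // the API's maximum page size
        'page'     => $page,
    );

    // Leave privacy_filter out entirely to get every photo.
    if ($privacyFilter !== null) {
        $args['privacy_filter'] = $privacyFilter;
    }

    return $args;
}
```

Calling buildSearchArgs(1) asks for everything, while buildSearchArgs(1, 1) limits the list to public photos. How the array gets handed to phpFlickr depends on the version you have installed, so check its documentation before wiring this in.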

So, once this script finished, I had a bunch of local directories filled with photos that I'd uploaded to Flickr over the past three years. I also had 513 records in the photos table and 1,646 records in the tags table describing those photos. That means I (and some others) added about 3.2 tags per photo. Huh. So I can't really look at my photos through the Web yet, but at least they're ready for the next phase. Not too shabby for a few hours on a Saturday afternoon.

Disclaimer: As I mentioned before, OTFG is an off-the-top-of-my-head project. So if you try any of this stuff out, please don't hold me responsible for your toaster catching on fire. I'm sharing this project publicly to show how I'm going off the Flickr grid, and to hopefully get some feedback in the process.

Next Up: Set Theory

OTFG Step 3: Registering the Import Script

One feature that put Flickr ahead of the existing photo-sharing pack a few years ago was the fantastic Flickr API. The API gives developers the chance to tap into the Flickr photo database and create stunning visualizations, fun toys, and productivity hacks that extend the service. Hosting my own images means I won't be able to take advantage of these Flickr-specific tools built (for the most part) by fans of the service. Losing access to these tools is one of the biggest drawbacks to going off the grid. Even my ability to leave Flickr with my photos hinges on the existence of the API. People can use the API to export their photos if they're not happy with the service, or if they want to share their photos in a different way.

But exporting your photos isn't just a matter of pushing an "export" button. You have to have a certain amount of technical expertise to be able to export your photos from Flickr. There is one existing tool I know of that can grab all of your photos—Flickr Backup—but in my experience it's a bit clunky to use, and doesn't snag all of my data like descriptions and tags. (Sounds like it's getting better since I tried it, though, based on posts in the FlickrBackup Open Discussion.) So while it's possible to get your photos out of Flickr, it's not easy for most of the world. I consider myself familiar with Flickr and its API, but it still took a few hours to write a script to grab my photos, titles, captions, and tags.

The key to gathering my photos was logging in as myself through the Flickr API. (Which is about as easy as it sounds.) Luckily most of the heavy lifting is handled through phpFlickr, a Swiss Army knife for working with the Flickr API in PHP. If you want to try this at home you'll need to download and install phpFlickr on your server. Make sure the main phpFlickr file is in your public working directory, along with the file auth.php. This little file helps handle Flickr authentication.

Note the URL of auth.php on your server. It should be something like:

http://example.com/auth.php

Flickr controls access to their API through keys, and I needed one for my import script. With my auth URL in hand, I headed over to the Flickr API and applied for a key. I quickly described the app, noted that it was for non-commercial use, and agreed to the Terms of Use. In exchange, I got a couple of alphanumeric strings that let me use the browser to log in via the Flickr API.

I was instantaneously approved for the key, and I clicked on the Your API Keys link and found the key I just made on the list. I clicked Edit key details, gave the import script a quick title and description, and placed my auth.php URL in the Callback URL field. Then I clicked "Save Changes" to finish the setup.

I found my new key once again on the list of keys and copied both my Key and my Secret, which is a shorter alphanumeric string that shouldn't be shared with others (as the name implies). You'll need to jump through these hoops to register your import script as well if you're following along.

To recap the progress so far, these items were in my magic bag of holding before any files were transferred:
  • A MySQL username, password, and database name, with empty photos and tags tables.
  • A local filesystem directory where photos will be stored.
  • A Flickr API Application Key and Secret.

It takes a bit of work to put the pieces in place, but once this groundwork is done, importing the photos from Flickr can begin. That's up next.

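For anyone following along, the items in that recap might end up at the top of a script looking something like this. The variable names and values are placeholders invented for illustration; the real import script may name and arrange them differently.

```php
<?php
// Placeholder settings: every name and value here is hypothetical.

// MySQL credentials and database name from Step 1.
$dbUser = 'your-mysql-username';
$dbPass = 'your-mysql-password';
$dbName = 'your-database-name';

// Local directory (reachable by the web server) where photos are stored.
$photoDir = '/www/photos/';

// Flickr API Key and Secret from the Your API Keys page.
// Keep the Secret out of anything you share publicly.
$flickrKey    = 'your-api-key';
$flickrSecret = 'your-api-secret';
```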
  • danah starts a discussion about virtual walled gardens, gated communities, whatever you want to call them. Be sure to check out the comments. The central question to me is: "who owns the walls?"
    filed under: internet, privacy, community, identity

OTFG Step 2: Thinking about Photo URLs

My next step in moving my photos from Flickr to my own server was thinking about where I would store the photo files. Flickr assigns every photo a numeric ID which is available in every photo URL. For example, here's the URL of one of my photos hosted on Flickr:

http://farm1.static.flickr.com/124/359119647_4874f02815_o.jpg

The URL doesn't give much information about the photo. We know that the photo is at a flickr.com server, and that the photo is a user's original photo (note the _o at the end of the filename). Other than that, pretty anonymous.
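To show what little the URL does carry, here's a small sketch that pulls it apart. The function parseFlickrPhotoUrl() is a made-up name, and the pattern only covers the farmN.static.flickr.com style of URL shown above, so treat it as an illustration and not a general-purpose parser.

```php
<?php
// Hypothetical helper for illustration; it only understands URLs shaped
// like the farmN.static.flickr.com example in this post.

/**
 * Split a Flickr static photo URL into its parts.
 * Returns an array of parts, or null if the URL doesn't match.
 */
function parseFlickrPhotoUrl($url)
{
    $pattern = '#^https?://farm(\d+)\.static\.flickr\.com/(\d+)/'
             . '(\d+)_([0-9a-f]+)(?:_([a-z]))?\.(jpg|gif|png)$#';

    if (!preg_match($pattern, $url, $m)) {
        return null;
    }

    return array(
        'farm'   => $m[1],
        'server' => $m[2],
        'id'     => $m[3],  // the numeric photo ID
        'secret' => $m[4],
        // An unmatched optional group comes back as an empty string.
        'size'   => $m[5] !== '' ? $m[5] : null,  // "o" marks the original
        'format' => $m[6],
    );
}
```

For the URL above this gives a photo ID of 359119647 and a size of "o", and nothing at all about when the photo was taken or what it's called.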

Since I'm hosting my own photos, I thought I'd put a bit more information into the photo URLs. I decided to go with this format for original, unresized images:

http://example.com/[year]/[month]/[photo title].jpg

This means the same photo on my server will have a URL like this:

http://example.com/2007/01/beach-dogs.jpg

Though I don't expect my photo URLs to be exposed in the wild very much, I like this structure because it provides a bit of context. And because I'll be using actual directories in the filesystem named /2007 and /01, for example, the filesystem should scale well. I won't have hundreds and hundreds of photos in one folder. On the other hand, it will make running batch operations on all of the photos a bit tougher because I'll have to recurse through the directories—but that shouldn't be a big deal. (Especially since all of the file locations will be stored in the db.)

The Flickr API provides the date and time a photo was added to their system in Unix time, and the PHP date() function converts that to any format. So as my import script grabs photos from the Flickr server, it puts the image in the local filesystem based on the time it was added to Flickr originally.

I simply set a starting directory in my import script that's available through the web server, say, /www/photos/ or c:\www\photos\ in Windows, and it will create the necessary local directories as it pulls in photos from Flickr.
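A minimal sketch of that step, assuming the upload time arrives as a Unix timestamp: photoDirectory() is a hypothetical name, while date() and the recursive form of mkdir() are standard PHP.

```php
<?php
// Hypothetical helper; the real import script may do this differently.

/**
 * Return (and create if needed) the year/month directory for a photo,
 * based on the Unix timestamp of when it was added to Flickr.
 */
function photoDirectory($baseDir, $uploadedAt)
{
    // date('Y/m') turns a Unix timestamp into something like "2007/01".
    $dir = rtrim($baseDir, '/\\') . '/' . date('Y/m', (int) $uploadedAt);

    if (!is_dir($dir)) {
        // The third argument creates the year and month levels in one call.
        mkdir($dir, 0755, true);
    }

    return $dir;
}
```

Note that date() uses the server's configured time zone, so a photo uploaded within a few hours of midnight on the last day of a month could land in a neighboring folder depending on where the server thinks it is.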

Using the title of a photo as the file title is a bit tricky, because the titles are meant to be read by humans, not used in the filesystem. Photo titles contain punctuation and spaces, so I just strip all of that out with some regular expressions. I'm sure this could be improved, but I'm using:

$photoTitle_f = preg_replace('/\s+/', '-', $photoTitle_f);
$photoTitle_f = preg_replace('/[^-\w]/', '', $photoTitle_f);


Basically this bit of code says replace any run of whitespace in the title with a dash, and then remove any character that isn't a dash, a letter, a number, or an underscore (that's what \w matches). A bit rough, but it should handle most standard English titles.
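Wrapped in a function, the same two replacements look like this. The function name titleToFilename(), the trim() call, and the "untitled" fallback are additions for the sake of the example and aren't in the original snippet.

```php
<?php
// Hypothetical wrapper around the two preg_replace() calls above.

/**
 * Turn a human-readable photo title into a filesystem-friendly filename.
 */
function titleToFilename($title, $extension = 'jpg')
{
    // Collapse each run of whitespace into a single dash.
    $name = preg_replace('/\s+/', '-', trim($title));

    // Drop anything that isn't a dash, letter, number, or underscore.
    $name = preg_replace('/[^-\w]/', '', $name);

    // Assumption: fall back to a fixed name if nothing survives.
    if ($name === '') {
        $name = 'untitled';
    }

    return $name . '.' . $extension;
}
```

A title like "Beach Dogs!" becomes Beach-Dogs.jpg. Two photos with the same title uploaded in the same month would still collide, so a real script needs some way to tell them apart.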

With the photo-URL planning out of the way, it was time to set up Flickr API access for my import script. I'll show how that works in Step 3.

OTFG Step 1: Setting the Stage

This weekend I took my first step toward going off the Flickr grid (aka OTFG). I set up a database to store information about my photos, I downloaded all of my original photos from the Flickr servers, and used the Flickr API to gather information about those photos. This first step is key for me because I don't want to lose the years of work I've already put into adding titles, captions, and tags to my photos at Flickr. Luckily, Flickr has a fantastic API that lets you tap into their photo database.

If you want to follow along at home, my setup includes PHP 5 and MySQL 5.

I started by whipping up a quick database structure to hold info about photos. I'm sure this will change over time, but this is what I felt was the bare minimum I needed to get back out of Flickr. I made two tables: one for photos and one for tags. The table photos includes fields for a PhotoID, title, description, date the photo was added, date the photo was taken, longitude and latitude of the location (if available), whether or not the photo is public, and the location of the photo on the local filesystem. (I decided to store the files in the local filesystem instead of in the database because that seemed more intuitive to me.) The table tags will store all of the tags associated with the photos.

You can grab the SQL required to create the tables here: otfg_tables_1.txt.

And you can set them up in a new database like this:

shell> mysql -u [username] -p [database name] < otfg_tables_1.txt

Next you'll want to set up a user to access this database. Fire up MySQL and run something like this:

mysql> grant all on [database name].* to [username]@localhost identified by '[password]';

Remember the MySQL username and password you set; you'll need them in a bit.

And with that, the stage is set and ready to be filled with photos! Coming up: How I grabbed my photos and put stuff in the db.

Disclaimer: Please be aware that I'm building this as I go. That means the code I'm sharing hasn't even been tested in the real world. I'm merely showing the steps I'm taking as a guide (and hopefully for some input). In other words, don't try this at home unless you're comfortable with what's going on here.

Going Off the Flickr Grid

When I started this site in 1998, one of the first things I posted was a set of pictures I took on a walk through Downtown Lincoln, Nebraska. The gallery is a bit clunky (both the photos and the design), but I put it together by hand and it worked. Over the years I've posted numerous galleries here, as you can see on my photos page. In 2003 I set up a way to automatically publish pictures from my cell phone to the web: Mophos Moblog. (It still has my old design.) In 2004 I set up a separate photoblog (2004 archive, 2005 archive) where I could easily post any photo without interrupting my text blog or without having enough photos for a full gallery of related pictures.

Even though I have all of these home-grown tools for posting photos, almost all of my photo activity here at onfocus.com stopped in 2006. (The last photo on my photoblog is from July 3rd, 2006. And the last gallery I posted is from February 27th, 2006.) Part of the problem is that I'm simply not taking as many photos these days. And the other problem is that I'm using the fantastic photo-sharing application Flickr (my Flickr photostream). Every photo that I want to share online goes directly to Flickr where I know it will be seen by my Flickr pals. And if someone isn't yet a Flickr pal and I'd like them to see a photo or two, I just send them to my Flickr photostream. I love Flickr so much that I even wrote half of a book about all the cool stuff you can do with it called Flickr Hacks. My online photo life has moved entirely over to Flickr.

Unfortunately, my inner geek isn't completely thrilled with my move to Flickr. As much as I believe Flickr is a revolutionary application, a part of me is sad to see onfocus.com go without photos. And another part of me thinks that all of the awesome stuff that Flickr enables (community, conversation, collaboration, cataloging, aggregation, and so much more) should be done in a distributed way across the Web. The Web geek in me feels that photo sharing shouldn't be owned by any one company, and photos themselves should ultimately be under the control of individual photographers.

I know this vision of distributed photo-sharing doesn't seem realistic right now, but it is happening. Photobloggers post across servers and domains with widely varying software and somehow aggregators are able to pull their photos together in unique ways thanks to standard feed formats. (I still use blogging software I wrote myself, yet I can join in the larger blogosphere because any news reader can pick up my feeds.) Of course enabling ad-hoc groups is impossible without a centralized application—and identity management/access control (photos for friends/family only) is next to impossible in a distributed fashion. But I believe the tools will get there. And I'd like to start living in my distributed-photo-utopia once again.

I realize that not everyone has the means and ability to manage their own server space. But as a do-it-yourself Web guy I have both, and I'd like to get back ultimate control over my photos. Over the next few weeks (months?) I'm going to re-write my personal photoblogging software from scratch. My first task will be to gather the 500+ photos I've already uploaded to Flickr, because their API makes it possible to export my photos. I'm hoping to document my progress along the way, in case my steps can help anyone else out there who wants to go the DIY route. Going off the grid (so to speak) won't be easy, but I just need to remember that I've been there before.

Progress so far:
  • Unobtrusive system messages application for Mac. This free utility lets you know what's happening on your computer right now.
    filed under: mac, productivity
  • Rebecca Blood asks Bruce Schneier to decrypt his blog. Schneier on Security is a daily read for me, and it's great to hear how he approaches blogging.
    filed under: weblogs
  • If you do any ColdFusion development (shut up!), you should check out this CF Textmate add-on. With this + Transmit, I prefer TextMate to HomeSite now for writing CF.
    filed under: programming, software
  • hooray! Click this link to set a cookie (with cookie technology) to disable those annoying Snap site previews that are popping up everywhere. [thanks torrez]
    filed under: marketing, internet

The Consumer Trap Redux

I extended my review of The Consumer Trap from a week or so ago for J.D.'s excellent personal finance site, Get Rich Slowly. You can read the longer version here: Book Review: The Consumer Trap.