hacks

OTFG: Syndication, Exif Data, and Tags

Now that I'm off the photo-hosting grid and have my local photoblog up and running, I've been working on some of the extra features. This week I set up an RSS feed and JavaScript syndication, grabbed Exif data from my photos, and set up a way to browse photos by tag. I won't go into great detail about my thinking on these; I'll just share my code and briefly describe how I set up each one.
Syndication
The RSS feed was fairly easy to put together. Here's the code that generates the feed: rss.php. I added a few extra constants to the ini.inc file that's included on every page because they were needed in the feed: APP_TITLE, APP_DESCRIPTION, BASE_URL, PHOTO_URL, and PHOTOGRAPHER_NAME. I also went back through each file in the application and substituted these constants where I'd hard-coded the values. (This will make it easier to change these variables globally in the future, and it makes the code more reusable.) And I went with full-sized images in the feed. Here's the live feed if you'd like to check it out.
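For anyone following along without opening rss.php, here's a minimal sketch of what a feed generator along these lines could look like. It is not the contents of rss.php: the build_rss() and xml_escape() helpers, the row keys, and the constant values are made up for illustration. Only the constant names come from the list above.

```php
<?php
// Placeholder values. The real ones live in ini.inc and aren't shown in the post.
define('APP_TITLE', 'example photos');
define('APP_DESCRIPTION', 'An example photoblog feed');
define('BASE_URL', 'http://photos.example.com/');
define('PHOTO_URL', 'http://photos.example.com/photos/');
define('PHOTOGRAPHER_NAME', 'Example Photographer');

// Escape a value for use inside XML text.
function xml_escape($value)
{
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

// Build an RSS 2.0 document from photo rows.
// The row keys (Title, Stub, File, Posted) are assumed names, not the real schema.
function build_rss(array $photos)
{
    // The closing "?>" is split so it can never be mistaken for a PHP close tag.
    $out  = '<?xml version="1.0" encoding="UTF-8"?' . ">\n";
    $out .= "<rss version=\"2.0\">\n<channel>\n";
    $out .= '<title>' . xml_escape(APP_TITLE) . "</title>\n";
    $out .= '<link>' . xml_escape(BASE_URL) . "</link>\n";
    $out .= '<description>' . xml_escape(APP_DESCRIPTION) . "</description>\n";
    $out .= '<copyright>' . xml_escape(PHOTOGRAPHER_NAME) . "</copyright>\n";

    foreach ($photos as $photo) {
        $link = BASE_URL . $photo['Stub'];
        // Full-sized image in the item body, escaped so it travels as HTML inside XML.
        $body = '<img src="' . PHOTO_URL . $photo['File'] . '" alt="" />';
        $out .= "<item>\n";
        $out .= '<title>' . xml_escape($photo['Title']) . "</title>\n";
        $out .= '<link>' . xml_escape($link) . "</link>\n";
        $out .= '<guid>' . xml_escape($link) . "</guid>\n";
        $out .= '<pubDate>' . gmdate(DATE_RSS, $photo['Posted']) . "</pubDate>\n";
        $out .= '<description>' . xml_escape($body) . "</description>\n";
        $out .= "</item>\n";
    }

    return $out . "</channel>\n</rss>\n";
}
```

A script built this way would end with something like echo build_rss($rows); so the output can be piped to a static file.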

One of Flickr's nice features that I was using before my move was their JavaScript badges. With a few lines of CSS and a single line of JavaScript, you can include a strip of photos on any website. I was using a vertical strip of five photos (check "now seeing" in the right sidebar on the front page of this site for an example). I decided to throw this together for my local photoblog, and here's the file that generates the JavaScript that generates the strip: js.php. Nothing too tricky going on here, just a document.write of some HTML that displays the smaller-sized thumbnails of recent photos.
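The document.write approach described above might be sketched like this. This is an illustration, not the contents of js.php: build_badge_js(), the row keys, and the photo-strip class name are all assumptions.

```php
<?php
// Build a JavaScript snippet that writes a strip of linked thumbnails.
// The row keys (Title, Url, Thumb) are assumed names for illustration.
function build_badge_js(array $photos)
{
    $html = '<div class="photo-strip">';
    foreach ($photos as $photo) {
        $html .= '<a href="' . htmlspecialchars($photo['Url'], ENT_QUOTES, 'UTF-8') . '">'
               . '<img src="' . htmlspecialchars($photo['Thumb'], ENT_QUOTES, 'UTF-8') . '"'
               . ' alt="' . htmlspecialchars($photo['Title'], ENT_QUOTES, 'UTF-8') . '" />'
               . '</a>';
    }
    $html .= '</div>';

    // json_encode() produces a correctly quoted JavaScript string literal,
    // including escaped slashes so "</" can't end a surrounding script block.
    return 'document.write(' . json_encode($html) . ");\n";
}
```

Letting json_encode() do the quoting avoids hand-escaping quotes and newlines in titles, which is the usual way a generated document.write line breaks.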

I store both rss.php and js.php outside of my public web directory. I run them every hour, piping the output to a couple of files in the public web directory. To accomplish this with Windows Task Scheduler, I have a simple .bat file that looks like this:

C:\path\to\php\php-win.exe C:\scripts\js.php > C:\web\site\include.js
C:\path\to\php\php-win.exe C:\scripts\rss.php > C:\web\site\rss.xml


So I just run this batch file every hour. That way the feed and JavaScript file are cached and won't be rebuilt each time someone requests them.

I set up an .htaccess entry for the feed so it has a nicer URL: RewriteRule ^feed/?$ rss.xml [L]. I'm the only one consuming the JavaScript, so that URL can be as ugly as it wants to be.
Exif Data
Most (if not all) digital cameras embed some information about the state of the camera within the photo files themselves using a format called EXIF. Flickr made great use of this data with a special page that displays quite a bit of the Exif data available in uploaded photos. (For example, More detail about bandon beach.) The Exif data gives you a quick look at your shutter speed, aperture, focal length, and a few other settings. I think it helps to look at Exif data frequently so I can remind myself which settings worked or didn't work for any particular type of photo.

Because the Exif data is embedded directly in the original photographs, I could just grab this info on-the-fly to display it on the site. (PHP 5 has a few Exif functions that make snagging this data fairly easy.) But I'm guessing down the road I'd like to do more with this data, such as grouping all photos by a particular camera together, or grouping all photos taken with my 50mm lens together. More control means storing everything in the database, and I put together a table called exif and a generic function to grab/store the data.

Here's the SQL to build the table: otfg_tables_6.txt, and here's the include file with the exif-grabbing function: addExif.inc. The function addExif() accepts a PhotoID and database connection, finds the original file for that photo, and extracts and saves the Exif data. I went back and added this function to the uploading files (Step 10: Adding Photos), and whipped up a quick script to add Exif data for existing photos: setExif.php.

After running this script I had a table full of Exif data, and I found out that 490 of my 840 photos had Exif data. Unfortunately my cell phone strips Exif data from photos before it sends them via email, which means my cell phone pictures (a large percentage) don't have any Exif data to extract. One improvement to this process that I need to write soon is setting the DateTaken value in the photos table based on the Exif DateTimeOriginal value. That way I can display both the time the photo was posted, and the time the photo was actually taken.
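The DateTaken improvement mentioned above mostly comes down to reformatting a string, since Exif stores dates as YYYY:MM:DD HH:MM:SS. Here's a rough sketch of that conversion. The function names are hypothetical, and date_taken_for_file() assumes PHP's exif extension is enabled.

```php
<?php
// Convert an Exif date ("2007:02:14 10:22:31") to a MySQL DATETIME string.
// Returns null when the value is missing, malformed, or not a real date.
function exif_date_to_mysql($exifDate)
{
    $pattern = '/^(\d{4}):(\d{2}):(\d{2}) (\d{2}):(\d{2}):(\d{2})$/';
    if (!preg_match($pattern, trim((string) $exifDate), $m)) {
        return null;
    }
    // Some cameras write all zeros when the clock was never set.
    if (!checkdate((int) $m[2], (int) $m[3], (int) $m[1])) {
        return null;
    }
    if ((int) $m[4] > 23 || (int) $m[5] > 59 || (int) $m[6] > 59) {
        return null;
    }
    return $m[1] . '-' . $m[2] . '-' . $m[3] . ' ' . $m[4] . ':' . $m[5] . ':' . $m[6];
}

// Read DateTimeOriginal from a photo file, or null if there is none.
function date_taken_for_file($path)
{
    $exif = @exif_read_data($path);
    if ($exif === false || empty($exif['DateTimeOriginal'])) {
        return null;
    }
    return exif_date_to_mysql($exif['DateTimeOriginal']);
}
```

Returning null for unusable dates lets the caller leave DateTaken empty for photos like the cell phone pictures that have no Exif data at all.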

I'm displaying the Exif data on the photo detail page, as a single line of text. For example, the photo bandon beach shows the following Exif line below the photo:

otfg exif

This Exif line lists the camera make and model, the shutter speed, the aperture, and the focal length. I'll probably add ISO and exposure bias eventually because I like to see that data too, but I thought these were the basics.
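Building a line like that takes a little massaging, because Exif stores shutter speed, aperture, and focal length as fractions such as 56/10. Here's a sketch of one way to do it. The output format is a guess (the screenshot above isn't reproduced here), and exif_rational() and exif_summary_line() are hypothetical names. The array keys are the ones PHP's exif_read_data() returns.

```php
<?php
// Convert an Exif rational such as "56/10" to a float; null if unusable.
function exif_rational($value)
{
    $parts = explode('/', (string) $value);
    if (count($parts) === 1 && is_numeric($parts[0])) {
        return (float) $parts[0];
    }
    if (count($parts) === 2 && is_numeric($parts[0]) && is_numeric($parts[1])
        && (float) $parts[1] != 0) {
        return (float) $parts[0] / (float) $parts[1];
    }
    return null;
}

// Build a one-line summary: camera, shutter speed, aperture, focal length.
// Any value that is missing from $exif is simply left out of the line.
function exif_summary_line(array $exif)
{
    $bits = array();

    $make   = isset($exif['Make']) ? $exif['Make'] : '';
    $model  = isset($exif['Model']) ? $exif['Model'] : '';
    $camera = trim($make . ' ' . $model);
    if ($camera !== '') {
        $bits[] = $camera;
    }

    $shutter = isset($exif['ExposureTime']) ? exif_rational($exif['ExposureTime']) : null;
    if ($shutter !== null && $shutter > 0) {
        // Fast speeds read better as 1/250 than as 0.004.
        $text = $shutter < 1 ? '1/' . (int) round(1 / $shutter) : (string) round($shutter, 1);
        $bits[] = $text . ' sec';
    }

    $aperture = isset($exif['FNumber']) ? exif_rational($exif['FNumber']) : null;
    if ($aperture !== null) {
        $bits[] = 'f/' . round($aperture, 1);
    }

    $focal = isset($exif['FocalLength']) ? exif_rational($exif['FocalLength']) : null;
    if ($focal !== null) {
        $bits[] = round($focal) . 'mm';
    }

    return implode(', ', $bits);
}
```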
Tags
I've also added some features around tagging. Each tag listed on the photo detail page is now linked to a tag page where viewers can see all photos with that particular tag. One example is the tag page for architecture—a tag I've used frequently. Here's the code for this page: tag.php. As viewers click photos from the tag page, they're able to stay within that tag context. So all of the back/next controls on the photo detail page reflect the viewer's most recent choice. The photo detail URL is also under the /tag directory to reflect the different context. (I've set up my robots.txt to ask robots to ignore the /tag directory so there aren't multiple locations for photo detail pages.)

Today I put together a standard tag cloud so I can visualize how I'm using tags. As usual, the larger the tag, the more frequently it's used, with red tags the most-used. Here's the code for this: tags.php.
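The sizing behind a cloud like this is usually a linear scale between a smallest and largest font size. Here's a minimal sketch of that idea. It isn't the code in tags.php: the function names, pixel sizes, and inline styles are assumptions, and the /tag/ link format just mirrors the tag page URLs described above.

```php
<?php
// Map each tag's use count to a font size between $minPx and $maxPx.
function tag_cloud_sizes(array $counts, $minPx = 11, $maxPx = 32)
{
    if (count($counts) === 0) {
        return array();
    }
    $low    = min($counts);
    $spread = max($counts) - $low;
    $sizes  = array();
    foreach ($counts as $tag => $count) {
        // When every tag has the same count, everything gets the smallest size.
        $weight = $spread > 0 ? ($count - $low) / $spread : 0;
        $sizes[$tag] = (int) round($minPx + $weight * ($maxPx - $minPx));
    }
    return $sizes;
}

// Render the cloud as a run of links, with the most-used tag(s) in red.
function render_tag_cloud(array $counts)
{
    if (count($counts) === 0) {
        return '';
    }
    $sizes = tag_cloud_sizes($counts);
    $top   = max($counts);
    $links = array();
    foreach ($counts as $tag => $count) {
        $style = 'font-size:' . $sizes[$tag] . 'px;' . ($count === $top ? 'color:red;' : '');
        $links[] = '<a href="/tag/' . rawurlencode((string) $tag) . '/" style="' . $style . '">'
                 . htmlspecialchars((string) $tag, ENT_QUOTES, 'UTF-8') . '</a>';
    }
    return implode(' ', $links);
}
```

The $counts array would come from something like a GROUP BY query on the tags table, keyed by tag name.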

All of this tagwork required some significant changes to photo.php and editing.js. Both of these updated scripts are getting more complex by the hour. (Beyond the tag stuff, I also improved some other JavaScript editing functions so simple HTML won't wreak havoc in captions.) I also added a few more entries to the .htaccess file to handle this new tag-space:

RewriteRule ^tag/(.*?)/(\d{1,2})/$ tag.php?tag=$1&page=$2 [L]
RewriteRule ^tag/(.*?)/(\d{4}/\d{2}/.*)$ photo.php?p=$2&tag=$1 [L]
RewriteRule ^tag/(.*?)/$ tag.php?tag=$1 [L]


This just cleans up the URLs for tag pages, and paging through photos there. As you can see, photo stubs under /tag/ are just routed back to photo.php along with the tag itself. (This is also some good foreshadowing for how I'll probably handle galleries since these are basically just tag-galleries.)

All in all, going off the grid is going well. I still have a long to-do list, but I feel like I crossed some of the big features and fixes off my list this week.

OTFG Step 11: Displaying and Editing Photos

Good morning, off the grid fans! I know it's been a while—I've been working on my new photoblog behind the scenes. I'm at a point where I'm not sure it helps anyone to share my code, but who knows? Last time I talked about how I'm getting photos into my system, and today I'll explain how I'm editing photos that are already in the system.
Photo URLs
The first thing I needed to do was provide a permanent home for every photo that finds its way into my database. The easiest method would be using the internal photo ID in the URL somehow, so I'd end up with a URL for any photo like http://photos.onfocus.com/459918/. But as I mentioned when designing the photo file locations (Thinking about Photo URLs), I want to include a bit more information about the photo in the URL. So I went with a pattern similar to the files: http://photos.onfocus.com/[year]/[month]/[title]. As with files, the title is stripped of any non-URL-approved characters, and whitespace is replaced with a dash.

I also wanted to keep photo IDs internal, and not use them anywhere within the public-facing application. So I thought the URL pattern of [year]/[month]/[title] would be a good way to uniquely identify a photo within the system as long as there were no duplicates. To make this happen I set up a field in the database called Stub (varchar(50)) that holds the entire string: [year]/[month]/[title]. If two photos have the same title within the same year and month, I simply increment the stub like this: [year]/[month]/[title]-1, [year]/[month]/[title]-2, etc. This way the permalink can be used not only to see the photo on the Web, but also to identify the photo within the application.
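Here's a small sketch of the stub scheme just described: build [year]/[month]/[title], then append -1, -2, and so on until the stub is unused. The function names, the exact set of characters kept in the title, and the untitled fallback are assumptions, and a real version would check the database rather than an array.

```php
<?php
// Turn a title into a URL-safe slug: whitespace becomes a dash and
// anything outside letters, digits, dash, and underscore is dropped.
function title_to_slug($title)
{
    $slug = preg_replace('/\s+/', '-', trim((string) $title));
    $slug = preg_replace('/[^A-Za-z0-9_-]/', '', $slug);
    return $slug === '' ? 'untitled' : $slug; // fallback name is an assumption
}

// Build a unique [year]/[month]/[title] stub for a photo.
// $existingStubs stands in for a lookup against the Stub column.
function make_stub($title, $timestamp, array $existingStubs)
{
    $base = gmdate('Y/m', $timestamp) . '/' . title_to_slug($title);
    $stub = $base;
    $n    = 0;
    while (in_array($stub, $existingStubs, true)) {
        $n++;
        $stub = $base . '-' . $n;
    }
    return $stub;
}
```

One thing this sketch doesn't handle is the varchar(50) limit on the Stub field, so long titles would need trimming before the suffix is added.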

I wrote a quick script to add URL stubs to all of the existing photos: addStubs.php. And I retrofitted all of the uploading scripts from the last step to write a URL stub as a photo is added.

With the virtual space for photos set to go, I needed to give them a permanent home. I set up a script called photo.php that shows one photo at a time, along with a bunch of details about that photo. By passing a URL stub into the script like this http://photos.onfocus.com/photo.php?p=[photo stub], the page knows which photo to display. With a little .htaccess magic: RewriteRule ^(\d{4}/\d{2}/.*)$ photo.php?p=$1 [L] the nicely formatted URLs are a reality: for example, bandon beach.
Photo Detail Page
So the photo detail page accepts an incoming URL stub, looks up info about that photo in the database, and arranges things nicely on the page. Here's what a photo detail page looks like in my system today:

otfg photo detail

The main bits are the title, photo itself, caption, time the photo was posted, and a list of tags associated with the photo. This is the public view. If you have the right credentials (set up in Step 9: Authentication), you see a bit more on the page, and you have a few more options. For an administrator, a row of administrative buttons appears directly below the caption:

otfg admin buttons

Here's what each button does:
  • Sets the public/private status of the photo.
  • Toggles the caption editing form.
  • Rotates the photo 90 degrees clockwise.
  • Completely removes the photo, its thumbnails, and all associated info.
The other thing an admin can do is edit the title, caption, and tags in place by clicking on any of these things (like Flickr). It looks like this:

otfg editing title

All of this editing is accomplished with a series of files. The Ajax package Prototype handles some of the work in the background. And the rest of the interface stuff is in a JavaScript file that's included on the page for an administrator: editing.js. The functions in this file post to several PHP scripts that return simple text information. The only function that requires a page refresh is completely deleting a photo, which is ok with me because the photo detail page is about to be history anyway and I've got to go somewhere. The administrative buttons all have a JavaScript confirmation dialog before they execute, so I can't accidentally delete a photo when I want to rotate it.

And here's the beast of a script that pulls everything together: photo.php. This is fairly complex, especially because the public design and administrative functions live together in the same page.
Photo Home Page
With the photo detail page set, I set up a place to introduce people to my photos: the home page. Again, it's very Flickrish, with two columns of photos and a list of pages at the bottom. The only difference is that the latest photo is at its largest size at the top. I haven't built any editing into this page yet, so any updates have to happen on photo detail pages. And here's the code that powers the front page: home.php. I set up an .htaccess rule for this page too, to help with paging: RewriteRule ^home/(\d{1,2})/?$ home.php?page=$1 [L]. That way, as you page through the photos, you'll get friendly URLs like this: http://photos.onfocus.com/home/2.

I think this step gives me a functioning system I can use to publish photos. There are definitely more features I need to build: browsing by tag, extracting EXIF data, an RSS feed, galleries/sets, commenting, and mapping photos with coordinates. And I have some issues I need to work out. (Anyone know how to force a cache refresh on photos after rotation beyond adding some random numbers to the file name?) But I think all of the bare essentials for sharing photos with the world are working now. woohoo! You can compare my local photostream to my Flickr photostream to see how similar they are.

By the way, the little administrative and tag icons are all modified from a set of tiny icons by Timothy Groves: Ho!Ho!Ho!. Thanks!

OTFG Step 10: Adding Photos

I'm still working on my project to host my own photos that I'm calling going off the grid, and I'm sharing the code I write as I go. I was hoping to be done by the end of February, but I think I'm still a few steps away from a functioning system. Last time I set up a method for authenticating myself, and the next step was figuring out how to get new photos into the filesystem and database.

Flickr has myriad ways to get photos into their system. They have a bunch of client software that can upload photos in batches, in addition to a standard web form and uploading via email. I've played around with all of these, but I only used two regularly: the web form and uploading via email. So that's what I put together for my local system.
Web Upload
For the forms, I basically duplicated Flickr's 3-step upload process. I choose the files, upload the photos, then title and tag them. The big difference here is there aren't any privacy controls. I'm assuming that every photo I add into the system will be public, so I'm not concerned about setting privacy when I upload. (That's always something I can add later.) I reused code from Step 8 to write thumbnails for the new photos, and just needed to write a few scripts to handle uploading and updating the database. Here's the set of files I'm using to upload photos:
  • upload.php - sets the basic uploading form.
  • upload-action.php - uploads the files, writes thumbnails, and then writes a form for each photo for adding title, description, and tags.
  • upload-final.php - updates the database with the new titles, descriptions, and tags, and sets the photos as public.
And outside of my public web directory, I have a couple of files with some helper functions that are included:
  • addPhoto.inc - adds an incoming photo to the database and returns its photoID.
  • writeThumbs.inc - writes all of the standard thumbnail sizes (and resizes the original, if necessary) for a given photoID.
Each of the public files is only going to be used by me, so there's an identity check at the top of each script. If the current user isn't logged in as an admin, the script boots the user to the home page. That's not too friendly, but I'll know what's happening immediately since this is my system.

Another difference I should point out is that I have to go through all steps to publish the photos. If I upload three photos, but don't add tags and titles in Step 2, the photos won't be public. In Flickr web uploading, you can skip the form for adding titles, tags, and descriptions and the photos will still be live. I decided to make that last step mandatory, even if it's just hitting the submit button again. I think forcing myself to think about titles helps my publishing process.
Email Upload
Sending photos by email is crucial for me because I like publishing cell-phone pictures while I'm out and about. When I set up my first moblog several years ago, I whipped up a filter for XMail (my mail server) to process attachments from any incoming message to a specific address. This time around I wanted something more generic, so I settled on a script that checks a specific email address via POP every 15 minutes, and handles any new messages with attachments.

And here's the script: check-mail.php. I keep it outside the public web directory, and run it every 15 minutes with Windows Task Scheduler. (You could use cron on *nix systems.)

If you're going to try this out, you'll need to add your own mail server details to the top of the script. The MAIL_TAGS constant is set in ini.inc, and is simply a set of tags to use for any photo that comes in via email (a la Flickr). I use mopho and cameraphone for any photo that comes in this route. TEMP_DIR is set in the same place, and should be a full path to a directory for temporary files.

I think it's important to use a brand new email address that's hard to guess, and is only used for this purpose. And you shouldn't ever share the address. The address is almost like a password, so I treat mine accordingly. And if you can use an email address at a private domain (instead of gmail, hotmail, yahoo, etc.) I think that would be better. You don't want random spammers to be able to post pictures of pills or casinos to your photoblog. (In fact, I think I'll go back and add a sender whitelist to this script for my own peace of mind.) This script is set up for a POP account on standard ports, so you might need to check the PHP IMAP documentation for different setups. I've also only tested this with my phone (a Sony Ericsson S710a) and other phones might attach photos in a different way.
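The sender whitelist idea could be sketched like this. It is not part of check-mail.php: the function names are hypothetical, and the header parsing is deliberately simple. Keep in mind that From headers are easy to forge, so a whitelist only filters casual spam and is no substitute for keeping the address secret.

```php
<?php
// Pull the bare address out of a From header such as "Me <me@example.com>".
function extract_address($fromHeader)
{
    if (preg_match('/<([^<>]+)>/', (string) $fromHeader, $m)) {
        $address = $m[1];
    } else {
        $address = (string) $fromHeader;
    }
    return strtolower(trim($address));
}

// True when the message's sender is on the list of allowed addresses.
function sender_is_allowed($fromHeader, array $whitelist)
{
    $allowed = array_map('strtolower', array_map('trim', $whitelist));
    return in_array(extract_address($fromHeader), $allowed, true);
}
```

A mail-checking script could call sender_is_allowed() on each message's From header and skip anything that fails before touching its attachments.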

Now that photos can find their way into the system, I need a way to edit photo details. That's up next.

Hacking at Hackszine

I have a guest post over at Hackszine today: Hacks Authors' Blogs: One Feed to Rule Them All. As the name implies, I describe how I threw together a master list of blog feeds by authors in the Hacks Series. I think it'd be fun to compile a list of blogs by Hacks contributors as well, but those names aren't as accessible.

OTFG Step 9: Authentication

So my "little" project of going off the Flickr grid is getting exponentially more complex. And it's definitely taking a little longer than a couple Saturdays. But that's the way these things go. Exporting everything from Flickr took a week or so, but that was only the beginning. After planning for thumbnails and resizing all of the images, my next dilemma was how to handle telling my site that I'm me.

In the past, I've kept the public site and administrative functions separate. So when I needed to create a gallery, title photos, or add captions I'd go to a completely separate web application that existed in a separate folder on the web server. With very basic HTTP-Authentication, I could be sure that I was the only one with access to that section. What I like about Flickr is that I can edit in place—which means that if I see a photo title that's a bit off, I can simply click and edit that title. And my editing interface is pretty much the same interface that anyone else sees. I'd like to be able to edit in place with my photos too, and that means combining administrative and display functions into one seamless application. Unfortunately, that means my basic HTTP-Authentication is out as an option, because it's all or nothing with that scheme. Everyone visiting the site would have to log in via a user/pass prompt, and that just doesn't make sense.

I thought about limiting administrative functions by IP Address, because I'm at the same IP 90% of the time. It would mean a hell of a lot less code to write. But I know I'm going to want to upload/edit photos on the road from random IP addresses. So that leaves one option: standard database authentication.

(Ok, another option would be authenticating with a Yahoo!, Flickr, or Google account, but the point of my project is to get away from relying on the big guys.)

So at this point my application only needs two types of users: anonymous guests who can view photos and a known administrator (me!) who can also add/edit/delete photos. That means I'm going to set up a user database with only one user in it. And I'll have to write a bunch of authentication code that only I will use. It's frustrating, but I think the work will be worth the convenience down the road. Plus, if I ever decide to have "users" with different levels of access (like, say, family members can log in to see family-only photos), the structure will all be there to make it happen fairly quickly.

Here's the users table I set up: otfg_tables_5.txt. I won't bore you with the gory details, but CookieID, LoginKey, and LoginExp will help create a persistent login system so that if I have the right cookie set I won't need to log in each time I visit the application. And because I'm doing some encryption voodoo on the password, CookieID, and everything else, I set up a script to add the administrative user to the db: addAdmin.php. If you want to try this out, add your username and password to addAdmin.php, call the script from a browser, and then remove or rename the file so it's no longer on the public web. The password is stored as a one-way hash, so you won't be able to get it back again. If you ever forget your password, you can always delete the existing user via MySQL and re-run this script with a new password.
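For readers setting this up on a current PHP version, here's a sketch of the same idea using PHP's built-in password functions. To be clear, this is a swapped-in technique and not what addAdmin.php does: password_hash() and password_verify() need PHP 5.5 or later, random_bytes() needs PHP 7, and the Username and Password keys are assumed column names (only CookieID comes from the table above).

```php
<?php
// Build the values for a new admin row.
// password_hash() generates its own salt and stores it inside the hash.
function new_admin_record($username, $password)
{
    return array(
        'Username' => $username,
        'Password' => password_hash($password, PASSWORD_DEFAULT),
        // Random token to hand out in the persistent-login cookie.
        'CookieID' => bin2hex(random_bytes(16)),
    );
}

// Check a login attempt against the stored one-way hash.
function password_matches($password, array $record)
{
    return password_verify($password, $record['Password']);
}

// Compare a cookie token in constant time to avoid timing leaks.
function cookie_matches($cookieValue, array $record)
{
    return hash_equals($record['CookieID'], (string) $cookieValue);
}
```

Because the hash is one-way, the reset procedure stays the same as described above: delete the user and create a new record.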

You might notice at the top of addAdmin.php, there's a file called ini.inc included with require("ini.inc");. Since I'm about to write a blizzard of files for my application, I decided to put some application settings in a separate file and include them on every page. Here's what it looks like: ini.php. It's probably a good idea to store this file outside of your public web directory if you can, because it has a bunch of private info.

With the user added, I just needed a way to identify myself within the application. That's what these five files do:
  • login.html - just a simple HTML form to set a username/password.
  • login-action.php - accepts the username/password, matches it against the db, and sets a cookie and session variable for administrative privileges if there's a match.
  • logout.php - destroys the current session and overwrites the cookie.
  • auth.inc - the function here sets the proper session variable if an administrative cookie is present, which enables persistent logins.
  • login-status.php - this script just shows the current login status. I used it for testing.
I'm the only one who's going to be using these scripts (for now), so they're very bare-bones. It's not giving back friendly error messages when I can't log in, but I don't mind. With this infrastructure in place, my site will be able to pick me out of a crowd. And I can get back to thinking about how the pages will look.

Next Up: Getting new photos into the system.
  • this site maintains a database of md5 hashes and the original text. This is a good starting point for decrypting these supposedly one-way hashes. If you're storing passwords as md5 hashes, don't forget the salt.
    filed under: hacks, security, identity, programming

OTFG Step 8: Resizing Images

The next step in going off the grid to host my own photos was resizing my images for display. My initial import script downloaded just my original photos that I uploaded to Flickr. But when you upload a photo to Flickr, the service creates four (sometimes five) copies of the original image at different sizes. This way Flickr can show various thumbnails of images in different ways, and they can use a standard size to display a photo on its detail page. You can click the "All Sizes" button above any photo at Flickr to see all of the sizes available for that particular photo.

I needed to do something similar, and I'm not sure exactly how I want to display my photos yet. So I decided to use Flickr's default image sizes (for the most part), to give me some different sizes to play around with. I went with the following sizes:
  • Medium Thumbnail - 240 pixels max width or height. Flickr uses this size for their photostream pages.
  • Tiny Thumbnail - 100 pixels max width or height. Flickr uses this size for JavaScript syndication (Flickr Badges).
  • Square Thumbnail - 85 pixels square, cropped from the center of the original image. Flickr uses a 75 pixel square thumbnail on the member home page and in back/next links on photo detail pages.
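The size rules in the list above come down to a little arithmetic: scale so the longest side hits a maximum, or find the largest centered square for the cropped version. Here's a sketch of just that math, with hypothetical function names. It doesn't touch any image files.

```php
<?php
// Scale dimensions so the longest side is at most $maxSide.
// Images that already fit are returned unchanged (no upscaling).
function scale_to_max_side($width, $height, $maxSide)
{
    $longest = max($width, $height);
    if ($longest <= $maxSide) {
        return array($width, $height);
    }
    $ratio = $maxSide / $longest;
    return array(
        max(1, (int) round($width * $ratio)),
        max(1, (int) round($height * $ratio)),
    );
}

// Find the largest centered square in an image.
// Returns array(x offset, y offset, side length) in source pixels.
function center_square($width, $height)
{
    $side = min($width, $height);
    return array(
        (int) floor(($width - $side) / 2),
        (int) floor(($height - $side) / 2),
        $side,
    );
}
```

With GD, the square returned by center_square() would be the source rectangle passed to imagecopyresampled(), with an 85 x 85 destination image.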
Flickr has a nice naming convention for these different sizes. They use [FlickrID]_[Thumbnail code].jpg to denote the various sizes. So a square thumbnail in their system will have a name like 359119647_4874f02815_s.jpg, and the same photo in a bit larger size would be 359119647_4874f02815_m.jpg. I went a similar route, but decided to separate the thumbnails from the original photos. By placing the thumbnails in a different root directory, I can stop search engines from indexing the copies with a well-crafted robots.txt file. That means any photo that is syndicated out through Google Images, Yahoo! Images, etc. will be the original photo I want to share.

And because the thumbnail URLs won't be thrown around in the wild, I decided to use a naming system that doesn't give any information about the file itself. I thought that if I could assemble a thumbnail name from limited information (just the PhotoID) that could minimize the amount of stuff I have to pull out of the db. But I don't want to expose PhotoIDs through the system, because that could let someone look at photos that aren't meant for them by guessing IDs in a series. So I went with an MD5 Hash of the PhotoID, plus a string that's unique to my application. That should obscure my IDs to all but the most determined cryptographers.
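As a rough illustration of that naming scheme, here's a sketch that hashes the PhotoID together with a salt. The function name, the order in which the two are joined, and the size-code suffix are all assumptions. The post only says the name is an MD5 hash of the PhotoID plus an application-specific string.

```php
<?php
// Build a thumbnail filename that can be recomputed from the PhotoID
// alone, without exposing the ID itself.
// $sizeCode is a hypothetical suffix such as "s" (square) or "m" (medium).
function thumb_filename($photoId, $sizeCode, $salt)
{
    return md5($photoId . $salt) . '_' . $sizeCode . '.jpg';
}
```

The name is deterministic, so any page that knows the PhotoID and the salt can rebuild the path without an extra database column.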

Beyond the thumbnails, I also wanted to set a maximum photo size for the original photo. I'm not sure what the design will look like yet, but I know that I want a standard size to work with in designing the pages. That means I could a.) make sure to resize all photos to the maximum size before I upload them into the system, or b.) automatically resize any original that's larger than my max. I went with b to make my uploading life easier, and to scale down any large photos I might have uploaded to Flickr. (I tried to go with smaller sizes at Flickr, for the most part.) In this script, if an original photo is too large the original file is copied to [name]_o.jpg in the /photos directory, and then the resized photo is saved to the "original" filename: [name].jpg. I know this system isn't perfect, and I have a feeling this is going to cause problems down the road, but hey, what can you do? I think it'll work.
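Option b boils down to two small decisions: what size an oversized photo should shrink to, and where its untouched copy goes. Here's a sketch of both, with hypothetical function names. It only computes values and leaves the actual copying and resizing out.

```php
<?php
// Fit dimensions inside a maximum width and height, keeping the aspect
// ratio. Photos that already fit are returned unchanged.
function fit_within($width, $height, $maxWidth, $maxHeight)
{
    $ratio = min($maxWidth / $width, $maxHeight / $height);
    if ($ratio >= 1) {
        return array($width, $height);
    }
    return array(
        max(1, (int) round($width * $ratio)),
        max(1, (int) round($height * $ratio)),
    );
}

// Path for the untouched copy of an oversized original: [name]_o.jpg.
function backup_path($path)
{
    $info = pathinfo($path);
    return $info['dirname'] . '/' . $info['filename'] . '_o.' . $info['extension'];
}
```

If fit_within() returns the dimensions it was given, the photo can be left alone and no _o copy is needed.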

Here's the script I used to resize all of my images: resize-all-photos.php. And here's the unique information that you'll need to set at the top of the resizing script if you've been following along:
  • A PHOTO_MAX_WIDTH and PHOTO_MAX_HEIGHT for the maximum dimensions you want to display. (I went with 850 x 640.)
  • A PHOTO_QUALITY that will be used in PHP image functions. (I went with 95, but you can play with this up or down to change the filesizes and image quality.)
  • A SALT that's used for uniquely naming thumbnails on your system. (I used—wait a minute, you almost got me. This should be a string of 8-x characters that only you know about.)
  • The full path of the /photo directory set in the original import script, along with a new /thumbs directory at the root path of the site.
  • And, of course, your MySQL details.
  • The thumb sizes are hard-coded, but it should be fairly clear where to change them if you'd like different sizes.
With all of this set, I ran the script and generated a bunch of thumbnails. To get a sense of the sizes available and the file names/locations, check out this page. As I mentioned, this is very close to Flickr's standard sizes and I can always re-run this script with new sizes if I need something different for the final design.

And to help keep my eye on the goal, I set up a couple different pages with some ideas for displaying photos. Here's one that's very Flickrish, with the latest photo large, followed by smaller photos: onfocus photos preview. And here's another page with the latest photo up-front, and older photos as square thumbnails: onfocus photos preview 2. Getting closer!

OTFG: Woops, Rotating Images

Last night I was working on the "Resizing Images" code I hoped to post today, and realized that I missed an important bit of data all the way back in Step 4 in my original import photos script. I forgot to get Rotation information about each photo. Flickr lets you rotate a photo after you've uploaded it, which is especially handy for cell phone images that aren't easy to rotate before you upload. Because I grabbed all of my original photos, I got the non-rotated versions. Luckily, the Flickr API lets you know how many degrees you rotated a photo so I just needed to whip up a script to grab that info and actually rotate my non-rotated originals.

The first thing I did was add a Rotation field to the photos table. That looks something like this:

mysql> ALTER TABLE photos ADD COLUMN Rotation INT NOT NULL;

And here's the script I threw together to rotate any images that needed it: rotate-any-photos.php. One thing to note is that Flickr's rotation is clockwise, and PHP's imagerotate() function rotates counterclockwise. So I needed to make the Flickr degrees negative to compensate. I also copied the original file (using Flickr's _o naming convention), and then saved the rotated image to the original file name. Managing filenames is turning into a pain, but I think this will work ok.
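The direction mismatch is the easy part to get wrong, so here's a sketch of the conversion. This isn't the code in rotate-any-photos.php: the function names are hypothetical, and rotate_jpeg_clockwise() assumes the GD extension is installed. Instead of passing a negative angle, it normalizes to the equivalent counterclockwise angle between 0 and 359, which gives the same result.

```php
<?php
// Convert a clockwise angle (as Flickr reports it) into the
// counterclockwise angle that imagerotate() expects.
function clockwise_to_gd_angle($clockwiseDegrees)
{
    $cw = (($clockwiseDegrees % 360) + 360) % 360; // normalize to 0..359
    return (360 - $cw) % 360;
}

// Rotate a JPEG clockwise and save the result. Returns true on success.
function rotate_jpeg_clockwise($source, $destination, $clockwiseDegrees, $quality = 95)
{
    $image = imagecreatefromjpeg($source);
    if ($image === false) {
        return false;
    }
    // The third argument is the fill color for uncovered areas; it is
    // unused for multiples of 90 degrees.
    $rotated = imagerotate($image, clockwise_to_gd_angle($clockwiseDegrees), 0);
    if ($rotated === false) {
        return false;
    }
    return imagejpeg($rotated, $destination, $quality);
}
```

Copying the original to its _o name before calling rotate_jpeg_clockwise() keeps the untouched file around, as described above.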

This script rotated 66 photos for me. Now, up next (hopefully): resizing images.
  • Rafe on conflicting images of Iran. We're only getting one view of the country in our major media outlets, but the social Web provides a more nuanced, complete view.
    filed under: media, marketing, politics, flickr, photography
  • haha, let fate determine where you should eat! Jim put together a fun visualization of Yahoo! Local business entries.
    filed under: yahoo, hacks, flash, joke, food, webservices

OTFG Step 7: Import Notes

Welcome back to This Old Blog. This week we're going off the Flickr grid by setting up a custom photo sharing application at a private web site. (this one!)

Flickr has a fantastic feature called notes that lets people add a layer of information on top of a photo. Here's one of my photos with notes so you can see it in action: moon 8/31. As you hover over the image, you can see boxes I've drawn, with notes underneath. It's great for pointing out some little detail in a photo that might otherwise be missed. Matt's memory maps idea is another great example of notes in action. Check out the memory maps tag at Flickr to see hundreds more.

I don't plan on adding notes to my photos here. To me, notes is one of those magical features that just happens—I have no idea how it works. I don't even know where to begin building it. But that doesn't mean I won't want to build it someday, so I figured I might as well throw my existing notes on photos into a database in case I want to recreate them down the road.

To accomplish this, I took a look at the notes data available through the Flickr API and mapped everything to a local table, notes. Here's the structure: otfg_tables_4.txt. And I set up another PHP script to do the work: import-flickr-notes.php. If you've been following along with the previous scripts, you know the drill on this one: runs in the browser, authenticates at Flickr, add your details to the top, might set your house on fire, etc.

This script pulled in 48 notes for me across 19 photos.

Ok, after importing notes I have every bit of information I could possibly want from Flickr residing on my server. That means it's time to start thinking about how I want to show my photos.

Next Up: Drink Me (Resizing Photos)

OTFG Step 6: The Trouble With Comments

The next question I asked myself in my Flickr move: should I take comments with me? Flickr is built for having conversations around photos. Every photo entered into Flickr is also a place for discussion, and those places never close. As you add more photos into Flickr, there are more and more opportunities for discussion to monitor. I'm always surprised when a photo I uploaded in 2004 has a new comment on it, and I frequently miss comments on older photos because I don't check my "recent activity" page enough. But casual conversation about photos is what makes Flickr so much fun. Sometimes the comments are better than the photo people are discussing, and often photos are posted just because they'll generate comments.

There was no question in my mind that photos, titles, captions, and tags are mine, and I have no trouble taking them away from Flickr and sharing them somewhere else. But comments exist in a gray area for me. The people commenting on my photos expected their comments to be at Flickr and nowhere else. At the same time, the comment wouldn't exist without my photo prompting it. Another complication is that Flickr doesn't expose comments via their API. I'm not sure why, but it could be for the very reason that people expect their comments to be at Flickr and nowhere else. (Though photos seem far more personal to me and they're available through the API unless someone specifically requests that they're not.)

I decided to compromise by downloading the comments but not displaying them on my site. That might seem odd, but I'd like to be able to go back in a few years' time and see comments that people I know made on my photos. (Even those silly "please add this photo to the chickens being used as phones group" comments might be fun to read again someday.) If my Flickr Pro account lapses, that won't be possible. Or if I move off the grid, that won't be possible. So I downloaded the comments on my photos via screen scraping, and I'll keep them in a database for my private use. When I start my photoblog here at onfocus.com, I'm going to want comments, so I'll use the same table but flag the Flickr comments as unique (not for display).

Here's the table I set up for comments: otfg_tables_3.txt. The FlickrID field notes the internal Flickr ID of comments I scraped from the site. There won't be any entries in the IP field for Flickr comments, but this will eventually hold the IP Address of people who add comments here. (I figured I might as well include that now.)
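
To make the "same table, but not for display" idea concrete, here's a minimal sketch using Python and SQLite instead of PHP and MySQL. Only FlickrID and IP come from the description above; every other column name is a guess for illustration, and treating "has a FlickrID" as the not-for-display flag is an assumption about how the flag might work, not necessarily how the real table does it.

```python
import sqlite3

def make_comments_db():
    """Create an in-memory comments table (column names are hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE comments (
               CommentID INTEGER PRIMARY KEY,
               PhotoID   INTEGER NOT NULL,
               FlickrID  TEXT,   -- set only for comments scraped from Flickr
               Author    TEXT,
               Body      TEXT,
               IP        TEXT    -- stays empty for Flickr comments
           )"""
    )
    return db

def displayable_comments(db, photo_id):
    """Return only local comments; scraped Flickr comments stay private."""
    rows = db.execute(
        "SELECT Author, Body FROM comments "
        "WHERE PhotoID = ? AND FlickrID IS NULL ORDER BY CommentID",
        (photo_id,),
    )
    return rows.fetchall()
```

With this layout the public pages never need to know the Flickr comments exist, while a private query without the `FlickrID IS NULL` filter still sees everything.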

And here's the script I used to import comments: import-flickr-comments.php. This is how it works:
  • Grabs the FlickrIDs of photos in the photos table, and starts looping through.
  • Logs in at Flickr (old skool only) and sets a local cookie for subsequent requests.
  • Grabs the photo detail page of that particular photo.
  • Picks through the HTML to find the comments.
  • Calculates the approximate date of the comment.
  • Throws the comment, username, Flickr profile URL, and date into the comments table.
  • Rests for one second.
Figuring out a comment date isn't an exact science. Flickr doesn't include an exact date/time for comments. Instead, they use a friendly format such as "30 days ago" or "2 hours ago". This script takes the current time the script is running, and then uses the PHP date_modify function to come up with an approximate date/time. It won't be exact, but it should come up with the right month and year for that particular comment.
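
The date guesswork can be sketched like this in Python (the script itself uses PHP's date_modify, so this is a stand-in, not the original code). The "days ago" and "hours ago" formats come from the examples above; the other units, the 30-day month, the 365-day year, and the `approximate_date` name are assumptions for illustration.

```python
import re
from datetime import datetime, timedelta

# Approximate length of each unit in seconds; months and years are rough.
UNIT_SECONDS = {
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}

def approximate_date(friendly, now):
    """Turn a string like '30 days ago' into an approximate datetime."""
    match = re.match(
        r"\s*(\d+)\s+(minute|hour|day|week|month|year)s?\s+ago", friendly
    )
    if match is None:
        return None  # unrecognized format; the caller decides what to do
    count, unit = int(match.group(1)), match.group(2)
    return now - timedelta(seconds=count * UNIT_SECONDS[unit])
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps every comment from one run anchored to the same moment and makes the result easy to check.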

If you want to try this out you'll need to add your photostream URL and your Flickr username (most likely your email address) and password. The script logs in as you so it can grab comments on photos that are marked as friends/family only. But it's important to note that this login only works for pre-Yahoo!-acquisition Flickr members. I have a so-called old skool ID rather than a Yahoo! ID, so I didn't need to log in via Yahoo! If you want to run this under a Yahoo! ID, you'll need to fiddle around with the login stuff around line 17. As with the other scripts I've posted, this runs in your browser.

After this script ran, I had 260 comments in a local table. Even though they might never see the light of day again, at least I have a record of what friends and random Flickr folks said about my photos.

Coming up: A note about notes.

OTFG Step 5: Setting Up Sets

Grabbing all of the photos and their extended info is only the first step to going off the Flickr grid. I also have some information about how photos are related to each other that I need to grab. Flickr calls groups of photos sets, and they're simply a number of photos from your pool of photos that are displayed together. Here's one of my sets at Flickr: New Zealand 2006.

On this site (pre-Flickr) I grouped photos into two distinct areas: galleries and my photoblog. Like a Flickr set, a gallery is a group of photos that are related in some way, and my photoblog was for everything else. So the photoblog was a single, large gallery of photos that were only related by the fact that they weren't part of an existing gallery. (Don't worry, this isn't a paradox. Yet.)

Instead of two distinct areas, Flickr says you have one big set of all your photos—your Flickr photostream (much like my photoblog)—and from there you can have sub-sets, or simply sets. I think this is a much more intuitive approach for photographers. I can look at the entire pool of photos I've taken, and then create associations from that pool at will. Even though this makes sense for the photographer, I'm not sure this is the best approach for viewing photos. (I'll discuss this more later when I'm building the Web side of this project.)

Anyway, I want to save these photo associations I've already put together at Flickr. I set up two tables for this task: sets and setphotos. Every set has a title, description, and FlickrID (just like photos). The Flickr API doesn't provide the date a set was created, but I added a DateCreated field anyway because this is something I'll want to track eventually. I can either go back and guess on the dates my Flickr sets were created or base the DateCreated on the photos within the set somehow. The setphotos table is simply a list of local photoIDs associated with a local SetID. The IsPrimary field lets you flag one photo to represent the whole set—required when you create a Flickr set. And I added Order and DateAdded fields for later use. Flickr lets you order the photos arbitrarily within a set, but doesn't expose the ordering system via the API. I'll probably have to do that by hand later.
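
Here's a rough sketch of how those two tables relate, written in Python with SQLite so it can be run anywhere. The field names follow the description above, but the types, the SetID key, and the `primary_photo` helper are assumptions for illustration; the real definitions are in otfg_tables_2.txt.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sets (
    SetID       INTEGER PRIMARY KEY,
    FlickrID    TEXT,
    Title       TEXT,
    Description TEXT,
    DateCreated TEXT             -- not provided by the API; filled in later
);
CREATE TABLE setphotos (
    SetID     INTEGER NOT NULL,
    PhotoID   INTEGER NOT NULL,
    IsPrimary INTEGER DEFAULT 0, -- 1 for the photo that represents the set
    "Order"   INTEGER,           -- quoted because ORDER is an SQL keyword
    DateAdded TEXT
);
"""

def primary_photo(db, set_id):
    """Return the PhotoID flagged as primary for a set, or None."""
    row = db.execute(
        "SELECT PhotoID FROM setphotos WHERE SetID = ? AND IsPrimary = 1",
        (set_id,),
    ).fetchone()
    return row[0] if row else None
```

One thing the sketch makes visible: a column named Order has to be quoted in every query, so a name like SortOrder might be less trouble in practice.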

You can grab the SQL required to create the set tables here: otfg_tables_2.txt. And you can add it to a MySQL database like this:

shell> mysql -u [username] -p[password] [database name] < otfg_tables_2.txt

Ok, with a structure in place for the data, I just needed to reach out and grab it. Once again, here's the covered wagon script I threw together for this: import-flickr-sets.php.

Here's what the script does:
  • Authenticates the person running the script at Flickr via the browser. (I don't think this is technically required for this step because sets are public, but I had the auth code from the other script so why not?)
  • Requests all of the account's sets, and adds the FlickrID, title, and description of each to the sets table.
  • Then the script grabs the list of photos in the set, finds the local photoID for that photo, and associates it with the set in the setphotos table.
  • Finally, a 1 second pause for good API behavior.
Shew! Don't forget to add your local details to the top of the script if you're going to try this out, and then run the script from your browser.

It only took a few seconds to grab all of my set info from Flickr. I had six sets with a total of 88 photos.

One caveat: if you're going to try this out and you have sets with huge numbers of photos in them, say, 100+, this script probably won't work for you. There's no paging going on through the results that come back from the API. You'll need to dig into paging a bit to get it to work. And anyway, who can look through a single set with hundreds of photos in it?
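
If you do need paging, the general shape is a loop that keeps asking for the next page until there are no more. Here's a minimal sketch in Python (the script itself is PHP). The `fetch_page` callable is hypothetical: it stands in for whatever wraps the API request and is assumed to return the photos on one page along with the total number of pages.

```python
def fetch_all_photos(fetch_page, per_page=100):
    """Collect every photo in a set by walking the result pages in order.

    fetch_page(page, per_page) is a hypothetical callable that returns
    (photos_on_this_page, total_pages).
    """
    photos = []
    page = 1
    while True:
        batch, total_pages = fetch_page(page, per_page)
        photos.extend(batch)
        # Stop on the last page, or early if a page comes back empty.
        if page >= total_pages or not batch:
            break
        page += 1
    return photos
```

Keeping the request behind a callable means the one-second pause between calls can live inside `fetch_page`, and the loop can be tried out against fake data before pointing it at the real API.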

That's almost all of the information I need to start displaying my photos. But first, up next: the trouble with comments. Ugh.