Finding Lost URLs

A week or so ago, a page by Professor Solomon called The Twelve Principles made the link rounds. The prof lays out a 12-step plan for finding any lost object. Most of the principles are mental tricks to get you back to the place you lost a physical object: your keys, your glasses, your cellphone, etc.

Unfortunately, the principles don't translate well to digital objects like URLs. You didn't stick that URL for the Xbox hacking How-To in your junk drawer, and it's not likely to be stuck in the "Eureka Zone" under your keyboard. But I lose URLs all the time. I remember something I saw on the web a couple of weeks ago and I can't figure out how to get there again.

I don't have anything close to a 12-principle system for finding lost URLs, but I thought it'd be fun to examine my haphazard ways of re-finding web things. These are probably obvious, but I thought collecting them together would help me start a system for finding those lost pages, blog posts, and other digital artifacts that I'd like to see again.

1. Google - As you already know, Google is great at finding things, and I can usually get back to old URLs by remembering keywords for the document. Even if I don't find exactly what I was after, I can sometimes find good substitute information on the same subject. Unfortunately, a query like "SQL Remove Duplicates" will bring up thousands of documents, and if I'm looking for a specific bit of code I once found for removing duplicate records in a database, the search has to go to the next stage.

2. Browse Browser History - Ctrl-H in the browser will bring up your surfing history and it can be a lifesaver if I know I visited the URL within the last week or two. It's especially helpful if I can remember the approximate time I was visiting the page I want to find, and I sort the history by date. But because browser histories only show the domain and page title, it's not very useful if I simply remember the subject of the page. I don't think of pages in terms of the domains they're hosted on, I think in terms of the page's content. (Searching your browser cache with something like Google Desktop might be better because you can search the full text of your browsing history, but I haven't started using this regularly.)

3. Revisit Web Haunts - Chances are good that I found the link I'm looking for at one of the sites I read regularly. Since I follow hundreds of sites with the news reader Bloglines, this can be a big search. Unfortunately, the "Search My Subscriptions" feature at Bloglines isn't working for me, so I'll generally try to narrow down which site would have had the URL and then go back in time for each site individually using the "Display items within the last x" feature. Then Ctrl-F can help me find specific keywords within past posts. Google can also come in handy here. If I know I spotted a link about SQL on O'Reilly Radar, I can use the site: keyword like this: site:radar.oreilly.com SQL.

4. Search People - del.icio.us just rolled out a feature called your network that lets you track other del.icio.us members. There's no search yet, but you can browse back in time to see what people you know bookmarked at del.icio.us. I think this'll be handy, and I have gone back into specific people's del.icio.us archives looking for a URL. Having them all in one place is good for browsing, and saves time if I can't remember exactly who posted the link I'm looking for.
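The site: search from item 3 can also be built programmatically if you find yourself repeating it across several haunts. Here's a minimal Python sketch, assuming Google's usual search URL with a q parameter; the function name site_search_url is made up for illustration.

```python
from urllib.parse import urlencode


def site_search_url(domain, keywords):
    """Build a Google search URL limited to one domain via the site: operator."""
    query = f"site:{domain} {keywords}"
    # urlencode escapes the colon and turns the space into a plus sign
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("radar.oreilly.com", "SQL"))
# https://www.google.com/search?q=site%3Aradar.oreilly.com+SQL
```

Looping over a short list of likely domains and opening each result is one way to speed up the "which site had it?" guessing game.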

del.icio.us leads into my primary strategy for finding lost URLs: make links more findable before they're lost. Here's how I do it.

1. Use Web-based Bookmarks - I use del.icio.us (my bookmarks), but there are a bunch of web bookmark systems out there. When I come across a URL I know I'm going to want to get back to at some point, I'll click the del.icio.us bookmarklet and tag it. Searching my del.icio.us bookmarks is easy, but as with browser history, you're only searching titles, tags, and notes, not the full text of the site you bookmarked. Yahoo's My Web and Google's Personalized Search both do better on the searching front, which leads to...

2. Turn on Search History - Privacy implications aside, I've found Google's Personalized Search handy for finding lost URLs even though I have mixed feelings about it. Once enabled, Google will remember every query you make and every search result you clicked on. You can then search just those sites that you clicked on in the past. Of course, that means everything you've searched for and every site you've clicked on is stored in a digital archive somewhere. I go back and forth, but privacy usually trumps findability for me so I might remove this option from my toolbox soon.
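To make the "titles, tags, and notes" limitation from item 1 concrete, here's a rough Python sketch of that kind of search. The Bookmark record and the search_bookmarks function are hypothetical stand-ins, not del.icio.us's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    # Hypothetical record: only the metadata saved at bookmarking time
    url: str
    title: str
    tags: list = field(default_factory=list)
    notes: str = ""


def search_bookmarks(bookmarks, keyword):
    """Return bookmarks whose title, tags, or notes contain the keyword."""
    needle = keyword.lower()
    return [
        b for b in bookmarks
        if needle in b.title.lower()
        or needle in b.notes.lower()
        or any(needle in tag.lower() for tag in b.tags)
    ]


bookmarks = [
    Bookmark("http://example.com/dedupe", "Cleaning up tables",
             ["sql", "duplicates"], "handy snippet"),
    Bookmark("http://example.com/xbox", "Xbox hacking How-To", ["xbox"]),
]

print([b.url for b in search_bookmarks(bookmarks, "SQL")])
# ['http://example.com/dedupe']
```

The first bookmark turns up only because it was tagged "sql"; its title never mentions the word. A keyword that appears only in the page's body text would return nothing, which is the gap full-text search fills.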

I should echo Professor Solomon's 13th principle: sometimes you can't find what you're after and you have to give up. The Web is ephemeral and pages come and go all the time. Even though it's maddening not to be able to get back to a document I know I've seen, that's life. What strategies am I missing?

Comments

I find myself trying to remember lost URLs by looking up related tags on Del.icio.us and seeing whether the page is in the recent history. For example, if I was looking for a recent site about SQL or duplicates, I'd look here:
http://del.icio.us/tag/duplicates

Sure enough, there's the page you referred to.
I get to most of my serendipitous browsing via a feed aggregator, too. I wish the mechanisms for searching through your read posts were better! Google's web index often isn't fast enough to make a domain-specific search bear fruit.

Another way I've found is talking to one of my friends who basically follows the same weblogs. They often will remember who posted what, or likely candidates for the original linker.
Oh, great point. I'll also often search through my historical feed archives in reBlog, my feedreader of choice.
And to think people giggled at me six years ago when I said I used my friends' weblogs as a massively distributed personal bookmarking system.
This is exactly why I started bookmarking and tagging anything I thought I could remotely have a need for again. Now if I can just figure out how to easily convert the 4200 or so bookmarks in my Powermarks file for import to del.icio.us... :)
http://www.archive.org/ and the Google cache have proven invaluable for a different but related problem: the time when you know the URL of a page (say from your browser's bookmarks) but it is no longer there.
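For the archive.org route, a small Python sketch of building a lookup address for a page that has vanished. It assumes the Wayback Machine's web.archive.org/web/ address pattern, which isn't spelled out in the comment above, and wayback_url is a made-up helper name.

```python
def wayback_url(page_url, timestamp="*"):
    """Build a Wayback Machine address for a page URL.

    timestamp is a YYYYMMDD-style string; the default "*" asks for
    the list of all captures instead of one snapshot.
    """
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


print(wayback_url("http://example.com/old-page.html"))
# https://web.archive.org/web/*/http://example.com/old-page.html
```

Passing a date such as "20060101" instead of the default should land on the capture nearest that date, if one exists.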
Very nice article. I have been using del.icio.us for several months now and find it indispensable. It reduces the effort required to file and categorize a new bookmark significantly (especially with the Firefox extension), so I bookmark every article I find interesting, and am much more likely to find it again later.
Del.icio.us and IRC logs do it for me.
It's funny, but now that so many people have link lists with del.icio.us, I don't get nearly as many URLs in email or instant messages anymore. Going back through email and chat logs used to be a way I re-found URLs, but those spaces are becoming an oddly link-free zone for me. Maybe I'm just using email and IM less, and using the web more.
This is *the* reason I started my second blog, which is basically a regurgitation of everything I've found interesting on the web (plus offline things: recommendations, quotes, conversations, radio articles...). The "blog this" bookmarklet, add a quote or a few words on my response, and I can find that code snippet I read three weeks ago, plus maybe a reason why it struck me. It's a browsing+thoughts history that I can search with type ahead find (in Firefox).
I highly recommend gregarius.net

Web based RSS aggregator, with tags and categories, search and everything.

Free. Open source. PHP/MySQL

Also, I use changenotes.com that sends me daily updates on changed websites I want to track. I save the emails I get so I have a personal searchable website archive.
I believe there is some mathematical equation that defines my ability to find something I once read on the internet when I need it again (provided I haven't bookmarked it in some method.) Something to do with multiplication of the number of links the item made on the internet by the number of links it made in my brain.
In my biased opinion you are far better to simply save stuff you find, when you find it. And not just a bookmark, but the actual content itself. That way if the web site or page disappears you've got a permanent record.

There are various programs and web sites that do this. I'll give my product Surfulater a plug. Surfulater not only lets you save stuff, it lets you edit it, add notes, search, cross reference, e-mail it etc. See http://www.surfulater.com /plug-end
I have thousands of bookmarks. You can bookmark Firefox's file of bookmarks (file:///C:/Documents and Settings/Baby Sissy/Application Data/Mozilla/Firefox/ Profiles/default.zop/bookmarks.html ) and search it with Ctrl-F.

I don't save things to my hard drive because I can recover them at the Internet Archive's Wayback Machine (http://www.archive.org/index.php ) if I have the link where the page used to be.

Google Desktop is very nice. Just searching "catfish" gets me nowhere in the thousands of pages on the web, but gets me the six recipes I was browsing last week and want to see again.
Whoops, in that sample bookmark URL I gave you must replace "Baby Sissy" with the name of your user account, or maybe "Administrator."
I actually make extensive use of Google Desktop's browser history archive, especially with Search Across Computers, which lets me get back every URL I've visited, regardless of which of my machines I saw it on.

As often as not, that gets me a small enough result set that I can find an address pretty quickly.
Anil, I find the privacy implications of Google's "Search Across Computers" even more disturbing than their Personalized Search feature. The EFF issued a warning about it a few months ago:

http://www.eff.org/news/archives/2006_02.php#004400

I think Google Desktop forces us to think about how much personal information we're willing to hand over to a company in exchange for services such as re-finding URLs quickly. (i.e., all of it?)
cant find this url