How Apple Can Improve Spotlight Search
(Or How Microsoft Can Beat Spotlight)

Spotlight is easily the most significant feature of Mac OS X 10.4 and certainly takes desktop search to impressive territory. It’s the beginning of a future where sorting through files and folders becomes silly and arduous when all you need to do is search. It’s important to note, though, that Spotlight is really just version 1.0 of metadata-rich search and there are many improvements to be had. A little speculation and daydreaming will give us a sense of what’s to come.
User Activity Analysis to Sort Results
Spotlight has a default pattern that it uses to sort its search results, but fortunately you can choose to rearrange this pattern. For instance, Spotlight defaults to serving up matching applications first, but I chose to make matching contacts and matching documents my first and second results.
In truth, though, I shouldn’t have to manually rearrange this list. There is data to be derived simply from my day-to-day activities. Spotlight can watch how I spend my time as well as what I usually pick from its search results and adjust the ordering on the fly based on my use. This would be more effective than its default settings, which are simply an educated guess by Spotlight’s developers, and even more effective than my choices, which are simply a slightly more educated guess by me. There’s no better way to guess what I want now than to know what I wanted the last thousand times.
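To make the idea concrete, here is a minimal sketch of ordering result categories by what I actually picked in the past. This is not how Spotlight works; the category names, the `DEFAULT_ORDER` list and the `rank_categories` function are all made up for illustration.

```python
from collections import Counter

# Hypothetical default ordering; these category names are illustrative only.
DEFAULT_ORDER = ["Applications", "Contacts", "Documents", "Images"]

def rank_categories(selection_history, default_order=DEFAULT_ORDER):
    """Order result categories by how often the user picked from each one."""
    counts = Counter(selection_history)
    # Most-picked categories come first; ties fall back to the default order.
    return sorted(default_order,
                  key=lambda c: (-counts[c], default_order.index(c)))

# Each entry is the category of a result the user chose after a search.
history = ["Contacts", "Documents", "Contacts",
           "Applications", "Contacts", "Documents"]
print(rank_categories(history))
# ['Contacts', 'Documents', 'Applications', 'Images']
```

A real version would probably weight recent picks more heavily than old ones, but even a plain count beats a fixed guess.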
Image Analysis
The largest untapped resource of data on your computer is all those pictures you take with your digital camera. The only metadata to be found by default is a cryptic camera-assigned name, the date the picture was created and information like the picture’s aperture. None of this is helpful when all you’re really trying to find are pictures of your mom for a Mother’s Day card.
The way to remedy the situation is with intelligent image analysis. Fulfilling its full potential is a huge undertaking, but it’s one that can be approached with practical baby steps that will yield helpful short-term results. There are three things that should be derived from image analysis: objects, colors and actions.
A lot of work is already well underway in recognizing objects in photographs and video, largely being put to use in security. I’ve experimented with software designed to recognize any human face and even recognize mine in particular. It’s fun when it works, but further experimentation shows that current solutions are easily tricked or confused. At the end of the day, your computer needs to be able to recognize faces and sunsets and cats and dogs. If it can recognize an object, it will eventually be able to get an accurate count of those objects, allowing it to distinguish between searches for a single item like “dog” or “tree” and appropriate plural forms like “dogs” and “forest”.
Establishing the dominant colors in a picture should be an easy matter and something that can hit a search release relatively soon. Simply map ranges of color values to common English terms like “yellow”, “blue” and “orange” and you now have the ability to pull up photos by their dominant color. Combine the analyzed data on both colors and objects and you now have the ability to satisfy searches like “yellow flower” and “blue dress”.
Actions will be much more difficult to discern and will have to be associated with images after a well-established ability to recognize objects is achieved. Does it all seem too complicated for a computer to recognize the objects in a picture and then possibly even the action being performed with those objects, given the nearly endless list of possible actions and objects? Well, you may find yourself recruited by your computer to help out.
All the hours people while away on solitaire could actually be turned into something useful. Using your own photos, you and your computer could play a game where the computer tries to guess what it’s looking at and you correct it to see how well it does. This is a lot more fun and intriguing than it sounds. To get a sense of what I’m talking about, try out 20q.net where you think of an object and the computer tries to guess what it is. Over time, these systems learn from their mistakes and even learn about objects they’ve never encountered before.
Another great way to get data from you about your pictures would be to have you talk about them. While showing off your vacation pictures in a slideshow to friends you are probably narrating the various images. Your computer could be listening and analyzing, looking for words in your speech to back up what it may already suspect. If it thinks there’s a dog in the picture and you say “dog” a couple of times while the picture is displayed, the computer has even more reason to believe that this picture is, in fact, of a dog.
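As a rough sketch of the color idea, the snippet below converts each pixel to a hue and looks the hue up in a table of English names, then reports the most common name. The hue boundaries in `HUE_NAMES` and the brightness and saturation cutoffs are my own guesses, not values taken from any real product, and the pixels are assumed to be plain (red, green, blue) tuples from 0 to 255.

```python
import colorsys
from collections import Counter

# Hypothetical hue boundaries in degrees, each paired with an English name.
HUE_NAMES = [(15, "red"), (45, "orange"), (70, "yellow"), (170, "green"),
             (260, "blue"), (330, "purple"), (360, "red")]

def color_name(r, g, b):
    """Map one RGB pixel (0-255 per channel) to a common color word."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if v < 0.2:                      # very dark pixels
        return "black"
    if s < 0.15:                     # washed-out pixels have no clear hue
        return "white" if v > 0.8 else "gray"
    degrees = h * 360
    for upper, name in HUE_NAMES:
        if degrees < upper:
            return name
    return "red"                     # safety net; hue wraps back to red

def dominant_color(pixels):
    """Return the color word that describes the most pixels."""
    return Counter(color_name(*p) for p in pixels).most_common(1)[0][0]

# Two yellowish pixels and one blue pixel.
print(dominant_color([(255, 255, 0), (250, 240, 10), (0, 0, 255)]))  # yellow
```

Storing that one word alongside the photo is all it would take for a search on “yellow” to find it.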
Association by Activity
The OS should be aware of groups of items that are consistently opened within minutes of each other as well as the conversations we have and the emails we send. If I work on a document and email that document to someone, Spotlight should know that the document I worked on is related to both that email and that contact.
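Here is one way the “opened within minutes of each other” part could be sketched. It assumes the OS keeps a log of (timestamp in seconds, item name) events, which is my invention, and the five-minute window and function names are arbitrary.

```python
from collections import Counter

def co_open_counts(events, window_seconds=300):
    """Count how often two different items were opened within the window.

    events is a list of (timestamp_in_seconds, item_name) tuples.
    """
    ordered = sorted(events)
    pairs = Counter()
    for i, (t1, first) in enumerate(ordered):
        for t2, second in ordered[i + 1:]:
            if t2 - t1 > window_seconds:
                break                # later events are even further away
            if first != second:
                pairs[frozenset((first, second))] += 1
    return pairs

def related_to(item, pairs):
    """Return every item that was ever opened close in time to `item`."""
    related = set()
    for pair in pairs:
        if item in pair:
            related |= pair - {item}
    return related

log = [(0, "report.doc"), (60, "mail to Ann"),
       (4000, "photo.jpg"), (4100, "report.doc")]
pairs = co_open_counts(log)
print(sorted(related_to("report.doc", pairs)))  # ['mail to Ann', 'photo.jpg']
```

Items that pair up again and again would earn a higher count, which is a natural signal for how strongly to link them in search results.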
Related Search Terms
If I have an image labeled “Empire State Building”, Spotlight should know that this famous building is in New York City. If I should search for “New York”, this picture should be offered up as a result.
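A toy version of this is just a lookup table from a label to the terms it implies. The `RELATED` table below is hand-written and hypothetical; a real system would need to draw on a far larger source of world knowledge.

```python
# Hypothetical hand-written table of labels and the terms they imply.
RELATED = {
    "empire state building": {"new york city", "new york", "manhattan"},
}

def matches(query, label):
    """True if the query matches the label itself or a term related to it."""
    q = query.lower()
    l = label.lower()
    if q in l:                       # ordinary substring match on the label
        return True
    return q in RELATED.get(l, set())

print(matches("new york", "Empire State Building"))  # True
print(matches("paris", "Empire State Building"))     # False
```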
Including Outside Data
Should Spotlight include data from outside your system? It seems like an easy win to have Spotlight retrieve search results from the internet if it doesn’t find results on your local drive. The possibility of blending local and internet results could lead to a lot of confusion, but ultimately must be figured out. With any search you perform, what you want is a helpful result. Provided you find what you were looking for, you probably won’t care what its source was.
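The “easy win” version, where the internet is only consulted when nothing turns up locally, could look something like the sketch below. The `local_search` and `web_search` arguments are stand-ins for whatever would really do the searching; neither is a real API. Tagging each result with where it came from is one small way to keep any blending from getting confusing.

```python
def search(query, local_search, web_search):
    """Return local results, falling back to the web only if there are none.

    Each result is tagged with its source so the interface can label it.
    """
    results = [("local", r) for r in local_search(query)]
    if not results:
        results = [("web", r) for r in web_search(query)]
    return results

# Stand-in search functions, for illustration only.
files = {"taxes": ["taxes-2004.xls"]}
local = lambda q: files.get(q, [])
web = lambda q: ["web page about " + q]

print(search("taxes", local, web))   # [('local', 'taxes-2004.xls')]
print(search("llamas", local, web))  # [('web', 'web page about llamas')]
```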
Spotlight with Eyes
What happens if Spotlight has eyes in the real world? Add an iSight camera and Spotlight could have just that.
Where Did I Put My Car Keys?
This is just fun, but there is something to be had here. Wouldn’t it be nice to have Spotlight find your keys? At the very least journalists would think it was the bee’s knees. (Think of the Volkswagen Beetle’s fairly useless but much spoken-of built-in flower vase.) Your computer will eventually have eyes by way of an attached camera. Once image and video analysis reaches the point of identifying objects, it will be a simple matter to apply that to the images constantly coming through its own camera. In the short term, though, you could have the OS come with a keychain that speaks to the OS.
Will Desktop Search Still Matter?
As a side note, it’s reasonable to bring up the possibility that desktop search may not matter in years to come. As more and more web-based applications continue to evolve and supplant traditional desktop apps, and as the InternetOS continues to grow, desktop search as we know it may become a thing of the past. Of course, all the lessons already discussed still apply regardless of where our data and applications lie.
A Healthy Beginning
Spotlight, as powerful and useful as it is, is only the beginning. We’ve got a lot of fun ahead of us and hopefully the renewed competition between operating systems should make this quite interesting. Whoever should knock down even a small fraction of these features will certainly impress.

Wouldn’t it be great to extend Spotlight to include music? Couldn’t Spotlight analyze the audio patterns of an MP3 and recognize the tempo and general tone of the music? Dunno how that would work, but it would be great to just type in “Music female vocal downtempo” and get served up a list that includes some Zero 7 or Olive…
I am concerned about the way that our important files have to remain on our computer to be indexed and searched by Spotlight. I appreciate that I can store images and other files on a network hard drive, but I would like the ability to search files and view metadata and image thumbnails that are stored remotely on CDs and DVDs.
Applications like Cumulus and CDFinder have offered similar features for some time and CDFinder will soon be updated to include Spotlight, but I would like to see it in Tiger along with the ability to back up the index for those non-changing archives onto my iDisk.
I hear you SunSeeker. Spotlight is certainly a 1.0 iteration at this point and we have many improvements to look forward to in future revisions.
In the meantime there is a hack to index networked volumes. It’s not perfect, but it is something.
Create Spotlight indexes for networked volumes
Wonder if Spotlight can be adapted to search all files, not just the ones readable through default Apple-provided applications. E.g. it reads Mail files, but not the mail in, say, Thunderbird. Shucks. Any ideas?
Applications have to make Spotlight aware of their data in order to be included. I haven’t a clue as to how complicated this is, but hopefully someone will have the time to tackle this soon.