What is Spotlight? mds? mdimport? mdworker?

Hard drives grow larger and cheaper every year. Today you can purchase a 1 terabyte hard drive for roughly $80-90 USD. But ever since the 10 MB drive was released back in the early 1980s, people have been asking the same question: where did I save that file?

Spotlight is Apple’s answer to that question…

What is Spotlight?

Users are accustomed to searching for files manually or with add-ons such as Windows Search or Google Desktop Search. Since the release of Mac OS X Tiger in April 2005, Macintosh users have had a new tool to simplify their search experience: Spotlight.

Spotlight marks a watershed in operating system history. For years people have talked about making the file system as quick and easy to search as the web, and about using metadata to make those searches more accurate. For years it was all talk. Other operating systems have long promised it, but historically only BeOS delivered. Now Apple has succeeded in bringing this to the consumer.

Spotlight isn’t “bolted on” to the system. It’s a completely new search technology that is tightly integrated with a fundamental part of the OS: the file system. Every time a file is created, saved, moved, copied, or deleted, the file system automatically ensures that the file is properly indexed, cataloged, and ready for whatever search query might be issued, all in the background.

Spotlight is fast, intelligent, and thorough, and it has become faster and more responsive since its release in Tiger (10.4). Mac OS X Leopard (10.5) and later added more features and search options, and return search results almost instantly.

Spotlight sits in the upper right corner of the Mac OS X screen and is easily identified by the white and blue magnifying glass icon (10.4) or the black magnifying glass icon (10.5/10.6). Alternatively, you can open Spotlight by pressing Command-Space (this shortcut can be remapped in 10.5/10.6).

Those new to Mac will experience a significant search difference under Mac OS X when compared to Windows. Whereas Windows has historically returned a list of files containing the search phrase (also called the string), Spotlight will return any and all things related to the search string. For example, if you searched for the string “vacation”, Spotlight would find all files on your Mac related to “vacation”. This could include items such as:

  • e-mails
  • documents
  • pictures
  • PDFs
  • folders
  • settings
  • recent web pages

Spotlight works by organizing, reading, and indexing metadata. Metadata is simply “data about data” that is tied to the specific content stored on your system. For example, when you snap a picture with a digital camera, additional data (metadata) is stored within the picture, such as the time the photo was taken, the camera model, the camera serial number, and so on. Spotlight reads this “extra” data and can match against it when a Mac user performs a search.
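To make the idea of metadata concrete, here is a minimal Python sketch that reads the attributes Spotlight has stored for a file by parsing the text printed by the mdls tool. The sample output below is illustrative only (the camera name and values are made up), the helper names parse_mdls and read_metadata are hypothetical, and the parser deliberately handles only single-line "name = value" entries.

```python
import shutil
import subprocess

# Illustrative sample in the style of mdls output; the values are invented.
SAMPLE_MDLS_OUTPUT = '''\
kMDItemAcquisitionModel    = "ExampleCam 100"
kMDItemContentCreationDate = 2009-07-04 18:30:00 +0000
kMDItemPixelWidth          = 3072
kMDItemKeywords            = (
    "vacation",
    "beach"
)
'''


def parse_mdls(text):
    """Return a dict of single-line attributes; multi-line lists are skipped."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Lines without "=" are list elements; a bare "(" opens a list.
        if not sep or not key or value == "(":
            continue
        attrs[key] = value.strip('"')
    return attrs


def read_metadata(path):
    """Run mdls on a file; returns {} on systems without mdls (non-Mac)."""
    if shutil.which("mdls") is None:
        return {}
    result = subprocess.run(["mdls", path], capture_output=True,
                            text=True, check=True)
    return parse_mdls(result.stdout)


if __name__ == "__main__":
    for name, value in parse_mdls(SAMPLE_MDLS_OUTPUT).items():
        print(name, "->", value)
```

On a Mac, read_metadata("/path/to/photo.jpg") would return whatever attributes the importer for that file type recorded, which vary by file and OS version.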

Spotlight is not only available to end users; the array of search technologies that make up Spotlight is also available to developers. This means that you’ll be able to tap into these powerful search technologies to find files to display, plug-ins to load, and data to mine in your applications. No restrictions. No limits.

The technologies that power Spotlight are:

  • A database consisting of a high-performance metadata store and content index that is fully integrated into the file system.
  • Programmatic APIs that are part of the CoreServices and Cocoa frameworks that let you query the metadata store and content index.
  • A set of importer plug-ins that are used to populate the metadata store and content index with information about the files on the file system.
  • A plug-in API allowing you to provide metadata and content to be indexed for your application’s custom file formats.

But more than a collection of individual technologies that work together, Spotlight gives you the ability to plug your application into the operating system and work with files in a totally new way. For example, if you were building an asset management application, you could use Spotlight to find all of the files that match certain criteria rather than trying to slog through the file system yourself. Or, if your application specialized in supporting various kinds of workflows, you could use Spotlight to find all of the files that needed to be marked with a particular keyword. Once you get used to working with files in this new way, you’ll never want to go back.
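As a rough illustration of the asset management case, the following Python sketch builds a metadata query expression and hands it to the mdfind command-line tool instead of walking the file system. It is a sketch under assumptions, not the CoreServices or Cocoa API itself: the helper names (build_query, mdfind_command, find_files) are hypothetical, and values containing double quotes are rejected rather than escaped to keep the example simple.

```python
import shutil
import subprocess


def build_query(criteria):
    """Combine attribute == "value" tests with && so all of them must match."""
    parts = []
    for attribute, value in criteria.items():
        if '"' in value:
            # Escaping is out of scope for this sketch.
            raise ValueError("double quotes are not supported in values")
        parts.append(f'{attribute} == "{value}"')
    return " && ".join(parts)


def mdfind_command(query, only_in=None):
    """Return the argument list for mdfind, optionally limited to one folder."""
    command = ["mdfind"]
    if only_in:
        command += ["-onlyin", only_in]
    command.append(query)
    return command


def find_files(criteria, only_in=None):
    """Run the query; returns [] on systems without mdfind (non-Mac)."""
    if shutil.which("mdfind") is None:
        return []
    result = subprocess.run(mdfind_command(build_query(criteria), only_in),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    wanted = {"kMDItemContentType": "public.jpeg",
              "kMDItemKeywords": "vacation"}
    print(build_query(wanted))
    print(find_files(wanted, only_in="/Users/Shared"))
```

The dictionary in the usage section asks for JPEG images tagged with the keyword "vacation"; the folder /Users/Shared is only a placeholder search scope.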

The Restrictions on Spotlight

  • Spotlight does not index removable media unless it appears to the OS as a hard drive. For example, a DVD or CD would not be indexable through Spotlight, but a USB thumb drive may be indexable.
  • If Spotlight detects that the file system has been modified on a system running an earlier version of Mac OS X (e.g., pre-10.4), it will force a re-index of the volume.

Identifying the components of Spotlight

  • mds – the metadata server. It serves all clients of the metadata APIs, including Spotlight. mds has no configuration options, and users should not run it manually.
  • mdfind – finds files matching a given query. The mdfind command consults the central metadata store and returns a list of files that match the given metadata query. The query can be a string or a query expression.
  • mdls – lists the metadata attributes for the specified file.
  • mdutil – manages the metadata stores used by Spotlight. (Typically used to manage the stores for mounted volumes.)
  • mdimport – imports file hierarchies into the metadata datastore. It is used to test Spotlight plug-ins, list the installed plug-ins and schemas, and re-index files handled by a plug-in when a new plug-in is installed.
  • mdworker – the metadata server worker. mds uses it to scan and index files when a volume is mounted or a file changes.
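The user-facing tools in the list above can be driven from a script. The Python sketch below only assembles and prints a few common invocations rather than running them, since mdutil -E erases and rebuilds an index and normally requires administrator rights. The action names ("status", "rebuild", and so on) and the function spotlight_command are hypothetical labels chosen for this example.

```python
import shlex


def spotlight_command(action, target="/"):
    """Map a descriptive action name to the argument list of a Spotlight tool."""
    commands = {
        "status": ["mdutil", "-s", target],      # show indexing status of a volume
        "rebuild": ["mdutil", "-E", target],     # erase and rebuild the volume's index
        "attributes": ["mdls", target],          # list metadata attributes of a file
        "import": ["mdimport", target],          # (re)import a file or folder
        "list-importers": ["mdimport", "-L"],    # list installed importer plug-ins
    }
    if action not in commands:
        raise ValueError(f"unknown action: {action}")
    return commands[action]


if __name__ == "__main__":
    for action in ("status", "rebuild", "list-importers"):
        # Print the shell form of each command instead of executing it.
        print(shlex.join(spotlight_command(action, "/")))
```

Passing one of these lists to subprocess.run on a Mac would execute the tool; mds and mdworker are omitted on purpose because they are not meant to be run by hand.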

How does Spotlight work?

The Spotlight server (mds or mdworker) is started by launchd when the system boots, and is activated by client requests or changes to the file system. When a file system change occurs, the mdimport service scans the file that has changed and updates the Spotlight database in the background. Aside from basic information about each file, such as its name, size, and timestamps, the mdimport daemon can also index the content of some files when it has an importer plug-in that tells it how the file content is formatted. Spotlight comes with importers for certain types of files, such as Microsoft Word, MP3, and PDF documents. Apple publishes APIs that allow developers to write Spotlight importer plug-ins for their own file formats.

Creating the initial database does require the entire system to be scanned, but once that is done, Spotlight rescans only the files that have changed, and does so in a close approximation of real time.
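The pattern described in this section (one full scan, then updates only for changed files, with importers supplying content for the formats they understand) can be shown with a toy Python model. This is not how Spotlight is implemented: the real system is notified of changes by the file system, while this sketch simply re-walks a folder and compares sizes and modification times. The names ToyIndex, IMPORTERS, and import_text are all hypothetical.

```python
from pathlib import Path


def import_text(path):
    """Toy importer: return the text content of a plain-text file."""
    return path.read_text(errors="ignore")


# Registry of importers keyed by file extension (stand-in for plug-ins).
IMPORTERS = {".txt": import_text}


class ToyIndex:
    def __init__(self):
        self.records = {}  # Path -> {"name", "size", "mtime", "content"}

    def scan(self, root):
        """Index new or changed files under root; return the paths updated."""
        updated, seen = [], set()
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file():
                continue
            seen.add(path)
            info = path.stat()
            old = self.records.get(path)
            if old and (old["size"], old["mtime"]) == (info.st_size, info.st_mtime_ns):
                continue  # unchanged since the last scan
            importer = IMPORTERS.get(path.suffix.lower())
            self.records[path] = {
                "name": path.name,
                "size": info.st_size,
                "mtime": info.st_mtime_ns,
                # Without an importer, only the basic information is indexed.
                "content": importer(path) if importer else "",
            }
            updated.append(path)
        for gone in set(self.records) - seen:
            del self.records[gone]  # forget files that were deleted
        return updated

    def search(self, term):
        """Return paths whose name or imported content contains term."""
        term = term.lower()
        return sorted(p for p, r in self.records.items()
                      if term in r["name"].lower() or term in r["content"].lower())
```

The first call to scan() indexes every file; a second call on an unchanged folder returns an empty list, and editing one file causes only that file to be re-indexed.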

Improvements in Spotlight

Mac OS X 10.5 and 10.6 introduced new features to Spotlight:

  • Support for network-based Spotlight searching
  • Expanded search support for AND, OR and NOT operators
  • Faster performance while constructing your Spotlight searches