Categorizing LDAP searches – inefficient vs. expensive?

I caught myself mixing up the correct terms a couple of times, so I thought I’d wrap this up in a blog post.

When looking at Directory Services searches, one of the goals is to find all results of the search as quickly and accurately as possible. When you query AD manually, most of the time you just want it to respond with the correct search results so that you can go on exporting the data, checking the results or doing something else with them. You don’t care whether the search takes 13 milliseconds or 5 seconds to complete; it’s probably just a one-time action you perform to collect data for some special task.

A different picture arises with directory-enabled applications. Those apps are supposed to read data from AD and write to it – sometimes heavily and multiple times a second. Just look at Exchange. Exchange is LDAP’s best friend. There surely are other applications in your environment that query Active Directory. As those apps query AD more often, it might be worth looking at how they form their searches and how quickly AD responds. It’s a different situation than searching manually. Searches that make AD think for 3 seconds and occur multiple times a minute are no good – you’d want such a search to complete in something like 13 milliseconds, right? The crappier the search is, the longer it takes AD to come up with good results: the more I/O it needs to perform to crawl the DIT database to find the objects searched for, and the more time is wasted that could be used to service other requests. That’s an easy calculation.

Now, you want to find bad searches and bad search filters. Before you can find them, you need to know about three terms that are used when it comes to AD searching:

A search is efficient if AD needs to visit only a few objects in its DIT database to find all the objects that were searched for.

A search is expensive when AD visits a large number of objects to find a lot of search results.

We say “inefficient search” when AD crawls through almost the whole DIT just to return a handful of search results.

The following figure should help make the point clear:

Obviously, we don’t want expensive or inefficient searches. We want AD to visit as few objects as possible while still finding every matching object, so we want efficient searches.

To find out whether applications in your environment create inefficient or expensive queries, you need to enable logging of those events. Under HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics there are values that turn on AD diagnostic logging. Each accepts a level from 0 to 5, where 0 is no logging and 5 is the most verbose. All logs are written to the event log, so be wise when enabling logging: high levels can seriously impact DC performance, and it’s recommended to use 5 only while you’re troubleshooting something. The diagnostic we want to turn on is “15 Field Engineering”. Set to 4, it creates an event log entry with ID 1643 every time the garbage collection process runs. That entry contains statistics on how many searches the DC handled and counters for how many of them were expensive and how many were inefficient. You see, understanding what “expensive” and “inefficient” really mean is important – there’s a difference.
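If you prefer scripting the change over clicking through regedit, here is a minimal Python sketch that builds a `reg add` command line for the “15 Field Engineering” value. The function name and the choice to return an argument list are my own assumptions for illustration; the sketch only builds the command and does not run anything.

```python
# Sketch only: builds the arguments for `reg add`, it does not touch the registry.
DIAGNOSTICS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics"
FIELD_ENGINEERING = "15 Field Engineering"


def field_engineering_command(level):
    """Return the argument list for a `reg add` call that sets the logging level.

    `field_engineering_command` is a hypothetical helper name, not part of any tool.
    """
    if not 0 <= level <= 5:
        raise ValueError("diagnostics levels range from 0 to 5")
    return [
        "reg", "add", DIAGNOSTICS_KEY,
        "/v", FIELD_ENGINEERING,   # value name
        "/t", "REG_DWORD",         # value type
        "/d", str(level),          # value data
        "/f",                      # overwrite an existing value without prompting
    ]


# Level 4 asks for the summary entry written at each garbage collection run.
summary_logging = field_engineering_command(4)
```

On a DC you could hand that list to `subprocess.run` from an elevated prompt. Setting the level back to 0 with the same helper turns the logging off again.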

If that information isn’t enough for you, you can turn “15 Field Engineering” up to level 5. That mode creates an event log entry every time a search is categorized as “expensive” or “inefficient”. Every search that hits the DC is evaluated. Now, how are expensiveness and inefficiency measured? What metrics are used? The default definition is quite easy: a search is expensive if it visits more than 10,000 objects to find its results. It is inefficient if it visits more than 1,000 objects and 10% or fewer of the visited objects make it into the result set (the hit rate is <=10%).
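To make the two categories easier to tell apart, here is a small Python sketch of the classification. It assumes the defaults as I understand them: more than 10,000 visited objects counts as expensive, and more than 1,000 visited objects with a hit rate of 10% or less counts as inefficient. The function and parameter names are made up for illustration, and the exact handling of the 10% boundary is an assumption.

```python
def classify_search(visited, returned,
                    expensive_threshold=10_000,
                    inefficient_threshold=1_000,
                    max_hit_percent=10):
    """Return the set of labels a search would earn under the assumed defaults."""
    labels = set()
    # Expensive: the search had to touch a lot of objects, whatever it returned.
    if visited > expensive_threshold:
        labels.add("expensive")
    # Inefficient: many objects visited, but only a small share of them returned.
    # Integer arithmetic avoids floating-point surprises at the 10% boundary.
    if visited > inefficient_threshold and returned * 100 <= visited * max_hit_percent:
        labels.add("inefficient")
    return labels


# Visits 20,000 objects and returns 15,000 of them: lots of work, but a good hit rate.
print(classify_search(20_000, 15_000))
# Visits 5,000 objects to return 5: not much work in total, but a poor hit rate.
print(classify_search(5_000, 5))
```

Note that a single search can earn both labels at once, for example one that visits 20,000 objects to return a handful.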

Does that sound reasonable? Heck, I know directories that have fewer than 10,000 objects – will they ever see an expensive search? Probably not. Luckily, there are ways to adjust AD’s understanding of those terms. What it takes is two registry values under HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters that you need to create (if they’re not there already). Both are REG_DWORDs: “Expensive Search Results Threshold” and “Inefficient Search Results Threshold”. You can assign custom values here.

The juicy part is: what values would you assign there? That greatly depends on what your AD looks like and what you consider “expensive” and “inefficient”. For the expensiveness part, it’s probably a good idea to look at the number of “production” data objects that you have in the domain partition. Remember that only queries that cross these thresholds make it to the event log.

If you’re interested in a dump of _all_ queries that hit AD, you’ll have to lower the two registry values to a small enough number, say, “1” for both. That should make every AD search hit the event log. Note that this excessive logging might impact performance. Also note that this is a registry setting that affects only the local DC. You’d need to turn it on for other DCs as well.
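As a rough sketch of what lowering both thresholds could look like when scripted, the Python below builds one `reg add` command per value. The value names carry the “Threshold” suffix pointed out in the comments; the helper name is hypothetical, and the sketch only builds the commands without running them.

```python
# Sketch only: builds `reg add` argument lists, it does not touch the registry.
PARAMETERS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters"


def threshold_commands(expensive, inefficient):
    """Return one `reg add` argument list per search threshold value.

    `threshold_commands` is a hypothetical helper name used for illustration.
    """
    values = {
        "Expensive Search Results Threshold": expensive,
        "Inefficient Search Results Threshold": inefficient,
    }
    commands = []
    for name, data in values.items():
        if data < 1:
            raise ValueError("thresholds should be positive numbers")
        commands.append([
            "reg", "add", PARAMETERS_KEY,
            "/v", name,            # value name
            "/t", "REG_DWORD",     # both values are REG_DWORDs
            "/d", str(data),       # the new threshold
            "/f",                  # overwrite an existing value without prompting
        ])
    return commands


# Log (nearly) every search: both thresholds down to 1.
log_everything = threshold_commands(1, 1)
```

Since the setting is local to each DC, the same commands would have to be run on every DC you want to watch, and reverted once you have collected what you need.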

More on that at a later time :-)

4 Comments so far

  1. Tomek on February 8th, 2010

    Hi Florian,

    In addition to this excellent information: to gather information about queries and their efficiency, you can also use SPA for Windows 2003 or Performance and Reliability Monitor in Windows Server 2008 and higher.

  2. rich crandall on February 14th, 2010

    terminology abuse is a pet peeve of mine so i am a sucker for clarifying posts. great work Florian!

  3. florian on February 14th, 2010

    Thanks, Rich.

    Actually, I caught myself too many times mixing up the terms – and saying one but meaning the other. Those terms won’t be used in day-to-day AD life and most people won’t need to use them correctly (at all). I thought it’s good to have it at least written down once. Finally a place where I can look it up too, if I forget about it :)

  4. Jay on October 14th, 2010

    Bit late on this, but it looks like the registry values you cite are incorrect.
    “Expensive Search Results” should be “Expensive Search Results Threshold” and “Inefficient Search Results” should be “Inefficient Search Results Threshold”.