FIM 2010 R2: SQL timeout on using large sets in other sets


Last week a strange error appeared in my environment. I was adding the All Expected Rule Resources set to another set to give read permissions to operational admins, so that they can see the EREs on the Provisioning tab of user resources. I built a set for these operational admins containing all the resources they should have read access to; in it I included several object types like teams and groups, as well as the above set, which has a large number of members.

Here is the set I’m trying to build:

(Screenshot: EditSetError2 – the set definition in the FIM Portal)

I did this directly in production, and everything went fine. However, so as not to break my deployment across the three stages (dev, test, prod), I also tried to make this change in my other two environments, but there I got a PostProcessing error.

I have to say that all three environments run on identical physical machines with nearly the same configuration (Windows Server 2008 R2, SQL Server 2008 R2 and FIM 2010 R2 SP1).

Here is the error I got in the portal:

(Screenshot: EditSetError1 – the error dialog in the FIM Portal)

Error processing your request: The server was unwilling to perform the requested operation.
Reason: Unspecified.
Attributes:
Correlation Id: 1292580b-150f-4921-9beb-c8761476787e
Request Id:
Details: Request could not be dispatched.

I also found 2 errors in the FIM event log:

.Net SqlClient Data Provider: System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
   at System.Data.SqlClient.SqlDataReader.get_MetaData()
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader()
   at Microsoft.ResourceManagement.Data.DataAccess.DoRequestCreation(RequestType request, Guid cause, Guid requestMarker, Boolean doEvaluation, Int16 serviceId, Int16 servicePartitionId)

and this:

Requestor: urn:uuid:7fb2b853-24f0-4498-9534-4e10589723c4
Correlation Identifier: 1292580b-150f-4921-9beb-c8761476787e
Microsoft.ResourceManagement.WebServices.Exceptions.UnwillingToPerformException: Other ---> System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at Microsoft.ResourceManagement.WebServices.RequestDispatcher.CreateRequest(UniqueIdentifier requestor, UniqueIdentifier targetIdentifier, OperationType operation, String businessJustification, List`1 requestParameters, CultureInfo locale, Boolean isChildRequest, Guid cause, Boolean doEvaluation, Nullable`1 serviceId, Nullable`1 servicePartitionId, UniqueId messageIdentifier, UniqueIdentifier requestContextIdentifier, Boolean maintenanceMode)
at Microsoft.ResourceManagement.WebServices.ResourceManagementService.Put(Message request)
--- End of inner exception stack trace ---

I found out that the request takes more time than the SQL timeout configured in the FIM Service, which is 58 seconds by default. But why only in dev and test, and not in production?

Sadly I have no answer to this, since I did no further debugging with SQL Profiler, because the following change to the FIM Service configuration file resolved the issue.

To extend the timeout, modify the FIM Service configuration file Microsoft.ResourceManagement.Service.exe.config by adding both timeout parameters:

<resourceManagementService externalHostName="myfimservice" dataReadTimeoutInSeconds="1200" dataWriteTimeoutInSeconds="1200"/>

These parameters are also documented within the config file.

I tried the values 120 and 300 first, but the request takes even longer than that, so I decided to use the value 1200. After the request completed I could see that it took more than 6 minutes. In production it completed within the 58-second timeout. Very strange behavior.

However, even after extending the timeout and restarting the FIM Service, the portal will still present you with a timeout, but the request will complete in the background; you can check this in the request history. It is also possible to extend this timeout on the portal side by changing its web.config, but for me that was not necessary.
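
If you do need to raise the portal-side timeout as well, that is done in the FIM Portal's web.config. A minimal sketch, assuming the resourceManagementClient element and its timeoutInMilliseconds attribute are the right knobs in FIM 2010 R2 (here 300 seconds; adjust the service address to your own):

<resourceManagementClient resourceManagementServiceBaseAddress="myfimservice" timeoutInMilliseconds="300000"/>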

Besides getting rid of that error, there are still 2 questions I could not answer myself:

1. Why did this affect dev and test but not production (keeping in mind that the environments are identical)?

2. Why does it take so long to add a set to another set, even one with a large number of members?

If you can answer one or both of them, please leave me a comment or a mail.

 


About Peter Stapf
Senior Consultant Identity and Access MVP (Enterprise Mobility)

6 Responses to FIM 2010 R2: SQL timeout on using large sets in other sets

  1. Peter – I have had pretty much the same experience as yourself in most of my FIM sites at one time or another, and I have come to the conclusion that it’s the health of the FIMService database (in particular the indexes) that is the root of the issue. I have usually found that by (a) scheduling a rebuild of the full text catalog (see my wiki article https://social.technet.microsoft.com/wiki/contents/articles/5701.how-to-automate-the-rebuilding-of-fim-full-text-indexes.aspx) and (b) selectively rebuilding table indexes where fragmentation has reached a “significant” level (this is very subjective – depends on lots of things), I have been able to avoid changing the default timeout periods and successfully save my more complex set definitions.

    My point is that (even in Production where specs are invariably much better than your lab) without proper SQL maintenance you will see your Production FIM environment degrade over time, invariably leading to the same problems you are having in DEV/TEST.

    I am planning to blog myself about my own experiences with the index rebuilding, which I find myself doing an awful lot lately. Basically whenever you have guid table keys that are not sequential (like FIM does everywhere) then you will always end up with index fragmentation, and this needs to be addressed regularly.

  2. I should add that I would expect your DEV and TEST environments to be suffering from fragmentation way more than PROD, because PROD will invariably not have had the same number of data loads/deletes/reloads that you would have done in DEV and TEST.

  3. Peter Stapf says:

    Hello Bob,
    thanks for the hint, I will have a look at your article and implement the index rebuild in my environments. Since my own article on SQL indexes I haven’t had the time to implement this, but with this issue I think it is a good idea to do it in all of my customers’ environments.

  4. Peter Stapf says:

    So I tested rebuilding the FIMService.ftCatalog, but as I assumed, that had no effect. After that I checked the fragmentation of all indexes on the tables in the FIMService DB and did a rebuild on them in my dev environment. That speeds up adding the above set to my set from 6 minutes to 10-15 seconds: wow. (A T-SQL sketch of such a fragmentation check follows after the comments.)
    However, in my QA environment this does not help; I still get the timeout. I have to investigate that further.

  5. Pingback: Maintenance of Indexes and Fulltext-Catalog of FIM SQL databases | JustIDM

  6. Sometimes even rebuilding the indexes and fine tuning everything under the sun won’t make up for a poorly resourced lab VM – I was forced in the end to simplify my set definition (which worked fine in UAT and PROD but not on my poor DEV VM). Followed the hints from Tim MacCauley here: http://social.technet.microsoft.com/wiki/contents/articles/16057.fim-service-troubleshooting-set-creation-fails-when-using-negative-conditions.aspx but still consider this to be a fairly vague guideline. In my case I had 3 positive and 3 negative, and adding a 4th positive killed it. Timeouts made absolutely no difference in this case.
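
As a rough illustration of the fragmentation check and selective rebuild described in the comments above, here is a minimal T-SQL sketch. It assumes the default database name FIMService; the index and table names in the rebuild line are placeholders, not actual FIM schema objects:

USE FIMService;
-- List indexes with more than 30% fragmentation (the threshold is a matter of taste).
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       i.name AS IndexName,
       ips.avg_fragmentation_in_percent AS FragPercent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30
  AND i.name IS NOT NULL            -- skip heaps
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Then rebuild the heavily fragmented ones selectively, for example:
-- ALTER INDEX [IX_Example] ON [dbo].[ExampleTable] REBUILD;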
