Friday, January 27, 2012

CcmExec crashing on Windows 2008 R2

UPDATE: It looks like MS caved and released a hotfix.  I have not tried it yet; please let me know if you do and whether it solves this problem: http://support.microsoft.com/kb/2724939/en-us

Hey all,
I have been tracking an issue for some time.  I have no fix but some things are becoming clear and I just found a good way to scope the problem, hence this post…

I am running System Center Configuration Manager 2007 SP2 R2, and I have been getting this error on a seemingly random set of my Windows 2008 R2 servers:
Log Name:      Application
Source:        Application Error
Date:          1/27/2012 5:52:09 AM
Event ID:      1000
Task Category: (100)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      xxxxxxxxxxxxxxxxx
Description:
Faulting application name: CcmExec.exe, version: 4.0.6487.2000, time stamp: 0x4ab33e4d
Faulting module name: ntdll.dll, version: 6.1.7601.17514, time stamp: 0x4ce7ba58
Exception code: 0xc0000005
Fault offset: 0x0009ce04
Faulting process id: 0x1d58
Faulting application start time: 0x01ccdcd93adaa8bf
Faulting application path: C:\Windows\SysWOW64\CCM\CcmExec.exe
Faulting module path: C:\Windows\SysWOW64\ntdll.dll
Report Id: f570d39f-48d4-11e1-b96c-6431504ea6e0

Notes from this page and especially the German write-up here (translation here) were very helpful.  To briefly summarize: when you run Resource Monitor, it creates an event trace session (ETS) called WDC.BE95A9B1-DE15-4B78-B923-A12AB70BE951.  You can verify this by running 'logman query -ets' or by looking in Server Manager -> Diagnostics -> Performance -> Data Collector Sets -> Event Trace Sessions.
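If you want to script that check rather than eyeball it, here is a rough Python sketch.  It shells out to 'logman query -ets' and looks for any session whose name starts with "WDC.".  The function names are my own, and I am assuming the session name shows up as a whitespace-delimited token on its own line of the output; adjust if your output looks different.

```python
import subprocess


def find_wdc_sessions(logman_output: str) -> list[str]:
    """Return tokens starting with 'WDC.' found in logman's text output.

    Assumption: each session name appears as a whitespace-delimited token.
    """
    sessions = []
    for line in logman_output.splitlines():
        for token in line.split():
            if token.upper().startswith("WDC.") and token not in sessions:
                sessions.append(token)
    return sessions


def query_trace_sessions() -> str:
    """Run 'logman query -ets' (Windows only) and return its stdout."""
    result = subprocess.run(
        ["logman", "query", "-ets"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# On an affected server you would call:
#   print(find_wdc_sessions(query_trace_sessions()))
```

An empty list means no WDC trace session is currently running on that machine.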

Apparently the SCCM client dislikes this trace and will crash, in my case every 65 minutes.  Stopping the trace reportedly fixes it, though I am still testing before I fully believe that.  I have not found it necessary to reboot, reinstall, etc., as suggested elsewhere.  Note that stopping the trace will kill your Resource Monitor.
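For stopping the trace from a script, something like the following should do it.  This is only a sketch: it wraps 'logman stop <name> -ets', the helper names are mine, and it defaults to a dry run so nothing is stopped until you ask for it.  I expect it needs an elevated prompt, but I have not verified that on every configuration.

```python
import subprocess

# Session name as seen in my environment; confirm yours with 'logman query -ets'.
WDC_SESSION = "WDC.BE95A9B1-DE15-4B78-B923-A12AB70BE951"


def stop_trace_command(session_name: str) -> list[str]:
    """Build the logman command line that stops an event trace session."""
    return ["logman", "stop", session_name, "-ets"]


def stop_trace(session_name: str, dry_run: bool = True) -> list[str]:
    """Stop the session, or only return the command when dry_run is True."""
    cmd = stop_trace_command(session_name)
    if not dry_run:
        # Raises CalledProcessError if logman reports a failure.
        subprocess.run(cmd, check=True)
    return cmd


# Example (Windows only): stop_trace(WDC_SESSION, dry_run=False)
```

Remember that this also takes Resource Monitor down with it.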

The next step would be to look into the trace and see whether it can be adjusted.  Word is MS is refusing to fix it.

The only thing I have to add is that I am seeing a status message of 669 (two, in fact) each time this happens.  So in order to determine which machines this is happening on, run the report Status Messages -> All messages for a specific message ID and query for message ID 669.  Exported to Excel (so I could mask server/site names more easily), it looks like this:
Record ID  Severity  Message ID  Component        Computer Name  Time                  Site Code
1012608    Error     669         Advanced Client  SERVER--SCCM   1/27/2012 6:15:46 AM  SC2
1012605    Error     669         Advanced Client  SERVER--WEB3   1/27/2012 6:08:23 AM  SC1
1012607    Error     669         Advanced Client  SERVER--WEB3   1/27/2012 6:08:23 AM  SC1
1012602    Error     669         Advanced Client  SERVER--WEB2   1/27/2012 5:51:38 AM  SC1
1012604    Error     669         Advanced Client  SERVER--WEB2   1/27/2012 5:51:38 AM  SC1
1012601    Error     669         Advanced Client  SERVER--SQLA2  1/27/2012 5:50:07 AM  SC1
1012603    Error     669         Advanced Client  SERVER--SQLA2  1/27/2012 5:50:07 AM  SC1
1012599    Error     669         Advanced Client  SERVER--WEB1   1/27/2012 5:44:05 AM  SC2
1012600    Error     669         Advanced Client  SERVER--WEB1   1/27/2012 5:44:05 AM  SC2
1012596    Error     669         Advanced Client  SERVER--SQLA1  1/27/2012 5:32:24 AM  SC1
1012598    Error     669         Advanced Client  SERVER--SQLA1  1/27/2012 5:32:24 AM  SC1
1012591    Error     669         Advanced Client  SERVER--SQLA2  1/27/2012 5:24:14 AM  SC2
1012593    Error     669         Advanced Client  SERVER--SQLA2  1/27/2012 5:24:14 AM  SC2
1012590    Error     669         Advanced Client  SERVER--SQLB1  1/27/2012 5:21:40 AM  SC1
1012592    Error     669         Advanced Client  SERVER--SQLB1  1/27/2012 5:21:40 AM  SC1
1012584    Error     669         Advanced Client  SERVER--SCCM   1/27/2012 5:10:17 AM  SC2
1012585    Error     669         Advanced Client  SERVER--SCCM   1/27/2012 5:10:17 AM  SC2
1012577    Error     669         Advanced Client  SERVER--WEB3   1/27/2012 5:02:27 AM  SC1
1012583    Error     669         Advanced Client  SERVER--WEB3   1/27/2012 5:02:27 AM  SC1
1012574    Error     669         Advanced Client  SERVER--WEB2   1/27/2012 4:45:50 AM  SC1
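
As a sanity check on the "every 65 minutes" figure, here is a small Python sketch that works out the gap between consecutive 669 messages per machine.  The rows are a subset copied from the table above (the machines that appear at two different times).  The function name is mine, and I key on computer name plus site code because the masked names repeat across sites.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%m/%d/%Y %I:%M:%S %p"

# (computer name, site code, time) taken from the 669 report above.
ROWS = [
    ("SERVER--SCCM", "SC2", "1/27/2012 6:15:46 AM"),
    ("SERVER--WEB3", "SC1", "1/27/2012 6:08:23 AM"),
    ("SERVER--WEB2", "SC1", "1/27/2012 5:51:38 AM"),
    ("SERVER--SCCM", "SC2", "1/27/2012 5:10:17 AM"),
    ("SERVER--WEB3", "SC1", "1/27/2012 5:02:27 AM"),
    ("SERVER--WEB2", "SC1", "1/27/2012 4:45:50 AM"),
]


def crash_intervals(rows):
    """Map (computer, site) to the minutes between consecutive messages."""
    stamps = defaultdict(set)  # a set collapses the duplicate message per crash
    for computer, site, when in rows:
        stamps[(computer, site)].add(datetime.strptime(when, FMT))
    intervals = {}
    for key, times in stamps.items():
        ordered = sorted(times)
        intervals[key] = [
            round((later - earlier).total_seconds() / 60, 1)
            for earlier, later in zip(ordered, ordered[1:])
        ]
    return intervals


for key, gaps in sorted(crash_intervals(ROWS).items()):
    print(key, gaps)
```

All three machines come out a little over 65 minutes between crashes, which matches what I was seeing in the event logs.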

According to the ever handy SystemCenterCentral:
Message ID 669: This message is caused when SCCM component raised an exception but failed to handle it. We should further investigate this problem.
So this could easily be from something else, but this crash seems to be the primary (only) error in my sites that is sending this message.

Please respond if you have any more details to add…

Update: Feb 1, 2012.  It looks like we are not getting any fixes for this in SCCM 2007.  It is fixed in 2012 and, despite that product not being released yet, MS has decided this is good enough.  The options are to not run Resource Monitor, to accept the crashes, or (if you don't care about asset intelligence and CAL reports) to rename %WINDIR%\syswow64\ccm\ccm_caltrack.dll to ccm_caltrack.ThisFileSux.  I have done this on a few machines and it seems to solve the problem.
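
If you have more than a handful of machines, the rename can be scripted.  Below is a minimal Python sketch, not something I have rolled out widely.  The function name is made up, it takes the CCM directory as a parameter, and it does nothing if the DLL is not there.  I am assuming you may need to stop the client service first if the file is locked; that part is not handled here.

```python
from pathlib import Path


def disable_caltrack(ccm_dir, new_name="ccm_caltrack.ThisFileSux"):
    """Rename ccm_caltrack.dll inside ccm_dir; return the new path or None.

    Returns None when the DLL is absent (for example, already renamed).
    """
    src = Path(ccm_dir) / "ccm_caltrack.dll"
    dst = src.with_name(new_name)
    if not src.exists():
        return None
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; not overwriting")
    src.rename(dst)
    return dst


# Example (Windows only):
#   import os
#   disable_caltrack(os.path.expandvars(r"%WINDIR%\syswow64\ccm"))
```

To undo it, rename the file back to ccm_caltrack.dll.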

