Monday, January 19, 2009

What is TADDM stuck doing?

Yesterday I ran a discovery in an environment, and after the sensors finished scanning the servers, the topology builder started its work.

2%..4%..6%..9%..11%...75%.

It breezed through the first percentages quickly and smoothly, but then sat at 75% for ages. Eventually we left and came back the next day.
The good news is that it completed successfully.

I don't like not knowing what a computer is doing, or why. Why did it get stuck on 75% but nowhere else?
The percentages that the topology builder shows us are not percentages such as those displayed when copying a file. That is, 9% doesn't mean that 9% of the work has been done; it means that 4 "steps" have been completed. So TADDM was stuck on the step represented by 75%.

Per this Note, the various steps are listed in the file $COLLATION_HOME/dist/etc/TopologyBuilderConfigurationDefault.xml, and there are nearly 40 of them.

Each step represents the topology builder generating the dependencies and links for a sensor. Part of creating a brand-new sensor is updating this file, so that after the discovery is done the topology builder will fold the items found by the new sensor into the list of links it holds.

Here are the steps and the percentage each one is represented by:

2 DatabaseServerCleanupAgent
4 ComputerSystemConsolidationAgent
6 ComputerSystemTypeAgent
9 RuntimeGcAgent
11 DeletedObjectGcAgent
13 SystempConsolidationAgent
16 VmwareVirtualCSConsolidationAgent
18 J2EEServerDeploymentAgent
20 AppServerClusterAgent
23 JDBCDependencyAgent
25 JBossClusterAgent
27 WeblogicClusterAgent
30 OracleAppClusterAgent
32 OracleDependencyAgent
34 WebConnectionDependencyAgent
37 WebSphereConnectionDependencyAgent
39 SAPDependencyAgent
41 SoftwareHostReferenceAgent
44 DNSServiceAgent
46 LDAPServiceAgent
48 DNSDependencyAgent
51 NFSDependencyAgent
53 HostDependencyAgent
55 ConnectionDependencyAgent
58 GenericAppAgent
60 AppDefinitionAgent
62 AppDescriptorAgent
65 ObjectDisplayNameAgent
67 DiscoveryLogCleanupAgent
69 DominoConnectionAgent
72 DominoClusterDomainAgent
74 PortableAgentWrapper
76 l2.L2Agent
79 l2.CDPAgent
81 CompositeCreationAgent
83 MQServerAgent
86 CitrixAgent
88 ExchangeDependencyAgent
90 VCSDependencyAgent

Since I was stuck on 75%, it looks like the L2 (communications layer 2) step, listed at 76%, was very busy. In retrospect this makes perfect sense, since this discovery was the first time I found a new type of switch in the environment. It makes sense that the builder would take much longer for this step than when I'm only discovering servers.
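
To double-check that reading of the list, here is a small Python sketch. It rests on two assumptions of mine that are not confirmed by any TADDM documentation: that every step carries equal weight, and that the displayed number is the completed-step count divided by the total, rounded down. The function names are made up for illustration. The sketch searches for a total step count that reproduces every percentage in the list, and looks up which step would be running while a given percentage is on screen.

```python
# Percentages and agent names, copied from the list above.
STEPS = """
2 DatabaseServerCleanupAgent        4 ComputerSystemConsolidationAgent
6 ComputerSystemTypeAgent           9 RuntimeGcAgent
11 DeletedObjectGcAgent             13 SystempConsolidationAgent
16 VmwareVirtualCSConsolidationAgent 18 J2EEServerDeploymentAgent
20 AppServerClusterAgent            23 JDBCDependencyAgent
25 JBossClusterAgent                27 WeblogicClusterAgent
30 OracleAppClusterAgent            32 OracleDependencyAgent
34 WebConnectionDependencyAgent     37 WebSphereConnectionDependencyAgent
39 SAPDependencyAgent               41 SoftwareHostReferenceAgent
44 DNSServiceAgent                  46 LDAPServiceAgent
48 DNSDependencyAgent               51 NFSDependencyAgent
53 HostDependencyAgent              55 ConnectionDependencyAgent
58 GenericAppAgent                  60 AppDefinitionAgent
62 AppDescriptorAgent               65 ObjectDisplayNameAgent
67 DiscoveryLogCleanupAgent         69 DominoConnectionAgent
72 DominoClusterDomainAgent         74 PortableAgentWrapper
76 l2.L2Agent                       79 l2.CDPAgent
81 CompositeCreationAgent           83 MQServerAgent
86 CitrixAgent                      88 ExchangeDependencyAgent
90 VCSDependencyAgent
""".split()

PERCENTS = [int(p) for p in STEPS[0::2]]  # every other token is a percentage
AGENTS = STEPS[1::2]                      # the tokens in between are agent names


def displayed_percent(completed, total):
    """Assumed formula: equally weighted steps, result rounded down."""
    return 100 * completed // total


def matching_totals(percents, max_total=200):
    """Total step counts for which the assumed formula reproduces the whole list."""
    return [
        total
        for total in range(len(percents), max_total + 1)
        if all(displayed_percent(n, total) == p
               for n, p in enumerate(percents, start=1))
    ]


def step_in_progress(shown):
    """First listed step whose percentage is above `shown`, i.e. not yet completed."""
    for percent, agent in zip(PERCENTS, AGENTS):
        if percent > shown:
            return agent
    return None  # past the last step in the list


if __name__ == "__main__":
    print(matching_totals(PERCENTS))
    print(step_in_progress(75))
```

Under those assumptions the only total that fits is 43, and the step running at 75% is l2.L2Agent. If the assumptions hold, that would leave four steps after the last one listed here, but that is an inference from the arithmetic and not something I have verified.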

What happens after step 90%?

I don't know. I'll try to find out and let you know!
If anyone does know, please drop me a line.

-- 2011 Update

In TADDM 7.2.x, the log file dist/log/TADDM.log details each step, so if your discovery is stuck on step x, just check the last line of the log and you'll see what class the Topology Builder is working on.
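
If you don't have tail handy, a few lines of Python will fetch that last line. This is only a sketch: I'm assuming the log lives under $COLLATION_HOME like the configuration file above does, and I make no assumptions about the format of the log lines. The function name is my own.

```python
import os
from pathlib import Path


def last_log_line(path):
    """Return the last non-empty line of a text file, or None if there is none."""
    last = None
    # Reads the whole file line by line: simple, but slow on a very large log.
    with open(path, "r", errors="replace") as log:
        for line in log:
            if line.strip():
                last = line.rstrip("\r\n")
    return last


if __name__ == "__main__":
    # Assumption: the log path is relative to $COLLATION_HOME.
    home = os.environ.get("COLLATION_HOME", ".")
    log_path = Path(home) / "dist" / "log" / "TADDM.log"
    if log_path.exists():
        print(last_log_line(log_path))
    else:
        print(f"no log found at {log_path}")
```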

-- Robert
