Thursday, July 31, 2008

TADDM logs and logging tips

A good TADDM technote has been published. It lists the various logs and their meanings.
The logs are located in $COLLATION_HOME\log. I've bolded the more useful ones.

control.log - contains messages from the start script

cdm.log - web portal logs are located here

discover.log - contains messages from the Discover jini service

discover-admin.log - contains messages from the DiscoverAdmin jini service

error.log - contains serious errors for any of the TADDM/CDT services

events-core.log - contains messages from the events core jini service

l2.log - contains messages for the layer2

local-anchor.<hostname>.log - logs messages from J2EE sensors

login.log - user login audit trail

proxy.log - contains messages from the Proxy jini service

tomcat.log - contains messages from the startup sequence

topology.log - contains messages from the topology jini service

services/ApiServer.log - XML, Java and EJB interfaces to CMDB are processed here

services/ChangeManager.log - ChangeManager works with StateManager to process change events after discovery completes

services/ClientProxy.log - start here for GUI issues. The GUI talks to client proxy exclusively

services/DiscoverManager.log - contains Sensor and Template Matcher messages (see the note on SplitSensorLog further on)

services/DiscoverObserver.log - moves completed work items from DiscoverManager to TopologyManager

services/MonitorStateManager.log - processes discovery and change events

services/ProcessFlowManager.log - event processing engine for Discovery

services/ReportsServer.log - Handles some reports tasks

services/TopologyManager.log - interface between all other components and the datastore.

services/ViewManager.log - ViewManager builds the CI navigation trees

If com.collation.discover.engine.SplitSensorLog is set to TRUE in $COLLATION_HOME/etc/collation.properties, then each sensor gets its own log at log/sensors/<runid>/sensorName-IP.log. This is VERY useful when debugging discovery errors. If the value is set to FALSE, then DiscoverManager.log contains the messages from all sensors running together. I always make sure SplitSensorLog is set to TRUE.

Here are some more log file settings, which are also defined in $COLLATION_HOME/etc/collation.properties:

  • com.collation.log.level - Logging level. Default is INFO. I set it to DEBUG whenever I need to see why a discovery failed. At DEBUG level you'll see the precise taddmtool command which was run and the server's response.
  • com.collation.deploy.dynamic.logging.enabled - If true, you don't have to restart TADDM when you change logging settings. I always make sure this is true. The (slight) performance drop you (might) get due to the constant rechecking is worth it when you want to debug a discovery.
  • com.collation.log.filesize - Controls the maximum size of each log file. Default is 20 Megabytes. When this limit is reached, the file is renamed and a new file is started.
  • com.collation.log.filecount - Maximum number of log files created (older ones are deleted). Default is 3.
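
The settings above can be checked with a few lines of script before starting a debugging session. This is only a minimal sketch in Python: the function names and the sample excerpt are hypothetical, the parser handles plain key=value lines only, and the "recommended" values are simply my own preferences from this post, not IBM defaults.

```python
# Values I like to have in place while debugging a discovery (personal preference).
RECOMMENDED = {
    "com.collation.discover.engine.SplitSensorLog": "true",
    "com.collation.deploy.dynamic.logging.enabled": "true",
    "com.collation.log.level": "DEBUG",
}


def read_properties(text):
    """Parse simple key=value lines; blank lines and # or ! comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def settings_to_change(props):
    """Return {key: (current, recommended)} for every setting that differs."""
    diffs = {}
    for key, wanted in RECOMMENDED.items():
        current = props.get(key)
        if current is None or current.lower() != wanted.lower():
            diffs[key] = (current, wanted)
    return diffs


# Hypothetical excerpt; in practice read the real collation.properties file.
SAMPLE = """\
# logging settings
com.collation.log.level=INFO
com.collation.discover.engine.SplitSensorLog=true
com.collation.log.filecount=3
"""

if __name__ == "__main__":
    for key, (current, wanted) in settings_to_change(read_properties(SAMPLE)).items():
        print(f"{key}: {current!r} -> {wanted!r}")
```

A setting that is missing from the file is reported with a current value of None, so you can tell "not set" apart from "set to something else".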

-- Robert

Tuesday, July 29, 2008

Simpler than I thought

OK, Here's the situation:
I've got an ITM/TBSM environment where the tech guys look at ITM screens and the helpdesk and managers look at TBSM screens.

I've got a custom canvas or two for eye-candy on the TBSM. The problem is that when the helpdesk or managers look at the messages coming over from ITM - they get a heart attack! A simple "no room on disk" turns into: NT_Disk_Full [(Free_Megabytes < 1000000) ON "Primary:AlphaServer:NT" ON D: (Free_Megabytes = 114)]

What I want to do is just change the main text which is sent from ITM to TBSM. Sounds simple?

Now, there are three basic ways in which you can find how to do something in Tivoli:

  1. You can read the basic documentation. However, some things are written in a comprehensive, rather than simple fashion:
    I defy anyone who is not an expert to make use of Chapter 6 of the Admin's guide : Customizing event integration with Tivoli Netcool/Omnibus!
  2. You can go online to a number of forii (plural of forum) and places and ask. However, you probably won't get a cook-book answer - unless you're pointed to somewhere else.
  3. The third option is the how-to guides which are periodically published. The most well known are the Redbooks - but I didn't find what I was looking for there.
    I did find Coding an event mapping file for ITM TEC Event Forwarder - but it's got too much TEC specific hay which hides the needle I'm looking for.
    I also found Enriching IBM Tivoli Monitoring (ITM) Events For IBM Tivoli Business Service Manager (TBSM) in the TBSM Wiki. This is NEARLY what I'm looking for, except that it's got some BSM "hay" and modifies far too many files and databases to be "simple" enough for my needs.

I did all of this, came to the conclusion that it's complicated, and left it alone for a few months. Till last week when I HAD to do it :)

I did, however, get enough out of all sources (which are excellent sources - especially the last one, which talks about adding important information to the situation data en-route to TBSM) to do it myself and discover that the beast wasn't as bad as I had feared. I could even create this simple how-to which I will share with you.

The secret lies in what are called "mapping files". These mysterious configuration files live in the directory C:\IBM\ITM\CMS\TECLIB (replace C:\IBM\ITM with whatever's relevant to you) and define what changes, if any, ITM makes to the various parameters it passes to TEC or Omnibus through the EIF adapter.

Open one up and you'll see a thick wall of XML code.

The good thing is that you can ignore ALL the *.map files which already exist - they're used for integration with ITM5 and DM3.7 (as far as I can tell)

I created a new file called qnt.map 

The first lines I just copied/pasted from an existing map file, and then I changed the <id> so that it differs from the original file.
Then I added my first original material - the situation I'm interested in. The mapAllAttributes attribute tells ITM to send all the attributes, including those I haven't changed.
The information between dollar signs comes from the attributes which are relevant to the situation. Edit your situation and click the "add conditions" button to see which items you can use.

Note the use of $hostname$ for the server which triggered the situation.

<itmEventMapping:agent
    xmlns:itmEventMapping="http://www.ibm.com/tivoli/itm/agentEventMapping"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ibm.com/tivoli/itm/agentEventMapping agentEventMap.xsd">

    <id>99</id>
    <version>6.2.0</version>
    <event_mapping>

        <situation name="NT_Logical_Disk_Space_Critical" mapAllAttributes="Y">
            <class name="NT"/>
            <slot slotName="msg">
                <literalString value="There is only $NT_Logical_Disk.%_Free$ percent / $NT_Logical_Disk.Free_Megabytes$ Megabytes free on server $hostname$ $NT_Logical_Disk.Timestamp.TIMESTAMP$"/>
            </slot>
        </situation>

        <situation name="NT_Services_Automatic_Start" mapAllAttributes="Y">
            <class name="NT"/>
            <slot slotName="msg">
                <literalString value="The Service $NT_Services.Service_Name$ is down on server $hostname$ $NT_Logical_Disk.Timestamp.TIMESTAMP$"/>
            </slot>
        </situation>

    </event_mapping>
</itmEventMapping:agent>
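
Before loading a map file, it can help to confirm that it is well-formed XML and to preview what the msg text might look like. Here is a rough sketch in Python. The helper names (msg_templates, preview) and the sample values are made up for illustration, the XML is a shortened copy of the file above, and the $name$ substitution is only my imitation of what ITM does at runtime, not ITM's own logic.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copy of the mapping file, used here as test input.
MAP_XML = """<itmEventMapping:agent
    xmlns:itmEventMapping="http://www.ibm.com/tivoli/itm/agentEventMapping"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ibm.com/tivoli/itm/agentEventMapping agentEventMap.xsd">
    <id>99</id>
    <version>6.2.0</version>
    <event_mapping>
        <situation name="NT_Logical_Disk_Space_Critical" mapAllAttributes="Y">
            <class name="NT"/>
            <slot slotName="msg">
                <literalString value="There is only $NT_Logical_Disk.%_Free$ percent / $NT_Logical_Disk.Free_Megabytes$ Megabytes free on server $hostname$"/>
            </slot>
        </situation>
    </event_mapping>
</itmEventMapping:agent>"""


def msg_templates(xml_text):
    """Return {situation name: msg template}; raises ParseError on malformed XML."""
    root = ET.fromstring(xml_text)
    templates = {}
    for situation in root.iter("situation"):
        for slot in situation.iter("slot"):
            if slot.get("slotName") == "msg":
                literal = slot.find("literalString")
                if literal is not None:
                    templates[situation.get("name")] = literal.get("value")
    return templates


def preview(template, values):
    """Replace each $name$ with a sample value; unknown names are left as they are."""
    return re.sub(
        r"\$([^$]+)\$",
        lambda m: str(values.get(m.group(1), m.group(0))),
        template,
    )


if __name__ == "__main__":
    sample = {
        "NT_Logical_Disk.%_Free": 2,
        "NT_Logical_Disk.Free_Megabytes": 114,
        "hostname": "AlphaServer",
    }
    for name, template in msg_templates(MAP_XML).items():
        print(name, "->", preview(template, sample))
```

A typo such as a missing closing tag shows up as a ParseError here, which is a lot friendlier than discovering it after ITM has silently ignored the file.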

After you have your file, run the command tacmd refreshTECinfo -t all so that ITM will load the new mapping file.

THAT'S IT!

Triggering those situations will lead to Omnibus messages which look like this:

[Screenshot: Normal Event Text]

instead of the cryptic formula which we had before.

Just goes to show - sometimes things are simpler than I think!

-- Robert

Sunday, July 20, 2008

A Man on the Moon

I'll write a longer post tomorrow or the next day, but I've got to put something up now. 39 years ago, at 02:56 GMT on the 21st of July, 1969, a man, Neil Armstrong, stepped onto the Moon.

"That's one small step for [a] man, one giant leap for mankind"

If thinking about that doesn't make you pause, then think about it again.

Read more about the mission here on Kennedy Space Center's site and Wikipedia.

During the last few days there have been some amazing results from some newer missions:
The EPOXI mission has video showing the Moon orbiting the Earth. I know that we all know this logically, but here you can actually SEE the moon moving!


For more details, see the Bad Astronomy Blog - which is actually very good.

 

Another space probe, called SELENE, has taken pictures of the moon which show debris from a later moon landing, Apollo 15.

Last link for now - VERY high resolution images of Tycho, one of the largest craters on the Moon, taken by SELENE.

 

Think about it again when you look up at the moon. 39 years ago there was someone there!

-- Robert

Friday, July 18, 2008

TBSM tips

Due to a few bureaucratic problems, I found myself installing TADDM and TBSM about 6 times over the last few days!

 

The first good news is that everything works.
The second good news is that I've got a list of problems and solutions which I can share.

 

Most of these have solutions in the documentation, but it's not always clear.
All my installations were on Windows 2003, but I assume the problems are pretty platform agnostic.

 

  1. The Omnibus service stops after a while / Error 500 in the TBSM default screen.
    For some reason, on two of the servers, the Omnibus service kept dying. This is apparently a known bug in the version of Omnibus which is shipped with TBSM 4.1.1. You can solve this problem by upgrading the Omnibus to a more current version.
    SOLUTION : Add the parameter -regexplib TRE to the startup configuration of the Omnibus server (and upgrade when you get the chance)

  2. XMLToolkit service does not start. The Tivoli Discovery Library Reader service is the component of TBSM which loads DLA files or TADDM data into TBSM. This way you don't have to build the Lines Of Business yourself; rather, you can load them from somewhere where they have already been defined.
    The service wouldn't start and kept giving the message "The Tivoli BSM CCMDB Discovery Library Reader service started on Local Computer and then stopped".
    SOLUTION : Copying the file msvcr71.dll from %SystemRoot%\System32 to the toolkit's \bin directory solved the problem.


  3. TBSM starts with incorrect/blank main display screen. The other symptom is seeing the following message in the browser's java console:
    Sorry, there has been a problem responding to your request. Database may be unavailable. Please report this to the server administrator: Error rendering Velocity template: layouts/ngf-default.vm: Unable to find resource 'layouts/ngf-default.vm' not found
    at javax.xml.parsers.DocumentBuilderFactory.newInstance(Unknown Source)
    Now, if you're just reading this line, then you might figure out the solution for yourself. However, the problem is that the line is buried between many others.
    SOLUTION : Copy %NCHOME%\guifoundation\webapps\desktop\WEB-INF\system-templates\vm\layouts\html\ngf-default.vm to the directory %NCHOME%\guifoundation\webapps\desktop\WEB-INF\system-templates\vm\layouts


  4. The biggest problem I had was quite bizarre: The Service Component Repository was not initialized properly.
    The SCR is the repository where all the various objects (Servers, Databases, Services, etc) are kept. One of the automatic post-installation steps is the running of a script called tbsmdefaultimport. You should then see the following tab in the Service Administration screen:
[Screenshot: Good_SCR]
Unfortunately, what I saw was this:
[Screenshot: Bad_SCR]

The garbled entries make it impossible to import data into TBSM, because the tree does not contain the proper entries.

SOLUTION: My solution was uninstalling/reinstalling. On the THIRD attempt, the TBSM installed correctly. Since then, Doug McClure has given me the following undocumented procedure to reset the SCR without needing to reinstall:

UNIX :
run "setdbschema -U <dbuserid> -f a". This will drop and recreate all of the tables associated with the SCR. After you have done this, you should invalidate the "Imported Business Service" and the "Component Registry". That should delete everything.

Windows:
run (using psql) the following files (found in .../XMLtoolkit/sql):
scc_staging_schema_deletes.sql
scc_schema_deletes.sql
scc_schema_setup.sql
scc_staging_schema_index_setup.sql
scc_staging_schema_setup.sql
Then you should invalidate the "Imported Business Service" and the "Component Registry"
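
If you'd rather not type the five psql invocations by hand, the Windows steps can be sketched as a small script that just prints the commands in the order listed above. This is an untested illustration in Python: the function name, user, database name and directory are placeholders I've invented, and the only psql options used are the standard -U, -d and -f. Review the output before running anything, because the first scripts drop the SCR tables.

```python
# SQL files in the order given above.
SQL_FILES = [
    "scc_staging_schema_deletes.sql",
    "scc_schema_deletes.sql",
    "scc_schema_setup.sql",
    "scc_staging_schema_index_setup.sql",
    "scc_staging_schema_setup.sql",
]


def psql_commands(sql_dir, db_user, db_name):
    """Build one psql command (as an argument list) per SQL file, in order."""
    return [
        ["psql", "-U", db_user, "-d", db_name, "-f", f"{sql_dir}/{name}"]
        for name in SQL_FILES
    ]


if __name__ == "__main__":
    # Placeholder values: substitute your own toolkit path, user and database.
    for cmd in psql_commands("XMLtoolkit/sql", "dbuserid", "scr_database"):
        print(" ".join(cmd))
```

The script only prints; nothing is executed, so it is safe to run while you work out the right user and database name for your installation.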

I hope I've saved some of you some time...

-- Robert