Friday, November 16, 2007

Troubleshooting Agent Communications

The managed home agents communicate with the management console using secure JMX connections. Once started the agent will connect with the console, perform some registration tasks, and appear automatically in the management dashboard. This article provides inside to the communication process and steps to troubleshoot the communications.

Agent Communication
The management console is configured with a JMX port used to establish communication between the management agents (both the managed home agent and the embedded agent contained within the server products). This port is specified during the installation wizard and defaults to 14501.

When the agent is installed a configuration file containing the name of the management console and JMX port to use is configured. This is in the file install_location/config/agent.properties.

management.server.name=myserver.mydomain.com
management.server.port=14501

The name of the server may either be a short machine name, a fully qualified domain name, or an IP address depending on how the Java process running the management console was able to resolve the host name.

The agent will use this information to attempt to connect to the management console. If unsuccessful, for example if the console isn't running, the agent will continually re-attempt the connection. This activity is recorded in the agent's log files. Look in the log file e1agent_0.log (the zero will always be the most current log) located within the install_location/logs directory of the managed home. In the log file you will see something along the following, in this case the management server is on denlcmwn5.mlab.jdedwards.com and the JMX port is 18501:

Nov 7, 2007 10:35:37 AM com.jdedwards.mgmt.agent.E1Agent$ManagementServerDaemonThread run
FINER: Attempting to connect to service:jmx:jmxmp://denlcmwn5.mlab.jdedwards.com:18501

If the connection could not be established a corresponding error will appear shortly in the log file.

Tip #1
On the machine the agent was installed attempt to ping the management console using the server name configured in the agent.properties file. If the ping is not successful you may either change the agent.properties file (for example to not use a fully qualified name) or modify host files (e.g. /etc/hosts on unix) as necessary. Either way restart the agent after making any changes.

Tip #2
If the ping was successful you can attempt to telnet to the management console using the same port. For example 'telnet denlcmwn5.mlab.jdedwards.com 18501'. If the connection is successful you know the agent is able to establish a connection to the console. If this fails there may be firewall or other networking issues preventing the connection that need to be resolved. Note: On a successful connection you may ignore any content displayed in the connection; it will not be human readable.

Once that connection has been made the management console will assign a TCP/IP port that the agent should use to listen for incoming connections. The agent will pass in it's machine name and install location, and the console will provide the next unused TCP/IP port starting at the 'Management Agent Starting Port' which was also configured during the installation wizard and defaults to 14502.

In the managed home agents log you will see the port that was assigned, in this case it was 18607:

Nov 7, 2007 10:35:42 AM com.jdedwards.mgmt.agent.Server startListener
INFO: Starting the management agent listener on port '18607'.
Nov 7, 2007 10:35:42 AM com.jdedwards.mgmt.agent.Server startListener
FINE: Attempting to start the local management agent listener on port 18607
Nov 7, 2007 10:35:42 AM com.jdedwards.mgmt.agent.Server startListener
FINE: Succesfully started the management agent listener on port 18607

If the operation was successful, as shown above, you may continue to to the next step. If there are errors indicating the listener could not be started you should make sure no other program is using that same port (and if they are you may change the 'Management Agent Starting Port' to something else in the management console (Select the 'Management Agents' link in the Quick Links. Do not change the 'Management Server JMX Port' setting.

If everything has been successful so far we will now focus our attention on the logs for the management console. You may view these logs using the console application itself. Navigate to the managed home for the management console (the managed home that contains the 'home' instance). On the bottom of that page select the log file home_0.log. The log should contain an entry indicating the initial connection (thus a port of -1) from the managed agent:

FINER: Received heartbeat from the remote management agent on denlcmlx2 listening on port -1 of type 2 in managed home /home/oracleas/oasagent

Next you will see an entry about the calculated port discussed above.

FINER: Determining the port the remote agent 'denlcmlx2' should start listening on.
FINER: Assigning the port 18607 to the remote agent 'denlcmlx2'.

Followed by a "heartbeat" request from that agent:

FINER: Received heartbeat from the remote management agent on denlcmlx2 listening on port 18,607 of type 2 in managed home /home/oracleas/oasagent

Finally the console will attempt to connect to the remote agent on the port assigned. If successful you will see something like:

FINE: Attemping to establish a connection from the management console to the remote agent 'denlcmlx2' on port 18607.
FINE: Successfully established a connection from the management console to the remote agent 'denlcmlx2' on port 18607 with connection id 'jmxmp://10.139.163.53:3213 32330841.

This completes the communication negotiation process and the managed home will soon appear in the dashboard.

Tip #3
If there are errors indicating the connection was not successful follow similar steps as above to troubleshoot the issue:
  1. On the management console machine ping the server using the name reported in the log files (in this case denlcmlx2). If not successful ensure the network/dns configuration is correct. Add an entry to the hosts file (\windows\system32\drivers\etc\hosts) for the machine name if necessary.
  2. If the ping was successful telnet from the management console to the specified name and port, for example 'telnet denlcmlx2 18607'. If that connection was not successful ensure that there are no firewall or other networking issues preventing the connection.
Nearly all the problems related to the managed agents not appearing in the management dashboard are related to networking and host name resolution issues or firewalls that are in place between the server running the management console and the remote machine.

Tip #4
On the iSeries platforms if there are errors in the logs indicating crypto or encryption related problems (because the connection between agents is fully encrypted) this usually indicates the required JDK (1.5) is not present.