Friday, October 9, 2009

ActiveMQ: Ghost consumers just in time for Halloween

I have been noticing some issues with the communication between ActiveMQ and ColdFusion as our load increases. We've got a large process that can cause a Java heap space error in the environment. If that happens while a ColdFusion gateway is consuming messages, the current consumer can go into ghost mode and a new consumer gets created. If this is allowed to persist, communication between ActiveMQ and ColdFusion sometimes comes to a screeching halt. The only remedy we have found for this battle of digital wills is to restart the ColdFusion service, which seems to reset the connections properly so that things return to normal.

So, what is causing this behavior? It's a question I wrestled with for a few weeks, and I could not replicate it in my local environment at all. It helped to get jconsole fired up, because it showed me more of what was going on behind the scenes than the ActiveMQ web console does. One thing I noticed while the queue was filling is that if the MemoryLimit is not set high enough, MemoryPercentUsage maxes out at 100% and you can only put so many messages in the queue. After that, a new message can only be added once a message already in the queue has been consumed. That makes filling the queue a really long ColdFusion task if you are maxed at 4,000 messages and have to load 500,000. jconsole also lets you edit some of the queue's limits on the fly.
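The "one in, one out" behavior described above is the classic bounded-buffer pattern. Here is a small analogy using the JDK's ArrayBlockingQueue. This is not ActiveMQ code. The fixed capacity of 4,000 stands in for a queue whose MemoryLimit is used up, and the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Analogy only, NOT ActiveMQ code: a fixed-capacity in-memory queue stands in
// for a destination whose MemoryLimit is used up (4,000 messages, as in the post).
public class BoundedQueueDemo {

    /** Offers up to `count` messages without blocking and returns how many fit. */
    static int fillWithoutConsumer(BlockingQueue<Integer> queue, int count) {
        int accepted = 0;
        for (int i = 0; i < count; i++) {
            // offer() returns false instead of waiting when the queue is full
            if (!queue.offer(i)) {
                break;
            }
            accepted++;
        }
        return accepted;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4_000);
        int accepted = fillWithoutConsumer(queue, 500_000);
        System.out.println("Accepted before the queue filled: " + accepted);

        // From here on, a blocking put() would wait until a consumer take()s a message,
        // so the producer can only go as fast as the consumer.
        queue.take();
        System.out.println("Room after one consume: " + queue.offer(-1));
    }
}
```

Once the buffer is full, the producer's throughput is capped by the consumer's. That is why filling 500,000 messages through a 4,000-message window takes so long.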



The queue's MemoryLimit at the time was 5 MB, so I started bumping it up to see what effect it would have. At 10 MB the queue held twice as many messages, but it still wasn't where I wanted it. At 20 MB, wooosh, the queue filled with all 500,000 messages. There seems to be some threshold that needs to be hit before the queue will hold that many messages: with 500,000 messages queued, MemoryPercentUsage sits at only about 43%. I'm sure an ActiveMQ guru could explain what is going on there; if you know, please leave a comment. Now, rather than taking 12 hours, the ColdFusion process fills the queue in 10 minutes.
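To see how odd that threshold is, here is some back-of-the-envelope arithmetic using only the numbers in this post. It covers 5 MB full at about 4,000 messages, 20 MB at 43% with 500,000 messages, and the 48 KB / 42-message local test described later. It assumes "5m", "20m" and "48k" are 1024-based sizes. The class and method names are hypothetical.

```java
// Back-of-the-envelope arithmetic with the numbers reported in this post.
// Assumption: "5m", "20m" and "48k" mean 1024-based megabytes/kilobytes.
public class MemoryMath {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    /** Average bytes of the memory limit used per queued message. */
    static double bytesPerMessage(long memoryLimitBytes, double percentUsed, long messages) {
        return memoryLimitBytes * (percentUsed / 100.0) / messages;
    }

    /** How many messages a limit would hold if usage grew linearly per message. */
    static long linearCapacity(long memoryLimitBytes, double bytesPerMessage) {
        return Math.round(memoryLimitBytes / bytesPerMessage);
    }

    public static void main(String[] args) {
        double at5mb = bytesPerMessage(5 * MB, 100.0, 4_000);     // full at about 4,000 messages
        double at48kb = bytesPerMessage(48 * KB, 100.0, 42);      // full at 42 messages
        double at20mb = bytesPerMessage(20 * MB, 43.0, 500_000);  // ~43% with 500,000 messages

        System.out.printf("5 MB limit:  %.0f bytes/message%n", at5mb);
        System.out.printf("48 KB limit: %.0f bytes/message%n", at48kb);
        System.out.printf("20 MB limit: %.1f bytes/message%n", at20mb);
        System.out.println("Linear projection for a 20 MB limit: "
                + linearCapacity(20 * MB, at5mb) + " messages");
    }
}
```

The small limits work out to roughly 1,300 and 1,170 bytes per message. A linear projection from the 5 MB case predicts only about 16,000 messages for 20 MB, but 43% at 500,000 messages implies roughly 18 bytes per message. Usage clearly doesn't scale linearly, which fits the threshold described above. The arithmetic doesn't explain why.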

So, let's make this MemoryLimit change permanent. Open activemq.xml in your conf directory and adjust as needed:

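<!-- goes inside the <broker> element; ">" is a wildcard matching every queue -->
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <policyEntry queue=">" memoryLimit="20mb"/>
    </policyEntries>
  </policyMap>
</destinationPolicy>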

I believe the default memory limit for ActiveMQ off the shelf is 5 MB. If you are queuing more than about 5,000 messages, it seems you will have to raise it.

The cool thing is that our hanging consumer issue has disappeared, which was an unexpected benefit. To test this in my local environment, I dropped the MemoryLimit down to 48 KB, which allowed a maximum of 42 messages in the queue. I then wrote the following code to trigger a Java heap space error:

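<cffunction name="javaheapspace" output="false" access="public" returntype="void">
<!--- Deliberately exhausts the heap: build a huge XML string, then parse it --->
<cfset var myxml = ''/>
<cfset var parsedxml = ''/>
<cfset var i = 0 />

<!--- cfoutput is needed so that #i# is evaluated inside cfsavecontent --->
<cfsavecontent variable="myxml">
<cfoutput>
<rootnode>
<cfloop from="1" to="299999" index="i">
<subnode>
<textinfo>with text in here</textinfo>
<mynumber>#i#</mynumber>
</subnode>
</cfloop>
</rootnode>
</cfoutput>
</cfsavecontent>
<!--- The string should finish building; the parse is where it should run out of heap --->
<cfset parsedxml = xmlparse(trim(myxml)) />
</cffunction>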
You may have to move the loop count up or down depending on how much memory you have allocated to your JVM; I think mine is around 512 MB. I wanted it small enough that the myxml variable could finish building, but big enough that the parse would die.
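Rather than guessing the heap size, you can ask the JVM. Here is a minimal sketch using only java.lang.Runtime; the class name is hypothetical. Run standalone, it reports the standalone JVM's heap, not ColdFusion's. Inside ColdFusion, the same Runtime methods can be reached with createObject("java", "java.lang.Runtime"). maxMemory() roughly tracks the -Xmx setting.

```java
// Reports the heap ceiling of the JVM this runs in (roughly the -Xmx setting),
// so the loop count in javaheapspace() can be tuned against a real number.
public class HeapInfo {
    static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MB. */
    static long maxHeapMb() {
        // maxMemory() returns Long.MAX_VALUE when there is no inherent limit
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently in use, in MB. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMb() + " MB");
        System.out.println("Heap used: " + usedHeapMb() + " MB");
    }
}
```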

To replicate the issue, I started the process that fills the queue and made sure consumption had begun. With jconsole open, I watched MemoryPercentUsage and waited until it hit 100%. Then I fired off the javaheapspace() call in another script. Once it bombed, I refreshed my jconsole view and, sure enough, the consumer count had gone from 1 to 2.



It doesn't work every time, so you may have to try a few times; the longer the queue sits at 100% usage, the better your chances. When I went into the ColdFusion Administrator and shut off the gateway that was my consumer, the count went down to 1 rather than 0. So the crash left behind a connection that cannot be managed. In fact, if you try to kill that connection through jconsole, another one is created in its place. The only way I have found to kill it is to restart the CF instance.

So the moral of the story: keep your queue's MemoryLimit high enough that MemoryPercentUsage never reaches 100%. Otherwise you may be haunted by the "ghost of consumer past."

Blessings,
Terry

Monday, October 5, 2009

JConsole has consoled me

A long, hard battle was fought tonight, and finally victory is mine! I set out to get a little further down the road in my understanding of ActiveMQ. I had read all over the place that you can monitor AMQ with jconsole. It should be easy, right?
Type jconsole at the command prompt, slap service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi in the Remote Process box, and whamo.

Well, "no soup for me." In the words of the Pirates Who Don't Do Anything: "nothing, zilch, nada." At first I was getting a connection timeout and getting absolutely nowhere. After a flurry of changes, it would actually connect, but I didn't see org.apache.activemq in the list on the left.

I was looking through my activemq.xml in my conf directory and noticed this:
<managementContext>
<managementContext createConnector="false"/>
</managementContext>
OK... let's turn that puppy on by setting createConnector to true. Now I start seeing this in the ActiveMQ startup output:
WARNING: Failed to start jmx connector: Cannot bind to URL [rmi://localhost:1099/jmxrmi]:
javax.naming.NameAlreadyBoundException: jmxrmi [Root exception is java.rmi.AlreadyBoundException: jmxrmi]
So... at least I saw something remotely familiar: the URL that is supposed to go in the Remote Process box. But it seemed that something else was already on that port. I started thinking about what else on my system might be using JMX. I am running Apache Solr, so I shut that down to see if ActiveMQ would start up cleanly. Nope. Nada. Oh wait, how about JRun? There had been ColdFusion and JRun entries in the left pane of jconsole. After I shut down JRun and restarted ActiveMQ, it finally appeared, as if by magic. A thing of beauty.
INFO  ManagementContext - JMX consoles can connect to service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi
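The NameAlreadyBoundException above boils down to "something else is already listening on 1099." Here is a quick way to check a port before starting ActiveMQ, as a minimal sketch using plain java.net sockets. The class name and the one-second timeout are arbitrary choices. It only tells you that the port is taken, not by whom; OS tools such as netstat can show the owning process.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Checks whether something is already listening on a local port, such as the
// RMI registry that blocked ActiveMQ's JMX connector on 1099.
public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;   // someone accepted the connection, so the port is taken
        } catch (IOException e) {
            return false;  // refused or timed out: nothing listening (or a firewall)
        }
    }

    public static void main(String[] args) {
        // 1099 is the default JMX registry port; 1199 is the alternative used in this post.
        for (int port : new int[] {1099, 1199}) {
            boolean taken = isListening("localhost", port, 1_000);
            System.out.println("Port " + port + (taken ? " is in use" : " looks free"));
        }
    }
}
```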
Now we still have an issue. Obviously, if I'm going to use ColdFusion and ActiveMQ together, they need to learn to get along. All we have to do is turn the connector on and move it to a port ColdFusion isn't using, by adding a connectorPort attribute to the inner managementContext node in conf/activemq.xml:
<managementContext>
<managementContext createConnector="true" connectorPort="1199"/>
</managementContext>

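Now restart ActiveMQ, start ColdFusion back up, and you should be able to view ActiveMQ behind the scenes through jconsole. Just remember to use the new port in the Remote Process box: service:jmx:rmi:///jndi/rmi://localhost:1199/jmxrmi. Now we're cookin' with gas.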
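To confirm from code that the connector on the new port is up, and that the org.apache.activemq MBeans now show up, here is a minimal sketch using only the JDK's javax.management API. It assumes a broker on localhost with connectorPort 1199 as configured above and ActiveMQ's default JMX domain. The class name is illustrative.

```java
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListActiveMqBeans {

    /** Builds the RMI connector URL that jconsole's Remote Process box expects. */
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        // 1199 matches the connectorPort set in activemq.xml above.
        JMXServiceURL url = new JMXServiceURL(jmxUrl("localhost", 1199));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            // By default, ActiveMQ registers its MBeans in the org.apache.activemq domain.
            Set<ObjectName> names = mbeans.queryNames(new ObjectName("org.apache.activemq:*"), null);
            if (names.isEmpty()) {
                System.out.println("Connected, but no org.apache.activemq MBeans found");
            }
            for (ObjectName name : names) {
                System.out.println(name);
            }
        }
    }
}
```

If the connection is refused, the connector isn't listening on that port. If it connects but lists nothing, you are probably attached to some other JVM's registry, as happened with JRun on 1099.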

Ultimately, I'd like to be able to connect to JMX via Java from CF to get things like queue counts and connection status. We've been seeing some strangeness when CF dies from Java heap space errors: after the instance comes back up, CF and AMQ just stop talking to each other. If I cycle the CF instance, the consumer counts go back to normal and they start talking again. Communication is key to any good relationship, especially when clients are riding on it.
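That wish list can be sketched with the standard javax.management API. read() takes any MBeanServerConnection, so it could be pointed at the remote JMX connection used for jconsole. To stay self-contained, this demo uses a stand-in MBean registered in the local JVM.

A few parts of the sketch are hypothetical:
- the stand-in's values
- its example:type=FakeQueue name
- the queue name myQueue
- the "more consumers than gateways means a ghost" rule

QueueSize, ConsumerCount and MemoryPercentUsage are queue attributes shown in jconsole. A queue's exact ObjectName differs between ActiveMQ versions, so copy it from jconsole. Since ColdFusion runs on the JVM, a compiled class like this could be called from CF through createObject("java", ...) once it is on CF's classpath.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class QueueHealth {

    /** The queue attributes watched in jconsole during the ghost-consumer test. */
    public static final class Snapshot {
        public final long queueSize;
        public final long consumerCount;
        public final int memoryPercentUsage;

        Snapshot(long queueSize, long consumerCount, int memoryPercentUsage) {
            this.queueSize = queueSize;
            this.consumerCount = consumerCount;
            this.memoryPercentUsage = memoryPercentUsage;
        }
    }

    /** Reads queue attributes from any MBean connection (remote JMX or in-process). */
    static Snapshot read(MBeanServerConnection conn, ObjectName queue) throws Exception {
        long size = ((Number) conn.getAttribute(queue, "QueueSize")).longValue();
        long consumers = ((Number) conn.getAttribute(queue, "ConsumerCount")).longValue();
        int memory = ((Number) conn.getAttribute(queue, "MemoryPercentUsage")).intValue();
        return new Snapshot(size, consumers, memory);
    }

    /** Hypothetical rule: more consumers than gateways we actually run suggests a ghost. */
    static boolean ghostSuspected(Snapshot s, int expectedConsumers) {
        return s.consumerCount > expectedConsumers;
    }

    /** A full memory limit was the condition under which ghosts appeared in the post. */
    static boolean memoryFull(Snapshot s) {
        return s.memoryPercentUsage >= 100;
    }

    // ---- Local stand-in so the sketch runs without a broker (made-up values) ----
    public interface FakeQueueMBean {
        long getQueueSize();
        long getConsumerCount();
        int getMemoryPercentUsage();
    }

    public static class FakeQueue implements FakeQueueMBean {
        public long getQueueSize() { return 4000; }
        public long getConsumerCount() { return 2; }
        public int getMemoryPercentUsage() { return 100; }
    }

    static Snapshot readFromFake() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example:type=FakeQueue");
        if (!server.isRegistered(name)) {
            server.registerMBean(new FakeQueue(), name);
        }
        return read(server, name);
    }

    public static void main(String[] args) throws Exception {
        // Against a real broker, open a JMXConnector as jconsole does and pass its
        // MBeanServerConnection plus the queue's ObjectName, for example (varies by
        // version): org.apache.activemq:BrokerName=localhost,Type=Queue,Destination=myQueue
        Snapshot s = readFromFake();
        System.out.println("QueueSize=" + s.queueSize
                + " ConsumerCount=" + s.consumerCount
                + " MemoryPercentUsage=" + s.memoryPercentUsage);
        if (memoryFull(s) && ghostSuspected(s, 1)) {
            System.out.println("Memory limit is full and there is an extra consumer: possible ghost");
        }
    }
}
```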

Speaking of communication, how has yours been with Jesus? Your eternity is riding on it.

Blessings,
Terry