Friday, October 9, 2009

ActiveMQ: Ghost consumers just in time for Halloween

I have been noticing some issues with the communication between ActiveMQ and ColdFusion as our load gets higher. We've got a large process that can cause a Java heap space error in the environment. If that happens while a ColdFusion gateway is consuming messages, the current consumer can go into ghost mode and a new consumer is created. If this is allowed to persist, the communication between ActiveMQ and ColdFusion sometimes comes to a screeching halt. The only remedy we have found for this battle of digital wills is to restart the ColdFusion service. This seems to properly reset the connections, and things return to normal.

So, what is causing this behavior? It's a question I wrestled with for a few weeks. I could not replicate this in my local environment at all. It helped to get jconsole fired up so that I could see a little of what was going on behind the scenes, beyond what the ActiveMQ web console shows. One thing I noticed while filling the queue is that if the MemoryLimit is not set high enough, the MemoryPercentUsage maxes out at 100% and you can only put so many messages in the queue. A new message can only be added once one in the queue has been consumed. This can make for a really long ColdFusion task if you are maxed at 4000 messages and you have to fill the queue with 500,000. With jconsole you can edit some of the queue's limits on the fly.

The queue's MemoryLimit at the time was 5m. I started bumping it up to see what effect it would have. I went to 10m and the queue doubled in size, but still wasn't where I wanted it. I bumped it to 20m and wooosh, the queue filled with all 500,000 messages. There seems to be some threshold that needs to be hit in order for it to hold that many messages. With 500,000 messages in the queue, the MemoryPercentUsage is only at 43% or so. I'm sure an ActiveMQ guru could explain what is going on there. If you know, please leave a comment. Now, rather than the ColdFusion process taking 12 hrs to fill the queue, it fills in 10 minutes.

So, let's make this MemoryLimit change permanent. Open up your activemq.xml in your /conf directory and adjust as needed:

<policyEntry queue=">" memoryLimit="20mb"/>

I think the default memory limit for ActiveMQ off the shelf is 5m. If you are queuing more than about 5,000 messages, it seems that you will have to bump this number up.
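
For anyone hunting for where that line lives: in the stock config, policyEntry elements sit inside the broker's destinationPolicy block. Here is a minimal sketch of the surrounding structure. The brokerName value is an assumption, and your file will have other elements (transport connectors, persistence, and so on) that are left out here, so treat it as a guide to the nesting rather than a drop-in file.

```xml
<!-- Sketch only: brokerName="localhost" is an assumed value, and unrelated
     broker settings are omitted. -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost">
  <destinationPolicy>
    <policyMap>
      <policyEntries>
        <!-- ">" is the wildcard that matches every queue -->
        <policyEntry queue=">" memoryLimit="20mb"/>
      </policyEntries>
    </policyMap>
  </destinationPolicy>
</broker>
```

The broker reads this file at startup, so a restart is needed for the change to take effect, unlike the on-the-fly edits in jconsole.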

Now the cool thing about this is that our hanging consumer issue has disappeared. This was an unexpected benefit. I decided to test it on my local environment by dropping the MemoryLimit down to 48k. This allowed a max of 42 messages in the queue. I then wrote the following code to create a java heap space error:

<cffunction name="javaheapspace" output="false" access="public" returntype="void">
<cfset var myxml = ''/>
<cfset var parsedxml = ''/>
<cfset var i = 0 />
<cfset var j = 0 />

<!--- Build a very large XML string. The <root> wrapper is an assumption: xmlparse needs a single root element. --->
<cfsavecontent variable="myxml">
<root>
<cfloop from="1" to="299999" index="i">
<textinfo>with text in here</textinfo>
</cfloop>
</root>
</cfsavecontent>

<!--- Parsing the oversized string is what should exhaust the heap --->
<cfset parsedxml = xmlparse(myxml) />
</cffunction>

The loop count may have to be moved up or down depending on the amount of memory you have allocated to your JVM. I think mine is up around 512m. I wanted it just small enough to finish creating the myxml variable and just big enough to die in the parse.

So, in order to replicate this issue, I set in motion the process that would begin filling the queue and made sure consumption had begun. With jconsole open, I watched the MemoryPercentUsage and waited until it hit 100%. Then I fired off the javaheapspace() call in another script. Once it bombed, I refreshed my jconsole view and, sure enough, the consumer count went from 1 to 2.

It doesn't work every time, so you may have to try a few times. The more time it sits at 100% usage, the better your chances. I went into CFAdmin and shut off the gateway that was my consumer, and the count went down to one. So it left a connection that cannot be managed. In fact, if you try to kill that connection through jconsole, it will create another one in its place. The only way I have found to kill it is to restart the CF instance.

So the moral of the story is to keep your queue's MemoryLimit high enough that your MemoryPercentUsage does not reach 100%. Otherwise you may be haunted by the "ghost of consumer past".

