Unfortunately, we continue to have this problem, and the insane thing is that it happens at a very predictable time almost every night, yet I cannot trace the source. The hang lasts approximately 6-8 minutes.
I have turned Max Pooled Statements off by setting it to zero in case that was the problem, but we still encounter the locked threads and, as a result, timed-out queries across the board, even for web sites connected to totally separate datasources running under different user accounts (but all on the same database server). We are on Oracle. Failed Request Tracing shows a couple of odd yet perhaps telling things way down the list of loaded modules and responses:
One of the responses, a "GENERAL_FLUSH_RESPONSE_END" message, reports ErrorCode="An operation was attempted on a nonexistent network connection. (0x800704cd)" or "The specified network name is no longer available."
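For anyone who wants to look the code up, here is a minimal sketch that splits the HRESULT from the trace into its standard fields. The function name and the small lookup table are just for illustration; the two Win32 error names are the ones that match the two message texts above.

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into its failure bit, facility, and code."""
    return {
        "failure": bool((hresult >> 31) & 1),   # top bit set means failure
        "facility": (hresult >> 16) & 0x7FF,    # 11-bit facility field
        "code": hresult & 0xFFFF,               # low 16 bits: the error code
    }

FACILITY_WIN32 = 7  # facility used when a Win32 error is wrapped in an HRESULT

# Illustrative lookup for the two messages quoted in the trace.
WIN32_NAMES = {
    1229: "ERROR_CONNECTION_INVALID",  # nonexistent network connection
    64: "ERROR_NETNAME_DELETED",       # network name no longer available
}

info = decode_hresult(0x800704CD)
if info["facility"] == FACILITY_WIN32:
    print(info["code"], WIN32_NAMES.get(info["code"], "unknown"))
```

So 0x800704cd is a wrapped Win32 error 1229, which is at least something concrete to search on.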
All of these clues are making me think ColdFusion (or the driver) is temporarily losing communication with the database. Why else would ALL sites fail at the same time, and why would my dev and staging servers also be affected? ColdFusion itself still works: any ColdFusion page with no database connection comes right up. Traffic spikes have been ruled out, as I have stress tested the crap out of our site and can never reproduce the hang-up. I also looked everywhere we use CFLOCK, but that did not seem to be an issue. There are NO scheduled tasks running. Internal scans only run during the timeframe in question on Sunday nights, so that does not explain why I see problems almost every night. I also can't explain the nights when there is no problem.
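One way to test the "losing communication" theory independently of ColdFusion would be a small probe that tries a plain TCP connect to the database listener every few seconds and logs the result with a timestamp. The sketch below is only that: the host name is a placeholder, 1521 is the default Oracle listener port and may not be what is configured here, and a successful TCP connect only shows the network path and listener are reachable, not that Oracle is answering queries.

```python
import socket
import time
from datetime import datetime

def probe(host: str, port: int, timeout: float = 3.0):
    """Attempt one TCP connect; return (succeeded, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # refused, timed out, unreachable, DNS failure...
        return False, time.monotonic() - start, str(exc)

def watch(host: str, port: int, interval: float = 10.0, rounds=None, log=print):
    """Probe repeatedly and log one timestamped line per attempt.

    rounds=None keeps going until interrupted.
    """
    done = 0
    while rounds is None or done < rounds:
        ok, elapsed, err = probe(host, port)
        stamp = datetime.now().isoformat(timespec="seconds")
        status = "OK" if ok else "FAIL"
        log(f"{stamp} {host}:{port} {status} {elapsed * 1000:.0f}ms {err}")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)

# Hypothetical usage (placeholder host, assumed default listener port):
# watch("db.example.internal", 1521, interval=10.0)
```

Running it from the web server and from one other machine across the nightly window would show whether connects fail or slow down during the 6-8 minutes, and whether it is specific to the web servers.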
I will try enabling database call logging in CFAdmin the next time it happens to see what it shows, but that log becomes HUGE in a very short period of time.
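Since the hang happens at a predictable time, the huge log could be cut down to just that window before reading it. A rough sketch follows, assuming (and this is only an assumption about the log format) that each line carries an HH:MM:SS time somewhere in it; the file name and the window in the usage comment are made up.

```python
import re
from datetime import time as dtime

# Assumption: the first HH:MM:SS found on a line is that line's timestamp.
TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")

def in_window(line: str, start: dtime, end: dtime) -> bool:
    """True if the line's first HH:MM:SS falls inside [start, end]."""
    match = TIME_RE.search(line)
    if not match:
        return False
    hour, minute, second = (int(part) for part in match.groups())
    if hour > 23 or minute > 59 or second > 59:
        return False  # looked like a time but is not a valid one
    stamp = dtime(hour, minute, second)
    if start <= end:
        return start <= stamp <= end
    return stamp >= start or stamp <= end  # window that wraps past midnight

def filter_lines(lines, start: dtime, end: dtime):
    """Yield only the lines inside the window, one at a time."""
    for line in lines:
        if in_window(line, start, end):
            yield line

# Hypothetical usage (file name and window are placeholders):
# with open("dbcalls.log", encoding="utf-8", errors="replace") as handle:
#     for line in filter_lines(handle, dtime(1, 55), dtime(2, 10)):
#         print(line, end="")
```

Because it streams the file line by line, the size of the log should not matter much.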
Any other ideas are welcome. I am trying to eliminate any possibility, no matter how small.