Channel: Adobe Community: Message List - ColdFusion

Re: cfmail glitches in ColdFusion 9 (not sending, but says sent)


Hi all,

My post here is about understanding the cfmail 'vanished email' bug (as described by booster94 and webfoot1 on page one of this thread), and finding one or more workarounds for it.

 

I decided to tackle this problem head on, because this bug has been hurting our business, and our customers' businesses. I have devoted a full work week to it, plus several more hours documenting the findings presented here. Hopefully this will be a WAKE UP CALL to Adobe that your customers consider this issue dead serious. It IS impacting our businesses!!

 

My submission is divided into sections, as follows:

================================================

> Background (describes my environment)

 

> How I arrived at my own conclusions that this problem is for REAL

 

> THE PROBLEM (a detailed review of exactly what the problem is. This includes valid points already posted on this thread that need to be considered in my replication section, as well as my own observations.)

 

> Breadcrumbs (ideas that have been posted on this thread that I rejected as causes and/or solutions)

 

> REPLICATION (a VERY detailed review of my testing procedures, which led to replicating the bug)

 

> SUMMARY of the BUG ( !! Attention Adobe !! )

 

> WORKAROUNDS for dealing with this bug: The GOOD NEWS: yes, you CAN work around this bug and prevent mail loss

 

> My code used in the Replication and Workarounds sections

 

================================================

BACKGROUND:

(describes my environment)

------------------------------------------------

I am cloud based with Amazon. I'm using ColdFusion Enterprise 9.0.1 Cumulative Hotfix 4, W2K3 R2, 64bit, 4 virtual cores running the equivalent horsepower of {13 * 1.2ghz 2007 xeon CPUs}, 36gig RAM. Amazon rates the I/O Performance of this server instance as "High". My CF Java heap size is set to 9 gig. These two Java arguments (relevant to a later discussion in this post) are used: [-XX:MaxPermSize=620m -XX:+UseParallelGC]. {NOTE: don't raise your MaxPermSize arbitrarily! It should be a percentage of the heap size, and you must have enough RAM before you start mucking around with that setting. Also, those numbers are only appropriate for a 64bit install! Also note that UseParallelGC is the default setting on a fresh install (ie: Adobe's suggestion, not mine).} Perfmon and the built-in CF Server Monitor tell me that I never stress my CPU and only use, on average, about 1/2 to 2/3 of the RAM devoted to CF. My mailserver is Surgemail. It is on another server instance in the same 'Amazon region', so my CF server communicates with it using region-specific private IP addresses (so data transfer is a bit faster than across the internet). My Surgemail uses its default setting of "maintain connections for 9 minutes".

 

{In addition, after I was all done testing from the server listed above, I did a final test on a much less robust CF server (8gig RAM with only 2gig devoted to the Java heap, 2 virtual cores, and an Amazon I/O Performance rating of "Medium"; all other specs listed above were the same). The findings in the "SUMMARY of the BUG" section did not change at all when testing on this second server.}

 

{In addition, my final test was to set up a local IIS/SMTP relay server (v.6) on the same server that CF was running on. It uses its default setting of "maintain connection for 10 minutes". The results are covered in the REPLICATION and SUMMARY sections.}

 

 

================================================

How I arrived at my own conclusions that this problem is for REAL:

-------------------------------------------------------

We had been getting sporadic complaints about undelivered mail, but initially I brushed them off as unlikely to be an internal problem: after all, the CF mailsent.log said the mail had gone out, right? Then our own internal accounting department sent me a spreadsheet detailing several "provable" instances of lost email. We had earlier added a cc line to all accounting-related emails sent to clients. This cc "files" the emails so the accounting department can audit them. But occasionally some were missing. Communication with the clients in these cases confirmed that they had not received their copy either. I then spent a laborious amount of time auditing CF and mailserver logs, and came to the conclusion that mail was indeed being "vanished" on the CF side after being logged as successfully delivered in the CF mailsent.log. So I googled, and found this forum thread.

 

 

================================================

THE PROBLEM:

(valid points from this forum thread summarized, and my observations)

----------------------------------------------------------

Email "vanishes" when using cfmail. The CF mailsent.log says the mail was delivered, but occasionally (anywhere from 1% to 50% of the time) the logs on the mailserver show no inbound connection attempt at all. For lack of a better term, the email simply "vanishes without a trace".

  .....

 

There have been at least three bug reports about this issue submitted to Adobe, listed here in order of most votes:

https://bugbase.adobe.com/index.cfm?event=bug&id=3312296 (I am following this one, since it has the most feedback).

https://bugbase.adobe.com/index.cfm?event=bug&id=3313431

https://bugbase.adobe.com/index.cfm?event=bug&id=3376568.

All are still open, and they describe the problem well. Adobe has not responded to any of them.

  .....

 

More errata (my own observations and other relevant observations culled from this thread):

 

> ColdFusion's mailsent.log shows a successful handoff to a mailserver, which could be remote or local. Example:

   "Information","scheduler-3","04/09/13","12:57:13",,"Mail: 'Confirming Your Monthly Payment for Services' From:'admin@mydomain.com' To:'customer@hisdomain.com' was successfully sent using [mailserver-address-shows-here]"

   

> The mailsent.log file can show three different types of threads used to send mail (discussion of this later):

   where X is a number: "Information","scheduler-X", (snip) or "Information","mailworker-X",(snip) or "Information","jrpp-X",(snip)

 

(**** Contrary to speculation on this thread, my testing shows conclusively that mail can "vanish" with any of those three thread types. This is discussed later in this document, along with my theory of why, in production, you may only rarely see vanished email on the mailworker thread type.)

 

> Normally, the mailserver logs then show the connection (with the same time stamp as the CF mailsent.log entry, assuming the clocks are synced properly), and the mailserver accepts and processes the mail.

 

> However, occasionally the mailserver logs show nothing: no connection attempt, no failure of any kind. The mail simply "vanishes".

 

> If the same mail is resent, it may fail again if conditions are right.

 

> A prior post on this thread (Robert_Com99) indicates that for cfloops of 1-19 iterations the scheduler thread is used, and for more than 19 the mailworker thread is used. This is generally true, but with a caveat (discussed later).

 

> A prior post on this thread (Robert_Com99) indicates that if you tell cfmail NOT to send to the spooler (by using spoolenable="no", or by turning the spooler off in the CF GUI and not using the spoolenable attribute at all), then a different thread type (jrpp-X) is logged in the mailsent.log. This is true. (Also worth noting: the spoolenable attribute overrides the CF GUI spool on/off setting.)

 

> Several different mail server packages have been mentioned in this thread. None of them show any log entry when the 'vanished email' bug is encountered, and neither does my mail server.
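For anyone auditing their own logs, here is a minimal sketch that pulls the thread type, thread number, time stamp, subject and recipient out of a mailsent.log line in the format shown above. It is written in Python only for offline analysis of copied log files (the post's own code is CFML). It assumes every 'successfully sent' entry looks like the sample line; the names `parse_mailsent_line`, `parse_mailsent_log` and `SentEntry` are hypothetical, and lines in any other format are skipped.

```python
import csv
import re
from dataclasses import dataclass
from typing import List, Optional

# Thread names as they appear in mailsent.log: scheduler-X, mailworker-X, jrpp-X.
THREAD_RE = re.compile(r"^(scheduler|mailworker|jrpp)-(\d+)$")
# Message text of a "successfully sent" entry (assumed format, taken from the sample line).
SENT_RE = re.compile(r"^Mail: '(.*)' From:'(.*?)' To:'(.*?)' was successfully sent")


@dataclass
class SentEntry:
    thread_type: str   # "scheduler", "mailworker" or "jrpp"
    thread_no: int     # the X in scheduler-X, mailworker-X, jrpp-X
    date: str
    time: str
    subject: str
    to: str


def parse_mailsent_line(line: str) -> Optional[SentEntry]:
    """Parse one mailsent.log line; return None if it is not a 'sent' entry."""
    fields = next(csv.reader([line.strip()]), [])
    if len(fields) < 6:
        return None
    thread = THREAD_RE.match(fields[1])
    sent = SENT_RE.match(fields[5])
    if not (thread and sent):
        return None
    return SentEntry(thread.group(1), int(thread.group(2)),
                     fields[2], fields[3], sent.group(1), sent.group(3))


def parse_mailsent_log(path: str) -> List[SentEntry]:
    """Return all 'sent' entries of a mailsent.log file, in log order."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [e for e in map(parse_mailsent_line, f) if e is not None]
```

Keeping the entries in log order matters: as the Replication section explains, the position of an entry within its spooler batch turns out to be the key clue.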

 

 

================================================

BREADCRUMBS:

(ideas that have been posted on this forum thread that I proved to myself are NOT causes and/or solutions to the bug)

------------------------------------------------

> someone suggested the problem might have to do with attachments. This does not apply to me, yet I get the bug.

 

> someone suggested the problem might have to do with the use of the failto or replyto attributes of the cfmail tag. This does not apply to me, yet I get the bug.

 

> someone suggested the problem might have to do with the use of cfmail's bcc attribute. This does not apply to me, yet I get the bug.

 

> someone brought up that there is a problem running a scheduled program that sends cfmail when that program resides under SSL. There is a blog post about that problem (http://ethermuse.blogspot.com/2011/10/adobe-coldfusion-cf-mailer-dies.html) with a workaround. (The blog describes an "all or nothing" failure of a scheduled process that resides under SSL.) Good info; however, it is NOT the same problem that this thread is about.

 

> someone mentioned that CF 9.01 bug 83980 "might" factor in. I am using CHF4 on CF 9.01, so I already have that bug fix. Rule that one out.

 

> some have mentioned the Java version, wondering if there is a tie-in. My own observation: the problem existed when we were running Java 1.6, and is still a problem after updating to 1.7 (which CHF4 allows).

 

> someone suggested that the problem may have to do with Java garbage collection (GC). However, the symptoms posted about GC do not match the bug that this thread is about. The case mentioned earlier on this thread talks about mail delivery STOPPING ... not "vanishing an occasional email". Based on my experience (see 'optional GC reading' below), I am removing the GC theory as a contender for this particular email 'vanish' bug.

....

(optional GC reading): My own experience with GC is substantial. In the past, when we ran on a 32-bit system, we would occasionally see "out of memory" errors due to GC, which manifested as an ugly CF GC error in the client's web browser. When this happened, CF had crashed and would not recover by itself. We were running CF Standard at the time, so I did not have access to the CF 'Server Monitor' tools (they are CF Enterprise only), and instead I loaded the Java monitoring tools discussed here:

http://docs.oracle.com/javase/6/docs/technotes/guides/management/jconsole.html

which are quite a bit more robust than the CF monitoring tools for watching what's going on under the hood of Java's memory use. I played with many GC settings and other JVM arguments, and eventually mitigated that GC crash to some degree. (But it wasn't until we moved to 64bit and CF Enterprise, so that I could throw more memory at the JVM heap (we went from 1gig to 9gig), that the GC problem finally went away for good.)

 

You must use some form of Java garbage collection. The default used by Adobe is the one I now use (-XX:+UseParallelGC, as shown in the "Background" section above). That default setting is mainstream and robust. Oracle discusses it in detail here:

http://www.oracle.com/in/corporate/events/jvm-tuning-1886470-en-in.pdf

 

However, out of curiosity, I monitored my memory usage as I ran bulk email test sends in batches of 10,000 at a time. As the spooler processed these batches, I would see CF memory usage go up (in the ColdFusion Server Monitor). Then, at the same time, I would manually "Run GC" and watch the memory graph change as GC was processed. At no time did I ever see the spooler processing come to a halt or even falter one bit. I even manually ran GC multiple times while the mail spooler was emptying. I am not denying that GC might manifest itself by "stopping mail delivery" in some CF installations (32-bit and low-memory installs particularly), and I appreciate the time that EnryTheMangia took to share his information :-) , but I could not see any GC issues in my installation, and I DO have the CF mail 'vanish' bug. Thus I removed the GC theory as a contender.

(end optional GC reading)

....

 

================================================

REPLICATION:

(a very detailed review of my testing procedures, which led to replicating the bug)

If you are not interested in my testing methodology, you can skip to the next section, where I sum up the key points.

------------------------------------------------

> See the Background section for architecture info.

> Check to make sure that my test mail server has a reasonable connection timeout value. I used the default setting (9 minutes for Surgemail, 10 minutes for the IIS/SMTP mail server).

> Check to make sure that my test mail server has no tarpitting, spam traps or quantity received limits for the IP address I will be testing from.

> Whitelist the test IP in the virus filter so that it doesn't get in the way.

> Set up a test email account and make sure it has no spam filters or limits that might get in the way, and that its mailbox is plenty large.

> Launch a test instance of my ColdFusion 9.01 Enterprise server and shut down all unnecessary services.

> Write a simple CF program that lets me input the number of emails to send, whether to use the spooler or not, and which mail server to use. The program sends email with a subject line formatted to indicate the parameters used, plus the individual email # (this turned out to be crucial to solving the mystery), in an order that allows me to easily sort the resulting email account inbox by subject.

> Create spreadsheet templates to track the results of the runs. (Initially I tracked the number of emails in the batch, the number of 'vanished' emails in that batch, spool on/off, and the thread type. Later, as patterns emerged, I added spool interval, Mail Delivery Threads, and finally, 'maintain connection' on/off.)

> Initially, use the default CF mail settings ("Maintain connection to mail server": yes, Spool interval: 15 seconds, Mail Delivery Threads: 10, Spool: yes, Spool to: disk).

 

And off to the races! Initially I simply wanted to prove that mail was vanishing: that the CF mailsent.log showed it as sent, and that the mail server logs had NO entry at all. I also wanted to get a feel for just how many emails were being lost.

 

Thinking that the more I sent per iteration, the more likely I would be to lose mail, I started with large batches (several thousand). Then I would check the number of items in the test email account's inbox. If it matched, I would clear the inbox and run another batch, over and over. It is worth noting here that with the spooler interval set to 15 seconds, the spooler was fast enough to empty before I started the next iteration. However, I got pretty fast at each audit (sometimes under 30 seconds), and that actually cost me extra time before I started seeing patterns, because the faster I went, the fewer emails I seemed to lose! Soon I would discover why.

 

When the number of emails sent did not match the number in my inbox, I would record how many were lost, verify that the mailsent.log showed the lost item(s) as sent, and verify that the same email item was not in the mail server logs. To do that verification, I would sort the inbox in subject order and scan through it to find the gap(s) in the numbering sequence, then search the mailsent.log for that number, and then search the mailserver logs for that number as well (Surgemail logs include the subject of the email received). This was made more difficult because the spooler processes the spooled files in FILE NAME numeric order, but those file names seemed to be created randomly, ie: not in the order in which the email was actually sent to the spooler. So, for example, in a batch of emails, the first email sent in the cfloop might have a spooler file name of Mail444444444444444444.cfmail and the 35th one might have a file name of Mail1111111111111111111.cfmail, so the spooler would process the 35th email sent to it before the 1st, and the 35th would be the first to show up in the mailsent.log ... a pita for auditing. However, I stuck with it and ... !BINGO!

 

*** REVELATION #1 ***: if an email(s) vanishes, it is ALWAYS the first one(s) listed in the mailsent.log's records of the spooler batch that it went out in. But, conversely, the first one(s) listed in the mailsent.log's records of a spooler batch don't always vanish.
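The gap-hunting audit described above, and the check behind REVELATION #1, can be sketched in Python for offline use (the post's own test program is CFML). This assumes the test subjects end in "#<n>", which is a hypothetical format; adapt `NUM_RE` to however your own test program numbers its subjects. `find_vanished` and `consistent_with_revelation_1` are illustrative names, not part of any tool.

```python
import re

# Hypothetical subject format: the test subject ends in "#<n>".
# Adapt this pattern to the numbering your own test program uses.
NUM_RE = re.compile(r"#(\d+)\s*$")


def email_number(subject):
    """Return the test email number at the end of a subject, or None."""
    m = NUM_RE.search(subject)
    return int(m.group(1)) if m else None


def find_vanished(logged_subjects, inbox_subjects):
    """Compare mailsent.log against the test inbox for ONE spooler batch.

    logged_subjects must be in mailsent.log order. Returns
    (position_in_log, email_number) for every email that CF logged as
    sent but that never arrived; position 0 is the batch's first entry.
    """
    received = {email_number(s) for s in inbox_subjects}
    vanished = []
    for pos, subject in enumerate(logged_subjects):
        n = email_number(subject)
        if n is not None and n not in received:
            vanished.append((pos, n))
    return vanished


def consistent_with_revelation_1(vanished):
    """True if the vanished emails are exactly the first log entries of the batch."""
    return [pos for pos, _ in vanished] == list(range(len(vanished)))
```

Note that the position comes from the log, not from the email number. In the example above, email #35 is logged first, so it is the one at risk.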

 

However, one thing that perplexed me was that some of you have said that mailworker threads never seemed to be associated with vanished email, but that scheduler threads were (and that "maybe" jrpp threads were too). In my bulk send testing, only the mailworker thread type was being used (since I was processing spooler batches greater than 19 emails), yet mail was being lost! Hmmm. I eventually came up with a plausible theory that reconciles this discrepancy (explained below under Errata).

 

I decided to test the 'scheduler' thread type. First I tested whether the spooler batch-size threshold that CF uses for the scheduler thread type was indeed 19 (anything larger uses the mailworker thread type) ... and I proved that to be the case, with one exception: if you set the Mail Delivery Threads count to 1, then the scheduler thread type is always used. Any other Mail Delivery Threads setting uses the 19/20 limit. I also proved to myself that the "X" number associated with the threads (scheduler-X, mailworker-X) starts at zero and goes up to one less than the Mail Delivery Threads count. So, with a Mail Delivery Threads count of 8, X could be 0-7. Sometimes I had to send out many larger batches before I would see all the X values, but eventually I would.
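The thread-selection behavior observed in these tests can be summarized as a small Python function. This is only an encoding of what the tests showed in mailsent.log, not ColdFusion's actual source; the function names are hypothetical, and the 19-email threshold should be re-checked on your own server.

```python
def expected_thread_type(batch_size, delivery_threads, spool_enabled=True):
    """Thread type the tests showed in mailsent.log (observed, not from CF source)."""
    if not spool_enabled:
        return "jrpp"         # spooler off, or spoolenable="no"
    if delivery_threads == 1 or batch_size <= 19:
        return "scheduler"    # small spooler batches, or a single delivery thread
    return "mailworker"       # spooler batches of 20 or more


def possible_thread_numbers(delivery_threads):
    """Observed range of X in scheduler-X / mailworker-X: 0 .. threads - 1."""
    return list(range(delivery_threads))
```

For example, `expected_thread_type(5000, 1)` returns "scheduler", which is the exception noted above.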

 

I set the Mail Delivery Threads back to its default of 10, and started testing low-volume batches (19 and under) so that I could thoroughly test the scheduler thread type. This was MUCH, much faster to audit, as you can imagine. At first, as I was warming up to the audit process, I saw some vanished emails, recorded them in my spreadsheets, and noticed that once again, REVELATION #1, above, held true here too. However, as my auditing speed increased I saw fewer and fewer failures. But then my phone would ring, or a co-worker would interrupt me, and when I went back to testing, I would often see a failure right off the bat! Hmmm.

 

Then I decided to test the jrpp thread type, which is used when the spooler is off. I proved to myself that this thread type would also lose emails. (The "X" value in a jrpp thread is not tied to the Mail Delivery Threads count, since that count applies only to the spooler. I did not take the time to examine the X value logic on the jrpp threads.) Discovering that jrpp threads were present on 'vanished' emails led to ...

 

*** REVELATION #2 ***: All three thread types can be associated with 'vanishing' mail. THEREFORE the bug is not caused by the spooler (since thread type jrpp-X is only used when the spooler is not used).

 

Then I had my next breakthrough. I started playing with the Mail Delivery Threads setting. Up until then, the highest number of emails I had seen 'vanish' in a spooler batch run was 10 (and yes, they were the FIRST ten listed in the mailsent.log for that spooler batch ... in accordance with REVELATION #1, above). I changed the Mail Delivery Threads to 5, and it wasn't long before I noticed that I never lost more than 5 emails per spooler batch. As I played more with the Mail Delivery Threads setting, it slowly became clear that the number of lost emails per spooler batch never exceeded the Mail Delivery Threads setting. However, many times no email was lost, and many times fewer emails than the Mail Delivery Threads setting were lost. (I did not test above the default Mail Delivery Threads setting of 10.) INTERESTING NOTE: with Mail Delivery Threads set to 1, my test environment could send 2500 emails per minute. However, with Mail Delivery Threads set to 10, delivery speed was only 3360 emails per minute ... not a very large gain! I'm not sure how to explain that, but I tested it many times.

 

*** REVELATION #3 ***: The upper limit of how many emails are lost in a spooler batch never exceeds the value of Mail Delivery Threads.

 

During the copious time spent auditing the mailsent.log in the prior testing, I also discovered this regarding the two thread types that are seen on spooler batches (scheduler-X and mailworker-X). Let's say I have Mail Delivery Threads set to 10. In the worst case, within one spooler batch, I might see 10 lost emails. If so, then the mailworker-X threads will be X=0 to 9. X is NOT repeated! If running a small batch (under 20), let's say that I lose 2 emails in that batch. If so, then I see scheduler-X threads, but again, X is NOT repeated! (Worth noting: the X values can appear in any order, ie, they do NOT necessarily start at 0 and work sequentially up to 9. However, again, if 10 emails are lost from a spooler batch, then X is NOT duplicated in those ten lost mailsent.log entries.)

 

*** REVELATION #4 ***: a thread # is never repeated for lost email within the same spooler batch.
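When auditing your own batches, REVELATIONS #3 and #4 can be checked mechanically. A minimal Python sketch, assuming you have already collected the X values (from scheduler-X or mailworker-X) of the vanished emails in ONE spooler batch; `check_batch` is a hypothetical helper that returns a list of violated rules, so an empty list means the batch matches the observed pattern.

```python
from collections import Counter


def check_batch(lost_thread_numbers, delivery_threads):
    """Check one spooler batch's vanished emails against Revelations #3 and #4.

    lost_thread_numbers: X values of the vanished entries in ONE spooler batch.
    Returns a list of violated rules; an empty list means the batch matches.
    """
    problems = []
    # Revelation #3: never more losses than Mail Delivery Threads.
    if len(lost_thread_numbers) > delivery_threads:
        problems.append("more losses than Mail Delivery Threads (#3)")
    # Revelation #4: a thread number never appears twice among the losses.
    repeated = sorted(x for x, c in Counter(lost_thread_numbers).items() if c > 1)
    if repeated:
        problems.append(f"thread numbers repeated: {repeated} (#4)")
    # Observed range of X is 0 .. delivery_threads - 1.
    out_of_range = sorted(x for x in set(lost_thread_numbers)
                          if not 0 <= x < delivery_threads)
    if out_of_range:
        problems.append(f"thread numbers outside 0..{delivery_threads - 1}: {out_of_range}")
    return problems
```

A non-empty result on your own server would mean your environment behaves differently from the one tested here.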

 

Next I spent a bunch of time testing the spool interval setting, which led to the next breakthrough. I started with a spooler interval of 5 seconds (sending out just a few emails per batch), and slowly increased it. I tried to keep pace with my test batch sends (so that if the interval was 10 seconds, I sent out a test batch once every 10 seconds). I was seeing no lost email, except that if I got distracted and then came back to the test, I might see lost email on the first batch I sent. Now I was starting to get it. Sure enough, as soon as I was using a 30-second interval, and sending test batches every 30 seconds, I started seeing lost email! On a hunch, I lowered the interval to 5 seconds again, but this time I sent test batches only every 30 seconds. Email was lost! So it depended on the interval between times that the spooler actually had something to process (and not on the spooler interval setting per se).

 

*** REVELATION #5 ***: In my test environment, I never lost email at PROCESSED spooler intervals of 25s or lower. I started seeing losses at spooler intervals of 30s (60s seemed worse than 30s, but anything higher than 60s didn't seem any worse than 60s). By processed spooler intervals I do not mean the value of the spool interval setting. I mean the time between actual spooler activity (ie: no mail in the spooler = no spooler activity).
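The difference between the spool interval setting and the PROCESSED interval can be illustrated with a small Python sketch. It uses an assumed, simplified spooler model (not ColdFusion's documented behavior): the spooler wakes at every multiple of the spool interval and sends whatever has been queued since, and wake-ups with an empty spool don't count as activity. The 30-second threshold is the value observed in these tests, not a documented constant; the function names are hypothetical.

```python
import math

IDLE_THRESHOLD_S = 30  # observed in the tests above; re-check in your environment


def processed_ticks(send_times, spool_interval):
    """Assumed model: the spooler wakes every spool_interval seconds and sends
    whatever is queued. Only wake-ups that find mail count as activity."""
    return sorted({math.ceil(t / spool_interval) * spool_interval for t in send_times})


def processed_gaps(send_times, spool_interval):
    """Seconds between consecutive spooler wake-ups that actually sent mail."""
    ticks = processed_ticks(send_times, spool_interval)
    return [b - a for a, b in zip(ticks, ticks[1:])]


def at_risk(send_times, spool_interval, threshold=IDLE_THRESHOLD_S):
    """True if any processed gap reaches the observed failure threshold."""
    return any(g >= threshold for g in processed_gaps(send_times, spool_interval))
```

With a 5-second spool interval but sends only every 30 seconds (the hunch test above), `at_risk([0, 30, 60, 90], 5)` is True. Sending every 10 seconds with a 10-second interval gives gaps of 10 seconds and is not at risk, while the same sends with a 30-second spool interval are.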

 

Finally, I had one more setting to test: the 'Maintain Connection to Mail Server' setting. I already knew from early trial runs that turning this off meant VERY SLOW mail delivery ... in my test environment, about one email per second. Very slowly, another pattern started to emerge. It is the BIG ONE:

 

*** REVELATION #6 ***: If you don't use the 'Maintain Connection to Mail Server' setting, then you will NOT LOSE EMAIL.

 

Leaving the 'Maintain Connection to Mail Server' setting OFF is such an easy fix!! But it significantly slows down your mail delivery (see the Workarounds section below for a better discussion). Therefore, having it off is problematic for CF installations that need to send a lot of email. In my case, we occasionally send out "some" bulk mail (such as newsletters, etc.), but it is an amount that CF could keep up with if I set 'Maintain Connection to Mail Server' to OFF (ie: in theory, one per second would keep up with our needs). BUT we also MUST have our normal email (such as billing, new orders, etc.) go out promptly! If a newsletter is being processed slowly, it becomes a bottleneck (in FRONT of subsequent 'non-bulk' email) that we can't afford to put up with. Thus, I needed another solution to this bug. And there is one. See the "Workarounds" section below.

 

And finally, I decided to test sending mail through a local IIS/SMTP relay server (on the same server as CF), which simply passes mail on to the same test accounts on the remote Surgemail server. By now, I was a pro at "making this bug happen". I could not trigger the bug this way. All mail sent to the local SMTP server was accepted, with both sides logging the activity properly; the relay server then passed it off to the remote mail server, and both sides again logged that handoff properly as well. So it seems I have shown that even with long delays between mail sends (30+ seconds) and "maintain connection to mail server" ON, I could not trigger this "vanished email" bug when mail was passed off to the local SMTP server.

 

*** REVELATION #7 ***: Sending email to the local IIS/SMTP server (on same server as CF) never displays the bug.

 

Errata:

MY THEORY as to why many of you mentioned that you don't see failures on mailsent.log items where the mailworker thread type is present, while I am able to create the bug with that same thread type: remember, I tested in batches, and I paused between batches. Generally, the mailworker thread indicates to me (in a production setting) a fairly busy CF mail server (since it is used for bulk sends of over 19 emails at a time). So, my theory is that if the server is busy, then mail is being sent out fairly constantly. If it is fairly constant, then perhaps there are fewer lulls. Fewer lulls mean fewer chances of lulls over 30 seconds, and thus fewer lost emails. Conversely, if the scheduler thread is used, then it is during a less busy CF mail period. Less busy means more lulls. More lulls mean a greater chance of 30+ seconds passing between mail processing, and that means more chance of lost email. And yet another take: for those of you that have your Mail Delivery Threads count set to 1, the scheduler thread type is always used (see the Replication section for details). So in this case any failure will always be associated with the scheduler thread type. And finally, one more theory: in CF Standard edition, there is no Mail Delivery Threads option. I have no idea what value it is set to in the Standard edition. Maybe it is 1. If so, then anyone who uses CF Standard will always see the scheduler thread type.

 

I strongly recommend that you run your own tests and check whether the interval threshold for triggering this bug in your environment is also 30 seconds. If not, adjust the "Workarounds" notes accordingly. Running your own tests will also let you do load testing so that you can see how much mail you can send with the "Maintain Connection to Mail Server" setting turned OFF. Also load test using one "Mail Delivery Thread" versus a higher value (such as 10). Both of those pieces of information will be useful as you read through the "Workarounds" section below. Use my 'Replication' section above as a jump start for your testing. (My code snip for testing is included below.)

 

 

================================================

*********************************************

SUMMARY of the BUG ( !! ATTENTION Adobe !! )

*********************************************

------------------------------------------------

In certain replicable situations, CF 'vanishes' email but logs it as successfully sent. When this happens, the logs on the mailserver show no connection attempt at all from the CF server. The bug happens when the "Maintain Connection To Mail Server" setting is turned ON AND 30+ seconds have elapsed since the last mail was sent. {IE: for the spooler, if the interval between spooler activity is 30+ seconds, which can be caused by a "Spool Interval" setting of 30 or higher, or by low mail activity where the spooler sits idle for 30+ seconds. If not using the spooler, then if 30+ seconds have passed since the last cfmail command.} As mentioned in my Background section above, I did more testing with a significantly less robust CF server, and the failure threshold remained at 30 seconds on that server as well.

 

The above conditions must be met for the 'vanished email' bug to happen. Although they are needed to cause this bug, they do not guarantee that it will happen. However, in my own testing, it happens a LOT when those conditions are met.

 

In the case of the spooler, the number of 'vanished' emails can be as high as the Mail Delivery Threads setting, for each spooled batch sent out. For unspooled mail, I have never seen more than one email lost per interval. So, for example, if 30+ seconds go by with no cfmail activity, and then a burst of 20 emails is sent by cfmail, the spooler is used, and Mail Delivery Threads is set to 4, then 0, 1, 2, 3 or 4 emails are lost from that batch of 20. On the other hand, if the spooler is not used, then 0 or 1 email is lost.

 

IMPORTANT CLUE: The mail that is lost is ALWAYS the mail logged at the very start of the post-30+ second interval, as seen in the mailsent.log file! AND the thread ID that is logged is NEVER implicated in more than ONE loss per 30+ second interval. That is why, if you have your "Mail Delivery Threads" set to 1 (and the spooler on), you only lose a max of 1 email. If you have it set to 4, you can lose up to 4 emails, and all 4 show in the logs as using DIFFERENT thread IDs (not type, ID). And if you have the spooler turned off, my observation is that only one thread ID (of the jrpp type) is used for each new 30+ second interval (ie: per 30+ second interval, potentially one email can be lost when the spooler is off).

 

IMPORTANT CLUE #2: I can't reproduce the bug if my mail server is IIS/SMTP that is on the SAME server as CF.
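The conditions and loss limits summarized above can be collected into one Python function. This is an envelope of the observations in this post, not a model of ColdFusion internals: the actual loss for a burst can be anywhere from 0 up to the returned value. The 30-second threshold and the local-SMTP exemption are what was observed here and should be verified in your own environment; `max_vanished` is a hypothetical name.

```python
IDLE_THRESHOLD_S = 30  # observed failure threshold; verify in your environment


def max_vanished(idle_seconds, maintain_connection, spool_enabled,
                 delivery_threads, batch_size, local_smtp=False):
    """Upper bound on vanished emails for a burst of batch_size emails sent
    after idle_seconds with no mail activity, per the observed patterns.
    The actual loss can be anywhere from 0 up to this value."""
    if not maintain_connection or local_smtp or idle_seconds < IDLE_THRESHOLD_S:
        return 0                                # conditions for the bug not met
    if spool_enabled:
        # at most one loss per delivery thread (thread numbers never repeat)
        return min(delivery_threads, batch_size)
    return min(1, batch_size)                   # spooler off: at most one jrpp loss
```

For the example in the summary (20 emails after a 30+ second lull, spooler on, Mail Delivery Threads of 4), `max_vanished(45, True, True, 4, 20)` returns 4; with the spooler off it returns 1.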

 

ALL of the above patterns have been borne out in my extensive testing over the span of several days, sending hundreds of thousands of emails in test batches using my test program (code included below). See the Replication section above for more details.

 

MY THEORY: The "maintained connection" to the mail server gets corrupted, perhaps due to a network/TCP-level issue. The magic number of idle time (30+ seconds) triggering the failed, "corrupt(?)" reused connection seems like more than a coincidence (and that 30-second threshold held true when I tested from two different locations using different architectures). The fact that the maintained connection does NOT fail when sending to a local SMTP server also seems to implicate a corrupt connection across the network. Most importantly, CF seems oblivious to this failure! and logs success instead.

 

BOTTOM LINE: The responsibility for properly handling the email delivery FALLS SQUARELY IN THE LAP OF COLDFUSION. Either make a connection to the remote mail server, pass the mail to it and log the handoff as a success, OR log that the remote mail server did not respond and put the mail into the undelivered folder. But PLEASE do NOT falsely tell us that the mail was delivered and then simply vanish it! The log files on the remote mail server have caught the lie. My testing, as well as the experience of other CF users, proves that when the conditions are right, NO CONNECTION IS ESTABLISHED to the remote mail server and the mail is simply "vanished" by CF. That is what is causing us users, and our businesses, so much grief.

 

 

================================================
*********************************************

WORKAROUNDS for dealing with this bug:

*********************************************

(The good news: yes, you CAN work around this bug and prevent mail loss)

------------------------------------------------

Solution #1 (for low-volume CF email installations): Turn OFF the "Maintain Connection to Mail Server" setting. SIMPLE! This will really slow down delivery of email, but if you send a low volume of email, and never send out bulk mail that could delay your important mail (which it may do, since delivery is gonna be slow), then this is by far the easiest fix. Do some bench testing FIRST with that setting turned OFF ... sample code for testing below. (My testing environment shows I can send one email per second with that setting turned off ... your mileage may vary.) With that setting OFF, the 'vanished email' bug will not plague you, no matter what your other CF email settings are.

 

Solution #2: For the rest of us, who need "Maintain Connection to Mail Server" turned ON so that all of our mail (bulk mixed with non-bulk) goes out quickly, I suggest writing a simple program (sample code is shown below) that runs for two minutes and, during that time, sends an email to a 'bit bucket' account every 10 seconds or so. Then run that program from the CF scheduler every 2 minutes. That way there is never a lull in email processing long enough to trigger the 'vanished email' bug! IMPORTANT: if you use the spooler (recommended), set its spool interval well below 30 seconds (10 seconds is a good number). As an added precaution, if your load testing shows that you can get away with a spooler 'Mail Delivery Threads' count of 1, set it to 1 "just in case", to limit vanished emails in the event your scheduler or scheduled program fails. *NOTE*: if you use more than one mail server in your CF production cfmail environment, adjust your scheduled program's code so that it sends to all of them (see sample code below).
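Why the spool interval must stay well below 30 seconds can be checked with a toy model. The Python sketch below assumes (a simplification, not documented spooler behavior) that one keep-alive mail is queued every `send_interval` seconds and that the spooler wakes every `spool_interval` seconds and delivers everything waiting. It reports the longest gap between deliveries, which is the lull that matters for the bug.

```python
import math


def max_delivery_gap(send_interval, spool_interval, duration=600, poll_offset=0.0):
    """Largest gap (s) between spooler hand-offs of the keep-alive mail.

    Model (assumption): a keep-alive mail is queued every `send_interval`
    seconds; the spooler wakes every `spool_interval` seconds (starting at
    `poll_offset`) and delivers everything waiting.
    """
    deliveries = []
    for i in range(int(duration // send_interval) + 1):
        queued_at = i * send_interval
        polls_needed = max(math.ceil((queued_at - poll_offset) / spool_interval), 0)
        deliveries.append(poll_offset + polls_needed * spool_interval)
    return max(later - earlier for earlier, later in zip(deliveries, deliveries[1:]))


for spool in (2, 10, 15, 35):
    print(spool, max_delivery_gap(10, spool))  # 10.0, 10.0, 15.0, 35.0
```

In this model the gap is roughly the larger of the send interval and the spool interval, so a spool interval above 30 seconds would by itself recreate the lull the keep-alive program is meant to prevent. It ignores the boundary between scheduled runs and the time spent inside cfmail.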

 

 

================================================

My code used in the Replication and Workaround sections:

-----------------------------------------------------

Two code snippets are provided below:

1. my email testing program, which can also be used for benchmark/load testing

2. the program that I run in the scheduler to create email activity every 10 seconds, so that there is never a lull in CF email processing long enough to cause the dreaded 'vanished email bug'.

 

......................

......................

1. my email testing program, which can also be used for benchmark/load testing

......................

......................

 

<html>

<body style="font-family:Arial, Helvetica, sans-serif">

<!--- author: Byron Knapp {me at byronknapp.com} 4-21-2013 --->

<!--- test-cfmail-bug.cfm --->

<!--- SET THE FOLLOWING 4 VARIABLES --->

<cfset defaultmailserver="your-email-server-FQDN-or-IP.yourdomain.com">

<cfset bitbucketmailaccount="cf-mail-bit-bucket-acct@yourdomain.com">

<cfset testingmailaccount="cf-mail-test-account@yourdomain.com">

<cfset testmailfromaccount="you@yourdomain.com">

<!--- the testingmailaccount should have NO OTHER ACTIVITY except from this program!!! --->

<!--- Also, if you change the name of this program, make sure to use the same name in the form action line below --->

<!--- use for testing for the CF 'vanished email' bug, and for benchmark/load testing: particularly for testing your capacity to send email with various values of the CF mail GUI setting 'Mail Delivery Threads', and with the CF mail GUI setting 'Maintain Connection to Mail Server' ON or OFF --->

<!--- FOR MUCH MORE HELP AND ADVICE in setting up your testing environment, PLEASE REFER TO THE REPLICATION section of my post on the thread found at

  http://forums.adobe.com/thread/585718

  my post is on page two of that thread under my handle of byron_knapp --->

<!--- NOTE: if testing with the spooler on, set the spool interval in the CF mail settings GUI to a very low number (I use 2 seconds), so that the interval set in this test program controls when mail is actually sent (plus or minus 2 seconds). Otherwise the results get confusing: for example, with the spool interval at 15 seconds and this program set to 10 seconds, two batches sent from here can land in the spooler between spooler processing rounds, which is tricky to audit. So set the spool interval to a really low number, so that the spooler checks 'continuously' for mail to process, and let this program control how often the spooler actually has something to process --->

<div align="center">

 

<cfif not IsDefined("form.fieldnames")>

 

<form action="test-cfmail-bug.cfm" method="post">

 

  <table><tr><td><center>

    Please read all of the comments embedded in this program for more tips and advice. <b>THERE ARE four variables you must set in the program before using it.</b><br><br>

    <b>This program lets you send batches of email through your ColdFusion server to test for the 'vanished email' bug.</b> The primary use in testing for the bug is to determine whether the bug DOES happen to you and, if so, at what interval between mail activity it begins to manifest itself (for me, it appears when there is a 30+ second interval between mail send activity on the CF server). My initial findings show that if you turn OFF the 'maintain connection to mail server' setting, you will never lose email, although delivery is slow. However, that is not an acceptable solution for many of us (too slow). <b>I have posted a workaround on the Adobe user forum</b> (one that lets you keep the 'maintain connection to mail server' setting ON), along with a detailed analysis of my testing of this bug, on page two of this thread (look for the post by byron_knapp):<br>

    http://forums.adobe.com/thread/585718<br>

    Also note that, for bug testing, the number of emails per batch does not 'cause' the bug per se; rather, it is the interval between batches that seems to matter. Don't forget that if you have 'maintain connection to mail server' turned OFF (in your CF mail GUI settings screen), my testing shows that you will never see the bug, but don't take my word for it: test it both ways.<br><br>

    <b>This program can also be used for benchmark/load testing to find the capacity of your CF mail</b>, and how that capacity is affected by CF mail GUI settings such as 'maintain connection to mail server', spooler on/off, spooler 'Mail Delivery Threads', spool interval, etc.

    <br><br>

    Suggestion: if testing for the 'vanished email bug', I recommend setting the CF mail GUI 'spool interval' to 2 seconds. See the comments in this program to understand why. (Ignore this if you are testing without the spooler, or if you are using this program for load testing.)<br><br>

    Use spool?

    <input type="radio" name="spoolyn" value="Yes" CHECKED/>Yes

    <input type="radio" name="spoolyn" value="No" />No   <br><br>

 

    Number of batches: <input type="text" name="numbatch" size="4"/>   <br><br>

 

    Delay between batches(seconds): <input type="text" name="batchinterval" size="4"/>   <br><br>

 

    Number of emails per batch: <input type="text" name="numemail" size="4"/>   <br><br>

 

    <cfparam name="svr" default="#defaultmailserver#">

    Server: <input type="text" name="svr" size="20" value="<cfoutput>#svr#</cfoutput>">    <br><br>

 

    <input type="submit" value="Submit (the code overrides your default CF request timeout, since this may take a while)">

 

  </center></td></tr></table>

 

</form>

 

<cfelse>

<cfif NOT IsNumeric(numbatch)>

    Number of batches needs to be a number

    <cfabort>

</cfif>

 

<cfif NOT IsNumeric(batchinterval)>

    Delay between batches needs to be a number

    <cfabort>

</cfif>

 

<cfif NOT IsNumeric(numemail)>

    Number of emails per batch needs to be a number

    <cfabort>

</cfif>

 

<!--- the next comment and logic apply when you are using this program to test for the vanished email bug. However, the logic can be left in for benchmark testing, since it doesn't hurt anything in that scenario --->

<!--- first, start the process by throwing away a priming batch: we have no idea how long it has been since mail was last processed, so the first batch could skew the results. The priming batch sets the baseline for all batches sent after it. It sends 10 because you may be using UP TO the default number of 'Mail Delivery Threads' (MDTs), set in the CF mail GUI, and prior testing has shown that up to that many emails might vanish per batch. If you are testing with a higher number of MDTs, raise this loop to match; there is no need to lower it (even if your MDTs are set lower), since sending extra bit-bucket priming mails does no harm --->

 

<cfloop index="iw" from="1" to="10">  

  <cfmail to="bit bucket email don't need to audit <#bitbucketmailaccount#>" from="<#testmailfromaccount#>" spoolenable="#spoolyn#" server="#svr#" subject="#iw# can delete me, not needed" >

    can delete me, not needed

  </cfmail>

</cfloop>

 

<!--- convert seconds to milliseconds for sleep command used later --->

<cfset sleeptime=batchinterval * 1000>

  

     <!--- below is another way to introduce a pause in coding, mentioned here so I can come back to it if needed, in the future 

               <cfscript>

                  thread = CreateObject("java", "java.lang.Thread");

                  thread.sleep(4000);

                </cfscript> 

      --->

 

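<!--- requestTimeOut is in SECONDS (sleeptime above is in milliseconds). Be generous: allow the total delay between batches plus roughly 1 second per email (the rate I measured with 'maintain connection' OFF), double that, and add 60 seconds of slack, since there is no harm in doing so --->

<cfset looptimeout = 2 * (batchinterval + numemail) * numbatch + 60>

<cfsetting requestTimeOut = "#looptimeout#">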

 

 

<cfloop index="it" from="1" to="#numbatch#">

 

  <cfset ipad1 = Right(10000+it,4)>

 

  <!--- next line creates the delay, with the theory being above a certain threshold, you start losing email from some batches --->

  <cfset sleep(sleeptime)>

   

   <cfloop index="ix" from="1" to="#numemail#">

      <cfset ipad2 = Right(10000+ix,4)>

   

      <cfmail to="test cfmail <#testingmailaccount#>" from="<#testmailfromaccount#>" spoolenable="#spoolyn#" server="#svr#" subject="batch #ipad1# of #numbatch#: mail #ipad2# of #numemail# -- Delay:#batchinterval# Use Spool:#spoolyn# Server:#svr#">

      batch #ipad1# of #numbatch# : mail #ipad2# of #numemail# (Use Spool:#spoolyn# server:#svr# delay:#batchinterval#)

      </cfmail>

   </cfloop>

 

</cfloop>

 

<!--- count only the test mails (the priming batch went to the bit bucket) --->

<cfset ttlsent = numbatch * numemail>

<cfoutput>Number of batches: #numbatch#<br>Time between batches: #batchinterval#<br>Emails sent per batch: #numemail#<br>Use spool: #spoolyn#<br>Server sent to: #svr#<br><br><b>#ttlsent# emails should be in your test email account.</b> If not, you may have just suffered the nightmare of 'VANISHED EMAILS'. If the numbers don't match, first check mailsent.log to make sure no failed deliveries were logged there that would explain the discrepancy, and then check your mail server logs to make sure there is absolutely no record of the CF server connecting to the mail server for the vanished item(s). In my observation, when email vanishes you can locate the vanished items for auditing by looking at the first one(s) in each batch (the top of each batch in mailsent.log). Also, when using the spooler, each spooled batch may lose 'up to' the value of the 'Mail Delivery Threads' setting in CF's mail GUI. If, for example, you lose 4 emails, those 4 items will be the first 4 entries in mailsent.log for that batch. See the comments at the top of this program for where to find more info.<br></cfoutput>

</cfif>

 

</div>

</body>

</html>
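After a test run, comparing what arrived against what was sent by hand gets tedious. This Python sketch (hypothetical helper names) parses the subject line that test-cfmail-bug.cfm builds (`batch NNNN of N: mail NNNN of N -- ...`), reports which mail numbers never arrived in each batch, and lists batches that lost more than your 'Mail Delivery Threads' value, the per-batch ceiling observed above. Getting the subject lines out of the test mailbox (IMAP, an export, etc.) is assumed and not shown.

```python
import re
from collections import defaultdict

# Matches the subject line built by test-cfmail-bug.cfm, e.g.
# "batch 0003 of 5: mail 0002 of 10 -- Delay:40 Use Spool:Yes Server:..."
SUBJECT = re.compile(r"batch (\d+) of \d+: mail (\d+) of \d+")


def missing_by_batch(subjects, numbatch, numemail):
    """Map batch number -> sorted list of mail numbers that never arrived."""
    seen = defaultdict(set)
    for subject in subjects:
        m = SUBJECT.search(subject)
        if m:  # ignore bit-bucket priming mail and anything else
            seen[int(m.group(1))].add(int(m.group(2)))
    missing = {}
    for batch in range(1, numbatch + 1):
        gone = sorted(set(range(1, numemail + 1)) - seen[batch])
        if gone:
            missing[batch] = gone
    return missing


def batches_over_mdt(missing, mail_delivery_threads):
    """Batches that lost more mail than the 'Mail Delivery Threads' setting."""
    return [b for b, gone in missing.items() if len(gone) > mail_delivery_threads]
```

Pass in the same number of batches and emails per batch that you typed into the form, so that a batch which lost every message still shows up as missing.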

 

.........................

.........................

2. the program that I run in the scheduler to create email activity every 10 seconds, so that there is never a lull in CF email processing long enough to cause the dreaded 'vanished email bug'.

.........................

.........................

 

<html>

<body>

<!--- author: Byron Knapp {me at byronknapp.com} 4-29-2013 --->

<!--- send-cf-bit-bucket-email.cfm --->

 

<!--- notes:

This program sends an email to a 'bit bucket' account every 10 seconds (adjustable below as sendinterval) for two minutes. Make sure the bit-bucket account exists. Run this from the CF scheduler every TWO minutes. That way there is never a lull in email processing long enough to trigger the 'vanished email' bug (my testing shows that lull needs to be 30+ seconds long). IMPORTANT: if using the spooler, set its spool interval well below 30 seconds (10 seconds is a good number). As an added precaution, if your load testing shows that you can get away with a spooler 'Mail Delivery Threads' count of 1, set it to 1 "just in case", to limit vanished emails in the event your scheduler and/or this scheduled program fails.

--->

 

<!--- create a version of this program for every mail server that you use in your CF's cfmail environment! and schedule all of them to run every two minutes --->

 

<!--- FOR MUCH MORE INFO, PLEASE REFER TO my post on the thread found at

  http://forums.adobe.com/thread/585718

  my post is on page two of that thread under my handle of byron_knapp --->

 

<!--- SET THE FOLLOWING VARIABLES --->

<cfset mailserver="your-email-server-FQDN-or-IP.yourdomain.com">

<cfset bitbucketmailaccount="cf-mail-bit-bucket-acct@yourdomain.com">

<cfset mailfromaccount="you@yourdomain.com">

 

<!--- if you have a second mail server that you use in your production cfmail environment, then set the values below  --->

<cfset secondmailserver="">

<cfset secondbitbucketmailaccount="">

<cfset secondmailfromaccount="">

 

 

<cfset sendinterval="10">

 

<!--- set value for timeout at 200 seconds ... plenty long since this program should only run for 120 seconds (in case the default CF timeout is too low) --->

  <cfsetting requestTimeOut = "200">

 

<!--- how many times? = 2 minutes divided by the interval --->

<cfset numsends = Round(120/sendinterval)>

 

<!--- convert seconds to milliseconds for sleep command used later --->

<cfset sleeptime=sendinterval * 1000>

 

<cfloop index="iw" from="1" to="#numsends#">  

   <cfmail to="bit bucket <#bitbucketmailaccount#>" from="<#mailfromaccount#>" server="#mailserver#" subject="#iw# from #mailserver# delete me, not needed" >

   can delete me, not needed

   </cfmail>

 

   <cfif secondmailserver neq "">

    <cfmail to="bit bucket <#secondbitbucketmailaccount#>" from="<#secondmailfromaccount#>" server="#secondmailserver#" subject="#iw# from #secondmailserver# delete me, not needed" >

    can delete me, not needed

    </cfmail>

   </cfif>

 

  <cfset sleep(sleeptime)>

</cfloop>

 

</body>

</html>
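The keep-alive program's timing can be sanity-checked the same way. This Python sketch mirrors its arithmetic (`numsends = Round(120/sendinterval)`, with a sleep after every send) and reports how long each run sleeps and the gap across the boundary to the next scheduled run. It assumes runs start exactly every 120 seconds and ignores the time spent inside cfmail; `cf_round` is a hypothetical helper imitating ColdFusion's `Round()`, which is assumed here to round halves up, unlike Python's built-in `round()`.

```python
import math


def cf_round(x):
    """Round halves up (assumed to match ColdFusion's Round()); Python's round() goes to even."""
    return math.floor(x + 0.5)


def keepalive_schedule(send_interval, window=120, schedule_period=120):
    """(sends per run, total sleep per run, gap across the run boundary), in seconds.

    Mirrors send-cf-bit-bucket-email.cfm: numsends = Round(120/sendinterval),
    and the program sleeps after every send. Ignores time spent inside cfmail.
    """
    numsends = cf_round(window / send_interval)
    sleep_total = numsends * send_interval
    # the last send happens (numsends - 1) intervals into the run; the next
    # run's first send happens when the scheduler fires again
    boundary_gap = schedule_period - (numsends - 1) * send_interval
    return numsends, sleep_total, boundary_gap


for interval in (10, 11, 25):
    print(interval, keepalive_schedule(interval))
# 10 (12, 120, 10)
# 11 (11, 121, 10)
# 25 (5, 125, 20)
```

With 10 seconds each run sleeps exactly 120 seconds. With 11 or 25 seconds a run sleeps 121 or 125 seconds and so outlasts its 2-minute slot (how the CF scheduler handles that is not covered here), although both stay under the 200-second requestTimeOut. An interval that divides 120 evenly keeps things simplest.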

 

=========

CHEERS! Byron Knapp

ps: To Adobe: as compensation for all my time on this, a free upgrade from CF 9 to CF 10 would be graciously accepted

==================

