POP3 versus IMAP mail

Methods to check your mail:

Amplex supports several different ways to access your email:

  • POP3 (Post Office Protocol #3)
  • IMAP (Internet Message Access Protocol)
  • Webmail

The major difference between POP3 and IMAP is where the messages are stored.

When retrieving messages with POP3 the default behaviour is to (a rough sketch in code follows these steps):

  1. Ask the mail server (at your ISP) how many new messages are waiting.
  2. Transfer the messages from the server to your computer.
  3. Delete the messages from the mail server.
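
Here is that sketch, using Python's standard poplib module (the server name and account details are placeholders, not actual Amplex settings):

  import poplib
  from email import parser

  # Placeholder server and account details for illustration only.
  pop = poplib.POP3_SSL("pop.example.net")
  pop.user("user@example.net")
  pop.pass_("secret")

  msg_count, total_bytes = pop.stat()        # 1. how many messages are waiting?
  for i in range(1, msg_count + 1):
      response, lines, octets = pop.retr(i)  # 2. transfer the message to this computer
      msg = parser.BytesParser().parsebytes(b"\r\n".join(lines))
      print(msg["Subject"])
      pop.dele(i)                            # 3. mark the message for deletion on the server

  pop.quit()   # the deletions actually take effect here, at the end of the session

Note that the deletions are only committed when the session ends cleanly, which matters later in this article when we talk about interrupted connections.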

When checking a mailbox using IMAP something completely different happens (again, a sketch follows the steps):

  1. Compare the list of messages at the server and the local computer to determine message state (new, read, deleted, replied to, etc.)
  2. Show the current state of the mailbox.  Synchronize the state of the messages on the server and the local computer.
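
And here is a comparable sketch of an IMAP check using Python's imaplib (placeholder names again). Notice that nothing is removed from the server; reading mail just updates flags stored there:

  import imaplib

  # Placeholder server and account details for illustration only.
  imap = imaplib.IMAP4_SSL("imap.example.net")
  imap.login("user@example.net", "secret")

  imap.select("INBOX")                           # open the mailbox that lives on the server
  status, data = imap.search(None, "UNSEEN")     # ask the server which messages are new
  for num in data[0].split():
      status, msg_data = imap.fetch(num, "(FLAGS BODY.PEEK[HEADER])")
      print(num, msg_data[0])                    # flags such as \Seen and \Answered record
                                                 # each message's state on the server itself

  imap.logout()                                  # the messages all stay on the server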

The big difference between the two is that POP3 REMOVES the messages from the server once it has transferred them to your local computer.  This point is the key to understanding everything that follows.  IMAP leaves the messages on the server until you delete them.
Webmail is simply a way of reading your mail with a web browser; behind the scenes it interacts with your mailbox using IMAP.
Nearly all mail client software (Outlook, Outlook Express, Thunderbird, Incredimail, Entourage, Vista Mail, etc.) can be set up to check mail using either POP3 or IMAP but all default to POP3 unless told otherwise.

So why would you want to use POP3 or IMAP?   Which one should you choose?

If you always check your mail from the same computer then POP3 is a good choice.   Since POP3 transfers the mail to your computer you always have a copy of your mail and you can read it when you are not connected to the Internet.  Remember: POP3 transfers the mail and then deletes it from the server, so once you retrieve your mail it is erased from the ISP's mail server.
If you check your mail from multiple computers then IMAP is a better method.   Since IMAP keeps the mail on the server along with the state of the mail (read, unread, replied to) it makes it much easier to check your mail from multiple computers.   If you have a computer at work and at home both set up to check the same account using IMAP you will see the same messages on both computers.   When you read a message on one computer and then check the other one the message will show up as having been read already.
If you set up two computers to check mail using POP3 then something really confusing happens.   If both computers are set to check mail every 10 minutes (the default) then the first computer to check after a new message arrives retrieves it and deletes it from the server.   Let's say, for example, your 'home' computer is checking for messages using POP3 at 5, 15, and 25 minutes after the hour, and your 'work' computer is checking at 0, 10, and 20 minutes after the hour.   A message that arrives at 2 minutes after the hour will show up only on the home machine.   A message that arrives at 8 minutes after the hour ends up only on the work machine.   A message arriving at 12 minutes shows up only on the home machine.   Very confusing if you're at work waiting for a message to arrive.
Things can get very confusing if you are using both IMAP and POP3 at the same time.  Keep in mind that webmail is really an IMAP client.  Let's assume your home computer is set up to use POP3 and you leave it running, checking for new mail every 10 minutes.   If you're at work and decide to check your mail using webmail, you log in and don't see any messages, because your home computer is retrieving and deleting the messages from the server every 10 minutes.  Or you get lucky and catch a message before your home computer retrieves it, then check again 15 minutes later and it's gone, because your home computer just retrieved it and deleted it off the server.

So what’s the moral of this story?

Pick a method of checking mail and stick with it – if you use webmail then always use webmail.

If you want to use both webmail and a mail program like Outlook then set the mail program up to check mail using IMAP.

If you want to use POP3 to check your mail then make sure you DO NOT leave it running when you are not using it.

If your messages all suddenly disappear off webmail it’s a safe bet that somewhere you have a computer checking your mail using POP3 and all of your mail was transferred to that computer.

Are there exceptions to the above discussion?

Yes – there are options available in most mail clients to tell POP3 not to delete messages off the server, to delete them after a certain amount of time, or based on other criteria.  These options make POP3 behave more like IMAP, but they are something of a kludge; you're probably better off just using IMAP.
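
With the poplib sketch from earlier, "leave mail on server" amounts to skipping the delete step. This is only illustrative; real clients also remember which messages they have already seen so the same mail is not downloaded twice:

  import poplib

  # Same placeholder account as the earlier sketch.
  pop = poplib.POP3_SSL("pop.example.net")
  pop.user("user@example.net")
  pop.pass_("secret")

  msg_count, _ = pop.stat()
  for i in range(1, msg_count + 1):
      pop.retr(i)   # download a copy of the message...
      # ...but never call pop.dele(i), so the message stays on the server.
      # Clients doing this track each message's unique ID (pop.uidl(i)) so the
      # same message is not downloaded again on the next check.
  pop.quit()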

Two other things occasionally happen with mail:

When using POP3, if the connection to the server is interrupted before all of the messages are retrieved, the next time you connect you will get another copy of all the messages you already received.   This is because messages are not deleted until the end of the session, after all of the messages have been transferred.
When using both POP3 and IMAP the POP3 client will occasionally show a message in your mailbox that says “DO NOT DELETE THIS MESSAGE – INTERNAL MESSAGE DATA”.   This message is stored on the mail server and contains information used by IMAP.   Occasionally a POP3 client accidentally retrieves this message.   You can safely delete the message without hurting anything.

Partial Internet outage 11/12/08 4:24pm to 4:45pm

We noticed a brief loss of connectivity to some destinations on the Internet this afternoon.   The problem occurred in a portion of the Verizon network and affected traffic to some popular destinations such as CNN, MySpace, and Facebook.     The problem cleared while we were analyzing the situation and deciding on a course of action.

Numerous network operators are reporting the problem on outage mailing lists.   Verizon has not issued a statement at this time.   The rumor mill is pointing the finger at Level3 (another very large network), claiming bad route announcements on its part.

So how does all this work, you ask?  (Or, the really short introduction to BGP.)

The Internet is not a single entity but rather a collection of independent networks connected together.  The networks connect to each other at gateway routers.   The gateway routers speak a language (actually a protocol) called BGP, in which they announce to each other which networks (and destinations) can be reached by sending traffic through that gateway.

Amplex maintains connections to two large networks (Verizon and Cogent) and we receive information from both telling our router the fastest way to deliver traffic to its destination.   Should a network cease to be able to carry traffic to a particular destination (say MySpace) the neighbor router is supposed to 'withdraw' its offer to carry traffic to that destination.    When that happens, if we still have a route to the requested destination via our other connection, we will send data out the working connection.   Sometimes the route is withdrawn by both providers at the same time – this likely indicates that the destination network itself is no longer online.
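
Here is a toy model of that route selection (this is not real BGP; the prefix and the preference numbers are made up purely to illustrate the idea of preferring one announced route and falling back when it is withdrawn):

  # Toy model: one destination prefix announced by both upstreams, with a
  # preference value deciding which connection carries traffic (lower = preferred).
  routes = {"203.0.113.0/24": {"Verizon": 10, "Cogent": 20}}

  def best_path(prefix):
      candidates = routes.get(prefix, {})
      return min(candidates, key=candidates.get) if candidates else None

  print(best_path("203.0.113.0/24"))   # Verizon, while both routes are announced

  # Verizon withdraws its route: traffic falls back to the other connection.
  del routes["203.0.113.0/24"]["Verizon"]
  print(best_path("203.0.113.0/24"))   # Cogent

  # Both providers withdraw the route: the destination is simply unreachable.
  del routes["203.0.113.0/24"]["Cogent"]
  print(best_path("203.0.113.0/24"))   # None

Today's problem was the one case this little model cannot capture: Verizon kept announcing the route but dropped the traffic anyway, so our router had no reason to fall back to Cogent.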

In today's outage Verizon continued to tell our router that the best path to MySpace, CNN, and other sites was to deliver the traffic to Verizon.   Unfortunately Verizon was not keeping that promise but rather dropping the traffic inside its own network.    While that situation is not supposed to happen, it does on fairly rare occasions.

Verizon will likely issue a ‘root cause analysis’ regarding the outage at a later date to explain to the routing engineers at other companies how and why this happened and how to prevent it in the future.

How could Amplex work around this problem?

We would shut down the connection to Verizon, which would then route all traffic through Cogent.  Unfortunately this is not a decision to be made lightly since shutting down an upstream carrier causes our own announcements to the rest of the Internet to change.   There can be fairly long waits (and disconnections of existing VPN, video, and other sessions) while the Internet determines the new best path to reach us.

Once we had established that the problem was at Verizon we were preparing to shut down the connection when the problem in Verizon’s network was resolved.

Why is it so hard to make a small router that works properly?

How Netgear routers manage to blow up the network:

We have a customer who was reporting frequent temporary lockups on his wireless connection.   To diagnose a situation like this we have a variety of standard things that we do:

  • Check the signal strength at the customer premise radio and at the transmitting tower.
  • Check for a high number of re-registrations of the customer radio.
  • Check for errors on the Ethernet interface at the customer site.
  • Verify that the software load on the Canopy radio is current.

Assuming none of the above reveal any problems we use a program called Multiping to ping the customer radio and the customer router.   Multiping sends an ICMP Echo Request to the target computer or router and waits for the response.  If there is a reply the round trip time is plotted on a graph.  If there is no reply that is marked on the graph as well.
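
The same idea can be sketched in a few lines of Python using the system ping command (Linux-style flags; the target address is a placeholder for the customer radio or router):

  import re
  import subprocess
  import time

  TARGET = "10.0.0.1"   # placeholder address

  # Send one echo request per second, record the round-trip time, and note any
  # missing replies: roughly what Multiping plots on its graph.
  for _ in range(10):
      result = subprocess.run(["ping", "-c", "1", "-W", "1", TARGET],
                              capture_output=True, text=True)
      match = re.search(r"time=([\d.]+) ms", result.stdout)
      if match:
          print(f"reply from {TARGET}: {match.group(1)} ms")
      else:
          print(f"no reply from {TARGET}")
      time.sleep(1)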

In this case Multiping was showing only an occasional dropped packet (no reply).   This is relatively normal behavior and when kept below 1% it is not an issue unless the drops are sequential.   It is important to note when looking at ICMP reply times that routers (and computers) consider responding to ICMP requests a very low priority – if they respond at all.  The lack of a response, or a high ping time to a router in the network path, does NOT necessarily imply a problem – it's just another piece of information and must be evaluated along with other troubleshooting steps.

If we can't find any problem at this point, well… it's hard to say.   The problem could be the customer's computer, perhaps the customer's router, maybe the site they are trying to reach, or some other issue outside of our control.   In this case we noticed that the packet loss occurred at the same time for the devices between the Oak Harbor router and the Carroll Water customers.   This pointed to a possible issue at Oak Harbor or with the VLAN we use for the Carroll Water tower.   Last week we tried removing the VLAN from the router at Oak Harbor and moving its gateway back to the core router at Lemoyne.      While this initially appeared to have no effect, the amount of packet loss on the network radically increased as the network load picked up during the day.  Monitoring the network at the network tap locations did not show any obvious reason for the increased loss.  Due to multiple customer complaints we reverted the changes made to Carroll Water midday (something we normally try to avoid during weekdays).

It was very odd that moving the VLAN made things worse – it shouldn't have, but it did.   The only possibility left was that the problem was something at Carroll Water or Oak Harbor.    On Wednesday we replaced the router at Oak Harbor – which helped nothing.

On Thursday night around 11:45pm the network monitor indicated problems with much of the network.  Normally when this happens (not that it happens often) it indicates a loop on the network or a broadcast storm.   While troubleshooting, something very odd appeared – large quantities of ICMP traffic destined for the customer we had been having a problem with.  The traffic was coming from the public IP addresses of other customers on the network but carried the payload of the packets from the machine running Multiping.  Even worse – the packets had the 'broadcast' flag turned on.

Tracking down the routers the packets were coming from revealed that they are all Netgear routers with static IP addresses assigned.  ARG!   Now it's obvious what is happening…   A packet destined for the customer gets slightly mangled on the way, turning on the broadcast bit.   The Netgear routers fail to detect that the packet checksum doesn't match (since it's mangled) and, far far worse, proceed to create a copy of the packet and send it to the original destination.   All the other Netgear routers on the network hear this broadcast packet and do the same thing.  This is like throwing a ball into a room full of mousetraps – the whole thing blows up.
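
The story above doesn't pin down exactly which check the routers skipped, but as an illustration of the idea: every Ethernet frame carries a CRC-32 frame check sequence computed over its contents, and a frame whose bits were flipped in transit (such as the broadcast/multicast bit in the destination address) should no longer match it and should simply be discarded. A minimal sketch using Python's zlib:

  import zlib

  # Hypothetical frame bytes for illustration: destination MAC, source MAC, payload.
  frame = bytes.fromhex("0011223344550066778899aa") + b"example payload"
  fcs = zlib.crc32(frame) & 0xFFFFFFFF   # CRC-32, the algorithm behind the Ethernet FCS

  # Flip the broadcast/multicast bit (the low bit of the first destination byte).
  corrupted = bytes([frame[0] | 0x01]) + frame[1:]
  print(zlib.crc32(corrupted) & 0xFFFFFFFF == fcs)   # False: the check catches the damage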

So now it’s obvious… The reason the customer is having problems isn’t that he is losing connectivity – it’s that he is being buried under bogus traffic from a bunch of buggy Netgear routers.   When we moved the VLAN back to Lemoyne earlier in the week this traffic overload hit the entire network rather than being directed at Carroll Water.

The Solution:

Since we were able to identify all of the customer routers involved we contacted the customers on Friday and had them change the type of connection they use (from Static to NAT).  This prevents the routers from doing what they have been doing.

What a mess…..

Mark

Mail servers were slow today

Mail processing was slowed today due to a high load on the machine that checks mail for viruses and spam. The problem occurred while performing upgrades to the operating system.

How is mail processed?   It’s far more complex than it appears…

There are 3 machines responsible for processing mail.  Two machines (named sylvio and paulie) serve as the front end and are responsible for initially receiving incoming and outgoing mail, making a few preliminary checks to see if the recipient is valid, and storing the mail to disk (a process called queuing).   Once the mail is queued a separate process sends the mail to a third server (tony) to be checked for spam and viruses and then (presuming no viruses were found) returns it to sylvio or paulie where it is again queued to disk.   A third process then collects the queued mail and performs final delivery to the local mailbox (for local users) or the recipient's mail server (for non-local users).
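
A very rough sketch of that flow is below. The machine names come from the description above; the checks and the local domain are placeholders, not the actual filtering software:

  from collections import deque

  incoming = deque()   # on sylvio/paulie: mail queued to disk as it arrives
  scanned = deque()    # on sylvio/paulie: mail handed back from tony, queued again

  def front_end_receive(message):
      """sylvio/paulie: preliminary recipient check, then queue to disk."""
      if "@" not in message["to"]:
          return                   # reject obviously invalid recipients
      incoming.append(message)     # acknowledged only once it is safely queued

  def scan_queue():
      """tony: pull queued mail, check it for spam and viruses, hand it back."""
      while incoming:
          message = incoming.popleft()
          if "virus" in message["body"]:
              continue             # infected mail stops here
          scanned.append(message)

  def deliver_queue():
      """final delivery: local mailbox for local users, remote server otherwise."""
      while scanned:
          message = scanned.popleft()
          where = "local mailbox" if message["to"].endswith("@example.net") else "remote mail server"
          print(f"delivering {message['subject']!r} to the {where}")

  front_end_receive({"to": "user@example.net", "subject": "hello", "body": "hi there"})
  scan_queue()
  deliver_queue()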

Why so complex?   A bunch of good reasons actually…

  • 2 front end machines allow us to work on one machine without disrupting mail processing.
  • Spam filtering and virus checking is a slow and difficult process and requires considerable resources (CPU, memory).   Separating the storage and processing helps prevent client timeouts.   Many mail clients (e.g. Outlook Express, Outlook, Thunderbird, etc.) will generate error messages if the mail server does not accept mail quickly.
  • Delivering mail from disk (rather than from memory) is safer.   By queuing mail to disk before acknowledging acceptance we do not lose mail in the event of a software or server crash.
  • Mail is often bursty in nature – a few messages a minute to hundreds a minute.   Since it’s possible for the incoming rate to exceed the rate that messages can be checked for spam and viruses the front end servers hold the mail until the scanner can check it.

The servers have had an issue for some time where they lock up when asked to make a 'snapshot' (backup) of the disk.  The lockup is a known problem with the operating system version we have been using.    We are in the process of upgrading the operating system, which is what caused the high load on the server today.

Static Electricity is just weird

Our Oregon Road tower has been driving us nuts for months. Odd things happen every time even a small weather event comes through – UPSs shut off for no apparent reason, power supplies blow out, switch ports die, etc. I know… all of this sounds like either a power or grounding problem. But where and why?

The tower itself is a water storage tank. Steel. Full of water. Connected to miles of underground pipe. You couldn't build a better ground if you wanted to.

Power to the site is standard utility power – nothing special but the other tenants on the tower and the owner don’t seem to have any power problems. Why should we?

The weirdness started last summer (2007) when we installed a UPS (backup power) and a network switch at the tower. When a small storm would come by the UPS would shut down. Run over to the tower and everything looked normal but the UPS would be off. Push the power button and it comes right back on. After the 2nd time this happened we ordered a new UPS and installed it.

Due to some interference issues we replaced the backhaul radios feeding the tower. About 2 weeks later a small snowstorm came through and the site shut down again. When I get there the power supplies for the radios are dead and one of the switch ports is dead. At this point I’m wondering if there is something wrong with the new backhaul radios so we switch back to the old ones and test the new ones at the office – they work fine. Now what the heck does all this mean?

Fast forward to Friday night 3/21 when yet another snow storm comes through. Yep – Oregon Road is dead again. Run over there and I can’t believe what I’m seeing. Sparks. Really big sparks. Sparks coming from the network cables jumping to the UPS. Jumping to me when I get too close. Unplug the cables from the power injectors and sparks jump from the cable to the nearest ground or just crack between pins on the cables. Ack! Where the heck is all this electricity coming from? What’s different about this tower?

Time to back up a little bit. Our access points are Motorola Canopy radios modified by Last Mile Gear (http://www.lastmilegear.com) with a better antenna and a stronger case, called a Cyclone. The older Cyclones had a metal mounting bracket that was a pain to install but was all metal and grounded the radio case to the tower. The bracket design was improved about 2 years ago to one that is much easier to install but does not ground the radio case or the antenna.

We use shielded cable to connect the AP’s to the equipment at the base of the tower. We intentionally do not connect the shield at the top of the tower and ground the shield at the bottom. This design creates a Faraday cage inside the shield and prevents electrical noise from entering (or leaving) the cable. The shield is not intended to be a ground for the equipment. If you connect both ends of a shield you no longer have a shield but a conductor.

So what was creating all of the sparks? The only logical explanation is that the dry snow blowing past the antenna and case generates a static charge on the case. As the case is not grounded the charge builds up until it can jump across either the insulated mounting bracket or onto the shield of the Ethernet cable. The shield being closer, that was where the charge was going: traveling down the cable and discharging into the Ethernet switch at the base of the tower. Can blowing air generate much static? Well – I know it occurs on airplanes when flying in precipitation. Helicopters generate very high static charges on the airframe when flying in snow or dust. Apparently it happens on isolated Access Points as well.

So why are the Cyclone Access Points (APs) not grounded? The idea was that lightning normally discharges from the ground to a cloud (despite how things look). A lightning bolt forms when a charged path (a leader) rising from the ground meets the stroke coming down from the cloud. The theory is that if you ground the AP you increase the risk that the leader will start at the AP, resulting in a greater chance of damage. All good in theory – except that allowing the AP to accumulate static charges due to wind and snow is causing more damage than lightning itself.

We replaced the AP with a new one and grounded it very well to the tower. At the base of the tower we installed good surge suppressors, solidly grounded, on both lines coming from the AP and the backhaul. We replaced the switch with a new one since the old one was damaged.

I really hope this is the end of this mess. Now to go address the issue at all of the other towers…..

Updated 3/24/08

I had sent Brian Magnuson from Last Mile Gear an email regarding what we had found and asking about grounding the case given the static issues. Brian was kind enough to call me back this morning and we had a long discussion regarding grounding practices for the radios. Brian indicated he had not seen too many other cases where the Cyclone appeared to be collecting a large static charge. He did suggest grounding the AP using the shield in the STP CAT5 cable rather than leaving it open at one end (the common practice for shields). Brian also suggested putting Motorola 600SS’s (surge protectors) at both the top and bottom of the tower (unless you are using a CTM in which case put the 600SS’s only at the top).

I went back to the tower this afternoon and added the surge suppressors at the top of the tower. I had installed a ground wire to the case of the Cyclone on Friday night and decided to leave that in place as I did not have the proper shielded RJ45 connectors in order to ground the Cyclone using the drain wire. I may modify this the next time I need to climb the tower.

As for the ground lug on the Cyclone case – Brian indicated that they were not doing that yet but were going to on the units with onboard timing.   The reason for adding the ground is that the onboard-timing units have surge suppression built into the case and therefore need a ground reference.   Not providing a ground to the case on the existing units was an intentional decision to try to avoid lightning issues.  In our situation, where we are seeing strong static buildup, it may be necessary to ground the case.