Tuesday, April 08, 2008

The internet is broken

My passage of a small milestone today points to a not-so-small problem with the internet at large. That image over there isn't a joke: I really passed the 50,000-message mark in my Gmail spam box this morning. Gmail deletes spam messages after 30 days, so that means that I have received 50,000 new spam messages since March 10.

Let's look at a few stats on my spam:
  • Spam conversations in the last 30 days: 50,000
  • Other conversations in the same timeframe: 745
  • Percent of conversations that are spam: 98.53%
  • Percent of conversations that are legit: 1.47%
  • Average time between spam messages: 51.8 seconds
I should note that my email setup is a little different than most people's, since I relay mail through brucec.net before forwarding it on to Gmail. However, even with my somewhat unique configuration, 50,000 spam messages in 30 days is ridiculous!
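
For anyone who wants to check the arithmetic, here is a minimal Python sketch that reproduces the stats above from the two raw counts. The variable names are purely illustrative, and it assumes a 30-day window to match Gmail's spam retention period.

```python
# Raw counts taken from the stats above.
spam_conversations = 50_000
other_conversations = 745
window_days = 30  # assumption: Gmail's 30-day spam retention window

total = spam_conversations + other_conversations

# Share of all conversations that are spam vs. legitimate, in percent.
spam_pct = 100 * spam_conversations / total
legit_pct = 100 * other_conversations / total

# Average gap between spam messages, in seconds.
seconds_in_window = window_days * 24 * 60 * 60
seconds_between_spam = seconds_in_window / spam_conversations

print(f"Spam:  {spam_pct:.2f}%")
print(f"Legit: {legit_pct:.2f}%")
print(f"One spam every {seconds_between_spam:.1f} seconds")
```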

I'm glad that Google has such an amazing spam filtering system, or else email would be completely unusable for me. (As an aside, I'm taking a class on machine learning right now, and it's cool that Google uses advanced machine learning techniques to implement their spam filtering. All of that nerdy computer science stuff I'm learning about is quite useful in real life.)

How did we get to this point? After all, back when I started using the internet in 1995, spam was almost unheard of; today, I have received 37 spam messages just in the time since I started writing this blog post. It all boils down to two things: motives and methods.

Motives

For spammers to be successful, they need a market of people who will buy their products. If no one ever clicked on a link in a spam message, it seems pretty obvious that it wouldn't be worth it for spammers to continue. However, there is unfortunately some small, clueless proportion of the population who are in the market for "Male enhancement", "New collection of Swi$$ R0lex", or "Mic.rosoft Win.dows Vista U1timate" and think that responding to spam is a good way to get what they want. I have yet to meet one of these clueless people in real life, but they must be out there, or I wouldn't be receiving spam. It's simple economics.

So what has changed since 1995?
  • More people use the internet now, so it's a larger, more attractive target to potential spammers.
  • The average internet user is less computer-savvy (and maybe less everything-savvy) now than in 1995.
Back in 1995, the only people who used the internet were scientists, professionals, and nerdy early adopters like me. Everyone who was on the internet spoke computerese pretty fluently. Also, most of them were pretty smart people who recognized that there were better ways to lose weight or get a discount mortgage than by responding to some spammer scammer's email.

These days, your grandma, Joe Sixpack, my baby brother, and everyone else in the developed world is on the internet. Now, instead of marketing to a group of scientists and engineers, spammers are marketing to a pretty broad cross section of the population. Both the size and the gullibility of the potential market have increased.

Let's do a little cost-benefit analysis to look at the economic incentives for sending spam. Let's estimate how much the bandwidth to send all of that spam might cost. We'll focus only on the top 3 web mail providers (Yahoo!, Microsoft, and Google), ignoring the vast number of other email accounts out there.
  • There are about 600 million web mail accounts in the world.
  • If every one of them receives spam at the same rate that I do (which I admit is probably a little high), that's about 30 trillion messages per month, or an average of about 11 million spam messages per second.
  • If each spam is about 1KB in size (seems about average, looking through my spam box), then that's about 85 gigabits of spam per second.
  • A quick internet search found a site that lists wholesale internet connection lease rates. They charge $15,000/month for a 622Mbps OC-12 line.
  • Even if we assume a 50% discount on line lease rates since the spammers would be using such massive amounts of bandwidth, that still comes out to 138 OC-12 lines, which would cost $1 million/month.
Let's estimate that it costs another $1 million/month to maintain the network equipment and servers to send all of that spam. Finally, let's assume that spammers make only $1 for every message that someone responds to. What response rate do they need to break even?

1 message in 15 million.

In other words, since 15 million messages is what 300 people receive at 50,000 messages each, if just one person in 300 responds to one of their 50,000 spam messages a month, that provides an economic incentive for spammers to keep spamming.
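
The back-of-the-envelope estimate above can be sketched in a few lines of Python. This is only a rough model under the same assumptions as the list (600 million accounts, 50,000 spams per account per month, 1KB per message, a $15,000/month OC-12 at a 50% discount, another $1 million/month for equipment, and $1 of revenue per response). The function and parameter names are made up for illustration, and it additionally assumes a 30-day month and 1,000 bytes per KB.

```python
import math

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month


def spam_break_even(
    accounts=600_000_000,          # web mail accounts
    spam_per_account=50_000,       # spam messages per account per month
    bytes_per_message=1_000,       # assumption: 1KB = 1,000 bytes
    oc12_bits_per_second=622e6,    # capacity of one OC-12 line
    oc12_monthly_cost=15_000,      # list price per line, in dollars
    discount=0.5,                  # assumed bulk discount
    equipment_monthly_cost=1_000_000,
    revenue_per_response=1.0,
):
    """Return the rough monthly figures behind the break-even estimate."""
    messages_per_month = accounts * spam_per_account
    messages_per_second = messages_per_month / SECONDS_PER_MONTH
    bits_per_second = messages_per_second * bytes_per_message * 8

    # Whole lines are leased, so round up.
    lines_needed = math.ceil(bits_per_second / oc12_bits_per_second)
    bandwidth_cost = lines_needed * oc12_monthly_cost * (1 - discount)
    total_cost = bandwidth_cost + equipment_monthly_cost

    # Responses needed to cover costs, and how many messages that is per response.
    responses_needed = total_cost / revenue_per_response
    messages_per_response = messages_per_month / responses_needed

    return {
        "messages_per_month": messages_per_month,
        "messages_per_second": messages_per_second,
        "gigabits_per_second": bits_per_second / 1e9,
        "lines_needed": lines_needed,
        "total_cost": total_cost,
        "messages_per_response": messages_per_response,
    }


result = spam_break_even()
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

With these assumptions the sketch lands near the estimates above rather than exactly on them (roughly 149 lines and a break-even of about 1 response per 14 million messages), since the result shifts with rounding and with choices like the length of a month or the exact size of a kilobyte.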

Methods

Even if spammers have an incentive to do their thing, they still need a way to do it. They won't be able to walk up to most respectable internet providers and say "I'd like to flood the internet with 30 trillion messages a month. Can I buy a little bandwidth from you?" So how do they send spam without being blocked?

They use your computer.

A large percentage of spam is sent by organized crime groups. They realize that they can't just go out and buy bandwidth to send spam; after all, who would sell to them? Besides, if they bought bandwidth from one place, it would be pretty easy to block that one place and stop the flow of spam. They would also have to buy millions of dollars of computer equipment to handle sending all of that spam.

Rather than deal with all of those problems, they use botnets to send spam. A botnet is a network of hacked computers, usually running Windows, that is under centralized control. The person controlling the network can have thousands of computers all across the internet send out spam for them. This offers a couple of advantages to spammers:
  • It's really difficult to track and block spam from so many different sources.
  • They don't have to pay for bandwidth or servers.
(In reality, many spammers pay a botnet operator for their services, but the general principles are the same.)

If you've ever downloaded and run one of those fun little games someone emailed to you, or if you've ever run a cool screensaver you found on a shady-looking web site, or if you don't run antivirus and firewall software, or if you don't keep your system patched with Windows Update, or maybe even if you do do all of these things, your computer might be part of a botnet and you don't even know it.

The situation is depressing, and there are no really good technological measures for stopping the flood of spam without drastically changing the way that internet email works. At least Google has some good filtering algorithms.

PS—Here's the current count:


I See Badgers said...

freak, I hate those spammers. They are even on blogs now with stupid "click here" comments.

jacob said...

I don't know what SPAM stands for, but I'm glad that Google places it in a special folder so I know exactly where to go every time I need to find pills. And I know the pills are quality because they put these high-tech serial code numbers in the middle of the names like V1agr@.

Jonathan said...

Weird... I haven't received a single spam e-mail since I've had my accounts with gmail. Even my yahoo account (which I've had for over a year and a half) hasn't gotten spam. Why would I receive none and you receive 50,000?

Bruce said...

Jonathan, I get so much spam because my email address is all over the internet (and has been for several years), and because I have a wildcard email address, which means that <anything>@brucec.net goes to me. Spammers don't have to guess my email address username correctly; all they have to do is get the domain name (brucec.net) correct.

PS—I'm up to 53914 spams now.