Email Syntax Check in Python

3 May 2008

Sometimes you may want to check that an email address is not syntactically invalid, i.e. it looks like a recognisable email address. I use this approach in my zetact contact form processor.

Of course, it does not mean the address actually leads anywhere, but at least you know you are dealing with an email address that could exist.

This is the code I have been using, although I have changed it from a class method to a simple function to make this post simpler.

"""Email check using regex."""
    def invalidreg(emailkey):
        """Email validation, checks for syntactically invalid email
        courtesy of Mark Nenadov.
        See
        http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65215"""
        import re
        emailregex = (
            "^.+\\@(\\[?)[a-zA-Z0-9\\-\\.]+"
            "\\.([a-zA-Z]{2,3}|[0-9]{1,3})(\\]?)$"
        )
        if len(emailkey) > 7:
            if re.match(emailregex, emailkey) is not None:
                return False
            return True
        else:
            return True
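
To get a feel for what this pattern accepts and rejects, here is a small sketch that runs the same check over a few sample addresses. The pattern is the one above, rewritten as a raw string; the sample addresses are my own assumptions, chosen for illustration.

```python
import re

# Same pattern as above, written as a single raw string.
EMAILREGEX = r"^.+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$"


def invalidreg(emailkey):
    """Return True when emailkey fails the regex check."""
    if len(emailkey) > 7:
        return re.match(EMAILREGEX, emailkey) is None
    return True  # Too short to be worth matching.


# Illustrative sample addresses, not taken from real data.
SAMPLES = [
    "warrior@example.com",    # ordinary address
    "warrior@example.info",   # four-letter top-level domain
    "[]@commandline.org.uk",  # odd local-part
    "a@b.cc",                 # too short
    "warrior.example.com",    # no @ sign
]

for address in SAMPLES:
    print("%-24s invalid=%s" % (address, invalidreg(address)))
```

The pattern only allows top-level domains of two or three letters, so an address ending in .info is rejected, while the leading .+ lets almost anything through as the local-part.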

I decided it would be more Pythonic to try to do this using the built-in string methods, rather than importing the re module and using a monster regular expression. Here was my first attempt.

"""Email checks using string methods - simple version."""
    def invalidemail(emailaddress):
        """Checks for a syntactically invalid email address."""
        try:
            emailitems = emailaddress.rsplit('@', 1)
            emailitems.extend(emailitems[1].rsplit('.', 1))
        except IndexError:
            return True

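        # Invalid if the address is too short or any part contains
        # characters other than letters, digits and dots.
        if len(emailaddress) < 7 or \
                [x for x in emailitems if not x.replace(".", "").isalnum()]:
            return True
        else:
            return False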
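
The splitting step is the heart of this version, so here is a minimal sketch of what the two rsplit calls produce. The helper name split_email is hypothetical and exists only for this illustration.

```python
def split_email(emailaddress):
    """Hypothetical helper that mirrors the two rsplit calls above."""
    # Split on the last '@': local-part and domain name.
    emailitems = emailaddress.rsplit('@', 1)
    # Split the domain name on its last dot: host and top-level domain.
    emailitems.extend(emailitems[1].rsplit('.', 1))
    return emailitems


print(split_email("warrior@example.com"))
# ['warrior', 'example.com', 'example', 'com']

print(split_email("a@b@example.co.uk"))
# ['a@b', 'example.co.uk', 'example.co', 'uk']

try:
    split_email("example.com")
except IndexError:
    # Without an '@' the first split yields a single item,
    # so emailitems[1] does not exist.
    print("no @ sign")
```

Because rsplit works from the right, only the last '@' and the last dot are treated as separators; anything odd to the left of them is left for the isalnum test to catch.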

After a bit of testing and playing with this, a friend pointed me towards the relevant RFC on restrictions of email addresses. While the standard allows the use of many different special characters, in practice email addresses have to be much stricter if you actually want people in the real world to be able to send email to you.

For example, if we allow the email address []@commandline.org.uk, will whatever receives the output of this function be able to use it? As pointed out by Jan Goyvaerts, most software won't actually be able to handle obscure special characters.

We also don't want to water down the syntax check and allow junk for the sake of theoretical but non-existent addresses.

My compromise is to allow the special symbols - _ . % + in the local-part of the email address, and - _ . in the domain name. I also sanity-check the top-level domain: it must be either a generic name or two characters long (country codes are all two letters).

Below is my current version. I have added lots of comments and white space to make it easy to read.

"""Ditch nonsense email addresses."""

    GENERIC_DOMAINS = "aero", "asia", "biz", "cat", "com", "coop", \
        "edu", "gov", "info", "int", "jobs", "mil", "mobi", "museum", \
        "name", "net", "org", "pro", "tel", "travel"

    def invalid(emailaddress, domains = GENERIC_DOMAINS):
        """Checks for a syntactically invalid email address."""

        # Email address must be at least 7 characters in total.
        if len(emailaddress) < 7:
            return True # Address too short.

        # Split up email address into parts.
        try:
            localpart, domainname = emailaddress.rsplit('@', 1)
            host, toplevel = domainname.rsplit('.', 1)
        except ValueError:
            return True # Address does not have enough parts.

        # Check for Country code or Generic Domain.
        if len(toplevel) != 2 and toplevel not in domains:
            return True # Not a domain name.

        for i in '-_.%+.':
            localpart = localpart.replace(i, "")
        for i in '-_.':
            host = host.replace(i, "")

        if localpart.isalnum() and host.isalnum():
            return False # Email address is fine.
        else:
            return True # Email address has funny characters.

    # Start the ball rolling.
    if __name__ == "__main__":
        print(invalid("warrior@example.com"))
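
As a rough check of the rules above, the sketch below runs a condensed copy of the function (repeated so the block runs on its own) over a handful of addresses. The test addresses and the valid wrapper are assumptions added for illustration and are not part of the original code.

```python
GENERIC_DOMAINS = ("aero", "asia", "biz", "cat", "com", "coop",
                   "edu", "gov", "info", "int", "jobs", "mil", "mobi",
                   "museum", "name", "net", "org", "pro", "tel", "travel")


def invalid(emailaddress, domains=GENERIC_DOMAINS):
    """Condensed copy of the function above."""
    if len(emailaddress) < 7:
        return True  # Address too short.
    try:
        localpart, domainname = emailaddress.rsplit('@', 1)
        host, toplevel = domainname.rsplit('.', 1)
    except ValueError:
        return True  # Address does not have enough parts.
    if len(toplevel) != 2 and toplevel not in domains:
        return True  # Not a domain name.
    for i in '-_.%+':
        localpart = localpart.replace(i, "")
    for i in '-_.':
        host = host.replace(i, "")
    return not (localpart.isalnum() and host.isalnum())


def valid(emailaddress):
    """Hypothetical positive wrapper, for those who prefer that style."""
    return not invalid(emailaddress)


# (address, expected result of invalid()); illustrative cases only.
CASES = [
    ("warrior@example.com", False),
    ("ted@mail.example.com", False),           # subdomain
    ("ted@example.co.uk", False),              # country code domain
    ("blah+thesethingys@example.com", False),  # plus sign allowed
    ("[]@commandline.org.uk", True),           # funny characters
    ("warrior@example.invalid", True),         # unknown top-level domain
    ("warrior@examplecom", True),              # no dot in the domain
    ("a@b.cc", True),                          # too short
]

for address, expected in CASES:
    assert invalid(address) == expected, address
    print("%-32s invalid=%s" % (address, invalid(address)))
```

The list of generic domains is a snapshot, so an address under a generic top-level domain that is missing from the tuple will be rejected unless you pass your own domains argument.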

Discuss this post

1 dbr says...

There's a better, if utterly horrible to read, way of doing this using regexes.

http://emailverification.pastecode.com/?show=f76a41a8b

This way isn't too bad: it allows blah+thesethingys@example.com, which a lot of websites invalidate (which is incredibly annoying). One thing I find a little weird: a return of False means the email is valid? I would have thought if valid(mail): print "Valid email" would be a more sensible way of doing things. That way: if not valid(email): print "Wrong" # would work

Posted at 4:33 p.m. on May 3, 2008


2 Ted Hosmann says...

I like the idea in your last example to check that the domain is valid. The problem is: what about users with subdomain email addresses (ted@mail.example.com) or users with country email domains (ted@example.co.uk)?

Posted at 7:43 a.m. on May 4, 2008


3 Zeth says...

@dbr,

Checking for syntactically invalid email addresses is what the function does, so:

if invalid(emailaddress):
  #do something

Otherwise the program can just carry on, no else clause required. Maybe my programming style is just different; you can easily change it to be the other way if you want.

Ted, If you read the code more carefully or try it out, you will see that both of your examples will pass the test.

Subdomains are not a problem because I allow dots in the hostname: for i in '-_.':

Country code domains are catered for by if len(toplevel) != 2

Posted at 10:06 a.m. on May 4, 2008


4 Zeth says...

@dbr

On regular expressions, the aim of this post is to use Python built-in string methods instead of regular expressions. Your example, blah+thesethingys@example.com will be considered valid by my function as I allow the plus sign: for i in '-_.%+.'

Posted at 10:10 a.m. on May 4, 2008


5 Zeth says...

Here is dbr's regular expression (the pastebin is only temporary).

import re

monster = "(?:[a-z0-9!#$%&'*+/=?^_{|}~-]+(?:.[a-z0-9!#$%" + \
    "&'*+/=?^_{|}~-]+)*|\"(?:" + \
    "[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]" + \
    "|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9]" + \
    "(?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?" + \
    "|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.)" + \
    "{3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?" + \
    "|[a-z0-9-]*[a-z0-9]:(?:" + \
    "[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]"  + \
    "|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"

evil = re.compile(monster)

if evil.match("test+label@google.museum.au"):
    print "yay!"

Posted at 10:33 a.m. on May 4, 2008


6 John Reese says...

Just as an FYI, I get an 'XML Parsing Error: not well-formed' message in my newsreader (Liferea) for this entry. Line number 94, Column 98.

This is the first (mostly/enough) valid email checker I've seen that doesn't use a monster regex. I definitely like it.

Posted at 7:56 p.m. on May 4, 2008


7 Ted Hosmann says...

@Zeth

ARGH - I feel like such a n00b. You, my friend, are absolutely correct. Thanks for clearing that up for me.

Posted at 8:59 p.m. on May 5, 2008

