Gmail downloader



Posted by: stak
Posted on: 2009-10-15 23:04:13

If anybody else would like to download the contents of their Gmailbox into nice little RFC822 message files, here's a quick-and-dirty (read: no error detection or handling) Java program to do it. Requires Java 1.6 for the console stuff (java.io.Console), but can be ported to 1.5 or 1.4 in a pinch.

import java.util.*;
import java.io.*;
import java.net.*;
import javax.net.*;
import javax.net.ssl.*;

public class GmailDownloader {
    public static void main( String[] args ) throws Exception {
        int start = 1;
        int end = -1;
        if (args.length > 0) {
            start = Integer.parseInt( args[0] );
            if (args.length > 1) {
                end = Integer.parseInt( args[1] );
            }
        }

        Console console = System.console();
        String username = console.readLine( "Enter username: " );
        String password = new String( console.readPassword( "Enter password: " ) );

        SocketFactory sf = SSLSocketFactory.getDefault();
        Socket socket = sf.createSocket( "imap.gmail.com", 993 );
        InputStream in = socket.getInputStream();
        BufferedReader br = new BufferedReader( new InputStreamReader( in ) );

        OutputStream out = socket.getOutputStream();
        PrintWriter pw = new PrintWriter( out );
        br.readLine();

        pw.print( "A LOGIN " + username + " " + password + "\r\n" );
        pw.flush();
        br.readLine();

        pw.print( "B SELECT \"[Gmail]/All Mail\"\r\n" );
        pw.flush();
        for (int i = 0; i < 3; i++) br.readLine();
        String numMsgs = br.readLine();
        for (int i = 0; i < 3; i++) br.readLine();
        if (end < 0) {
            StringTokenizer st = new StringTokenizer( numMsgs );
            st.nextToken();
            end = Integer.parseInt( st.nextToken() );
            System.out.println( "Found " + end + " messages" );
        }

        System.out.println( "Downloading messages from [" + start + "] to [" + end + "]" );
        for (int i = start; i <= end; i++) {
            System.out.println( "Downloading message " + i );
            pw.print( "ZZ FETCH " + i + " RFC822\r\n" );
            pw.flush();
            br.readLine();
            String fn = i + ".msg";
            while (fn.length() < 12) {
                fn = "0" + fn;
            }
            PrintWriter file = new PrintWriter( new File( fn ) );
            outer: while (true) {
                String s = br.readLine();
                while (s.endsWith( ")" )) {
                    String t = br.readLine();
                    if (t.startsWith( "ZZ OK" )) {
                        if (s.length() > 1) {
                            file.println( s.substring( 0, s.length() - 1 ) );
                        }
                        break outer;
                    }
                    file.println( s );
                    s = t;
                }
                file.println( s );
            }
            file.close();
        }

        pw.print( "C LOGOUT\r\n" );
        pw.flush();

        in.close();
        out.close();
        socket.close();
    }
}


The above code is in the public domain, so feel free to do whatever with it. To use:

javac GmailDownloader.java
java GmailDownloader


If it dies partway through and you want to resume, just add the message number you want to start at as a parameter:

java GmailDownloader 42
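
Since the program above does no error checking, a failed login or an unexpected number of untagged lines after SELECT will quietly derail it. Here is a minimal sketch of one way to tighten that up: read until the tagged completion line instead of skipping a fixed number of lines, check that line for OK, and find the EXISTS line by its content. The class and method names (ImapResponses, readUntilTagged, isOk, existsCount) are hypothetical and not part of the original program, and the canned server response in main is made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GmailDownloader.
class ImapResponses {
    // Collects lines up to and including the one that starts with "<tag> ".
    static List<String> readUntilTagged(BufferedReader br, String tag) throws IOException {
        List<String> lines = new ArrayList<String>();
        String line;
        while ((line = br.readLine()) != null) {
            lines.add(line);
            if (line.startsWith(tag + " ")) {
                return lines;
            }
        }
        throw new IOException("Connection closed before tagged response for " + tag);
    }

    // True if the last collected line is "<tag> OK ...".
    static boolean isOk(List<String> lines, String tag) {
        String last = lines.get(lines.size() - 1);
        return last.startsWith(tag + " OK");
    }

    // Returns n from an untagged "* n EXISTS" line, or -1 if there is none.
    static int existsCount(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.trim().split(" ");
            if (parts.length == 3 && parts[0].equals("*") && parts[2].equals("EXISTS")) {
                return Integer.parseInt(parts[1]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Made-up SELECT response, standing in for the socket's reader.
        String canned = "* FLAGS (\\Answered \\Seen)\r\n"
                + "* 172 EXISTS\r\n"
                + "* 0 RECENT\r\n"
                + "B OK [READ-WRITE] done\r\n";
        BufferedReader br = new BufferedReader(new StringReader(canned));
        List<String> lines = readUntilTagged(br, "B");
        System.out.println("ok=" + isOk(lines, "B") + " messages=" + existsCount(lines));
    }
}
```

This only suits commands whose responses carry no message data (LOGIN, SELECT, LOGOUT), because a line inside a fetched message could happen to start with the tag.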

Posted by Eric at 2009-10-16 08:28:04
Am I just being bloody ignorant, or is there something wrong with the way the POP3 or IMAP interfaces do things?
Posted by stak at 2009-10-16 09:03:04
What do you think is wrong with the interfaces?
Posted by stak at 2009-10-17 19:57:30
Update: made a modification to the end-of-message detection (the part with the close-paren) since some messages had them tacked on the end of the last line instead of on a new line by themselves. This fixes a problem where the downloader would get "stuck" on those messages.
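
An alternative to watching for the close-paren, sketched under the assumption that the server sends the message body as an IMAP literal (a first response line ending in {N}, followed by exactly N bytes, as described in RFC 3501): read the byte count, then read exactly that many bytes. The names here (LiteralReader, literalSize, readExactly, readFetchedMessage) are hypothetical and not from the original program. The sketch reads from the raw InputStream rather than a BufferedReader, because the count is in bytes and a reader would buffer ahead past the literal.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of GmailDownloader.
class LiteralReader {
    // Reads one line (without its CRLF or LF); returns null at end of stream.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                break;
            }
            buf.write(b);
        }
        if (b == -1 && buf.size() == 0) {
            return null;
        }
        String line = buf.toString("ISO-8859-1");
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    // Extracts N from a line ending in "{N}", or returns -1 if there is no literal.
    static int literalSize(String line) {
        int open = line.lastIndexOf('{');
        if (open < 0 || !line.endsWith("}")) {
            return -1;
        }
        try {
            return Integer.parseInt(line.substring(open + 1, line.length() - 1));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Reads exactly n bytes, looping because read() may return fewer.
    static byte[] readExactly(InputStream in, int n) throws IOException {
        byte[] data = new byte[n];
        int off = 0;
        while (off < n) {
            int r = in.read(data, off, n - off);
            if (r < 0) {
                throw new IOException("Stream ended after " + off + " of " + n + " bytes");
            }
            off += r;
        }
        return data;
    }

    // Reads one FETCH response and returns the raw message bytes.
    static byte[] readFetchedMessage(InputStream in, String tag) throws IOException {
        String line = readLine(in);
        int size = (line == null) ? -1 : literalSize(line);
        if (size < 0) {
            throw new IOException("Expected a literal, got: " + line);
        }
        byte[] message = readExactly(in, size);
        // Skip the closing ")" and anything else up to the tagged completion line.
        while ((line = readLine(in)) != null && !line.startsWith(tag + " ")) {
            // nothing to keep
        }
        return message;
    }
}
```

With this approach a message line that happens to end in ")" needs no special handling, since the end of the message is determined by the byte count alone. Wrapping the socket's stream in a BufferedInputStream would avoid the cost of byte-at-a-time reads in readLine.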
Posted by stak at 2010-12-20 21:25:53
For anybody that stumbles across this in the future: the code (much improved and generified) now resides on GitHub.
Posted by Varun at 2010-12-21 11:10:34
Now that is a nicely efficient bit of coding. I see why they pay you the big bucks.

Of course, my fear isn't that Google will go belly up, but rather that Yahoo will destroy itself. Naturally... there's no way to get data out of Yahoo without paying them.
Posted by stak at 2010-12-21 18:50:49
Whatchoo talkin' 'bout? I can connect and view my Yahoo! mailbox using imap.mail.yahoo.com just fine. Email me if you want to download your Yahoo mail and can't get it working.
Posted by Varun at 2010-12-22 17:53:55
Seriously?

IMAP doesn't seem to work here, but it's possible work has a proxy that's acting up. I'll check at home, but I'm almost completely positive I tried exactly that server a month or so ago and got "Bad response".

Hm.

(c) Kartikaya Gupta, 2004-2024. User comments owned by their respective posters. All rights reserved.