# IMAPdedup
*A duplicate email message remover*

IMAPdedup is a Python script (imapdedup.py) that looks for duplicate messages in a set of IMAP mailboxes and tidies up all but the first copy of any duplicates found.

To be more exact, it *marks* the second and later occurrences of a message as 'deleted' on the server.  Exactly what that does in your environment will depend on your mail server and your mail client.

Some mail clients will let you still view such messages *in situ*, so you can take a look at what's happened before 'compacting' the mailbox.  Sometimes deleted messages appear in a 'Trash' folder.  Sometimes they are hidden and can be displayed and un-deleted if wanted, until they are purged.

Whatever your system does, you will usually have the option to see what has been deleted, and to recover it if needed, using your email program, after running this script.  (If your server purges the deleted messages automatically, you may be able to prevent this with the *--no-close* option.)
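
In IMAP terms, 'marking as deleted' means setting the `\Deleted` flag on a message. The message itself is only removed when the mailbox is expunged, and closing the selected mailbox also does that. The following is a minimal sketch of the idea using the calls offered by Python's standard `imaplib`, not a copy of the script's own code: the helper names `mark_deleted` and `finish` and the `no_close` parameter are illustrative assumptions.

```python
def mark_deleted(conn, msg_nums):
    # conn is assumed to be a logged-in imaplib.IMAP4 (or IMAP4_SSL) object
    # with a mailbox already selected; msg_nums is a list of message numbers.
    if not msg_nums:
        return
    message_set = ",".join(str(n) for n in msg_nums)
    # Only sets the flag; nothing is removed from the mailbox at this point.
    conn.store(message_set, "+FLAGS", r"(\Deleted)")


def finish(conn, no_close=False):
    # Closing the selected mailbox makes the server remove flagged messages,
    # so skipping the close leaves them merely flagged (hypothetical helper).
    if not no_close:
        conn.close()
    conn.logout()
```

Skipping the close step is one way a tool could leave flagged messages in place for inspection, which is the kind of behaviour the *--no-close* option is described as offering.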

## How it works

By default, IMAPdedup simply looks for messages with a duplicate Message-ID header.  This is a string generated by email systems that should normally be unique for any given message, so unless you've got some rather unusual mailboxes, it's a pretty safe choice.  (Note that Gmail, for example, *does* sometimes have some unusual mailboxes.)
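
As a rough illustration of Message-ID matching, the sketch below parses header text with the standard `email` package and reports every message whose Message-ID has already been seen. The function name `find_duplicates`, the input shape, and the choice to skip messages that have no Message-ID are assumptions made for this example, not a description of the script's internals.

```python
from email.parser import HeaderParser


def find_duplicates(headers_by_num):
    """Return message numbers whose Message-ID appeared earlier in the list.

    headers_by_num is a list of (message_number, header_text) pairs in
    mailbox order, so the first copy of each message is the one kept.
    """
    parser = HeaderParser()
    seen = set()
    duplicates = []
    for num, header_text in headers_by_num:
        msg_id = parser.parsestr(header_text).get("Message-ID")
        if msg_id is None:
            continue  # assumption: messages without a Message-ID are left alone
        msg_id = msg_id.strip()
        if msg_id in seen:
            duplicates.append(num)
        else:
            seen.add(msg_id)
    return duplicates
```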

If you have messages *without* a Message-ID header, or you don't trust it, there's an option (`-c`) to use a checksum of the To, From, Subject, Date, Cc & Bcc fields instead.

And if you want to include the Message-ID, if it exists, in this checksum, add the `-m` option as well. I'd recommend this in general, because some (foolish) automated systems can send you multiple messages within a single second, with different contents but the same headers. (e.g. "Subject: Your review has just been published!")
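
The header-checksum idea can be sketched as follows. The field list comes from the description above; the hash algorithm (SHA-256 here), the way the fields are joined, and the names `CHECKSUM_FIELDS` and `header_checksum` are assumptions for illustration and may differ from what the script actually does.

```python
import hashlib
from email.parser import HeaderParser

CHECKSUM_FIELDS = ("To", "From", "Subject", "Date", "Cc", "Bcc")


def header_checksum(header_text, use_id_in_checksum=False):
    """Hash the selected headers; optionally mix in the Message-ID as well."""
    msg = HeaderParser().parsestr(header_text)
    fields = CHECKSUM_FIELDS
    if use_id_in_checksum:
        fields = fields + ("Message-ID",)
    # Missing headers contribute an empty value, so they still hash consistently.
    parts = ["%s:%s" % (name, msg.get(name, "")) for name in fields]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

Two messages sent within the same second with identical headers hash to the same value unless the Message-ID is included, which is the situation the `-m` recommendation addresses.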

## Trying it out

If you want to experiment, create a new folder on your mail server, and copy some messages into it from your inbox.  Then copy some of them in a second time.  And maybe a third. That should give you a safe place to play!

## Simple use

*Note: IMAPdedup expects to run under Python 2. Some work has been done to make it Python-3-compatible, but your mileage may vary!*

You can list the full syntax by running

    ./imapdedup.py -h

but the key options are described below.  You will of course need the address of your IMAP email server, and your username on that server.

Try starting with something harmless like:

    ./imapdedup.py -s imap.myisp.com -u myuserid -x -l

which prompts you for your password and then lists the mailboxes on the server. You can then use the mailbox names it returns when running other commands. (The `-x` option specifies that the connection should use SSL, which is generally the case nowadays. If this doesn't work, you can leave it out, but you should probably also complain to your email provider because they aren't providing sufficient security!)

It's worth fetching this list at least once, because different mail servers structure their folders differently: mine thinks of all the folders as being 'within' the inbox, for example, so they're called things like 'INBOX.Drafts' and 'INBOX.Sent', and those are the names I need to use when talking to the server.
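
To show why folder names vary between servers, here is a small sketch that pulls the hierarchy delimiter and mailbox name out of one line of an IMAP LIST response, in the shape returned by `imaplib`'s `list()` method. The regular expression and the name `parse_list_line` are assumptions for this example; the sketch ignores quoted-string escapes and literal-form names, which a real parser would need to handle.

```python
import re

# Matches lines such as: (\HasNoChildren) "." "INBOX.Drafts"
LIST_LINE = re.compile(r'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.*)')


def parse_list_line(line):
    """Return (delimiter, mailbox_name), or None if the line is not understood."""
    if isinstance(line, bytes):
        line = line.decode("ascii", "replace")
    match = LIST_LINE.match(line)
    if match is None:
        return None
    delim = match.group("delim")
    delim = None if delim == "NIL" else delim.strip('"')
    name = match.group("name").strip('"')
    return delim, name
```

A server that uses `.` as its delimiter reports names like `INBOX.Drafts`, while one that uses `/` would report the same folder differently, so the names from the listing are the ones to pass on the command line.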

Once you know your folder names, you can run something like

    ./imapdedup.py -s imap.myisp.com -u myuserid -x -n INBOX.Test

and the script will tell you what it would do to your *INBOX.Test* folder.

The `-n` option tells IMAPdedup that this is a 'dry run': it stops the script from *actually making* any changes. It's a good idea to run with this first unless you like living dangerously.  When you're ready, leave that out, and it will go ahead and mark your duplicate messages as deleted.

The process can take some time on large folders or slow connections, so you may want to add the `-v` option to give you more information on how it's progressing.

You can specify multiple folders to work on, and it will work through them in order and will delete, in the later folders, duplicates of messages that it has found either in those folders or in earlier ones.
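
The multi-folder behaviour amounts to carrying one 'already seen' set across all the folders, in the order given. A minimal in-memory sketch, assuming each message has already been reduced to a key (a Message-ID or a checksum); the name `plan_deletions` and the data layout are hypothetical:

```python
def plan_deletions(folders):
    """Work out which message numbers to mark as deleted in each folder.

    folders is an ordered list of (folder_name, [(message_number, key), ...]).
    The first occurrence of a key, in any folder, is kept; later ones are listed.
    """
    seen = set()
    plan = {}
    for folder_name, messages in folders:
        to_delete = []
        for num, key in messages:
            if key in seen:
                to_delete.append(num)
            else:
                seen.add(key)
        plan[folder_name] = to_delete
    return plan
```

Because the set is shared, the order of folders on the command line decides which copy survives: the copy in the earliest folder is kept.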

## Use with a config file

Michael Haggerty made some small changes to facilitate calling imapdedup from a script (e.g., from a cron job).  Instead of running it directly, create a wrapper script that can be as simple as:

    #! /usr/bin/env python

    import imapdedup

    class options:
        server = 'imap.example.com'
        port = None
        ssl = True                    # -x
        user = 'me'
        password = 'Pa$$w0rd'
        verbose = False               # -v
        dry_run = False               # -n
        use_checksum = False          # -c
        use_id_in_checksum = False    # -m
        just_list = False             # -l
        no_close = False              # --no-close
        process = False               # -P

    mboxes = [
        'INBOX',
        'Some other mailbox',
        ]

    imapdedup.process(options, mboxes)

This is nice because it doesn't require a password to be passed to the program via a command-line argument, where it could be seen by other users of the system. (This short startup file could be made read-only.)  Note that you will normally need to include in your options class ALL of the options that you might specify on the command line.
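
Since every option has to be present, a wrapper could check its options class before handing it over. The sketch below is one way to do that; the attribute list simply mirrors the wrapper example above, which is an assumption, because the exact set the script reads may vary between versions. The names `REQUIRED_OPTIONS` and `missing_options` are hypothetical.

```python
# Attribute names copied from the wrapper example; adjust to match your copy
# of imapdedup.py if it reads a different set of options.
REQUIRED_OPTIONS = (
    "server", "port", "ssl", "user", "password", "verbose", "dry_run",
    "use_checksum", "use_id_in_checksum", "just_list", "no_close", "process",
)


def missing_options(options, required=REQUIRED_OPTIONS):
    """Return the names in `required` that the options object does not define."""
    return [name for name in required if not hasattr(options, name)]
```

Calling `missing_options(options)` just before `imapdedup.process(options, mboxes)` and stopping if the list is non-empty turns a confusing attribute error part-way through a run into an immediate, readable message.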

## Accessing the IMAP mailboxes via a local server

The `-P` option allows you to access the mailboxes via stdin/stdout to a subprocess, rather than over the network.
Dovecot can be run in this mode, for example:

    /usr/lib/dovecot/imap -o mail_location=maildir:~/.mbsync/mails

Typically you might wrap such a command in a script, and then specify the script as the argument of the `-P` option.
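
Such a wrapper could itself be a few lines of Python. The sketch below builds the Dovecot command from the example above and replaces the wrapper process with it, so that stdin and stdout pass straight through to whoever launched the wrapper. The names `build_argv` and `main` are hypothetical, and the binary path and mail location are simply the ones from the example; adjust both for your system.

```python
import os

DOVECOT_IMAP = "/usr/lib/dovecot/imap"  # path taken from the example above


def build_argv(mail_location):
    # The location string is passed through verbatim, without shell expansion.
    return [DOVECOT_IMAP, "-o", "mail_location=" + mail_location]


def main():
    argv = build_argv("maildir:~/.mbsync/mails")
    # execv never returns on success: this process becomes the IMAP server.
    os.execv(argv[0], argv)
```

To use it as a script, call `main()` at the end of the file, make the file executable, and give its path as the argument of the `-P` option.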

## Acknowledgements etc

For more information, please see [the page on Quentin's site](http://qandr.org/quentin/software/imapdedup).

This software is released under the terms of the GPL v2.  See the included LICENCE.TXT for details.

It comes with no warranties, express or implied; use at your own risk!

Many thanks to Liyu (Luke) Liu, Adam Horner, Michael Haggerty, 'GargaBou', Stefan Agner, Vincent Bernat and others for their contributions!

[Quentin Stafford-Fraser][1]

[1]: http://statusq.org