Monday 22 August 2016

AUTHENTICATED RECEIVED CHAIN PROTOCOL

FINAL EVALUATION


INTRODUCTION

With the adoption of stricter email authentication policies to curb spam, many MTAs are moving to stricter DMARC policies, i.e. any mail that fails the DMARC test is rejected outright (`p=reject`). This has helped curb spam, but it has also created problems for intermediate mail handlers like mailing lists. The basic philosophy underlying these authentication checks is to measure how far the mail has been altered between its original form and the point of delivery to the recipient. Mailing lists inherently modify the mail before broadcasting it onward to the members, by adding list-specific headers and footers, altering the Subject, etc. These changes are necessary for identifying the mail with the mailing list. Until now, mailing lists had no way of letting the receiving MTAs know about their handling of the message. This led to a high probability of these mails being flagged as suspicious, and in some strict cases as spam. A solution for this was recently drafted as the IETF ARC protocol.

From Mailman's point of view, ARC is a protocol that can help mitigate denial of service to subscribed addresses at Yahoo!, AOL and other MTAs that have a `p=reject` DMARC policy. It will also help reduce the ambiguity in decisions for other MTAs with a more lenient policy. Basically, setting up ARC would allow Mailman to securely register its handling of the message, thus allowing a (non-binding) trust mechanism to be set up between Mailman and the involved MTAs, and hence reducing the denial of service.


WORK DETAILS

The project involved working on two repositories.
The `arc` module was newly created for Mailman, whereas the work on the `dkimpy` (or sign-message) module mostly involved refactoring the existing module.

1. The `dkimpy` module - The `dkimpy` package, based on work by Greg Hewgill (originally called "pydkim" on PyPI) and then substantially augmented by Scott Kitterman, provides the functions for DKIM signing of a mail. As a part of the project, we refactored the original module to add ARC features with minimal API changes. The eventual aim is to contribute this code back upstream by sending a PR to the original author.
The commits for the work done can be found here -

https://gitlab.com/adityadivekar/sign-message/commits/master

2. The `arc` module - Since the ARC protocol involves signing in three stages, a separate module was required to implement the signing by making function calls to the `dkimpy` module. This module has been developed with the intention of merging it into the Mailman core to give it ARC signing ability. The required test suite for the ARC protocol has also been implemented in this module.
Currently, we are in the process of merging this module into Mailman.
The commits for the work done can be found here -

https://gitlab.com/adityadivekar/arc/commits/master

Friday 29 July 2016

Working on the new implementation.


Hi!
So after coming back from the conference, I had a plan in mind for how things were to be executed.
The two main tasks were -
1. Refactoring the dkimpy API for adding ARC support.
2. Developing a standalone ARC module for integration into Mailman.

(Here "package" refers to the `dkimpy` package.)

Since the package was a dependency for the ARC module, it was necessary to proceed in order. After some thinking I decided to add an Enum class to the package listing the different types of possible signatures. A member of this Enum could then be passed as an argument to the pre-existing signing/verifying methods. The parameter was given the DKIM signature type as its default value. This was necessary since adding a parameter without a default would break all the existing tests and function calls. Also, we wanted to keep the API change minimal, so as not to break existing setups after the intended upstream merge.
Now, the ARC set of fields and the DKIM field vary slightly in terms of which "tag - value" pairs are included. However, the underlying construction remains broadly the same.
So, based on the signature type parameter, which could be one of dkim, ams or aseal, different tags were added to the signature using conditional statements.
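
As a rough illustration of the idea above, here is a minimal sketch of a signature-type Enum passed as a parameter that defaults to DKIM. The names `SignatureType` and `tag_names` are hypothetical and are not the ones used in the refactored `dkimpy`; the tag lists are a simplified reading of the DKIM and ARC specifications, not the package's actual output.

```python
from enum import Enum


class SignatureType(Enum):
    """Hypothetical enum; the refactored package may name these differently."""
    DKIM = "dkim"
    AMS = "ams"      # ARC Message Signature
    ASEAL = "aseal"  # ARC Seal


def tag_names(sig_type=SignatureType.DKIM):
    """Return the tag names to emit for the given signature type.

    The default keeps existing DKIM-only callers working unchanged.
    The tag lists are simplified assumptions for illustration only.
    """
    if sig_type is SignatureType.DKIM:
        return ["v", "a", "c", "d", "s", "h", "bh", "b"]
    if sig_type is SignatureType.AMS:
        # Like DKIM, but with an instance tag instead of a version tag.
        return ["i", "a", "c", "d", "s", "h", "bh", "b"]
    if sig_type is SignatureType.ASEAL:
        # No header list and no body hash; carries the chain validation status.
        return ["i", "cv", "a", "d", "s", "b"]
    raise ValueError("unknown signature type: %r" % (sig_type,))


print(tag_names())                     # existing call sites need no changes
print(tag_names(SignatureType.ASEAL))  # ARC callers opt in explicitly
```

Because the new parameter has a default, old call sites keep their behaviour, which is the property that keeps the API change small.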
One of the hurdles I faced here was verifying the ARC Seal. The ARC Seal does not list the headers it hashes in a header (`h=`) tag, so the list has to be constructed on the receiver side from the ARC Seal's instance value. My (flawed :/) understanding was that the signing/verifying functions, given a list of headers, would sign/verify them bottom up, which made me construct the headers so -

    # (incorrect order, kept for illustration)
    for i in range(1, idx):
        headers.append(b'arc-seal')
        headers.append(b'arc-message-signature')
        headers.append(b'arc-authentication-results')
    headers.append(b'arc-message-signature')
    headers.append(b'arc-authentication-results')

However, the draft clearly specified the format with the places of `arc-authentication-results` and `arc-seal` exchanged relative to the above logic. I was aware of this, but had tweaked the order because, according to my understanding, the signing was bottom up, so I assumed I was right. This was one problem that I had no clue how to approach! After spending quite some time thinking, an idea finally occurred to me. As a temporary modification, I added the header tag to the ARC Seal and manually inspected the generated signature to see if the order was right. And voila, I was wrong :P
The correct order -

    # one complete ARC set for every earlier instance
    for i in range(1, idx):
        headers.append(b'arc-authentication-results')
        headers.append(b'arc-message-signature')
        headers.append(b'arc-seal')
    # current instance: its own seal is not part of what is hashed
    headers.append(b'arc-authentication-results')
    headers.append(b'arc-message-signature')

I changed the order, and the entire test message data (in the ARC module, point 2) had to be changed too, since it had implicitly relied on the older signing order. With that, the package refactoring was completed. Yay!
Working on the second package was fairly straightforward. I had to write scripts that parse the message and make the calls to the package for the verification and subsequent signing of the messages. The only tricky area was correctly verifying the ARC Seal. The exact algorithm is outlined in the draft, but it still wasn't a straightforward implementation and took some time. Tests were added and the final pieces were filled in, like the setup script and a simple Python script that processes a message through the entire pipeline and prints the results on the terminal for the user's convenience. And that completed the ARC module for Mailman!
Now, there are currently two things scheduled in the pipeline -
1. Sending our refactored dkimpy package MR to the upstream original author for his review and possible merge.
2. Participating in the next interop for live testing of our package in mail exchanges over the internet with other similar implementations.
I'll keep updating the blog with the details of the above.
Thanks for reading!
Aditya

Wednesday 29 June 2016

PYCON 2016, PORTLAND.


Yes, that's right! I attended PyCon in Portland, Oregon as a member of the GNU Mailman group.

This was my first PyCon, and it was one of the best things that happened during GSoC. This PyCon is the largest annual gathering of Pythonistas from around the world :D
Attending PyCon was a very good learning experience. I got to meet people from all corners of the Python open source community and made a few new friends too. I also got the opportunity to meet the entire Mailman group at PyCon, who never made me feel like a newcomer even though I'd only communicated with them by mail before.

I was a bit nervous to meet Stephen for the first time! After all he was my main mentor ;)
But then Stephen turned out to be the coolest mentor I could've had. We got the opportunity to discuss the GSoC project at length and decide upon a new plan of work. Before attending PyCon I had already done a major part of the project work, as I had started working in the community bonding period. But after our discussion, it was decided that the project design had to be changed, since the implementation I'd developed suffered from heavy code duplication. (I wonder why I couldn't think of it before.)
The new project would involve refactoring the existing `dkimpy` package to add ARC support to it, and creating a separate ARC module for use in Mailman, built on the refactored package.
One of the biggest upsides to this was that we'd be able to contribute back upstream to the original `dkimpy` package with the added ARC features. The refactoring also had to involve minimal API changes, so that the upstream merge would be easy.
The discussion not only helped greatly in clearly setting the project's workflow, but also gave me a lot of insight into SPF, DKIM, trust boundaries, MTAs and other concepts involved in mail transfer, thanks to Stephen's practical knowledge.

Other than that, I made a few new friends at PyCon: Kushal Das from Red Hat, Arc Riley from PSF and Dan Callahan from Firefox, in addition to the Mailman group, who were all great company. I hope I get to see them more.

The thing that interested me the most was how friendly the atmosphere was. It was "open" in literally all senses. We were free to approach anyone during the sprints and ask them about their work, and, if interested, sit right there and start hacking on it. Right next to the project owners! Everyone was really open to conversations and as helpful as they could be.
The sprints were held during the last 4 days of PyCon. In the sprints you are allowed to work on any Python project of your interest among the ones sprinting. There were many orgs like Mailman, CPython, Django, Fedora, Firefox and many more at the sprints, and you were free to work with any of them. I worked on Mailman during the sprints with the rest of the group, towards the next Mailman 3.1 release. The release didn't come through then for various reasons, but it will soon :)

For all this I have to thank Stephen and Barry, who made every effort to secure the funding for me to travel to the United States. It wouldn't have been possible without them, and their friends at DMARC who agreed to fund my conference trip.

Aside from the technical details, the stay in Portland was really enjoyable. I was in time to see the Portland Roots festival, which is a food justice movement with a plethora of food vendors offering a view of a variety of food cultures. I also visited the Portland Food Trucks, which were any eater's joy. You could literally find food trucks for any type of food from any part of the world there. Such was the wide assortment.
There were a lot of other things but they would probably suit a travel blog more :P

All in all, the experience was truly a great one, and I will certainly try to attend PyCon in Portland next year too :)

Thanks for reading.

Aditya

Friday 24 June 2016

Week 2


Hi! Thanks to all who went over my previous post.

So the coding period has started and we are two weeks into the schedule. 

The code I had developed till now was designed such that each individual module of the ARC set used its own set of signing and verifying functions. That meant heavy code duplication, since the basic underlying process remains the same.
I was actually ahead of the milestones and ended up writing (or refactoring!) a huge chunk of the code, including the AAR, AMS and AS signing and verifying! Basically the code worked, but was highly inefficient.

So after a long discussion with Stephen, we decided to give the entire project a makeover.
Stephen suggested that instead of developing code with a local objective that would work only for us, we could refactor the `dkimpy` package so that the new code could be contributed upstream and made available to other users too. The `dkimpy` package could be refactored to provide ARC capability in addition to DKIM, using the same API with the minimum possible changes. We could then contribute this back upstream and do our good bit!

The new project would consist of two separate modules - 

1. The refactored dkimpy module with ARC support.
   https://gitlab.com/adityadivekar/sign-message

2. The ARC module, consisting of code that takes the email from Mailman and returns the ARC-signed email.
   https://gitlab.com/adityadivekar/arc

For now I will start working on the refactoring, and get back soon with the next two weeks' updates.

Thanks for reading!

Wednesday 18 May 2016

The start of my GSoC journey!


Hi!

This is the first of many blog posts to come as a part of my GSoC journey with GNU Mailman. Today I'll try to explain the project, its purpose and why it's important!

The title of the project is *ARC Protocol Implementation in GNU Mailman*.


With the adoption of stricter email authentication policies to curb spam, many MTAs are moving to stricter DMARC policies, i.e. any mail that fails the DMARC test is rejected outright (`p=reject`). This has helped curb spam, but it has also created problems for intermediate mail handlers like mailing lists. The basic philosophy underlying these authentication checks is to measure how far the mail has been altered between its original form and the point of delivery to the recipient. Mailing lists inherently modify the mail before broadcasting it onwards to the members, by adding list-specific headers and footers, altering the Subject, etc. These changes are necessary for identifying the mail with the mailing list. Until now, mailing lists had no way of letting the receiving MTAs know about their handling of the message. This led to a high probability of these mails being flagged as suspicious, and in some strict cases as spam. A solution for this was recently drafted as the IETF ARC protocol.
From Mailman's point of view, ARC is a protocol that can help mitigate denial of service to subscribed addresses at Yahoo!, AOL and other MTAs that have a `p=reject` DMARC policy. It will also help reduce the ambiguity in decisions for other MTAs with a more lenient policy. Basically, setting up ARC would allow Mailman to securely register its handling of the message, thus allowing a (non-binding) trust mechanism to be set up between Mailman and the involved MTAs, and hence reducing the denial of service.

Mailing lists serve a variety of purposes, but are perhaps the most important medium of communication for the open source community. The importance of their services cannot be stressed enough, which is why the ARC protocol matters: it lets them function within a trust framework!

The ARC protocol involves the addition of three new headers to the existing mail:
1. ARC Seal
2. ARC Message Signature
3. ARC Authentication Results
(prepended bottom up, so the ARC Seal ends up on top)
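
To make "prepended bottom up" concrete, here is a small sketch that models a message's headers as a plain list and adds one ARC set on top of it. The helper name `prepend_arc_set` and the placeholder values are made up for illustration; real header values carry signatures and authentication results, and this is not how Mailman or `dkimpy` represents messages.

```python
def prepend_arc_set(headers, instance):
    """Return a new header list with one ARC set added on top.

    headers: list of (name, value) tuples, topmost header first.
    The values below are placeholders, not real signatures.
    """
    # Order in which the three headers are added to the message.
    arc_set = [
        ("ARC-Authentication-Results", "i=%d; (placeholder)" % instance),
        ("ARC-Message-Signature", "i=%d; (placeholder)" % instance),
        ("ARC-Seal", "i=%d; (placeholder)" % instance),
    ]
    new_headers = list(headers)
    for field in arc_set:
        # Each header goes on top of the previous one.
        new_headers.insert(0, field)
    return new_headers


message_headers = [("From", "author@example.com"), ("Subject", "Hello")]
stamped = prepend_arc_set(message_headers, 1)
for name, value in stamped:
    print("%s: %s" % (name, value))
```

Read from the top of the message, the set appears as ARC Seal, ARC Message Signature, ARC Authentication Results, matching the numbered list above.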

The draft explaining the protocol can be found here - ARC Draft

So that's the project I'll be working on this summer!
I have deliberately left out any implementation details, as they will be shared as the project progresses through its milestones.
Still, if you come across this blog and find the project interesting, feel free to hit me up if you have any questions :)

Thanks!
Aditya Divekar