My $100 UpWork job

Here’s the posting:

Milestone 1: Parsing and processing of encoded IMAP messages for data analysis

The task

Take hundreds of stored emails and process them through with Python3, ‘imapclient’ and ‘beautifulsoup’ (HTML processing) to give a statistical report of the words used in the body of the emails. There are English, Russian, Ukrainian, Greek, Chinese, Thai, Japanese, French and German language words in the data set. The output should be an HTML file ‘results.html’

There should be a regular python3 build for this. There should be unit tests (that you create) for this, and the result of those unit tests should be listed as part of the ‘nose’ usage from the command line.

There should be a README that details what pieces to install through ‘pip3’ and how to a) run the build including tests and b) how to use the solution to analyze the data set, and print out the HTML report.

The emails are stored in plain text, and the HTML content of the emails is contained within but is encoded in one of a number of standard ways - base64, utf7, utf8 - you’ll see as you look at the emails.

The Data Set

I have hundreds of stored emails hosted here - I’ve long since deleted the email account used for this. The messages are ‘notifications’ from and concerning fairly unimportant public events. It’s just a data set

The data set to analyze was posted to GitHub: See here

Choosing a programmer

I had a few short conversations with developers. Programmer1 and Programmer2 stopped responding after one or two messages, which was weird because both seemed keen. I got the feeling that they wanted a longer relationship, potentially one continuing after the work in question. UpWork can’t veto the after-work, of course. Well, not without indentured servitude.

Programmer3 was personable and keen, and knew the price was low and that it’d take them some time to complete. They were in their second year of university and facing the end of year exams. I said I was happy for them to prioritize exams and was in no particular hurry. Thus we set off on the chore. That was Sunday, March 26th, 2017. Oh, Programmer4 showed up and was similarly personable and keen, but I was happy to stick with Programmer3.


Programmer3 and I finished and I paid the $100 on June 23rd, nearly three months later. In between start and finish, there were 293 messages between us, averaging about 10 words each. The resulting solution is checked in too: see the second commit. The only difference between the two is the move of the original emails to another directory, and that was because Programmer3 didn’t continue in my Git clone, but instead started over with a new one.

The solution includes a conversion of all of the files: first from 7bit/8bit MIME or Base64 into regular HTML (extracted from the bodies of emails), then a generation of an HTML file with the analysis in it. You can see the originals, and the two gen’d files for each, inside the data folder. There’s no roll-up ‘results.html’ of the frequency of words across a number of emails, though. That was in the spec, but I can’t complain: I would never specify real work in such a crude way, and I could have nudged Programmer3 earlier.
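For the curious, the decoding step can be sketched with nothing but the standard library. This is not Programmer3’s code: the names `html_body`, `visible_text` and `TextExtractor` are made up for illustration, the sample message is invented, and `html.parser` stands in for ‘beautifulsoup’ so the sketch runs without a ‘pip3’ install.

```python
import base64
from email import message_from_string
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_body(raw_email):
    """Return the first text/html part of a stored email, decoded to str."""
    msg = message_from_string(raw_email)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # decode=True undoes base64 / quoted-printable transfer encoding
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def visible_text(html):
    """Strip tags and return the human-readable text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Invented sample: a base64-encoded HTML body with Russian text.
encoded = base64.b64encode(
    "<html><body><p>Привет, мир</p></body></html>".encode("utf-8")
).decode("ascii")
sample = (
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n" + encoded + "\n"
)
print(visible_text(html_body(sample)))
```

The charset declared in each message header decides how the bytes are turned into text, which is why the same function should cope with the utf7 and utf8 emails as well as the base64 ones, assuming the headers are honest.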

It took a few goes to get the tests done after completion of the ‘prod code’. But that’s to be expected when almost no university teaches unit testing, let alone test driven development (TDD). In the end the solution wasn’t what I really wanted. It counted words in each language (like this one), rather than straight word frequency regardless of language, but three months was enough time to draw the experiment to a close.
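What I had in mind was closer to the following sketch: one counter across all emails, blind to language, rolled up into a single report. The function names and the regex are my own assumptions, not anything from the delivered solution. One loud caveat: splitting on runs of letters works for space-separated scripts (English, Russian, Greek and so on), but Chinese, Japanese and Thai don’t put spaces between words, so they would need a proper segmenter.

```python
import re
from collections import Counter
from html import escape

# A "word" here is a run of Unicode letters (no digits, no underscores).
WORD = re.compile(r"[^\W\d_]+")


def word_frequencies(texts):
    """Count words across all texts, ignoring case and language."""
    counts = Counter()
    for text in texts:
        counts.update(w.casefold() for w in WORD.findall(text))
    return counts


def render_report(counts, top=20):
    """Render the most common words as a minimal HTML table."""
    rows = "\n".join(
        f"<tr><td>{escape(word)}</td><td>{n}</td></tr>"
        for word, n in counts.most_common(top)
    )
    return (
        '<html><head><meta charset="utf-8"></head><body><table>\n'
        "<tr><th>Word</th><th>Count</th></tr>\n"
        f"{rows}\n"
        "</table></body></html>"
    )


counts = word_frequencies(["Hello hello Привет", "привет world"])
report = render_report(counts)
```

Writing `report` to ‘results.html’ with `encoding="utf-8"` would give the roll-up the spec asked for.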

All in all, I spent about four hours managing the project from afar. Programmer3, I bet, spent between ten and twenty hours on the project. I think that for both of us it was worth it for the experience, but perhaps it was not worth it economically.

UpWork itself

The platform has usability quirks. One was at login, when it insisted on me regurgitating a secret. My favorite song, in fact, and I have no idea what that is. It took some human intervention to get me back in. Of course, favorite-based secrets are bullshit security, because some other shit site has been hacked and those answers are all stored somewhere for hackers to reuse. And anyway, what is a platform like UpWork trying to protect from hackers?

What’s more curious is the nature of delivery of the work. Programmer3 just uploaded a RAR file (complete with a .git folder within it, with 20 commits). In fact, Programmer3 uploaded a RAR file with the same name a few times. That’s fine until you click it to download it, and you end up with filename (1).rar and filename (2).rar (etc) in your ~/Downloads folder, and a new area for mistakes to happen. I’ve whined about browser downloads before. I’d have preferred a Git repo managed on UpWork that I could clone/pull as normal. And I’d also like a review system built into it like GitLab has. While I’m at it, I’d also like the messages to be .message files within a directory in that repo/branch, so that I have a lasting offline record of the job, years later.


June 29th, 2017