Gathering Facebook Identities from Email

Looking at the announcement of the Facebook Graph API at Facebook F8, it seems like it will be a little easier to work with the Facebook system.

In Raindrop we already have some integration with Facebook in order to identify emails coming from the Facebook system and help you filter them out. But there is a lot more that can be done to help keep your email, Facebook, and other contacts in a cohesive form.

So here’s a quick code example written in Python to grab Facebook identities from emails sent by Facebook. This could be used to gather Facebook identities and then possibly merge those with Twitter and email contacts.

First we need to import a couple things.

import email, json, urllib2

Then we’ll need to grab an email message.  I used Thunderbird to save a Facebook notification email message locally as an EML file.  I called that file ‘facebook.eml’, as you can see below.

msg = email.message_from_file(open('facebook.eml'))

Now we have a parsed email message msg object and we want to look for the X-Facebook-Notify header in the email so we can extract what happened.

fb_notify = [tuple(t.strip().split("=")) for t in \
             msg.get('X-Facebook-Notify').split(";")]

The object fb_notify contains tuples of information about the type of notification.  Here is an example of an object you might see.

[('event_wall',),
 ('eid', '14102494623'),
 ('from', '21602578'),
 ('mailid', '12bf28cG149a112G63016bG21') ]

Using fb_notify we’ll do a really simple grab of the from attribute because that is what is going to be publicly available from the Facebook Graph.

from_identity = json.load(urllib2.urlopen("http://graph.facebook.com/%s" % fb_notify[2][1]))

Here’s an example from_identity object:

{u'first_name': u'Bryan',
 u'last_name': u'Clark',
 u'id': u'21602578',
 u'name': u'Bryan Clark' }

The from_identity can be used to more clearly identify who Facebook is sending this notification on behalf of and we could try merging this Facebook identity with other identities we already have in our contacts.
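One caveat with fb_notify[2][1] is that it assumes the from field is always the third item in the header. Here is a minimal sketch that looks the field up by name instead. The function name parse_notify_header is made up for illustration, and the raw header string is an assumption reconstructed from the example tuples above, not a captured header.

```python
def parse_notify_header(value):
    """Split an X-Facebook-Notify value into (kind, fields)."""
    kind = None
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            key, _, val = part.partition("=")
            fields[key.strip()] = val.strip()
        elif kind is None:
            # A bare token such as 'event_wall' is treated as the notification type.
            kind = part
    return kind, fields


# Assumed raw header, rebuilt from the example tuples in the post.
header = "event_wall; eid=14102494623; from=21602578; mailid=12bf28cG149a112G63016bG21"
kind, fields = parse_notify_header(header)
sender_id = fields.get("from")  # None when the header carries no 'from' field
```

With this, sender_id can be passed to the Graph lookup in place of fb_notify[2][1], and a notification without a from field gives None rather than the wrong value.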

I saved all this code into this gist if you want to take a look at it in code only form with syntax highlighting.

What are Attachments?

Should links inside emails be considered attachments?  In the technical sense of an email (like RFC 2183) links wouldn’t be considered a different content type.  The question isn’t whether they are technically attachments so much as whether they should be attachment-like in the user interface.

Facebook

Facebook handles links in a message almost like an attachment-object and will do some additional meta work on the link to provide a default photo and short description for it.

In the message list view Facebook offers an icon to note that a link attachment was included in a message.

In the composition view Facebook also grabs links from inside the message and shows them separately as an attachment-like thing.  In the screenshot below the composition window grabbed the link inside my message and pulled down a description and number of photos from the site.


link detected in the composition area

This kind of meta data around a link can be really beneficial.  The presentation of the link is better than what a person would naturally write, and since the information is retrieved automatically it only takes a few extra seconds to make sure a good image and description appear.

Beyond the benefits of better presentation, there is another hot topic in the Thunderbird world: offline support.  When reading mail offline it’s far better to have more context about the link than none at all.  Even if I can’t bring up the link in an offline state, the image, description, and comment can help me recall what the link is about.

Gmail

When you’re using the rich editor for composing a message in Gmail and create a link it has some nice features for recognizing a link and helping you edit it.  Here are some screen shots of what Gmail is doing right now.

Popup indicates the link has been recognized in compose window

Editing a Link

Alternatively Editing an Email link

Pretty straightforward and simple stuff when compared to the extra things Facebook is doing.  Gmail doesn’t add meta-data about the links or make their inclusion visible in the message list.

Links as Attachments

If in Thunderbird we wanted to start treating links more like we treat attachments…

  • How do we present that to the user?
    • Both in terms of composing messages and when receiving links in messages.
  • Do we grab meta data for links sent to us?
    • assuming some kind of policy about what links we can do that with
  • And should we be making links available somehow in Firefox?
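As a starting point for those questions, here is a rough sketch of pulling the links out of a message so they could be listed alongside real attachments. The URL pattern is deliberately simple and extract_links is a made-up name. This is only an illustration using the standard email module, not how Thunderbird, Facebook, or Gmail detect links.

```python
import email
import re

# Simple pattern: http(s) URLs up to whitespace, a bracket, or a quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")


def extract_links(raw_message):
    """Return the unique http(s) links found in the text parts of a message."""
    msg = email.message_from_string(raw_message)
    links = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        text = payload.decode(part.get_content_charset() or "utf-8", "replace")
        for url in URL_PATTERN.findall(text):
            url = url.rstrip(".,;:!?")  # drop trailing sentence punctuation
            if url not in links:
                links.append(url)
    return links


sample = (
    "From: a@example.com\n"
    "Subject: photos\n"
    "\n"
    "Look at http://example.com/photos, then http://example.com/photos again.\n"
)
links = extract_links(sample)
```

Fetching a description or image for each link would be a separate step, and that is where the policy question above comes in.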

You had me at hello

I spent some time on Friday and Monday writing a script to do some analysis of the Enron Email Dataset.  I’m working on a new type of message list view for Thunderbird, well a whole new layout actually, but for the message view I wanted to have an idea of message size and content.

Email Data

It turns out that decent email data is relatively hard to come by.  Because of privacy concerns it’s nearly impossible to have access to a company’s email where you can see the full exchange between a number of different people.  Luckily the Enron dataset has become publicly available for exactly this kind of research into email problems.

The Enron dataset is broken down into directories for many of the people involved and sub-directories of their emails.

  • maildir
    • taylor-m
      • all_documents
      • archive
      • australia_trading
      • boat
      • brazil_trading
    • mclaughlin-e
      • all_documents
      • calendar
      • contacts
      • deleted_items
      • discussion_threads

The script I wrote is designed to read in the email files in a directory and analyze each message body for its content.  Then it spits out the numbers with medians and averages computed.
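The original script isn’t reproduced here, so what follows is only a minimal sketch of that kind of analysis, assuming Python 3 and one message per file as in the maildir layout above. The function names and the latin-1 decoding are my own choices for illustration.

```python
import email
import os
import statistics


def body_word_count(raw_message):
    """Count whitespace-separated words in the text/plain parts of a message."""
    msg = email.message_from_string(raw_message)
    total = 0
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload()
            if isinstance(payload, str):
                total += len(payload.split())
    return total


def folder_word_counts(folder):
    """Yield one word count per message file found under folder."""
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # latin-1 never fails to decode; an assumption, not a dataset fact.
            with open(path, encoding="latin-1") as handle:
                yield body_word_count(handle.read())


def summarize(counts):
    """Return (mean, median) for a non-empty list of word counts."""
    return statistics.mean(counts), statistics.median(counts)
```

Running it over one person’s folder would look like summarize(list(folder_word_counts("maildir/taylor-m/all_documents"))), given a local copy of the dataset at that path.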

Mail Trends

If you’ve seen Mail Trends, you know that Mihai Parparita analyzed the Enron emails for time, size, threading, and people comparisons.  If you download the code you can run it against your own email and will likely see some amazing results (someone should pull this into Thunderbird!).

However the information I was looking for was not available in the Mail Trends analysis.  Mail Trends analyzes only email headers to create relationship statistics between emails.  And while it does have the size of messages in terms of KB, I was looking for the size of messages in terms of the number of words.

You had me at Hello?

I’ve had this hypothesis or assumption that within the first 2 sentences of an email I can tell what it’s going to be about without reading the rest.  Please try this out on your own!  Read the first two sentences of any email and take a second to think if you can at least prioritize your response required for the message.
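If you’d rather try this on a pile of messages than by hand, here is a rough sketch for pulling the first two sentences out of a message body. The sentence splitting is naive on purpose, and first_sentences is a made-up helper name.

```python
import re

# Naive boundary: '.', '!' or '?' followed by whitespace. Abbreviations such
# as "e.g." will split early, which is acceptable for a rough preview.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def first_sentences(body, count=2):
    """Return the first `count` sentences of a message body as one string."""
    text = " ".join(body.split())  # collapse line breaks and runs of spaces
    sentences = SENTENCE_END.split(text)
    return " ".join(sentences[:count])


preview = first_sentences(
    "Can you review the draft today?\nIt is due Friday. Thanks for the help."
)
```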

Combine this assumption with my other assumption: that it’s more important for me to process my mail than it is for me to actually read the entirety of any message.  I know people are probably thinking, “you should read the whole message”; but in all honesty more than half the messages I get aren’t important to me at all, so reading them would just waste time.  This second part of my hypothesis stems from ideas like Inbox Zero and GTD, where processing all those “things” is the most important part of being productive.

45 is Median Number of Words Per Message

Analyzing all those emails gave a bit of a statistics problem.  On average it turned out to be something like 120 words per message.  This high average number came from a few outliers of 500+ word messages that were skewing the results towards the high end, when the numbers should really be reflecting the low end where more results were present.  So on average the median number of words per email message was 45.  That’s the average of all the medians… rounded.  Probably should have just included the standard deviation and called it quits.
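To make the difference concrete, here is a small sketch with made-up word counts. These numbers are not taken from the Enron data; they are chosen only to show how a couple of long messages pull the mean up while the per-folder medians stay low. It assumes Python 3 for the statistics module.

```python
import statistics

# Made-up word counts for three folders; not taken from the Enron data.
folders = {
    "inbox": [12, 30, 41, 45, 52, 640],
    "sent": [8, 25, 44, 50, 900],
    "archive": [20, 38, 47, 61, 75],
}

# Plain average over every message: dominated by the two long outliers.
all_counts = [n for counts in folders.values() for n in counts]
overall_mean = statistics.mean(all_counts)

# Median per folder, then the average of those medians.
folder_medians = [statistics.median(counts) for counts in folders.values()]
mean_of_medians = statistics.mean(folder_medians)

# The standard deviation shows how widely the counts are spread.
spread = statistics.stdev(all_counts)
```

Here the two outliers make overall_mean several times larger than mean_of_medians, which is the same kind of gap described above.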

I didn’t analyze the kinds of words or their length, which would be something else that’s pretty interesting to know.  A next step could be to simply analyze the number of characters per message; that could give interesting hints on how to display the message in its entirety.

Back to the Message List View

Here’s a rough breakdown of what Gmail gives me when I look at any given message.  It’s just enough to understand who this message is from and what it’s probably about.

It’s possible with the [x] checkbox and the actions menu that I could process this mail and move on.  However usually I end up opening every message to make sure there’s nothing else I should see.  I’m not sure if that’s because I really need to read the rest of the message or what.

So my question continues to be this:  Given a little bit more of the message itself, or a little bit more of the context of the message… is there a better way for me to process my emails?  I have some mockups and ideas on how I think it could be done, but they need more refining.  Will post soon.

Searching for a new find

It’s time to start looking into a new search method for Thunderbird. One of the major changes planned for Thunderbird is a new and improved search, but what does that mean?

What do we have?

First let’s look at what we have for a search system.  At a very simple level most search systems break down into two pieces, a search interface for filtering and a results interface for listing.  Thunderbird does this in a couple places.

Quick Search

The quick search entry is always at the top right of the Thunderbird window and allows people to search over the current view.  The results of a quick search fill into the current view, replacing whatever listing was previously shown.

The Quick Search defaults to searching only the Subject or Sender and will only search mail that Thunderbird has downloaded already.  Messages that are not listed in the current view (like in another folder) will not be searched unless that folder is selected, otherwise a person needs to use the Advanced Search.

Advanced Search

Hidden under the Edit Menu and Find Sub-Menu is an advanced search dialog that can make use of the remote mail or news protocol to perform a full search instead of just a local search.   The Search Messages dialog provides its own search interface as well as its own results view directly below the search.  While the Search Messages dialog provides some more advanced search methods than the quick search, it’s hard to find and difficult to use effectively.

The Search Messages dialog allows for complex search queries to be built with multiple search terms composed of a number of different field type selectors.  The queries require a lot of input from the user because of the tight structure used to create them.  The same search and results interface code is used for creating mail filters.

Edit -> Find -> Search Messages…

Advanced Search Dialog

What do we want?

I was lucky enough to chat with Andrew Gilmartin yesterday and he framed a future goal very well.  “We’re not looking to make search an added feature box on the side of Thunderbird”; we’re looking to make search the definitive method for viewing mail.

What does “Search as the definitive viewing method for your mail” mean?  That’s a good question and I’m not sure exactly what a good answer is yet. A search would help you find the message you’re looking for, and perhaps a search view never lets you lose that message in the first place.  There’s a lot to explore.

Here are two important pieces of a search system and view that need to be examined and somehow exposed in the interface.

Search and Filter

An impediment of the current search system is that it requires people to choose a search type (Subject or Sender) before they even enter any text.  To help people hunt for the correct item you want to let them start their search very broad and then narrow that broad search down with filters like subject or sender.

The current search system has some speed issues that likely prevented a broad-to-filter system of searching from being implemented.  The mail client Mail.app provides a decent filter bar when searching mail that allows people to see what the current filters are (folder, account) and change them.
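As a rough sketch of the broad-then-narrow idea, the snippet below matches a term against every field first and applies filters afterwards. The message dictionaries and function names are made up for illustration and have nothing to do with Thunderbird’s actual search code.

```python
def broad_search(messages, term):
    """Match the term against every searchable field, not one chosen up front."""
    term = term.lower()
    return [
        m for m in messages
        if any(term in m[field].lower() for field in ("subject", "sender", "body"))
    ]


def narrow(results, **filters):
    """Keep results whose fields contain each filter value (case-insensitive)."""
    return [
        m for m in results
        if all(value.lower() in m[field].lower() for field, value in filters.items())
    ]


messages = [
    {"subject": "Budget review", "sender": "ann@example.com", "body": "Numbers attached."},
    {"subject": "Lunch", "sender": "bob@example.com", "body": "Budget talk over lunch?"},
    {"subject": "Status", "sender": "cat@example.com", "body": "All good."},
]

hits = broad_search(messages, "budget")   # matches in subject or body
from_bob = narrow(hits, sender="bob@")    # then narrow the hits by sender
```

The person types first and picks filters second, which is the reverse of choosing Subject or Sender before typing.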

Browse and Filter

The SEEK extension is an excellent example of how offering a system of browsing mail by grouped attributes from the start can help people find the item or group of items they were looking for.  Instead of starting with a search term you give the person a list of attributes they might use to filter the list of messages.
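A minimal sketch of browsing by grouped attributes could look like this. It is not SEEK’s implementation, and the message fields are made up. It only counts how many messages share each attribute value, so those values can be offered as filters before any search term is typed.

```python
from collections import Counter


def facet_counts(messages, attribute):
    """Count how many messages share each value of one attribute."""
    return Counter(m[attribute] for m in messages)


messages = [
    {"sender": "ann@example.com", "folder": "Inbox", "has_attachment": True},
    {"sender": "bob@example.com", "folder": "Inbox", "has_attachment": False},
    {"sender": "ann@example.com", "folder": "Archive", "has_attachment": True},
]

by_sender = facet_counts(messages, "sender")
by_folder = facet_counts(messages, "folder")

# Picking one facet value filters the list; no search term is needed.
from_ann = [m for m in messages if m["sender"] == "ann@example.com"]
```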

An inspiring system with similar searching, browsing, and filtering methods is Things; you should try it if you haven’t already.

Getting What we Want

Moving towards a new search based paradigm will take some adventurous steps and it’s important not to disturb current usage while making those steps.  Here are a number of changes to look at making.

Merging Search Interfaces

Each of the two current search interfaces provides some needed features and capabilities; however, having two separate interfaces for searching is confusing and difficult to understand.  We need to combine the ability to do a quick search with the ability to perform a full search into a single interface with an improved results view.

With a single search interface Thunderbird will be searching the local and remote mail (like IMAP) at the same time.  However, local results will be listed quickly and remote results will likely take a little more time.  Both sets of results, local and remote, can be merged into the same search results view by showing local results instantly and filling in remote results as they arrive.
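The merging itself can be sketched in a few lines, assuming each result carries some unique id. The id key and the function name are assumptions for illustration; a real client would more likely key on the Message-ID header.

```python
def merged_results(local_results, remote_results):
    """Yield local hits first, then remote hits that were not already shown."""
    seen = set()
    for message in local_results:
        seen.add(message["id"])
        yield message
    # remote_results may be a slow iterator (for example, an IMAP search);
    # each hit is yielded as soon as it arrives.
    for message in remote_results:
        if message["id"] not in seen:
            seen.add(message["id"])
            yield message


local = [{"id": "a1", "subject": "Budget"}, {"id": "b2", "subject": "Budget v2"}]
remote = iter([{"id": "b2", "subject": "Budget v2"}, {"id": "c3", "subject": "Old budget"}])

ordered_ids = [m["id"] for m in merged_results(local, remote)]
```

Because this is a generator, a results view could start drawing the local hits without waiting for the remote search to finish.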

Offline Cached and Indexed Mail

In order to have a fast search system even while offline Thunderbird needs to do a much better job of caching and indexing mail as it encounters it.  With new messages instantly cached and indexed they can be made available to search queries, filters, and views immediately.
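For a sense of what indexing mail as it’s encountered could mean, here is a tiny in-memory inverted index. It is a sketch only: the MailIndex class is made up, it does no persistence or stemming, and it is not how Thunderbird stores anything.

```python
import re
from collections import defaultdict


class MailIndex:
    """Tiny in-memory inverted index: word -> set of message ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, message_id, text):
        """Index a message as soon as it is seen."""
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(message_id)

    def search(self, query):
        """Return ids of messages containing every word in the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


index = MailIndex()
index.add("m1", "Quarterly budget review")
index.add("m2", "Budget lunch on Friday")
matches = index.search("budget friday")
```

A message becomes searchable the moment add() returns, and the lookup never touches the network.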

This is an excellent time to start thinking about data mining mail in a way that helps with searching messages later.  It’s also time to think about making the defaults tuned towards offline usage while still allowing people to control online / offline caching.

Auto Complete

With mail data indexed locally and quickly available, Thunderbird should be able to provide a slick and fun auto-complete on search terms it knows about.   Auto-complete when searching for items you already know exist helps with misspellings and gives more complete matching.  The awesomebar shows how, with just a little broken memory of a title or URL, you can easily find the page you saw once before.
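Here is a minimal sketch of that kind of auto-complete over a sorted list of indexed terms: prefix matches first, with a fuzzy fallback for misspellings from Python’s difflib. The function names and the term list are made up, and this is not the awesomebar’s algorithm.

```python
import bisect
import difflib


def complete(sorted_terms, prefix, limit=5):
    """Return up to `limit` indexed terms that start with the prefix."""
    prefix = prefix.lower()
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(matches) == limit:
            break
        matches.append(term)
    return matches


def suggest(sorted_terms, typed, limit=3):
    """Fall back to close matches when the typed text has no prefix match."""
    return complete(sorted_terms, typed, limit) or difflib.get_close_matches(
        typed.lower(), sorted_terms, n=limit
    )


terms = sorted(["attachment", "attachments", "budget", "yesterday"])
by_prefix = complete(terms, "att")
by_typo = suggest(terms, "yesterdya")
```

The binary search keeps prefix lookups cheap on a large term list. The fuzzy fallback scans every term, so it only runs when the prefix lookup comes back empty.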

Fetching Results

Our current drive is to investigate some indexing on messages (at least subjects), pull the new auto-complete into Thunderbird, and get a search bar using that fancy auto-complete on mail subjects and hopefully the addition of a couple more fun things.  Leave some comments or jump on the newsgroup to participate.

Search Yesterday and Attachments

A wire frame of a possible mail search auto-complete

aboot

This is the blog personality of Bryan Clark. I'm a designer in a world of open source. This blog reflects mostly writing about Design, Open Source, Economics, Beer, Wine, and Dogs. There's more information about me on this site or you can contact me directly at clarkbw@gmail.com.
