Stampy

Dev Blog

Paperless Post Tech Talks: John Resig

Please join us as we welcome John Resig as our latest speaker in our Tech Talks series on Nov 5th at 7PM.

John is the Dean of Computer Science at Khan Academy and the creator of the jQuery JavaScript library. He’s also the author of the books Pro JavaScript Techniques and Secrets of the JavaScript Ninja.

John is a Visiting Researcher at Ritsumeikan University in Kyoto working on the study of Ukiyo-e (Japanese woodblock printing). He has developed a comprehensive woodblock print database and image search engine located at Ukiyo-e.org.

John will be talking about his passion for art and how it intersects with technology. It should be a great talk, and we’re excited to have it be the first event in our new office in Lower Manhattan.

Security restrictions are fairly tight in the new office, so please RSVP and make sure to bring a photo ID to check in.

About Paperless Post Tech Talks

Paperless Post Tech Talks are our chance to bring developers and artists whom we admire into our New York office to talk and share knowledge about their work with our team and the larger New York tech community. They are free and open to the public but RSVP is required as space is limited. There is always time set aside to talk and mingle with other attendees and the speaker over craft beer and snacks.

How We Networked 4 Years of Accounts Using Just the Tools at Hand

If you’ve looked at your account dashboard recently, you might have seen a list that looks something like this:

This is a list of everyone you’ve attended events with and how many times you’ve done so. What follows is a story of how we made this possible, and some of the decisions and changes we made along the way. The feature began with a desire to give users new ways to use their accounts, and to expose the network that already existed between them.

The first part of the process was to shift the way we think about our accounts. Previously, someone had an “account” only when they registered for the site. The reality, however, is that most people using the site already have some kind of account-like presence, whether they are logged in to create and send a card or viewing as a guest. Even if someone has never registered on the site, they have a history: past events, an email address, probably a name given by a host, and connections to other people. We realized it would be helpful to have a single unique identifier for each of these users, regardless of their state. Since we would have to make an account when someone registered anyway, we decided to expand our definition of an account to include the idea of registered vs. unregistered.

When we started, we had a little over 40 million account-like items we wanted to turn into accounts, about 7% of which were already registered accounts. The items that weren’t already accounts were mainly email addresses tied to people’s address books or guests on events. We added a registered flag to our accounts, and inserted the 40 million accounts in about 19 hours (that’s about 585 accounts per second). The speed of this script was greatly helped by SQL’s batch inserts, which let us get back a group of account ids that we then associated with rows in other tables.
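As a rough sketch of what one of those batch inserts might look like (the table, column, and helper names here are hypothetical, and the RETURNING clause assumes a Postgres-flavored database), each insert hands back the newly created account ids so they can be tied back to address book entries and event guests:

```ruby
# Hypothetical sketch: batch-insert unregistered accounts 1,000 at a time.
conn = ActiveRecord::Base.connection

emails_without_accounts.each_slice(1_000) do |batch|
  values = batch.map { |email| "(#{conn.quote(email)}, FALSE, NOW(), NOW())" }

  # One INSERT per batch; RETURNING hands back the generated ids
  # so we can associate them with rows in other tables.
  rows = conn.exec_query(<<-SQL)
    INSERT INTO accounts (email, registered, created_at, updated_at)
    VALUES #{values.join(", ")}
    RETURNING id, email
  SQL

  associate_accounts_with_other_tables(rows.to_a)  # hypothetical helper
end
```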

Once we had all the accounts associated with email addresses and guests, we could focus on how we were going to combine all the data. We discussed three main ideas for how to go about doing this: a separate application, a table in our main database, and a separate database in our main application. Whatever we decided upon would be responsible for serving links between accounts, keeping the links up to date, and being autonomous of our main application. In terms of autonomy, we wanted our main application’s availability to be unaffected by the status of this new service. In other words, if the links database goes down or we decide to bring it down, we should be able to do that without affecting the rest of our applications.

Option 1: Separate application

We love Go, and discussed the idea of writing an application dedicated to these responsibilities to live in our universe. The main advantage of this approach is autonomy. Because everything would be served over HTTP, a failed request would be easy to catch and handle, and all the application’s resources would be separated from our main application. However, the HTTP requirement would have been a restriction on both speed and how much data we could feasibly pass at once. Given the tone of this paragraph and the title of this post, you have probably already guessed that we did not choose this option.

Option 2: Same application, same database

This is kind of the “easy option.” From a development perspective, adding some classes and a new table is an easy task, and having it all in the same application makes things as fast as the code you write. However, we expected this table to be very large initially and to grow very quickly (about 80 million new links per month). The fact that this would live in our main database, paired with the total loss of autonomy, made this option undesirable.

Option 3: Same application, different database

This option ended up being sort of the best of the first two. Rails makes it very easy to have a second database connected to the application, and to have models that live in that database. Performance-wise, we are able to communicate with a local SQL database and leverage SQL’s aggregation strengths to make queries fast. Furthermore, the way we set it up, we can disable the link service without affecting the availability of our main application. Having an Active Record model also gives us the convenience of being able to use all the Rails model methods, without having to re-write functionality like named scopes.
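As a minimal sketch of that setup (the class names and the database.yml entry name are assumptions, not our actual code):

```ruby
# Hypothetical abstract base class whose subclasses live in the links database,
# configured as a separate entry ("links_database") in database.yml.
class LinksRecord < ActiveRecord::Base
  self.abstract_class = true
  establish_connection :links_database
end

class Link < LinksRecord
  # Regular ActiveRecord conveniences (named scopes, validations, etc.) still work.
  scope :for_account, ->(account_id) { where(account_id: account_id) }
end
```

Because only these models touch the links database, the link service can be taken down without affecting models that use the primary database.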

We have one main table for the links, which stores two accounts, an event, and the type of link. The type distinguishes different relationships; attending an event with someone is different from attending an event with the person who invited you. In the future we will keep track of more link types as well, allowing people to see and use their data in even more ways, potentially making lists and tracking their connections.
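A hypothetical migration for that table might look something like this (column and index names are my own, not necessarily what runs in production):

```ruby
# Sketch of the links table: two accounts, the event that connects them,
# and the type of relationship between them.
class CreateLinks < ActiveRecord::Migration
  def change
    create_table :links do |t|
      t.integer :account_id,        null: false
      t.integer :linked_account_id, null: false
      t.integer :event_id,          null: false
      t.string  :link_type,         null: false  # e.g. attended-with vs. invited-by
      t.timestamps
    end

    add_index :links, [:account_id, :linked_account_id]
    add_index :links, :event_id
  end
end
```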

Links are indexed every hour by a cron job that picks up the last hour of events and links all the accounts accordingly. Again, this technique utilizes the infrastructure we already had in place, rather than requiring us to expand or create new things.
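A sketch of what that hourly job could look like as a rake task (the Event model and guest association here are assumptions for illustration):

```ruby
# Hypothetical hourly task: pick up the last hour of events and create a
# link row for every pair of accounts that attended together.
namespace :links do
  task index_recent: :environment do
    Event.where("updated_at > ?", 1.hour.ago).find_each do |event|
      account_ids = event.guests.pluck(:account_id)

      account_ids.combination(2).each do |a, b|
        Link.where(account_id: a, linked_account_id: b,
                   event_id: event.id, link_type: "guest").first_or_create
      end
    end
  end
end
```

Running a few long-lived copies of the same logic over historical date ranges is essentially how the backfill described below worked.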

To backfill the four years’ worth of links, we essentially ran a few long-running instances of the cron job, and disabled the normal schedule until everything was filled. In one week, we inserted about 169 million link rows. At that point, we enabled the cron job and things began indexing normally.

Once this was all done, it became easy to quickly produce the list above (it’s just one SQL query with a count of events and a group by account id). In just about a month, we were able to change some fundamental ways we view things in our application and introduce a new service, all with techniques and technologies that already existed in our application universe. SQL is immensely powerful, even when faced with hundreds of millions of entries (at the time of writing we have close to 190 million links). By leveraging both its strengths and the strengths of Rails, we were able to produce an elegant, fairly simple solution that is accessible to the whole development team and as easy to maintain as the rest of our code.
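In ActiveRecord terms, that query looks roughly like the following (using the hypothetical model and column names from the sketches above):

```ruby
# Count of shared events per linked account, largest first.
Link.where(account_id: current_account.id)
    .group(:linked_account_id)
    .order("COUNT(DISTINCT event_id) DESC")
    .count("DISTINCT event_id")
# => { linked_account_id => number_of_events_attended_together, ... }
```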

Write/speak/code: A Lesson in Having a Voice

Back in June, my manager asked me to write about my experience attending write/speak/code. write/speak/code is an immersive conference that aims to give women in tech a voice by giving them confidence, motivation, and actionable goals. Ironically, I felt hesitant to write this blog post, worrying I wouldn’t have enough to say, until I realized this insecurity was exactly what the conference was trying to cure.

write/speak/code was unlike any conference I have attended before. The entire group of attendees, volunteers, and staff fit in one large classroom. Each day of the conference had a project-driven agenda, all coming together under the theme of empowering women and helping them find their voice.

write >> draft an op-ed column

The first day was focused on voicing your opinions with the written word - specifically through op-ed columns. As a technologist, I was a bit incredulous at first, but by the time The OpEd Project finished their program I was sold.

They started by presenting us with numbers: 10-20% of opinion pieces in traditional media and 33% in new media are authored by women. This is roughly on par with submissions, which is why they believe that increasing submissions is an effective and actionable way to increase contributions of women in the media.

To explain why that matters, they drove home their point with the successes of previous attendees. It turns out that writing an op-ed is one of the most accessible strategies for anyone wanting to increase their visibility. A successful op-ed in a local news source or a popular blog could land you in HuffPo or the New York Times. TV news networks use published op-ed writers to recruit news commentators.

If you’re a woman or anyone who just wants to be heard, definitely go check out their website for more information about this wonderful organization.

speak >> as a group, write a talk proposal and present it

The thought of speaking in public sends chills down my spine, and I know I’m not alone. The Speak day of the conference was the most stressful day for many of us, but likely the most valuable. One day was definitely not enough time to become comfortable, but it was a valuable chance to dip our toes in and get feedback. In fact, speaker after speaker told us: “You don’t get less scared. You have to practice, be confident in your material, and just do it.”

I have a bit of experience with this myself. I once decided I wanted to conquer my fear of public speaking by giving a talk at Ignite! Seattle. The experience was a near disaster thanks to a slip-up with my slides, but coming out the other end a survivor did wonders for my confidence. Even with the serious derailment of my talk, I got tons of positive feedback and earned serious tech community rep. Which is all just to say: no matter how scared you are, do it. The rewards are worth it.

code >> contribute to an open source project

I was pretty nervous about the Code day. I consider myself a highly technical person but my coding skills are underdeveloped. As a QA Engineer at Paperless Post, I’m fortunate enough to work with an extremely talented test automation engineer so I can focus on finding bugs.

Open-source participation by women is low, and some of that is credited to women feeling like they don’t have the skill level to be of help. I was quickly assured that any contribution can be valuable, with or without a code submission. Encouraged, I decided to try my hand at We The People, the White House’s petition site. By the end of the day, I had managed to submit three bug reports and officially became an open source contributor.

takeaway

The ladies of write/speak/code deserve a lot of credit for putting together such a constructive, immersive conference. Self-censorship is a hard habit to break, but the personal implications are huge and I’m grateful for any encouragement to put myself out there. Hopefully, this blog post is just the beginning!

Ricon|East 2013

Earlier this week, MRB and I attended Ricon East, not far from Paperless HQ. It was advertised as a “Distributed Systems Conference” and that’s really what it was. It was an awesome experience; the talks were of an extremely high quality and the crowd was full of smart and friendly people. More than anything, I was overwhelmed by all the brilliance and made to feel like a distributed-systems noob (personally, I love this feeling, and it left me with a strong desire to level up). There were a lot of great talks, but here are our favorites:

Call Me Maybe - Kyle Kingsbury on Partition Tolerance

Slides

Kyle did an insane amount of work on tooling specifically for this talk to simulate network partitions against a number of databases and then calculate data recovery/healing (or lack thereof). The talk highlighted some tightly held misconceptions and really emphasized how databases often ignore the P in CAP in favor of CA.

Bloom - Neil Conway

Slides

Neil presented the work that he and his colleagues are doing at UC Berkeley on creating a language for writing “disorderly distributed programs”. The culmination of the work is Bloom, a language embedded as a Ruby DSL and distributed as a gem, which allows easy construction of complex distributed programs. Whether or not Bloom is “production ready” is irrelevant - the beauty is in how simply it can express these normally very large distributed programs.

Smarter Caching - Neha Narula

Slides

Neha showed off Pequod, an intelligent caching system she has been developing with colleagues at MIT and Harvard. The central idea was moving joins from the DB layer up into the caching layer. With a Twitter-like system as an example, she showed how simple range GETs into Pequod could be used to fetch and collate data at the cache level without touching the database. Though Pequod is not open source yet, the concepts are very interesting and I’m sure we’ll see similar implementations in OSS soon.

Just open a Socket - Sean Cribbs

Slides

Sean gave a detailed rundown of how he improved some common problems in TCP client libraries, specifically for the Riak Ruby client. The primary solution/idea was putting load balancing in the client itself. This spoke directly to me, as we’ve been doing a bunch of work on this at PP, and Sean’s ideas were novel and very straightforward.

MORE!

There were so many mind-blowing talks at Ricon that I feel bad I’m not writing them all up. I’m sure Basho will publish the videos soon and we’ll all have a field day. My biggest takeaway from the two days was that, at least in this community, researchers and developers are really starting to work more closely together, and the rate at which research is being implemented and integrated into production systems is speeding up. As an implementer, this makes me extremely excited. We have an insane thirst for ideas in this field, and the faster we can test those ideas, the sooner we can solve bigger problems.

When I Ship

Let’s be clear: as a team we have many theme songs. They range from MOP to Boyz || Men to Culture Club to …. I know of few that have struck such a chord (especially with the ops team) as Da Dip.

I gave a talk a couple of weeks ago where I tried to explain the methodology of Paperless Post’s culture of shipping. It is sung to the tune of Da Dip: when I ship, you ship, we ship. You can check out the slides and video below:

Automate or Die - Aaron Quint from devopsdays on Vimeo.

Also be sure to check out Bethany’s Talk about Sensu

I wanted to explain a little further about our process and why I think it’s been successful. You don’t have to look far to see or hear people talk about continuous delivery and deployment these days. While I fully believe in the tenets and real benefits of all of these practices and systems, I also feel like everyone is missing the point a bit. When I’ve talked to people at conferences, meetups, etc., a lot (definitely the majority in my very small and unscientific test group) are excited about the ideas and potential results of implementing any kind of automated deployment system but are frankly lost as to where to begin. Worse, the effort that has to go into building or implementing such a huge system seems insurmountable. I want to say that this shouldn’t be the case, and I think we’ve proved that at Paperless Post.

Iterative Devops

The first incorrect assumption about these kinds of projects is that they have to be all or nothing. We, like a lot of other teams, are most comfortable and successful when taking an iterative approach to shipping product features. We split big projects into small deployable steps, look for the 80% solution, and ship one thing at a time. Why, then, are we obsessed with silver bullets on the Ops side? Why are we doing large rewrites and replacements of existing infrastructure? In our case we’ve been able to take the same iterative approach with our developer infrastructure. What I was trying to show in the presentation above was that we didn’t get to where we are today overnight. Rather, it’s taken years, and most importantly, each step was worthwhile and an improvement in its own right. We’ve been benefiting since the beginning from even the simplest layer of automation that we added years ago. Don’t wait for the ultimate solution.

Pick the right problems

Of course we all want the ‘push a button’ ideal of an infrastructure: one-click deploys, scaling, beer delivery. However, as DevOps engineers and leaders, that might not be what is actually needed now. While the end result of all that kind of automation cannot be argued with, that doesn’t mean it’s the most important problem for your team to solve. At PP, we started with the whining.

It turned out that our developer setup process and our git workflow were actually the most difficult and slow parts of our daily process, and hence what was taking the majority of our dev and ops time. So instead of doing what we wanted first (one-click deploys), we focused on the pain points one at a time and made them better. It’s important to remember what I see as the real goal of DevOps - to support the developers and the product they build. In the talk, I presented a ‘DevOps Oath’:

I, [YOU], Do solemnly swear to help developers solve their problems, quickly and without [too many] yak-shaves and make their lives easier, so they can ship code, so that the company can improve, so that we all can be successful. I will do what it takes to avoid repeated tasks, help unblock the blocked, and generally make things work better.

Every team is different

A big question I get whenever I talk about our devtools or other internal tools is ‘is this open source?’ To me that brings up a fundamental point about how I hope people think about deploying automation into their workflow - every team is different. Even though a lot of people see this as a given, they have no problem grabbing and forcing tools and workflows from other companies and teams into their own. This isn’t to say that OSS isn’t valuable or doesn’t have a place in Ops. I really believe, however, that in order to have a process and tools that actually work, you need to start with how your team works organically, and then gently apply the tools to that process. Teams are living and breathing organisms, and applying solutions that are too abstract, or worse, too specific (but in the wrong direction) can be disastrous.

Keep moving

Our process and tools continue to evolve as we run into new blockers or new problems. We’re proud of how far we’ve come, but we know that this is a moving target. Our key to success in this realm has been to acknowledge that fact, and work with it.