DPE Lowdown - How Meta does Developer Productivity Engineering with Adam McCormick
Summary
TL;DR: The session features insights from Adam McCormick of Meta on maintaining a high level of developer productivity within the tech giant, specifically for the Facebook Android app. Adam shares his experiences in productivity engineering, the role of tooling like Buck and custom versions of various development environments, and the systems in place to ensure timely and reliable releases. Key topics include the need for robust testing frameworks, the handling of crashes, and the emphasis on developer happiness as a metric of productivity success. The integration of internal tools to streamline processes, the importance of data-driven decision-making, and Meta's unique approach to developer experience are highlighted.
Highlights
- 🏡 Adam recently moved just north of Seattle.
- 🌧️ The weather in Seattle is often rainy but beautiful.
- 💻 Meta's team size for productivity engineering is around 1,000 people.
- 📈 Developer happiness is a key focus for improving productivity.
- 📅 Major updates for Facebook are released weekly, with minor updates even more frequently.
- 🔍 The Sapiens system helps simulate human users for testing.
- 🛠️ Meta uses a customized Android Studio and tools like Buck for builds.
- 🔄 Rollbacks are handled through a system called unlanding.
- 📊 Metrics are constantly monitored to drive improvements in developer experience.
- 💪 Developers at Meta are viewed as key partners in building scalable products.
Timeline
- 00:00:00 - 00:05:00
In the opening segment, Adam shares his new living situation and discusses the rainy weather in Seattle, while reminiscing about past events and introductions made through Jay Zimmerman at various conferences. Justin highlights the purpose of the webcast, which is centered around productivity engineering and the experiences shared by Adam McCormick from Meta. They emphasize the need for enhancing developer productivity, specifically in large systems such as Facebook with nearly 3 billion users.
- 00:05:00 - 00:10:00
Adam describes his role on the Facebook app reliability engineering team, explaining how they work to minimize crashes for a massive user base. He explains the complexity of maintaining reliability at scale and the importance of getting code quality assurance integrated into the development process. The team collaborates with multiple product engineers to establish best practices and investigate issues as they arise, particularly for Android applications.
- 00:10:00 - 00:15:00
The discussion shifts to the frequency and nature of app releases at Facebook. Adam details the build and release process, including the daily build volume and the stages through which major and minor updates ship. With roughly 30,000 to 50,000 Android builds per day, every change is built and tested several times before it lands, which demands reliable infrastructure at that scale.
- 00:15:00 - 00:20:00
Adam clarifies the structure and size of the productivity engineering group: about 1,000 dedicated engineers inside an infrastructure organization of roughly 30,000, supporting developers through tools, testing, and development-environment work. He also discusses the monorepo, which makes the code base hard to handle at this size and has driven investments in source control and build tooling.
- 00:20:00 - 00:25:00
The focus is on the daily experiences of Meta's software engineers as they work on the Android app. Adam discusses the tools and customized systems they have in place, such as Android Studio and their build systems, to streamline coding, testing, and CI/CD processes, showcasing the heavy reliance on simulations and tests to catch issues before reaching broader audiences.
- 00:25:00 - 00:30:00
In this segment, Adam emphasizes the significance of productivity engineering as part of their success in software development and shipping at scale, underscoring that it's not merely about adding more engineers but about investing in structured approaches and solutions that improve overall productivity.
- 00:30:00 - 00:35:00
They delve into how Meta measures developer happiness and productivity, sharing that they conduct regular surveys and micro-check-ins to gather feedback and analyze the impact of changes on developer efficiency. They also appreciate the importance of consent in data collection, ensuring developers can opt out of specific productivity tools if desired.
- 00:35:00 - 00:40:00
Adam shares insights into notable productivity engineering wins at Meta, explaining how they have standardized processes to reduce the complexity of coding and analysis through consolidated tools and services that streamline common tasks, enhancing overall developer experience.
- 00:40:00 - 00:45:00
The conversation touches upon employee feedback regarding potential pitfalls of excessive measurement leading to micromanagement perceptions. Adam notes that they focus on the impact of tooling and systems on productivity without imposing numerical goals on individual engineers, emphasizing that the goal is to foster a culture of effective and happy developers.
- 00:45:00 - 00:59:31
Audience questions are covered, including one about build and test acceleration technologies at Meta. Adam explains their customized build systems that allow for rapid incremental builds by caching and optimizing compilation processes to ensure code quality while maintaining performance and minimizing resource waste. He highlights the conscious effort to reduce both financial and ecological impacts while still pushing forward with advancement.
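The incremental-build idea in the last entry (rebuild only what changed, fetch everything else from a cache) can be sketched in a few lines of Python. This is a toy illustration, not Buck's actual implementation: the target names, the `input_hash` and `build` helpers, and the in-memory `cache` dictionary are all assumptions made for the example. Each target's cache key is derived from its own source plus the keys of its dependencies, so a change invalidates only that target and whatever depends on it.

```python
import hashlib


def input_hash(target, sources, deps, dep_hashes):
    """Hash a target's name and source text plus the hashes of its dependencies."""
    h = hashlib.sha256()
    h.update(target.encode())
    h.update(sources[target].encode())
    for dep in sorted(deps.get(target, [])):
        h.update(dep_hashes[dep].encode())
    return h.hexdigest()


def build(targets_in_order, sources, deps, cache):
    """Build targets in dependency order; return the targets actually rebuilt."""
    rebuilt = []
    hashes = {}
    for target in targets_in_order:
        key = input_hash(target, sources, deps, hashes)
        hashes[target] = key
        if key not in cache:
            # Stand-in for real compilation; a cache hit skips this entirely.
            cache[key] = f"artifact({target})"
            rebuilt.append(target)
    return rebuilt


# Hypothetical three-target graph: app depends on feed, feed depends on core.
sources = {"core": "v1", "feed": "v1", "app": "v1"}
deps = {"feed": ["core"], "app": ["feed"]}
order = ["core", "feed", "app"]
cache = {}

first = build(order, sources, deps, cache)   # cold cache: every target builds
second = build(order, sources, deps, cache)  # nothing changed: nothing builds
sources["feed"] = "v2"
third = build(order, sources, deps, cache)   # feed changed: feed and app rebuild
```

In this sketch a shared cache plays the role of the prebuilt artifacts Adam describes: a developer who changes one target compiles only that target and its dependents.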
Video Q&A
What is the role of productivity engineering at Meta?
Productivity engineering at Meta focuses on facilitating developers in releasing code efficiently, improving productivity, and ensuring high quality.
How often does Meta release updates for the Facebook app on Android?
Meta releases a major update weekly. Minor revisions go to alpha one to three times a day and to beta up to four or five times a week, as fixes require.
What tools does Meta use for building and testing its code?
Meta uses a customized version of Android Studio, a system called Buck for builds, and maintains a large monorepo for code management.
How does Meta ensure product reliability during rapid development cycles?
Meta employs a number of measures, including extensive automated testing, monitoring user metrics, and a simulated user testing system called Sapiens.
What is the significance of the developer experience in productivity engineering?
A positive developer experience leads to happier, more productive engineers who can focus on building rather than dealing with distractions.
How does Meta handle rollbacks or reversions in their coding process?
Meta uses a system called unlanding, where a revert commit is created to reverse changes, with automated tools aiding in this process.
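As a rough illustration of what "a revert commit is created to reverse changes" means, the sketch below models a commit as a mapping from file path to its before and after contents, and builds the inverse commit by swapping the two. The data model and the `apply_commit` and `revert_of` helpers are hypothetical; nothing here reflects Meta's internal unlanding tooling.

```python
def apply_commit(state, commit):
    """Return a new repo state with the commit's file changes applied."""
    new_state = dict(state)
    for path, (before, after) in commit.items():
        if new_state.get(path) != before:
            raise ValueError(f"conflict on {path}")
        if after is None:
            new_state.pop(path, None)  # the commit deletes this file
        else:
            new_state[path] = after
    return new_state


def revert_of(commit):
    """Build the inverse commit by swapping each file's before/after contents."""
    return {path: (after, before) for path, (before, after) in commit.items()}


# Hypothetical repo state and a change that turns out to be bad.
base = {"Feed.java": "old feed", "Util.java": "util"}
bad_commit = {
    "Feed.java": ("old feed", "crashy feed"),
    "New.java": (None, "new file"),  # None means the file did not exist before
}

landed = apply_commit(base, bad_commit)
unlanded = apply_commit(landed, revert_of(bad_commit))
```

Applying the original change and then its inverse returns the repository to its starting state, while history keeps a record of both commits.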
What types of engineering roles does Meta seek to fill?
Meta seeks Java, C, and Swift developers, emphasizing that coding for mobile is not the sole focus but rather part of a larger engineering perspective.
How does Meta measure developer happiness and productivity?
Meta uses a survey called 'Pulse' to gauge developer satisfaction, alongside monitoring various metrics related to code quality and production impacts.
- 00:00:03[Music]
- 00:00:06adam where are you dialing in from today
- 00:00:09so i am in my brand new living room uh
- 00:00:12we just moved house so
- 00:00:14uh congratulations 15 15 miles north of
- 00:00:17seattle and how's the weather in seattle
- 00:00:20so it's a rainy day but on a rainy day
- 00:00:22in seattle around dawn it looks like
- 00:00:23this behind me this is actually taken
- 00:00:25from the transit center near my house
- 00:00:28beautiful like it's this isn't like some
- 00:00:29fancy this isn't like some fancy
- 00:00:32downloaded background this is just a
- 00:00:33cell phone snap
- 00:00:35so
- 00:00:36i i love the story of of how you all met
- 00:00:39while everybody still uh joins here
- 00:00:41we're seeing the participants kind of
- 00:00:44trickle in
- 00:00:45i can i can fill in a little bit of
- 00:00:47backstory there
- 00:00:48uh
- 00:00:49so i've been going to
- 00:00:51events with no fluff just stuff
- 00:00:54with uh jay zimmerman great great series
- 00:00:56by the way
- 00:00:57uh but i've been going to that for
- 00:01:00like a decade at this point um from back
- 00:01:03when they were a little one show game in
- 00:01:05uh denver
- 00:01:07but so i've been going to these forever
- 00:01:09and i started going to archconf when it
- 00:01:11moved to florida
- 00:01:12uh
- 00:01:14i don't know what five years ago
- 00:01:15something like that
- 00:01:16uh but so
- 00:01:18we're all we're all hanging out after
- 00:01:20one of the after one of the days
- 00:01:22and jay decides he needs he really needs
- 00:01:25justin to meet me
- 00:01:27so he's a good lady around his bar
- 00:01:30and we're and i'm standing there we've
- 00:01:32been chatting with a bunch of the other
- 00:01:34presenters and and justin introduced
- 00:01:36himself
- 00:01:37um and i don't think he thought a lot
- 00:01:39was going to come of it we just chatted
- 00:01:41you know people you know people at a
- 00:01:42conference you're always nice to
- 00:01:43everybody
- 00:01:45but then justin
- 00:01:46i was in one of his sessions later
- 00:01:48and
- 00:01:50i i kept i kept raising my hand like
- 00:01:52every time he wanted input because
- 00:01:54that's kind of who i am i don't know
- 00:01:56justin if you want to if you want to talk
- 00:01:57to that a little bit no i mean i think
- 00:01:59you i think you know i mean it was it
- 00:02:00was great it was one of these things
- 00:02:01where we're having the way that you
- 00:02:03really want a talk to go at one of these
- 00:02:05sessions where it's not just you with
- 00:02:08slides talking at the audience but you
- 00:02:09really got participation and uh and adam
- 00:02:12was was like yeah you know into the
- 00:02:14conversation adding a lot of good like
- 00:02:16detail and color but at one point i was
- 00:02:18like wait a minute who are you and where
- 00:02:19do you come from
- 00:02:21and we got kind of more into this you
- 00:02:23know real discussion about the work that
- 00:02:26that adam's been doing for meta
- 00:02:28specifically and um and how that related
- 00:02:30so closely to what we were talking about
- 00:02:32with with productivity and it was like
- 00:02:34you've gotta you gotta come on our
- 00:02:36lowdown and so here we are
- 00:02:38and i and i remember uh justin messaging
- 00:02:41me and saying hey i met this gentleman
- 00:02:43we gotta have him so i think he was
- 00:02:46standing in the room at the time and
- 00:02:48he's he's slacking back and forth with
- 00:02:50you while people are like coming up
- 00:02:51after the presentation and you know
- 00:02:53getting swag
- 00:02:55we had some gradle stickers and and a
- 00:02:57couple cool shirts
- 00:02:59that's kind of something
- 00:03:01yeah so we're five minutes after we
- 00:03:03should get started uh and thanks
- 00:03:05everyone for joining uh rooz and justin
- 00:03:08from gradle on the developer productivity
- 00:03:11engineering webcast you know why are we
- 00:03:14here just want to spend a few seconds
- 00:03:16there right from gradle side right we
- 00:03:19wanted to share
- 00:03:21with the whole world about productivity
- 00:03:23engineering
- 00:03:25our wins findings experiences we want
- 00:03:28you all to invest in developer
- 00:03:31productivity engineering developer
- 00:03:32experience
- 00:03:34and why that's important
- 00:03:36right and today we have adam mccormick
- 00:03:39works on the facebook mobile
- 00:03:42reliability engineering team and they
- 00:03:45essentially have one of the hardest
- 00:03:46developer productivity engineering
- 00:03:48problems in developer experience
- 00:03:50problems on the planet uh with their you
- 00:03:53know just facebook alone like almost
- 00:03:55three billion active users right and so
- 00:03:59it's and it's no secret that how fast
- 00:04:02facebook is able to
- 00:04:05innovate and ship software at scale it's
- 00:04:07really hard when you get to that size
- 00:04:10and and that's one of the reasons why
- 00:04:12everyone should invest in productivity
- 00:04:14engineering developer experience to keep
- 00:04:15your top talent and keep your top
- 00:04:17engineers more productive and that's why
- 00:04:19we're here today wanted to share that
- 00:04:21with the whole world
- 00:04:23thank you adam uh for joining
- 00:04:26there's also a gradle community
- 00:04:29slack workspace it's open anybody can go
- 00:04:32there there's a developer productivity
- 00:04:34channel in there
- 00:04:35um we're going to take we're not going
- 00:04:37to get to all the questions right a
- 00:04:40super hot topic everything that we were
- 00:04:42chatting yesterday with that of
- 00:04:44everything they talk about it's gonna be
- 00:04:45super interesting there's gonna be a lot
- 00:04:46of a lot of questions there so we're
- 00:04:48gonna try to
- 00:04:49take adam into that channel and try to
- 00:04:52get him to answer as much as he can
- 00:04:54right you know it's uh you know no no
- 00:04:57promises already given us an hour which
- 00:04:59is more than we can expect but anyway
- 00:05:02adam why don't you uh tell us a little
- 00:05:04bit about what you're working on
- 00:05:08and your team
- 00:05:10at uh at meta
- 00:05:13well sure uh so i'm sure everybody's
- 00:05:15familiar with meta or as people used to
- 00:05:17call it facebook
- 00:05:19uh i work on the facebook app
- 00:05:21reliability engineering team what
- 00:05:24our big goal is is to minimize the
- 00:05:26number of users who experience crashes
- 00:05:28in a given day
- 00:05:30and that sounds like a simple task but
- 00:05:33when you think that across facebook
- 00:05:35we've got 3.8 billion users uh you have
- 00:05:38to do that at humanity scale uh and so
- 00:05:41that's a question of
- 00:05:44working with a lot of
- 00:05:45different product teams working with a
- 00:05:47whole lot of data
- 00:05:49and trying to optimize build in that
- 00:05:52kind of reliability and
- 00:05:55good practice before the code ever
- 00:05:58reaches the point where it's got to be
- 00:06:00reviewed like we don't have the the
- 00:06:02luxury of
- 00:06:04a lot of soak time or long release
- 00:06:06cycles
- 00:06:07so we we live out we live in that um
- 00:06:10world where you have to
- 00:06:12build in quality my team specifically
- 00:06:15spans a lot of products
- 00:06:17and what we do is we work with our
- 00:06:19product engineers to
- 00:06:21establish those real
- 00:06:23best practices and to help them
- 00:06:25investigate when they mess something up
- 00:06:27part of moving fast which is something
- 00:06:29that we pride ourselves on
- 00:06:31is that you're going to make mistakes
- 00:06:32and so a big thing we do is try to help
- 00:06:35you figure out where you've made
- 00:06:36mistakes
- 00:06:38and then avoid them in the in the future
- 00:06:41i work specifically on android
- 00:06:44for the facebook app
- 00:06:46but our team is kind of the the masthead
- 00:06:49of the reliability organizations so we
- 00:06:52work heavily with the reliability groups
- 00:06:54in instagram in whatsapp in messenger in
- 00:06:58oculus to try and share a lot of that
- 00:07:01knowledge because the facebook app
- 00:07:03people maybe maybe don't realize that
- 00:07:05facebook app's the most complex android
- 00:07:07app there is
- 00:07:08period
- 00:07:10there isn't a more complex world to work
- 00:07:13in in terms of the android space and so
- 00:07:16it takes a certain amount of
- 00:07:19hard work to get to the point where
- 00:07:20you're where we can even release as fast
- 00:07:22as we do
- 00:07:23i was reading one of our dev blog posts
- 00:07:25the other day and it and we used to
- 00:07:28release monthly and it took the better
- 00:07:30part of years
- 00:07:32to get us to the point where we can
- 00:07:34release
- 00:07:35every week
- 00:07:36and then we start butting up against the
- 00:07:38actual infrastructure problems of our
- 00:07:40release systems where google and the
- 00:07:42oems and all the devices won't update
- 00:07:45faster than that
- 00:07:46like there's there's a point at which
- 00:07:47we've pushed those needles as far as we
- 00:07:49can possibly push them
- 00:07:51and that owes to a lot of investment
- 00:07:54when it comes to productivity and you
- 00:07:57know dev
- 00:07:59the actual dev life cycle great
- 00:08:01introduction uh
- 00:08:03one one question about you know just
- 00:08:05releasing when you talk about release
- 00:08:07every week are we talking about major
- 00:08:08releases are we talking about minor
- 00:08:10releases you know how how big are these
- 00:08:13releases well let's put this way uh
- 00:08:16on android alone
- 00:08:18we do between 30,000 and 50,000 builds a day
- 00:08:22and that's not an exaggeration that's a
- 00:08:24solid number from two years ago it's
- 00:08:26probably more now
- 00:08:27uh
- 00:08:28the
- 00:08:30we do about that many builds
- 00:08:33every particular every particular change
- 00:08:36what we call diff
- 00:08:37to our system
- 00:08:39gets between three and five of those
- 00:08:40builds on its way into the master branch
- 00:08:43we block them at diff time we block them
- 00:08:45at land time and then we roll them out
- 00:08:48slowly into a master branch which we
- 00:08:50then cut
- 00:08:51alpha beta and prod from
- 00:08:55so
- 00:08:55over the course of a week a particular
- 00:08:58revision
- 00:08:59will go into an alpha which goes to a
- 00:09:02lot of
- 00:09:03internal users and a huge automated
- 00:09:06testing community called sapiens
- 00:09:08which is sort of simulated humans
- 00:09:11we do a brick we do a beta cut and then
- 00:09:13that sits for beta for one week
- 00:09:16and then we push to our stores
- 00:09:18that means we push to a bunch of the oem
- 00:09:21stores where they pick up the actual
- 00:09:24newest release and that goes directly
- 00:09:26into their devices that they're selling
- 00:09:28on a given week
- 00:09:30it goes into the google play store and
- 00:09:31it goes into a system we call oxygen
- 00:09:34which is how we work with
- 00:09:36uh android devices that don't have
- 00:09:38access to google play services things
- 00:09:40like amazon devices
- 00:09:42and that lets us roll out those new
- 00:09:44updates once a week
- 00:09:46most oems won't let you update more than
- 00:09:48once every seven days so we're really
- 00:09:50talking about what we look at as a major
- 00:09:52revision
- 00:09:54every week
- 00:09:55minor revisions
- 00:09:57once a day to two or three times a day
- 00:10:00in alpha
- 00:10:01and then up to four and five times a
- 00:10:03week in beta as we discover
- 00:10:06major things that we won't don't want to
- 00:10:08release without getting side events
- 00:10:10and our team is hands-on through that
- 00:10:12entire process we're the ones that
- 00:10:14govern all of the check-ins that want to
- 00:10:16go into beta and all of the hot fixes
- 00:10:20should you ever need to hotfix prod
- 00:10:23and that that's a very hands-on process
- 00:10:25and it requires us to know
- 00:10:27the entire universe of our app
- 00:10:30to a pretty to a pretty deep extent
- 00:10:33and that's
- 00:10:34thanks for sharing that by the way and
- 00:10:36that's a great segue kind of to the next
- 00:10:38topic is
- 00:10:40to
- 00:10:41ship at that scale that frequently with
- 00:10:44that those types of features and
- 00:10:46cadences and whatnot you need support
- 00:10:48you know it's not just it's you can't
- 00:10:50just throw thousands of engineers at it
- 00:10:54tell us about productivity engineering
- 00:10:57uh reliability engineering and developer
- 00:11:00experience kind of orgs behind it
- 00:11:03supporting all these developers to
- 00:11:05getting them to be that successful what
- 00:11:07kind of investment are we talking about
- 00:11:10how many folks are in those
- 00:11:11organizations right and part of what i
- 00:11:13want to share with the community is hey
- 00:11:17there's a reason why facebook can ship
- 00:11:19at this scale
- 00:11:21and innovate this fast you can't just do
- 00:11:24it with just throwing a thousand
- 00:11:25engineers at it you need this investment
- 00:11:27in productivity engineering what are we
- 00:11:29looking at there from here so how is
- 00:11:31so it's actually how is it structured
- 00:11:33how many people are in it how does that
- 00:11:34work at your organization so it's
- 00:11:36actually funny that you say throw a
- 00:11:37thousand engineers at it because that's
- 00:11:39about the size of our dedicated
- 00:11:42productivity engineering group
- 00:11:45and i won't go into exact numbers
- 00:11:46because it fluctuates quite a lot but
- 00:11:48within our infrastructure org which is
- 00:11:50 30,000 of our 100,000 employees
- 00:11:53there's a thousand people who we
- 00:11:55dedicate just to facilitating
- 00:11:58developers releasing code that's
- 00:12:00everything from tools that we use for
- 00:12:02testing to our mod our internal
- 00:12:05modifications to all of our development
- 00:12:07environments uh it's sort of an
- 00:12:09interesting world because we run a huge
- 00:12:12monorepo
- 00:12:13we run a world where all the code for
- 00:12:16facebook all of the code for messenger
- 00:12:18all the code for instagram
- 00:12:20and oculus and whatsapp and
- 00:12:22a hundred other little products that
- 00:12:24you've probably never heard of
- 00:12:26all live in this one gigantic repo
- 00:12:29right there's no world in which you can
- 00:12:31build all of that or even download all
- 00:12:33of that
- 00:12:34on most machines hundreds and hundreds
- 00:12:36of gigabytes of code
- 00:12:39right so the kinds of investments that
- 00:12:41we've had to make are everything from
- 00:12:43rewriting our source control system
- 00:12:47because it couldn't because git can't
- 00:12:49handle
- 00:12:50it just cannot handle a repo the size of
- 00:12:53ours working with open source groups in
- 00:12:55this case mercurial
- 00:12:57to fundamentally rebuild
- 00:12:59the source control systems that we use
- 00:13:01and integrate with our internal build
- 00:13:03system a system called buck that again
- 00:13:06we had to rework because of the sheer
- 00:13:08scale of what we're producing
- 00:13:10and then all of these servers and
- 00:13:12clients and systems
- 00:13:15that make it so that you can reliably
- 00:13:17build code
- 00:13:19on your local system or even on what we
- 00:13:21call on-demand servers which are
- 00:13:23pre-warmed images that then you don't
- 00:13:26have to download a lot of code onto you
- 00:13:29start them up and they build
- 00:13:32and
- 00:13:32all of that is is
- 00:13:36a huge investment that nobody else has
- 00:13:38to make at some point
- 00:13:41we have
- 00:13:42a dedicated group who does nothing but
- 00:13:44keep our system
- 00:13:46running on what we call stable which
- 00:13:48follows master by something like two
- 00:13:50hours
- 00:13:51and make sure that we are generating all
- 00:13:53of the various artifacts that go with
- 00:13:55that
- 00:13:56so that when a
- 00:13:57typical dev wants to do a build we're
- 00:14:00only building the very very small parts
- 00:14:02of the system that have to be rebuilt
- 00:14:05for them and all of those other
- 00:14:07artifacts are coming down
- 00:14:09at 50,000 builds a day it is not
- 00:14:12possible for you to follow master as
- 00:14:14yourself you can't constantly keep up
- 00:14:17with master
- 00:14:18you can keep up with stable usually
- 00:14:20because you know up to two hours lag
- 00:14:24but even then you're never going to be
- 00:14:26able to rebase onto master and hope to
- 00:14:28ever merge it
- 00:14:29so we have to have systems that do these
- 00:14:31things automatically that do
- 00:14:34merges when it's possible without you
- 00:14:36having to intervene we have to have
- 00:14:37automated testing we have to know
- 00:14:39exactly how much of the code you touched
- 00:14:42and which artifacts have to be changed
- 00:14:44we have to know
- 00:14:45all of these things just to be able to
- 00:14:47check in one piece of code
- 00:14:50now you add to that that we've got
- 00:14:52thousands and thousands of developers
- 00:14:53all working simultaneously in the same
- 00:14:55repo
- 00:14:57and all moving master even when it's not
- 00:14:59your group or your uh specific piece of
- 00:15:02the baseline
- 00:15:04and the investment involved ends up
- 00:15:06being the full-time job of whole orgs of
- 00:15:09people
- 00:15:11we've got direct we've got director
- 00:15:12level
- 00:15:14uh folks who are
- 00:15:16solely responsible for organizing those
- 00:15:18efforts and there's a reason we put it
- 00:15:20into our infrastructure org
- 00:15:22the 30,000 people in our infrastructure
- 00:15:24org all have the same goal
- 00:15:26and that is to facilitate
- 00:15:29everybody who's building facebook to
- 00:15:32deliver facebook and not just facebook
- 00:15:34but whatsapp and messenger and instagram
- 00:15:37and everything under the meta
- 00:15:39umbrella
- 00:15:42that's a that's 30 percent of our workforce
- 00:15:45is completely dedicated to making
- 00:15:48everybody else successful it's not one
- 00:15:51guy it's not five guys
- 00:15:53it's not just everybody kind of does a
- 00:15:55piece of it and all the developers have
- 00:15:57to know what they're doing it's
- 00:15:59dedicated workforce
- 00:16:01nice no that's a
- 00:16:03great kind of overview of
- 00:16:05what it takes right and of that scale
- 00:16:09right uh
- 00:16:10you have to build your own in many cases
- 00:16:13to because it hasn't been done before um
- 00:16:16we're getting some good questions justin
- 00:16:18around
- 00:16:19uh building and testing before we get to
- 00:16:23those maybe we jump into this next topic
- 00:16:25of
- 00:16:26wha what's it like in the day in the
- 00:16:28life of a meta software engineer working
- 00:16:30on the android uh the facebook android
- 00:16:32app right yeah they write code
- 00:16:35how do they write builds and tests ci/cd
- 00:16:39well so here's here's a
- 00:16:41let's take a section out of more of a
- 00:16:43product engineer side of things because
- 00:16:46on reliability on the reliability
- 00:16:48engineering side our day is atypical we
- 00:16:51are in we we do more investigation than
- 00:16:53we do writing code
- 00:16:55we write enough code but when you're
- 00:16:57going to look at when we're looking at
- 00:17:00delivering a piece of the system
- 00:17:03we have
- 00:17:04there's a couple different ways one is
- 00:17:06you can build the code locally you can
- 00:17:08write whatever changes you want deploy
- 00:17:10them to a phone
- 00:17:12and it works
- 00:17:14that that is
- 00:17:16facilitated by the fact that we have a
- 00:17:19customized version of android studio
- 00:17:21that lets you bring in all of these
- 00:17:24artifacts using buck
- 00:17:26we have dedicated configurations of that
- 00:17:30system that bring in all of our lint
- 00:17:32tools because normal linters can't
- 00:17:34handle our code that bring in our
- 00:17:36compilation tools that bring in all of
- 00:17:39our handwritten
- 00:17:41uh
- 00:17:43handwritten modifications of the ast
- 00:17:47and then also integrate with
- 00:17:50an emulator in a way that doesn't break
- 00:17:52facebook
- 00:17:54it's
- 00:17:55because facebook is such a complex
- 00:17:57application it's actually pretty
- 00:17:59difficult to run an emulator that will
- 00:18:01support facebook
- 00:18:03and not crash
- 00:18:04and we're and we're adding in the
- 00:18:06complexity of a bunch of debug
- 00:18:09information and non-optimized builds and
- 00:18:11not minimized code size and that pushes
- 00:18:14the boundaries of what a lot of devices
- 00:18:16even new devices are really capable of
- 00:18:18running
- 00:18:20so as a dev you might
- 00:18:22take a take a new feature you're trying
- 00:18:23to build
- 00:18:24you might build that locally or you
- 00:18:26might check out what we call an
- 00:18:27on-demand server
- 00:18:29an on-demand server is basically
- 00:18:31pre-warmed code
- 00:18:33where you can do whatever development
- 00:18:34you need either mostly in uh our again
- 00:18:38customized version of vs code
- 00:18:41that
- 00:18:42you would then build whatever you need
- 00:18:44to do be able to run your test locally
- 00:18:46run as run your android in server mode
- 00:18:50and then run whatever tests against it
- 00:18:52we have a whole end-to-end testing
- 00:18:53framework wherein any individual test
- 00:18:56can be run against your code or suites
- 00:18:58of tests
- 00:18:59on the server side without explicitly
- 00:19:01having to run an emulator
- 00:19:04because again there's just so much
- 00:19:06infrastructure if every individual had
- 00:19:08to run an emulator we'd have to buy
- 00:19:10everybody mac pros
- 00:19:11and that is you know not really feasible
- 00:19:14but while we're running them all on
- 00:19:16linux
- 00:19:17we can split off our relatively sizable
- 00:19:19infrastructure and all of these
- 00:19:22uh
- 00:19:23all of these these uh
- 00:19:25data warehouses and server
- 00:19:27infrastructures that we have and we can
- 00:19:29shard those out to individual devs now
- 00:19:32something that's interesting i used to
- 00:19:33work at amazon
- 00:19:34uh at amazon every team has to own their
- 00:19:37own machines right you have to know
- 00:19:40about each of your instances in each of
- 00:19:42your clusters and you have to set up
- 00:19:44your own pipelines at facebook
- 00:19:47there's the trade-off of well you don't
- 00:19:48get as much control as you get when you
- 00:19:51own all of your own machines
- 00:19:53but in exchange for that you don't have
- 00:19:54to worry about it if i want a new if i i
- 00:19:57need to work on three projects at once i
- 00:19:59can pull three individual
- 00:20:02on-demand instances
- 00:20:04and they'll get pre-warmed either with a
- 00:20:06diff in progress uh diff being what we
- 00:20:08use
- 00:20:09to talk about a change request or a code
- 00:20:13review
- 00:20:15in progress
- 00:20:16i can pull a diff down i can run it in
- 00:20:18an on demand i can put doesn't matter
- 00:20:20whether it was mine or not
- 00:20:22and that takes you know two three
- 00:20:24minutes as opposed to the hours that it
- 00:20:27takes to download our entire baseline
- 00:20:29when i was in boot camp the
- 00:20:32first five weeks when you're employed at
- 00:20:34facebook is kind of a boot camp to get
- 00:20:36you familiar with all of this tech
- 00:20:38uh it took me eight hours
- 00:20:41to download and compile the android
- 00:20:44baseline that's because i didn't know
- 00:20:46what i was doing and i wasn't leveraging
- 00:20:48any of this
- 00:20:49infrastructure but eight solid hours
- 00:20:52doing nothing but downloading and
- 00:20:54compiling
- 00:20:55where when we compile with the normal
- 00:20:57system five ten minutes
- 00:21:00and you've got it and you've got a
- 00:21:01working system when you pull from a bit
- 00:21:03from the stable baseline
- 00:21:05if you pull it raw
- 00:21:06zero you press go and it just comes up
- 00:21:10because if you don't have changes we
- 00:21:12don't have to rebuild and all those
- 00:21:13artifacts already exist right
- 00:21:15so you've you've built your code you've
- 00:21:17either built it locally and you've run
- 00:21:18it on your own device
- 00:21:21or you've built it on demand well so
- 00:21:23then you push it
- 00:21:24now we have a system that we call diffs
- 00:21:27uh again completely homegrown because
- 00:21:30everything we do is is just a little bit
- 00:21:33uh more complicated
- 00:21:36because sometimes we run what we call
- 00:21:38code mods and you can have thousands and
- 00:21:40thousands of files touched
- 00:21:41right so that breaks almost every online
- 00:21:46code revision system there is
- 00:21:48so we have our own which we call diffs
- 00:21:51and diffs then can be reviewed by any
- 00:21:54number of people
- 00:21:55um
- 00:21:56every piece of code that a dev touches
- 00:21:59gets reviewed
- 00:22:00uh and we've actually added
- 00:22:02an additional step now
- 00:22:04what we call final review where if you
- 00:22:06modify your code after it's been
- 00:22:08accepted
- 00:22:09it has to be re-reviewed before it's
- 00:22:12considered okay to have in the baseline
- 00:22:15when you submit a diff
- 00:22:17in the background the infrastructure is
- 00:22:19already going to run a bunch of basic
- 00:22:21tests
- 00:22:22it's going to check linting standards
- 00:22:24it's going to check whether all the all
- 00:22:26of the products that you touched compile
- 00:22:28it's going to check whether you are
- 00:22:30significantly affecting various
- 00:22:32performance metrics and it's going to
- 00:22:34run a basic suite of tests
- 00:22:37then once you have approval and
- 00:22:39someone's reviewed all of your code and
- 00:22:40you've gone back and forth and you've
- 00:22:42done all of the niceties of
- 00:22:44code review
- 00:22:47you go to what we call land uh
- 00:22:50that's the process by which you go from
- 00:22:51a diff to a merge
- 00:22:53right
- 00:22:54and so we've got another more strenuous
- 00:22:57group of tests at that point where then
- 00:22:59we're doing a full compile and we're
- 00:23:01making sure you don't regress app size
- 00:23:02then we're doing
- 00:23:04smoke tests across our entire fleet of
- 00:23:06apps to make sure that you didn't break
- 00:23:08a bunch of other applications
- 00:23:12that's when we're making sure that you
- 00:23:14aren't significantly regressing production
- 00:23:16metrics as we push that into
- 00:23:19the master
- 00:23:20then you've landed that process can take
- 00:23:22anywhere from an hour to four hours and
- 00:23:24we have ways of making that faster
- 00:23:26when we have reason to when we're in the
- 00:23:28midst of a sev when we're trying to fix
- 00:23:30something or move very quickly
- 00:23:32but
- 00:23:33you got to think we're building all of
- 00:23:35this infrastructure and then we're
- 00:23:36running tests against it just like you'd
- 00:23:38run tests on an emulator
- 00:23:40right
- 00:23:41and at 50 000 of those a day
- 00:23:45that's a huge amount of processing power
- 00:23:48that we're throwing at the problem of
- 00:23:50speeding up
- 00:23:51this end to end diff to baseline
- 00:23:54now most of the time that diff to
- 00:23:56baseline is going to run around two
- 00:23:58hours
- 00:23:59but that means that for a
- 00:24:01modest change especially if your team
- 00:24:03already knew about it and the people who
- 00:24:05were affected by it already kind of knew
- 00:24:07it was coming
- 00:24:08you can post a diff
- 00:24:11make the change get it
- 00:24:13reviewed and deliver
- 00:24:15in two or three
- 00:24:16hours and then we start going live right
- 00:24:20so on the android side
- 00:24:22uh we spit out an alpha
- 00:24:25revision at least once a day often two
- 00:24:27or three times a day depending on
- 00:24:30how
- 00:24:31broken the system is right now
- 00:24:33because again we move fast things break
- 00:24:36and we we try to make sure that the most
- 00:24:38recent version of the alpha build is
- 00:24:40working
- 00:24:41at least mostly working because it is
- 00:24:44alpha after all every week we cut a beta
- 00:24:48and then we cherry pick into that beta
- 00:24:50to fix things that are wrong with the
- 00:24:53beta
- 00:24:53and then a week after that we release to
- 00:24:55prod we cut the beta and the prod on the
- 00:24:57same day
- 00:24:59generally
- 00:25:00there are places where we've started
- 00:25:01to shorten that and the distance from beta
- 00:25:03to prod varies
- 00:25:06product to product now but anywhere from
- 00:25:08three days to seven days
- 00:25:10and then once it's in prod that goes up
- 00:25:12to the store and we do a progressive
- 00:25:14rollout one percent 10 100
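The staged rollout described here (one percent, then 10, then 100, with the metrics checked at each step) can be sketched roughly as follows. This is a minimal illustration and not Meta's release tooling: `progressive_rollout`, `metrics_ok`, and the rule of halting at the first regressing stage are assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple

# Stage percentages as mentioned in the talk; everything else is hypothetical.
ROLLOUT_STAGES = (1, 10, 100)


def progressive_rollout(
    metrics_ok: Callable[[int], bool],
    stages: Sequence[int] = ROLLOUT_STAGES,
) -> Tuple[int, bool]:
    """Expose each stage in turn and stop at the first one whose metrics regress.

    Returns (last percentage exposed, whether the rollout was halted).
    """
    exposed = 0
    for percent in stages:
        exposed = percent  # ship to this share of users
        if not metrics_ok(percent):  # e.g. crash rate, stalls, top-line metrics
            return exposed, True  # halt: do not widen the audience
    return exposed, False


if __name__ == "__main__":
    # Healthy release: reaches every stage.
    print(progressive_rollout(lambda percent: True))
    # Regression that only shows up once 10 percent of users have the build.
    print(progressive_rollout(lambda percent: percent < 10))
```

Each stage acts as a small canary: the audience only widens after the previous, smaller audience has produced acceptable numbers.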
- 00:25:18now we have a breakdown in that it's not
- 00:25:20quite as simple as rolling all of
- 00:25:22facebook out but that's
- 00:25:24you know that's that's way down in the
- 00:25:26weeds
- 00:25:27but you can see at each of these steps
- 00:25:29it's always gradual it's always
- 00:25:32deliberate every single one of these
- 00:25:34releases is a little bit of a canary
- 00:25:37check the metrics a little bit more
- 00:25:39check all the metrics a little bit more
- 00:25:42check all the metrics
- 00:25:44and at every one of these steps every
- 00:25:45one of these branch cuts when we're
- 00:25:47going to prod
- 00:25:49we've got a lot of
- 00:25:51instrumentation we know exactly how
- 00:25:53often users are crashing we know exactly
- 00:25:57how many stalls there are how many cpu
- 00:25:59spins whether various top line product
- 00:26:02metrics are changing for these users
- 00:26:05and that's why it's critical to have
- 00:26:07these people who are participating in
- 00:26:08our betas and our alphas
- 00:26:10because otherwise you're guessing and we
- 00:26:13don't guess we really don't if we had no
- 00:26:15data we would not move
- 00:26:19every single week we have these meetings
- 00:26:20and they're entirely driven by all of
- 00:26:23these measurements and
- 00:26:25we stick to them
- 00:26:28until we can actually explain a
- 00:26:31discrepancy we don't ship
- 00:26:33that's
- 00:26:34i mean it's uh
- 00:26:36that's scale i mean this is what you
- 00:26:38need these kind of processes in these
- 00:26:39tools one thing that resonated really
- 00:26:41well with me was when you mentioned hey
- 00:26:43when you first started you try to just
- 00:26:45download and compile this it took eight
- 00:26:47hours
- 00:26:48but then after with facebook's
- 00:26:51investment in these acceleration
- 00:26:53technologies and and tooling they're
- 00:26:56able to do it in five minutes which is
- 00:26:59you know why we brought you here today
- 00:27:01to say hey if you're trying to do things
- 00:27:03the normal way at that scale
- 00:27:06good luck shipping you know good luck
- 00:27:08trying to compete with facebook right
- 00:27:10without investment and in the crazy like
- 00:27:14hearing these stories and kind of just
- 00:27:15seeing it made real you know the
- 00:27:17techniques that you're using i loved
- 00:27:19hearing about that like kind of like
- 00:27:21slicing a little bit more canary at a
- 00:27:23time as opposed to like
- 00:27:25full canary it's like no no no we're
- 00:27:27just introducing a little bit it's
- 00:27:29almost like like like features kind of
- 00:27:31trickling
- 00:27:32um let's take can we take a quick break
- 00:27:34here and kind of hit a few points let's
- 00:27:36say we have so many justin i don't know
- 00:27:38what we're going to get to and we're not
- 00:27:40going to get to them yes um a specific
- 00:27:43one that came in was just a little bit
- 00:27:45more clarity around the sapiens system
- 00:27:47that you mentioned uh just just to
- 00:27:49clarify you do mean that these are
- 00:27:51simulated human users yeah and and
- 00:27:53wondering what technology that's based
- 00:27:55on specifically is it based on monkey
- 00:27:57like the chaos well
- 00:27:58so uh and that's the thing is
- 00:28:01i'm sure that the name was originally a
- 00:28:03little bit of a play on chaos monkey
- 00:28:05there's a lot of there's a lot of
- 00:28:06cross-pollination between netflix and
- 00:28:08facebook and all the other
- 00:28:10manga companies
- 00:28:12but the big thing about sapiens right is
- 00:28:14that we have what we call qpl markers
- 00:28:16and qpl stands for quick
- 00:28:18performance logger has nothing to do
- 00:28:20with quick or performance anymore uh but
- 00:28:23the point is that what they do is they
- 00:28:24record human action
- 00:28:26when you interact with our application
- 00:28:28in various ways
- 00:28:30we anonymize that data
- 00:28:32and then we can aggregate
- 00:28:36how much of which parts of the
- 00:28:39app are used
- 00:28:41and so we take the most firing qpls and
- 00:28:43the biggest workflows
- 00:28:45and we've taught a simulated human
- 00:28:48system and these aren't humans
- 00:28:50it is
- 00:28:51effectively an end-to-end test
- 00:28:53that runs against an emulator
- 00:28:55but it's not scripted
- 00:28:59it's trying to tie together all of those
- 00:29:01qpl markers
- 00:29:03and it learns how to walk through them
- 00:29:05and simulate the way a human would use
- 00:29:08the application
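The talk does not say how sapiens learns to walk through the markers. One simple way to picture an unscripted session driven by aggregated usage is a weighted random walk over marker-to-marker transition counts. The sketch below is only that picture: the marker names, the counts in `TRANSITIONS`, and `simulate_session` are all hypothetical and not part of any Meta system.

```python
import random
from typing import Dict, List

# Hypothetical aggregated, anonymized counts: how often one marker was
# followed by another in real usage.
TRANSITIONS: Dict[str, Dict[str, int]] = {
    "app_start": {"feed_scroll": 80, "open_messages": 20},
    "feed_scroll": {"feed_scroll": 50, "open_post": 40, "open_messages": 10},
    "open_post": {"feed_scroll": 70, "write_comment": 30},
    "write_comment": {"feed_scroll": 100},
    "open_messages": {"feed_scroll": 100},
}


def simulate_session(
    transitions: Dict[str, Dict[str, int]],
    start: str,
    steps: int,
    rng: random.Random,
) -> List[str]:
    """Walk the marker graph, picking each next action in proportion to usage."""
    path = [start]
    current = start
    for _ in range(steps):
        options = transitions.get(current)
        if not options:
            break  # no recorded follow-up action for this marker
        markers = list(options)
        weights = [options[marker] for marker in markers]
        current = rng.choices(markers, weights=weights, k=1)[0]
        path.append(current)
    return path


if __name__ == "__main__":
    # Seeded so that a failing walk can be replayed.
    print(simulate_session(TRANSITIONS, "app_start", 10, random.Random(7)))
```

Because the walk is sampled rather than scripted, popular workflows get exercised most often while rarer paths still show up occasionally.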
- 00:29:10and by doing that we're able to take our
- 00:29:12alpha
- 00:29:14which is a very small group of users
- 00:29:16that's really employees
- 00:29:18uh who have opted to take this somewhat
- 00:29:20broken baseline that is
- 00:29:23really actually often quite broken
- 00:29:26and
- 00:29:29increase the number of users and the
- 00:29:31number of hours used and really we think
- 00:29:33of this in terms of the number of hours
- 00:29:35interacted with when we're talking about
- 00:29:37these alpha releases
- 00:29:39and up it by an order of magnitude
- 00:29:42actually in in the case of what we call
- 00:29:43virtual alpha we're up to three orders
- 00:29:46of magnitude we have a thousand times as
- 00:29:48many hours
- 00:29:50from our sapiens system as we have from
- 00:29:53humans
- 00:29:54on the on our alpha builds and without
- 00:29:56that
- 00:29:58there are so many features that are
- 00:29:59never used by employees
- 00:30:01or just not used enough
- 00:30:03to drive out bugs and and these changes
- 00:30:07with the sapiens system we've been able
- 00:30:08to
- 00:30:09find a lot of those regressions
- 00:30:12far earlier than we would ever have a
- 00:30:13chance to if we waited for our beta
- 00:30:15population and we've got a lot of beta
- 00:30:17we have a public beta that folks can
- 00:30:19sign up for from the google play store
- 00:30:22and that allows you to see some of this
- 00:30:24stuff a little early
- 00:30:26uh it's a little bit crashier than than
- 00:30:28the production version because it is our
- 00:30:30soak
- 00:30:31it's the maturation phase of the
- 00:30:34app but every week
- 00:30:36we're shipping that and that's where we
- 00:30:39get
- 00:30:40the lion's share of our um
- 00:30:45i don't know exactly what
- 00:30:46the word is the polish
- 00:30:49on the end of the app but without
- 00:30:51the alpha without sapiens running on
- 00:30:53alpha
- 00:30:54that beta build
- 00:30:56would have to take
- 00:30:57weeks
- 00:30:58because we just don't have the ability
- 00:31:01to force
- 00:31:03people to use the alpha
- 00:31:05build
- 00:31:06and we wouldn't want to at some point
- 00:31:09our goal
- 00:31:10is to try and make a clean experience we
- 00:31:13goal on things like touch responsiveness
- 00:31:16on how fast the app comes up
- 00:31:18on the number of users who ever
- 00:31:20experience a foreground app death the
- 00:31:22number of warm starts which is to say
- 00:31:25how good we are at using memory and thus
- 00:31:27how
- 00:31:28infrequently we are killed by the
- 00:31:30operating systems like our goal comes
- 00:31:33down to sad users and bad experiences
- 00:31:36almost 100 percent
- 00:31:37there's one other thing several people
- 00:31:39are wondering about how you handle
- 00:31:40rollbacks in this process and what that
- 00:31:42looks like
- 00:31:44well so i think what we would talk about
- 00:31:46we actually call it unlanding um so
- 00:31:50just the way the same way you would
- 00:31:51revert in a git repo you check in a
- 00:31:53revert commit
- 00:31:55and then those land just like anything
- 00:31:58else
- 00:31:59but the trick is that because we're
- 00:32:01working on a monorepo and it is
- 00:32:03effectively linear we don't really have
- 00:32:05a proper branching model except for the
- 00:32:07cuts
- 00:32:08for releases
- 00:32:11we roll everything on the end of master
- 00:32:13so if you have a change that needs to be
- 00:32:16reverted we build an inverse change and
- 00:32:19we land that
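The idea of reverting by landing an inverse change on the end of a linear history can be sketched in a few lines. This is a toy model, assuming a change is just a map from file path to its before and after content; `Change`, `inverse`, and `land` are illustrative names and not Meta's tooling.

```python
from typing import Dict, Optional, Tuple

# A change maps each touched path to (content before, content after).
# None stands for "file does not exist". This model is purely illustrative.
Change = Dict[str, Tuple[Optional[str], Optional[str]]]
State = Dict[str, str]


def inverse(change: Change) -> Change:
    """Build the change that undoes `change` by swapping before and after."""
    return {path: (after, before) for path, (before, after) in change.items()}


def land(state: State, change: Change) -> State:
    """Apply a change on top of the current tip and return the new tip."""
    new_state = dict(state)
    for path, (before, after) in change.items():
        if new_state.get(path) != before:
            raise ValueError(f"conflict on {path}: tip has moved")
        if after is None:
            new_state.pop(path, None)
        else:
            new_state[path] = after
    return new_state


if __name__ == "__main__":
    tip: State = {"Feed.kt": "v1"}
    bad: Change = {"Feed.kt": ("v1", "v2"), "Flag.kt": (None, "on")}
    unrelated: Change = {"Login.kt": (None, "v1")}

    # The history only grows: the bad change stays, its inverse lands later.
    history = [bad, unrelated, inverse(bad)]
    for change in history:
        tip = land(tip, change)
    print(tip)
```

The inverse only touches the files the original change touched, so unrelated work that landed in between is left alone. If the tip has already moved on one of those files, `land` raises a conflict, and that is the case where automation or a manual fix is needed.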
- 00:32:21now that can get tedious because again
- 00:32:23the tip of master moves
- 00:32:25really fast so again this is a place
- 00:32:28where our productivity engineers have
- 00:32:29come in and helped us build these
- 00:32:31systems that automatically create these
- 00:32:33things
- 00:32:34uh
- 00:32:35if you have a particular revision
- 00:32:38that breaks a lot of tests even if you
- 00:32:40didn't notice that we're doing something
- 00:32:43we call multisect which will find which
- 00:32:45revision broke a particular
- 00:32:49set of tests on our master branch and it
- 00:32:52will proactively generate revert diffs
- 00:32:55where all anybody has to do is hit
- 00:32:58go
- 00:32:59and your diff unlands
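The talk only says that multisect finds which revision broke a set of tests. A plausible reading is a bisection over the linear history, run once per failing test. The sketch below assumes exactly that: `first_bad_revision`, `multisect`, and the `passes` callback are hypothetical, and it assumes each test passes at the oldest revision, fails at the tip, and stays broken once it breaks.

```python
from typing import Callable, Dict, List, Sequence


def first_bad_revision(
    revisions: Sequence[str],
    test: str,
    passes: Callable[[str, str], bool],
) -> str:
    """Binary search for the first revision at which `test` fails.

    Assumes revisions[0] is good and revisions[-1] is bad for this test.
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(revisions[mid], test):
            lo = mid
        else:
            hi = mid
    return revisions[hi]


def multisect(
    revisions: Sequence[str],
    failing_tests: Sequence[str],
    passes: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Group failing tests by the revision that first broke them."""
    culprits: Dict[str, List[str]] = {}
    for test in failing_tests:
        culprit = first_bad_revision(revisions, test, passes)
        culprits.setdefault(culprit, []).append(test)
    return culprits


if __name__ == "__main__":
    revisions = [f"r{i}" for i in range(8)]
    broke_at = {"test_feed": 3, "test_login": 6, "test_profile": 3}

    def passes(revision: str, test: str) -> bool:
        # Stand-in for actually running the test at that revision.
        return int(revision[1:]) < broke_at[test]

    print(multisect(revisions, list(broke_at), passes))
```

Each culprit revision found this way is a candidate for an automatically generated revert diff.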
- 00:33:01it's not
- 00:33:03there's a problem in in a lot of
- 00:33:05uh software that
- 00:33:07is kind of addressed with the monorepo
- 00:33:09by saying
- 00:33:10look breaking changes are going to
- 00:33:11happen there's going to be times when
- 00:33:14the sequence of of changes in master are
- 00:33:18not going to be usable that they're just
- 00:33:20going to be breaking changes
- 00:33:22we don't try to create the fiction that
- 00:33:24master was always stable and that master
- 00:33:26was perfect we don't try to revert out
- 00:33:28the history we don't try to get rid of
- 00:33:30those changes we keep them we keep
- 00:33:32everything
- 00:33:33and we just have this very very long
- 00:33:35linear path that is our master branch
- 00:33:38and so when you break something we
- 00:33:40revert that we revert out only the
- 00:33:42parts that you touched in a later
- 00:33:43revision
- 00:33:44and that means that every build after
- 00:33:46that since everything is based on all
- 00:33:49these stable artifacts and nobody is you
- 00:33:52know building these long-term branches
- 00:33:53that are based on some ridiculously old
- 00:33:56version of master
- 00:33:58we're constantly doing rebasing we're
- 00:34:00taking everybody's diffs and we're
- 00:34:02constantly bringing them up we're
- 00:34:04constantly moving them forward we don't
- 00:34:06even allow people to land
- 00:34:09if their current diff is based on a
- 00:34:11master that we don't have those
- 00:34:14artifacts for it's not worth the
- 00:34:16compilation time to try and do that
- 00:34:18everything gets rebased and moved
- 00:34:21up to stable versions
- 00:34:23uh one of the beauties of the on-demand
- 00:34:24system that i mentioned is if you pull a
- 00:34:27diff down in an on-demand
- 00:34:29it is rebased as part of that process so
- 00:34:32the first thing you do when that on
- 00:34:33demand comes up is fix any conflicts in
- 00:34:37the rebase which usually there aren't
- 00:34:39many unless you are working on a very
- 00:34:41hot piece of the system
- 00:34:44but that means that as we move forward
- 00:34:45we revert by fixing
- 00:34:48effectively
- 00:34:50and if you've got a fix that can do it
- 00:34:52faster than the revert or that's smaller
- 00:34:54than the revert you fix forward
- 00:34:57and that
- 00:34:58the difference between the two is only a
- 00:35:00question of whether we know that the fix
- 00:35:02is going to revert what you did
- 00:35:04so we live in the we live in this world
- 00:35:06of
- 00:35:07if you regress the metrics we have to
- 00:35:09roll back but also
- 00:35:11we don't want to be in the business of
- 00:35:14everybody doing that manually
- 00:35:16that's i mean
- 00:35:18it's like straightforward hearing the
- 00:35:19answers but then it's like you know you
- 00:35:21think about these problems with scale
- 00:35:22and then it's like okay
- 00:35:24what else are you going to do so anyway
- 00:35:26yeah rumors maybe we go back to the
- 00:35:29so we got three topics that we want to
- 00:35:31go go into and we got
- 00:35:3420 minutes left we're usually we usually
- 00:35:36wrap up at this point and go into q and
- 00:35:38a but we haven't even gotten to our
- 00:35:40favorite topics of productivity
- 00:35:41engineering metrics and whatnot uh
- 00:35:44before we get into that adam in in two
- 00:35:47minutes
- 00:35:48can you talk about the highlights of the
- 00:35:50tooling stack
- 00:35:52you mentioned buck and the you know buck
- 00:35:55two is now you know buck one didn't
- 00:35:57scale up got to have buck two and and
- 00:36:00you know customizing mercurial for for
- 00:36:03version control system
- 00:36:05in two minutes give us you know
- 00:36:07what what other you know highlights of
- 00:36:09the tool i mean i know
- 00:36:10we wanted to go through all the tools
- 00:36:12and probably you should have another uh
- 00:36:14webcast just for that but uh what can
- 00:36:17you give us in two minutes highlights
- 00:36:19well so the biggest thing is
- 00:36:21we have a customized version of vs code
- 00:36:24and it let us integrate all of our
- 00:36:26internal tooling that includes custom
- 00:36:28auth
- 00:36:29uh buck integration uh something we call
- 00:36:32interactive interactive um
- 00:36:35was it
- 00:36:36state logger isl whatever isl stood for
- 00:36:39in our universe
- 00:36:40but that lets us look at branches and
- 00:36:42look at revisions and pull diffs
- 00:36:45uh we have a system called jellyfish
- 00:36:47that is sort of the link between
- 00:36:50uh code revisions and
- 00:36:52diffs and that had to be again custom
- 00:36:55built we've got mercurial that actually
- 00:36:57is tracking the data
- 00:36:59we've got
- 00:37:00customized version of android studio
- 00:37:02that does our actual android builds
- 00:37:04using buck but also understands enough
- 00:37:07about the android ecosystem to let you
- 00:37:09uh debug
- 00:37:11those builds when they go onto a device
- 00:37:14we have on-demand servers
- 00:37:15which are the
- 00:37:17bread and butter of our development
- 00:37:19process
- 00:37:20then you've got all of the other
- 00:37:22ephemera that goes around this stuff so
- 00:37:24this is so then you get into the really
- 00:37:25neat stuff when we do a code search
- 00:37:28in our repo because everybody wants to
- 00:37:30be able to search from code all of our
- 00:37:32code searches are directly integrated
- 00:37:34with those development tools
- 00:37:37so if you're looking at a piece of code
- 00:37:39on the web there's a button that'll take
- 00:37:41you to that same piece of code in vs
- 00:37:42code
- 00:37:44if you're looking at code inside of our
- 00:37:46error reporting tools or inside of our
- 00:37:50search tools or data analytics tools
- 00:37:53you can link directly to that line of
- 00:37:54code in your editor
- 00:37:57and if you've got it on demand up it'll
- 00:38:00be the version of that that's in the on
- 00:38:02demand
- 00:38:03so i mean all taken all together what
- 00:38:05this means is that we don't have 57
- 00:38:07workflows for 57 different tools you
- 00:38:10have the way you do business and we've
- 00:38:12tried to enable as many of those as we
- 00:38:15possibly can
- 00:38:16which just lowers the friction of the
- 00:38:18whole system
- 00:38:19uh jumping into productivity engineering
- 00:38:22and developer happiness
- 00:38:24how does
- 00:38:26meta and maybe specifically your team
- 00:38:29right measure productivity engineering
- 00:38:31and happiness what what data are you
- 00:38:34looking at
- 00:38:35um
- 00:38:36and what
- 00:38:37where does that data come from
- 00:38:40yeah so let's start with really simple
- 00:38:41stuff um we have a
- 00:38:44developer-wide survey that we call pulse
- 00:38:46now everybody does developer surveys and
- 00:38:49they're basically nonsense for most
- 00:38:51people
- 00:38:52right but pulse at meta is different
- 00:38:55we
- 00:38:56actually ask people do you like the way
- 00:38:59that the your the
- 00:39:01tools we have are serving their purposes
- 00:39:04and then we give those that feedback
- 00:39:07directly to those tools team and those
- 00:39:08become actionable we ask people
- 00:39:11why that why they're
- 00:39:13not being able to do things the way they
- 00:39:15want to and then we move those systems
- 00:39:18based on those numbers
- 00:39:19we take our
- 00:39:21broad surveys as the core goaling metric
- 00:39:24of our entire management organization
- 00:39:26they're not goaled on features delivered
- 00:39:28they're not goaled on how many people do
- 00:39:30how much code or any of that they're goaled
- 00:39:32on the pulse the literal pulse
- 00:39:35of the system right
- 00:39:37and then
- 00:39:39throughout the half throughout the
- 00:39:40quarter we do what we call micro pulse
- 00:39:42which is just little check-ins
- 00:39:44how are things looking today
- 00:39:46was today better than yesterday was this
- 00:39:48week better than last week
- 00:39:50how is this looking have we
- 00:39:52improved on this thing that we're
- 00:39:55trying to work on this half
- 00:39:57constant influx of data right
- 00:40:00but we know that when you ask somebody
- 00:40:02something in a survey you get conflated
- 00:40:04results every time people are going to
- 00:40:06conflate i'm happy with my job with the
- 00:40:09tools are good
- 00:40:11so we're also looking at ways of trying
- 00:40:13to lower the friction of the day-to-day
- 00:40:15development experience we have a bot
- 00:40:18called balancebot which monitors how
- 00:40:20often you're getting notified
- 00:40:22how often things are flowing into your
- 00:40:25your queue be that on our internal chat
- 00:40:28app
- 00:40:29you'll notice the theme here
- 00:40:30everything's custom our
- 00:40:32internal chat app called workplace
- 00:40:34our internal
- 00:40:36uh social media system which
- 00:40:38unlike most companies we actually
- 00:40:40use it as our core communication
- 00:40:42framework whether you're getting too
- 00:40:44many notifications on that whether
- 00:40:46you're getting too many notifications
- 00:40:47via email whether you're
- 00:40:50getting too many
- 00:40:51reminders sent from various
- 00:40:54sources
- 00:40:55and because of this it actually
- 00:40:57will tune those down
- 00:41:00we've set it up so that if you've joined
- 00:41:01a lot of groups it'll ask you do you
- 00:41:03want me to just mute these for you you
- 00:41:06you don't really interact with these a
- 00:41:07lot do you want me to just turn off these
- 00:41:08notifications for you
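The mute suggestion described here (many notifications from a group, very little interaction with it) could be approximated with a simple rate check. This is a guess at the shape of the rule, not how balancebot works: `GroupActivity`, `suggest_mutes`, and both thresholds are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GroupActivity:
    name: str
    notifications: int  # notifications received from the group in the window
    interactions: int   # times the user opened, replied, or reacted


def suggest_mutes(
    groups: Iterable[GroupActivity],
    min_notifications: int = 20,        # hypothetical threshold
    max_interaction_rate: float = 0.05,  # hypothetical threshold
) -> List[str]:
    """Return the groups that are noisy for this user but rarely acted on."""
    suggestions = []
    for group in groups:
        if group.notifications < min_notifications:
            continue  # too quiet to be worth asking about
        rate = group.interactions / group.notifications
        if rate <= max_interaction_rate:
            suggestions.append(group.name)
    return suggestions


if __name__ == "__main__":
    activity = [
        GroupActivity("build-announcements", notifications=120, interactions=2),
        GroupActivity("my-team", notifications=80, interactions=40),
        GroupActivity("rarely-posts", notifications=3, interactions=0),
    ]
    for name in suggest_mutes(activity):
        # The bot asks rather than acts: the user still has to consent.
        print(f"do you want me to mute {name} for you?")
```

The function only produces suggestions, which matches the point made later in the talk that all of this is optional and consent-based.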
- 00:41:10and that automatically sort
- 00:41:13of does a lot of the housekeeping stuff
- 00:41:15that you do for yourself now i don't
- 00:41:17want to hear from this one i want to
- 00:41:18hear from this one i never look at these
- 00:41:21we don't lose any of that productivity
- 00:41:23because we've got a bot that's doing it
- 00:41:24for us
- 00:41:27we then add what we call focus bot which
- 00:41:29is to say we watch
- 00:41:32when our users are not listening to
- 00:41:34their notifications
- 00:41:36not immediately responding in their
- 00:41:38notifications and when they're actively
- 00:41:40working
- 00:41:41and we start
- 00:41:43muting
- 00:41:44those notifications and we let them work
- 00:41:47and we'll establish what we call focus
- 00:41:49blocks
- 00:41:50where if you are really in the groove
- 00:41:53and you're starting to code and you're
- 00:41:55just
- 00:41:56doing the work and you're not focusing
- 00:41:58on any of that stuff
- 00:42:00the volume of all that stuff just goes
- 00:42:02down
- 00:42:03and more than that it communicates to
- 00:42:04others who try to contact you that you
- 00:42:07are in a focus block
- 00:42:09and that maybe they ought to not maybe
- 00:42:10they shouldn't expect a response right
- 00:42:12now because
- 00:42:14you're busy you're doing work
- 00:42:17and what that does is it makes it so
- 00:42:19that doing the right thing is easy
- 00:42:22doing
- 00:42:22software the way that is effective is
- 00:42:26easy
- 00:42:27it's harder to make sure that you're
- 00:42:29answering every email it's harder to
- 00:42:31make sure that you are constantly on the
- 00:42:34the social media or the chat system
- 00:42:37trying to talk to people
- 00:42:39you can still do it
- 00:42:41but
- 00:42:42it emphasizes the the work of building
- 00:42:45the system
- 00:42:46over
- 00:42:47those distractions which is fantastic
- 00:42:51from uh from a sheer overall
- 00:42:53productivity perspective and we measure
- 00:42:55this like we measure everything we know
- 00:42:58when the group that has
- 00:43:00balancebot
- 00:43:02is producing more and getting better
- 00:43:04ratings
- 00:43:05than the group that doesn't use
- 00:43:07balancebot and i mean all this stuff is
- 00:43:09optional you can turn them all off you
- 00:43:10can say i don't want this
- 00:43:12you can disable these things and they
- 00:43:14actually ask you whether you want to
- 00:43:16because for us consent is everything
- 00:43:19but
- 00:43:20when you have these things enabled we're
- 00:43:21able to show
- 00:43:23you get more code written you spend more
- 00:43:24hours writing code your code has less
- 00:43:26diff comments your code doesn't
- 00:43:30need to be reverted as often on the
- 00:43:32aggregate
- 00:43:34all of these measures that can say can
- 00:43:36you focus
- 00:43:37are you building what we are intending
- 00:43:39to build
- 00:43:41and
- 00:43:42on the aggregate are you a happier
- 00:43:43developer
- 00:43:45because a happy developer is a good
- 00:43:46developer
- 00:43:48and that that means a different thing to
- 00:43:49everybody but
- 00:43:51for most of us i think it means can we
- 00:43:53do the job that we like doing
- 00:43:55nice yeah i mean we
- 00:43:57we know context switching is is a killer
- 00:44:00and
- 00:44:01you know thinking of
- 00:44:02how to decrease that noise on
- 00:44:05facebook scale is very interesting thanks
- 00:44:07for sharing there um
- 00:44:11any
- 00:44:12interesting productivity engineering
- 00:44:15wins that you i know you talked about
- 00:44:18the the
- 00:44:19the no code stuff and the sapiens
- 00:44:21stuff any other interesting wins based
- 00:44:24on data that you all have seen
- 00:44:26and things that you implement that you
- 00:44:28can kind of share with us
- 00:44:30well so i think the biggest i think the
- 00:44:32biggest productivity win that i've seen
- 00:44:34here
- 00:44:36is the integration of the apps
- 00:44:39themselves and i mean i know i've
- 00:44:40touched on the monorepo a little bit
- 00:44:43but
- 00:44:44the thing is that when we're building
- 00:44:46these systems everybody
- 00:44:48kind of eventually acknowledges that
- 00:44:51you're not going to reuse as much of the
- 00:44:52code as you think you're going to use
- 00:44:55right you're not going to actually reuse
- 00:44:58that method you wrote 15 years ago
- 00:45:02today for what you're writing
- 00:45:04but a big chunk of what we've tried to
- 00:45:07do
- 00:45:08is to take all of the common
- 00:45:10stuff that we don't want people reusing
- 00:45:13and make it transparent
- 00:45:16so
- 00:45:17we have tools that make it easy to write
- 00:45:20one-off sql scripts
- 00:45:22we have tools that make it really easy
- 00:45:26to build
- 00:45:27complex data flows to do analytics
- 00:45:30without establishing
- 00:45:32new hardware or new infrastructure to do
- 00:45:35it
- 00:45:36we have focused on flexibility over
- 00:45:40purity of ideal
- 00:45:41so we have a system called bento that is
- 00:45:43one of the big things that productivity
- 00:45:45engineering came out with that lets you
- 00:45:47just run data processing and you can run
- 00:45:50it in c sharp or c
- 00:45:52or python you can run a sql query you
- 00:45:55can pull the data into your system and
- 00:45:57you can build these very useful scripts
- 00:46:00with a certain
- 00:46:02limited number of libraries that are
- 00:46:04available
- 00:46:05and do simple visualization and then
- 00:46:08save those and share them
- 00:46:10that keeps us from having people build
- 00:46:13custom
- 00:46:14data systems trying to download their
- 00:46:16own versions of every single library in
- 00:46:18existence having to write
- 00:46:21custom sql code everywhere all those
- 00:46:24integrations to get the data
- 00:46:26unified because it's actually easier
- 00:46:28than writing it themselves
- 00:46:30the code is easy to port and the
- 00:46:32solutions can be shared so we're not
- 00:46:34having everybody run their own
- 00:46:38so that's
- 00:46:39that's one of the biggest wins in my
- 00:46:41opinion is that kind of infra it's infra
- 00:46:43targeted at the facilitation of what
- 00:46:47your users are going to do whether you
- 00:46:48want them to or not
- 00:46:50i mean and by users here i mean the
- 00:46:51developers
- 00:46:53thanks for thanks for sharing that um i
- 00:46:55have one question from our folks justin
- 00:46:59i had to get this one in our folks have
- 00:47:01been
- 00:47:02you said
- 00:47:03don't let adam leave without him
- 00:47:05answering this one question
- 00:47:08you know and of course we're the company
- 00:47:10behind gradle right so obviously
- 00:47:11interested in build test acceleration
- 00:47:14technologies and whatnot
- 00:47:16um you know
- 00:47:17how how
- 00:47:19how long does a
- 00:47:21android pr build take and what are you
- 00:47:25all doing as far as acceleration
- 00:47:27technologies go
- 00:47:29to make that process
- 00:47:32and the feedback cycles to be as fast as
- 00:47:35it can be with tools and technologies
- 00:47:37have you done so i think we've i think
- 00:47:38we've talked to that a little bit yeah
- 00:47:40um but here's the thing right if you
- 00:47:43were to check out an on demand
- 00:47:45right now just a blank on demand no diff
- 00:47:48no code changes
- 00:47:49and you were to try and build the
- 00:47:51facebook app it's going to be less than
- 00:47:53a minute
- 00:47:55seconds
- 00:47:56now that's building
- 00:47:58the most complex android app in
- 00:48:01existence
- 00:48:03in a few
- 00:48:04seconds now
- 00:48:08getting there
- 00:48:09means that we have a custom build system
- 00:48:11that relies heavily on artifacts that
- 00:48:14are all cached and built constantly into
- 00:48:17what we call the stable and they're
- 00:48:20really checkpoints our stable points are
- 00:48:22checkpoints for each given app where we
- 00:48:24say this is a version that everybody
- 00:48:26should be basing their work on
- 00:48:29and everything past that that you're
- 00:48:31building be that from master or be that
- 00:48:34through your own changes
- 00:48:36has to be recompiled
- 00:48:38so we take as much of stable as we
- 00:48:40possibly can we take the little bit that
- 00:48:42you've added and or changed
- 00:48:44and only that little bit
- 00:48:46is being actually recompiled
- 00:48:49so on a typical change you might do a
- 00:48:52lot of work and you might do
- 00:48:54need to recompile a lot and it might
- 00:48:56take 20 30 minutes to fully recompile
- 00:48:59that but then the next build will be
- 00:49:00five seconds
- 00:49:03because we can't we literally can't
- 00:49:06afford to build all of the code for
- 00:49:07every change that's made
- 00:49:09and i mean that in a literal monetary
- 00:49:11sense running all of these builds from
- 00:49:14scratch
- 00:49:15is financially irresponsible and
- 00:49:18honestly it's it's uh
- 00:49:20an irresponsible thing as stewards of
- 00:49:22the work of the planet
- 00:49:24the sheer i think there was a study by
- 00:49:26google a few years ago that
- 00:49:28their ai system
- 00:49:30was generating something like the entire
- 00:49:33uh heat footprint of a small country
- 00:49:37just to run their ai models
- 00:49:39when we're talking about humanity scale
- 00:49:41data processing
- 00:49:43we can't it isn't
- 00:49:45just
- 00:49:46you know financially bad it's
- 00:49:48irresponsible
- 00:49:49to run that excessive computation
- 00:49:52and so we actually i mean we
- 00:49:55measure those things every single
- 00:49:56feature we look at we look at it in
- 00:49:58terms of how much
- 00:50:00server-side cpu it requires in order to
- 00:50:04do its job we look at how much on basis
- 00:50:08a dev environment is costing in terms of
- 00:50:11how long it's taking to compile how long
- 00:50:14diffs are taking on server-side
- 00:50:16computation hours
- 00:50:18not just human hours but energy usage
- 00:50:21because those are end up being the core
- 00:50:23drivers of success for the system as a
- 00:50:26whole even if it's not the best thing
- 00:50:29monetarily and adam uh this will
- 00:50:31actually help with one of the audience
- 00:50:32questions too
- 00:50:33how much of that time saving is caching
- 00:50:36versus i build caching specifically like
- 00:50:39not dependency cache or anything like
- 00:50:40that how much of that time saving is
- 00:50:42build caching versus some of the other
- 00:50:44accelerating technologies that you've
- 00:50:45got in there
- 00:50:46well so i will admit that we've had
- 00:50:48entire groups uh 10 and 20 people at a
- 00:50:50time who have optimized the actual
- 00:50:54compilers
- 00:50:57to a to an
- 00:50:58obscene degree
- 00:50:59we have custom linting frameworks custom
- 00:51:02compilation frameworks
- 00:51:04we do
- 00:51:05uh app size
- 00:51:07magic that
- 00:51:09nobody else has
- 00:51:10and we we have this problem where we
- 00:51:12clever our way out of everything but at
- 00:51:14some point
- 00:51:16that is really foundational cutting edge
- 00:51:18work that has been done to facilitate
- 00:51:21these
- 00:51:22gigantic products
- 00:51:25being broken down in these tiny little
- 00:51:27bite-sized chunks that can both be
- 00:51:29compiled locally
- 00:51:31developed against
- 00:51:33debugged but also shipped to our you
- 00:51:35shipped to our users without it being
- 00:51:37the only thing their phone can do
- 00:51:39it's i mean it takes a minute to even
- 00:51:41like know how to respond to to say like
- 00:51:43that right because just the complexity
- 00:51:45of each of those little pieces all
- 00:51:47working together but then the fact that
- 00:51:48it actually works is kind of um what's
- 00:51:51what it works so well i think is kind of
- 00:51:53what's so impressive
- 00:51:55um
- 00:51:57when i say it's the most complex
- 00:51:59android code base that i've ever seen
- 00:52:02and i've seen a lot and that was one of
- 00:52:04the first questions actually was
- 00:52:06elaborate and i feel like you've done
- 00:52:08that kind of for the whole
- 00:52:10session it's like yeah there's
- 00:52:11complexity here there's complexity here
- 00:52:13there's complexity here it's not just
- 00:52:14the code it's not just the architecture
- 00:52:16it's the whole process of manufacturing
- 00:52:18and and everything that goes into it
- 00:52:19yeah
- 00:52:20justin how many more can we take before
- 00:52:22oh my god we wrap it up we have i know
- 00:52:24we have so many i know and we're going
- 00:52:27to spill stuff over into the slack so
- 00:52:28again i'll post that before we close out
- 00:52:30again so make sure that and you can
- 00:52:31scroll up in the chat now if you see
- 00:52:32that now um maybe pick uh two of the top
- 00:52:35ones justin and um you know adam will go
- 00:52:39over a bit and we don't want to abuse
- 00:52:41your time we could you know probably
- 00:52:43extend this to all day i could just
- 00:52:45cancel the rest of my meetings and uh
- 00:52:47well i mean it's not the joe rogan
- 00:52:49podcast but
- 00:52:52you know i think this here's one that i
- 00:52:54think is really good because i think it
- 00:52:55ties into some ambiguities that people
- 00:52:57get around the process of developer
- 00:52:58productivity engineering versus kind of
- 00:53:01management um and i think this is a good
- 00:53:03opportunity to disambiguate that what
- 00:53:05we're talking about here is using
- 00:53:06technology to make productivity better
- 00:53:09through happiness and less frustration
- 00:53:12but one of the questions was you know
- 00:53:13don't you think too much measurement
- 00:53:15feels like micromanagement for engineers
- 00:53:17maybe you could tell me a little bit
- 00:53:18about culturally yeah so i've got i've
- 00:53:21got the perfect answer this one if you
- 00:53:22haven't ever heard of goodhart's law
- 00:53:24um you should look it up um what
- 00:53:27goodhart's law says
- 00:53:28is that as soon as a metric
- 00:53:31becomes a goal
- 00:53:33it ceases being a good metric
- 00:53:36now we take that to heart and we take it
- 00:53:39to a to an extreme really
- 00:53:41where we don't say
- 00:53:43you need to write this much code or
- 00:53:46spend this number of hours we don't say
- 00:53:49there are this many requirements for how
- 00:53:51many lines of code get written or how
- 00:53:53many diffs get put through what we do is
- 00:53:56we look at
- 00:53:57the effect of a given change
- 00:54:00on all of those metrics
- 00:54:02and we don't say they have to be a
- 00:54:03certain number or arbitrarily say they
- 00:54:05need to improve
- 00:54:07we look at it from an evolutionary
- 00:54:08perspective where those things are
- 00:54:10positive direction
- 00:54:12so when we put in a change like balanced
- 00:54:14bot we say okay
- 00:54:16are these things that are generally
- 00:54:18neutral goods
- 00:54:20improving are we getting
- 00:54:23diffs through faster are diffs landing
- 00:54:25better
- 00:54:26are there being
- 00:54:28more thorough review and less reverts
- 00:54:31are our pro or
- 00:54:33the average regression that a diff
- 00:54:36causes against all the all of the
- 00:54:38company financial
- 00:54:41active user those kinds of product
- 00:54:43metrics
- 00:54:44is the average effect of a diff getting
- 00:54:46better
- 00:54:48we don't care
- 00:54:49how it gets better we don't really care
- 00:54:52if the way it got better was that you
- 00:54:54had to stand on one foot and dance under
- 00:54:56the moonlight
- 00:54:58that doesn't matter what matters is that
- 00:55:00you ran an experiment that shows that by
- 00:55:03doing this thing whatever it is
- 00:55:06all of those other things
- 00:55:08net went better and better is the whole
- 00:55:11point
- 00:55:12better is the way we get to functional
- 00:55:15and the way we get to efficient
- 00:55:18even little steps if you do them long
- 00:55:21enough will get you where you're going
- 00:55:23now that always brings up the problem of
- 00:55:24how do you choose
- 00:55:26which thing will give you the most
- 00:55:27impact and that's a whole problem in and
- 00:55:30of itself and it's sort of the the most
- 00:55:32difficult thing that we ever have to ask
- 00:55:35but rather than asking how can i
- 00:55:36possibly do that we're asking which of
- 00:55:39these things that i could do
- 00:55:41are going to have that that impact
- 00:55:44and because we're constantly measuring
- 00:55:47we can decide whether we were right
- 00:55:50and that helps us tune that whole
- 00:55:52process
- 00:55:53but we never goal
- 00:55:55we never goal never never never never
- 00:55:58goal on
- 00:55:59productivity
- 00:56:00or on individual pro on individual
- 00:56:04output as a thing a person needs to do
- 00:56:08that's not how we measure our engineers
- 00:56:10it's not how we measure our people
- 00:56:12we measure them on the impact that they
- 00:56:14have to the organization and to the
- 00:56:17products and to our users
- 00:56:19that's
- 00:56:20quite possibly a perfect answer i mean i
- 00:56:23i love that it's like no these metrics
- 00:56:25are not here because we want to drive
- 00:56:26them necessarily in any particular
- 00:56:28direction these metrics are here so that
- 00:56:31we actually know what we're doing we can
- 00:56:32make informed decisions so i know we're
- 00:56:34up on time i just posted the community
- 00:56:37slack channel again to the main chat
- 00:56:39uh
- 00:56:40and and we can fold in there i'm gonna
- 00:56:42do my best to get these things the
- 00:56:44remaining questions in there and then uh
- 00:56:46adam can asynchronously as he has time
- 00:56:50respond to so many of these
- 00:56:52amazing questions i mean really we need
- 00:56:54another session i feel like uh because
- 00:56:56there's so much that we can still
- 00:56:57discuss so but why don't i head on back
- 00:56:59over to you for and
- 00:57:02just wanted to thank adam for his time
- 00:57:04and of course you know their their team
- 00:57:06is hiring right thank you for facebook
- 00:57:08engineering for allowing
- 00:57:11adam to
- 00:57:12uh speak and share all this with us
- 00:57:15right there there's a link
- 00:57:17qr code to link for the jobs within
- 00:57:21uh adam's
- 00:57:24org
- 00:57:25right if you're interested and adam
- 00:57:27thanks again i mean we we go all day and
- 00:57:29thanks for you know giving your time to
- 00:57:32go into the gradle community slack and
- 00:57:34and share all that there and with that
- 00:57:37said we'll
- 00:57:38uh you got anything for our folks before
- 00:57:41we go
- 00:57:42i'll just say that uh
- 00:57:44the way we think about android and ios
- 00:57:47is not like the way everybody else
- 00:57:49thinks of android and ios
- 00:57:51we want talented java and c developers
- 00:57:56the fact that that our code is running
- 00:57:58on mobile is not the critical part of
- 00:58:00any of this as much as that's valuable
- 00:58:04if you've ever wanted to work at meta
- 00:58:06and you know java or you know c or you
- 00:58:08know swift
- 00:58:09mobile is a great way of coming in
- 00:58:12because we look at mobile the way most
- 00:58:15people look at servers
- 00:58:18that's awesome and folks
- 00:58:21the moral of the story if you don't
- 00:58:24invest in tooling productivity
- 00:58:26engineering developer experience
- 00:58:28and one day you ended up competing
- 00:58:30against facebook
- 00:58:33you know send thoughts and prayers right
- 00:58:35yeah that's perfect and that's why we're
- 00:58:37here today and thanks again everyone for
- 00:58:40joining and spending this time with us
- 00:58:42and thank you adam thank you justin
- 00:58:44thank you the gradle team and we'll
- 00:58:46see you all on the next one which um
- 00:58:50android x how the android the google
- 00:58:53android x productivity uh well how the
- 00:58:56google android x team does productivity
- 00:58:58engineering uh we're doing a webcast on
- 00:59:01that with armis from the android x
- 00:59:04libraries team they're the folks behind
- 00:59:07jetpack thanks all thanks adam and we'll
- 00:59:11next time
- Meta
- developer productivity
- engineering
- Android
- Sapiens
- Buck
- reliability engineering
- developer experience
- team structure
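
The build answer around 00:48:09 describes reusing cached artifacts from a "stable" checkpoint and recompiling only what changed since then. A minimal sketch of that general idea follows, using content fingerprints as cache keys. This is an illustration under stated assumptions, not Meta's or Buck's actual implementation: the target names (`core`, `feed`, `app`), the `plan_build` function, and the in-memory `cache` dictionary are all hypothetical.

```python
import hashlib


def fingerprint(source: str, dep_fingerprints: list[str]) -> str:
    """Hash a target's own source together with its dependencies' fingerprints."""
    h = hashlib.sha256()
    h.update(source.encode("utf-8"))
    for fp in dep_fingerprints:
        h.update(fp.encode("utf-8"))
    return h.hexdigest()


def plan_build(sources: dict, deps: dict, cache: dict) -> list:
    """Return the targets that must be recompiled; everything else is a cache hit.

    sources: target name -> source text (hypothetical stand-in for real files)
    deps:    target name -> list of targets it depends on
    cache:   fingerprint -> artifact, shared across builds (the "stable" artifacts)
    """
    fingerprints = {}
    rebuilt = []

    def visit(target: str) -> str:
        if target in fingerprints:
            return fingerprints[target]
        dep_fps = [visit(d) for d in sorted(deps.get(target, []))]
        fp = fingerprint(sources[target], dep_fps)
        fingerprints[target] = fp
        if fp not in cache:
            # Cache miss: this target, or something below it, changed.
            rebuilt.append(target)
            cache[fp] = f"artifact:{target}"
        return fp

    for target in sorted(sources):
        visit(target)
    return rebuilt


# Hypothetical three-target app: app -> feed -> core.
sources = {"core": "v1", "feed": "v1", "app": "v1"}
deps = {"feed": ["core"], "app": ["feed"]}
cache = {}

first = plan_build(sources, deps, cache)   # cold build that populates the checkpoint
sources["feed"] = "v2"                     # a developer's local change
second = plan_build(sources, deps, cache)  # only feed and its dependent app rebuild
third = plan_build(sources, deps, cache)   # nothing changed, so nothing rebuilds
```

In this sketch the first build compiles every target, the second recompiles only `feed` and `app` while `core` comes from the cache, and the third recompiles nothing, which mirrors the "20 30 minutes, then five seconds" pattern described in the session.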