00:00:00
[Music]
00:00:12
my name is Crystal so I have been at
00:00:16
go-jek for two years now I'm gonna speak
00:00:20
in English so I'm sorry but my Indonesian
00:00:23
is not that great
00:00:25
I can understand it but I have grown up
00:00:29
in the States my whole life but I came
00:00:31
back to Indonesia because of go-jek's
00:00:33
potential so when I first heard about
00:00:36
go-jek there was go ride and this was a
00:00:39
really interesting opportunity because
00:00:42
in Southeast Asia
00:00:43
there are so many opportunities for
00:00:45
growth in terms of infrastructure
00:00:47
mobilizing the informal labour
00:00:51
economy we had go send go ride go mart
00:00:56
and go food so I thought that my job
00:01:00
here would be very easy there were only
00:01:02
4 products we had pretty sizeable demand
00:01:05
but it wasn't anything crazy
00:01:07
we had a monolith database and
00:01:10
everything could easily be accessed in a
00:01:12
MySQL DB and then a couple months
00:01:16
went by and then we had three more
00:01:18
services and I said that's okay I can
00:01:20
handle three more services you know
00:01:23
we'll have a couple more data points we
00:01:24
have to add a few new dimensions but
00:01:26
that's ok everything's still stored in
00:01:28
one place let's just keep pumping the
00:01:30
data into a single area and a couple
00:01:33
more months go by and I said ok now we
00:01:37
have quite a few more services expanding
00:01:40
in our ecosystem suddenly there are new
00:01:43
stores of data they're not all in a
00:01:46
monolith service some of it is now in a
00:01:49
MongoDB
00:01:50
some of it is in Postgres some of it is
00:01:53
in a Logstash that isn't even being
00:01:54
stored yet and so today you have a
00:01:59
couple more and you'll start to see even
00:02:03
more coming pretty soon and because of
00:02:06
this our philosophy at the very
00:02:08
beginning was always about let's just
00:02:11
make sure we store the data and create
00:02:14
an environment where at least we have
00:02:16
data because even bad data is better
00:02:19
than no data provide an environment
00:02:21
where at least people can make decisions
00:02:23
be data-driven
00:02:25
and understand what's happening in their
00:02:27
products even if it isn't as user
00:02:29
friendly as it could be right because
00:02:32
go-jek was moving so fast and so
00:02:34
fearless that we had no choice it was
00:02:36
either accept the data as it is and
00:02:38
figure out the standardization later
00:02:40
so as the micro-services grew and our
00:02:43
kind of wealth of product offerings grew
00:02:46
so did the scale and so we were pumping
00:02:50
all of this data in a raw JSON format
00:02:53
maybe it was unstructured somehow we
00:02:55
didn't care we had clickstream data
00:02:58
coming through our systems we had order
00:03:01
management system data coming through
00:03:04
we had driver locations and every time
00:03:07
someone thought of a new data service or
00:03:09
a new feature to add to a product they
00:03:11
would build a new micro service and so
00:03:14
with the level of data being created we
00:03:17
were starting to have a really big
00:03:18
problem our data ended up looking a lot
00:03:22
like this just thrown into a single
00:03:26
repository with no organization
00:03:28
whatsoever we told ourselves ok we'll
00:03:31
just take it from every micro service
00:03:33
we'll store it in the database let
00:03:36
people figure it out later at least
00:03:38
everyone has access to something data
00:03:40
scientists can tap in developers can
00:03:43
look at their feature product owners
00:03:45
will oh crap what about the product
00:03:48
owner so product owners would actually
00:03:50
go into our visualization tools or
00:03:53
they'd go into even just the SQL
00:03:55
front-end and they'd say well it's a lot
00:03:58
of data but I don't actually know how to
00:04:00
use this because you have so many micro
00:04:02
services I get it there are products
00:04:04
that we've built that are serving a
00:04:06
particular function but I only know how
00:04:09
to use what you guys have provided if I
00:04:12
have a very specific question about a
00:04:13
very specific feature if I want to look
00:04:16
at the feedback ratings of all our
00:04:19
orders I know how to find that but what
00:04:22
do I do with that information how do I
00:04:24
make better decisions about it so we had
00:04:27
come into basically an area where we
00:04:31
knew we had data and it had so much
00:04:33
potential but no one was able to use it
00:04:35
in the structure that it was
00:04:38
we started to think about how we might
00:04:40
want to organize our data how could we
00:04:42
make it easy for people to discover new
00:04:44
metrics and how could we make it so
00:04:47
that people could explore new data
00:04:50
potentials rather than just asking the
00:04:52
same questions over and over again
00:04:54
because like we all accept to build
00:04:57
growth you need to be extremely creative
00:04:58
you have to find those nuggets of data
00:05:01
and those insights that no one else
00:05:04
would know how to find so now we had
00:05:08
this data closet and I think the
00:05:12
metaphor here is that when you
00:05:14
organize your closet you're trying to
00:05:16
optimize for something you don't want to
00:05:18
wear the same things every day you want
00:05:20
to be able to mix and match different
00:05:22
clothing items you want to be able to
00:05:24
accessorize with all of the different
00:05:26
things in your closet and not have a
00:05:29
static outfit and so when you organize
00:05:32
your closet you can organize it in
00:05:35
several ways so you can categorize it by
00:05:39
outfit so you can match all of the
00:05:41
things that you know you wear together
00:05:42
you can categorize by color so that you
00:05:45
can understand okay these things match
00:05:47
well together or you can categorize by
00:05:49
clothing type putting all your t-shirts
00:05:51
in one area all your shoes in one area
00:05:53
and the way that we thought about how we
00:05:56
were organizing our data at gojek was a
00:05:58
bit similar so you could organize by
00:06:01
outfit right so go food will have its
00:06:03
perfect data set where it has all of its
00:06:06
booking data has the drivers attached to
00:06:08
go food it has the prices of the go food
00:06:11
orders there are merchants in this tidy
00:06:14
data set so that a go food person can
00:06:17
just walk into the data closet and say
00:06:19
oh there that's my go food data let me
00:06:21
find out you know the answers to my
00:06:23
question about just go food you could
00:06:26
also organize it by business unit right
00:06:29
so all the finance people will come in
00:06:31
and say oh here's my accounting data
00:06:34
here is the data that is just about
00:06:36
surge pricing here is the data that
00:06:39
tells me about my revenues or you can
00:06:42
organize by clothing type
00:06:45
so in this it would be for every booking
00:06:47
being stored in a specific location all
00:06:50
of the bid data being stored in a
00:06:53
specific location and all of the
00:06:56
cancellation orders being in a specific
00:06:57
location we ended up thinking based on
00:07:02
these different categories that people
00:07:04
were used to finding their data in how
00:07:08
can we structure it such that people are
00:07:10
using the data in a way that is
00:07:12
efficient and effective for them to
00:07:15
build on their product so it leads us to
00:07:18
the North Star metric completed
00:07:21
transactions so the North Star metric is
00:07:23
something that you want to align the
00:07:25
organization behind so that everyone in
00:07:27
the company knows what the company's
00:07:29
goal is and so at go-jek that would be
00:07:32
completed transactions as we increase
00:07:34
the number of completed transactions we
00:07:36
are making our goal and every product
00:07:39
owner wants to know how can I contribute
00:07:41
to the North Star metric at go-jek so
00:07:44
now you need to go into the metrics that
00:07:47
matter so what metrics matter when we
00:07:50
want to increase the number of completed
00:07:52
transactions well you have to increase
00:07:55
go ride completed transactions go food
00:07:58
completed transactions we'll add to that
00:08:00
North Star metric go pay p2p
00:08:02
transactions we'll add to that metric so
00:08:05
you start to break down the categories
00:08:06
and the layers that the data needs to be
00:08:09
aligned by now for go food completed
00:08:13
bookings what are the metrics that
00:08:14
matter for that because you can have the
00:08:16
product owner caring about go food
00:08:19
completed orders but what about all of
00:08:21
the PMs
00:08:22
who are working on specific features and
00:08:24
they don't know exactly how their
00:08:27
feature works towards the North Star
00:08:29
metric well then you go into total
00:08:33
bookings right if there are no bookings
00:08:36
on go food obviously there can't be go
00:08:37
food completed bookings there will be
00:08:41
allocation for every booking that does
00:08:44
happen how do we ensure that there is a
00:08:46
driver to accept that order and complete
00:08:48
it and then cancellations for every
00:08:52
order that is placed and then gets a
00:08:55
driver how do we ensure that a
00:08:56
cancellation does not occur
00:08:58
because when he goes to the restaurant
00:09:00
or he goes to the store the item is out
00:09:03
of stock so how do we prevent
00:09:05
cancellations so now PMs can rally
00:09:08
around these specific metrics that
00:09:10
matter to the metric that matters to the
00:09:13
metric of the North Star metric and they
00:09:15
understand how their features are
00:09:17
helping either reduce cancellation rates
00:09:20
improve allocation or increase total
00:09:23
bookings now the metrics that matter to
00:09:26
the measures that matter are things that
00:09:27
we align the business units around as
00:09:30
well as build our Tableau dashboards or
00:09:33
our visualization tools around so when
00:09:37
you go into a visualization tool at
00:09:39
go-jek it's always centered around one
00:09:41
of these kind of concepts so when you
00:09:45
look at total bookings what you need to
00:09:47
have in order to increase total bookings
00:09:50
are obviously active users on your
00:09:52
platform to complete orders as well as
00:09:55
merchants for people to book from for
00:09:58
when you want to understand how to
00:10:00
improve allocation you need to know
00:10:03
where drivers are located and you need
00:10:06
to know whether or not they are
00:10:08
incentivized well enough to complete
00:10:10
these orders because just
00:10:13
because you have supply doesn't mean
00:10:15
that these drivers are actually going to
00:10:17
complete these orders and for
00:10:20
cancellation rates we would need to
00:10:22
understand how long it takes a customer
00:10:24
to get a driver so that they don't
00:10:26
cancel because they're being impatient
00:10:29
and we need to understand when a driver
00:10:32
goes to the restaurant and there's a
00:10:33
stock out how could we have prevented
00:10:35
that content quality issue so that this
00:10:39
issue doesn't happen anywhere and so now
00:10:40
you have all of these sub features of
00:10:43
the product that different PMs and
00:10:46
business owners can rally around because
00:10:48
they understand exactly how what they
00:10:51
are doing impacts the metric that
00:10:53
matters to the metric that matters and
00:10:54
so on and now there's always I guess on
00:11:00
the technical side people find it a
00:11:02
bit harder to understand like oh how is
00:11:04
my work you know helping complete
00:11:06
transactions so for us on the developer
00:11:09
side we're always focused on
00:11:11
service uptime driver supply hours so
00:11:15
back to this now we knew what our
00:11:18
North Star metrics were we had aligned
00:11:20
the organization around it the BI team
00:11:22
had gone to each product owner and said
00:11:24
hey this is our North Star metric this is
00:11:27
what we expect in terms of KPIs from
00:11:29
each product how should we organize this
00:11:32
so that we can efficiently enable
00:11:35
product owners and business units to
00:11:37
find their own aha moment on their own
00:11:39
so that we don't have to constantly go
00:11:41
to them and say hey are you looking at
00:11:43
allocation rates are you looking at
00:11:45
cancellation rates and instead they
00:11:47
would be able to go into this and
00:11:49
understand it a bit better so what we
00:11:52
decided to do was a couple of things we
00:11:54
decided to organize it in different ways
00:11:57
but use the same data so we weren't too
00:12:01
concerned about duplication of data
00:12:03
being represented in our data warehouse
00:12:05
because we're mostly on a cloud platform
00:12:07
and because of that storage is cheap you
00:12:10
can duplicate data as long as it
00:12:11
improves the references so we decided to
00:12:15
look at one style of organization where
00:12:19
we considered what product owners were
00:12:22
commonly coming into the data warehouse
00:12:23
for they would say oh I want to
00:12:25
understand something about my customer
00:12:26
or I want to understand something about
00:12:28
my drivers but they would often miss out
00:12:30
on what was in between those two things
00:12:32
which are you know feedback ratings that
00:12:35
tie them together the bookings that they
00:12:37
complete together or things as silly as
00:12:40
the weather which everyone probably
00:12:42
noticed this morning so on this what we
00:12:46
would have a product owner do is they'd
00:12:48
come in and they say ok I want to look
00:12:49
at you know the customers who are using
00:12:51
go food or I want to look at the
00:12:53
customers who are using go points
00:12:55
vouchers and from here they would be
00:12:58
almost forced to find those
00:13:00
relationships between every single
00:13:02
possible shared property between a
00:13:04
driver because on our platform we're
00:13:07
really interested in not just promoting
00:13:11
the customer experience but also the
00:13:13
driver experience with it because the
00:13:14
drivers are always our agents towards
00:13:17
the customers so the other way that we
00:13:20
wanted to organize this was by event
00:13:23
type and shared properties now everyone
00:13:27
at the company is kind of focused on
00:13:28
different streams of work and in doing
00:13:31
this we kind of forced different
00:13:38
shared properties so that they would be
00:13:41
forced to not look at just a single
00:13:43
problem when people were looking at go
00:13:48
food allocation rates they noticed that
00:13:50
the cancellation rate was very high so
00:13:52
they would look into the bookings they'd
00:13:54
only look at bookings and say oh wow
00:13:55
constantly know we have a lot of
00:13:57
cancellations but they weren't really
00:13:59
looking at things like location or they
00:14:02
weren't looking at things like driver
00:14:03
incentives because that was very far
00:14:05
from the concept of bookings so by
00:14:09
combining you know things like a go car
00:14:13
booking or a go food booking to driver
00:14:16
statistics like his performance overall
00:14:19
on the platform and then to the unit
00:14:21
economics so what was the price that a
00:14:24
driver was being given for every order
00:14:26
that he completed and for canceled
00:14:28
bookings how did that compare to
00:14:30
completed orders now this was a feature
00:14:32
that they could look at as a data metric
00:14:35
and compare all in one place one
00:14:40
example of an aha moment that we looked
00:14:42
at here was in looking at how go points
00:14:45
vouchers were being adopted so we looked
00:14:48
at all of the adoption rates of go
00:14:50
points vouchers you can buy vouchers in
00:14:53
our app and redeem them at a store for
00:14:57
users who are using Alfamart vouchers
00:15:00
we didn't just look at how they redeemed
00:15:03
them in store we don't just look at the
00:15:04
time stamp but we link it to the user's
00:15:06
profile level what was he doing at that
00:15:08
time and by looking at his user profile
00:15:11
level you could see that oh he had just
00:15:14
completed an order so this person was
00:15:16
literally at the store he bought a go
00:15:18
points voucher for Alfamart and
00:15:20
redeemed it in the store and now this
00:15:23
made us wonder couldn't we tie go points
00:15:26
voucher data more closely with our go
00:15:28
ride data go points is a completely
00:15:30
separate team while go ride is
00:15:34
focused mostly on transport but this data
00:15:36
could be easily linked together you
00:15:38
could give a more contextual experience
00:15:40
to the users and these are two
00:15:42
completely different product lines that
00:15:44
most people wouldn't have considered as
00:15:46
a growth strategy that you could push
00:15:48
and incentivize a user and contact them
00:15:51
at the right time based on two
00:15:52
completely different data points having
00:15:58
the experience of setting up our data
00:16:01
in a way that we could explore and be very
00:16:04
creative has allowed us to do a lot of
00:16:07
different data blog posts in a very
00:16:09
short amount of time for us to use the
00:16:14
data warehouse as it is now essentially
00:16:16
we can just start with a single question
00:16:17
what is go-jek's impact for Indonesia
00:16:20
well a lot of people would look at that
00:16:22
and say well we've uplifted a lot of
00:16:25
people's income right we've created a
00:16:28
lot of jobs and opportunities for
00:16:29
drivers so let's look at that one
00:16:31
feature driver unit economics and
00:16:35
let's see where that story takes us so
00:16:38
for us it becomes an exploratory
00:16:40
experience where you just start with a
00:16:41
single question and it leads you towards
00:16:43
other different unexpected relationships
00:16:48
with data points that you hadn't even
00:16:49
considered in the beginning but because
00:16:51
the data was so linked together you're
00:16:54
almost forced to realize that there are
00:16:56
connections there that you hadn't
00:16:58
expected before do you know how many
00:17:01
more topics we have delivered in the
00:17:03
past year yes so we've delivered three
00:17:07
million more taluk in the past year and
00:17:10
this was kind of a data point that we
00:17:11
didn't even have to really search for
00:17:15
it's just something that occurred when
00:17:17
we were looking at basic go food data
00:17:23
when we wanted to write our most recent
00:17:26
blog posts on Sudirman it was not hard
00:17:29
for us to find all of these different
00:17:31
data points we didn't have to take a lot
00:17:34
of time to expand on ok what does all
00:17:37
the data in Sudirman look like how do we pull
00:17:39
this data because all of our existing
00:17:43
data points are matched with
00:17:46
location-based data wherever they can be
00:17:48
all we had to do was literally type in
00:17:50
Sudirman as a location point and all
00:17:53
of the data that was represented on a
00:17:56
location based
00:18:00
granularity would pop up so you didn't
00:18:03
need to say I want specifically okay how
00:18:05
many go food orders are being sent in
00:18:08
Sudirman all you had to do was say
00:18:10
what's happening in Sudirman I think
00:18:14
on the traffic side it was quite easy
00:18:17
for us as well so in understanding what
00:18:20
happened in Sudirman and what its
00:18:22
effect what the ban might have an effect
00:18:25
on we initially looked at how many
00:18:30
orders there were in the area we
00:18:32
took a look at how many drivers were
00:18:36
being picked up and dropped off in that
00:18:37
area and because we had standardized our
00:18:39
data to the point where it's using these
00:18:41
S2 IDs which are an open source Google
00:18:45
library product that translates latitude
00:18:48
and longitudes into geographical cells
00:18:50
we had standardized our data so that you
00:18:53
could look up Geographic data across all
00:18:55
different products across all different
00:18:59
understandings and looking at event levels
00:19:03
like driver location pings these weren't
00:19:06
even on a booking level but we could
00:19:09
still identify that drivers were in that
00:19:11
location at that time so to read more
00:19:15
please go to our go-jek data blog
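
[Editor's sketch] The location-keyed lookup described in the talk, where every event is tagged with a geographic cell so that one location query returns data across products, might look roughly like the following. This is only an illustration under stated assumptions: it does not use the S2 library, and a plain fixed-size latitude/longitude grid stands in for S2 cells. The names `cell_id`, `index_by_cell`, and `events_near`, the cell size, and the sample events and coordinates are all hypothetical.

```python
import math
from collections import defaultdict

# Assumption: a flat grid of 0.01-degree squares stands in for S2 cells.
# Real S2 cell IDs are hierarchical and computed on a sphere.
CELL_SIZE_DEG = 0.01


def cell_id(lat, lng, size=CELL_SIZE_DEG):
    """Map a latitude/longitude pair to a coarse grid-cell key."""
    return (math.floor(lat / size), math.floor(lng / size))


def index_by_cell(events):
    """Group events from any product or event stream by their cell key."""
    index = defaultdict(list)
    for event in events:
        index[cell_id(event["lat"], event["lng"])].append(event)
    return index


# Hypothetical events from different products, including a non-booking ping.
events = [
    {"type": "gofood_booking", "lat": -6.2146, "lng": 106.8183},
    {"type": "driver_ping", "lat": -6.2149, "lng": 106.8188},
    {"type": "goride_booking", "lat": -6.1751, "lng": 106.8650},
]
index = index_by_cell(events)


def events_near(lat, lng):
    """Return every indexed event that falls in the same cell as the point."""
    return index.get(cell_id(lat, lng), [])


for event in events_near(-6.2147, 106.8185):
    print(event["type"])
```

Because every event carries the same kind of cell key, a single lookup by location returns bookings and driver pings together without naming a product up front, which is the property the talk attributes to standardizing on cell IDs.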
00:19:20
[Music]
00:19:24
Okubo