Hello everyone. I think today you have heard a lot about hexagons, H3, and so on. Next I'm going to talk about how we process hexagon data: in particular, how we smooth hexagon data, and then how we can use hexagon data for forecasting. By the way, my name is TR; I'm an engineer on the Marketplace Forecasting team.
First, let me show you some examples. Here is one illustration of Uber trip requests in San Francisco. In this example, notice the different colors: red means we have more requests, and blue means we have fewer requests. What I really want to show here is that we have different categories of hexagons: for some hexagons we have a lot of requests, and for some hexagons we have very few.
Now let me come to the problem. Actually, before the problem, let me talk a bit more about hexagon data. As Cameron already mentioned, we are moving from the old kind of geofences to hexagons, with the big advantage that hexagons are more flexible. Basically, we are going to have many small hexagons, and all kinds of Uber data will be aggregated over these small hexagons, so we can make very flexible business decisions. Take pricing, for example: we can change the price for each hexagon instead of using one price for the entire city.
Because of the hexagonal representation, we can have thousands of hexagons; in San Francisco, for example, we have more than 50,000. The problem, as shown in the previous example, is that some hexagons have lots of data, which is good, but with this many hexagons the data is also very sparse: some hexagons may see just a few requests for the whole day. When we have this kind of data sparsity across hexagons, the big question is how we can make our decisions stable and solid.
Because I'm from the forecasting team, next I'm going to talk about the data sparsity issue in the forecasting problem. For the sake of time I'm not going to get into all the details of real-time forecasting; let me just give you a very high-level overview of the problem and how we try to solve it. At Uber we want to predict the future: we want to predict the future marketplace state, so that the predicted state can help us make decisions in the present.
Apparently this is a very classical machine learning problem: we want to build prediction models that take in data features and are trained to make predictions. Some common features: we can use the historical data of the hexagon, for example what happened in the last few weeks for this hexagon; we also want to use recent data, say the last 60 minutes: how many trips, how much demand, how much supply in this hexagon. We also want to consider external signals: maybe there are events in this hexagon, maybe today is a holiday, maybe there is bad weather such as rain or a snowstorm, and so on. The basic idea is that we want to use all of this data to build and train our model to make predictions.
Now the problem comes in. Because we are doing hexagon-level prediction, if one hexagon has very sparse data, maybe just a few data points over a long history, then when we build a model its performance is generally very bad. Think about it: if there have been almost no requests in the last few days, what am I going to predict for the next 60 minutes? So this problem is very hard.
Another observation is that neighboring hexagons usually share similar patterns: if one hexagon has very few trips, there are probably not many trips in the nearby hexagons either. As a result, one solution we have been taking is to smooth the data. The basic idea is: how can we fold the data from neighboring hexagons into the current value of a hexagon, so that we can use it for building models and making predictions? Generally, there are two approaches to data smoothing.
00:04:33
two approach for data smoothing one is
00:04:36
the we call the he classing the basic
00:04:39
idea is very simple we try to first
00:04:42
group High guns into clusters and then
00:04:47
uh what we do for this SM in basally we
00:04:50
are we are going to use all the values
00:04:52
of the hasun in one has cluster make
00:04:55
some aggregation and use the value as
00:04:57
the value as a value of the Hason we
00:04:59
looking at it so for this approach uh
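As a rough illustration (not code from the talk), cluster smoothing can be sketched in a few lines of Python. `value_of` and `cluster_of` are hypothetical inputs; how the clusters are built is not specified here:

```python
from collections import defaultdict

def cluster_smooth(value_of, cluster_of):
    """Replace each hexagon's value by the mean over its cluster.

    value_of:   dict mapping hexagon id -> observed value
    cluster_of: dict mapping hexagon id -> cluster id (hypothetical;
                the talk does not specify how clusters are produced)
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for h, v in value_of.items():
        totals[cluster_of[h]] += v
        counts[cluster_of[h]] += 1
    # Every hexagon in a cluster gets the same aggregated value.
    return {h: totals[cluster_of[h]] / counts[cluster_of[h]] for h in value_of}
```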
For this approach, the advantage is that we reduce the computation: we have much less data after the aggregation, so it's much easier to train the model and much easier to do prediction. The trade-off is that the quality of the prediction model depends heavily on the clustering quality; I think Cameron also talked about geofences at the beginning, and if you do not have a good clustering, your predictions will suffer accordingly. Another problem is that neighboring hexagons may belong to different clusters, and then we may use different models for them. As a result we can end up with arbitrary clustering boundaries, similar to what was shown in Cameron's talk: very close neighbors sitting on opposite sides of a boundary and therefore getting different prices.
Because of these disadvantages of the clustering approach, a lot of applications at Uber instead use k-ring smoothing. Let me use one example to show what we mean by k-ring smoothing. Assume we have this green hexagon, and around it we have different layers of neighbors, which I label one, two, three. The rough idea of k-ring smoothing is that whenever we look at one hexagon, we also look at all of its k-ring neighbors (k can take different values), take all the values of those neighbors, aggregate them with, say, an average or some other dynamic weighting, and then use that value for the green hexagon here.
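To make this concrete, here is a minimal sketch of the naive per-hexagon version (essentially the "basic implementation" benchmarked later in the talk). It assumes the h3-py v3 API, where `h3.k_ring(cell, k)` returns all cells within grid distance k; in the v4 API the equivalent call is `h3.grid_disk`:

```python
import h3  # h3-py, v3 API assumed

def k_ring_smooth(value_of, k=2, default=0.0):
    """Average each hexagon's value over its k-ring neighborhood.

    value_of: dict mapping H3 cell id -> observed value
    """
    smoothed = {}
    for cell in value_of:
        # All cells within grid distance k, origin included.
        ring = h3.k_ring(cell, k)
        vals = [value_of.get(n, default) for n in ring]
        smoothed[cell] = sum(vals) / len(vals)
    return smoothed
```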
Compared with the clustering approach, this approach is much more flexible, because we process each hexagon separately instead of relying on a predefined clustering. Another big advantage is that, because of this flexibility, there are no arbitrary clustering boundaries; we avoid that kind of boundary issue entirely.
So what's the trade-off here? The trade-off is much more computation, simply because for each hexagon we take all of its neighbors, and in many Uber applications we may use a k-ring of 10 or even 20, which means that for each hexagon we need to aggregate maybe hundreds of neighboring hexagon values. And just looking at San Francisco, we may have more than 50,000 hexagons in total. Remember that we want to do real-time forecasting, as do many other real-time applications, which means that every minute we need to do a lot of computation for the smoothing. And it's not only one city: Uber currently has 700-plus cities in production, so all of them together is a lot more computation. In the rest of the talk, I will focus a little more on how we can do this k-ring smoothing efficiently.
I'm going to introduce the hexagon convolution approach. Before I talk about hexagon convolution, let me first bring back the general concept of convolution. I'm sure most of you are familiar with convolution, because convolution is a foundational component of convolutional neural networks, which have become so popular. In case you're not familiar with it, let me use one example with two-dimensional data to show how general convolution works. The rough idea is this: assume we have an input matrix, and we define a small kernel matrix (or filter), which is just a small matrix, and we want to run the convolution to get the output matrix.
The basic idea is that we slide the small kernel matrix over the big input matrix; at each position we do a pairwise multiplication and addition to produce one result. Let me use an example to show how it works. Assume the green one is the kernel matrix, the small matrix, and we slide it over the big input matrix. For example, in the blue area, we do the pairwise value multiplication and then take the sum. Here is just another position: we slide the kernel across the input matrix and collect all the convolution results. Of course there are many details to convolution that I'm not going to get into; you can find a lot of tutorials on how to do convolution.
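For example, the whole slide-multiply-sum procedure is a one-liner with an off-the-shelf implementation. Here is a sketch using SciPy (with a symmetric kernel like this one, convolution and the cross-correlation used in CNNs give the same result):

```python
import numpy as np
from scipy.signal import convolve2d

inp = np.arange(25, dtype=float).reshape(5, 5)   # the big input matrix
kernel = np.array([[0., 1., 0.],
                   [1., 1., 1.],
                   [0., 1., 0.]])                # the small kernel / filter

# Slide the kernel over the input; at each position, multiply the
# overlapping entries pairwise and add them up.
out = convolve2d(inp, kernel, mode="same")
print(out)
```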
So what are the nice properties of convolution? One nice property is that it is highly parallelizable. Another is that, because it is such a common, fundamental component of so many applications and computations, there are lots of mature implementations in different packages: for example in SciPy, TensorFlow, PyTorch, and so on. Because it is so fundamental, a lot of optimization work has gone into making convolution efficient, and we can also use GPUs to accelerate the computation.
So we have convolution on grid matrices; the question is, how can we do the same thing on hexagon data? The idea is actually very similar: we want to run the convolution on hexagon data. Assume we have an input set of hexagons; we also define our own kernel, for example a k-ring kernel, and we want to run the convolution using that kernel on top of the input.
So what's the challenge? The challenge is that there is no existing, available package that takes in hexagon coordinate data, because most convolution packages assume you have a grid: a matrix, an array, or a tensor. The question for us is how to leverage existing convolution implementations, so that we get all the benefits of their optimizations and GPU acceleration. Next I'm going to talk about how we do the transformation that lets us leverage those existing convolution implementations.
Before I talk about how we do the transformation and implement hexagon convolution, let me first introduce a few hexagon coordinate systems, because we need a representation of each hexagon that lets us do the transformation. Here are a few very commonly used coordinate systems. The first is called cube coordinates: for each hexagon we use three-dimensional data, three axes (basically x, y, z), to represent the hexagon. There are some trade-offs with this approach. First, using three-dimensional data to represent two-dimensional data is not very memory efficient. Second, because we have three values, it does not map directly to a grid system. That's why we are not using this coordinate system.
Another coordinate system is called double coordinates. I'm not going to get into the details, but the rough idea is that for each hexagon we use two numbers, two axes, to represent it. Theoretically it's easy to map this to a square grid matrix, but after we do the direct mapping we find that neighbors in the hexagon data may not be direct neighbors in the square grid. Recall that when we run convolution we normally want a solid area; if there are lots of gaps, it's not efficient, and we would also have to do lots of filtering to remove the values we do not need. So we are not using this one either.
This is another coordinate system: axial coordinates. With this system we use two numbers to represent each hexagon, and the good thing is that after the mapping we can clearly see that neighbors in the hexagon data are also roughly neighbors in the grid matrix. More importantly, this gives a solid area, with only a few cells in the corners, which we can easily filter out.
So the coordinate system we have chosen is basically based on axial coordinates. Next, let me talk about how we use this system to do hexagon convolution. Assume we have a big hexagon input dataset and we have defined our own small filter. The first step is to use the axial coordinates to represent each hexagon. Once we have the axial coordinates, the next step is to do the transformation, just as shown with the axial data earlier, to map into a grid matrix, or rather a square grid matrix. Once we have this square grid matrix, things become much easier: we can leverage all the available implementations. We just run the convolution on the square grid matrix and get the convolution result. Once we have the convolution result in the grid, the last step is to easily map it back to hexagon data.
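Here is a minimal end-to-end sketch of those four steps. It uses h3-py's local IJ coordinates as the "axial coordinates or something similar" (the exact coordinate transform Uber uses is not spelled out here); the function name is from the h3-py v3 API, where it is marked experimental:

```python
import numpy as np
from scipy.signal import convolve2d
import h3  # v3 API; in v4 this call is h3.cell_to_local_ij

def hex_convolve(value_of, kernel, origin):
    """Run a grid convolution over hexagon data and map the result back.

    value_of: dict mapping H3 cell id -> value
    kernel:   2-D numpy array expressed in the same two-axis layout
    origin:   an H3 cell serving as the anchor for local coordinates
    """
    # Step 1: represent each hexagon with two-axis coordinates.
    ij = {h: h3.experimental_h3_to_local_ij(origin, h) for h in value_of}

    # Step 2: shift the coordinates so they index a dense matrix.
    # Cells with no data contribute zero.
    imin = min(i for i, _ in ij.values())
    jmin = min(j for _, j in ij.values())
    rows = max(i for i, _ in ij.values()) - imin + 1
    cols = max(j for _, j in ij.values()) - jmin + 1
    grid = np.zeros((rows, cols))
    for h, (i, j) in ij.items():
        grid[i - imin, j - jmin] = value_of[h]

    # Step 3: run an off-the-shelf grid convolution.
    out = convolve2d(grid, kernel, mode="same")

    # Step 4: map the convolution result back onto the hexagons.
    return {h: out[i - imin, j - jmin] for h, (i, j) in ij.items()}
```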
So far I've talked about how we do hexagon convolution. To recap: we have hexagon data represented using the H3 library; the first step is to use the axial coordinates, or something similar, to map each hexagon to its corresponding representation; then we do the transformation into the square grid matrix; then we run the convolution; and after we get the convolution result, we just map it back.
Next, let me talk a little more about how we use hexagon convolution for k-ring data smoothing. Recall how we do the k-ring smoothing: the only thing we need to change is to set different filters. Here I show some examples. The first one is a one-ring filter with equal weights; if we use this filter to do the hexagon convolution, we are basically just doing a sum. But beyond that, we can also use dynamic filters with different weights: looking at the example, by using such a filter we are essentially doing a weighted version of the sum when we do the k-ring smoothing.
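To make the filters concrete: in the two-axis grid layout, a hexagon's one-ring occupies a 3x3 patch minus two opposite corners, the cells that are not true hex neighbors, which is exactly what the corner filtering above removes. A sketch, with the caveat that which two corners are zeroed depends on the axis orientation convention:

```python
import numpy as np

# One-ring filter with equal weights: the center plus its 6 hex
# neighbors; the two 3x3 corners that are not hex neighbors get 0.
sum_1ring = np.array([[0., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 0.]])

# Normalizing the equal weights turns the sum into an average ...
avg_1ring = sum_1ring / sum_1ring.sum()

# ... and uneven weights give a weighted (dynamic) version of the sum,
# e.g. keeping more of the center cell's own value (these weights are
# illustrative, not the ones used in the talk):
weighted_1ring = np.array([[0. , 0.5, 0.5],
                           [0.5, 4. , 0.5],
                           [0.5, 0.5, 0. ]])
weighted_1ring /= weighted_1ring.sum()
```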
Another thing I want to mention: when we talk about the advantage of hexagons being more flexible, we want to handle each hexagon separately, which means we may want to use different k-ring sizes for different hexagons. For example, for some hexagons in the city center we may use a small k-ring, while for some hexagons in non-urban areas we may want to use a large k-ring.
So how can we handle this? Actually, the solution is very brute force: because the hexagon convolution is so fast, we just run the k-ring smoothing for all the sizes in parallel and then stitch the data together afterward.
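A sketch of that brute-force idea: build one kernel per k, run them all, and pick each hexagon's result according to a per-hexagon k assignment (`k_map` below is a hypothetical assignment, e.g. small k downtown, large k elsewhere):

```python
import numpy as np
from scipy.signal import convolve2d

def hex_ring_kernel(k):
    """Equal-weight k-ring kernel in the two-axis layout: cells at hex
    distance > k are zeroed.  In axial coordinates the hex distance
    from the origin is (|q| + |r| + |q + r|) / 2."""
    size = 2 * k + 1
    kern = np.zeros((size, size))
    for r in range(-k, k + 1):
        for q in range(-k, k + 1):
            if (abs(q) + abs(r) + abs(q + r)) / 2 <= k:
                kern[r + k, q + k] = 1.0
    return kern / kern.sum()

grid = np.random.rand(64, 64)                      # axial-mapped values
k_map = np.random.choice([1, 10, 20], grid.shape)  # hypothetical per-cell k
ks = (1, 10, 20)
results = {k: convolve2d(grid, hex_ring_kernel(k), mode="same") for k in ks}
smoothed = np.select([k_map == k for k in ks], [results[k] for k in ks])
```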
Next, let me show you how fast the convolution is. Here are the performance results. For the comparison we implemented three approaches. The first is the basic implementation: for each hexagon we get its neighbors and then do the aggregation, the sum, the mean, and so on. For the second implementation we used TensorFlow (it could be any available package, but we chose TensorFlow) to implement the hexagon convolution, and we ran it on CPU. The third is the same code run on GPU. We can clearly see that by using the TensorFlow implementation of hexagon convolution with a GPU, we achieve almost a 100x performance improvement; even on CPU, we still achieve about a 10x performance improvement.
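The shape of the TensorFlow variant looks roughly like this (a sketch, not the actual benchmark code); the same graph runs on CPU or GPU, which is where the 10x and ~100x numbers come from:

```python
import numpy as np
import tensorflow as tf

# Axial-mapped hexagon values in NHWC layout, and a hex one-ring
# kernel in HWIO layout (the masked 3x3 kernel from earlier).
grid = np.random.rand(1, 512, 512, 1).astype(np.float32)
kernel = np.array([[0., 1., 1.],
                   [1., 1., 1.],
                   [1., 1., 0.]], dtype=np.float32)
kernel = (kernel / kernel.sum()).reshape(3, 3, 1, 1)

# On a machine with a visible GPU, TensorFlow places this op on the
# GPU automatically; otherwise it runs on CPU.
smoothed = tf.nn.conv2d(grid, kernel, strides=1, padding="SAME")
```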
To summarize a little: by using hexagon convolution we can dramatically improve our efficiency. Because this kind of data smoothing is so fundamental for many applications, this really helps us reduce the computation bottleneck, so we can put more of our effort into the business logic. I've talked about how we do data smoothing using hexagon convolution, but hexagon convolution is actually more than that, because convolution is such a fundamental component. Think about deep learning: convolutional neural networks have been a foundational component of many deep learning network structures.
It's similar for our forecasting problem. Recall that for the real-time forecasting problem, we try to predict, for every hexagon in the world, for every minute, for the next 60 minutes into the future, different quantities: for example supply, demand, driver waiting time, and so on. The input is all kinds of data (historical data, recent data, external signals), and the output is basically our prediction.
If we just look at each hexagon separately, the big problem is that, because of data sparsity, it's very hard to really improve the performance. Here we can clearly use hexagon convolution. This is just one neural network structure I'm showing as an example; we have tried different architectures. In this architecture, the first step is to run the hexagon convolution, which basically aggregates all kinds of signals together; the aggregated signals are then fed into the later parts of the deep neural network. For the sake of time I'm not going to talk through the details of this network structure, but roughly, it was inspired by residual networks, which let us increase the number of network layers, and a lot of ideas are borrowed from WaveNet, basically for how to process sequence data.
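As a toy illustration of hexagon convolution as a building block inside a network (a sketch only, not Uber's actual architecture, which also predicts distributions and borrows from ResNet and WaveNet): learned 3x3 kernels can be kept hex-shaped with a constraint that zeroes the two non-neighbor corners after each update:

```python
import tensorflow as tf

# Mask for 3x3 kernels in the two-axis layout: corners that are not
# hex neighbors are zeroed (broadcast over in/out channels).
HEX_MASK = tf.constant([[0., 1., 1.],
                        [1., 1., 1.],
                        [1., 1., 0.]])[:, :, None, None]

class HexKernel(tf.keras.constraints.Constraint):
    """Keep learned 3x3 kernels hexagonal by masking the corners."""
    def __call__(self, w):
        return w * HEX_MASK

# Hypothetical model: hexagon-convolution layers aggregate the input
# signals, then a 1x1 layer produces a per-hexagon prediction.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 8)),  # axial grid with 8 signal channels
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu",
                           kernel_constraint=HexKernel()),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu",
                           kernel_constraint=HexKernel()),
    tf.keras.layers.Conv2D(1, 1),
])
```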
There are some other interesting parts: in this network we are not really predicting one value, we are predicting a distribution. I'm not going to go into the details, but the rough point I want to make is that hexagon convolution is such a fundamental component for this deep learning model, and it actually opens up many opportunities for us. If you look at the deep learning literature, research, or even industry, so many components are built on top of convolution. Before, because we use hexagon data, it was very hard for us to do convolution, and there was a lot we could not leverage; now, hexagon convolution opens the door, so we can leverage a lot of existing work.
With that, let me briefly summarize the talk. At Uber we built H3, and all kinds of Uber business data are aggregated at the hexagon level. Because we use small hexagons, we end up with some busy hexagons and some non-busy hexagons, and data sparsity can become an issue when we want to make solid, stable decisions; that's why we need a lot of data smoothing algorithms. To make this data smoothing more efficient, we built hexagon convolution, which really improves the computation cost. Using hexagon convolution also opens the door for us to do fancier things, building machine learning and deep learning models, and we are using hexagon convolution to build our real-time forecasting models. With that, I would just like to thank you all for coming to the talk.