00:00:00
hi so today what I want to talk about is
00:00:03
Unity terrain, specifically having lots and
00:00:06
lots of them as a lot of people do
00:00:09
so people come on my Discord every day
00:00:11
and they tell me about their you know
00:00:14
eight by eight grid of terrains which is
00:00:16
64 terrains. The largest I've had I
00:00:19
think was 644 terrains. Now
00:00:24
you know obviously they want to use
00:00:25
MicroVerse to edit that in real time and
00:00:29
there's some sort of limit on that but
00:00:31
it turns out the limit has a lot to do
00:00:33
with Unity terrain, and to really delve
00:00:37
into this I want to talk about why this
00:00:40
is probably a terrible idea
00:00:43
so first off
00:00:46
a lot of the people coming in who are
00:00:48
building these giant terrains
00:00:50
um you know they're starting a new game
00:00:52
et cetera they know they need a large
00:00:53
world for some reason
00:00:55
um and my first thought is to question
00:00:57
that do you really need a large world
00:00:59
right now uh because working with very
00:01:03
large numbers of terrains makes everything
00:01:06
slower makes your builds much larger to
00:01:09
give you an idea let's just talk about
00:01:10
the data size that we're dealing with
00:01:12
there are
00:01:14
64 terrains in this scene here that I'm in
00:01:18
and if you'll notice over here it's
00:01:21
1024 for the alpha map and 1025 for
00:01:25
the height map now let's assume our
00:01:27
terrain is going to have 16 textures on
00:01:29
it and that means it needs four Alpha
00:01:32
Maps plus a height map it'll generate a
00:01:34
normal map and maybe it has a holes
00:01:36
texture maybe it doesn't but just let's
00:01:40
just talk about just the splat maps and
00:01:42
the height map. Okay, for 64 terrains
00:01:45
that's 1.8 gigabytes of data
00:01:48
and that's on the GPU side which uses
00:01:52
uncompressed textures but of the right
00:01:55
bit Precision on the CPU side at least
00:01:59
from what I can tell from the API it's
00:02:01
using floats for everything or integers
00:02:03
when it could be using bytes or halves
00:02:06
so it may be that internally that's
00:02:08
optimized, but given that the terrain
00:02:10
system is 12 years old and written at a
00:02:12
time when Unity was mainly focused on
00:02:15
small indie games
00:02:17
and trying to ship to the web and to the
00:02:22
nascent mobile phone scene that was
00:02:24
starting up I somehow doubt that that's
00:02:27
all optimized internally
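
As a rough illustration of how this kind of data adds up, here is a minimal Python sketch. The texture formats are assumptions for illustration only (8-bit RGBA alpha maps and 16-bit heights for a compact layout, 32-bit floats for the other), and the function names are hypothetical. It counts only the splat maps and the height map with no mip chains or normal map, so it is not expected to reproduce the 1.8 gigabyte figure exactly.

```python
# Rough terrain data-size estimate. The formats below are assumptions for
# illustration, not values confirmed by the talk or by Unity documentation.

def terrain_bytes(alpha_res=1024, height_res=1025, layers=16,
                  alpha_bytes_per_channel=1, height_bytes=2):
    """Bytes for one terrain's splat (alpha) maps plus its height map."""
    # Each RGBA alpha map stores the weights of four texture layers.
    alpha_maps = (layers + 3) // 4
    alpha = alpha_maps * alpha_res * alpha_res * 4 * alpha_bytes_per_channel
    height = height_res * height_res * height_bytes
    return alpha + height


def grid_gib(count, **kwargs):
    """Total size for `count` terrains, in GiB."""
    return count * terrain_bytes(**kwargs) / 2**30


if __name__ == "__main__":
    # Compact layout: 8-bit alpha channels, 16-bit heights.
    print(f"compact: {grid_gib(64):.2f} GiB")
    # Float layout: 32 bits per alpha channel and per height sample.
    print(f"floats:  {grid_gib(64, alpha_bytes_per_channel=4, height_bytes=4):.2f} GiB")
```

The point of the comparison is only that storing the same 64 terrains as floats costs several times more than a compact layout.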
00:02:29
and these games that people play that uh
00:02:32
you know have these large streaming
00:02:34
worlds, they have custom terrain renderers
00:02:36
designed for streaming for large scale
00:02:38
rendering
00:02:39
um and the Unity system is very much not
00:02:41
designed for that so I suspect on the
00:02:44
CPU this is even more memory and uh and
00:02:49
you have again you have the
00:02:50
representation on both the CPU and the
00:02:52
GPU and loading 1.8 gigabytes and by the
00:02:55
way this is before you add a single tree
00:02:57
or piece of grass and all that takes a
00:03:00
lot of data too so
00:03:02
before you drag out an N-by-N grid of
00:03:05
terrain, think about whether you really need it
00:03:07
and if you really want all the hassle of
00:03:10
getting that to work now there are
00:03:12
compression systems out there, and there are
00:03:14
streaming systems out there but I will
00:03:16
tell you the second that that hits the
00:03:18
Unity terrain system it is no longer
00:03:20
compressed right so if you are going to
00:03:23
have large view distances and things
00:03:25
like that then uh you're gonna have to
00:03:27
load all that stuff and it's going to be
00:03:29
uncompressed and in Unity formats, and I
00:03:31
will also tell you that it takes a long
00:03:33
time to set that data on a Terrain
00:03:36
so even with a streaming system
00:03:39
and a compression system you might get
00:03:40
the data size down somewhat but you're
00:03:42
probably going to be creating hitches uh
00:03:45
and then the other thing is just:
00:03:47
does your game really need
00:03:49
that now?
00:03:50
um you know a lot of a lot of people
00:03:52
will build the world first before they
00:03:54
build their gameplay but it's really
00:03:56
hard to fill a very large world
00:03:59
with any kind of meaningful content and
00:04:02
you know, you may just find that
00:04:04
you've just created a big empty world
00:04:06
with a couple small pieces of content at
00:04:08
the end of the day and you've spent a
00:04:10
ton of time managing this giant terrain,
00:04:12
checking it in and out of Perforce, you
00:04:14
know backing it up all the things even
00:04:17
if everything worked fine
00:04:19
so uh since I know everyone's going to
00:04:21
do this anyway because again I see this
00:04:23
on my Discord on a constant basis
00:04:27
um I decided that I would look into
00:04:29
optimizing this process a bit to make
00:04:31
MicroVerse able to work with this
00:04:33
many terrains at, you know, a better clip
00:04:37
than it does now and I also love
00:04:39
optimization problems so it was fun for
00:04:41
me to do this. So
00:04:44
uh what I did was I ended up building my
00:04:46
own terrain renderer, which again, if you're
00:04:48
going to actually ship a game with large
00:04:51
streaming worlds is one of the options
00:04:53
that you might want to consider is
00:04:55
replacing Unity terrain and one of the
00:04:57
wonderful things about Unity is that you
00:04:59
can actually do things like that you can
00:05:01
write your own terrain system and get it
00:05:02
to work without pulling your hair out
00:05:05
um you know it's a very flexible engine
00:05:08
and you don't have to use their stuff
00:05:10
and in fact most of the Unity games I've
00:05:11
shipped have used Unity stuff where it
00:05:14
didn't matter so much where we didn't
00:05:16
need performance where it wasn't about
00:05:20
the core of our game and I've replaced
00:05:23
most of the Unity systems for things
00:05:24
that really needed the performance
00:05:26
in the core
00:05:28
um and you know you can do that because
00:05:31
you know the constraints of your game
00:05:32
and can make a less General system and
00:05:34
optimize for that so what I've done is
00:05:36
I've written a terrain system that's
00:05:38
designed to optimize out uh some of the
00:05:41
overhead of setting up data on a Unity
00:05:43
terrain so that MicroVerse could be faster.
00:05:46
this doesn't render more efficiently
00:05:48
it's not a better terrain system,
00:05:51
it's a terrain system designed to be
00:05:53
updated quickly
00:05:55
so
00:05:57
um I'm gonna show off a little bit of
00:06:00
Unity terrain first and then we're gonna
00:06:02
time it and I'm going to show off my
00:06:03
terrain system and we're going to talk
00:06:05
about the differences
00:06:07
um so if I switch this to wireframe
00:06:11
and I deselect this uh what you'll see
00:06:13
that Unity terrain does
00:06:16
um is that when you set a height map on
00:06:19
it it actually generates this data to
00:06:21
tell it how much to tessellate in a
00:06:22
given area
00:06:24
and it does that by looking at the
00:06:26
differences in the height map if there's
00:06:27
a lot of differences if it's really
00:06:30
deviating from being a plane it'll
00:06:32
tessellate that area more and so you can
00:06:33
kind of see how it's tessellating these
00:06:35
areas here where there's more detail and
00:06:38
leaving other ones where it's more flat
00:06:41
alone
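
To make the idea of "tessellate more where the height map deviates from flat" concrete, here is a minimal Python sketch. It is not Unity's actual algorithm, which the talk does not show: the patch size, the error metric (distance from the bilinear surface through the patch corners), the thresholds, and the function names are all assumptions.

```python
# Illustrative only: a per-patch "how far from flat is this?" metric.
# Patch size, metric and thresholds are assumptions, not Unity internals.

def patch_error(heights, x0, y0, size):
    """Max deviation of a patch from the bilinear surface through its corners."""
    h00 = heights[y0][x0]
    h10 = heights[y0][x0 + size]
    h01 = heights[y0 + size][x0]
    h11 = heights[y0 + size][x0 + size]
    worst = 0.0
    for j in range(size + 1):
        for i in range(size + 1):
            u, v = i / size, j / size
            flat = (h00 * (1 - u) * (1 - v) + h10 * u * (1 - v)
                    + h01 * (1 - u) * v + h11 * u * v)
            worst = max(worst, abs(heights[y0 + j][x0 + i] - flat))
    return worst


def tessellation_level(error, thresholds=(0.01, 0.05, 0.2)):
    """Map an error to a level: 0 is coarsest, len(thresholds) is finest."""
    return sum(error > t for t in thresholds)


if __name__ == "__main__":
    flat = [[0.0] * 5 for _ in range(5)]
    bumpy = [row[:] for row in flat]
    bumpy[2][2] = 1.0  # a single spike in the middle of the patch
    print(tessellation_level(patch_error(flat, 0, 0, 4)))
    print(tessellation_level(patch_error(bumpy, 0, 0, 4)))
```

Whatever the real metric is, a pass like this has to look at every height sample again, which is why setting new heights at runtime is not free.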
00:06:44
so
00:06:46
um
00:06:46
that is a nice Advantage there's some
00:06:49
other things they do as well but it
00:06:51
means that when I set the data on the
00:06:53
terrain, or anybody does, at runtime it
00:06:56
has to recompute all of that and it's
00:06:58
expensive so
00:07:01
uh and again it's a very old system
00:07:03
not optimized in modern ways so
00:07:08
um let's time some stuff and see the
00:07:11
difference uh so right now
00:07:14
um if you look on the MicroVerse
00:07:16
component if you have this all set up
00:07:18
correctly, it requires MicroSplat and
00:07:21
I'll get into what you have to do there
00:07:22
in a minute but right now I have it set
00:07:25
to be always Unity which means always
00:07:27
use the Unity terrain: update it, render
00:07:29
with it etc etc
00:07:33
um let's time that so again this is a 64
00:07:37
uh terrain scene it's massive uh it's
00:07:40
not fully detailed or anything but it's
00:07:42
enough to kind of show uh show the
00:07:44
issues here
00:07:46
so if I move
00:07:49
um
00:07:50
this slider uh while we're timing it it
00:07:54
will update the terrain. Now, MicroVerse
00:07:58
builds terrain from the very beginning
00:08:00
of a flat empty terrain all the way to
00:08:03
the final result every time you make a
00:08:05
change
00:08:06
this gives it a non-destructive workflow
00:08:08
and uh it's a little it can be a little
00:08:11
slow when you have 64 terrains and 1.8
00:08:14
gigabytes of data to update
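
The rebuild-from-flat idea can be sketched in a few lines of Python. This is an illustration of a non-destructive layer stack in general, not MicroVerse's code: the layer functions, their parameters, and `rebuild` are hypothetical names.

```python
# Illustrative non-destructive stack: every change re-runs all layers from a
# flat terrain. Layer names and operations are hypothetical, not MicroVerse's API.

def rebuild(size, layers):
    """Start from a flat, empty height grid and apply every layer in order."""
    heights = [[0.0] * size for _ in range(size)]
    for layer in layers:
        heights = layer(heights)
    return heights


def raise_all(amount):
    """Layer that lifts the whole terrain by a constant."""
    def apply(h):
        return [[v + amount for v in row] for row in h]
    return apply


def bump(cx, cy, height, weight=1.0):
    """Layer that adds a weighted spike at one sample."""
    def apply(h):
        out = [row[:] for row in h]
        out[cy][cx] += height * weight
        return out
    return apply


if __name__ == "__main__":
    # Dragging a weight slider means rebuilding the whole result each time.
    for weight in (0.0, 0.5, 1.0):
        result = rebuild(3, [raise_all(0.1), bump(1, 1, 0.5, weight=weight)])
        print(weight, result[1][1])
```

Because nothing is edited in place, any slider value can be revisited later, at the cost of redoing all the work on every change.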
00:08:16
um but MicroVerse actually isn't the
00:08:19
bottleneck here it's actually
00:08:22
um I mean it you know it's it's not free
00:08:25
uh but the bigger bottleneck tends to be
00:08:27
the Unity terrain, and so by writing my
00:08:31
own terrain system I'm able to get around
00:08:32
some of this so if I drag this slider
00:08:34
around
00:08:37
uh we'll start getting some timings here
00:08:39
you can see the scene fully updating now
00:08:41
it's rebuilding everything it's
00:08:42
rebuilding the height maps, you know, et
00:08:44
cetera et cetera so there we go
00:08:47
got some timings and let's scan over
00:08:49
these timings and see kind of what we're
00:08:51
spending here
00:08:54
um you'll get ones that are lower
00:08:55
because it didn't update in that frame
00:08:56
but you'll get these ones here like this
00:08:58
one's 180 there it updated
00:09:02
um 192.
00:09:05
149 it's not not terrible
00:09:08
194. so that's kind of our Peak right
00:09:12
now and if we look at the timings of
00:09:13
this you'll see that there's this
00:09:15
Gfx.WaitForPresentOnGfxThread
00:09:17
marker. Now, this includes all the
00:09:20
MicroVerse work plus all the work
00:09:23
Unity terrain does, like rendering base
00:09:25
maps and things like that when you set
00:09:27
it
00:09:28
so that's kind of the the bit that
00:09:32
um we're going to chip away at but we're
00:09:35
also going to chip away at the rest of
00:09:36
it because it was 194 milliseconds for
00:09:39
everything uh in total and
00:09:42
um I'm trying to think of something else
00:09:44
I should show here if I actually look at
00:09:46
well no I can't show you that
00:09:48
um but what we're seeing here is we're
00:09:50
seeing that there's 40 milliseconds of
00:09:53
overhead on the CPU that are not just
00:09:55
waiting around for the graphics card and
00:09:58
the graphics card is taking 151
00:10:00
milliseconds to do its work
00:10:03
so let's switch this to my proxy
00:10:05
renderer
00:10:09
I go here and we change this to always
00:10:10
proxy and that means it's always going
00:10:12
to render with my rendering system
00:10:15
and
00:10:16
um
00:10:17
I'm going to go to the mud here
00:10:23
and here we go and also keep in mind it
00:10:26
has to render the whole terrain as well
00:10:29
um
00:10:31
so
00:10:33
let's drag that layer weight around
00:10:39
go
00:10:40
where I lose it
00:10:43
and now we see
00:10:46
122 131
00:10:50
130
00:10:52
so we knocked a good
00:10:55
50 milliseconds off of most of these
00:10:56
updates
00:10:58
um which is nice that's a nice gain and
00:11:01
I think I can probably get that a little
00:11:03
faster too
00:11:05
um
00:11:06
and what you're also seeing here is
00:11:08
because I'm in a particularly long
00:11:09
view, it's also rendering the scene,
00:11:12
which takes time. It has to cull it all
00:11:14
and render it which is not fast either
00:11:18
and we'll see that Gfx.WaitForPresentOnGfxThread
00:11:19
went down by 30
00:11:22
milliseconds and then the overhead on
00:11:25
the C++, or sorry, C# and
00:11:28
Unity terrain system side is almost
00:11:30
gone as well. So this went
00:11:34
down by about 30 milliseconds and this
00:11:36
went down, the remainder, by another, I
00:11:40
don't know another 40 or 50. so we saved
00:11:44
quite a bit of time uh on updates with
00:11:47
this
00:11:48
now there is a slight difference between
00:11:50
my renderer and that, because my renderer
00:11:53
is not computing that LOD map and
00:11:56
doing the tessellation the way that
00:11:59
Unity does. Mine just uses hardware
00:12:01
tessellation and so if you look at
00:12:03
something like this sometimes let me see
00:12:05
if I can find an area where it's doing
00:12:06
let me make sure I am on the proxy
00:12:09
renderer right
00:12:10
it's great when you can't tell them
00:12:12
apart all right there we go I am on the
00:12:13
proxy renderer so
00:12:16
um
00:12:17
I like it on it
00:12:20
there we go
00:12:22
um sometimes when you move in and out
00:12:24
you can see there we go the tessellation
00:12:27
will swim a little bit and that's
00:12:29
because uh the tessellation for
00:12:33
hardware tessellation is a dynamic
00:12:35
tessellation and so it's kind of
00:12:38
constantly changing
00:12:39
and the other thing you'll notice is
00:12:41
that if I go into a wireframe mode it's
00:12:45
uh
00:12:46
it's not just squares right it kind of
00:12:49
is formed by this triangle shape
00:12:52
uh so it doesn't line up with the splits
00:12:55
where the Unity terrain is, and it
00:12:57
actually goes a fair bit higher res and
00:12:59
probably could be optimized for
00:13:00
rendering a little bit it's not designed
00:13:02
for rendering it's really designed for
00:13:04
fast updates
00:13:06
um
00:13:07
so anyway
00:13:09
um that's the basic idea of this is
00:13:12
let's just eliminate a bunch of the
00:13:13
terrain bottlenecks and get faster editor
00:13:15
performance and then there's one other
00:13:17
feature that I should show off
00:13:20
and uh
00:13:22
that is there's this mode here called
00:13:26
proxy while updating and what that means
00:13:29
is that I'm going to use the Unity terrain
00:13:31
renderer and then when you're done
00:13:34
oh, here's another problem that Unity
00:13:36
terrain has: I am just slightly above the
00:13:38
maximum height of the height map and so
00:13:41
it clips there, whereas my renderer does
00:13:43
not do that because it doesn't care
00:13:47
um
00:13:48
so let me just get that
00:13:51
uh oh here we go pull this down a little
00:13:55
bit
00:13:55
there we go
00:13:58
um
00:14:00
so yeah uh
00:14:05
there we go
00:14:10
um so proxy while updating what it does
00:14:13
is it turns on the proxy rendering when
00:14:16
you're doing an adjustment and turns it
00:14:18
off when it thinks you're done with it
00:14:21
and so here I can just slide this layer
00:14:24
back and forth you can see that the
00:14:26
terrain does look a little bit different
00:14:28
but as soon as I stop messing with it
00:14:30
it'll pop back to Unity terrain when it
00:14:33
thinks I'm done
00:14:35
and uh you can see this with the
00:14:38
wireframe view pretty clearly
00:14:42
um
00:14:44
there we go so you can see it's Unity
00:14:46
terrain right now
00:14:47
and then it switches to my proxy
00:14:49
rendering
00:14:51
and sometimes it doesn't always get it
00:14:54
back
00:14:55
but if you save the scene or or hit the
00:14:58
sync button uh one of the problems with
00:15:01
sliders is I don't know when you're
00:15:02
really done with them
00:15:04
so it'll snap back to Unity rendering
00:15:07
when it when it does have a moment and
00:15:09
it's like hey I don't think you've done
00:15:11
anything in a little while I think I can
00:15:12
I can I can do the work now and it'll I
00:15:15
call it sneaky save back I do it with uh
00:15:18
the height editing as well to speed
00:15:19
things up
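
The switching behaviour described here is essentially an idle timeout. A minimal Python sketch, assuming a hypothetical `RenderModeSwitch` class and a made-up 0.5 second delay (neither comes from MicroVerse):

```python
# Illustrative "proxy while updating" switch with an idle timeout.
# The class, method names and the 0.5 s delay are assumptions, not MicroVerse code.

class RenderModeSwitch:
    def __init__(self, idle_seconds=0.5):
        self.idle_seconds = idle_seconds
        self.mode = "unity"
        self.last_edit = None

    def on_edit(self, now):
        """An edit arrived: render with the fast-updating proxy."""
        self.mode = "proxy"
        self.last_edit = now

    def tick(self, now):
        """Called every frame: fall back once edits have stopped for a while."""
        if self.mode == "proxy" and now - self.last_edit >= self.idle_seconds:
            self.sync()
        return self.mode

    def sync(self):
        """Explicit sync (for example on save): return to the real terrain."""
        self.mode = "unity"
        self.last_edit = None


if __name__ == "__main__":
    switch = RenderModeSwitch()
    switch.on_edit(1.0)
    print(switch.tick(1.2))  # still dragging the slider
    print(switch.tick(2.0))  # idle long enough to switch back
```

A slider gives no "finished" event, so a timeout like this can only guess, which is why an explicit sync path is still needed.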
00:15:20
so anyway
00:15:22
um that was a nice Improvement for large
00:15:24
terrain support. The
00:15:27
time that Unity spends on the
00:15:30
CPU is pretty constant
00:15:33
meaning that
00:15:35
a beefier machine is a little bit faster
00:15:37
but it's not worlds of difference faster.
00:15:39
I'm running on a laptop right now so if
00:15:42
you're on a desktop machine with like a
00:15:44
you know, a modern 4090 or whatever,
00:15:48
all that GPU time is going to be
00:15:50
cut way way down and this probably you
00:15:53
know will edit in real time again
00:15:56
so especially with the proxy rendering
00:15:59
um that helps a lot
00:16:01
um so anyway right now this requires
00:16:04
MicroSplat because
00:16:06
um
00:16:07
I have to use a special shader, and so
00:16:10
it can generate that shader for me, and
00:16:13
it can render it all in one pass and
00:16:15
using the formats that I use and
00:16:17
um I may extend this to work with
00:16:19
Unity's shader if it's useful enough
00:16:22
to people
00:16:24
that they want it to but it will require
00:16:26
a whole different rendering path because
00:16:28
of the way Unity's shader renders terrain:
00:16:30
it actually requires drawing it a whole
00:16:32
bunch of times so
00:16:34
um so yeah the way this is set up right
00:16:36
now
00:16:37
turn this back to shaded
00:16:41
is that once you convert to MicroSplat
00:16:46
you can turn on a setting in the
00:16:50
MicroSplat shader right here:
00:16:54
MicroVerse preview, "Export MicroVerse
00:16:57
Preview". If you turn that on, it'll output
00:16:59
the special Shader and then the
00:17:03
controls here for it will show up
00:17:06
and let you uh let you
00:17:11
choose the mode and use the new renderer
00:17:13
so
00:17:15
anyway
00:17:16
uh that's the gist of what I wanted to
00:17:18
talk about
00:17:20
um and again just trying to make my
00:17:22
stuff not the bottleneck in your crazy
00:17:24
adventure for your 2000-terrain game. So
00:17:28
all right bye