00:00:00
hey there so today i thought it would be
00:00:02
fun to go through some unity
00:00:04
optimization techniques that uh you hear
00:00:06
around to see how much impact they
00:00:08
actually make and if they're even worth
00:00:09
using i've just set up this uh little
00:00:12
test bay here all right so the first one
00:00:14
up is send message when i first started
00:00:16
unity i found this send message function
00:00:18
and i'm like well this is super
00:00:19
convenient as long as you've got a
00:00:20
reference to the object you can just
00:00:21
call send message
00:00:23
but
00:00:24
as you hear time and time again send
00:00:25
message is not good practice and it's
00:00:27
slow
00:00:28
in the first test here i've got uh it's
00:00:30
going to be running send message in the
00:00:32
second test i'm going to be grabbing the
00:00:34
uh component directly and then calling
00:00:36
the function directly
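A rough sketch of what the two tests might look like in code. The class names `Damageable` and `Attacker`, the `TakeDamage` method, and the `target` field are all made up for illustration; the video does not show the actual benchmark code. In a real project each MonoBehaviour would normally live in its own file.

```csharp
using UnityEngine;

// Hypothetical receiver component (name and method are assumptions).
public class Damageable : MonoBehaviour
{
    public int health = 100;

    public void TakeDamage(int amount)
    {
        health -= amount;
    }
}

// Hypothetical caller showing both approaches side by side.
public class Attacker : MonoBehaviour
{
    [SerializeField] private GameObject target; // assigned in the Inspector

    private void Start()
    {
        // Test 1: string-based dispatch. The method name is not checked at
        // compile time; with the default options Unity logs an error at
        // runtime if no component on the target has a matching method.
        target.SendMessage("TakeDamage", 10);

        // Test 2: fetch the component, then make a normal typed call.
        // TryGetComponent also tells us whether the component exists at all.
        if (target.TryGetComponent(out Damageable damageable))
        {
            damageable.TakeDamage(10);
        }
    }
}
```

Once you hold the `Damageable` reference you can call anything else on it, which is the "class ready to go" benefit described in the video.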
00:00:39
all right so let's run send message and
00:00:40
we'll do max iterations
00:00:47
and there we go so as you can see get
00:00:49
component is two times faster than send
00:00:51
message and on top of that i mean you
00:00:55
also get the class ready to go so we
00:00:58
can now manipulate and uh call other
00:00:59
functions in this class if we want
00:01:01
whereas send message you have no idea
00:01:03
you're kind of just like
00:01:04
shooting in the dark
00:01:06
there's no way to check here to see if
00:01:08
this object that we've got actually has
00:01:10
this function uh there's really no
00:01:12
reason to use send message ever right
00:01:14
like you should always
00:01:16
just grab the class well at least i
00:01:17
haven't found a good reason to use send
00:01:19
message maybe someone else has you
00:01:20
could leave it in the comments i'd be
00:01:22
interested to know okay next is the
00:01:24
extern call caching i've just started
00:01:26
using Rider over the last few months
00:01:28
and i had no idea this was even a thing
00:01:30
but every single time you try to use
00:01:32
transform if we just decompile this here
00:01:35
you'll see that it's actually an
00:01:36
external call so basically this is uh
00:01:39
them shooting this across to the c plus
00:01:41
plus side um and then it would shoot
00:01:43
back and then we would get our transform
00:01:46
so
00:01:47
for example if we were just doing this
00:01:48
in an update loop right
00:01:51
this would be three calls to the c plus
00:01:53
plus side and then back again for us to
00:01:55
get the position rotation and scale
00:01:57
so a more favorable way of doing this is
00:02:00
caching it locally and then grabbing it
00:02:03
from that cache but an even better way
00:02:05
is to globally cache it and then uh
00:02:09
then grab them from the global cache and
00:02:10
i will show you
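Here is a minimal sketch of the three variants being compared: no caching, a local cache, and a "global" cache held in a field. The class and field names are assumptions; the actual benchmark code is not shown in the transcript.

```csharp
using UnityEngine;

// Hypothetical example class; names are illustrative only.
public class TransformCachingExample : MonoBehaviour
{
    // "Global" cache: the Transform is fetched once and kept in a field.
    private Transform cachedTransform;

    private void Awake()
    {
        cachedTransform = transform;
    }

    private void Update()
    {
        // No caching: the `transform` property is read three times.
        Vector3 p1 = transform.position;
        Quaternion r1 = transform.rotation;
        Vector3 s1 = transform.localScale;

        // Local cache: the `transform` property is read once per Update.
        Transform t = transform;
        Vector3 p2 = t.position;
        Quaternion r2 = t.rotation;
        Vector3 s2 = t.localScale;

        // Field cache: the `transform` property is not read in Update at all.
        Vector3 p3 = cachedTransform.position;
        Quaternion r3 = cachedTransform.rotation;
        Vector3 s3 = cachedTransform.localScale;
    }
}
```

Note that `position`, `rotation` and `localScale` are themselves property calls in every variant; caching only removes the repeated `transform` lookup.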
00:02:12
so if we do 900 000 calls
00:02:15
so as you can see there is actually a
00:02:16
decent difference here
00:02:18
fully caching it is
00:02:20
two times as fast as not caching it at
00:02:21
all
00:02:22
so yeah this is 900 000 calls uh so
00:02:25
honestly the difference is not
00:02:28
too major like if you're talking about
00:02:29
something like
00:02:30
3700 calls
00:02:32
uh you're looking at you know not even
00:02:35
one millisecond and uh i highly doubt
00:02:37
you're doing that many uh api calls to
00:02:40
the unity api per frame so
00:02:43
uh
00:02:44
you know
00:02:45
take it with a grain of salt don't do it
00:02:47
for performance do it for maybe just the
00:02:50
reason that you could
00:02:52
possibly rename this to t or something
00:02:54
and make your code a little bit shorter
00:02:55
like that i know a lot of people don't
00:02:57
like doing that they like descriptive
00:02:58
names so do i but
00:03:01
you know performance wise it's not gonna
00:03:03
save your game okay so this next one is
00:03:05
interesting uh and interesting because
00:03:08
it goes against what i see everybody's
00:03:10
suggesting and that is that you should
00:03:13
not use Vector3.Distance
00:03:15
right in this top one here because
00:03:17
inside here if we decompile it
00:03:20
it makes use of a square root which is
00:03:22
known to be slow
00:03:24
you should use this alternative here
00:03:26
which is just a simple sqrMagnitude
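For reference, the two styles usually look something like this. The helper class and method names are hypothetical, and the sketch assumes `range` is not negative, since the squared comparison only matches the plain one in that case.

```csharp
using UnityEngine;

// Hypothetical helper; names are illustrative only.
public static class RangeCheck
{
    // Readable version: Vector3.Distance computes the length of (a - b),
    // which involves a square root.
    public static bool InRangeDistance(Vector3 a, Vector3 b, float range)
    {
        return Vector3.Distance(a, b) < range;
    }

    // "Optimized" version: compare squared lengths so no square root is
    // needed. Assumes range >= 0.
    public static bool InRangeSqr(Vector3 a, Vector3 b, float range)
    {
        return (a - b).sqrMagnitude < range * range;
    }
}
```

Both methods return the same answer for non-negative ranges; the only question is whether skipping the square root is measurably faster.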
00:03:30
but
00:03:31
according to all of my tests it is
00:03:34
like one or the other okay so 900 000
00:03:36
calls uh square root is actually faster
00:03:39
there
00:03:41
slower there
00:03:42
right slightly slower there they're
00:03:44
honestly neck and neck uh so to me
00:03:48
also this is far more readable
00:03:52
than this right this distinctively tells
00:03:55
me i'm trying to find the distance here
00:03:57
so when it comes to distance in my
00:03:58
opinion just use Vector3.Distance
00:04:01
scrap this sqrMagnitude nonsense uh
00:04:04
because it's simply just like not
00:04:05
routinely faster sometimes it's slower
00:04:08
uh
00:04:09
obviously this is one simple test
00:04:11
right with uh hard coded numbers
00:04:13
here while these are random but
00:04:15
you know there could be a time where
00:04:17
square magnitude might be faster but i
00:04:19
have not found it
00:04:20
so there you go okay so this next one is
00:04:23
quite interesting uh find objects so
00:04:26
on startup here uh when this test starts
00:04:29
up all i'm doing is just generating a
00:04:30
bunch of objects so you can see under
00:04:32
here a bunch of trees
00:04:34
in the tree there's a bunch of layers
00:04:36
and then in the layers there's a bunch
00:04:37
of objects on the objects
00:04:39
i've got them tagged find and i've also
00:04:41
got this find helper
00:04:43
which is just an empty class
00:04:45
so
00:04:46
in this first benchmark i'm just finding
00:04:47
them by tag in the second benchmark
00:04:50
finding by type
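The two lookups being benchmarked presumably look roughly like the sketch below. `FindHelper` matches the empty class mentioned above; the `"Find"` tag string and the `FindBenchmarkSketch` class are assumptions. The tag must already exist in the project's tag list, otherwise the tag lookup throws.

```csharp
using UnityEngine;

// Empty marker component, as described in the video.
public class FindHelper : MonoBehaviour { }

// Hypothetical sketch of the two lookups; not the author's benchmark code.
public class FindBenchmarkSketch : MonoBehaviour
{
    private void Start()
    {
        // Benchmark 1: find every active GameObject carrying the tag.
        GameObject[] byTag = GameObject.FindGameObjectsWithTag("Find");

        // Benchmark 2: find every active component of the given type.
        // Newer Unity versions mark this call obsolete in favour of
        // Object.FindObjectsByType.
        FindHelper[] byType = FindObjectsOfType<FindHelper>();

        Debug.Log($"{byTag.Length} by tag, {byType.Length} by type");
    }
}
```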
00:04:52
so let's do
00:04:54
that's right so this is actually very
00:04:55
slow
00:04:56
so i'm doing recommended 1000 iterations
00:05:02
as you can see it's quite slow
00:05:05
and there we go find object of type is
00:05:07
significantly slower so i actually don't
00:05:10
know how these work behind the scenes
00:05:11
but someone in my discord made a good
00:05:13
point in that find objects with tag
00:05:15
probably just looks at the transform
00:05:17
level checks the tag and says yup good
00:05:19
to go
00:05:20
whereas the find object of type probably
00:05:22
has to go through every single object
00:05:23
looking at every single component right
00:05:25
the transform whatever else it's got
00:05:27
sprite renderer image uh collider all
00:05:30
this stuff going through them and then
00:05:32
finally returning it so it has to do an
00:05:33
exhaustive search of every single object
00:05:36
of every single component so uh that
00:05:38
would make sense why it takes so much
00:05:40
longer so yeah after seeing these so
00:05:43
let's just maybe do four thousand
00:05:45
probably gonna be waiting for a second
00:05:46
four
00:05:47
thousand
00:05:50
let that load
00:05:56
i've actually got a webgl build of this
00:05:58
and if you did this amount on webgl it
00:06:00
would absolutely crash your browser
00:06:02
there we go so find object of type took
00:06:05
18 seconds to do 4 300 of them uh so
00:06:09
yeah my recommendation is never use
00:06:12
these two functions in any game loop
00:06:15
definitely not in update i honestly
00:06:17
wouldn't even use them in a state
00:06:19
change on like a turn-based combat game
00:06:21
right because they're honestly so damn
00:06:23
slow usually there's a better way to
00:06:25
find your objects right have them
00:06:27
in a list of some kind on a manager or
00:06:30
any any number of things the only time i
00:06:32
would ever say
00:06:33
use this is once
00:06:36
on the initialization of your
00:06:38
classes at the very start of the scene
00:06:39
in start or awake that is the only
00:06:41
time uh otherwise avoid them like the
00:06:43
plague because they're super damn slow
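One way to "have them in a list of some kind" is a self-registering component, sketched below. The `TrackedObject` name and the static list are assumptions for illustration, not something shown in the video; a dedicated manager object holding the list would work just as well.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical self-registering component; names are illustrative only.
public class TrackedObject : MonoBehaviour
{
    // Every enabled instance adds itself here, so nothing has to search
    // the scene to find them later.
    private static readonly List<TrackedObject> all = new List<TrackedObject>();

    // Read-only view for other scripts.
    public static IReadOnlyList<TrackedObject> All => all;

    private void OnEnable()
    {
        all.Add(this);
    }

    private void OnDisable()
    {
        all.Remove(this);
    }
}
```

Other scripts can then loop over `TrackedObject.All` instead of calling a find function every time.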
00:06:45
okay so this next one is very
00:06:47
interesting as well it's about using the
00:06:49
non-alloc versions of the uh physics
00:06:52
functions so for example here in our
00:06:54
benchmark one we're using we're grabbing
00:06:57
the results of physics overlap sphere
00:06:59
okay and if we uh
00:07:01
just have a look here i've just got a
00:07:03
uh
00:07:07
oh gosh
00:07:08
i've just got a bunch of colliders in
00:07:10
this area here
00:07:11
and uh when we click it it's just gonna
00:07:13
overlap sphere and grab them all and in
00:07:15
the second benchmark here we're actually
00:07:16
using overlap sphere non-alloc and we're
00:07:19
sending in our pre-made uh collisions
00:07:22
array right so it's just going to be
00:07:24
reusing that same array instead of
00:07:26
creating a new one and returning it to
00:07:28
us uh let's see how that one goes let's
00:07:31
use max iterations
00:07:33
okay apparently i said that that should
00:07:35
be the maximum so let's do that instead
00:07:38
and as you can see the non-alloc version
00:07:41
is slower let's actually do a little bit
00:07:42
more than that
00:07:45
something like that
00:07:48
yeah so you know it's coming in close to
00:07:51
almost double uh the time but
00:07:54
that doesn't mean you shouldn't use it
00:07:56
so let's actually remove the
00:07:59
normal one the overlap sphere one
00:08:01
and let's head back into unity
00:08:03
okay so let's press play and let's open
00:08:06
the profiler
00:08:07
and run the non-alloc
00:08:10
so then this will be calling the actual
00:08:11
non-allocating version
00:08:13
um
00:08:15
and let's call it again
00:08:18
and again
00:08:20
as you can see there's no garbage that's
00:08:22
being allocated
00:08:23
which makes sense right we're using
00:08:25
the non-allocating version so now if we
00:08:27
just swap those
00:08:30
like that and now we're using the actual
00:08:31
uh normal version the allocating version
00:08:35
and we press play
00:08:38
let's run that
00:08:41
we will see here
00:08:43
that it just allocated 38
00:08:46
mb of uh garbage right so obviously uh
00:08:50
arrays are a reference type so it goes
00:08:52
straight to the heap and
00:08:54
when that goes out of scope eventually
00:08:56
the garbage collector needs to come and
00:08:57
pick it up so let's just run that again
00:09:02
yeah 38mb of uh garbage so
00:09:06
yes the non-alloc version is slower to
00:09:09
run
00:09:10
but it doesn't generate any garbage
00:09:12
whatsoever so yeah you really just need
00:09:14
to know do i want it to be super super
00:09:16
fast or do i want to allocate no garbage
00:09:18
and in most cases you're probably going
00:09:21
to want to not allocate any garbage so
00:09:23
this one will win most times right
00:09:27
but
00:09:28
uh everybody's game is different and you
00:09:30
might not give a damn about garbage you
00:09:33
might just care about the speed
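To make the trade-off concrete, here is a minimal sketch of both calls. The class name, the radius and the buffer size of 64 are assumptions; the actual values used in the benchmark are not given in the transcript.

```csharp
using UnityEngine;

// Hypothetical example; names and sizes are illustrative only.
public class OverlapExample : MonoBehaviour
{
    [SerializeField] private float radius = 5f;

    // Pre-made buffer reused on every query. If more colliders overlap than
    // fit in the buffer, the extra ones are simply not reported.
    private readonly Collider[] results = new Collider[64];

    private void Update()
    {
        // Allocating version: returns a brand new array on every call,
        // which later becomes garbage.
        Collider[] hits = Physics.OverlapSphere(transform.position, radius);

        // Non-allocating version: fills the existing buffer and returns
        // how many entries are valid.
        int count = Physics.OverlapSphereNonAlloc(transform.position, radius, results);
        for (int i = 0; i < count; i++)
        {
            Collider c = results[i];
            // ... use c here ...
        }
    }
}
```

Only the first `count` entries of `results` are meaningful after each call; anything beyond that may be left over from an earlier query.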
00:09:35
okay the next one camera access now
00:09:38
whenever you see
00:09:40
any code snippet of someone using
00:09:42
camera.main in update you will
00:09:44
absolutely see the next comment of
00:09:45
someone saying you shouldn't use
00:09:46
camera.main in the update function which
00:09:48
is fine because that's what we're all
00:09:50
told but then just recently actually i
00:09:52
was making a tutorial and i was caching
00:09:54
the camera just like this here
00:09:56
and someone made a comment saying you
00:09:58
don't need to do that anymore because
00:10:00
unity now caches it uh i thought oh
00:10:02
that's cool but then i tested it and
00:10:05
there's still some weird results so
00:10:07
let's just run this let's do max
00:10:09
iterations here
00:10:12
and you can see using camera main
00:10:14
find with tag is obviously slow as we've
00:10:16
discovered and find object of type is
00:10:17
even slower
00:10:19
but as you can see
00:10:20
caching the camera is still superior
00:10:24
and if we look at camera main
00:10:26
we will see
00:10:28
that it is still an external call so
00:10:30
it's still going to c plus plus so they
00:10:32
may have cached it but they've cached it
00:10:33
on the c plus plus side
00:10:35
so if for some reason in update you're
00:10:38
uh
00:10:39
calling camera.main 53 000 times uh you're
00:10:42
only gonna lag for eight milliseconds
00:10:45
right so calling it one time
00:10:48
or even two thousand times
00:10:50
it's not even going to slow you
00:10:51
down by one millisecond
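Caching the camera, as tested above, can be sketched like this. The class name and the `WorldToScreenPoint` usage are assumptions added purely to give the cached reference something to do.

```csharp
using UnityEngine;

// Hypothetical example; names are illustrative only.
public class CameraCacheExample : MonoBehaviour
{
    private Camera cam;

    private void Awake()
    {
        // Camera.main returns the first enabled camera tagged "MainCamera",
        // or null if there is none.
        cam = Camera.main;
    }

    private void Update()
    {
        if (cam == null)
        {
            return;
        }

        // Uses the cached reference instead of reading Camera.main each frame.
        Vector3 screenPos = cam.WorldToScreenPoint(transform.position);
    }
}
```

If the main camera can change at runtime (for example on scene load), the cached reference would need refreshing, which is one reason the few-calls-per-frame case may not be worth caching at all.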
00:10:53
okay LINQ versus loop so everybody says
00:10:56
don't use LINQ in unity ever
00:11:00
i
00:11:01
think you
00:11:02
certainly should
00:11:03
use LINQ
00:11:05
but
00:11:06
ensure that you use LINQ in the
00:11:07
correct places and at the correct time
00:11:10
so let's just run this LINQ loop it's
00:11:12
saying 1000 max iterations or else we would
00:11:15
be here all night
00:11:18
okay so LINQ is obviously the slowest
00:11:20
here we've got uh for loop which is
00:11:23
faster cached for loop which is faster
00:11:25
still i'll show you what that is in a
00:11:26
second and a foreach loop which is
00:11:29
even faster now i'll show you the code
00:11:32
so
00:11:33
uh we're just making a
00:11:35
uh list here of this which is just an
00:11:37
int and a float and all these tests are
00:11:39
doing are just filtering a little bit
00:11:41
and then adding to a list
00:11:43
so this LINQ one is just checking that
00:11:45
this int value is more than this
00:11:47
threshold here and then it's just
00:11:48
selecting all of the remaining ones uh
00:11:51
float value these are doing the exact
00:11:53
same thing just just with loops uh so as
00:11:56
you can see here we're looping through
00:11:57
them all if it is over the threshold
00:11:59
then add it to this list
00:12:01
this cached for loop uh was actually
00:12:03
just caching the count okay so sorry
00:12:05
instead of having data.count here and
00:12:07
doing it every iteration we're actually
00:12:10
caching it
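The four variants described above can be sketched as follows. The `Sample` struct, its field names and the method names are assumptions based on the description (an int and a float per item, filter on the int, collect the float); the real benchmark code is not in the transcript.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical data item: one int and one float, as described in the video.
public struct Sample
{
    public int IntValue;
    public float FloatValue;
}

public static class FilterVariants
{
    // LINQ: filter on the int, project to the float, materialize a list.
    public static List<float> WithLinq(List<Sample> data, int threshold)
    {
        return data.Where(d => d.IntValue > threshold)
                   .Select(d => d.FloatValue)
                   .ToList();
    }

    // Plain for loop: data.Count is read on every iteration.
    public static List<float> WithFor(List<Sample> data, int threshold)
    {
        var result = new List<float>();
        for (int i = 0; i < data.Count; i++)
        {
            if (data[i].IntValue > threshold)
            {
                result.Add(data[i].FloatValue);
            }
        }
        return result;
    }

    // Cached for loop: data.Count is read once before the loop.
    public static List<float> WithCachedFor(List<Sample> data, int threshold)
    {
        var result = new List<float>();
        int count = data.Count;
        for (int i = 0; i < count; i++)
        {
            if (data[i].IntValue > threshold)
            {
                result.Add(data[i].FloatValue);
            }
        }
        return result;
    }

    // foreach loop: each element is fetched once per iteration.
    public static List<float> WithForeach(List<Sample> data, int threshold)
    {
        var result = new List<float>();
        foreach (Sample d in data)
        {
            if (d.IntValue > threshold)
            {
                result.Add(d.FloatValue);
            }
        }
        return result;
    }
}
```

All four return the same list for the same input, so the only difference being measured is how long each one takes.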
00:12:11
my buddy just wanted to check to see if
00:12:13
that actually makes a difference i was
00:12:14
curious too and it actually does it
00:12:16
always seems to be just that little
00:12:18
bit faster
00:12:19
and then the foreach loop so foreach is
00:12:22
generally faster if your for loop has to
00:12:26
access the
00:12:27
index of the array more than once so if
00:12:30
you're only accessing it once for loop
00:12:32
will always be faster
00:12:33
twice or more you should probably use a
00:12:36
foreach
00:12:38
but yeah so i built this and i put this
00:12:40
in webgl and these numbers are all back
00:12:43
to front so the editor is completely
00:12:46
different to your built webgl game who
00:12:49
knows if it's a built standalone game it
00:12:51
might be different still i would be
00:12:52
really curious
00:12:54
if you guys want to go to the webgl and
00:12:56
run these yourself and tell me if any of
00:12:59
these are different to what you found
00:13:02
here it's really weird i really want to
00:13:04
know what is up with that also
00:13:06
my friend did this test in both edge and
00:13:09
firefox
00:13:11
drastically different results so
00:13:13
man it's like super hard to know
00:13:15
what is performant and what is not
00:13:18
and the very last one here is string
00:13:19
builder now i know i've done a few
00:13:21
community posts saying you should
00:13:22
definitely use string builder i just
00:13:24
want to show you why so here we've just
00:13:26
got a phrase subscribe which you should
00:13:28
do
00:13:29
a count of 100 so we're going to
00:13:31
basically do subscribe 100 times in a
00:13:34
string
00:13:35
so this top one is just simple
00:13:36
concatenation we're just creating a
00:13:39
string concatenating to it the second
00:13:41
one is using a string builder
00:13:43
looping appending and then finally to
00:13:45
stringing it
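In code, the two approaches look roughly like this. The class and method names are assumptions; the phrase and count come from the description above.

```csharp
using System.Text;

// Hypothetical helper; names are illustrative only.
public static class RepeatPhrase
{
    // Simple concatenation: strings are immutable, so every += builds a
    // new string and leaves the previous one behind as garbage.
    public static string WithConcatenation(string phrase, int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += phrase;
        }
        return result;
    }

    // StringBuilder: appends into an internal buffer and creates the final
    // string once, in ToString().
    public static string WithStringBuilder(string phrase, int count)
    {
        var builder = new StringBuilder(phrase.Length * count);
        for (int i = 0; i < count; i++)
        {
            builder.Append(phrase);
        }
        return builder.ToString();
    }
}
```

Calling either method with `"subscribe"` and `100` produces the same 900-character string; the difference is in how many intermediate strings get created along the way.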
00:13:46
so if we run this now
00:13:48
uh max 3000 just so we don't lag you'll
00:13:51
see the difference is ginormous right so
00:13:55
let's actually do a bit more than that
00:13:56
let's do 13.
00:13:57
oops
00:13:58
22 why not
00:14:01
oh
00:14:03
it's a lot of string
00:14:05
so yeah absolutely use a string builder
00:14:07
not just for speed performance but also
00:14:10
for garbage allocation
00:14:12
uh
00:14:13
string builder is the way to go unless
00:14:14
it's just like two things if you're just
00:14:16
concatenating two things together do it
00:14:18
who cares
00:14:19
but yeah use a string builder otherwise
00:14:22
um and i've actually got this one last
00:14:23
one that i wanted to show you which is
00:14:24
order of operation so the idea is just
00:14:27
that floats are more expensive to do
00:14:29
arithmetic on than integers vectors more
00:14:32
expensive than floats quaternions more
00:14:35
expensive than vectors so on
00:14:37
so you should
00:14:39
order your operations in that logical
00:14:42
order if you can say for example this
00:14:44
top one it's float times float times
00:14:46
vector
00:14:48
this next one is float times vector
00:14:49
times float and then vector times float
00:14:51
times float and if we run this let's do
00:14:54
max iterations
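The three orderings can be written out like this. The class, method and parameter names are assumptions. The multiplication counts in the comments follow from C# evaluating `*` left to right and from a Vector3-times-float costing one multiply per component.

```csharp
using UnityEngine;

// Hypothetical helper; names are illustrative only.
public static class MultiplyOrder
{
    // float * float * vector: one scalar multiply, then one vector scale.
    // 1 + 3 = 4 float multiplications.
    public static Vector3 FloatFloatVector(float a, float b, Vector3 v)
    {
        return a * b * v;
    }

    // float * vector * float: the vector is scaled twice.
    // 3 + 3 = 6 float multiplications.
    public static Vector3 FloatVectorFloat(float a, float b, Vector3 v)
    {
        return a * v * b;
    }

    // vector * float * float: the vector is again scaled twice.
    // 3 + 3 = 6 float multiplications.
    public static Vector3 VectorFloatFloat(float a, float b, Vector3 v)
    {
        return v * a * b;
    }
}
```

All three give the same result up to floating-point rounding; putting the scalars first just means the vector only gets scaled once.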
00:14:55
you'll see that it is actually two times
00:14:57
faster let me just give you a little
00:14:59
example of this in action so let's say
00:15:01
transform position plus equals uh let's
00:15:03
say you're wanting to move left
00:15:05
um and you're doing times speed
00:15:08
times time dot delta time right you even
00:15:11
see this in the docs if you go look
00:15:12
through the docs you will see
00:15:14
unity uh doing this so this is an
00:15:17
example of doing it incorrectly right
00:15:18
this is vector times float times float
00:15:20
so this would actually be two times
00:15:22
faster if we flipped this to the other
00:15:24
side so yeah just keep that in mind and
00:15:26
if you go to this link here uh you will
00:15:28
see that unity themselves actually do
00:15:30
recommend doing it this way so yeah
00:15:32
that's it i hope you enjoyed uh these as
00:15:34
much as i enjoyed making them because i
00:15:36
thought they were quite interesting if
00:15:37
you've got any other
00:15:38
benchmarks that i should add let me know
00:15:40
in the comments and i'll add them here
00:15:42
because i'm interested to uh make an
00:15:44
exhaustive list
00:15:45
and yeah that's it see you in the next
00:15:47
video bye