00:00:00
Hi everyone, my name is U, and today I'm going to introduce our work, which was just recently accepted at ECCV 2024. Our work is "Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector."
00:00:18
First, I would like to give a brief introduction to the task of cross-domain few-shot learning. As shown in this picture, cross-domain few-shot learning transfers knowledge from a source dataset with sufficient examples to a target dataset with only a few labeled examples. The source data and target data have disjoint class sets, and they also belong to different domains.
00:00:43
One limitation we would like to highlight is that most existing cross-domain few-shot learning works focus mainly on classification and overlook object detection, which is also a very important vision task.
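To make the setting concrete, here is a minimal sketch of drawing a K-shot support set from a target dataset and checking that the source and target label sets are disjoint. The `(image_id, class_name)` annotation format, the helper names `sample_k_shot` and `check_disjoint`, and the per-class sampling rule are hypothetical illustrations, not the benchmark's actual split protocol.

```python
import random
from collections import defaultdict


def sample_k_shot(annotations, k, seed=0):
    """Draw up to k annotations per class from (image_id, class_name) pairs.

    Hypothetical helper for illustration: real CD-FSOD benchmarks ship fixed
    split files, and detection shots are usually counted per object instance.
    """
    by_class = defaultdict(list)
    for image_id, class_name in annotations:
        by_class[class_name].append((image_id, class_name))
    rng = random.Random(seed)  # seeded so the split is reproducible
    return {
        name: rng.sample(items, min(k, len(items)))
        for name, items in sorted(by_class.items())
    }


def check_disjoint(source_classes, target_classes):
    """Raise if any class appears in both the source and the target label sets."""
    overlap = set(source_classes) & set(target_classes)
    if overlap:
        raise ValueError(f"classes shared by source and target: {sorted(overlap)}")


if __name__ == "__main__":
    source_classes = ["person", "car", "dog"]  # toy source labels (assumed)
    target = [(1, "fish"), (2, "fish"), (3, "fish"), (4, "coral")]  # toy target labels
    check_disjoint(source_classes, {c for _, c in target})
    print(sample_k_shot(target, k=2))  # 2 "fish" shots, 1 "coral" shot
```

Classes with fewer than K annotations simply contribute all of them, which keeps the sketch from failing on rare target classes.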
00:01:00
So in this paper, we are mainly motivated to explore object detection in cross-domain few-shot learning, which we call cross-domain few-shot object detection (CD-FSOD).
00:01:16
As for related work, first come the typical few-shot object detection (FSOD) methods, which include meta-learning-based ones and transfer-learning-based ones. We would also like to highlight a recent open-set detector, DE-ViT, which is built on the visual modality alone yet achieves state-of-the-art (SOTA) results on FSOD.
00:01:45
In our paper, we also investigate ViT-based object detectors and cross-domain FSOD methods.
00:01:55
As for the motivation, our method mainly tries to answer two key questions. The first is: could existing detectors, especially open-set detectors, generalize to cross-domain settings? The second is: if not, how could open-set methods be further improved in the face of a significant domain gap?
00:02:19
We will first give our conclusions on these two questions. For the first, we investigated the SOTA DE-ViT and tested it on six novel target datasets, such as ArTaxOr, Clipart1k, DIOR, DeepFish, and two others. The main conclusion is that the performance of the SOTA DE-ViT drops very quickly on novel targets when there is a huge domain gap.
00:02:52
We then started to consider what the challenges behind this task are. We found that, compared with in-domain transfer, cross-domain transfer confronts our model with three main challenges.
00:03:13
The first is that, compared with the source data, the inter-class variance (ICV) in the target data is usually smaller. Second, we may meet situations where an object and its background are very close to each other; we call this phenomenon an indefinable boundary (IB). Third, we investigate changing appearance, which means that the style changes between the source and the target. These are the three main technical challenges we observed that have to be tackled in cross-domain few-shot object detection.
00:03:50
As for the second question, our conclusion is yes, open-set detectors can be further improved. In our paper, we propose a new method, CD-ViTO, based on DE-ViT, and we show that with CD-ViTO we can make the original open-set detector great again in the cross-domain setting.
00:04:14
Our first piece of work tackles the first question: could existing detectors, especially open-set detectors, generalize to the cross-domain setting? To answer it, we first propose a new CD-FSOD benchmark with diverse style, ICV, and IB, where ICV means inter-class variance and IB means indefinable boundaries.
00:04:40
In our benchmark, we take MS COCO as the source data and introduce six different target datasets: ArTaxOr, Clipart1k, DIOR, DeepFish, NEU-DET, and UODD.
00:04:58
What is worth mentioning is that our benchmark has diverse style, ICV, and IB. For example, MS COCO has a photorealistic style, large ICV, and slight IB, whereas UODD, an underwater dataset, has an underwater style, small ICV, and significant IB.
00:05:26
To investigate the actual performance of existing detectors on our new benchmark, we study four different types of currently proposed detectors: typical FSOD methods, CD-FSOD methods, ViT-based detectors, and open-set detectors.
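The benchmark characterizes each dataset by style, ICV, and IB. As a rough illustration only, ICV could be approximated as the spread of per-class mean features around their overall mean, and IB as the cosine similarity between an object's feature and the feature of its surrounding background. Both formulas and all function names below are our own assumptions; the benchmark's actual measurements may be defined differently.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def inter_class_variance(features_by_class):
    """Illustrative ICV proxy (assumed formula): mean squared distance of each
    class centroid from the mean of all centroids. Larger = classes more spread out."""
    centroids = [mean_vector(feats) for feats in features_by_class.values()]
    overall = mean_vector(centroids)
    squared = [sum((c[i] - overall[i]) ** 2 for i in range(len(c))) for c in centroids]
    return sum(squared) / len(centroids)


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def boundary_similarity(object_feature, background_feature):
    """Illustrative IB proxy (assumed): how similar an object box feature is to
    its surrounding background. Values near 1 suggest an indefinable boundary."""
    return cosine(object_feature, background_feature)
```

With well-separated toy classes whose centroids sit at (1, 0) and (-1, 0), the ICV proxy is 1.0, while nearly overlapping classes give a much smaller value, matching the intuition that small ICV makes target classes harder to tell apart.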
00:05:48
Here comes our main result, which we show under the 10-shot setting. In our paper, we try to answer the following questions in detail: for example, does the domain gap pose challenges for FSOD? Are ViT-based models superior to ResNet-based ones? Here we mainly highlight that the results clearly show that the domain gap does pose huge challenges for current FSOD methods, and that directly using an open-set detector to address CD-FSOD is, unfortunately, not a simple solution.
00:06:39
In our paper, we also analyze how style, ICV, and IB impact the different domains and datasets.
00:06:49
The basic conclusion is that style has a relatively minor effect across domains, which is very different from the classification task. ICV is very notable, but it turns out that ICV can be tackled with technical methods, whereas IB is very, very hard to tackle.
00:07:13
For the second question, we want to further improve open-set detectors even under the domain gap. So in this paper, we propose a novel Cross-Domain Vision Transformer, CD-ViTO, for this task, and our method is built upon the DE-ViT baseline.
00:07:31
Here we summarize the main modules of DE-ViT as the blue modules. Basically, DE-ViT only uses a DINOv2 ViT to compute features for the query and support images. First, for the support images, DE-ViT uses DINOv2 to pre-calculate the class prototypes, and it also calculates some background prototypes. Then, for the query image, it uses an RPN, DINOv2, and ROI feature extraction to get features for every ROI, and compares the query ROI features with the support prototypes. Based on this comparison, it uses two different heads, a detection head and a one-vs-rest classification head, which then output the final result.
00:08:30
Building on this baseline, we propose three novel modules. The first is learnable instance features, which make the originally fixed features learnable; the second is instance reweighting; and the third is a domain prompter.
00:08:52
If you still remember the three technical challenges we mentioned for the cross-domain setting, we have changing appearance, indefinable boundaries, and small ICV. Basically, we design the learnable instance features, together with fine-tuning, to make the features learnable; through fine-tuning, we can then align the features to the labels, which means we can increase the ICV to some extent. The instance reweighting is designed to tackle indefinable boundaries: we want high-quality instances to receive higher weights.
00:09:39
Finally, the domain prompter is proposed to make the features more robust to different domains, so that we can tackle changing styles.
00:09:46
Next, we will introduce each module in turn, starting with the learnable instance features and fine-tuning. We first propose to fine-tune the top two heads, the detection head and the classification head. Then, along with fine-tuning, we set the originally fixed instance features as learnable. Our motivation is that if the features are learnable and fine-tuned, we can use the semantic labels to supervise them, so we expect the model to increase the ICV by aligning the features to distinct semantic labels.
00:10:24
To show that our learnable instance features indeed achieve this goal, we did the following analysis: we compared the cosine similarity between different class prototypes for the initial fixed features and for our learnable ones. The result shows that our learnable features decrease the similarity between different classes, which means they increase the difference between classes.
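The analysis above compares class prototypes by cosine similarity. Below is a minimal sketch of that measurement, assuming each prototype is the mean of its class's instance features and summarizing separation as the mean pairwise cosine similarity between classes; all names are hypothetical. `match_roi` is a deliberately simplified stand-in for the baseline's comparison of a query ROI against support prototypes: it takes an argmax over cosine similarities instead of using DE-ViT's detection and one-vs-rest classification heads.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def class_prototypes(features_by_class):
    """Prototype of each class = element-wise mean of its instance features (assumed)."""
    return {
        name: [sum(col) / len(feats) for col in zip(*feats)]
        for name, feats in features_by_class.items()
    }


def mean_inter_class_similarity(prototypes):
    """Average cosine similarity over all pairs of different classes.
    Lower values mean better-separated classes, i.e. a larger effective ICV."""
    pairs = list(combinations(prototypes.values(), 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def match_roi(roi_feature, prototypes):
    """Simplified matching: label a query ROI with its most similar prototype."""
    return max(prototypes, key=lambda name: cosine(roi_feature, prototypes[name]))
```

For example, toy prototypes (1.0, 0.9) and (0.9, 1.0) have a cosine similarity of about 0.99, while (1.0, 0.1) and (0.1, 1.0) drop to about 0.2, which is the kind of decrease the analysis reports for learnable features.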
00:11:00
The second module is instance reweighting. Instance reweighting means that we want to weight different instances with different values, so we expect high-quality instances, for example those with low IB, to be valued more.
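A rough sketch of instance reweighting, assuming a one-hidden-layer MLP (ReLU, then sigmoid) that scores each instance feature, with a class prototype formed as the score-weighted mean of its instances. The layer sizes, the toy weights, and the weighted-mean aggregation are illustrative assumptions rather than the exact CD-ViTO design; in practice the MLP weights would be learned during fine-tuning.

```python
import math


def mlp_score(feature, w1, b1, w2, b2):
    """One-hidden-layer MLP: ReLU hidden units, sigmoid output in (0, 1)."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, feature)) + b)
        for row, b in zip(w1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def reweighted_prototype(instances, params):
    """Score every instance with the MLP, then return the score-weighted mean
    feature (the prototype) together with the per-instance weights."""
    weights = [mlp_score(f, *params) for f in instances]
    total = sum(weights)
    dim = len(instances[0])
    proto = [sum(w * f[i] for w, f in zip(weights, instances)) / total for i in range(dim)]
    return proto, weights
```

With the toy parameters `([[1.0, -1.0]], [0.0], [2.0], -1.0)`, the feature `[1.0, 0.0]` scores about 0.73 and `[0.5, 0.5]` about 0.27, so the first instance dominates the prototype.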
00:11:17
Here is the model design. The model part is very simple: it just uses an MLP to assign different weights to different instances. And here comes our analysis: we rank the instances by their score from high to low, and from the result we can observe that the more significant the IB issue, the less weight an instance gets. For example, here the box is very close to its background, so it gets a lower value.
00:11:49
The third module is our domain prompter. In our domain prompter, we first introduce several virtual domains that do not exist in the original framework, and then we design loss functions for them. The introduced domains are set as learnable parameters, and our motivation for introducing them is to use these domains as noise that is added to the original prototypes. We then constrain them with two different losses. The first is the domain diversity loss: in this loss, we want the different domains to differ from each other, so that they are diverse.
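The domain prompter's two losses can be sketched as follows, assuming each virtual domain is a learnable vector the same size as a class prototype and is simply added to it. The formulas are our own assumptions: the diversity loss is taken as the mean pairwise cosine similarity between domain vectors (lower means more diverse), and the prototype consistency loss as an InfoNCE-style contrastive loss in which the same class perturbed by two different domains forms a positive pair and different classes form negatives. CD-ViTO's actual loss definitions may differ.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def add(a, b):
    """Perturb a prototype by adding a virtual-domain vector element-wise."""
    return [x + y for x, y in zip(a, b)]


def domain_diversity_loss(domains):
    """Mean pairwise cosine similarity between virtual domains (assumed form);
    minimizing it pushes the domains apart."""
    pairs = list(combinations(domains, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def prototype_consistency_loss(prototypes, domain_a, domain_b, temperature=0.1):
    """InfoNCE-style loss (assumed form): class c + domain_a should be closest to
    class c + domain_b (positive) rather than to other classes + domain_b (negatives)."""
    anchors = [add(p, domain_a) for p in prototypes]
    views = [add(p, domain_b) for p in prototypes]
    loss = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, v) / temperature for v in views]
        log_denominator = math.log(sum(math.exp(z) for z in logits))
        loss += log_denominator - logits[i]  # negative log-softmax of the positive pair
    return loss / len(anchors)
```

Minimizing both terms together pushes the virtual domains apart while keeping each perturbed prototype closest to its own class, which is the intent the talk describes for the domain prompter.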
00:12:36
The second is the prototype consistency loss. As in this sub-figure, if we add two different virtual domains to the same class prototype, we want the results to be similar to each other, meaning they form a positive pair; if we add different virtual domains to different class prototypes, the results should still be different, meaning they form a negative pair. Based on this, we propose our prototype consistency loss. With it, we can achieve the goal that even when different domains are added to the same prototype, its semantics stay unchanged, and thus we can improve the model's ability to generalize across different visual styles.
00:13:24
Here is also our analysis, which shows t-SNE visualizations of the domains and the perturbed features. In the first figure, we visualize the learned domains and find that they are diverse. In this figure, we add the domains to the original class prototypes, and from it we can conclude that adding the domains to the original class prototypes does not cause a semantic shift.
00:13:55
With all the modules added on top of the baseline, we finally build our CD-ViTO method. Here comes the final comparison result: we report 1-shot, 5-shot, and 10-shot results, and we highlight that our CD-ViTO significantly improves the base DE-ViT and also outperforms all competitors, setting a new SOTA on this benchmark.
00:14:23
We also did ablations on the different modules. We found that fine-tuning helps a lot, which has also been observed in many other papers, but we still ran many ablation studies to find the best way to fine-tune. We also found that all the modules contribute to the final result.
00:14:44
We also visualize the detection results of the vanilla DE-ViT and our CD-ViTO, and from the visualization we can clearly see that CD-ViTO outperforms DE-ViT. Still, our method could be further improved. For example, on NEU-DET and UODD, both of which have a significant IB issue, our method also struggles to output very good results, which means the benchmark still leaves a lot of room for improvement.
00:15:23
Finally, we come to our conclusion. In this paper, first, we propose a comprehensive CD-FSOD benchmark with several novel datasets that vary in the domain issues of the target data, namely style, ICV, and IB.
00:15:39
Second, we conduct an extensive study of existing open-set detectors, and we also investigate other types of detectors, for example ViT-based detectors and cross-domain few-shot methods. Third, based on DE-ViT, we propose a new enhanced open-set detector, CD-ViTO, which has three novel modules.
00:16:03
Here are the links: if you're interested in our work, you can easily get our paper, and we also have a project page. We have released all the code and datasets in our GitHub repo, so you are welcome to use our datasets and try our method. Thank you for listening. That's all, thank you, bye-bye.