ECCV 2024: Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

00:16:30
https://www.youtube.com/watch?v=t5vREYQIup8

Summary

TLDR: This presentation discusses a novel approach to cross-domain few-shot object detection (CD-FSOD) through an enhanced open-set object detector. The authors highlight the limitation that existing cross-domain few-shot methods focus mainly on classification and overlook object detection. They propose a new benchmark and a method called CD-ViTO, which addresses challenges such as small inter-class variance, indivisible boundaries, and changing appearances. The results indicate significant improvements in detection performance across diverse datasets, demonstrating the effectiveness of the proposed modules.

Takeaways

  • 📊 Introduction of CD-FSOD and its significance
  • 🔍 Challenges in cross-domain object detection
  • 🛠️ Proposal of the CD-ViTO method
  • 📈 Significant performance improvements
  • 📚 New benchmark for evaluation
  • 💻 Code and datasets available on GitHub
  • 🔑 Key contributions of the research
  • 📉 Analysis of existing detectors
  • 🌊 Diverse datasets used for testing
  • 🚀 Future work to address remaining challenges

Timeline

  • 00:00:00 - 00:05:00

    The speaker introduces their work on cross-domain few-shot object detection, highlighting the limitations of existing methods that focus primarily on classification rather than object detection. They aim to explore object detection in cross-domain settings, addressing the challenges posed by domain gaps and the need for effective transfer learning techniques.

  • 00:05:00 - 00:10:00

    The research investigates the performance of existing open-set detectors in cross-domain scenarios, identifying three main challenges: small inter-class variance, indivisible boundaries, and changing appearances. The authors propose a new method, CD-ViTO, to enhance the capabilities of open-set detectors in these challenging environments, supported by a new benchmark that includes diverse datasets.

  • 00:10:00 - 00:16:30

    The proposed CD-ViTO method incorporates three novel modules to improve detection performance: learnable instance features, instance reweighting, and a domain prompter. The results demonstrate significant improvements over baseline models, with CD-ViTO outperforming competitors and addressing the challenges identified in cross-domain few-shot object detection. The speaker concludes by inviting the audience to explore the work further.

Video Q&A

  • What is the main focus of the research?

    The research focuses on cross-domain few-shot object detection (CD-FSOD) using an enhanced open-set object detector.

  • What are the main challenges addressed in the study?

    The study addresses challenges such as small inter-class variance, indivisible boundaries, and changing appearances in cross-domain settings.

  • What is the proposed method called?

    The proposed method is called CD-ViTO, which enhances the existing open-set detector DE-ViT.

  • What datasets were used for benchmarking?

    The study uses MS-COCO as the source data and introduces six target datasets for benchmarking: ArTaxOr, Clipart1k, DIOR, DeepFish, NEU-DET, and UODD.

  • What are the key contributions of the paper?

    The key contributions include a new benchmark for CD-FSOD, an extensive study of existing detectors, and the introduction of the CD-ViTO method with novel modules.

  • Where can the code and datasets be found?

    The code and datasets are available on the project's GitHub repository.

  • What is the significance of the proposed benchmark?

    The benchmark evaluates the domain issues in the target datasets, featuring diverse styles, inter-class variance, and indivisible boundaries.

  • How does the CDV method improve detection performance?

    CD-ViTO improves detection performance by introducing learnable instance features, instance reweighting, and a domain prompter.

  • What were the results of the experiments?

    The experiments showed that CD-ViTO significantly outperformed existing methods and improved detection accuracy across the benchmark datasets.

  • What future work is suggested?

    Future work includes further improvements to tackle significant indivisible-boundary issues in certain datasets, such as NEU-DET and UODD.

Transcript (en)
  • 00:00:00
    Hi everyone. Today I'm going to introduce our work, recently accepted at ECCV 2024: "Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector". First, I would like to give a brief introduction to the task of cross-domain few-shot learning. As shown in this picture, cross-domain few-shot learning transfers knowledge from source data with sufficient examples to target data with few labeled examples. The source and target data have disjoint concepts and also belong to different domains. One limitation we would like to highlight is that most existing cross-domain few-shot learning works are mainly based on classification but overlook object detection, which is also a very important vision task. So in this paper we are mainly motivated to explore object detection in cross-domain few-shot learning, which we also call cross-domain few-shot object detection (CD-FSOD).
  • 00:01:19
    As for related work, first come the typical few-shot object detection methods, which include meta-learning-based ones and transfer-learning-based ones. We would also like to highlight that recently the open-set detector DE-ViT, built only on the visual modality, achieved state-of-the-art results on few-shot object detection. In our paper we also investigate ViT-based object detectors and cross-domain few-shot object detection methods.
  • 00:01:55
    As for the motivation of the method, we mainly try to answer two key questions. First: could the existing detectors, especially the open-set detector, generalize to cross-domain settings? Second: if not, how could the open-set method be further improved given the significant domain gap? We will first give our conclusions on these two questions. For the first, we took the state-of-the-art DE-ViT and tested it on six novel targets (ArTaxOr, Clipart1k, DIOR, DeepFish, and two others), and the main conclusion is that the performance of DE-ViT drops very quickly on the novel targets when there is a huge domain gap.
  • 00:02:52
    We then started to consider what the challenges behind this task are. We found that, compared with in-domain transfer, cross-domain transfer makes our model face three main challenges. First, compared with the source data, the inter-class variance (ICV) in the target data is usually smaller. Second, we may meet the situation where an object and its background are very close to each other; we call such a phenomenon an indivisible boundary (IB). Third, we investigated the changing appearance, which means the style of the source and the target changes. These are the three main technical challenges we have to tackle under cross-domain few-shot object detection. As for the second question, our conclusion is yes: in our paper we propose a new method, CD-ViTO, based on DE-ViT, and we show that with CD-ViTO we can make the original open-set detector great again in the cross-domain setting.
  • 00:04:14
    Our specific work on the first question, whether existing detectors and especially the open-set detector generalize to the cross-domain setting, starts with a new cross-domain few-shot object detection benchmark with diverse styles, ICV, and IB (inter-class variance and indivisible boundaries). In our benchmark we take MS-COCO as the source data and introduce six different target datasets: ArTaxOr, Clipart1k, DIOR, DeepFish, NEU-DET, and UODD. What is worth mentioning is that our benchmark has diverse styles, ICV, and IB. For example, MS-COCO is photorealistic with large ICV and slight IB, while UODD, an underwater dataset, has an underwater style, small ICV, and significant IB.
  • 00:05:26
    To investigate the exact performance of existing detectors under our new benchmark, we studied four different types of currently proposed detectors: typical few-shot object detection, cross-domain few-shot object detection, ViT-based detectors, and open-set detectors. Here come our main results, reported under the 10-shot setting. In the paper we try in detail to answer the following questions, for example: does the domain gap pose challenges for few-shot object detection? Are ViT-based models better than ResNet-based ones? Here we mainly highlight that, through the results, we can observe very clearly that the domain gap does pose a huge challenge for current few-shot object detection methods, and that directly using an open-set detector to address cross-domain few-shot object detection is unfortunately not the simple solution. In the paper we also analyze how style, ICV, and IB impact the domain gap and the datasets. The basic conclusion is that style has a relatively minor effect on the domain gap, which is very different from the classification task; ICV is very notable, but it turns out we can tackle it with technical methods; and IB is very hard to tackle.
  • 00:07:13
    For the second question, we want to further improve the open-set detectors even under the domain gap. In this paper we propose a novel cross-domain vision transformer, CD-ViTO, for this task. Our method is built upon the DE-ViT baseline, so here we mainly summarize the main modules of DE-ViT. Basically, DE-ViT only uses DINOv2 to compare the features of the query and the support. First, for the support images, DE-ViT uses DINOv2 to precompute class prototypes, and it also calculates some background prototypes. Then, for the query image, it uses the frozen DINOv2 together with ROI align to get features for every ROI area, and compares the query ROI features with the support prototypes. Based on this, two different heads are applied, one detection head and one one-vs-rest classification head, and the output is the final detection result.
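The prototype-matching step just described (query ROI features compared against support class prototypes and background prototypes) can be sketched in a few lines. This is an illustrative sketch only, not DE-ViT's actual code: the function names and the plain cosine-similarity argmax are assumptions standing in for the real detection and one-vs-rest classification heads.

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def classify_rois(roi_feats, class_protos, bg_protos):
    """Assign each query ROI to its best-matching support prototype,
    or to background (-1) if a background prototype matches better."""
    protos = np.vstack([class_protos, bg_protos])
    sims = cosine_sim(roi_feats, protos)      # (num_rois, C + B)
    best = sims.argmax(axis=1)
    num_classes = class_protos.shape[0]
    # Indices >= num_classes point at background prototypes.
    return np.where(best < num_classes, best, -1)
```

For instance, an ROI feature close to class prototype 0 is labeled 0, while one closer to a background prototype is labeled -1.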
  • 00:08:30
    Based on this baseline, we first propose three novel modules. The first is learnable instance features, which lets the originally fixed features become learnable; the second is instance reweighting; and the third is the domain prompter. If you still remember the three technical challenges we mentioned in the cross-domain setting (changing appearances, indivisible boundaries, and small ICV), the mapping is as follows. The learnable instance features, together with finetuning, make the features learnable, and through finetuning we can align the features to the labels, which means we can somewhat increase the ICV. The instance reweighting is designed to tackle the indivisible boundaries: we want the high-quality instances to get higher weights. And the domain prompter is proposed because we want the features to be more robust to different domains, so we can tackle the changing styles.
  • 00:09:46
    Next we introduce each module respectively. First come the learnable instance features and finetuning. We first propose to finetune the top two heads, detection and classification. Then, with finetuning, we set the originally fixed instance features as learnable. Our motivation is that making the features learnable and finetuning them means we can use the semantic labels to supervise them, so we expect the model to increase the ICV by aligning the features to distinct semantic labels. To show that we indeed achieve this goal through our learnable instance features, we did an analysis: we compared the cosine similarity between different class prototypes for the initial fixed ones and our learnable ones. The results show that with our learnable features we decrease the similarity between different classes, which means we increase the difference between them.
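The prototype-similarity analysis mentioned here can be reproduced in miniature: compute the mean off-diagonal cosine similarity between class prototypes before and after learning, where a lower value means the prototypes are more spread out (a larger effective ICV). This is a minimal sketch; `mean_offdiag_cosine` is a hypothetical helper, not from the paper's code.

```python
import numpy as np

def mean_offdiag_cosine(protos):
    """Mean pairwise cosine similarity between distinct class prototypes
    (rows of `protos`). Lower = more separated classes, i.e. larger ICV."""
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sims = p @ p.T
    n = len(protos)
    return float(sims[~np.eye(n, dtype=bool)].mean())
```

Comparing `mean_offdiag_cosine(fixed_protos)` against `mean_offdiag_cosine(learned_protos)` would show the drop in inter-class similarity the talk reports.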
  • 00:11:00
    Second is the instance reweighting, which means we want to weight the different instances with different values; we expect the high-quality instances, for example those with low IB, to be valued more. The module design is very simple: just an MLP that assigns different weights to different instances. Here comes our analysis: we arranged the instances by their scores from high to low, and from the results we can observe that the more significant the IB issue, the lower the weight the instance gets. For example, when a box is very close to its background, it gets a lower value.
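The reweighting module described above (a small MLP producing one weight per instance) could look roughly like this. The class name, layer sizes, ReLU activation, and softmax normalization are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)

class InstanceReweighter:
    """Tiny MLP mapping each instance feature vector to a scalar weight.
    The intent is that instances whose boxes blend into the background
    (significant IB) end up with lower weights; weights sum to 1."""
    def __init__(self, dim, hidden=16):
        self.w1 = rng.standard_normal((dim, hidden)) * 0.1
        self.w2 = rng.standard_normal((hidden, 1)) * 0.1

    def __call__(self, feats):
        h = np.maximum(feats @ self.w1, 0.0)   # ReLU hidden layer
        scores = (h @ self.w2).squeeze(-1)     # one score per instance
        e = np.exp(scores - scores.max())      # softmax -> normalized weights
        return e / e.sum()
```

In training, these weights would scale each instance's contribution to the loss, so low-quality instances are down-weighted.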
  • 00:11:52
    The third module is our domain prompter. In the domain prompter we first introduce several virtual domains that do not exist in the original framework, set as learnable parameters, and then we design the loss functions. The motivation for introducing these virtual domains is to use them as noise added to the original prototypes. We introduce two different losses. The first is the domain diversity loss: we want the different domains themselves to be different from each other, so they stay diverse. The second is the prototype consistency loss. As in this subfigure, if we add two different virtual domains to the same class prototype, we make the results similar to each other, meaning they are positive pairs; if we add different virtual domains to different class prototypes, the results should still be different, meaning they are negative pairs. With the prototype consistency loss we can achieve the goal that, even when adding different domains to the same prototype, we still keep the semantics unchanged, and thus we improve the model's generalization ability across different visual styles. Here also comes our analysis, which shows the t-SNE of the domains and the perturbed prototype features. In the first figure we visualize the learned domains and find that they are diverse; in the second figure we add the domains to the original class prototypes, and we can draw the conclusion that adding the domains to the original class prototypes does not cause a semantic-shift issue.
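The two losses can be sketched as follows. This is a hedged reconstruction from the description in the talk, assuming a cosine-similarity diversity term and an InfoNCE-style consistency term; the exact loss forms in the paper may differ.

```python
import numpy as np

def domain_diversity_loss(domains):
    """Encourage the learnable virtual domains to differ from each other:
    penalize high pairwise cosine similarity between domain vectors."""
    d = domains / np.linalg.norm(domains, axis=1, keepdims=True)
    sims = d @ d.T
    n = len(domains)
    return float(sims[~np.eye(n, dtype=bool)].mean())

def prototype_consistency_loss(protos, domains, tau=0.1):
    """InfoNCE-style loss: the same class prototype perturbed by two
    different virtual domains forms a positive pair; perturbed versions
    of different classes are negatives."""
    a = protos + domains[0]            # view 1: prototype + domain 1
    b = protos + domains[1]            # view 2: prototype + domain 2
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = (a @ b.T) / tau
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_probs).mean())
```

Minimizing both terms together keeps the domains diverse while preserving each prototype's semantics under perturbation.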
  • 00:13:59
    With all modules added upon the baseline, we finally build our CD-ViTO method, and here come the final comparison results. We report one-shot, five-shot, and ten-shot results, and we highlight that our CD-ViTO significantly improves the base DE-ViT and also outperforms all competitors, setting a new state of the art on this benchmark. We also did ablations on the different modules. We found that finetuning helps a lot, which has been observed by many other papers; still, we did a lot of ablation studies to find the best way to finetune, and we found that all the modules contribute to the final result. We also visualized the detection results of the vanilla DE-ViT and our CD-ViTO, and from the visualizations it is very clear that CD-ViTO outperforms DE-ViT. But our method could still be further improved: for example, on NEU-DET and UODD, which both have significant IB issues, our method also struggles to output very good results, which means the benchmark still leaves large room for improvement.

  • 00:15:23
    Finally comes our conclusion. In this paper, we first propose a comprehensive cross-domain few-shot object detection benchmark with several novel datasets that evaluate the domain issues in the target datasets in terms of style, ICV, and IB. Second, we conduct an extensive study of existing open-set detectors and also investigate other types of detectors, for example ViT-based and cross-domain few-shot methods. Third, based on DE-ViT, we propose a new enhanced open-set detector, CD-ViTO, with three novel modules. Here comes the link: if you are interested in our work, you can easily get our paper; we also have a project page, and we have released all the code and datasets in the GitHub repo. Welcome to use our datasets and try our method. Thank you for listening, that's all, bye.
Tags
  • cross-domain
  • few-shot
  • object detection
  • open set detector
  • CD-FSOD
  • benchmark
  • inter-class variance
  • indivisible boundaries
  • detection performance
  • machine learning