ECCV 2024: Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

00:16:30
https://www.youtube.com/watch?v=t5vREYQIup8

Summary

TL;DR: This presentation covers cross-domain few-shot object detection (CD-FSOD) via an enhanced open-set object detector. The authors note that existing cross-domain few-shot learning methods focus mainly on classification and overlook object detection. They propose a new benchmark and a method called CD-ViTO, which addresses challenges such as small inter-class variance, indefinable boundaries, and changing styles. The results show significant improvements in detection performance across diverse datasets, demonstrating the effectiveness of the proposed modules.

Conclusions

  • 📊 Introduction of CD-FSOD and its significance
  • 🔍 Challenges in cross-domain object detection
  • 🛠️ Proposal of the CD-ViTO method
  • 📈 Significant performance improvements
  • 📚 New benchmark for evaluation
  • 💻 Code and datasets available on GitHub
  • 🔑 Key contributions of the research
  • 📉 Analysis of existing detectors
  • 🌊 Diverse datasets used for testing
  • 🚀 Future work to address remaining challenges

Timeline

  • 00:00:00 - 00:05:00

    The speaker introduces their work on cross-domain few-shot object detection, highlighting the limitations of existing methods that focus primarily on classification rather than object detection. They aim to explore object detection in cross-domain settings, addressing the challenges posed by domain gaps and the need for effective transfer learning techniques.

  • 00:05:00 - 00:10:00

    The research investigates the performance of existing open-set detectors in cross-domain scenarios, identifying three main challenges: small inter-class variance, indefinable boundaries, and changing appearances. The authors propose a new method, CD-ViTO, to enhance the capabilities of open-set detectors in these challenging environments, supported by a new benchmark that includes diverse datasets.

  • 00:10:00 - 00:16:30

    The proposed CD-ViTO method incorporates finetuning and three novel modules to improve detection performance: learnable instance features, instance reweighting, and a domain prompter. The results demonstrate significant improvements over the baseline, with CD-ViTO outperforming competitors and addressing the challenges identified in cross-domain few-shot object detection. The speaker concludes by inviting the audience to explore the work further.


Video Q&A

  • What is the main focus of the research?

    The research focuses on cross-domain few-shot object detection (CD-FSOD) using an enhanced open set object detector.

  • What are the main challenges addressed in the study?

    The study addresses challenges such as small inter-class variance, indefinable boundaries, and changing appearances in cross-domain settings.

  • What is the proposed method called?

    The proposed method is called CD-ViTO; it enhances the existing open-set detector DE-ViT.

  • What datasets were used for benchmarking?

    The study uses MS-COCO as the source data and introduces six target datasets for benchmarking: ArTaxOr, Clipart1k, DIOR, DeepFish, NEU-DET, and UODD.

  • What are the key contributions of the paper?

    The key contributions include a new benchmark for CD-FSOD, an extensive study of existing detectors, and the CD-ViTO method with its novel modules.

  • Where can the code and datasets be found?

    The code and datasets are available on the project's GitHub repository.

  • What is the significance of the proposed benchmark?

    The benchmark's target datasets deliberately vary in style, inter-class variance (ICV), and indefinable boundaries (IB), allowing the study to isolate how each factor affects cross-domain detection.

  • How does the CDV method improve detection performance?

    CD-ViTO improves detection performance through finetuning plus learnable instance features, instance reweighting, and a domain prompter.

  • What were the results of the experiments?

    The experiments showed that CD-ViTO significantly outperformed existing methods and improved detection accuracy across the target datasets.

  • What future work is suggested?

    Future work includes further improvements to tackle the significant indefinable-boundary issues in certain datasets, such as NEU-DET and UODD.

Captions (en)
  • 00:00:00
    Hi everyone, my name is Yu, and today I'm going to introduce our work, which was recently accepted at ECCV 2024: "Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector."
  • 00:00:18
    First, a brief introduction to the task of cross-domain few-shot learning. As shown in this picture, cross-domain few-shot learning transfers knowledge from source data with sufficient examples to target data with few labeled examples; the source and target data have disjoint categories and also belong to different domains. One limitation we would like to highlight is that most existing cross-domain few-shot learning works are based on classification and overlook object detection, which is also a very important vision task. In this paper, we are motivated to explore object detection in cross-domain few-shot learning, which we call cross-domain few-shot object detection (CD-FSOD).
  • 00:01:19
    As for related work, first come the typical few-shot object detection methods, which include meta-learning-based and transfer-learning-based ones. We would also like to highlight that a recent open-set detector, DE-ViT, which relies only on the visual modality, achieves state-of-the-art results on few-shot object detection. In our paper, we also investigate ViT-based object detectors and cross-domain few-shot object detection methods.
  • 00:01:55
    Our method is mainly motivated by two key questions. First, can existing detectors, especially open-set detectors, generalize to the cross-domain setting? Second, if not, how can the open-set method be further improved given the significant domain gap? We will first give our conclusions on these two questions.
  • 00:02:22
    For the first question, we investigated the state-of-the-art DE-ViT and tested it on six novel targets such as ArTaxOr, Clipart1k, DIOR, DeepFish, and two others. The main conclusion is that the performance of DE-ViT drops very quickly on novel targets when there is a huge domain gap.
  • 00:02:52
    We then considered the challenges behind this task. Compared with in-domain transfer, a model faces three main challenges in cross-domain transfer. First, compared with the source data, the inter-class variance (ICV) in the target data is usually smaller. Second, an object and its background may be visually very close to each other; we call this phenomenon an indefinable boundary (IB). Third, there is changing appearance, meaning the styles of the source and target differ. These are the three main technical challenges we have to tackle in cross-domain few-shot object detection. As for the second question, our conclusion is yes: we propose a new method, CD-ViTO, built on DE-ViT, and we show that with CD-ViTO we can make the original open-set detector great again in the cross-domain setting.
  • 00:04:14
    Our work on the first question — can existing detectors, especially open-set detectors, generalize to the cross-domain setting? — starts with a new cross-domain few-shot object detection benchmark with diverse styles, ICV (inter-class variance), and IB (indefinable boundaries). In our benchmark, we take MS-COCO as the source data and introduce six different target datasets: ArTaxOr, Clipart1k, DIOR, DeepFish, NEU-DET, and UODD. It is worth mentioning that the benchmark covers diverse styles, ICV, and IB. For example, MS-COCO has a photo-realistic style, large ICV, and slight IB, while UODD, an underwater dataset, has an underwater style, small ICV, and significant IB.
  • 00:05:30
    To investigate the exact performance of existing detectors under our new benchmark, we study four different types of current detectors: typical few-shot object detection, cross-domain few-shot object detection, ViT-based detectors, and open-set detectors. Here come our main results, reported under the 10-shot setting. In the paper, we answer in detail questions such as: does the domain gap pose challenges for few-shot object detection, and are ViT-based models better than ResNet-based ones? Here, we mainly highlight that the results show very clearly that the domain gap does pose a huge challenge for current few-shot object detection methods, and that directly using an open-set detector for cross-domain few-shot object detection is unfortunately not a simple solution.
  • 00:06:39
    In the paper, we also analyze how style, ICV, and IB impact the domain gap across datasets. The basic conclusion is that style has a relatively minor effect on the domain gap, which is very different from the classification task; ICV is very notable, but it turns out we can tackle it with technical methods; and IB is very hard to tackle.
  • 00:07:17
    For the second question, we want to further improve open-set detectors even under the domain gap. We propose a novel cross-domain vision transformer, CD-ViTO, for this task, built upon the DE-ViT baseline. Here we summarize the main modules of DE-ViT, shown as the blue modules. Basically, DE-ViT uses only DINOv2 to compare features between the query and the support. First, for the support images, DE-ViT uses DINOv2 to pre-calculate per-class prototypes and also computes some background prototypes. Then, for the query image, it uses the pretrained DINOv2 backbone together with ROI align to extract features for every ROI, compares the query ROI features with the support prototypes, and feeds the result into two heads — a detection head and a one-vs-rest classification head — whose outputs form the final result.
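The prototype-matching step described above can be sketched in a few lines. This is a minimal NumPy illustration of the general idea (cosine similarity between query ROI features and support prototypes, with background prototypes folded into a background class); the function names, dimensions, and temperature are illustrative assumptions, not DE-ViT's actual implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Normalize rows so that dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def classify_rois(roi_feats, prototypes, bg_prototypes, tau=0.1):
    """Match each query ROI against support prototypes (one-vs-rest style).

    roi_feats:     (R, D) pooled features of R query ROIs
    prototypes:    (C, D) pre-computed class prototypes from support images
    bg_prototypes: (B, D) background prototypes
    Returns per-ROI predictions (index C means background) and scores.
    """
    C = prototypes.shape[0]
    sims = l2_normalize(roi_feats) @ l2_normalize(
        np.vstack([prototypes, bg_prototypes])).T        # (R, C+B) cosines
    fg = sims[:, :C]
    bg = sims[:, C:].max(axis=1, keepdims=True)          # best background match
    logits = np.hstack([fg, bg]) / tau
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs

# Toy demo: two class prototypes, one background prototype, two ROIs.
protos = np.eye(4)[:2]
bg = np.array([[0., 0., 0., 1.]])
rois = np.array([[0.9, 0.1, 0., 0.],    # resembles class 0
                 [0., 0., 0., 1.]])     # resembles background
pred, probs = classify_rois(rois, protos, bg)
```

In the toy demo the first ROI is assigned to class 0 and the second to the background index, which mirrors how the one-vs-rest head decides between foreground classes and background.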
  • 00:08:30
    Based on this baseline, we propose three novel modules. The first is learnable instance features, which turn the originally fixed features into learnable ones; the second is instance reweighting; and the third is a domain prompter. Recall the three technical challenges of the cross-domain setting: changing appearance, indefinable boundaries, and small ICV. The learnable instance features, together with finetuning, make the features learnable so that we can align them with the semantic labels, which somewhat increases the ICV. The instance reweighting is designed to tackle the indefinable boundaries: we want high-quality instances to receive higher weights. And the domain prompter is proposed to make the features more robust to different domains, thus tackling the changing styles. Next, we introduce each module.
  • 00:09:49
    First come the learnable instance features and finetuning. We first propose to finetune the top detection and classification heads. Then, with finetuning in place, we set the originally fixed instance features as learnable. Our motivation is that making the features learnable and finetuning them means we can use the semantic labels to supervise them, so we expect the model to increase the ICV by aligning the features to distinct semantic labels. To show that we indeed achieve this goal, we compared the cosine distances between different class prototypes for the initial fixed features and for our learnable ones. The results show that with the learnable features we decrease the similarity between different classes, i.e., we increase the difference between them.
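The effect described here — label supervision pulling initially similar class features apart and thereby enlarging the inter-class variance — can be reproduced with a small NumPy sketch. This is an illustrative toy (two near-identical "fixed" feature vectors trained with cross-entropy gradients), not the paper's actual module; all names and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def interclass_cos(p):
    # Cosine similarity between the two class feature vectors.
    a, b = p[0], p[1]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

D, N = 16, 32
mu0, mu1 = rng.normal(size=D), rng.normal(size=D)
feats = np.vstack([mu0 + 0.1 * rng.normal(size=(N, D)),
                   mu1 + 0.1 * rng.normal(size=(N, D))])
labels = np.array([0] * N + [1] * N)
onehot = np.eye(2)[labels]

# Start from two nearly identical (fixed) class features: small ICV.
center = (mu0 + mu1) / 2
protos = np.vstack([center, center]) + 0.01 * rng.normal(size=(2, D))
before = interclass_cos(protos)

# Treat the class features as learnable and supervise them with the
# semantic labels via cross-entropy; each one is pulled toward its class.
for _ in range(200):
    p = softmax(feats @ protos.T)
    grad = (p - onehot).T @ feats / len(feats)  # dCE/dprotos
    protos -= 0.5 * grad
after = interclass_cos(protos)
```

After training, `after` is markedly lower than `before`, mirroring the cosine-distance analysis in the talk: label alignment decreases the similarity between class prototypes.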
  • 00:11:00
    Second is the instance reweighting, which assigns different values to different instances: we expect high-quality instances, for example those with low IB, to be valued more. The model part is very simple — just an MLP that assigns different weights to different instances. In our analysis, we arrange the instances by score from high to low and observe that the more significant the IB issue, the lower the weight an instance receives; for example, when a box is very close to its background, it gets a lower weight.
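The "simple MLP" idea can be sketched as follows: a two-layer network scores each support instance, the scores are softmax-normalized into weights, and the weighted instances are aggregated into a class prototype. This is a hedged toy sketch; layer sizes, initialization, and the aggregation rule are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp_weights(inst_feats, W1, b1, W2, b2):
    # Two-layer MLP scoring each instance, softmax-normalized into weights.
    h = np.maximum(inst_feats @ W1 + b1, 0.0)      # ReLU hidden layer
    scores = (h @ W2 + b2).squeeze(-1)             # one scalar score per instance
    e = np.exp(scores - scores.max())
    return e / e.sum()

def weighted_prototype(inst_feats, weights):
    # Aggregate instances into one class prototype so that high-quality
    # (e.g. low-IB) instances contribute more than ambiguous ones.
    return (weights[:, None] * inst_feats).sum(axis=0)

D, H, K = 16, 32, 5                                 # feature dim, hidden dim, shots
inst = rng.normal(size=(K, D))                      # K support instances of one class
W1, b1 = 0.1 * rng.normal(size=(D, H)), np.zeros(H)
W2, b2 = 0.1 * rng.normal(size=(H, 1)), np.zeros(1)
w = mlp_weights(inst, W1, b1, W2, b2)
proto = weighted_prototype(inst, w)
```

In training, the MLP would be learned end-to-end with the detector, so instances with severe boundary ambiguity end up with low weights, as in the analysis described above.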
  • 00:11:52
    The third module is our domain prompter. We first introduce several virtual domains that do not exist in the original framework, set them as learnable parameters, and then design the loss functions. The motivation for introducing these domains is to use them as noise added to the original prototypes. We train them with two different losses. The first is the domain diversity loss: we want the different domains themselves to differ from each other, so they stay diverse. The second is the prototype consistency loss: as shown in the subfigure, when we add two different virtual domains to the same class prototype, we make the results similar to each other — they are positive pairs; when we add different virtual domains to different class prototypes, the results should remain different — they are negative pairs. With the prototype consistency loss, the semantics stay unchanged even when different domains are added to the same prototype, which improves the model's generalization across visual styles. Our analysis shows the t-SNE of the domains and the perturbed features: in the first figure, we visualize the learned domains and find that they are diverse; in the second, we add the domains to the original class prototypes and conclude that doing so does not cause a semantic-shift issue.
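The two losses described here can be written down compactly. The sketch below is a NumPy illustration under stated assumptions: the diversity loss is taken as the mean pairwise cosine similarity between domain vectors (lower is more diverse), and the consistency loss as an InfoNCE-style contrastive term where the same prototype under two domains is the positive pair; the exact formulations in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(2)

def cos(a, b, eps=1e-8):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

def domain_diversity_loss(domains):
    # Push the learnable virtual domains apart by penalizing their
    # mean pairwise cosine similarity.
    M = len(domains)
    sims = [cos(domains[i], domains[j]) for i in range(M) for j in range(i + 1, M)]
    return float(np.mean(sims))

def prototype_consistency_loss(protos, domains, tau=0.1):
    # Contrastive loss: one prototype perturbed by two different domains is a
    # positive pair; perturbed versions of OTHER prototypes are negatives,
    # so semantics stay unchanged under domain noise.
    perturbed = protos[:, None, :] + domains[None, :, :]   # (C, M, D)
    C, M, _ = perturbed.shape
    total = 0.0
    for c in range(C):
        anchor = perturbed[c, 0]
        pos = np.exp(cos(anchor, perturbed[c, 1]) / tau)
        neg = sum(np.exp(cos(anchor, perturbed[cc, m]) / tau)
                  for cc in range(C) if cc != c for m in range(M))
        total += -np.log(pos / (pos + neg))
    return float(total / C)

C, M, D = 4, 2, 16
protos = 3.0 * rng.normal(size=(C, D))      # class prototypes
domains = 0.3 * rng.normal(size=(M, D))     # small virtual-domain perturbations
ldiv = domain_diversity_loss(domains)
lcon = prototype_consistency_loss(protos, domains)
```

Minimizing both terms jointly keeps the virtual domains diverse while forcing prototypes to be robust to them, which is the stated goal of the module.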
  • 00:13:59
    With all modules added on top of the baseline, we finally obtain our CD-ViTO method, and here come the final comparison results. We report 1-shot, 5-shot, and 10-shot results, and we highlight that CD-ViTO significantly improves on the base DE-ViT and also outperforms its competitors, setting a new state of the art on this benchmark. We also ran ablations on the different modules. We found that finetuning helps a lot — which has been observed by many other papers — but we still did extensive ablation studies to find the best way to finetune, and we found that all the modules contribute to the final result. We also visualized the detection results of the vanilla DE-ViT and our CD-ViTO: from the visualizations, CD-ViTO clearly outperforms DE-ViT. Still, our method could be further improved: for example, on NEU-DET and UODD, which both have significant IB issues, our method also struggles to produce very good results, which means the benchmark still leaves large room for improvement.
  • 00:15:23
    Finally, our conclusions. In this paper, we first propose a comprehensive cross-domain few-shot object detection benchmark with several novel target datasets that vary in style, ICV, and IB. Second, we conduct an extensive study of existing open-set detectors and also investigate other types of detectors, for example ViT-based and cross-domain few-shot methods. Third, based on DE-ViT, we propose a new enhanced open-set detector, CD-ViTO, with three novel modules. Here comes the link: if you're interested in our work, you can easily get our paper; we also have a project page and have released all the code and datasets in the GitHub repo. You're welcome to use our datasets and try our method. Thank you for listening — that's all, bye-bye.
Tags
  • cross-domain
  • few-shot
  • object detection
  • open set detector
  • CD-FSOD
  • benchmark
  • inter-class variance
  • indefinable boundaries
  • detection performance
  • machine learning