Dissertation

Robust deep learning for computer vision to counteract data scarcity and label noise


Published in
Bibliographic information
Year of publication: 2020
DOI: 10.6094/UNIFR/175727
URN: urn:nbn:de:bsz:25-freidok-1757274
Language: English
Abstract
Many recent advances in computer vision have been achieved with deep neural networks. Nevertheless, critical challenges for practical applications remain, and the robustness of the neural network models is among the most significant hurdles: in a production environment, the model needs to be robust to both data and label noise. Deep neural networks typically require an extensive training set with high-quality, human-annotated labels, yet such data and labels can be expensive or impracticable to obtain. This thesis therefore focuses on training neural network models robustly under typical and extreme conditions.

The work starts with the robust learning of convolutional networks under typical conditions. The widely used convolution module has a small, predetermined receptive field: the standard convolution layer considers only a very local neighborhood, while the global context is neglected. Based on this observation, we propose a new convolution module with a flexible, adaptive receptive field. Specifically, our module uses parametric attention maps to adjust its learnable receptive field, which we achieve by parametrizing the self-attention mechanism with simple kernel functions such as the Gaussian kernel. From the convolution perspective, the module corresponds to a 1x1 convolution, followed by weighted importance sampling and another 1x1 convolution. From the self-attention perspective, it corresponds to a strongly regularized self-attention mechanism, which otherwise tends to focus on the global context. Our analysis shows that the new module is more robust than the traditional convolution layer, regardless of the choice of network architecture, and that it is a valuable complement to the traditional convolutional network.
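
To illustrate the idea, the following is a minimal PyTorch sketch under our own assumptions: the attention map is a single Gaussian kernel over normalized image coordinates with a learnable center and bandwidth, and the class name GaussianKernelConv and its interface are illustrative rather than taken from the thesis.

```python
import torch
import torch.nn as nn

class GaussianKernelConv(nn.Module):
    """1x1 conv -> Gaussian-kernel attention pooling -> 1x1 conv (sketch)."""

    def __init__(self, in_ch, out_ch, size):
        super().__init__()
        self.proj_in = nn.Conv2d(in_ch, out_ch, kernel_size=1)    # first 1x1 convolution
        self.proj_out = nn.Conv2d(out_ch, out_ch, kernel_size=1)  # second 1x1 convolution
        # Learnable center and bandwidth of the Gaussian attention kernel
        self.mu = nn.Parameter(torch.zeros(2))
        self.log_sigma = nn.Parameter(torch.zeros(1))
        # Fixed grid of spatial positions, normalized to [-1, 1]
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, size),
                                torch.linspace(-1, 1, size), indexing="ij")
        self.register_buffer("grid", torch.stack([ys, xs], dim=-1))  # (H, W, 2)

    def forward(self, x):
        v = self.proj_in(x)                             # values: (B, C, H, W)
        sigma = self.log_sigma.exp()
        # Gaussian kernel over positions: exp(-||p - mu||^2 / (2 * sigma^2))
        d2 = ((self.grid - self.mu) ** 2).sum(dim=-1)   # squared distances, (H, W)
        attn = torch.softmax((-d2 / (2 * sigma ** 2)).flatten(), dim=0)
        attn = attn.view(1, 1, *d2.shape)               # broadcast over batch/channels
        # Weighted importance sampling: pool the values under the attention map
        ctx = (v * attn).sum(dim=(2, 3), keepdim=True)  # global context, (B, C, 1, 1)
        return self.proj_out(v + ctx)                   # fuse context, second 1x1 conv

# Usage: module = GaussianKernelConv(3, 16, size=32)
#        out = module(torch.randn(2, 3, 32, 32))        # -> (2, 16, 32, 32)
```

Because the attention map is constrained to a parametric kernel, only a handful of attention parameters are learned per layer, which is one way to realize the "strongly regularized self-attention" reading above.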
Typical neural networks often perform well on clean and large datasets, but their performance decreases as the dataset shrinks. In extreme cases, such as defect detection or cancer detection, data for some rare classes may be missing entirely at training time. One possible solution is one-class classification, which models the typical case with a deep generative network. However, these models perform poorly on high-dimensional inputs and learn inefficiently on small datasets. In this thesis, we propose to learn the normal data distribution more efficiently with a multi-hypothesis autoencoder. Joint training with a discriminator (acting as a critic) forces the model to follow the underlying data distribution; in addition, the discriminator encourages the generative model to increase the diversity of its outputs so that the dense regions of the data are covered. Our extensive analysis indicates that the proposed framework reliably detects anomalies on a realistic metal anomaly dataset and on CIFAR-10, improving previously reported results by over 5.3% and 3.9%, respectively.

Such a one-class learning approach is, however, extremely sensitive to mislabeled data, and training deep models on noisy labels often leads to strong overfitting. To counteract this effect, we propose a simple but effective method that progressively filters out wrong labels via self-ensembling (SELF). Our SELF method improves the model's performance on the task at hand by forming ensembles of the model itself to reduce the variance in learning. The self-ensemble then identifies potentially mislabeled samples and excludes them from the supervised training; removed samples still contribute to an unsupervised loss used to train the model. The filtering process is repeated until the best model is found. We show the effectiveness of our simple framework for training networks on symmetric and asymmetric label noise at different noise ratios. The progressive filtering with self-ensembling is generic: it can be applied to any task and remains robust regardless of the choice of the baseline network architecture.

If the label set is strongly contaminated with noise or comes from an untrustworthy source, it may have to be neglected entirely. In this case, the supervised learning problem can be reformulated as an unsupervised learning task. For saliency detection, for example, a typical solution is to train deep models directly on unlabeled data: a set of carefully selected, handcrafted methods generates pseudo-labels for the training of a deep network. In this thesis, we show that the pseudo-labels generated by each handcrafted method can be refined in isolation with a simple two-stage training approach. In the first stage, additional prior knowledge from the design of a neural network is incorporated into the pseudo-labels; in the second stage, the results of several handcrafted methods are combined to train a final network. Our approach surpasses previously published methods on multiple datasets, and the quality of the model's predictions is comparable to that of a model trained in a supervised setting. The proposed principles are generic and straightforward, so the framework can also be applied to other anomaly detection or classification tasks.
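
A minimal sketch of the two-stage pseudo-label scheme described above follows. The network factory and the training/prediction helpers (Net, train, predict), as well as the pixel-wise averaging fusion rule, are assumptions made for the example, not details taken from the thesis.

```python
import numpy as np

def refine_pseudo_labels(images, handcrafted_maps, Net, train, predict):
    """Stage 1: train one network per handcrafted method, so the prior from
    the network's design refines that method's pseudo-labels in isolation."""
    refined = []
    for maps in handcrafted_maps:              # one pseudo-label set per method
        net = Net()
        train(net, images, maps)               # fit to this method's pseudo-labels
        refined.append(predict(net, images))   # re-predict: refined saliency maps
    return refined

def train_final_network(images, refined, Net, train):
    """Stage 2: combine the refined maps of all methods (here by pixel-wise
    averaging, an assumed fusion rule) and train the final network on them."""
    fused = np.mean(np.stack(refined, axis=0), axis=0)
    final_net = Net()
    train(final_net, images, fused)
    return final_net
```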

Description

Files
License: Protected by copyright. Use is only permitted in accordance with copyright law and/or applicable related rights.
DFG
This contribution is freely accessible with the consent of the rights holder (publisher) on the basis of a (DFG-funded) Alliance or National License.
  • Thesis.pdf SHA256 checksum: d0df57fb47900e0936dc966c32eb6267334dc3c794feb7a90fc84db0e406f3d2
    Download (13.48 MB)

  • Description of the research data


    Examination details
    Faculty: Technische Fakultät
    Supervisor: Brox, Thomas
    Date of examination: 10.12.2020