Paper Group ANR 918
Scene recognition with CNNs: objects, scales and dataset bias. Depth CNNs for RGB-D scene recognition: learning from scratch better than transferring from RGB-CNNs. QuaRel: A Dataset and Models for Answering Questions about Qualitative Relationships. Object Detection using Domain Randomization and Generative Adversarial Refinement of Synthetic Imag …
Scene recognition with CNNs: objects, scales and dataset bias
Title | Scene recognition with CNNs: objects, scales and dataset bias |
Authors | Luis Herranz, Shuqiang Jiang, Xiangyang Li |
Abstract | Since scenes are composed in part of objects, accurate recognition of scenes requires knowledge about both scenes and objects. In this paper we address two related problems: 1) scale-induced dataset bias in multi-scale convolutional neural network (CNN) architectures, and 2) how to effectively combine scene-centric and object-centric knowledge (i.e. Places and ImageNet) in CNNs. An earlier attempt, Hybrid-CNN, showed that incorporating ImageNet did not help much. Here we propose an alternative method taking the scale into account, resulting in significant recognition gains. By analyzing the response of ImageNet-CNNs and Places-CNNs at different scales we find that both operate in different scale ranges, so using the same network for all the scales induces dataset bias resulting in limited performance. Thus, adapting the feature extractor to each particular scale (i.e. scale-specific CNNs) is crucial to improve recognition, since the objects in the scenes have their specific range of scales. Experimental results show that the recognition accuracy depends strongly on the scale, and that simple yet carefully chosen multi-scale combinations of ImageNet-CNNs and Places-CNNs can push the state-of-the-art recognition accuracy in SUN397 up to 66.26% (and even 70.17% with deeper architectures, comparable to human performance). |
Tasks | Scene Recognition |
Published | 2018-01-21 |
URL | http://arxiv.org/abs/1801.06867v1, http://arxiv.org/pdf/1801.06867v1.pdf |
PWC | https://paperswithcode.com/paper/scene-recognition-with-cnns-objects-scales |
Repo | |
Framework | |
Depth CNNs for RGB-D scene recognition: learning from scratch better than transferring from RGB-CNNs
Title | Depth CNNs for RGB-D scene recognition: learning from scratch better than transferring from RGB-CNNs |
Authors | Xinhang Song, Luis Herranz, Shuqiang Jiang |
Abstract | Scene recognition with RGB images has been extensively studied and has reached very remarkable recognition levels, thanks to convolutional neural networks (CNN) and large scene datasets. In contrast, current RGB-D scene data is much more limited, so work on it often leverages large RGB datasets by transferring pretrained RGB CNN models and fine-tuning with the target RGB-D dataset. However, we show that this approach has the limitation of hardly reaching bottom layers, which is key to learning modality-specific features. In contrast, we focus on the bottom layers, and propose an alternative strategy to learn depth features combining local weakly supervised training from patches followed by global fine-tuning with images. This strategy is capable of learning very discriminative depth-specific features with limited depth images, without resorting to Places-CNN. In addition, we propose a modified CNN architecture to further match the complexity of the model and the amount of data available. For RGB-D scene recognition, depth and RGB features are combined by projecting them in a common space and further learning a multilayer classifier, which is jointly optimized in an end-to-end network. Our framework achieves state-of-the-art accuracy on NYU2 and SUN RGB-D in both depth-only and combined RGB-D data. |
Tasks | Scene Recognition |
Published | 2018-01-21 |
URL | http://arxiv.org/abs/1801.06797v1, http://arxiv.org/pdf/1801.06797v1.pdf |
PWC | https://paperswithcode.com/paper/depth-cnns-for-rgb-d-scene-recognition |
Repo | |
Framework | |
QuaRel: A Dataset and Models for Answering Questions about Qualitative Relationships
Title | QuaRel: A Dataset and Models for Answering Questions about Qualitative Relationships |
Authors | Oyvind Tafjord, Peter Clark, Matt Gardner, Wen-tau Yih, Ashish Sabharwal |
Abstract | Many natural language questions require recognizing and reasoning with qualitative relationships (e.g., in science, economics, and medicine), but are challenging to answer with corpus-based methods. Qualitative modeling provides tools that support such reasoning, but the semantic parsing task of mapping questions into those models has formidable challenges. We present QuaRel, a dataset of diverse story questions involving qualitative relationships that characterize these challenges, and techniques that begin to address them. The dataset has 2771 questions relating 19 different types of quantities. For example, “Jenny observes that the robot vacuum cleaner moves slower on the living room carpet than on the bedroom carpet. Which carpet has more friction?” We contribute (1) a simple and flexible conceptual framework for representing these kinds of questions; (2) the QuaRel dataset, including logical forms, exemplifying the parsing challenges; and (3) two novel models for this task, built as extensions of type-constrained semantic parsing. The first of these models (called QuaSP+) significantly outperforms off-the-shelf tools on QuaRel. The second (QuaSP+Zero) demonstrates zero-shot capability, i.e., the ability to handle new qualitative relationships without requiring additional training data, something not possible with previous models. This work thus makes inroads into answering complex, qualitative questions that require reasoning, and scaling to new relationships at low cost. The dataset and models are available at http://data.allenai.org/quarel. |
Tasks | Semantic Parsing |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08048v1, http://arxiv.org/pdf/1811.08048v1.pdf |
PWC | https://paperswithcode.com/paper/quarel-a-dataset-and-models-for-answering |
Repo | |
Framework | |
Object Detection using Domain Randomization and Generative Adversarial Refinement of Synthetic Images
Title | Object Detection using Domain Randomization and Generative Adversarial Refinement of Synthetic Images |
Authors | Fernando Camaro Nogues, Andrew Huie, Sakyasingha Dasgupta |
Abstract | In this work, we present an application of domain randomization and generative adversarial networks (GAN) to train a near real-time object detector for industrial electric parts, entirely in a simulated environment. Large-scale labelled real-world data is typically scarce and difficult to obtain in many industrial settings. As such, only a few hundred unlabelled real images are used to train a Cyclic-GAN network, in combination with various degrees of domain randomization. We demonstrate that this enables robust translation of synthetic images to the real-world domain. We show that a combination of the original synthetic (simulation) and GAN-translated images, when used for training a Mask-RCNN object detection network, achieves greater than 0.95 mean average precision in detecting and classifying a collection of industrial electric parts. We evaluate the performance across different combinations of training data. |
Tasks | Object Detection |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11778v2, http://arxiv.org/pdf/1805.11778v2.pdf |
PWC | https://paperswithcode.com/paper/object-detection-using-domain-randomization |
Repo | |
Framework | |
Formal Ontology Learning from English IS-A Sentences
Title | Formal Ontology Learning from English IS-A Sentences |
Authors | Sourish Dasgupta, Ankur Padia, Gaurav Maheshwari, Priyansh Trivedi, Jens Lehmann |
Abstract | Ontology learning (OL) is the process of automatically generating an ontological knowledge base from a plain text document. In this paper, we propose a new ontology learning approach and tool, called DLOL, which generates a knowledge base in the description logic (DL) SHOQ(D) from a collection of factual non-negative IS-A sentences in English. We provide extensive experimental results on the accuracy of DLOL, giving experimental comparisons to three state-of-the-art existing OL tools, namely Text2Onto, FRED, and LExO. Here, we use the standard OL accuracy measure, called lexical accuracy, and a novel OL accuracy measure, called instance-based inference model. In our experimental results, DLOL turns out to be about 21% and 46%, respectively, better than the best of the other three approaches. |
Tasks | |
Published | 2018-02-11 |
URL | http://arxiv.org/abs/1802.03701v1, http://arxiv.org/pdf/1802.03701v1.pdf |
PWC | https://paperswithcode.com/paper/formal-ontology-learning-from-english-is-a |
Repo | |
Framework | |
What Face and Body Shapes Can Tell About Height
Title | What Face and Body Shapes Can Tell About Height |
Authors | Semih Günel, Helge Rhodin, Pascal Fua |
Abstract | Recovering a person’s height from a single image is important for virtual garment fitting, autonomous driving and surveillance; however, it is also very challenging due to the absence of absolute scale information. We tackle the rarely addressed case where camera parameters and scene geometry are unknown. To nevertheless resolve the inherent scale ambiguity, we infer height from statistics that are intrinsic to human anatomy and can be estimated from images directly, such as articulated pose, bone length proportions, and facial features. Our contribution is twofold. First, we experiment with different machine learning models to capture the relation between image content and human height. Second, we show that performance is predominantly limited by dataset size and create a new dataset that is three orders of magnitude larger, by mining explicit height labels and propagating them to additional images through face recognition and assignment consistency. Our evaluation shows that monocular height estimation is possible with a mean absolute error (MAE) of 5.56 cm. |
Tasks | Autonomous Driving, Face Recognition |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10355v1, http://arxiv.org/pdf/1805.10355v1.pdf |
PWC | https://paperswithcode.com/paper/what-face-and-body-shapes-can-tell-about |
Repo | |
Framework | |
Improving Reconstruction Autoencoder Out-of-distribution Detection with Mahalanobis Distance
Title | Improving Reconstruction Autoencoder Out-of-distribution Detection with Mahalanobis Distance |
Authors | Taylor Denouden, Rick Salay, Krzysztof Czarnecki, Vahdat Abdelzad, Buu Phan, Sachin Vernekar |
Abstract | There is an increasingly apparent need for validating the classifications made by deep learning systems in safety-critical applications like autonomous vehicle systems. A number of recent papers have proposed methods for detecting anomalous image data that appear different from known inlier data samples, including reconstruction-based autoencoders. Autoencoders optimize the compression of input data to a latent space of a dimensionality smaller than the original input and attempt to accurately reconstruct the input using that compressed representation. Since the latent vector is optimized to capture the salient features from the inlier class only, it is commonly assumed that images of objects from outside of the training class cannot effectively be compressed and reconstructed. Some thus consider reconstruction error as a kind of novelty measure. Here we suggest that reconstruction-based approaches fail to capture particular anomalies that lie far from known inlier samples in latent space but near the latent dimension manifold defined by the parameters of the model. We propose incorporating the Mahalanobis distance in latent space to better capture these out-of-distribution samples and our results show that this method often improves performance over the baseline approach. |
Tasks | Out-of-Distribution Detection |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02765v1, http://arxiv.org/pdf/1812.02765v1.pdf |
PWC | https://paperswithcode.com/paper/improving-reconstruction-autoencoder-out-of |
Repo | |
Framework | |
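The idea in the abstract above, scoring a sample by its Mahalanobis distance in latent space in addition to its reconstruction error, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the ridge term, and the additive combination weighted by `alpha` are all assumptions made for illustration, and the latent vectors are assumed to come from some already trained encoder.

```python
import numpy as np

def fit_latent_gaussian(latents):
    """Estimate the mean and inverse covariance of inlier latent vectors.

    latents: array of shape (n_samples, latent_dim).
    """
    mean = latents.mean(axis=0)
    cov = np.cov(latents, rowvar=False)
    # Small ridge term (an assumption) keeps the covariance invertible.
    cov = cov + 1e-6 * np.eye(cov.shape[0])
    return mean, np.linalg.inv(cov)

def mahalanobis_distance(z, mean, cov_inv):
    """Mahalanobis distance of one latent vector z from the inlier distribution."""
    diff = z - mean
    return float(np.sqrt(diff @ cov_inv @ diff))

def novelty_score(reconstruction_error, z, mean, cov_inv, alpha=1.0):
    """Hypothetical combined score: reconstruction error plus weighted latent distance."""
    return reconstruction_error + alpha * mahalanobis_distance(z, mean, cov_inv)

# Toy usage with synthetic "latents" standing in for encoder outputs.
rng = np.random.default_rng(0)
inlier_latents = rng.normal(size=(500, 4))
mean, cov_inv = fit_latent_gaussian(inlier_latents)
near = mahalanobis_distance(np.zeros(4), mean, cov_inv)
far = mahalanobis_distance(10.0 * np.ones(4), mean, cov_inv)
```

A sample that reconstructs well but sits far from the inlier latents (large `far`) would still receive a high score under this sketch, which is the failure case the abstract describes for reconstruction error alone.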
Towards Leveraging the Information of Gradients in Optimization-based Adversarial Attack
Title | Towards Leveraging the Information of Gradients in Optimization-based Adversarial Attack |
Authors | Jingyang Zhang, Hsin-Pai Cheng, Chunpeng Wu, Hai Li, Yiran Chen |
Abstract | In recent years, deep neural networks demonstrated state-of-the-art performance in a large variety of tasks and therefore have been adopted in many applications. On the other hand, the latest studies revealed that neural networks are vulnerable to adversarial examples obtained by carefully adding small perturbations to legitimate samples. Based on this observation, many attack methods were proposed. Among them, the optimization-based CW attack is the most powerful, as the produced adversarial samples present much less distortion compared to other methods. The better attacking effect, however, comes at the cost of running more iterations and thus longer computation time to reach desirable results. In this work, we propose to leverage the information of gradients as guidance during the search for adversaries. More specifically, directly incorporating the gradients into the perturbation can be regarded as a constraint added to the optimization process. We intuitively and empirically prove the rationality of our method in reducing the search space. Our experiments show that compared to the original CW attack, the proposed method requires fewer iterations towards adversarial samples, obtaining a higher success rate and resulting in smaller $\ell_2$ distortion. |
Tasks | Adversarial Attack |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02524v1, http://arxiv.org/pdf/1812.02524v1.pdf |
PWC | https://paperswithcode.com/paper/towards-leveraging-the-information-of |
Repo | |
Framework | |
On the Computational Power of Online Gradient Descent
Title | On the Computational Power of Online Gradient Descent |
Authors | Vaggos Chatziafratis, Tim Roughgarden, Joshua R. Wang |
Abstract | We prove that the evolution of weight vectors in online gradient descent can encode arbitrary polynomial-space computations, even in very simple learning settings. Our results imply that, under weak complexity-theoretic assumptions, it is impossible to reason efficiently about the fine-grained behavior of online gradient descent. |
Tasks | |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.01280v2, http://arxiv.org/pdf/1807.01280v2.pdf |
PWC | https://paperswithcode.com/paper/on-the-computational-power-of-online-gradient |
Repo | |
Framework | |
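For readers unfamiliar with the object studied in the abstract above, the "evolution of weight vectors" in online gradient descent is simply the sequence of iterates produced by one gradient step per incoming example. The sketch below shows that sequence for a linear model. The squared loss, the learning rate, and the function name are assumptions chosen for illustration; the abstract does not state which learning setting the paper analyzes.

```python
import numpy as np

def online_gradient_descent(examples, dim, learning_rate=0.1):
    """Run online gradient descent on the loss 0.5 * (w.x - y)^2.

    examples: iterable of (x, y) pairs, processed one at a time.
    Returns the full list of weight vectors, starting from zero.
    """
    w = np.zeros(dim)
    trajectory = [w.copy()]
    for x, y in examples:
        grad = (w @ x - y) * x          # gradient of the per-example loss
        w = w - learning_rate * grad    # one step per example
        trajectory.append(w.copy())
    return trajectory

# Toy usage: two examples in two dimensions.
stream = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), -1.0)]
trajectory = online_gradient_descent(stream, dim=2)
```

The paper's result concerns how hard it is to predict properties of such a trajectory, not how to compute any single step.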
Multi-task, multi-label and multi-domain learning with residual convolutional networks for emotion recognition
Title | Multi-task, multi-label and multi-domain learning with residual convolutional networks for emotion recognition |
Authors | Gerard Pons, David Masip |
Abstract | Automated emotion recognition in the wild from facial images remains a challenging problem. Although recent advances in Deep Learning have brought a significant breakthrough in this topic, strong changes in pose, orientation and point of view severely harm current approaches. In addition, the acquisition of labeled datasets is costly, and current state-of-the-art deep learning algorithms cannot model all the aforementioned difficulties. In this paper, we propose to apply a multi-task learning loss function to share a common feature representation with other related tasks. In particular, we show that emotion recognition benefits from jointly learning a model with a detector of facial Action Units (collective muscle movements). The proposed loss function addresses the problem of learning multiple tasks with heterogeneously labeled data, improving previous multi-task approaches. We validate the proposal using two datasets acquired in uncontrolled environments, and an application to predict compound facial emotion expressions. |
Tasks | Emotion Recognition, Multi-Task Learning |
Published | 2018-02-19 |
URL | http://arxiv.org/abs/1802.06664v1, http://arxiv.org/pdf/1802.06664v1.pdf |
PWC | https://paperswithcode.com/paper/multi-task-multi-label-and-multi-domain |
Repo | |
Framework | |
Learning to generate classifiers
Title | Learning to generate classifiers |
Authors | Nicholas Guttenberg, Ryota Kanai |
Abstract | We train a network to generate mappings between training sets and classification policies (a ‘classifier generator’) by conditioning on the entire training set via an attentional mechanism. The network is directly optimized for test set performance on a training set of related tasks, which is then transferred to unseen ‘test’ tasks. We use this to optimize for performance in the low-data and unsupervised learning regimes, and obtain significantly better performance in the 10-50 datapoint regime than support vector classifiers, random forests, XGBoost, and k-nearest neighbors on a range of small datasets. |
Tasks | |
Published | 2018-03-30 |
URL | http://arxiv.org/abs/1803.11373v1, http://arxiv.org/pdf/1803.11373v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-generate-classifiers |
Repo | |
Framework | |
Named Entity Recognition on Twitter for Turkish using Semi-supervised Learning with Word Embeddings
Title | Named Entity Recognition on Twitter for Turkish using Semi-supervised Learning with Word Embeddings |
Authors | Eda Okur, Hakan Demir, Arzucan Özgür |
Abstract | Recently, due to the increasing popularity of social media, the necessity for extracting information from informal text types, such as microblog texts, has gained significant attention. In this study, we focused on the Named Entity Recognition (NER) problem on informal text types for Turkish. We utilized a semi-supervised learning approach based on neural networks. We applied a fast unsupervised method for learning continuous representations of words in vector space. We made use of these obtained word embeddings, together with language independent features that are engineered to work better on informal text types, for generating a Turkish NER system on microblog texts. We evaluated our Turkish NER system on Twitter messages and achieved better F-score performances than the published results of previously proposed NER systems on Turkish tweets. Since we did not employ any language dependent features, we believe that our method can be easily adapted to microblog texts in other morphologically rich languages. |
Tasks | Named Entity Recognition, Word Embeddings |
Published | 2018-10-20 |
URL | http://arxiv.org/abs/1810.08732v1, http://arxiv.org/pdf/1810.08732v1.pdf |
PWC | https://paperswithcode.com/paper/named-entity-recognition-on-twitter-for |
Repo | |
Framework | |
Spherical Harmonic Residual Network for Diffusion Signal Harmonization
Title | Spherical Harmonic Residual Network for Diffusion Signal Harmonization |
Authors | Simon Koppers, Luke Bloy, Jeffrey I. Berman, Chantal M. W. Tax, J. Christopher Edgar, Dorit Merhof |
Abstract | Diffusion imaging is an important method in the field of neuroscience, as it is sensitive to changes within the tissue microstructure of the human brain. However, a major challenge when using MRI to derive quantitative measures is that the use of different scanners, as used in multi-site group studies, introduces measurement variability. This can lead to an increased variance in quantitative metrics, even if the same brain is scanned. Contrary to the assumption that these characteristics are comparable and similar, small changes in these values are observed in many clinical studies, hence harmonization of the signals is essential. In this paper, we present a method that does not require additional preprocessing, such as segmentation or registration, and harmonizes the signal based on a deep learning residual network. For this purpose, a training database is required, which consists of the same subjects, scanned on different scanners. The results show that harmonized signals are significantly more similar to the ground truth signal compared to no harmonization, but also improve in comparison to another deep learning method. The same effect is also demonstrated in commonly used metrics derived from the diffusion MRI signal. |
Tasks | |
Published | 2018-08-05 |
URL | http://arxiv.org/abs/1808.01595v1, http://arxiv.org/pdf/1808.01595v1.pdf |
PWC | https://paperswithcode.com/paper/spherical-harmonic-residual-network-for |
Repo | |
Framework | |
Automatic Identification of Twin Zygosity in Resting-State Functional MRI
Title | Automatic Identification of Twin Zygosity in Resting-State Functional MRI |
Authors | Andrey Gritsenko, Martin A. Lindquist, Gregory R. Kirk, Moo K. Chung |
Abstract | A key strength of twin studies arises from the fact that there are two types of twins, monozygotic and dizygotic, that share differing amounts of genetic information. Accurate differentiation of twin types allows efficient inference on genetic influences in a population. However, identification of zygosity is often prone to errors without genotyping. In this study, we propose a novel pairwise feature representation to classify the zygosity of twin pairs of resting state functional magnetic resonance images (rs-fMRI). For this, we project an fMRI signal to a set of basis functions and use the projection coefficients as the compact and discriminative feature representation of noisy fMRI. We encode the relationship between twins as the correlation between the new feature representations across brain regions. We employ hill climbing variable selection to identify brain regions that are the most genetically affected. The proposed framework was applied to 208 twin pairs and achieved 94.19% classification accuracy in automatically identifying the zygosity of paired images. |
Tasks | |
Published | 2018-06-30 |
URL | http://arxiv.org/abs/1807.00244v4, http://arxiv.org/pdf/1807.00244v4.pdf |
PWC | https://paperswithcode.com/paper/automatic-identification-of-twin-zygosity-in |
Repo | |
Framework | |
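The pairwise feature described in the abstract above (project each regional signal onto basis functions, then correlate the twins' coefficient vectors) might be sketched as follows. This is one possible reading only: the abstract does not name the basis, so the cosine basis, the least-squares projection, the number of basis functions, and all function names here are assumptions, and the variable-selection and classification stages are omitted.

```python
import numpy as np

def cosine_basis(n_timepoints, n_basis):
    """Cosine basis sampled at n_timepoints; columns are basis functions (assumed choice)."""
    t = (np.arange(n_timepoints) + 0.5) / n_timepoints
    k = np.arange(n_basis)
    return np.cos(np.pi * np.outer(t, k))  # shape (n_timepoints, n_basis)

def project(signal, basis):
    """Least-squares projection coefficients of one time series onto the basis."""
    coeffs, *_ = np.linalg.lstsq(basis, signal, rcond=None)
    return coeffs

def pair_feature(signals_a, signals_b, n_basis=8):
    """Per-region correlation between the twins' coefficient vectors.

    signals_a, signals_b: arrays of shape (n_regions, n_timepoints), one per twin.
    Returns an array of shape (n_regions,).
    """
    basis = cosine_basis(signals_a.shape[1], n_basis)
    features = []
    for region_a, region_b in zip(signals_a, signals_b):
        coeffs_a = project(region_a, basis)
        coeffs_b = project(region_b, basis)
        features.append(np.corrcoef(coeffs_a, coeffs_b)[0, 1])
    return np.array(features)

# Toy usage with random signals standing in for regional fMRI time series.
rng = np.random.default_rng(0)
twin_a = rng.normal(size=(3, 50))
same = pair_feature(twin_a, twin_a)
opposite = pair_feature(twin_a, -twin_a)
```

The resulting vector has one entry per brain region and could be passed to any standard classifier after selecting informative regions.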
A Support Tensor Train Machine
Title | A Support Tensor Train Machine |
Authors | Cong Chen, Kim Batselier, Ching-Yun Ko, Ngai Wong |
Abstract | There has been growing interest in extending traditional vector-based machine learning techniques to their tensor forms. An example is the support tensor machine (STM) that utilizes a rank-one tensor to capture the data structure, thereby alleviating the overfitting and curse of dimensionality problems in the conventional support vector machine (SVM). However, the expressive power of a rank-one tensor is too restrictive for many real-world datasets. To overcome this limitation, we introduce a support tensor train machine (STTM) by replacing the rank-one tensor in an STM with a tensor train. Experiments validate and confirm the superiority of an STTM over the SVM and STM. |
Tasks | |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06114v1, http://arxiv.org/pdf/1804.06114v1.pdf |
PWC | https://paperswithcode.com/paper/a-support-tensor-train-machine |
Repo | |
Framework | |
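To make the contrast in the abstract above concrete, the sketch below shows a weight tensor stored as tensor-train cores and the linear decision value it yields for a tensor-shaped input. It is an illustration only, not the STTM training procedure: the function names are hypothetical, and the full weight tensor is materialized for clarity, which a practical implementation would presumably avoid. A rank-one tensor, as used in an STM, is the special case where every internal rank equals 1.

```python
import numpy as np

def tt_to_full(cores):
    """Contract tensor-train cores into the full tensor.

    Each core has shape (r_prev, n_k, r_next); the first r_prev and the
    last r_next must both be 1.
    """
    full = cores[0]
    for core in cores[1:]:
        # Contract the trailing rank index with the next core's leading rank index.
        full = np.tensordot(full, core, axes=([-1], [0]))
    # Drop the two boundary rank dimensions of size 1.
    return full.reshape(full.shape[1:-1])

def decision_value(x, cores, bias=0.0):
    """Inner product of input tensor x with the tensor-train weights, plus bias."""
    weights = tt_to_full(cores)
    return float(np.sum(weights * x) + bias)

# Rank-one special case: all internal ranks equal 1, so the weight tensor is an outer product.
a = np.array([1.0, 2.0])
b = np.array([1.0, 0.0, 1.0])
rank_one_cores = [a.reshape(1, 2, 1), b.reshape(1, 3, 1)]
value = decision_value(np.ones((2, 3)), rank_one_cores)

# A tensor train with internal rank 2 can represent weight tensors that rank one cannot.
rng = np.random.default_rng(0)
rank_two_cores = [rng.normal(size=(1, 2, 2)), rng.normal(size=(2, 3, 1))]
weights = tt_to_full(rank_two_cores)
```

Raising the internal ranks trades more parameters for more expressive power, which is the limitation of the rank-one STM that the abstract points to.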