Paper Group ANR 606
Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition
Title | Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition |
Authors | Genta Indra Winata, Chien-Sheng Wu, Andrea Madotto, Pascale Fung |
Abstract | We propose an LSTM-based model with a hierarchical architecture for named entity recognition on code-switching Twitter data. Our model uses bilingual character representation and transfer learning to address out-of-vocabulary words. To mitigate data noise, we propose to use token replacement and normalization. In the 3rd Workshop on Computational Approaches to Linguistic Code-Switching Shared Task, we achieved second place with a 62.76% harmonic mean F1-score for the English-Spanish language pair without using any gazetteer or knowledge-based information. |
Tasks | Named Entity Recognition, Transfer Learning |
Published | 2018-05-30 |
URL | https://arxiv.org/abs/1805.12061v2 |
https://arxiv.org/pdf/1805.12061v2.pdf | |
PWC | https://paperswithcode.com/paper/bilingual-character-representation-for |
Repo | |
Framework | |
Excessive Invariance Causes Adversarial Vulnerability
Title | Excessive Invariance Causes Adversarial Vulnerability |
Authors | Jörn-Henrik Jacobsen, Jens Behrmann, Richard Zemel, Matthias Bethge |
Abstract | Despite their impressive performance, deep neural networks exhibit striking failures on out-of-distribution inputs. One core idea of adversarial example research is to reveal neural network errors under such distribution shifts. We decompose these errors into two complementary sources: sensitivity and invariance. We show deep networks are not only too sensitive to task-irrelevant changes of their input, as is well-known from epsilon-adversarial examples, but are also too invariant to a wide range of task-relevant changes, thus making vast regions in input space vulnerable to adversarial attacks. We show such excessive invariance occurs across various tasks and architecture types. On MNIST and ImageNet one can manipulate the class-specific content of almost any image without changing the hidden activations. We identify an insufficiency of the standard cross-entropy loss as a reason for these failures. Further, we extend this objective based on an information-theoretic analysis so it encourages the model to consider all task-dependent features in its decision. This provides the first approach tailored explicitly to overcome excessive invariance and resulting vulnerabilities. |
Tasks | |
Published | 2018-11-01 |
URL | https://arxiv.org/abs/1811.00401v3 |
https://arxiv.org/pdf/1811.00401v3.pdf | |
PWC | https://paperswithcode.com/paper/excessive-invariance-causes-adversarial |
Repo | |
Framework | |
Joint Image Captioning and Question Answering
Title | Joint Image Captioning and Question Answering |
Authors | Jialin Wu, Zeyuan Hu, Raymond J. Mooney |
Abstract | Answering visual questions requires acquiring everyday common knowledge and modeling the semantic connections among different parts of images, which is too difficult for VQA systems to learn from images with supervision from answers alone. Meanwhile, image captioning systems that use a beam search strategy tend to generate similar captions and fail to describe images diversely. To address these issues, we present a system in which the two tasks complement each other, and which is capable of jointly producing image captions and answering visual questions. In particular, we utilize question and image features to generate question-related captions and use the generated captions as additional features to provide new knowledge to the VQA system. For image captioning, our system attains more informative results in terms of the relative improvements on VQA tasks as well as competitive results using automated metrics. Applying our system to the VQA tasks, our results on the VQA v2 dataset achieve 65.8% using generated captions and 69.1% using annotated captions on the validation set, and 68.4% on the test-standard set. Further, an ensemble of 10 models achieves 69.7% on the test-standard split. |
Tasks | Image Captioning, Question Answering, Visual Question Answering |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08389v1 |
http://arxiv.org/pdf/1805.08389v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-image-captioning-and-question-answering |
Repo | |
Framework | |
Facial Landmark Detection: a Literature Survey
Title | Facial Landmark Detection: a Literature Survey |
Authors | Yue Wu, Qiang Ji |
Abstract | The locations of the fiducial facial landmark points around facial components and facial contour capture the rigid and non-rigid facial deformations due to head movements and facial expressions. They are hence important for various facial analysis tasks. Many facial landmark detection algorithms have been developed over the years to automatically detect those key points, and in this paper, we perform an extensive review of them. We classify the facial landmark detection algorithms into three major categories: holistic methods, Constrained Local Model (CLM) methods, and regression-based methods. They differ in how they utilize the facial appearance and shape information. The holistic methods explicitly build models to represent the global facial appearance and shape information. The CLMs explicitly leverage the global shape model but build the local appearance models. The regression-based methods implicitly capture facial shape and appearance information. For algorithms within each category, we discuss their underlying theories as well as their differences. We also compare their performances on both controlled and in-the-wild benchmark datasets, under varying facial expressions, head poses, and occlusion. Based on the evaluations, we point out their respective strengths and weaknesses. There is also a separate section to review the latest deep learning-based algorithms. The survey also includes a listing of the benchmark databases and existing software. Finally, we identify future research directions, including combining methods in different categories to leverage their respective strengths to solve landmark detection “in-the-wild”. |
Tasks | Facial Landmark Detection |
Published | 2018-05-15 |
URL | http://arxiv.org/abs/1805.05563v1 |
http://arxiv.org/pdf/1805.05563v1.pdf | |
PWC | https://paperswithcode.com/paper/facial-landmark-detection-a-literature-survey |
Repo | |
Framework | |
Image Transformer
Title | Image Transformer |
Authors | Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran |
Abstract | Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood. By restricting the self-attention mechanism to attend to local neighborhoods we significantly increase the size of images the model can process in practice, despite maintaining significantly larger receptive fields per layer than typical convolutional neural networks. While conceptually simple, our generative models significantly outperform the current state of the art in image generation on ImageNet, improving the best published negative log-likelihood on ImageNet from 3.83 to 3.77. We also present results on image super-resolution with a large magnification ratio, applying an encoder-decoder configuration of our architecture. In a human evaluation study, we find that images generated by our super-resolution model fool human observers three times more often than the previous state of the art. |
Tasks | Image Generation, Image Super-Resolution, Super-Resolution |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05751v3 |
http://arxiv.org/pdf/1802.05751v3.pdf | |
PWC | https://paperswithcode.com/paper/image-transformer |
Repo | |
Framework | |
Compressed Dictionary Learning
Title | Compressed Dictionary Learning |
Authors | Karin Schnass, Flavio Teixeira |
Abstract | In this paper we show that the computational complexity of the Iterative Thresholding and K-residual-Means (ITKrM) algorithm for dictionary learning can be significantly reduced by using dimensionality-reduction techniques based on the Johnson-Lindenstrauss lemma. The dimensionality reduction is efficiently carried out with the fast Fourier transform. We introduce the Iterative compressed-Thresholding and K-Means (IcTKM) algorithm for fast dictionary learning and study its convergence properties. We show that IcTKM can locally recover an incoherent, overcomplete generating dictionary of $K$ atoms from training signals of sparsity level $S$ with high probability. Fast dictionary learning is achieved by embedding the training data and the dictionary into $m < d$ dimensions, and recovery is shown to be locally stable with an embedding dimension which scales as low as $m = O(S \log^4 S \log^3 K)$. The compression effectively shatters the data dimension bottleneck in the computational cost of ITKrM, reducing it by a factor $O(m/d)$. Our theoretical results are complemented with numerical simulations which demonstrate that IcTKM is a powerful, low-cost algorithm for learning dictionaries from high-dimensional data sets. |
Tasks | Dictionary Learning, Dimensionality Reduction |
Published | 2018-05-02 |
URL | https://arxiv.org/abs/1805.00692v2 |
https://arxiv.org/pdf/1805.00692v2.pdf | |
PWC | https://paperswithcode.com/paper/compressed-dictionary-learning |
Repo | |
Framework | |
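The dimensionality-reduction idea behind this abstract can be illustrated with a minimal sketch of a Johnson-Lindenstrauss embedding. Note the assumptions: the abstract says the reduction is carried out with the fast Fourier transform, whereas this sketch swaps in a plain dense Gaussian random projection because it is the simplest JL construction to show. The function names `gaussian_jl_embed` and `pairwise_distances`, the sizes, and the seeds are hypothetical, and none of this is the IcTKM algorithm itself.

```python
import numpy as np


def gaussian_jl_embed(X, m, seed=0):
    """Project the rows of X (n x d) to m dimensions with a scaled Gaussian matrix.

    Assumption: a dense Gaussian projection stands in for the FFT-based
    embedding mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Scaling by 1/sqrt(m) keeps squared norms unchanged in expectation.
    P = rng.standard_normal((d, m)) / np.sqrt(m)
    return X @ P


def pairwise_distances(X):
    """Euclidean distances between all pairs of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical toy data: 20 signals in d = 2048 dimensions, embedded into m = 512.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 2048))
Y = gaussian_jl_embed(X, m=512)

# Ratio of embedded to original distance for every distinct pair of signals.
iu = np.triu_indices(20, k=1)
ratios = pairwise_distances(Y)[iu] / pairwise_distances(X)[iu]
print(ratios.min(), ratios.max())
```

The printed ratios should stay close to 1, which is the property that lets computations on the compressed signals approximate those on the original ones at lower cost.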
A Distributed Second-Order Algorithm You Can Trust
Title | A Distributed Second-Order Algorithm You Can Trust |
Authors | Celestine Dünner, Aurelien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi |
Abstract | Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years. While first-order methods seem to dominate the field, second-order methods are nevertheless attractive as they potentially require fewer communication rounds to converge. However, there are significant drawbacks that impede their wide adoption, such as the computation and the communication of a large Hessian matrix. In this paper we present a new algorithm for distributed training of generalized linear models that only requires the computation of diagonal blocks of the Hessian matrix on the individual workers. To deal with this approximate information we propose an adaptive approach that - akin to trust-region methods - dynamically adapts the auxiliary model to compensate for modeling errors. We provide theoretical rates of convergence for a wide class of problems including L1-regularized objectives. We also demonstrate that our approach achieves state-of-the-art results on multiple large benchmark datasets. |
Tasks | Distributed Optimization |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07569v1 |
http://arxiv.org/pdf/1806.07569v1.pdf | |
PWC | https://paperswithcode.com/paper/a-distributed-second-order-algorithm-you-can |
Repo | |
Framework | |
Low-Rank Tensor Modeling for Hyperspectral Unmixing Accounting for Spectral Variability
Title | Low-Rank Tensor Modeling for Hyperspectral Unmixing Accounting for Spectral Variability |
Authors | Tales Imbiriba, Ricardo Augusto Borsoi, José Carlos Moreira Bermudez |
Abstract | Traditional hyperspectral unmixing methods neglect the underlying variability of spectral signatures often observed in typical hyperspectral images (HI), propagating these mismodeling errors throughout the whole unmixing process. Attempts to model material spectra as members of sets or as random variables tend to lead to severely ill-posed unmixing problems. Although parametric models have been proposed to overcome this drawback by handling endmember variability through generalizations of the mixing model, the success of these techniques depends on employing appropriate regularization strategies. Moreover, the existing approaches fail to adequately explore the natural multidimensional representation of HIs. Recently, tensor-based strategies considered low-rank decompositions of hyperspectral images as an alternative to impose low-dimensional structures on the solutions of standard and multitemporal unmixing problems. These strategies, however, present two main drawbacks: 1) they confine the solutions to low-rank tensors, which often cannot represent the complexity of real-world scenarios; and 2) they lack guarantees that endmembers and abundances will be correctly factorized in their respective tensors. In this work, we propose a more flexible approach, called ULTRA-V, that imposes low-rank structures through regularizations whose strictness is controlled by scalar parameters. Simulations attest to the superior accuracy of the method when compared with state-of-the-art unmixing algorithms that account for spectral variability. |
Tasks | Hyperspectral Unmixing |
Published | 2018-11-02 |
URL | https://arxiv.org/abs/1811.02413v3 |
https://arxiv.org/pdf/1811.02413v3.pdf | |
PWC | https://paperswithcode.com/paper/low-rank-tensor-modeling-for-hyperspectral |
Repo | |
Framework | |
Artificial Intelligence and Legal Liability
Title | Artificial Intelligence and Legal Liability |
Authors | John Kingston |
Abstract | A recent issue of a popular computing journal asked which laws would apply if a self-driving car killed a pedestrian. This paper considers the question of legal liability for artificially intelligent computer systems. It discusses whether criminal liability could ever apply; to whom it might apply; and, under civil law, whether an AI program is a product that is subject to product design legislation or a service to which the tort of negligence applies. The issue of sales warranties is also considered. A discussion of some of the practical limitations that AI systems are subject to is also included. |
Tasks | |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07782v1 |
http://arxiv.org/pdf/1802.07782v1.pdf | |
PWC | https://paperswithcode.com/paper/artificial-intelligence-and-legal-liability |
Repo | |
Framework | |
Domain Adaptation on Graphs by Learning Aligned Graph Bases
Title | Domain Adaptation on Graphs by Learning Aligned Graph Bases |
Authors | Mehmet Pilanci, Elif Vural |
Abstract | A common assumption in semi-supervised learning with graph models is that the class label function varies smoothly on the data graph, resulting in the rather strict prior that the label function has low-frequency content. Meanwhile, in many classification problems, the label function may vary abruptly in certain graph regions, resulting in high-frequency components. Although the semi-supervised estimation of class labels is an ill-posed problem in general, in several applications it is possible to find a source graph on which the label function has similar frequency content to that on the target graph where the actual classification problem is defined. In this paper, we propose a method for domain adaptation on graphs motivated by these observations. Our algorithm is based on learning the spectrum of the label function in a source graph with many labeled nodes, and transferring the information of the spectrum to the target graph with fewer labeled nodes. While the frequency content of the class label function can be identified through the graph Fourier transform, it is not easy to transfer the Fourier coefficients directly between the two graphs, since no one-to-one match exists between the Fourier basis vectors of independently constructed graphs in the domain adaptation setting. We solve this problem by learning a transformation between the Fourier bases of the two graphs that flexibly “aligns” them. The unknown class label function on the target graph is then reconstructed such that its spectrum matches that on the source graph while also ensuring the consistency with the available labels. The proposed method is tested in the classification of image, online product review, and social network data sets. Comparative experiments suggest that the proposed algorithm performs better than recent domain adaptation methods in the literature in most settings. |
Tasks | Domain Adaptation |
Published | 2018-03-14 |
URL | https://arxiv.org/abs/1803.05288v3 |
https://arxiv.org/pdf/1803.05288v3.pdf | |
PWC | https://paperswithcode.com/paper/domain-adaptation-on-graphs-by-learning-1 |
Repo | |
Framework | |
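The graph Fourier transform that this abstract relies on can be sketched as follows, to make "low-frequency" and "high-frequency" label functions concrete. This is only an illustration under stated assumptions: it uses the combinatorial Laplacian of a small path graph, and the helper names `path_graph_laplacian`, `graph_fourier_transform`, and `mean_frequency` are hypothetical. It does not implement the paper's basis-alignment method.

```python
import numpy as np


def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph on n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def graph_fourier_transform(L, signal):
    """Return (eigenvalues, Fourier basis, coefficients) for a graph signal.

    The Fourier basis is the orthonormal eigenvector matrix of L; the
    eigenvalues, returned in ascending order, play the role of frequencies.
    """
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals, eigvecs, eigvecs.T @ signal


def mean_frequency(eigvals, coeffs):
    """Energy-weighted average graph frequency of a signal."""
    energy = coeffs ** 2
    return float((eigvals * energy).sum() / energy.sum())


n = 8
L = path_graph_laplacian(n)
smooth = np.ones(n)                         # label function constant over the graph
abrupt = np.array([1.0, -1.0] * (n // 2))   # label flips across every edge

lam, U, smooth_hat = graph_fourier_transform(L, smooth)
_, _, abrupt_hat = graph_fourier_transform(L, abrupt)

print(mean_frequency(lam, smooth_hat))  # close to 0: only the zero-frequency mode
print(mean_frequency(lam, abrupt_hat))  # 3.5 up to rounding: 28 / 8
```

The second value follows from the Rayleigh quotient: the alternating signal differs by 2 across each of the 7 edges, giving 7 × 4 = 28, divided by the signal energy 8. Because each graph has its own eigenvector basis, coefficients computed on one graph are not directly comparable to those on another, which is the mismatch the abstract's learned alignment is meant to resolve.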
A Comparison of Embedded Deep Learning Methods for Person Detection
Title | A Comparison of Embedded Deep Learning Methods for Person Detection |
Authors | Chloe Eunhyang Kim, Mahdi Maktab Dar Oghaz, Jiri Fajtl, Vasileios Argyriou, Paolo Remagnino |
Abstract | Recent advancements in parallel computing, GPU technology and deep learning provide a new platform for complex image processing tasks such as person detection to flourish. Person detection is a fundamental preliminary operation for several high-level computer vision tasks. One industry that can significantly benefit from person detection is retail. In recent years, various studies have attempted to find an optimal solution for person detection using neural networks and deep learning. This study compares state-of-the-art deep learning-based object detectors, with a focus on person detection performance in indoor environments. The performance of various implementations of YOLO, SSD, RCNN, R-FCN and SqueezeDet has been assessed using our in-house proprietary dataset, which consists of over 10 thousand indoor images captured from shopping malls, retail outlets and stores. Experimental results indicate that Tiny YOLO-416 and SSD (VGG-300) are the fastest, and Faster-RCNN (Inception ResNet-v2) and R-FCN (ResNet-101) are the most accurate detectors investigated in this study. Further analysis shows that YOLO v3-416 delivers relatively accurate results in a reasonable amount of time, which makes it an ideal model for person detection on embedded platforms. |
Tasks | Human Detection |
Published | 2018-12-09 |
URL | http://arxiv.org/abs/1812.03451v2 |
http://arxiv.org/pdf/1812.03451v2.pdf | |
PWC | https://paperswithcode.com/paper/a-comparison-of-embedded-deep-learning |
Repo | |
Framework | |
Exploring the Landscape of Relational Syllogistic Logics
Title | Exploring the Landscape of Relational Syllogistic Logics |
Authors | Alex Kruckman, Lawrence S. Moss |
Abstract | This paper explores relational syllogistic logics, a family of logical systems related to reasoning about relations in extensions of the classical syllogistic. These are all decidable logical systems. We prove completeness theorems and complexity results for a natural subfamily of relational syllogistic logics, parametrized by constructors for terms and for sentences. |
Tasks | |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00656v1 |
http://arxiv.org/pdf/1809.00656v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-the-landscape-of-relational |
Repo | |
Framework | |
Person Search by Multi-Scale Matching
Title | Person Search by Multi-Scale Matching |
Authors | Xu Lan, Xiatian Zhu, Shaogang Gong |
Abstract | We consider the problem of person search in unconstrained scene images. Existing methods usually focus on improving the person detection accuracy to mitigate negative effects imposed by misalignment, mis-detections, and false alarms resulting from noisy people auto-detection. In contrast to previous studies, we show that sufficiently reliable person instance cropping is achievable by slightly improved state-of-the-art deep learning object detectors (e.g. Faster-RCNN), and the under-studied multi-scale matching problem in person search is a more severe barrier. In this work, we address this multi-scale person search challenge by proposing a Cross-Level Semantic Alignment (CLSA) deep learning approach capable of learning more discriminative identity feature representations in a unified end-to-end model. This is realised by exploiting the in-network feature pyramid structure of a deep neural network enhanced by a novel cross pyramid-level semantic alignment loss function. This favourably eliminates the need for constructing a computationally expensive image pyramid and a complex multi-branch network architecture. Extensive experiments show the modelling advantages and performance superiority of CLSA over the state-of-the-art person search and multi-scale matching methods on two large person search benchmarking datasets: CUHK-SYSU and PRW. |
Tasks | Human Detection, Person Search |
Published | 2018-07-23 |
URL | http://arxiv.org/abs/1807.08582v1 |
http://arxiv.org/pdf/1807.08582v1.pdf | |
PWC | https://paperswithcode.com/paper/person-search-by-multi-scale-matching |
Repo | |
Framework | |
Triclustering of Gene Expression Microarray data using Evolutionary Approach
Title | Triclustering of Gene Expression Microarray data using Evolutionary Approach |
Authors | Shreya Mishra, Swati Vipsita |
Abstract | In tri-clustering, a sub-matrix is created which exhibits highly similar behavior with respect to genes, conditions and time points. In this technique, genes with the same expression values are discovered across some fragment of time points, under certain conditions. In this paper, triclustering using an evolutionary algorithm (EA) is implemented using a new fitness function consisting of 3D Mean Square Residue (MSR) and Least Square approximation (LSL). The primary objective is to find triclusters with minimum overlapping, low MSR and low LSL that cover almost every element of the expression matrix, thus minimizing the overall fitness value. To improve the results of the algorithm, a new fitness function is introduced to find good quality triclusters. Experiments show that triclustering using the EA yielded good quality triclusters. The experiment was carried out on the yeast Saccharomyces dataset. Index Terms: Tri-clustering, Genetic Algorithm, Mean squared residue, Volume, Weights, Least square approximation. |
Tasks | |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05047v1 |
http://arxiv.org/pdf/1805.05047v1.pdf | |
PWC | https://paperswithcode.com/paper/triclustering-of-gene-expression-microarray |
Repo | |
Framework | |
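A short sketch may help clarify what a 3D mean squared residue measures. The abstract does not give the formula, so the residue used here is an assumption: a commonly used three-way extension of the biclustering residue, in which each entry is compared with its gene, condition, and time-point means. The function name `mean_squared_residue_3d` and the toy data are hypothetical, and this is not the paper's full fitness function, which also involves the LSL term.

```python
import numpy as np


def mean_squared_residue_3d(T):
    """Mean squared residue of a genes x conditions x time-points tensor.

    Assumed residue: r_ijk = a_ijk - a_iJK - a_IjK - a_IJk + 2 * a_IJK,
    where a capital index denotes averaging over that axis.
    """
    mean_gene = T.mean(axis=(1, 2), keepdims=True)   # a_iJK
    mean_cond = T.mean(axis=(0, 2), keepdims=True)   # a_IjK
    mean_time = T.mean(axis=(0, 1), keepdims=True)   # a_IJk
    mean_all = T.mean()                              # a_IJK
    residue = T - mean_gene - mean_cond - mean_time + 2.0 * mean_all
    return float((residue ** 2).mean())


# Hypothetical perfectly coherent tricluster: purely additive effects.
genes = np.array([0.0, 1.0, 3.0])
conds = np.array([0.5, 2.0])
times = np.array([0.0, 0.1, 0.2, 0.4])
coherent = 5.0 + genes[:, None, None] + conds[None, :, None] + times[None, None, :]

# The same block with Gaussian noise added, which breaks the coherence.
rng = np.random.default_rng(0)
noisy = coherent + rng.standard_normal(coherent.shape)

print(mean_squared_residue_3d(coherent))
print(mean_squared_residue_3d(noisy))
```

Under this definition an additive block scores zero up to rounding, and the score grows as entries deviate from the additive pattern, which is why a low MSR is used as a sign of a coherent tricluster.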
Modeling Spatio-Temporal Human Track Structure for Action Localization
Title | Modeling Spatio-Temporal Human Track Structure for Action Localization |
Authors | Guilhem Chéron, Anton Osokin, Ivan Laptev, Cordelia Schmid |
Abstract | This paper addresses spatio-temporal localization of human actions in video. In order to localize actions in time, we propose a recurrent localization network (RecLNet) designed to model the temporal structure of actions on the level of person tracks. Our model is trained to simultaneously recognize and localize action classes in time and is based on two-layer gated recurrent units (GRU) applied separately to two streams, i.e. appearance and optical flow streams. When used together with state-of-the-art person detection and tracking, our model is shown to substantially improve spatio-temporal action localization in videos. The gain is shown to be mainly due to improved temporal localization. We evaluate our method on two recent datasets for spatio-temporal action localization, UCF101-24 and DALY, demonstrating a significant improvement over the state of the art. |
Tasks | Action Localization, Human Detection, Optical Flow Estimation, Spatio-Temporal Action Localization, Temporal Action Localization, Temporal Localization |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.11008v1 |
http://arxiv.org/pdf/1806.11008v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-spatio-temporal-human-track |
Repo | |
Framework | |