October 19, 2019

3042 words 15 mins read

Paper Group ANR 408

Text to Image Synthesis Using Generative Adversarial Networks. L2-Nonexpansive Neural Networks. Physics Guided Recurrent Neural Networks For Modeling Dynamical Systems: Application to Monitoring Water Temperature And Quality In Lakes. Treating Keywords as Outliers: A Keyphrase Extraction Approach. Gradient Descent Provably Optimizes Over-parameteri …

Text to Image Synthesis Using Generative Adversarial Networks

Title Text to Image Synthesis Using Generative Adversarial Networks
Authors Cristian Bodnar
Abstract Generating images from natural language is one of the primary applications of recent conditional generative models. Besides testing our ability to model conditional, high-dimensional distributions, text to image synthesis has many exciting and practical applications such as photo editing or computer-aided content creation. Recent progress has been made using Generative Adversarial Networks (GANs). This material starts with a gentle introduction to these topics and discusses the existing state-of-the-art models. Moreover, I propose Wasserstein GAN-CLS, a new model for conditional image generation based on the Wasserstein distance, which offers guarantees of stability. Then, I show how the novel loss function of Wasserstein GAN-CLS can be used in a Conditional Progressive Growing GAN. In combination with the proposed loss, the model improves the best Inception Score (on the Caltech birds dataset) among models which use only sentence-level visual semantics by 7.07%. The only model which performs better than the Conditional Wasserstein Progressive Growing GAN is the recently proposed AttnGAN, which uses word-level visual semantics as well.
Tasks Conditional Image Generation, Image Generation
Published 2018-05-02
URL http://arxiv.org/abs/1805.00676v1
PDF http://arxiv.org/pdf/1805.00676v1.pdf
PWC https://paperswithcode.com/paper/text-to-image-synthesis-using-generative
Repo
Framework
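
Below is a minimal sketch of a matching-aware conditional Wasserstein critic loss in the spirit of the abstract above. It assumes a PyTorch critic that scores image/text pairs; the toy critic, the gradient-penalty term, and the mismatched-text term are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a matching-aware conditional Wasserstein critic loss (WGAN-GP style).
# The critic is a toy stand-in; names and terms are illustrative, not the paper's code.
import torch
import torch.nn as nn

class ToyCritic(nn.Module):
    def __init__(self, img_dim=64 * 64 * 3, txt_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(img_dim + txt_dim, 256), nn.LeakyReLU(0.2),
                                 nn.Linear(256, 1))

    def forward(self, img, txt):
        return self.net(torch.cat([img.flatten(1), txt], dim=1)).squeeze(1)

def critic_loss(critic, real_img, fake_img, txt_match, txt_mismatch, gp_weight=10.0):
    d_real = critic(real_img, txt_match).mean()           # real image, matching text
    d_fake = critic(fake_img.detach(), txt_match).mean()  # generated image, matching text
    d_mis = critic(real_img, txt_mismatch).mean()         # real image, mismatched text

    # Gradient penalty on interpolates between real and generated images.
    eps = torch.rand(real_img.size(0), 1, 1, 1, device=real_img.device)
    x_hat = (eps * real_img + (1 - eps) * fake_img.detach()).requires_grad_(True)
    grads = torch.autograd.grad(critic(x_hat, txt_match).sum(), x_hat, create_graph=True)[0]
    gp = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

    # Push real/matching pairs apart from both fake and mismatched pairs.
    return -(d_real - 0.5 * (d_fake + d_mis)) + gp_weight * gp

imgs = torch.rand(4, 3, 64, 64)
loss = critic_loss(ToyCritic(), imgs, torch.rand(4, 3, 64, 64),
                   torch.randn(4, 128), torch.randn(4, 128))
```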

L2-Nonexpansive Neural Networks

Title L2-Nonexpansive Neural Networks
Authors Haifeng Qian, Mark N. Wegman
Abstract This paper proposes a class of well-conditioned neural networks in which a unit amount of change in the inputs causes at most a unit amount of change in the outputs or any of the internal layers. We develop the known methodology of controlling Lipschitz constants to realize its full potential in maximizing robustness, with a new regularization scheme for linear layers, new ways to adapt nonlinearities and a new loss function. With MNIST and CIFAR-10 classifiers, we demonstrate a number of advantages. Without needing any adversarial training, the proposed classifiers exceed the state of the art in robustness against white-box L2-bounded adversarial attacks. They generalize better than ordinary networks from noisy data with partially random labels. Their outputs are quantitatively meaningful and indicate levels of confidence and generalization, among other desirable properties.
Tasks
Published 2018-02-22
URL http://arxiv.org/abs/1802.07896v4
PDF http://arxiv.org/pdf/1802.07896v4.pdf
PWC https://paperswithcode.com/paper/l2-nonexpansive-neural-networks
Repo
Framework
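
One generic way to make a linear layer non-expansive in the L2 sense is to rescale its weight matrix by its spectral norm, since the largest singular value bounds the layer's Lipschitz constant. The sketch below illustrates that generic idea only; it is not the regularization scheme proposed in the paper.

```python
# Hedged sketch: keep a linear layer 1-Lipschitz (non-expansive) in L2 by dividing
# its weights by their spectral norm. A generic Lipschitz-control trick, not the
# paper's specific regularizer.
import torch
import torch.nn as nn

class NonexpansiveLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # The largest singular value bounds the layer's L2 Lipschitz constant.
        sigma = torch.linalg.matrix_norm(self.weight, ord=2)
        w = self.weight / torch.clamp(sigma, min=1.0)  # only shrink, never expand
        return nn.functional.linear(x, w, self.bias)

layer = NonexpansiveLinear(32, 16)
y = layer(torch.randn(8, 32))   # a unit change in x moves y by at most a unit
```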

Physics Guided Recurrent Neural Networks For Modeling Dynamical Systems: Application to Monitoring Water Temperature And Quality In Lakes

Title Physics Guided Recurrent Neural Networks For Modeling Dynamical Systems: Application to Monitoring Water Temperature And Quality In Lakes
Authors Xiaowei Jia, Anuj Karpatne, Jared Willard, Michael Steinbach, Jordan Read, Paul C Hanson, Hilary A Dugan, Vipin Kumar
Abstract In this paper, we introduce a novel framework for combining scientific knowledge within physics-based models and recurrent neural networks to advance scientific discovery in many dynamical systems. We first describe the use of outputs from physics-based models in learning a hybrid physics-data model. Then, we further incorporate physical knowledge of real-world dynamical systems as additional constraints for training recurrent neural networks. We apply this approach to modeling lake temperature and quality, where we take into account physical constraints along both the depth and time dimensions. By using scientific knowledge to guide the construction and learning of the data-driven model, we demonstrate that this method can achieve better prediction accuracy as well as scientific consistency of results.
Tasks
Published 2018-10-05
URL http://arxiv.org/abs/1810.02880v1
PDF http://arxiv.org/pdf/1810.02880v1.pdf
PWC https://paperswithcode.com/paper/physics-guided-recurrent-neural-networks-for
Repo
Framework
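
The idea of adding physical knowledge as a training constraint can be sketched as a combined loss: a supervised error term plus a penalty for physically implausible predictions. The density formula and the depth-monotonicity constraint below are illustrative assumptions about lake physics, not necessarily the paper's exact terms.

```python
# Hedged sketch of a physics-guided loss: supervised error plus a penalty whenever
# predicted water density decreases with depth (denser water should lie deeper).
# The density approximation and constraint are illustrative assumptions.
import torch

def density(temp_c):
    # Approximate freshwater density (kg/m^3) as a function of temperature (degrees C).
    return 1000.0 * (1 - (temp_c + 288.9414) * (temp_c - 3.9863) ** 2
                     / (508929.2 * (temp_c + 68.12963)))

def physics_guided_loss(pred_temp, obs_temp, lambda_phy=0.1):
    # pred_temp, obs_temp: (batch, depth) predicted / observed temperature profiles.
    supervised = torch.mean((pred_temp - obs_temp) ** 2)
    rho = density(pred_temp)
    # Penalize density decreasing from a shallower layer to the next deeper layer.
    violation = torch.relu(rho[:, :-1] - rho[:, 1:])
    return supervised + lambda_phy * violation.mean()

pred = torch.rand(4, 10) * 20.0   # placeholder RNN output over 10 depth layers
obs = torch.rand(4, 10) * 20.0
print(physics_guided_loss(pred, obs))
```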

Treating Keywords as Outliers: A Keyphrase Extraction Approach

Title Treating Keywords as Outliers: A Keyphrase Extraction Approach
Authors Eirini Papagiannopoulou, Grigorios Tsoumakas
Abstract We propose a novel unsupervised keyphrase extraction approach that filters candidate keywords using outlier detection. It starts by training word embeddings on the target document to capture semantic regularities among the words. It then uses the minimum covariance determinant estimator to model the distribution of non-keyphrase word vectors, under the assumption that these vectors come from the same distribution, indicative of their irrelevance to the semantics expressed by the dimensions of the learned vector representation. Candidate keyphrases only consist of words that are detected as outliers of this dominant distribution. Empirical results show that our approach outperforms state-of-the-art and recent unsupervised keyphrase extraction methods.
Tasks Outlier Detection, Word Embeddings
Published 2018-08-10
URL http://arxiv.org/abs/1808.03712v2
PDF http://arxiv.org/pdf/1808.03712v2.pdf
PWC https://paperswithcode.com/paper/treating-keywords-as-outliers-a-keyphrase
Repo
Framework
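
A minimal sketch of the outlier-filtering step using scikit-learn's EllipticEnvelope, which fits a minimum covariance determinant estimate of the dominant word-vector distribution; the embeddings, vocabulary, and contamination level below are placeholders.

```python
# Hedged sketch: flag candidate keyword vectors as outliers of the dominant
# (non-keyphrase) word-vector distribution via a robust MCD covariance fit.
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(500, 20))          # placeholder in-document embeddings
words = [f"w{i}" for i in range(500)]              # placeholder vocabulary

detector = EllipticEnvelope(contamination=0.05)    # MCD-based robust covariance estimate
labels = detector.fit_predict(word_vectors)        # -1 = outlier, +1 = inlier

candidate_keywords = [w for w, lab in zip(words, labels) if lab == -1]
print(candidate_keywords[:10])
```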

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

Title Gradient Descent Provably Optimizes Over-parameterized Neural Networks
Authors Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh
Abstract One of the mysteries in the success of neural networks is that randomly initialized first-order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. This paper demystifies this surprising phenomenon for two-layer fully connected ReLU-activated neural networks. For an $m$ hidden node shallow neural network with ReLU activation and $n$ training data, we show that as long as $m$ is large enough and no two inputs are parallel, randomly initialized gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. Our analysis relies on the following observation: over-parameterization and random initialization jointly restrict every weight vector to be close to its initialization for all iterations, which allows us to exploit a strong convexity-like property to show that gradient descent converges at a global linear rate to the global optimum. We believe these insights are also useful in analyzing deep models and other first-order methods.
Tasks
Published 2018-10-04
URL http://arxiv.org/abs/1810.02054v2
PDF http://arxiv.org/pdf/1810.02054v2.pdf
PWC https://paperswithcode.com/paper/gradient-descent-provably-optimizes-over
Repo
Framework
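
The analyzed setting can be reproduced empirically with a small sketch: full-batch gradient descent on a wide two-layer ReLU network with fixed random output weights and quadratic loss. The width, sample size, and step size below are illustrative choices, not values from the paper.

```python
# Hedged sketch of the analyzed setting: full-batch gradient descent on a wide
# two-layer ReLU network with quadratic loss. Sizes and step size are illustrative.
import torch

torch.manual_seed(0)
n, d, m = 20, 5, 4096                      # samples, input dim, (large) hidden width
X = torch.nn.functional.normalize(torch.randn(n, d), dim=1)  # unit inputs, none parallel
y = torch.randn(n)

W = torch.randn(m, d, requires_grad=True)  # first-layer weights (trained)
a = torch.sign(torch.randn(m)) / m ** 0.5  # fixed random output weights

lr = 0.2
for step in range(400):
    pred = torch.relu(X @ W.t()) @ a
    loss = 0.5 * ((pred - y) ** 2).sum()   # quadratic training loss
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad
        W.grad.zero_()
    if step % 100 == 0:
        print(step, round(loss.item(), 6)) # loss decays roughly geometrically toward zero
```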

Matching Features without Descriptors: Implicitly Matched Interest Points

Title Matching Features without Descriptors: Implicitly Matched Interest Points
Authors Titus Cieslewski, Michael Bloesch, Davide Scaramuzza
Abstract The extraction and matching of interest points is a prerequisite for many geometric computer vision problems. Traditionally, matching has been achieved by assigning descriptors to interest points and matching points that have similar descriptors. In this paper, we propose a method by which interest points are instead already implicitly matched at detection time. With this, descriptors do not need to be calculated, stored, communicated, or matched any more. This is achieved by a convolutional neural network with multiple output channels and can be thought of as a collection of a variety of detectors, each specialized to specific visual features. This paper describes how to design and train such a network in a way that results in successful relative pose estimation performance despite the limitation on interest point count. While the overall matching score is slightly lower than with traditional methods, the approach is descriptor free and thus enables localization systems with a significantly smaller memory footprint and multi-agent localization systems with lower bandwidth requirements. The network also outputs the confidence for a specific interest point resulting in a valid match. We evaluate performance relative to state-of-the-art alternatives.
Tasks Pose Estimation
Published 2018-11-26
URL https://arxiv.org/abs/1811.10681v2
PDF https://arxiv.org/pdf/1811.10681v2.pdf
PWC https://paperswithcode.com/paper/matching-features-without-descriptors
Repo
Framework
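
The core mechanism, as described, can be sketched by taking one interest point per output channel and matching points across images by channel index alone. The network below is a placeholder; only the per-channel argmax and the index-based matching reflect the described idea.

```python
# Hedged sketch of implicit matching: each output channel of a fully convolutional
# network votes for one interest point, and points are matched across images by
# channel index, with no descriptors. The network itself is a placeholder.
import torch
import torch.nn as nn

channels = 64  # one detector (and thus one tentative correspondence) per channel
net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, channels, 3, padding=1),
)

def detect(image):
    # image: (1, 1, H, W) -> per-channel score maps (1, C, H, W)
    scores = net(image)
    flat = scores.flatten(2)                         # (1, C, H*W)
    conf, idx = flat.max(dim=2)                      # peak response and its location per channel
    h, w = scores.shape[2:]
    rows = torch.div(idx, w, rounding_mode="floor")
    pts = torch.stack((rows, idx % w), dim=2)        # (1, C, 2) row/col of each peak
    return pts[0], conf[0]

img_a, img_b = torch.rand(1, 1, 120, 160), torch.rand(1, 1, 120, 160)
pts_a, conf_a = detect(img_a)
pts_b, conf_b = detect(img_b)
# Channel c in image A is implicitly matched to channel c in image B.
matches = list(zip(pts_a.tolist(), pts_b.tolist()))
```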

Weakly Aggregative Modal Logic: Characterization and Interpolation (new version)

Title Weakly Aggregative Modal Logic: Characterization and Interpolation (new version)
Authors Jixin Liu, Yanjing Wang, Yifeng Ding
Abstract Weakly Aggregative Modal Logic (WAML) is a collection of disguised polyadic modal logics with n-ary modalities whose arguments are all the same. WAML has some interesting applications in epistemic logic and the logic of games, so we study some basic model-theoretic aspects of WAML in this paper. Specifically, we give a van Benthem-Rosen characterization theorem of WAML based on an intuitive notion of bisimulation and show that each basic WAML system K_n lacks Craig Interpolation.
Tasks
Published 2018-03-29
URL https://arxiv.org/abs/1803.10953v3
PDF https://arxiv.org/pdf/1803.10953v3.pdf
PWC https://paperswithcode.com/paper/weakly-aggregative-modal-logic
Repo
Framework

A Supervised Learning Methodology for Real-Time Disguised Face Recognition in the Wild

Title A Supervised Learning Methodology for Real-Time Disguised Face Recognition in the Wild
Authors Saumya Kumaar, Abhinandan Dogra, Abrar Majeedi, Hanan Gani, Ravi M. Vishwanath, S N Omkar
Abstract Facial recognition has always been a challenging task for computer vision scientists and experts. Despite complexities arising due to variations in camera parameters, illumination and face orientations, significant progress has been made in the field, with deep learning algorithms now competing with human-level accuracy. But in contrast to the recent advances in face recognition techniques, Disguised Facial Identification continues to be a tougher challenge in the field of computer vision. In the modern-day scenario, where security is of prime concern, regular face identification techniques do not perform as required when faces are disguised, which calls for a different approach to handle situations where intruders have their faces masked. Along the same lines, we propose a deep learning architecture for disguised facial recognition (DFR). The algorithm put forward in this paper detects 20 facial key-points in the first stage, using a 14-layered convolutional neural network (CNN). These facial key-points are later utilized by a support vector machine (SVM) for classifying the disguised faces based on the Euclidean distance ratios and angles between different facial key-points. This overall architecture imparts a basic intelligence to our system. Our key-point feature prediction accuracy is 65% while the classification rate is 72.4%. Moreover, the architecture works at 19 FPS, thereby performing in almost real time. The efficiency of our approach is also compared with state-of-the-art Disguised Facial Identification methods.
Tasks Face Identification, Face Recognition
Published 2018-09-08
URL http://arxiv.org/abs/1809.02875v1
PDF http://arxiv.org/pdf/1809.02875v1.pdf
PWC https://paperswithcode.com/paper/a-supervised-learning-methodology-for-real
Repo
Framework
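
A hedged sketch of the second stage: turning detected key-points into distance-ratio and angle features and classifying them with an SVM. The specific feature set, key-point layout, and labels below are illustrative, not the paper's.

```python
# Hedged sketch of the classification stage: distance ratios and angles computed
# from facial key-points feed an SVM. The exact features are illustrative only.
import numpy as np
from sklearn.svm import SVC

def keypoint_features(kps):
    # kps: (20, 2) array of facial key-point coordinates.
    d = np.linalg.norm(kps[:, None, :] - kps[None, :, :], axis=-1)
    dists = d[np.triu_indices(len(kps), k=1)]
    ratios = dists / (dists.max() + 1e-8)          # scale-invariant distance ratios
    v1, v2 = kps[1] - kps[0], kps[2] - kps[0]      # one example angle between key-points
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.concatenate([ratios, [np.arccos(np.clip(cos, -1, 1))]])

rng = np.random.default_rng(0)
X = np.stack([keypoint_features(rng.uniform(0, 96, size=(20, 2))) for _ in range(100)])
y = rng.integers(0, 5, size=100)                   # placeholder identity labels
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:3]))
```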

Hyperbolic Attention Networks

Title Hyperbolic Attention Networks
Authors Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas
Abstract We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact.
Tasks Machine Translation, Question Answering, Visual Question Answering
Published 2018-05-24
URL http://arxiv.org/abs/1805.09786v1
PDF http://arxiv.org/pdf/1805.09786v1.pdf
PWC https://paperswithcode.com/paper/hyperbolic-attention-networks
Repo
Framework
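
The central ingredient is replacing dot-product similarity with a distance in hyperbolic space. The sketch below computes attention weights from hyperboloid (Lorentz-model) distances; the lifting map and the distance-to-weight mapping are simplifications, not the paper's exact construction.

```python
# Hedged sketch: attention weights derived from hyperboloid (Lorentz-model) distances
# instead of dot products. Lifting and weighting choices are simplifications.
import torch

def lift_to_hyperboloid(x):
    # Map Euclidean vectors onto the hyperboloid x0^2 - |x|^2 = 1, x0 > 0.
    x0 = torch.sqrt(1.0 + (x ** 2).sum(-1, keepdim=True))
    return torch.cat([x0, x], dim=-1)

def hyperbolic_attention(q, k, v, beta=1.0):
    qh, kh = lift_to_hyperboloid(q), lift_to_hyperboloid(k)
    # Lorentzian inner product: -q0*k0 + <q_rest, k_rest>.
    lorentz = -qh[..., :1] @ kh[..., :1].transpose(-1, -2) + q @ k.transpose(-1, -2)
    dist = torch.acosh(torch.clamp(-lorentz, min=1.0 + 1e-6))  # hyperbolic distance
    weights = torch.softmax(-beta * dist, dim=-1)              # closer points weigh more
    return weights @ v

q, k, v = torch.randn(4, 8), torch.randn(6, 8), torch.randn(6, 8)
out = hyperbolic_attention(q, k, v)   # (4, 8) attended values
```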

A New Target-specific Object Proposal Generation Method for Visual Tracking

Title A New Target-specific Object Proposal Generation Method for Visual Tracking
Authors Guanjun Guo, Hanzi Wang, Yan Yan, Hong-Yuan Mark Liao, Bo Li
Abstract Object proposal generation methods have been widely applied to many computer vision tasks. However, existing object proposal generation methods often suffer from the problems of motion blur, low contrast, deformation, etc., when they are applied to video related tasks. In this paper, we propose an effective and highly accurate target-specific object proposal generation (TOPG) method, which takes full advantage of the context information of a video to alleviate these problems. Specifically, we propose to generate target-specific object proposals by integrating the information of two important objectness cues: colors and edges, which are complementary to each other for different challenging environments in the process of generating object proposals. As a result, the recall of the proposed TOPG method is significantly increased. Furthermore, we propose an object proposal ranking strategy to increase the rank accuracy of the generated object proposals. The proposed TOPG method has yielded significant recall gains (about 20%-60% higher) compared with several state-of-the-art object proposal methods on several challenging visual tracking datasets. Then, we apply the proposed TOPG method to the task of visual tracking and propose a TOPG-based tracker (called TOPGT), where TOPG is used as a sample selection strategy to select a small number of high-quality target candidates from the generated object proposals. Since the object proposals generated by the proposed TOPG cover many hard negative samples and positive samples, these object proposals can not only be used for training an effective classifier, but also be used as target candidates for visual tracking. Experimental results show the superior performance of TOPGT for visual tracking compared with several other state-of-the-art visual trackers (about 3%-11% higher than the winner of the VOT2015 challenge in terms of distance precision).
Tasks Object Proposal Generation, Visual Tracking
Published 2018-03-27
URL http://arxiv.org/abs/1803.10098v1
PDF http://arxiv.org/pdf/1803.10098v1.pdf
PWC https://paperswithcode.com/paper/a-new-target-specific-object-proposal
Repo
Framework
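
A hedged sketch of the cue-combination and ranking step: candidate boxes are scored by a weighted mix of complementary color and edge objectness cues, and the top-ranked proposals are kept as target candidates. Both cue scores here are placeholders, not the paper's cues.

```python
# Hedged sketch: rank candidate boxes by combining complementary color and edge
# objectness scores, then keep a small set of high-quality target candidates.
import numpy as np

def rank_proposals(boxes, color_score, edge_score, alpha=0.5):
    # boxes: (N, 4); cue scores: (N,) in [0, 1]. Higher combined score ranks first.
    combined = alpha * color_score + (1 - alpha) * edge_score
    order = np.argsort(-combined)
    return boxes[order], combined[order]

rng = np.random.default_rng(0)
boxes = rng.uniform(0, 100, size=(50, 4))
ranked_boxes, scores = rank_proposals(boxes, rng.random(50), rng.random(50))
top_candidates = ranked_boxes[:10]   # candidates passed on to the tracker/classifier
```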

Object Hallucination in Image Captioning

Title Object Hallucination in Image Captioning
Authors Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko
Abstract Despite continuously improving performance, contemporary image captioning models are prone to “hallucinating” objects that are not actually in a scene. One problem is that standard metrics only measure similarity to ground truth captions and may not fully capture image relevance. In this work, we propose a new image relevance metric to evaluate current models with veridical visual labels and assess their rate of object hallucination. We analyze how captioning model architectures and learning objectives contribute to object hallucination, explore when hallucination is likely due to image misclassification or language priors, and assess how well current sentence metrics capture object hallucination. We investigate these questions on the standard image captioning benchmark, MSCOCO, using a diverse set of models. Our analysis yields several interesting findings, including that models which score best on standard sentence metrics do not always have lower hallucination and that models which hallucinate more tend to make errors driven by language priors.
Tasks Image Captioning
Published 2018-09-06
URL http://arxiv.org/abs/1809.02156v2
PDF http://arxiv.org/pdf/1809.02156v2.pdf
PWC https://paperswithcode.com/paper/object-hallucination-in-image-captioning
Repo
Framework
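
The proposed relevance idea can be sketched as comparing the objects mentioned in a caption against the objects actually annotated in the image. The per-caption rate below is a simplified reading of such a metric, not necessarily the paper's exact definition.

```python
# Hedged sketch of an object-hallucination rate: the fraction of objects mentioned
# in a caption that do not appear in the image's ground-truth object set.
def hallucination_rate(caption_tokens, gt_objects, object_vocab):
    mentioned = {t for t in caption_tokens if t in object_vocab}
    if not mentioned:
        return 0.0
    hallucinated = mentioned - set(gt_objects)
    return len(hallucinated) / len(mentioned)

vocab = {"dog", "frisbee", "cat", "car"}
caption = "a dog catching a frisbee next to a cat".split()
# "cat" is mentioned but not in the image, so 1 of 3 mentioned objects is hallucinated.
print(hallucination_rate(caption, gt_objects=["dog", "frisbee"], object_vocab=vocab))
```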

Deep Spatiotemporal Representation of the Face for Automatic Pain Intensity Estimation

Title Deep Spatiotemporal Representation of the Face for Automatic Pain Intensity Estimation
Authors Mohammad Tavakolian, Abdenour Hadid
Abstract Automatic pain intensity assessment has a high value in disease diagnosis applications. Inspired by the fact that many diseases and brain disorders can interrupt normal facial expression formation, we aim to develop a computational model for automatic pain intensity assessment from spontaneous and micro facial variations. For this purpose, we propose a 3D deep architecture for dynamic facial video representation. The proposed model is built by stacking several convolutional modules where each module encompasses a 3D convolution kernel with a fixed temporal depth, several parallel 3D convolutional kernels with different temporal depths, and an average pooling layer. Deploying variable temporal depths in the proposed architecture allows the model to effectively capture a wide range of spatiotemporal variations on the faces. Extensive experiments on the UNBC-McMaster Shoulder Pain Expression Archive database show that our proposed model yields promising performance compared to the state of the art in automatic pain intensity estimation.
Tasks
Published 2018-06-18
URL http://arxiv.org/abs/1806.06793v1
PDF http://arxiv.org/pdf/1806.06793v1.pdf
PWC https://paperswithcode.com/paper/deep-spatiotemporal-representation-of-the
Repo
Framework
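
The described building block, a fixed-temporal-depth 3D convolution followed by parallel 3D convolutions with different temporal depths and an average pooling branch, can be sketched as follows; the channel counts and temporal depths are illustrative, not the paper's values.

```python
# Hedged sketch of one module: a fixed-depth 3D convolution followed by parallel
# 3D convolutions with different temporal depths plus an average pooling branch.
import torch
import torch.nn as nn

class VariableTemporalDepthModule(nn.Module):
    def __init__(self, in_ch, out_ch, temporal_depths=(1, 3, 5)):
        super().__init__()
        self.fixed = nn.Conv3d(in_ch, out_ch, kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.branches = nn.ModuleList(
            nn.Conv3d(out_ch, out_ch, kernel_size=(t, 3, 3), padding=(t // 2, 1, 1))
            for t in temporal_depths
        )
        self.pool = nn.AvgPool3d(kernel_size=(1, 3, 3), stride=1, padding=(0, 1, 1))

    def forward(self, x):
        # x: (batch, channels, time, height, width)
        x = torch.relu(self.fixed(x))
        outs = [torch.relu(branch(x)) for branch in self.branches]
        outs.append(self.pool(x))
        return torch.cat(outs, dim=1)   # concatenate along the channel axis

clip = torch.randn(2, 3, 16, 64, 64)     # a short facial video clip
module = VariableTemporalDepthModule(3, 8)
features = module(clip)                  # (2, 8*3 + 8, 16, 64, 64)
```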

Deep Algorithms: designs for networks

Title Deep Algorithms: designs for networks
Authors Abhejit Rajagopal, Shivkumar Chandrasekaran, Hrushikesh N. Mhaskar
Abstract A new design methodology for neural networks that is guided by traditional algorithm design is presented. To prove our point, we present two heuristics and demonstrate an algorithmic technique for incorporating additional weights in their signal-flow graphs. We show that, with training, these networks can not only exceed the performance of the initial network but also match the performance of more traditional neural network architectures. A key feature of our approach is that these networks are initialized with parameters that provide a known performance threshold for the architecture on a given task.
Tasks
Published 2018-06-06
URL http://arxiv.org/abs/1806.02003v1
PDF http://arxiv.org/pdf/1806.02003v1.pdf
PWC https://paperswithcode.com/paper/deep-algorithms-designs-for-networks
Repo
Framework

Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures

Title Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures
Authors Mengjia Yan, Christopher Fletcher, Josep Torrellas
Abstract Deep Neural Networks (DNNs) are fast becoming ubiquitous for their ability to attain good accuracy in various machine learning tasks. A DNN’s architecture (i.e., its hyper-parameters) broadly determines the DNN’s accuracy and performance, and is often confidential. Attacking a DNN in the cloud to obtain its architecture can potentially provide major commercial value. Further, attaining a DNN’s architecture facilitates other, existing DNN attacks. This paper presents Cache Telepathy: a fast and accurate mechanism to steal a DNN’s architecture using the cache side channel. Our attack is based on the insight that DNN inference relies heavily on tiled GEMM (Generalized Matrix Multiply), and that DNN architecture parameters determine the number of GEMM calls and the dimensions of the matrices used in the GEMM functions. Such information can be leaked through the cache side channel. This paper uses Prime+Probe and Flush+Reload to attack VGG and ResNet DNNs running OpenBLAS and Intel MKL libraries. Our attack is effective in helping obtain the architectures by very substantially reducing the search space of target DNN architectures. For example, for VGG using OpenBLAS, it reduces the search space from more than $10^{35}$ architectures to just 16.
Tasks
Published 2018-08-14
URL http://arxiv.org/abs/1808.04761v1
PDF http://arxiv.org/pdf/1808.04761v1.pdf
PWC https://paperswithcode.com/paper/cache-telepathy-leveraging-shared-resource
Repo
Framework
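
The insight the attack exploits can be illustrated for a fully connected layer: once the batch size is known, the observed GEMM dimensions directly reveal the layer's input and output sizes. The helper below is a hedged toy example; the mapping for convolutions (via im2col) follows the same idea but is omitted here.

```python
# Hedged sketch of the insight the attack exploits: for a fully connected layer
# computed as a GEMM, Y(m x n) = X(m x k) @ W(k x n), the observed matrix dimensions
# reveal the layer's input and output sizes once the batch size is known.
def infer_fc_layer(gemm_m, gemm_n, gemm_k, batch_size):
    assert gemm_m == batch_size, "unexpected GEMM shape for a fully connected layer"
    return {"in_features": gemm_k, "out_features": gemm_n}

# Example: a leaked GEMM of shape (8 x 4096) @ (4096 x 1000) with batch size 8
# is consistent with a 4096 -> 1000 fully connected layer.
print(infer_fc_layer(gemm_m=8, gemm_n=1000, gemm_k=4096, batch_size=8))
```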

Reinforcement Evolutionary Learning Method for self-learning

Title Reinforcement Evolutionary Learning Method for self-learning
Authors Kumarjit Pathak, Jitin Kapila
Abstract In statistical modelling, the biggest threat is concept drift, which causes a model to show gradually deteriorating performance over time. There are state-of-the-art methodologies to detect the impact of concept drift; however, the general strategy for overcoming the resulting loss in performance is to rebuild or re-calibrate the model periodically, as the variable patterns underlying the model change significantly due to shifts in the market, consumer behavior, and so on. Quantitative research is the most widespread application of data science in the marketing and financial domains, where the applicability of state-of-the-art reinforcement learning for auto-learning remains a less explored paradigm. Reinforcement learning depends heavily on a simulated environment to learn from live feedback, which is mostly available for gaming or online systems. However, some research has been done in areas such as online advertisement and pricing, where the nature of the online learning environment makes reinforcement learning applicable. Our proposed solution is a reinforcement learning based, truly self-learning algorithm which can adapt to data change or concept drift, and auto-learn and self-calibrate for new patterns in the data, thereby solving the problem of concept drift. Keywords - Reinforcement learning, Genetic Algorithm, Q-learning, Classification modelling, CMA-ES, NES, Multi objective optimization, Concept drift, Population stability index, Incremental learning, F1-measure, Predictive Modelling, Self-learning, MCTS, AlphaGo, AlphaZero
Tasks Q-Learning
Published 2018-10-07
URL http://arxiv.org/abs/1810.03198v1
PDF http://arxiv.org/pdf/1810.03198v1.pdf
PWC https://paperswithcode.com/paper/reinforcement-evolutionary-learning-method
Repo
Framework
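
The drift-monitoring part of this idea can be sketched with a population stability index (one of the listed keywords): compare the live score distribution against the training-time distribution and trigger self-calibration when the index crosses a threshold. The binning scheme and the 0.2 cut-off below are common conventions, not necessarily the paper's choices.

```python
# Hedged sketch: monitor concept drift with a population stability index (PSI) and
# trigger re-calibration when it crosses a conventional threshold.
import numpy as np

def psi(expected, actual, bins=10, eps=1e-6):
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + eps
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_scores = rng.normal(0.0, 1.0, 5000)   # score distribution at training time
live_scores = rng.normal(0.4, 1.2, 5000)    # drifted live distribution
drift = psi(train_scores, live_scores)
if drift > 0.2:                             # conventional "significant shift" cut-off
    print(f"PSI={drift:.3f}: trigger self-calibration of the model")
```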