Paper Group ANR 408
Text to Image Synthesis Using Generative Adversarial Networks
Title | Text to Image Synthesis Using Generative Adversarial Networks |
Authors | Cristian Bodnar |
Abstract | Generating images from natural language is one of the primary applications of recent conditional generative models. Besides testing our ability to model conditional, high-dimensional distributions, text to image synthesis has many exciting and practical applications such as photo editing or computer-aided content creation. Recent progress has been made using Generative Adversarial Networks (GANs). This material starts with a gentle introduction to these topics and discusses the existing state of the art models. Moreover, I propose Wasserstein GAN-CLS, a new model for conditional image generation based on the Wasserstein distance which offers guarantees of stability. Then, I show how the novel loss function of Wasserstein GAN-CLS can be used in a Conditional Progressive Growing GAN. In combination with the proposed loss, the model improves by 7.07% on the best Inception Score (on the Caltech birds dataset) of the models which use only the sentence-level visual semantics. The only model which performs better than the Conditional Wasserstein Progressive Growing GAN is the recently proposed AttnGAN which uses word-level visual semantics as well. |
Tasks | Conditional Image Generation, Image Generation |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00676v1 |
http://arxiv.org/pdf/1805.00676v1.pdf | |
PWC | https://paperswithcode.com/paper/text-to-image-synthesis-using-generative |
Repo | |
Framework | |
L2-Nonexpansive Neural Networks
Title | L2-Nonexpansive Neural Networks |
Authors | Haifeng Qian, Mark N. Wegman |
Abstract | This paper proposes a class of well-conditioned neural networks in which a unit amount of change in the inputs causes at most a unit amount of change in the outputs or any of the internal layers. We develop the known methodology of controlling Lipschitz constants to realize its full potential in maximizing robustness, with a new regularization scheme for linear layers, new ways to adapt nonlinearities and a new loss function. With MNIST and CIFAR-10 classifiers, we demonstrate a number of advantages. Without needing any adversarial training, the proposed classifiers exceed the state of the art in robustness against white-box L2-bounded adversarial attacks. They generalize better than ordinary networks from noisy data with partially random labels. Their outputs are quantitatively meaningful and indicate levels of confidence and generalization, among other desirable properties. |
Tasks | |
Published | 2018-02-22 |
URL | http://arxiv.org/abs/1802.07896v4 |
http://arxiv.org/pdf/1802.07896v4.pdf | |
PWC | https://paperswithcode.com/paper/l2-nonexpansive-neural-networks |
Repo | |
Framework | |
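The nonexpansiveness property described in the abstract above (a unit change in the input causes at most a unit change in the output) can be illustrated for a single linear layer. The sketch below is not the paper's regularization scheme. It assumes a plain weight matrix `W` and swaps in the standard bound that the spectral norm is at most the square root of the maximum absolute row sum times the maximum absolute column sum. All function names are hypothetical.

```python
import math

def spectral_norm_upper_bound(W):
    # Standard inequality: ||W||_2 <= sqrt(||W||_1 * ||W||_inf).
    max_row_sum = max(sum(abs(v) for v in row) for row in W)
    max_col_sum = max(sum(abs(row[j]) for row in W) for j in range(len(W[0])))
    return math.sqrt(max_row_sum * max_col_sum)

def make_nonexpansive(W):
    # Shrink W only if the bound exceeds 1, so that ||W x - W y|| <= ||x - y||.
    scale = 1.0 / max(1.0, spectral_norm_upper_bound(W))
    return [[v * scale for v in row] for row in W]

def apply_linear(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def l2_distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Illustrative values only (not from the paper).
W = [[2.0, -1.0, 0.5], [0.0, 3.0, 1.0]]
W_ne = make_nonexpansive(W)
x1 = [1.0, 2.0, -1.0]
x2 = [0.5, -1.0, 3.0]
input_gap = l2_distance(x1, x2)
output_gap = l2_distance(apply_linear(W_ne, x1), apply_linear(W_ne, x2))
```

The bound is conservative, so the rescaled layer may contract more than strictly necessary. A tighter estimate of the spectral norm would preserve more of the layer's gain.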
Physics Guided Recurrent Neural Networks For Modeling Dynamical Systems: Application to Monitoring Water Temperature And Quality In Lakes
Title | Physics Guided Recurrent Neural Networks For Modeling Dynamical Systems: Application to Monitoring Water Temperature And Quality In Lakes |
Authors | Xiaowei Jia, Anuj Karpatne, Jared Willard, Michael Steinbach, Jordan Read, Paul C Hanson, Hilary A Dugan, Vipin Kumar |
Abstract | In this paper, we introduce a novel framework for combining scientific knowledge within physics-based models and recurrent neural networks to advance scientific discovery in many dynamical systems. We will first describe the use of outputs from physics-based models in learning a hybrid-physics-data model. Then, we further incorporate physical knowledge in real-world dynamical systems as additional constraints for training recurrent neural networks. We will apply this approach to modeling lake temperature and quality, where we take into account the physical constraints along both the depth dimension and the time dimension. By using scientific knowledge to guide the construction and learning of the data-driven model, we demonstrate that this method can achieve better prediction accuracy as well as scientific consistency of results. |
Tasks | |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.02880v1 |
http://arxiv.org/pdf/1810.02880v1.pdf | |
PWC | https://paperswithcode.com/paper/physics-guided-recurrent-neural-networks-for |
Repo | |
Framework | |
Treating Keywords as Outliers: A Keyphrase Extraction Approach
Title | Treating Keywords as Outliers: A Keyphrase Extraction Approach |
Authors | Eirini Papagiannopoulou, Grigorios Tsoumakas |
Abstract | We propose a novel unsupervised keyphrase extraction approach that filters candidate keywords using outlier detection. It starts by training word embeddings on the target document to capture semantic regularities among the words. It then uses the minimum covariance determinant estimator to model the distribution of non-keyphrase word vectors, under the assumption that these vectors come from the same distribution, indicative of their irrelevance to the semantics expressed by the dimensions of the learned vector representation. Candidate keyphrases only consist of words that are detected as outliers of this dominant distribution. Empirical results show that our approach outperforms state-of-the-art and recent unsupervised keyphrase extraction methods. |
Tasks | Outlier Detection, Word Embeddings |
Published | 2018-08-10 |
URL | http://arxiv.org/abs/1808.03712v2 |
http://arxiv.org/pdf/1808.03712v2.pdf | |
PWC | https://paperswithcode.com/paper/treating-keywords-as-outliers-a-keyphrase |
Repo | |
Framework | |
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
Title | Gradient Descent Provably Optimizes Over-parameterized Neural Networks |
Authors | Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh |
Abstract | One of the mysteries in the success of neural networks is that randomly initialized first order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. This paper demystifies this surprising phenomenon for two-layer fully connected ReLU activated neural networks. For an $m$ hidden node shallow neural network with ReLU activation and $n$ training data, we show that as long as $m$ is large enough and no two inputs are parallel, randomly initialized gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. Our analysis relies on the following observation: over-parameterization and random initialization jointly restrict every weight vector to be close to its initialization for all iterations, which allows us to exploit a strong convexity-like property to show that gradient descent converges at a global linear rate to the global optimum. We believe these insights are also useful in analyzing deep models and other first order methods. |
Tasks | |
Published | 2018-10-04 |
URL | http://arxiv.org/abs/1810.02054v2 |
http://arxiv.org/pdf/1810.02054v2.pdf | |
PWC | https://paperswithcode.com/paper/gradient-descent-provably-optimizes-over |
Repo | |
Framework | |
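The setting in the abstract above (a two-layer ReLU network with $m$ hidden nodes, quadratic loss, random initialization, no two inputs parallel) can be tried on a toy problem. The sketch below is a minimal illustration, not the paper's experiment. The $1/\sqrt{m}$ output scaling, the fixed random signs `a`, the choice to train only the first-layer weights, and the data, width and step size are all assumptions made for this example.

```python
import math
import random

def relu(z):
    return z if z > 0.0 else 0.0

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def predict(W, a, x):
    # f(x) = (1 / sqrt(m)) * sum_r a_r * relu(w_r . x)
    return sum(a_r * relu(dot(w_r, x)) for w_r, a_r in zip(W, a)) / math.sqrt(len(W))

def quadratic_loss(W, a, X, y):
    return 0.5 * sum((predict(W, a, x_i) - y_i) ** 2 for x_i, y_i in zip(X, y))

def gradient_step(W, a, X, y, lr):
    # One full-batch gradient descent step on the first-layer weights only.
    m = len(W)
    errors = [predict(W, a, x_i) - y_i for x_i, y_i in zip(X, y)]
    new_W = []
    for w_r, a_r in zip(W, a):
        grad = [0.0] * len(w_r)
        for x_i, e_i in zip(X, errors):
            if dot(w_r, x_i) > 0.0:  # ReLU is active for this unit and input
                for k, x_ik in enumerate(x_i):
                    grad[k] += e_i * a_r * x_ik / math.sqrt(m)
        new_W.append([w - lr * g for w, g in zip(w_r, grad)])
    return new_W

rng = random.Random(0)
m = 512                                    # hidden width (assumed "large enough")
X = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]   # unit-norm inputs, no two parallel
y = [1.0, -1.0, 0.5]                       # arbitrary targets
W = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(m)]
a = [rng.choice([-1.0, 1.0]) for _ in range(m)]

initial_loss = quadratic_loss(W, a, X, y)
for _ in range(300):
    W = gradient_step(W, a, X, y, lr=0.2)
final_loss = quadratic_loss(W, a, X, y)
```

Comparing `initial_loss` and `final_loss` shows the training loss shrinking on this tiny problem. It does not verify the width requirement or the convergence rate stated in the paper.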
Matching Features without Descriptors: Implicitly Matched Interest Points
Title | Matching Features without Descriptors: Implicitly Matched Interest Points |
Authors | Titus Cieslewski, Michael Bloesch, Davide Scaramuzza |
Abstract | The extraction and matching of interest points is a prerequisite for many geometric computer vision problems. Traditionally, matching has been achieved by assigning descriptors to interest points and matching points that have similar descriptors. In this paper, we propose a method by which interest points are instead already implicitly matched at detection time. With this, descriptors do not need to be calculated, stored, communicated, or matched any more. This is achieved by a convolutional neural network with multiple output channels and can be thought of as a collection of a variety of detectors, each specialized to specific visual features. This paper describes how to design and train such a network in a way that results in successful relative pose estimation performance despite the limitation on interest point count. While the overall matching score is slightly lower than with traditional methods, the approach is descriptor free and thus enables localization systems with a significantly smaller memory footprint and multi-agent localization systems with lower bandwidth requirements. The network also outputs the confidence for a specific interest point resulting in a valid match. We evaluate performance relative to state-of-the-art alternatives. |
Tasks | Pose Estimation |
Published | 2018-11-26 |
URL | https://arxiv.org/abs/1811.10681v2 |
https://arxiv.org/pdf/1811.10681v2.pdf | |
PWC | https://paperswithcode.com/paper/matching-features-without-descriptors |
Repo | |
Framework | |
Weakly Aggregative Modal Logic: Characterization and Interpolation (new version)
Title | Weakly Aggregative Modal Logic: Characterization and Interpolation (new version) |
Authors | Jixin Liu, Yanjing Wang, Yifeng Ding |
Abstract | Weakly Aggregative Modal Logic (WAML) is a collection of disguised polyadic modal logics with n-ary modalities whose arguments are all the same. WAML has some interesting applications in epistemic logic and the logic of games, so we study some basic model-theoretic aspects of WAML in this paper. Specifically, we give a van Benthem-Rosen characterization theorem of WAML based on an intuitive notion of bisimulation and show that each basic WAML system K_n lacks Craig Interpolation. |
Tasks | |
Published | 2018-03-29 |
URL | https://arxiv.org/abs/1803.10953v3 |
https://arxiv.org/pdf/1803.10953v3.pdf | |
PWC | https://paperswithcode.com/paper/weakly-aggregative-modal-logic |
Repo | |
Framework | |
A Supervised Learning Methodology for Real-Time Disguised Face Recognition in the Wild
Title | A Supervised Learning Methodology for Real-Time Disguised Face Recognition in the Wild |
Authors | Saumya Kumaar, Abhinandan Dogra, Abrar Majeedi, Hanan Gani, Ravi M. Vishwanath, S N Omkar |
Abstract | Facial recognition has always been a challenging task for computer vision scientists and experts. Despite complexities arising due to variations in camera parameters, illumination and face orientations, significant progress has been made in the field with deep learning algorithms now competing with human-level accuracy. But in contrast to the recent advances in face recognition techniques, Disguised Facial Identification continues to be a tougher challenge in the field of computer vision. In the modern-day scenario, where security is of prime concern, regular face identification techniques do not perform as required when the faces are disguised, which calls for a different approach to handle situations where intruders have their faces masked. Along the same lines, we propose a deep learning architecture for disguised facial recognition (DFR). The algorithm put forward in this paper detects 20 facial key-points in the first stage, using a 14-layered convolutional neural network (CNN). These facial key-points are later utilized by a support vector machine (SVM) for classifying the disguised faces based on the Euclidean distance ratios and angles between different facial key-points. This overall architecture imparts a basic intelligence to our system. Our key-point feature prediction accuracy is 65% while the classification rate is 72.4%. Moreover, the architecture works at 19 FPS, thereby performing in almost real-time. The efficiency of our approach is also compared with the state-of-the-art Disguised Facial Identification methods. |
Tasks | Face Identification, Face Recognition |
Published | 2018-09-08 |
URL | http://arxiv.org/abs/1809.02875v1 |
http://arxiv.org/pdf/1809.02875v1.pdf | |
PWC | https://paperswithcode.com/paper/a-supervised-learning-methodology-for-real |
Repo | |
Framework | |
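The second stage described in the abstract above classifies faces from Euclidean distance ratios and angles between key-points. The sketch below shows one way such features could be computed from 2D key-point coordinates. Which pairs and triples the authors use is not stated in the abstract, so the choices here (all pairwise distances divided by the first pairwise distance, and the angle at every key-point for every pair of other key-points) are assumptions, and the function names are hypothetical. The SVM itself is omitted.

```python
import math
from itertools import combinations

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def angle_at(vertex, p, q):
    # Angle in radians, in [0, pi], between rays vertex->p and vertex->q.
    v1 = (p[0] - vertex[0], p[1] - vertex[1])
    v2 = (q[0] - vertex[0], q[1] - vertex[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.atan2(abs(cross), dot)

def keypoint_features(points):
    # Distance ratios are scale invariant; angles are scale and rotation invariant.
    pairs = list(combinations(range(len(points)), 2))
    dists = [distance(points[i], points[j]) for i, j in pairs]
    reference = dists[0]  # assumes the first two key-points are distinct
    ratios = [d / reference for d in dists[1:]]
    angles = []
    for v in range(len(points)):
        others = [k for k in range(len(points)) if k != v]
        for i, j in combinations(others, 2):
            angles.append(angle_at(points[v], points[i], points[j]))
    return ratios + angles

# Three illustrative key-points forming a 3-4-5 right triangle.
keypoints = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
features = keypoint_features(keypoints)
scaled_features = keypoint_features([(2.0 * x, 2.0 * y) for x, y in keypoints])
```

Because the features depend only on ratios and angles, scaling the face in the image leaves them unchanged, which is one plausible reason for preferring them over raw coordinates.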
Hyperbolic Attention Networks
Title | Hyperbolic Attention Networks |
Authors | Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas |
Abstract | We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact. |
Tasks | Machine Translation, Question Answering, Visual Question Answering |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09786v1 |
http://arxiv.org/pdf/1805.09786v1.pdf | |
PWC | https://paperswithcode.com/paper/hyperbolic-attention-networks |
Repo | |
Framework | |
A New Target-specific Object Proposal Generation Method for Visual Tracking
Title | A New Target-specific Object Proposal Generation Method for Visual Tracking |
Authors | Guanjun Guo, Hanzi Wang, Yan Yan, Hong-Yuan Mark Liao, Bo Li |
Abstract | Object proposal generation methods have been widely applied to many computer vision tasks. However, existing object proposal generation methods often suffer from the problems of motion blur, low contrast, deformation, etc., when they are applied to video related tasks. In this paper, we propose an effective and highly accurate target-specific object proposal generation (TOPG) method, which takes full advantage of the context information of a video to alleviate these problems. Specifically, we propose to generate target-specific object proposals by integrating the information of two important objectness cues: colors and edges, which are complementary to each other for different challenging environments in the process of generating object proposals. As a result, the recall of the proposed TOPG method is significantly increased. Furthermore, we propose an object proposal ranking strategy to increase the rank accuracy of the generated object proposals. The proposed TOPG method has yielded significant recall gain (about 20%-60% higher) compared with several state-of-the-art object proposal methods on several challenging visual tracking datasets. Then, we apply the proposed TOPG method to the task of visual tracking and propose a TOPG-based tracker (called TOPGT), where TOPG is used as a sample selection strategy to select a small number of high-quality target candidates from the generated object proposals. Since the object proposals generated by the proposed TOPG cover many hard negative samples and positive samples, these object proposals can not only be used for training an effective classifier, but also be used as target candidates for visual tracking. Experimental results show the superior performance of TOPGT for visual tracking compared with several other state-of-the-art visual trackers (about 3%-11% higher than the winner of the VOT2015 challenge in terms of distance precision). |
Tasks | Object Proposal Generation, Visual Tracking |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.10098v1 |
http://arxiv.org/pdf/1803.10098v1.pdf | |
PWC | https://paperswithcode.com/paper/a-new-target-specific-object-proposal |
Repo | |
Framework | |
Object Hallucination in Image Captioning
Title | Object Hallucination in Image Captioning |
Authors | Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko |
Abstract | Despite continuously improving performance, contemporary image captioning models are prone to “hallucinating” objects that are not actually in a scene. One problem is that standard metrics only measure similarity to ground truth captions and may not fully capture image relevance. In this work, we propose a new image relevance metric to evaluate current models with veridical visual labels and assess their rate of object hallucination. We analyze how captioning model architectures and learning objectives contribute to object hallucination, explore when hallucination is likely due to image misclassification or language priors, and assess how well current sentence metrics capture object hallucination. We investigate these questions on the standard image captioning benchmark, MSCOCO, using a diverse set of models. Our analysis yields several interesting findings, including that models which score best on standard sentence metrics do not always have lower hallucination and that models which hallucinate more tend to make errors driven by language priors. |
Tasks | Image Captioning |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02156v2 |
http://arxiv.org/pdf/1809.02156v2.pdf | |
PWC | https://paperswithcode.com/paper/object-hallucination-in-image-captioning |
Repo | |
Framework | |
Deep Spatiotemporal Representation of the Face for Automatic Pain Intensity Estimation
Title | Deep Spatiotemporal Representation of the Face for Automatic Pain Intensity Estimation |
Authors | Mohammad Tavakolian, Abdenour Hadid |
Abstract | Automatic pain intensity assessment has a high value in disease diagnosis applications. Inspired by the fact that many diseases and brain disorders can interrupt normal facial expression formation, we aim to develop a computational model for automatic pain intensity assessment from spontaneous and micro facial variations. For this purpose, we propose a 3D deep architecture for dynamic facial video representation. The proposed model is built by stacking several convolutional modules where each module encompasses a 3D convolution kernel with a fixed temporal depth, several parallel 3D convolutional kernels with different temporal depths, and an average pooling layer. Deploying variable temporal depths in the proposed architecture allows the model to effectively capture a wide range of spatiotemporal variations on the faces. Extensive experiments on the UNBC-McMaster Shoulder Pain Expression Archive database show that our proposed model yields promising performance compared to the state-of-the-art in automatic pain intensity estimation. |
Tasks | |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06793v1 |
http://arxiv.org/pdf/1806.06793v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-spatiotemporal-representation-of-the |
Repo | |
Framework | |
Deep Algorithms: designs for networks
Title | Deep Algorithms: designs for networks |
Authors | Abhejit Rajagopal, Shivkumar Chandrasekaran, Hrushikesh N. Mhaskar |
Abstract | A new design methodology for neural networks that is guided by traditional algorithm design is presented. To prove our point, we present two heuristics and demonstrate an algorithmic technique for incorporating additional weights in their signal-flow graphs. We show that with training the performance of these networks can not only exceed the performance of the initial network, but can match the performance of more-traditional neural network architectures. A key feature of our approach is that these networks are initialized with parameters that provide a known performance threshold for the architecture on a given task. |
Tasks | |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02003v1 |
http://arxiv.org/pdf/1806.02003v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-algorithms-designs-for-networks |
Repo | |
Framework | |
Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures
Title | Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures |
Authors | Mengjia Yan, Christopher Fletcher, Josep Torrellas |
Abstract | Deep Neural Networks (DNNs) are fast becoming ubiquitous for their ability to attain good accuracy in various machine learning tasks. A DNN’s architecture (i.e., its hyper-parameters) broadly determines the DNN’s accuracy and performance, and is often confidential. Attacking a DNN in the cloud to obtain its architecture can potentially provide major commercial value. Further, attaining a DNN’s architecture facilitates other, existing DNN attacks. This paper presents Cache Telepathy: a fast and accurate mechanism to steal a DNN’s architecture using the cache side channel. Our attack is based on the insight that DNN inference relies heavily on tiled GEMM (Generalized Matrix Multiply), and that DNN architecture parameters determine the number of GEMM calls and the dimensions of the matrices used in the GEMM functions. Such information can be leaked through the cache side channel. This paper uses Prime+Probe and Flush+Reload to attack VGG and ResNet DNNs running OpenBLAS and Intel MKL libraries. Our attack is effective in helping obtain the architectures by very substantially reducing the search space of target DNN architectures. For example, for VGG using OpenBLAS, it reduces the search space from more than $10^{35}$ architectures to just 16. |
Tasks | |
Published | 2018-08-14 |
URL | http://arxiv.org/abs/1808.04761v1 |
http://arxiv.org/pdf/1808.04761v1.pdf | |
PWC | https://paperswithcode.com/paper/cache-telepathy-leveraging-shared-resource |
Repo | |
Framework | |
Reinforcement Evolutionary Learning Method for self-learning
Title | Reinforcement Evolutionary Learning Method for self-learning |
Authors | Kumarjit Pathak, Jitin Kapila |
Abstract | In statistical modelling, the biggest threat is concept drift, which makes a model show gradually deteriorating performance over time. There are state-of-the-art methodologies to detect the impact of concept drift; however, the general strategy for overcoming the performance issue is to rebuild or re-calibrate the model periodically, as the variable patterns for the model change significantly due to market changes, consumer behavior changes, etc. Quantitative research is the most widespread application of data science in the marketing and financial domains, where the applicability of state-of-the-art reinforcement learning for auto-learning is a less explored paradigm. Reinforcement learning is heavily dependent on having a simulated environment, which is mostly available for gaming or online systems, to learn from live feedback. However, some research has been done in the areas of online advertisement, pricing, etc., where the nature of the online learning environment has allowed the scope of reinforcement learning to be explored. Our proposed solution is a reinforcement learning based, true self-learning algorithm which can adapt to data change or concept drift, and which can auto-learn and self-calibrate for new patterns in the data, solving the problem of concept drift. Keywords - Reinforcement learning, Genetic Algorithm, Q-learning, Classification modelling, CMA-ES, NES, Multi objective optimization, Concept drift, Population stability index, Incremental learning, F1-measure, Predictive Modelling, Self-learning, MCTS, AlphaGo, AlphaZero |
Tasks | Q-Learning |
Published | 2018-10-07 |
URL | http://arxiv.org/abs/1810.03198v1 |
http://arxiv.org/pdf/1810.03198v1.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-evolutionary-learning-method |
Repo | |
Framework | |