January 25, 2020

3272 words 16 mins read

Paper Group ANR 1699

Collapse Resistant Deep Convolutional GAN for Multi-Object Image Generation

Title Collapse Resistant Deep Convolutional GAN for Multi-Object Image Generation
Authors Elijah D. Bolluyt, Cristina Comaniciu
Abstract This work introduces a novel system for the generation of images that contain multiple classes of objects. Recent work on Generative Adversarial Networks has produced high-quality images, but much of it focuses on generating images of a single object or set of objects. Our system addresses the task of image generation conditioned on a list of desired classes to be included in a single image. This enables our system to generate images with any given combination of objects, all composed into a visually realistic natural image. The system learns the interrelationships of all classes represented in a dataset and can generate diverse samples including a set of these classes. It displays the ability to arrange these objects together, accounting for occlusions and inter-object spatial relations that characterize complex natural images. To accomplish this, we introduce a novel architecture based on Conditional Deep Convolutional GANs that is stabilized against collapse with respect to both mode and condition. The system learns to rectify mode collapse during training, self-correcting to avoid suboptimal generation modes.
Tasks Image Generation
Published 2019-11-08
URL https://arxiv.org/abs/1911.02996v1
PDF https://arxiv.org/pdf/1911.02996v1.pdf
PWC https://paperswithcode.com/paper/collapse-resistant-deep-convolutional-gan-for
Repo
Framework
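The conditioning scheme described in the abstract can be illustrated with a small sketch. This is not the authors' code; the multi-hot encoding and its concatenation with the noise vector are assumptions about one common way to condition a generator on a *list* of classes rather than a single label:

```python
import random

def multi_hot(desired_classes, num_classes):
    """Encode a list of class indices as a multi-hot condition vector."""
    v = [0.0] * num_classes
    for c in desired_classes:
        v[c] = 1.0
    return v

def generator_input(desired_classes, num_classes, noise_dim, rng=random):
    """Concatenate a Gaussian noise vector with the multi-hot condition."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + multi_hot(desired_classes, num_classes)

# ask the generator for an image containing classes 2 and 5 out of 10
z = generator_input([2, 5], num_classes=10, noise_dim=100)
assert len(z) == 110
```

The discriminator would receive the same condition vector, so both networks see which combination of objects was requested.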

An Empirical Evaluation of Text Representation Schemes on Multilingual Social Web to Filter the Textual Aggression

Title An Empirical Evaluation of Text Representation Schemes on Multilingual Social Web to Filter the Textual Aggression
Authors Sandip Modha, Prasenjit Majumder
Abstract This paper studies the effectiveness of text representation schemes on two tasks: user aggression detection and fact detection in social media content. In user aggression detection, the aim is to identify the level of aggression in content generated on social media and written in English, Devanagari Hindi, and Romanized Hindi. Aggression levels are categorized into three predefined classes: Non-aggressive, Overtly Aggressive, and Covertly Aggressive. During disaster-related incidents, social media platforms such as Twitter are flooded with millions of posts. In such emergency situations, identifying factual posts is important for organizations involved in relief operations. We treat this problem as a combination of classification and ranking. This paper presents a comparison of various text representation schemes based on BoW techniques, distributed word/sentence representations, and transfer learning across classifiers. Weighted $F_1$ score is used as the primary evaluation metric. Results show that BoW text representations perform better than word embeddings with machine learning classifiers, while pre-trained word embeddings perform better with classifiers based on deep neural nets. Recent transfer learning models such as ELMo and ULMFiT are fine-tuned for the aggression classification task; however, their results are not on par with the pre-trained word embedding models. Overall, fastText word embeddings produce a better weighted $F_1$ score than Word2Vec and GloVe. Results are further improved using pre-trained vector models. Statistical significance tests are employed to ensure the significance of the classification results. On a test dataset lexically different from the training dataset, deep neural models are more robust and perform substantially better than machine learning classifiers.
Tasks Transfer Learning
Published 2019-04-16
URL http://arxiv.org/abs/1904.08770v1
PDF http://arxiv.org/pdf/1904.08770v1.pdf
PWC https://paperswithcode.com/paper/an-empirical-evaluation-of-text
Repo
Framework
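Since the weighted $F_1$ score is the paper's primary metric, a minimal self-contained implementation may help make the evaluation concrete (the class labels here are hypothetical abbreviations for the three aggression classes):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 scores averaged with class-frequency weights."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    total = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += support[c] / len(y_true) * f1  # frequency-weighted average
    return total

# toy labels: NAG = non-aggressive, OAG = overt, CAG = covert
y_true = ["NAG", "NAG", "OAG", "CAG", "OAG", "NAG"]
y_pred = ["NAG", "OAG", "OAG", "CAG", "OAG", "NAG"]
score = weighted_f1(y_true, y_pred)  # one NAG sample misclassified as OAG
```

In practice one would call scikit-learn's `f1_score(..., average="weighted")`, which computes the same quantity.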

Shapley Values of Reconstruction Errors of PCA for Explaining Anomaly Detection

Title Shapley Values of Reconstruction Errors of PCA for Explaining Anomaly Detection
Authors Naoya Takeishi
Abstract We present a method to compute the Shapley values of reconstruction errors of principal component analysis (PCA), which is particularly useful in explaining the results of anomaly detection based on PCA. Because features are usually correlated when PCA-based anomaly detection is applied, care must be taken in computing a value function for the Shapley values. We utilize the probabilistic view of PCA, particularly its conditional distribution, to exactly compute a value function for the Shapley values. We also present numerical examples, which suggest that the Shapley values are more useful for explaining detected anomalies than the raw reconstruction errors of each feature.
Tasks Anomaly Detection
Published 2019-09-08
URL https://arxiv.org/abs/1909.03495v2
PDF https://arxiv.org/pdf/1909.03495v2.pdf
PWC https://paperswithcode.com/paper/shapley-values-of-reconstruction-errors-of
Repo
Framework
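The base quantity the paper attributes is the PCA reconstruction error used as an anomaly score. The sketch below shows only that score (the Shapley attribution over correlated features, via the probabilistic-PCA conditional, is the paper's contribution and is omitted); the data and subspace dimension are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)  # correlated features

mu = X.mean(axis=0)
Xc = X - mu
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:2].T  # top-2 principal directions, shape (features, k)

def reconstruction_error(x):
    """Squared distance from x to its projection on the PCA subspace."""
    xc = x - mu
    x_hat = W @ (W.T @ xc)
    return float(np.sum((xc - x_hat) ** 2))

normal_score = reconstruction_error(X[0])
anomaly = X[0].copy()
anomaly[3] += 10.0  # inject a deviation off the principal subspace
anomaly_score = reconstruction_error(anomaly)
assert anomaly_score > normal_score
```

Naively attributing `anomaly_score` to individual features ignores the correlation between features 0 and 1, which is exactly the issue the Shapley-value treatment addresses.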

LP-3DCNN: Unveiling Local Phase in 3D Convolutional Neural Networks

Title LP-3DCNN: Unveiling Local Phase in 3D Convolutional Neural Networks
Authors Sudhakar Kumawat, Shanmuganathan Raman
Abstract Traditional 3D Convolutional Neural Networks (CNNs) are computationally expensive, memory intensive, prone to overfit, and, most importantly, there is a need to improve their feature learning capabilities. To address these issues, we propose the Rectified Local Phase Volume (ReLPV) block, an efficient alternative to the standard 3D convolutional layer. The ReLPV block extracts the phase in a 3D local neighborhood (e.g., 3x3x3) of each position of the input map to obtain the feature maps. The phase is extracted by computing the 3D Short Term Fourier Transform (STFT) at multiple fixed low frequency points in the 3D local neighborhood of each position. These feature maps at different frequency points are then linearly combined after passing them through an activation function. The ReLPV block provides significant parameter savings of at least 3^3 to 13^3 times compared to the standard 3D convolutional layer with filter sizes 3x3x3 to 13x13x13, respectively. We show that the feature learning capabilities of the ReLPV block are significantly better than the standard 3D convolutional layer. Furthermore, it produces consistently better results across different 3D data representations. We achieve state-of-the-art accuracy on the volumetric ModelNet10 and ModelNet40 datasets while utilizing only 11% parameters of the current state-of-the-art. We also improve the state-of-the-art on the UCF-101 split-1 action recognition dataset by 5.68% (when trained from scratch) while using only 15% of the parameters of the state-of-the-art. The project webpage is available at https://sites.google.com/view/lp-3dcnn/home.
Tasks Temporal Action Localization
Published 2019-04-06
URL http://arxiv.org/abs/1904.03498v1
PDF http://arxiv.org/pdf/1904.03498v1.pdf
PWC https://paperswithcode.com/paper/lp-3dcnn-unveiling-local-phase-in-3d
Repo
Framework
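The core operation, phase extraction via a local 3D STFT at fixed frequency points, can be sketched for a single 3x3x3 neighborhood. This is a simplified reading of the abstract, not the authors' implementation; the particular frequency points chosen below are assumptions for illustration:

```python
import numpy as np

def local_phase(patch, freqs):
    """Responses of a cubic 3D patch at fixed low-frequency points,
    computed as inner products with complex STFT basis vectors.
    Real and imaginary parts together carry the local phase."""
    d = patch.shape[0]
    axis = np.arange(d) - d // 2
    coords = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"),
                      axis=-1).reshape(-1, 3)
    flat = patch.reshape(-1)
    out = []
    for v in freqs:
        basis = np.exp(-2j * np.pi * coords @ np.asarray(v))
        c = flat @ basis  # complex STFT coefficient at frequency v
        out.extend([c.real, c.imag])
    return np.array(out)

rng = np.random.default_rng(0)
patch = rng.normal(size=(3, 3, 3))
# four fixed low-frequency points (illustrative choice)
freqs = [(0.25, 0, 0), (0, 0.25, 0), (0, 0, 0.25), (0.25, 0.25, 0)]
feat = local_phase(patch, freqs)
assert feat.shape == (8,)
```

Because the STFT basis is fixed, only the subsequent linear combination of these responses is learned, which is where the parameter savings over a dense 3D filter bank come from.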

DeepLight: Learning Illumination for Unconstrained Mobile Mixed Reality

Title DeepLight: Learning Illumination for Unconstrained Mobile Mixed Reality
Authors Chloe LeGendre, Wan-Chun Ma, Graham Fyffe, John Flynn, Laurent Charbonnel, Jay Busch, Paul Debevec
Abstract We present a learning-based method to infer plausible high dynamic range (HDR), omnidirectional illumination given an unconstrained, low dynamic range (LDR) image from a mobile phone camera with a limited field of view (FOV). For training data, we collect videos of various reflective spheres placed within the camera’s FOV, leaving most of the background unoccluded, leveraging that materials with diverse reflectance functions reveal different lighting cues in a single exposure. We train a deep neural network to regress from the LDR background image to HDR lighting by matching the LDR ground truth sphere images to those rendered with the predicted illumination using image-based relighting, which is differentiable. Our inference runs at interactive frame rates on a mobile device, enabling realistic rendering of virtual objects into real scenes for mobile mixed reality. Training on automatically exposed and white-balanced videos, we improve the realism of rendered objects compared to the state-of-the-art methods for both indoor and outdoor scenes.
Tasks
Published 2019-04-02
URL http://arxiv.org/abs/1904.01175v1
PDF http://arxiv.org/pdf/1904.01175v1.pdf
PWC https://paperswithcode.com/paper/deeplight-learning-illumination-for
Repo
Framework
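The key property the training relies on is that image-based relighting is linear in the lighting, which makes the rendering loss differentiable with respect to the predicted illumination. A toy sketch with hypothetical per-light basis images (not the authors' data or code):

```python
import numpy as np

def relight(basis, light):
    """Image-based relighting: the rendered image is a linear
    combination of per-light basis images, so it is differentiable
    in the lighting coefficients."""
    return np.tensordot(light, basis, axes=1)  # (H, W)

rng = np.random.default_rng(0)
n_lights = 6
basis = rng.random(size=(n_lights, 8, 8))  # one image per light direction
gt_light = rng.random(size=n_lights)
target = relight(basis, gt_light)          # "ground truth" sphere image

pred_light = rng.random(size=n_lights)
loss = np.mean((relight(basis, pred_light) - target) ** 2)
# since relight() is linear in pred_light, the gradient of this loss
# w.r.t. the predicted lighting has a simple closed form
```

The network's lighting prediction plays the role of `pred_light`, and the loss is backpropagated through this linear rendering step.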

Improving Device-Edge Cooperative Inference of Deep Learning via 2-Step Pruning

Title Improving Device-Edge Cooperative Inference of Deep Learning via 2-Step Pruning
Authors Wenqi Shi, Yunzhong Hou, Sheng Zhou, Zhisheng Niu, Yang Zhang, Lu Geng
Abstract Deep neural networks (DNNs) are state-of-the-art solutions for many machine learning applications, and have been widely used on mobile devices. Running DNNs on resource-constrained mobile devices often requires help from edge servers via computation offloading. However, offloading through a bandwidth-limited wireless link is non-trivial due to the tight interplay between the computation resources on mobile devices and wireless resources. Existing studies have focused on cooperative inference where DNN models are partitioned at different neural network layers, and the two parts are executed at the mobile device and the edge server, respectively. Since the output data size of a DNN layer can be larger than that of the raw data, offloading intermediate data between layers can suffer from high transmission latency under limited wireless bandwidth. In this paper, we propose an efficient and flexible 2-step pruning framework for DNN partition between mobile devices and edge servers. In our framework, the DNN model only needs to be pruned once in the training phase, where unimportant convolutional filters are removed iteratively. By limiting the pruning region, our framework can greatly reduce either the wireless transmission workload of the device or the total computation workload. A series of pruned models are generated in the training phase, from which the framework can automatically select a model to satisfy varying latency and accuracy requirements. Furthermore, coding for the intermediate data is added to provide extra transmission workload reduction. Our experiments show that the proposed framework can achieve up to 25.6$\times$ reduction on transmission workload, 6.01$\times$ acceleration on total computation and 4.81$\times$ reduction on end-to-end latency as compared to partitioning the original DNN model without pruning.
Tasks
Published 2019-03-08
URL http://arxiv.org/abs/1903.03472v1
PDF http://arxiv.org/pdf/1903.03472v1.pdf
PWC https://paperswithcode.com/paper/improving-device-edge-cooperative-inference
Repo
Framework
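The iterative filter-removal step can be sketched with a common criterion, pruning filters by smallest L1 norm (the paper does not commit to this criterion in the abstract, so treat it as an assumption); each step yields one model in the series the framework later selects from:

```python
import numpy as np

def prune_filters(weights, frac):
    """Remove the fraction `frac` of filters with the smallest L1 norm.
    `weights` has shape (num_filters, in_channels, k, k)."""
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    keep = norms.argsort()[int(frac * len(norms)):]
    return weights[np.sort(keep)]  # preserve original filter order

def iterative_pruning(weights, frac_per_step, steps):
    """One pruned model per step; a deployment point can later be
    chosen from this series to meet a latency/accuracy target."""
    models = []
    w = weights
    for _ in range(steps):
        w = prune_filters(w, frac_per_step)
        models.append(w)
    return models

rng = np.random.default_rng(0)
w0 = rng.normal(size=(64, 16, 3, 3))  # one conv layer's filters
models = iterative_pruning(w0, frac_per_step=0.25, steps=3)
assert [m.shape[0] for m in models] == [48, 36, 27]
```

Restricting which layers may be pruned (the "pruning region") then biases the series toward reducing either the transmitted intermediate data or the on-device computation.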

On-manifold Adversarial Data Augmentation Improves Uncertainty Calibration

Title On-manifold Adversarial Data Augmentation Improves Uncertainty Calibration
Authors Kanil Patel, William Beluch, Dan Zhang, Michael Pfeiffer, Bin Yang
Abstract Uncertainty estimates help to identify ambiguous, novel, or anomalous inputs, but the reliable quantification of uncertainty has proven to be challenging for modern deep networks. In order to improve uncertainty estimation, we propose On-Manifold Adversarial Data Augmentation or OMADA, which specifically attempts to generate the most challenging examples by following an on-manifold adversarial attack path in the latent space of an autoencoder-based generative model that closely approximates decision boundaries between two or more classes. On a variety of datasets as well as on multiple diverse network architectures, OMADA consistently yields more accurate and better calibrated classifiers than baseline models, and outperforms competing approaches such as Mixup, as well as achieving similar performance to (at times better than) post-processing calibration methods such as temperature scaling. Variants of OMADA can employ different sampling schemes for ambiguous on-manifold examples based on the entropy of their estimated soft labels, which exhibit specific strengths for generalization, calibration of predicted uncertainty, or detection of out-of-distribution inputs.
Tasks Adversarial Attack, Calibration, Data Augmentation
Published 2019-12-16
URL https://arxiv.org/abs/1912.07458v3
PDF https://arxiv.org/pdf/1912.07458v3.pdf
PWC https://paperswithcode.com/paper/on-manifold-adversarial-data-augmentation
Repo
Framework
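The latent-space attack idea can be sketched in a drastically simplified setting: a linear "decoder" and a linear classifier stand in for the autoencoder-based generative model and the deep classifier, so the walk toward the decision boundary has a closed-form gradient. This is an illustration of the principle only, not OMADA itself:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, data_dim = 4, 8
A = rng.normal(size=(data_dim, latent_dim))  # stand-in linear "decoder"
w = rng.normal(size=data_dim)                # stand-in linear classifier

def logit(z):
    """Class score of the decoded latent sample."""
    return float(w @ (A @ z))

def on_manifold_attack(z, lr=0.2, steps=25):
    """Walk in latent space toward the decision boundary (logit = 0);
    every iterate decodes to a point on the decoder's output manifold."""
    z = z.copy()
    g = A.T @ w                  # d logit / d z for the linear decoder
    g_norm2 = float(g @ g)
    for _ in range(steps):
        z -= (lr / g_norm2) * 2.0 * logit(z) * g  # descend logit(z)**2
    return z

z0 = rng.normal(size=latent_dim)
z_adv = on_manifold_attack(z0)
assert abs(logit(z_adv)) < abs(logit(z0))
```

The decoded ambiguous sample would then be given a soft label and used as a training example, which is where the calibration benefit comes from.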

Multi-modality Latent Interaction Network for Visual Question Answering

Title Multi-modality Latent Interaction Network for Visual Question Answering
Authors Peng Gao, Haoxuan You, Zhanpeng Zhang, Xiaogang Wang, Hongsheng Li
Abstract Exploiting relationships between visual regions and question words has achieved great success in learning multi-modality features for Visual Question Answering (VQA). However, we argue that existing methods mostly model relations between individual visual regions and words, which is not enough to correctly answer the question. From a human perspective, answering a visual question requires understanding summarizations of the visual and language information. In this paper, we propose the Multi-modality Latent Interaction module (MLI) to tackle this problem. The proposed module learns the cross-modality relationships between latent visual and language summarizations, which condense the visual regions and the question into a small number of latent representations to avoid modeling uninformative individual region-word relations. The cross-modality information between the latent summarizations is propagated to fuse valuable information from both modalities and is used to update the visual and word features. Such MLI modules can be stacked for several stages to model complex and latent relations between the two modalities, achieving highly competitive performance on the public VQA benchmarks VQA v2.0 and TDIUC. In addition, we show that the performance of our method can be significantly improved by combining it with the pre-trained language model BERT.
Tasks Language Modelling, Question Answering, Visual Question Answering
Published 2019-08-10
URL https://arxiv.org/abs/1908.04289v1
PDF https://arxiv.org/pdf/1908.04289v1.pdf
PWC https://paperswithcode.com/paper/multi-modality-latent-interaction-network-for
Repo
Framework
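The payoff of summarization is easy to see in a toy numpy sketch: attending from many region/word features to a few latent slots shrinks the number of cross-modality interactions from regions x words to k x k. Random projections stand in for the module's learned attention parameters, so this is an illustration of the idea rather than the MLI module itself:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def summarize(feats, proj):
    """Condense many features into a few latent vectors: one attention
    distribution over the inputs per latent slot."""
    attn = softmax(feats @ proj, axis=0)  # (N, k) weights over inputs
    return attn.T @ feats                 # (k, d) latent summaries

rng = np.random.default_rng(0)
visual = rng.normal(size=(36, 16))  # 36 region features
words = rng.normal(size=(14, 16))   # 14 question-word features
k = 4
v_lat = summarize(visual, rng.normal(size=(16, k)))
w_lat = summarize(words, rng.normal(size=(16, k)))

# cross-modality interaction over k*k latent pairs
# instead of all 36*14 region-word pairs
interactions = v_lat @ w_lat.T
assert interactions.shape == (k, k)
```

In the full module these interactions would be propagated back to update the original region and word features, and the block stacked several times.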

Write, Execute, Assess: Program Synthesis with a REPL

Title Write, Execute, Assess: Program Synthesis with a REPL
Authors Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, Armando Solar-Lezama
Abstract We present a neural program synthesis approach integrating components which write, execute, and assess code to navigate the search space of possible programs. We equip the search process with an interpreter, or read-eval-print-loop (REPL), which immediately executes partially written programs, exposing their semantics. The REPL addresses a basic challenge of program synthesis: tiny changes in syntax can lead to huge changes in semantics. We train a pair of models, a policy that proposes the new piece of code to write, and a value function that assesses the prospects of the code written so far. At test time we can combine these models with a Sequential Monte Carlo algorithm. We apply our approach to two domains: synthesizing text editing programs and inferring 2D and 3D graphics programs.
Tasks Program Synthesis
Published 2019-06-09
URL https://arxiv.org/abs/1906.04604v1
PDF https://arxiv.org/pdf/1906.04604v1.pdf
PWC https://paperswithcode.com/paper/write-execute-assess-program-synthesis-with-a
Repo
Framework
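The role of the REPL can be demonstrated with a tiny synthesizer over a toy integer DSL. This is a hedged sketch, not the authors' system: a plain breadth-first search stands in for the learned policy/value models and Sequential Monte Carlo, but the key move is the same, every partial program is executed as soon as it is extended, so semantically equivalent partial programs are merged and hopeless ones pruned:

```python
OPS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
}

def synthesize(inp, target, max_len=8):
    """Breadth-first search for a shortest op sequence mapping
    inp to target, keyed on executed REPL states."""
    if inp == target:
        return ()
    frontier = [((), inp)]
    seen = {inp}
    for _ in range(max_len):
        nxt = []
        for prog, state in frontier:
            for name, fn in OPS.items():
                s = fn(state)  # the "REPL": run the new line immediately
                if s == target:
                    return prog + (name,)
                if s not in seen and abs(s) <= 10 * abs(target) + 10:
                    seen.add(s)
                    nxt.append((prog + (name,), s))
        frontier = nxt
    return None

prog = synthesize(3, 16)  # one solution: 3 -> 4 -> 8 -> 16
assert prog is not None
```

In the paper, the executed state is what the value function scores and what the SMC sampler resamples over; here it only serves to deduplicate the search.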

Generalization in multitask deep neural classifiers: a statistical physics approach

Title Generalization in multitask deep neural classifiers: a statistical physics approach
Authors Tyler Lee, Anthony Ndirango
Abstract A proper understanding of the striking generalization abilities of deep neural networks presents an enduring puzzle. Recently, there has been a growing body of numerically-grounded theoretical work that has contributed important insights to the theory of learning in deep neural nets. There has also been a recent interest in extending these analyses to understanding how multitask learning can further improve the generalization capacity of deep neural nets. These studies deal almost exclusively with regression tasks which are amenable to existing analytical techniques. We develop an analytic theory of the nonlinear dynamics of generalization of deep neural networks trained to solve classification tasks using softmax outputs and cross-entropy loss, addressing both single task and multitask settings. We do so by adapting techniques from the statistical physics of disordered systems, accounting for both finite size datasets and correlated outputs induced by the training dynamics. We discuss the validity of our theoretical results in comparison to a comprehensive suite of numerical experiments. Our analysis provides theoretical support for the intuition that the performance of multitask learning is determined by the noisiness of the tasks and how well their input features align with each other. Highly related, clean tasks benefit each other, whereas unrelated, clean tasks can be detrimental to individual task performance.
Tasks
Published 2019-10-30
URL https://arxiv.org/abs/1910.13593v1
PDF https://arxiv.org/pdf/1910.13593v1.pdf
PWC https://paperswithcode.com/paper/generalization-in-multitask-deep-neural
Repo
Framework

Pixel-Adaptive Convolutional Neural Networks

Title Pixel-Adaptive Convolutional Neural Networks
Authors Hang Su, Varun Jampani, Deqing Sun, Orazio Gallo, Erik Learned-Miller, Jan Kautz
Abstract Convolutions are the fundamental building block of CNNs. The fact that their weights are spatially shared is one of the main reasons for their widespread use, but it also is a major limitation, as it makes convolutions content agnostic. We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially-varying kernel that depends on learnable, local pixel features. PAC is a generalization of several popular filtering techniques and thus can be used for a wide range of use cases. Specifically, we demonstrate state-of-the-art performance when PAC is used for deep joint image upsampling. PAC also offers an effective alternative to fully-connected CRF (Full-CRF), called PAC-CRF, which performs competitively, while being considerably faster. In addition, we also demonstrate that PAC can be used as a drop-in replacement for convolution layers in pre-trained networks, resulting in consistent performance improvements.
Tasks
Published 2019-04-10
URL http://arxiv.org/abs/1904.05373v1
PDF http://arxiv.org/pdf/1904.05373v1.pdf
PWC https://paperswithcode.com/paper/pixel-adaptive-convolutional-neural-networks
Repo
Framework
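The PAC operation itself is compact enough to sketch directly: the shared spatial filter is multiplied at every position by a kernel on local pixel features. The Gaussian feature kernel below is one of the choices the generalization admits, and the loop-based layout is for clarity, not efficiency:

```python
import numpy as np

def pac_2d(x, feats, w, sigma=1.0):
    """Pixel-adaptive convolution: the shared k x k filter `w` is
    modulated at each position by a Gaussian kernel on the guidance
    features, making the effective filter content-dependent."""
    k = w.shape[0]
    r = k // 2
    out = np.zeros_like(x, dtype=float)
    for i in range(r, x.shape[0] - r):
        for j in range(r, x.shape[1] - r):
            patch = x[i - r:i + r + 1, j - r:j + r + 1]
            fdiff = feats[i - r:i + r + 1, j - r:j + r + 1] - feats[i, j]
            kern = np.exp(-0.5 * (fdiff / sigma) ** 2)  # adapting kernel
            out[i, j] = np.sum(kern * w * patch)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6))
w = rng.normal(size=(3, 3))
# with constant guidance features the kernel is 1 everywhere,
# so PAC reduces to a plain spatially shared correlation
same = pac_2d(x, np.zeros_like(x), w)
```

That reduction is why PAC can act as a drop-in replacement for standard convolution layers: with uninformative features it recovers the original operation.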

Function approximation by deep networks

Title Function approximation by deep networks
Authors H. N. Mhaskar, T. Poggio
Abstract We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph, because the deep networks can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On the other hand, a theorem called good propagation of errors allows one to “lift” theorems about shallow networks to those about deep networks with an appropriate choice of norms, smoothness, etc. We illustrate this in three contexts where each channel in the deep network calculates a spherical polynomial, a non-smooth ReLU network, or another zonal function network related closely with the ReLU network.
Tasks
Published 2019-05-30
URL https://arxiv.org/abs/1905.12882v2
PDF https://arxiv.org/pdf/1905.12882v2.pdf
PWC https://paperswithcode.com/paper/function-approximation-by-deep-networks
Repo
Framework

Text Level Graph Neural Network for Text Classification

Title Text Level Graph Neural Network for Text Classification
Authors Lianzhe Huang, Dehong Ma, Sujian Li, Xiaodong Zhang, Houfeng WANG
Abstract Recently, research has explored graph neural network (GNN) techniques for text classification, since GNNs handle complex structures well and preserve global information. However, previous GNN-based methods face the practical problems of a fixed corpus-level graph structure, which does not support online testing, and high memory consumption. To tackle these problems, we propose a new GNN-based model that builds a graph for each input text with globally shared parameters, instead of a single graph for the whole corpus. This removes the dependence between an individual text and the entire corpus, which supports online testing while still preserving global information. Besides, we build graphs using much smaller windows in the text, which not only extracts more local features but also significantly reduces the number of edges as well as memory consumption. Experiments show that our model outperforms existing models on several text classification datasets while consuming less memory.
Tasks Text Classification
Published 2019-10-06
URL https://arxiv.org/abs/1910.02356v2
PDF https://arxiv.org/pdf/1910.02356v2.pdf
PWC https://paperswithcode.com/paper/text-level-graph-neural-network-for-text
Repo
Framework
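The per-text graph construction with a small sliding window is simple to sketch. In the full model, node and edge parameters are shared across the corpus via global word ids; the sketch below shows only the graph structure for one text:

```python
def build_text_graph(tokens, window=2):
    """Per-text graph: nodes are the text's words, edges connect words
    co-occurring within a sliding window of the given half-width."""
    nodes = sorted(set(tokens))
    edges = set()
    for i in range(len(tokens)):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if i != j:
                edges.add(tuple(sorted((tokens[i], tokens[j]))))
    return nodes, sorted(edges)

nodes, edges = build_text_graph("the cat sat on the mat".split(), window=1)
# 5 unique words; each adjacent pair contributes one undirected edge
```

With a window of 1, the 6-token example yields 5 nodes and 5 edges, whereas a corpus-level graph would grow with every document added, which is the memory cost the paper avoids.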

A Theoretical Connection Between Statistical Physics and Reinforcement Learning

Title A Theoretical Connection Between Statistical Physics and Reinforcement Learning
Authors Jad Rahme, Ryan P. Adams
Abstract Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforcement learning using the tools and abstractions of statistical physics. The central object in the statistical physics abstraction is the idea of a partition function $\mathcal{Z}$, and here we construct a partition function from the ensemble of possible trajectories that an agent might take in a Markov decision process. Although value functions and $Q$-functions can be derived from this partition function and interpreted via average energies, the $\mathcal{Z}$-function provides an object with its own Bellman equation that can form the basis of alternative dynamic programming approaches. Moreover, when the MDP dynamics are deterministic, the Bellman equation for $\mathcal{Z}$ is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions. The policies learned via these $\mathcal{Z}$-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations. In addition to sampling actions proportionally to the exponential of the expected cumulative reward as Boltzmann policies would, these policies take entropy into account favoring states from which many outcomes are possible.
Tasks Decision Making
Published 2019-06-24
URL https://arxiv.org/abs/1906.10228v1
PDF https://arxiv.org/pdf/1906.10228v1.pdf
PWC https://paperswithcode.com/paper/a-theoretical-connection-between-statistical
Repo
Framework
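The linearity of the $\mathcal{Z}$-function Bellman equation under deterministic dynamics can be seen on a tiny example. The MDP below is an invented toy (a three-state DAG with terminal state 3), and the recursion $\mathcal{Z}(s) = \sum_a e^{\beta r(s,a)} \mathcal{Z}(s')$ with $\mathcal{Z}(\text{terminal}) = 1$ is a simplified episodic reading of the paper's construction:

```python
import math

beta = 1.0
# deterministic toy MDP: action -> (next_state, reward); state 3 terminal
trans = {
    0: {"a": (1, 0.0), "b": (2, 1.0)},
    1: {"a": (2, 0.0), "b": (3, 1.0)},
    2: {"a": (3, 0.0)},
}

Z = {3: 1.0}
for s in sorted(trans, reverse=True):  # backward pass over the DAG
    # linear Bellman equation: Z(s) = sum_a exp(beta * r) * Z(s')
    Z[s] = sum(math.exp(beta * r) * Z[s2] for s2, r in trans[s].values())

def policy(s):
    """Boltzmann-like policy induced by the Z-function: actions are
    weighted by exp(beta * r) times the partition mass downstream."""
    return {a: math.exp(beta * r) * Z[s2] / Z[s]
            for a, (s2, r) in trans[s].items()}

p = policy(0)
assert abs(sum(p.values()) - 1.0) < 1e-12
```

Note that each `Z[s]` update is a plain weighted sum, with no max or log-sum-exp, which is the linearity the paper exploits for direct solutions.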

Two models of double descent for weak features

Title Two models of double descent for weak features
Authors Mikhail Belkin, Daniel Hsu, Ji Xu
Abstract The “double descent” risk curve was recently proposed to qualitatively describe the out-of-sample prediction accuracy of variably-parameterized machine learning models. This article provides a precise mathematical analysis for the shape of this curve in two simple data models with the least squares/least norm predictor. Specifically, it is shown that the risk peaks when the number of features $p$ is close to the sample size $n$, but also that the risk decreases towards its minimum as $p$ increases beyond $n$. This behavior is contrasted with that of “prescient” models that select features in an a priori optimal order.
Tasks
Published 2019-03-18
URL http://arxiv.org/abs/1903.07571v1
PDF http://arxiv.org/pdf/1903.07571v1.pdf
PWC https://paperswithcode.com/paper/two-models-of-double-descent-for-weak
Repo
Framework
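The predictor analyzed in the overparameterized regime is the minimum-norm least squares solution. A short sketch of its two defining properties (it interpolates the training data when $p > n$, and it has the smallest norm among all interpolants); the risk peak near $p = n$ itself is the paper's analytical result and is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 40  # overparameterized: more features than samples
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

beta = np.linalg.pinv(X) @ y     # minimum-norm least squares solution
assert np.allclose(X @ beta, y)  # interpolates the training data exactly

# adding any null-space component yields another interpolant,
# but one with a strictly larger norm
null_proj = np.eye(p) - np.linalg.pinv(X) @ X
other = beta + null_proj @ rng.normal(size=p)
assert np.linalg.norm(beta) <= np.linalg.norm(other)
```

As $p$ grows beyond $n$, the null space gets larger and the minimum-norm solution shrinks, which is the mechanism behind the descent of the risk past the interpolation threshold.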