October 20, 2019

Paper Group AWR 338

Neural Network Acceptability Judgments

Title Neural Network Acceptability Judgments
Authors Alex Warstadt, Amanpreet Singh, Samuel R. Bowman
Abstract This paper investigates the ability of artificial neural networks to judge the grammatical acceptability of a sentence, with the goal of testing their linguistic competence. We introduce the Corpus of Linguistic Acceptability (CoLA), a set of 10,657 English sentences labeled as grammatical or ungrammatical from published linguistics literature. As baselines, we train several recurrent neural network models on acceptability classification, and find that our models outperform unsupervised models by Lau et al. (2016) on CoLA. Error analysis on specific grammatical phenomena reveals that both Lau et al.'s models and ours learn systematic generalizations like subject-verb-object order. However, all models we test perform far below human level on a wide range of grammatical constructions.
Tasks Language Acquisition, Linguistic Acceptability
Published 2018-05-31
URL https://arxiv.org/abs/1805.12471v3
PDF https://arxiv.org/pdf/1805.12471v3.pdf
PWC https://paperswithcode.com/paper/neural-network-acceptability-judgments
Repo https://github.com/nyu-mll/CoLA-baselines
Framework pytorch

The PyTorch-Kaldi Speech Recognition Toolkit

Title The PyTorch-Kaldi Speech Recognition Toolkit
Authors Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio
Abstract The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch is used to build neural networks with the Python language and has recently spawned tremendous interest within the machine learning community thanks to its simplicity and flexibility. The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. PyTorch-Kaldi is not only a simple interface between these toolkits, but it also embeds several useful features for developing modern speech recognizers. For instance, the code is specifically designed so that user-defined acoustic models can be plugged in naturally. As an alternative, users can exploit several pre-implemented neural networks that can be customized using intuitive configuration files. PyTorch-Kaldi supports multiple feature and label streams as well as combinations of neural networks, enabling the use of complex neural architectures. The toolkit is publicly released along with rich documentation and is designed to work properly locally or on HPC clusters. Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.
Tasks Distant Speech Recognition, Noisy Speech Recognition, Speech Recognition
Published 2018-11-19
URL http://arxiv.org/abs/1811.07453v2
PDF http://arxiv.org/pdf/1811.07453v2.pdf
PWC https://paperswithcode.com/paper/the-pytorch-kaldi-speech-recognition-toolkit
Repo https://github.com/xpz123/pytorch-kaldi
Framework pytorch

A Numerical Framework for Efficient Motion Estimation on Evolving Sphere-Like Surfaces based on Brightness and Mass Conservation Laws

Title A Numerical Framework for Efficient Motion Estimation on Evolving Sphere-Like Surfaces based on Brightness and Mass Conservation Laws
Authors Lukas F. Lang
Abstract In this work we consider brightness and mass conservation laws for motion estimation on evolving Riemannian 2-manifolds that allow for a radial parametrisation from the 2-sphere. While conservation of brightness constitutes the foundation for optical flow methods and has been generalised to said scenario, we formulate in this article the principle of mass conservation for time-varying surfaces which are embedded in Euclidean 3-space and derive a generalised continuity equation. The main motivation for this work is efficient cell motion estimation in time-lapse (4D) volumetric fluorescence microscopy images of a living zebrafish embryo. The increasing spatial and temporal resolution of modern microscopes requires efficient analysis of such data. With this application in mind we address this need and follow an emerging paradigm in this field: dimensional reduction. In light of the ill-posedness of the considered conservation laws we employ Tikhonov regularisation and propose the use of spatially varying regularisation functionals that recover motion only in regions with cells. For the efficient numerical solution we devise a Galerkin method based on compactly supported (tangent) vectorial basis functions. Furthermore, for the fast and accurate estimation of the evolving sphere-like surface from scattered data we utilise surface interpolation with spatio-temporal regularisation. We present numerical results based on the aforementioned zebrafish microscopy data featuring fluorescently labelled cells.
Tasks Motion Estimation, Optical Flow Estimation
Published 2018-05-02
URL http://arxiv.org/abs/1805.01006v1
PDF http://arxiv.org/pdf/1805.01006v1.pdf
PWC https://paperswithcode.com/paper/a-numerical-framework-for-efficient-motion
Repo https://github.com/lukaslang/ofcm
Framework none

A Quantum Many-body Wave Function Inspired Language Modeling Approach

Title A Quantum Many-body Wave Function Inspired Language Modeling Approach
Authors Peng Zhang, Zhan Su, Lipeng Zhang, Benyou Wang, Dawei Song
Abstract The recently proposed quantum language model (QLM) aimed at a principled approach to modeling term dependency by applying quantum probability theory. The latest development for a more effective QLM has adopted word embeddings as a kind of global dependency information and integrated the quantum-inspired idea in a neural network architecture. While these quantum-inspired LMs are theoretically more general and also practically effective, they have two major limitations. First, they have not taken into account the interaction among words with multiple meanings, which is common and important in understanding natural language text. Second, the integration of the quantum-inspired LM with the neural network was mainly for effective training of parameters, yet lacked a theoretical foundation accounting for such integration. To address these two issues, in this paper, we propose a Quantum Many-body Wave Function (QMWF) inspired language modeling approach. The QMWF inspired LM can adopt the tensor product to model the aforesaid interaction among words. It also enables us to reveal the inherent necessity of using a Convolutional Neural Network (CNN) in QMWF language modeling. Furthermore, our approach delivers a simple algorithm to represent and match text/sentence pairs. Systematic evaluation shows the effectiveness of the proposed QMWF-LM algorithm, in comparison with the state-of-the-art quantum-inspired LMs and a couple of CNN-based methods, on three typical Question Answering (QA) datasets.
Tasks Language Modelling, Question Answering, Word Embeddings
Published 2018-08-28
URL http://arxiv.org/abs/1808.09891v3
PDF http://arxiv.org/pdf/1808.09891v3.pdf
PWC https://paperswithcode.com/paper/a-quantum-many-body-wave-function-inspired
Repo https://github.com/TJUIRLAB/CIKM2018_QMWFLM
Framework tf

3D Human Pose Estimation with Siamese Equivariant Embedding

Title 3D Human Pose Estimation with Siamese Equivariant Embedding
Authors Márton Véges, Viktor Varga, András Lőrincz
Abstract In monocular 3D human pose estimation a common setup is to first detect 2D positions and then lift the detection into 3D coordinates. Many algorithms suffer from overfitting to camera positions in the training set. We propose a siamese architecture that learns a rotation equivariant hidden representation to reduce the need for data augmentation. Our method is evaluated on multiple databases with different base networks and shows a consistent improvement of error metrics. It achieves state-of-the-art cross-camera error rate among algorithms that use estimated 2D joint coordinates only.
Tasks 3D Human Pose Estimation, Data Augmentation, Pose Estimation
Published 2018-09-19
URL http://arxiv.org/abs/1809.07217v2
PDF http://arxiv.org/pdf/1809.07217v2.pdf
PWC https://paperswithcode.com/paper/3d-human-pose-estimation-with-siamese
Repo https://github.com/vegesm/siamese-pose-estimation
Framework tf

Robustness of Conditional GANs to Noisy Labels

Title Robustness of Conditional GANs to Noisy Labels
Authors Kiran Koshy Thekumparampil, Ashish Khetan, Zinan Lin, Sewoong Oh
Abstract We study the problem of learning conditional generators from noisy labeled samples, where the labels are corrupted by random noise. A standard training of conditional GANs will not only produce samples with wrong labels, but also generate poor quality samples. We consider two scenarios, depending on whether the noise model is known or not. When the distribution of the noise is known, we introduce a novel architecture which we call Robust Conditional GAN (RCGAN). The main idea is to corrupt the label of the generated sample before feeding to the adversarial discriminator, forcing the generator to produce samples with clean labels. This approach of passing through a matching noisy channel is justified by corresponding multiplicative approximation bounds between the loss of the RCGAN and the distance between the clean real distribution and the generator distribution. This shows that the proposed approach is robust, when used with a carefully chosen discriminator architecture, known as projection discriminator. When the distribution of the noise is not known, we provide an extension of our architecture, which we call RCGAN-U, that learns the noise model simultaneously while training the generator. We show experimentally on MNIST and CIFAR-10 datasets that both the approaches consistently improve upon baseline approaches, and RCGAN-U closely matches the performance of RCGAN.
Tasks
Published 2018-11-08
URL http://arxiv.org/abs/1811.03205v1
PDF http://arxiv.org/pdf/1811.03205v1.pdf
PWC https://paperswithcode.com/paper/robustness-of-conditional-gans-to-noisy
Repo https://github.com/POLane16/Robust-Conditional-GAN
Framework tf
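
The main idea in the RCGAN abstract, corrupting the label of a generated sample before it reaches the discriminator, can be illustrated as sampling from one row of a known confusion matrix. This is only a minimal sketch: the function `corrupt_label`, the 3-class matrix `C`, and the example labels are hypothetical and are not taken from the paper or its repository.

```python
import random

def corrupt_label(clean_label, confusion, rng):
    """Sample a noisy label from row `clean_label` of a row-stochastic confusion matrix."""
    row = confusion[clean_label]
    return rng.choices(range(len(row)), weights=row, k=1)[0]

# Hypothetical noise model (assumption): each label is kept with probability 0.8.
C = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

rng = random.Random(0)
generator_labels = [0, 1, 2, 1]  # labels the generator was conditioned on
noisy_labels = [corrupt_label(y, C, rng) for y in generator_labels]
# In an RCGAN-style setup the discriminator would see (generated sample, noisy label) pairs,
# so that matching the noisy real data requires the generator to respect the clean label.
```

With an identity matrix in place of `C`, the labels pass through unchanged, which corresponds to ordinary conditional GAN training.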

An Attention-Gated Convolutional Neural Network for Sentence Classification

Title An Attention-Gated Convolutional Neural Network for Sentence Classification
Authors Yang Liu, Lixin Ji, Ruiyang Huang, Tuosiyu Ming, Chao Gao, Jianpeng Zhang
Abstract The classification of sentences is very challenging, since sentences contain limited contextual information. In this paper, we propose an Attention-Gated Convolutional Neural Network (AGCNN) for sentence classification, which generates attention weights from the feature’s context windows of different sizes by using specialized convolution encoders. It makes full use of limited contextual information to extract and enhance the influence of important features in predicting the sentence’s category. Experimental results demonstrate that our model can achieve up to 3.1% higher accuracy than standard CNN models, and gain competitive results over the baselines on four out of the six tasks. In addition, we design an activation function, namely, the Natural Logarithm rescaled Rectified Linear Unit (NLReLU). Experiments show that NLReLU can outperform ReLU and is comparable to other well-known activation functions on AGCNN.
Tasks Sentence Classification
Published 2018-08-22
URL http://arxiv.org/abs/1808.07325v3
PDF http://arxiv.org/pdf/1808.07325v3.pdf
PWC https://paperswithcode.com/paper/an-attention-gated-convolutional-neural
Repo https://github.com/fabyangliu/AGCNN_sentence_classification
Framework tf
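
The abstract names the NLReLU activation without giving its formula. The sketch below assumes the form ln(beta * max(0, x) + 1), which matches the name "natural logarithm rescaled ReLU"; both that form and the default `beta=1.0` are assumptions here and should be checked against the paper before use.

```python
import math

def nlrelu(x, beta=1.0):
    """Assumed NLReLU form: natural log of a scaled ReLU output, shifted so that nlrelu(0) == 0."""
    return math.log(beta * max(0.0, x) + 1.0)

# Negative inputs map to 0 like ReLU; positive inputs grow logarithmically.
values = [nlrelu(x) for x in (-2.0, 0.0, 1.0, 10.0)]
```

Under this assumed form the output is compressed for large positive inputs, unlike plain ReLU, which grows linearly.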

Distributionally Adversarial Attack

Title Distributionally Adversarial Attack
Authors Tianhang Zheng, Changyou Chen, Kui Ren
Abstract Recent work on adversarial attack has shown that Projected Gradient Descent (PGD) Adversary is a universal first-order adversary, and the classifier adversarially trained by PGD is robust against a wide range of first-order attacks. It is worth noting that the original objective of an attack/defense model relies on a data distribution $p(\mathbf{x})$, typically in the form of risk maximization/minimization, e.g., $\max/\min \mathbb{E}_{p(\mathbf{x})}\mathcal{L}(\mathbf{x})$ with $p(\mathbf{x})$ some unknown data distribution and $\mathcal{L}(\cdot)$ a loss function. However, since PGD generates attack samples independently for each data sample based on $\mathcal{L}(\cdot)$, the procedure does not necessarily lead to good generalization in terms of risk optimization. In this paper, we achieve the goal by proposing distributionally adversarial attack (DAA), a framework to solve for an optimal adversarial-data distribution, a perturbed distribution that satisfies the $L_\infty$ constraint but deviates from the original data distribution to increase the generalization risk maximally. Algorithmically, DAA performs optimization on the space of potential data distributions, which introduces direct dependency between all data points when generating adversarial samples. DAA is evaluated by attacking state-of-the-art defense models, including the adversarially-trained models provided by MIT MadryLab. Notably, DAA ranks first on MadryLab’s white-box leaderboards, reducing the accuracy of their secret MNIST model to $88.79\%$ (with $l_\infty$ perturbations of $\epsilon = 0.3$) and the accuracy of their secret CIFAR model to $44.71\%$ (with $l_\infty$ perturbations of $\epsilon = 8.0$). Code for the experiments is released at https://github.com/tianzheng4/Distributionally-Adversarial-Attack.
Tasks Adversarial Attack
Published 2018-08-16
URL http://arxiv.org/abs/1808.05537v3
PDF http://arxiv.org/pdf/1808.05537v3.pdf
PWC https://paperswithcode.com/paper/distributionally-adversarial-attack
Repo https://github.com/tianzheng4/Distributionally-Adversarial-Attack
Framework tf
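
For context, the per-sample PGD baseline that the abstract contrasts with DAA can be sketched as a signed-gradient step followed by projection onto the L-infinity ball. This is an illustration of plain PGD, not of DAA itself; the function name `pgd_step`, the toy inputs, and the gradient values are hypothetical, and clipping to the valid pixel range is omitted.

```python
def pgd_step(x_adv, x_orig, grad, step_size, epsilon):
    """One signed-gradient ascent step, projected back into the L-infinity ball around x_orig."""
    def sign(g):
        return (g > 0) - (g < 0)

    projected = []
    for xa, xo, g in zip(x_adv, x_orig, grad):
        moved = xa + step_size * sign(g)
        # Projection: keep every coordinate within epsilon of the original input.
        projected.append(min(max(moved, xo - epsilon), xo + epsilon))
    return projected

x = [0.5, 0.5, 0.5]
loss_grad = [2.0, -1.0, 0.0]  # hypothetical gradient of the loss w.r.t. the input
x_adv = pgd_step(x, x, loss_grad, step_size=0.1, epsilon=0.05)
```

Each sample is perturbed using only its own gradient, which is the independence the abstract points to; DAA instead couples the samples through an optimization over data distributions.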

Improved Network Robustness with Adversary Critic

Title Improved Network Robustness with Adversary Critic
Authors Alexander Matyasko, Lap-Pui Chau
Abstract Ideally, what confuses a neural network should be confusing to humans. However, recent experiments have shown that small, imperceptible perturbations can change the network prediction. To address this gap in perception, we propose a novel approach for learning a robust classifier. Our main idea is: adversarial examples for the robust classifier should be indistinguishable from the regular data of the adversarial target. We formulate the problem of learning a robust classifier in the framework of Generative Adversarial Networks (GAN), where the adversarial attack on the classifier acts as a generator, and the critic network learns to distinguish between regular and adversarial images. The classifier cost is augmented with the objective that its adversarial examples should confuse the adversary critic. To improve the stability of the adversarial mapping, we introduce an adversarial cycle-consistency constraint which ensures that the adversarial mapping of the adversarial examples is close to the original. In the experiments, we show the effectiveness of our defense. In terms of robustness, our method surpasses networks trained with adversarial training. Additionally, we verify in experiments with human annotators on MTurk that adversarial examples are indeed visually confusing. Code for the project is available at https://github.com/aam-at/adversary_critic.
Tasks Adversarial Attack
Published 2018-10-30
URL http://arxiv.org/abs/1810.12576v1
PDF http://arxiv.org/pdf/1810.12576v1.pdf
PWC https://paperswithcode.com/paper/improved-network-robustness-with-adversary
Repo https://github.com/aam-at/adversary_critic
Framework tf

Multinomial Adversarial Networks for Multi-Domain Text Classification

Title Multinomial Adversarial Networks for Multi-Domain Text Classification
Authors Xilun Chen, Claire Cardie
Abstract Many text classification tasks are known to be highly domain-dependent. Unfortunately, the availability of training data can vary drastically across domains. Worse still, for some domains there may not be any annotated data at all. In this work, we propose a multinomial adversarial network (MAN) to tackle the text classification problem in this real-world multidomain setting (MDTC). We provide theoretical justifications for the MAN framework, proving that different instances of MANs are essentially minimizers of various f-divergence metrics (Ali and Silvey, 1966) among multiple probability distributions. MANs are thus a theoretically sound generalization of traditional adversarial networks that discriminate over two distributions. More specifically, for the MDTC task, MAN learns features that are invariant across multiple domains by resorting to its ability to reduce the divergence among the feature distributions of each domain. We present experimental results showing that MANs significantly outperform the prior art on the MDTC task. We also show that MANs achieve state-of-the-art performance for domains with no labeled data.
Tasks Cross-Domain Text Classification, Domain Adaptation, Text Classification, Unsupervised Domain Adaptation
Published 2018-02-15
URL http://arxiv.org/abs/1802.05694v1
PDF http://arxiv.org/pdf/1802.05694v1.pdf
PWC https://paperswithcode.com/paper/multinomial-adversarial-networks-for-multi
Repo https://github.com/ccsasuke/man
Framework pytorch

Training Generative Adversarial Networks with Binary Neurons by End-to-end Backpropagation

Title Training Generative Adversarial Networks with Binary Neurons by End-to-end Backpropagation
Authors Hao-Wen Dong, Yi-Hsuan Yang
Abstract We propose the BinaryGAN, a novel generative adversarial network (GAN) that uses binary neurons at the output layer of the generator. We employ the sigmoid-adjusted straight-through estimators to estimate the gradients for the binary neurons and train the whole network by end-to-end backpropagation. The proposed model is able to directly generate binary-valued predictions at test time. We implement such a model to generate binarized MNIST digits and experimentally compare the performance for different types of binary neurons, GAN objectives and network architectures. Although the results are still preliminary, we show that it is possible to train a GAN that has binary neurons and that the use of gradient estimators can be a promising direction for modeling discrete distributions with GANs. For reproducibility, the source code is available at https://github.com/salu133445/binarygan .
Tasks
Published 2018-10-10
URL http://arxiv.org/abs/1810.04714v1
PDF http://arxiv.org/pdf/1810.04714v1.pdf
PWC https://paperswithcode.com/paper/training-generative-adversarial-networks-with-1
Repo https://github.com/torchgan/model-zoo
Framework pytorch
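
The sigmoid-adjusted straight-through estimator mentioned in the BinaryGAN abstract can be sketched for a single deterministic binary neuron: the forward pass applies a hard threshold, while the backward pass substitutes the derivative of the sigmoid for the threshold's zero gradient. The function names below are hypothetical, and the sketch uses manual derivatives instead of an autograd framework.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_neuron_forward(x):
    """Deterministic binary neuron: threshold the sigmoid of the pre-activation at 0.5."""
    return 1.0 if sigmoid(x) > 0.5 else 0.0

def binary_neuron_backward(x, upstream_grad):
    """Sigmoid-adjusted straight-through estimate: treat the threshold as if it were the sigmoid."""
    s = sigmoid(x)
    return upstream_grad * s * (1.0 - s)

outputs = [binary_neuron_forward(x) for x in (-2.0, 0.3, 2.0)]
grad_at_zero = binary_neuron_backward(0.0, upstream_grad=1.0)
```

A stochastic binary neuron would instead sample the output from a Bernoulli distribution with probability `sigmoid(x)` in the forward pass; the backward estimate stays the same.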

Langevin-gradient parallel tempering for Bayesian neural learning

Title Langevin-gradient parallel tempering for Bayesian neural learning
Authors Rohitash Chandra, Konark Jain, Ratneel V. Deo, Sally Cripps
Abstract Bayesian neural learning features a rigorous approach to estimation and uncertainty quantification via the posterior distribution of weights that represent knowledge of the neural network. This not only provides point estimates of the optimal set of weights but also the ability to quantify uncertainty in decision making using the posterior distribution. Markov chain Monte Carlo (MCMC) techniques are typically used to obtain sample-based estimates of the posterior distribution. However, these techniques face challenges in convergence and scalability, particularly in settings with large datasets and network architectures. This paper addresses these challenges in two ways. First, parallel tempering is used to explore multiple modes of the posterior distribution and is implemented on a multi-core computing architecture. Second, we make within-chain sampling schemes more efficient by using Langevin gradient information in forming Metropolis-Hastings proposal distributions. We demonstrate the techniques using time series prediction and pattern classification applications. The results show that the method not only improves the computational time, but also provides better prediction or decision making capabilities when compared to related methods.
Tasks Decision Making, Time Series, Time Series Prediction
Published 2018-11-11
URL http://arxiv.org/abs/1811.04343v1
PDF http://arxiv.org/pdf/1811.04343v1.pdf
PWC https://paperswithcode.com/paper/langevin-gradient-parallel-tempering-for
Repo https://github.com/sydney-machine-learning/surrogate-assisted-parallel-tempering
Framework tf
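
The second idea in the abstract, using Langevin gradient information to form Metropolis-Hastings proposals, can be illustrated on a one-dimensional toy target. This is a minimal sketch under stated assumptions: the standard-normal target, the step size `lr`, the noise scale `sigma`, and all function names are hypothetical stand-ins for a neural network posterior, and parallel tempering is not shown.

```python
import math
import random

def log_target(theta):
    """Toy log-posterior (assumption): standard normal, up to an additive constant."""
    return -0.5 * theta * theta

def grad_log_target(theta):
    return -theta

def log_proposal(to, frm, lr, sigma):
    """Log density (up to a constant) of proposing `to` from `frm` after one gradient step."""
    mean = frm + lr * grad_log_target(frm)
    return -0.5 * ((to - mean) / sigma) ** 2

def langevin_mh_step(theta, lr, sigma, rng):
    """Propose around a gradient-shifted mean, then apply the Metropolis-Hastings correction."""
    mean = theta + lr * grad_log_target(theta)
    proposal = rng.gauss(mean, sigma)
    log_alpha = (log_target(proposal) - log_target(theta)
                 + log_proposal(theta, proposal, lr, sigma)
                 - log_proposal(proposal, theta, lr, sigma))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return proposal, True
    return theta, False

rng = random.Random(0)
theta, samples = 3.0, []
for _ in range(2000):
    theta, _accepted = langevin_mh_step(theta, lr=0.1, sigma=0.5, rng=rng)
    samples.append(theta)
```

Because the proposal mean depends on the gradient at the current point, the proposal is asymmetric, which is why both `log_proposal` terms appear in the acceptance ratio.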

Memory Efficient Experience Replay for Streaming Learning

Title Memory Efficient Experience Replay for Streaming Learning
Authors Tyler L. Hayes, Nathan D. Cahill, Christopher Kanan
Abstract In supervised machine learning, an agent is typically trained once and then deployed. While this works well for static settings, robots often operate in changing environments and must quickly learn new things from data streams. In this paradigm, known as streaming learning, a learner is trained online, in a single pass, from a data stream that cannot be assumed to be independent and identically distributed (iid). Streaming learning will cause conventional deep neural networks (DNNs) to fail for two reasons: 1) they need multiple passes through the entire dataset; and 2) non-iid data will cause catastrophic forgetting. An old fix to both of these issues is rehearsal. To learn a new example, rehearsal mixes it with previous examples, and then this mixture is used to update the DNN. Full rehearsal is slow and memory intensive because it stores all previously observed examples, and its effectiveness for preventing catastrophic forgetting has not been studied in modern DNNs. Here, we describe the ExStream algorithm for memory efficient rehearsal and compare it to alternatives. We find that full rehearsal can eliminate catastrophic forgetting in a variety of streaming learning settings, with ExStream performing well using far less memory and computation.
Tasks
Published 2018-09-16
URL http://arxiv.org/abs/1809.05922v2
PDF http://arxiv.org/pdf/1809.05922v2.pdf
PWC https://paperswithcode.com/paper/memory-efficient-experience-replay-for
Repo https://github.com/tyler-hayes/ExStream
Framework pytorch
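
The rehearsal procedure described in the abstract, mixing each new example with previously seen ones before updating the network, can be sketched with a simple buffer. This shows full rehearsal only (every example is stored), not the ExStream algorithm; the class `FullRehearsalBuffer` and the `replay_size` parameter are hypothetical names chosen for illustration.

```python
import random

class FullRehearsalBuffer:
    """Stores every observed example and mixes each new one with sampled old ones."""

    def __init__(self, rng):
        self.examples = []
        self.rng = rng

    def make_batch(self, new_example, replay_size):
        # Draw up to `replay_size` previously stored examples without replacement.
        k = min(replay_size, len(self.examples))
        batch = self.rng.sample(self.examples, k) + [new_example]
        # Full rehearsal keeps everything, so memory grows with the stream.
        self.examples.append(new_example)
        return batch

buffer = FullRehearsalBuffer(random.Random(0))
# A toy stream of (feature, label) pairs; each batch would drive one model update.
batches = [buffer.make_batch((x, x % 2), replay_size=3) for x in range(6)]
```

The unbounded growth of `examples` is the memory cost the abstract attributes to full rehearsal; a memory-efficient variant would cap or compress this buffer.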

Multi-Scale Deep Compressive Sensing Network

Title Multi-Scale Deep Compressive Sensing Network
Authors Thuong Nguyen Canh, Byeungwoo Jeon
Abstract With joint learning of sampling and recovery, deep learning-based compressive sensing (DCS) has shown significant improvement in performance and running time reduction. Its reconstructed image, however, loses high-frequency content, especially at low subrates. This happens similarly in the multi-scale sampling scheme, which also samples more low-frequency components. In this paper, we propose a multi-scale DCS convolutional neural network (MS-DCSNet) in which we convert the image signal using a multiple scale-based wavelet transform, then capture it through convolution block by block across scales. The initial reconstructed image is directly recovered from multi-scale measurements. Multi-scale wavelet convolution is utilized to enhance the final reconstruction quality. The network is able to learn both multi-scale sampling and multi-scale reconstruction, thus resulting in better reconstruction quality.
Tasks Compressive Sensing
Published 2018-09-15
URL http://arxiv.org/abs/1809.05717v2
PDF http://arxiv.org/pdf/1809.05717v2.pdf
PWC https://paperswithcode.com/paper/multi-scale-deep-compressive-sensing-network
Repo https://github.com/AtenaKid/MS-DCSNet-Release
Framework none

Benchmark data and method for real-time people counting in cluttered scenes using depth sensors

Title Benchmark data and method for real-time people counting in cluttered scenes using depth sensors
Authors ShiJie Sun, Naveed Akhtar, HuanSheng Song, ChaoYang Zhang, JianXin Li, Ajmal Mian
Abstract Vision-based automatic counting of people has widespread applications in intelligent transportation systems, security, and logistics. However, there is currently no large-scale public dataset for benchmarking approaches on this problem. This work fills this gap by introducing the first real-world RGB-D People Counting DataSet (PCDS) containing over 4,500 videos recorded at the entrance doors of buses in normal and cluttered conditions. It also proposes an efficient method for counting people in real-world cluttered scenes related to public transportation using depth videos. The proposed method computes a point cloud from the depth video frame and re-projects it onto the ground plane to normalize the depth information. The resulting depth image is analyzed to identify potential human heads. The human head proposals are meticulously refined using a 3D human model. The proposals in each frame of the continuous video stream are tracked to trace their trajectories. The trajectories are again refined to ensure reliable counting. People are eventually counted by accumulating the head trajectories leaving the scene. To enable effective head and trajectory identification, we also propose two different compound features. A thorough evaluation on PCDS demonstrates that our technique is able to count people in cluttered scenes with high accuracy at 45 fps on a 1.7 GHz processor, and hence it can be deployed for effective real-time people counting for intelligent transportation systems.
Tasks
Published 2018-04-12
URL http://arxiv.org/abs/1804.04339v2
PDF http://arxiv.org/pdf/1804.04339v2.pdf
PWC https://paperswithcode.com/paper/benchmark-data-and-method-for-real-time
Repo https://github.com/shijieS/people-counting-dataset
Framework none