January 25, 2020

2942 words 14 mins read

Paper Group NAWR 7

Building English-to-Serbian Machine Translation System for IMDb Movie Reviews. Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses. Limited Data Rolling Bearing Fault Diagnosis with Few-shot Learning. Coloring With Limited Data: Few-Shot Colorization via Memory Augmented Networks. Scalable Bayesian inferen …

Building English-to-Serbian Machine Translation System for IMDb Movie Reviews

Title Building English-to-Serbian Machine Translation System for IMDb Movie Reviews
Authors Pintu Lohar, Maja Popović, Andy Way
Abstract This paper reports the results of the first experiment dealing with the challenges of building a machine translation system for user-generated content involving a complex South Slavic language. We focus on translation of English IMDb user movie reviews into Serbian, in a low-resource scenario. We explore the potential and limits of (i) phrase-based and neural machine translation systems trained on out-of-domain clean parallel data from news articles and (ii) creating an additional synthetic in-domain parallel corpus by machine-translating the English IMDb corpus into Serbian. Our main findings are that morphology and syntax are better handled by the neural approach than by the phrase-based approach even in this low-resource mismatched-domain scenario; however, the situation is different for the lexical aspect, especially for person names. This finding also indicates that, in general, machine translation of person names into Slavic languages (especially those which require/allow transcription) should be investigated more systematically.
Tasks Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3715/
PDF https://www.aclweb.org/anthology/W19-3715
PWC https://paperswithcode.com/paper/building-english-to-serbian-machine
Repo https://github.com/m-popovic/imdb-corpus-for-MT
Framework none

Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses

Title Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses
Authors Jerome Rony, Luiz G. Hafemann, Luiz S. Oliveira, Ismail Ben Ayed, Robert Sabourin, Eric Granger
Abstract Research on adversarial examples in computer vision tasks has shown that small, often imperceptible changes to an image can induce misclassification, which has security implications for a wide range of image processing systems. Considering L2 norm distortions, the Carlini and Wagner attack is presently the most effective white-box attack in the literature. However, this method is slow since it performs a line-search for one of the optimization terms, and often requires thousands of iterations. In this paper, an efficient approach is proposed to generate gradient-based attacks that induce misclassifications with low L2 norm, by decoupling the direction and the norm of the adversarial perturbation that is added to the image. Experiments conducted on the MNIST, CIFAR-10 and ImageNet datasets indicate that our attack achieves comparable results to the state-of-the-art (in terms of L2 norm) with considerably fewer iterations (as few as 100 iterations), which opens the possibility of using these attacks for adversarial training. Models trained with our attack achieve state-of-the-art robustness against white-box gradient-based L2 attacks on the MNIST and CIFAR-10 datasets, outperforming the Madry defense when the attacks are limited to a maximum norm.
Tasks Adversarial Attack, Adversarial Defense
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Rony_Decoupling_Direction_and_Norm_for_Efficient_Gradient-Based_L2_Adversarial_Attacks_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Rony_Decoupling_Direction_and_Norm_for_Efficient_Gradient-Based_L2_Adversarial_Attacks_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/decoupling-direction-and-norm-for-efficient-1
Repo https://github.com/jeromerony/fast_adversarial
Framework pytorch
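
The entry lists PyTorch, so here is a minimal PyTorch sketch of the decoupled update the abstract describes: a normalized gradient step sets the perturbation's direction, while a separate norm budget shrinks when the example is already adversarial and grows when it is not. The step size `alpha`, the norm-adjustment factor `gamma`, and the assumption of 4-D image batches in [0, 1] are illustrative choices, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def ddn_attack(model, x, y, steps=100, alpha=0.05, gamma=0.05):
    """Hedged sketch of a decoupled direction-and-norm L2 attack.

    Assumes x is a 4-D image batch in [0, 1] and model(x) returns logits.
    Hyper-parameters are illustrative, not the paper's.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    eps = torch.ones(x.size(0), device=x.device)          # per-sample norm budget

    for _ in range(steps):
        x_adv = (x + delta).clamp(0, 1)
        logits = model(x_adv)
        loss = F.cross_entropy(logits, y)
        grad, = torch.autograd.grad(loss, delta)

        with torch.no_grad():
            # direction: normalized gradient-ascent step on the loss
            g = grad / grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            delta += alpha * g
            # norm: shrink the budget if already adversarial, grow it otherwise
            is_adv = logits.argmax(dim=1) != y
            eps = torch.where(is_adv, eps * (1 - gamma), eps * (1 + gamma))
            # renormalize the perturbation to the current budget
            d_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12)
            delta *= (eps / d_norm).view(-1, 1, 1, 1)

    return (x + delta).clamp(0, 1).detach()
```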

Limited Data Rolling Bearing Fault Diagnosis with Few-shot Learning

Title Limited Data Rolling Bearing Fault Diagnosis with Few-shot Learning
Authors Ansi Zhang, Shaobo Li, Yuxin Cui, Wanli Yang, Rongzhi Dong, Jianjun Hu
Abstract This paper focuses on bearing fault diagnosis with limited training data. A major challenge in fault diagnosis is the infeasibility of obtaining sufficient training samples for every fault type under all working conditions. Recently, deep learning-based fault diagnosis methods have achieved promising results. However, most of these methods require a large amount of training data. In this study, we propose a deep neural network based few-shot learning approach for rolling bearing fault diagnosis with limited data. Our model is based on the siamese neural network, which learns by exploiting sample pairs of the same or different categories. Experimental results over the standard Case Western Reserve University (CWRU) bearing fault diagnosis benchmark dataset showed that our few-shot learning approach is more effective in fault diagnosis with limited data availability. When tested over different noise environments with a minimal amount of training data, the performance of our few-shot learning model surpasses that of the baseline at reasonable noise levels. When evaluated over test sets with new fault types or new working conditions, few-shot models work better than the baseline trained with all fault types. All our models and datasets in this study are open sourced and can be downloaded from https://mekhub.cn/as/fault_diagnosis_with_few-shot_learning/ .
Tasks Few-Shot Learning
Published 2019-08-22
URL https://ieeexplore.ieee.org/abstract/document/8793060
PDF https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8793060
PWC https://paperswithcode.com/paper/limited-data-rolling-bearing-fault-diagnosis
Repo https://github.com/SNBQT/Limited-Data-Rolling-Bearing-Fault-Diagnosis-with-Few-shot-Learning
Framework none
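
A hedged sketch of the pair-based siamese setup the abstract describes, written in PyTorch for concreteness (the entry lists no framework). The 1-D CNN layers, embedding size, and signal length are illustrative assumptions; only the overall scheme — a shared encoder plus a same/different head trained on signal pairs — follows the abstract.

```python
import torch
import torch.nn as nn

class SiameseFaultNet(nn.Module):
    """Minimal sketch of a siamese 1-D CNN for vibration-signal pairs.
    A shared encoder embeds each raw segment; a sigmoid head on the
    |difference| of the embeddings predicts whether the pair shares a fault type."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, x1, x2):
        # shared weights: both segments pass through the same encoder
        e1, e2 = self.encoder(x1), self.encoder(x2)
        return torch.sigmoid(self.head(torch.abs(e1 - e2))).squeeze(-1)

# pair-based training step: label 1 for same fault type, 0 otherwise
model = SiameseFaultNet()
x1, x2 = torch.randn(8, 1, 2048), torch.randn(8, 1, 2048)
same = torch.randint(0, 2, (8,)).float()
loss = nn.functional.binary_cross_entropy(model(x1, x2), same)
loss.backward()
```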

Coloring With Limited Data: Few-Shot Colorization via Memory Augmented Networks

Title Coloring With Limited Data: Few-Shot Colorization via Memory Augmented Networks
Authors Seungjoo Yoo, Hyojin Bahng, Sunghyo Chung, Junsoo Lee, Jaehyuk Chang, Jaegul Choo
Abstract Despite recent advancements, deep learning-based automatic colorization models are still limited when it comes to few-shot learning. Existing models require a significant amount of training data. To tackle this issue, we present a novel memory-augmented colorization model MemoPainter that can produce high-quality colorization with limited data. In particular, our model is able to capture rare instances and successfully colorize them. Also, we propose a novel threshold triplet loss that enables unsupervised training of memory networks without the need for class labels. Experiments show that our model has superior quality in both few-shot and one-shot colorization tasks.
Tasks Colorization, Few-Shot Learning
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Yoo_Coloring_With_Limited_Data_Few-Shot_Colorization_via_Memory_Augmented_Networks_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Yoo_Coloring_With_Limited_Data_Few-Shot_Colorization_via_Memory_Augmented_Networks_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/coloring-with-limited-data-few-shot
Repo https://github.com/dongheehand/MemoPainter-PyTorch
Framework pytorch
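
A hedged sketch of what a threshold-based triplet loss for a memory network can look like: instead of class labels, memory slots whose color distance to the query falls below a threshold are treated as positives and the rest as negatives. The cosine-similarity formulation, the threshold `tau`, and the margin are illustrative assumptions, not MemoPainter's exact loss.

```python
import torch
import torch.nn.functional as F

def threshold_triplet_loss(query, keys, color_dist, tau=0.7, margin=0.1):
    """query: (B, D) image features, keys: (M, D) memory keys,
    color_dist: (B, M) distances between each image's color feature and each
    slot's stored color value (its computation is not shown here).
    Slots with color_dist < tau play the role of same-class positives."""
    sim = F.normalize(query, dim=1) @ F.normalize(keys, dim=1).t()   # (B, M) cosine similarity
    pos_mask = color_dist < tau
    # most similar positive and most similar negative key for each query
    pos_sim = sim.masked_fill(~pos_mask, float('-inf')).max(dim=1).values
    neg_sim = sim.masked_fill(pos_mask, float('-inf')).max(dim=1).values
    valid = pos_mask.any(dim=1) & (~pos_mask).any(dim=1)   # need at least one of each
    return F.relu(neg_sim - pos_sim + margin)[valid].mean()

# toy usage with random features and color distances
loss = threshold_triplet_loss(torch.randn(16, 64), torch.randn(100, 64), torch.rand(16, 100))
```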

Scalable Bayesian inference of dendritic voltage via spatiotemporal recurrent state space models

Title Scalable Bayesian inference of dendritic voltage via spatiotemporal recurrent state space models
Authors Ruoxi Sun, Ian Kinsella, Scott Linderman, Liam Paninski
Abstract Recent advances in optical voltage sensors have brought us closer to a critical goal in cellular neuroscience: imaging the full spatiotemporal voltage on a dendritic tree. However, current sensors and imaging approaches still face significant limitations in SNR and sampling frequency; therefore statistical denoising and interpolation methods remain critical for understanding single-trial spatiotemporal dendritic voltage dynamics. Previous denoising approaches were either based on an inadequate linear voltage model or scaled poorly to large trees. Here we introduce a scalable fully Bayesian approach. We develop a generative nonlinear model that requires few parameters per compartment of the cell but is nonetheless flexible enough to sample realistic spatiotemporal data. The model captures different dynamics in each compartment and leverages biophysical knowledge to constrain intra- and inter-compartmental dynamics. We obtain a full posterior distribution over spatiotemporal voltage via an augmented Gibbs sampling algorithm. The nonlinear smoother model outperforms previously developed linear methods, and scales to much larger systems than previous methods based on sequential Monte Carlo approaches.
Tasks Bayesian Inference, Denoising
Published 2019-12-01
URL http://papers.nips.cc/paper/9206-scalable-bayesian-inference-of-dendritic-voltage-via-spatiotemporal-recurrent-state-space-models
PDF http://papers.nips.cc/paper/9206-scalable-bayesian-inference-of-dendritic-voltage-via-spatiotemporal-recurrent-state-space-models.pdf
PWC https://paperswithcode.com/paper/scalable-bayesian-inference-of-dendritic
Repo https://github.com/SunRuoxi/Voltage_
Framework none

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Title Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning
Authors Gregory Farquhar, Shimon Whiteson, Jakob Foerster
Abstract Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our estimator in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.
Tasks Continuous Control
Published 2019-12-01
URL http://papers.nips.cc/paper/9026-loaded-dice-trading-off-bias-and-variance-in-any-order-score-function-gradient-estimators-for-reinforcement-learning
PDF http://papers.nips.cc/paper/9026-loaded-dice-trading-off-bias-and-variance-in-any-order-score-function-gradient-estimators-for-reinforcement-learning.pdf
PWC https://paperswithcode.com/paper/loaded-dice-trading-off-bias-and-variance-in-1
Repo https://github.com/oxwhirl/loaded-dice
Framework none
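
The objective builds on the DiCE "magic box" operator, which evaluates to 1 in the forward pass but regenerates the score-function terms under repeated differentiation. Below is a hedged, first-order PyTorch sketch of a DiCE-style surrogate for a single trajectory; Loaded DiCE's actual contributions (arbitrary advantage estimators and discounting of distant causal dependencies) are deliberately omitted.

```python
import torch

def magic_box(log_probs):
    """DiCE 'magic box': forward value 1, but differentiating it reproduces
    the score-function (log-probability) gradient terms at every order."""
    return torch.exp(log_probs - log_probs.detach())

def dice_surrogate(log_probs, rewards):
    """Sketch of a first-order DiCE-style surrogate for one trajectory.
    log_probs[t] = log pi(a_t | s_t), rewards[t] = reward at step t."""
    # reward at time t depends on all actions taken up to and including t
    causal_logp = torch.cumsum(log_probs, dim=0)
    return (magic_box(causal_logp) * rewards).sum()

# toy usage: gradients of the surrogate are policy-gradient estimates
logits = torch.randn(5, 3, requires_grad=True)
actions = torch.randint(0, 3, (5,))
log_probs = torch.log_softmax(logits, dim=1)[torch.arange(5), actions]
rewards = torch.randn(5)
dice_surrogate(log_probs, rewards).backward()
```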

Distilling Discrimination and Generalization Knowledge for Event Detection via Delta-Representation Learning

Title Distilling Discrimination and Generalization Knowledge for Event Detection via Delta-Representation Learning
Authors Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
Abstract Event detection systems rely on discrimination knowledge to distinguish ambiguous trigger words and generalization knowledge to detect unseen/sparse trigger words. Current neural event detection approaches focus on trigger-centric representations, which work well on distilling discrimination knowledge, but poorly on learning generalization knowledge. To address this problem, this paper proposes a Delta-learning approach to distill discrimination and generalization knowledge by effectively decoupling, incrementally learning and adaptively fusing event representation. Experiments show that our method significantly outperforms previous approaches on unseen/sparse trigger words, and achieves state-of-the-art performance on both ACE2005 and KBP2017 datasets.
Tasks Representation Learning
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1429/
PDF https://www.aclweb.org/anthology/P19-1429
PWC https://paperswithcode.com/paper/distilling-discrimination-and-generalization
Repo https://github.com/luyaojie/delta-learning-for-ed
Framework pytorch

A Self-Training Approach for Short Text Clustering

Title A Self-Training Approach for Short Text Clustering
Authors Amir Hadifar, Lucas Sterckx, Thomas Demeester, Chris Develder
Abstract Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations of the short texts. Low-dimensional continuous representations or embeddings can counter that sparseness problem: their high representational power is exploited in deep clustering algorithms. While deep clustering has been studied extensively in computer vision, relatively little work has focused on NLP. The method we propose learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update weights of the encoder network. Experiments on three short text datasets empirically validate the effectiveness of our method.
Tasks Sentence Embedding, Text Clustering
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4322/
PDF https://www.aclweb.org/anthology/W19-4322
PWC https://paperswithcode.com/paper/a-self-training-approach-for-short-text
Repo https://github.com/hadifar/stc_clustering
Framework none
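
A hedged sketch of the self-training step: soft cluster assignments are sharpened into a target distribution, and the encoder is updated to match it (the DEC-style recipe this kind of method builds on). The stand-in `encoder`, the sentence-embedding step that would precede it, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def soft_assign(z, centroids, alpha=1.0):
    """Student's-t soft assignment of embeddings to cluster centroids (DEC-style)."""
    d2 = torch.cdist(z, centroids).pow(2)
    q = (1.0 + d2 / alpha).pow(-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """Sharpened targets: emphasize confident assignments, normalize per cluster."""
    w = q.pow(2) / q.sum(dim=0)
    return w / w.sum(dim=1, keepdim=True)

# toy self-training step on random "sentence embeddings"; in the paper these
# would come from an autoencoder's encoder applied to weighted word vectors
encoder = torch.nn.Linear(50, 10)                       # stand-in for the encoder network
centroids = torch.randn(4, 10, requires_grad=True)      # K = 4 cluster centres
emb = torch.randn(32, 50)
q = soft_assign(encoder(emb), centroids)
loss = F.kl_div(q.log(), target_distribution(q).detach(), reduction='batchmean')
loss.backward()
```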

Octave Deep Plane-Sweeping Network: Reducing Spatial Redundancy for Learning-Based Plane-Sweeping Stereo

Title Octave Deep Plane-Sweeping Network: Reducing Spatial Redundancy for Learning-Based Plane-Sweeping Stereo
Authors R. Komatsu, H. Fujii, Y. Tamura, A. Yamashita, H. Asama
Abstract In this paper, we propose the octave deep plane-sweeping network (OctDPSNet). OctDPSNet is a novel learning-based plane-sweeping stereo method, which drastically reduces the required GPU memory and computation time while achieving a state-of-the-art depth estimation accuracy. Inspired by octave convolution, we divide image features into high and low spatial frequency features, and two cost volumes are generated from these using our proposed plane-sweeping module. To reduce spatial redundancy, the resolution of the cost volume from the low spatial frequency features is set to half that of the high spatial frequency features, which enables the memory consumption and computational cost to be reduced. After refinement, the two cost volumes are integrated into a final cost volume through our proposed pixel-wise “squeeze-and-excitation” based attention mechanism, and the depth maps are estimated from the final cost volume. We evaluate the proposed model on five datasets: SUN3D, RGB-D SLAM, MVS, Scenes11, and ETH3D. Our model outperforms previous methods on five datasets while drastically reducing the memory consumption and computational cost. Our source code is available at https://github.com/matsuren/octDPSNet.
Tasks Depth Estimation, Stereo Depth Estimation
Published 2019-10-14
URL https://ieeexplore.ieee.org/document/8867874
PDF https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8867874
PWC https://paperswithcode.com/paper/octave-deep-plane-sweeping-network-reducing
Repo https://github.com/matsuren/octDPSNet
Framework pytorch
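
A hedged sketch of the octave-style feature split the abstract credits as its inspiration: some channels stay at full resolution (high spatial frequency) and the rest are kept at half resolution (low spatial frequency), so a cost volume built from the low branch is half the resolution of the high-frequency one. Layer choices, channel counts, and the `alpha` split ratio are illustrative, not OctDPSNet's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveSplit(nn.Module):
    """Split input features into a full-resolution high-frequency branch and a
    half-resolution low-frequency branch, in the spirit of octave convolution."""
    def __init__(self, in_ch=32, out_ch=32, alpha=0.5):
        super().__init__()
        low_ch = int(out_ch * alpha)
        self.to_high = nn.Conv2d(in_ch, out_ch - low_ch, 3, padding=1)
        self.to_low = nn.Conv2d(in_ch, low_ch, 3, padding=1)

    def forward(self, x):
        high = self.to_high(x)                            # (B, C_h, H, W)
        low = self.to_low(F.avg_pool2d(x, 2))             # (B, C_l, H/2, W/2)
        return high, low

feat_high, feat_low = OctaveSplit()(torch.randn(1, 32, 64, 64))
# plane sweeping then builds one cost volume per branch; the half-resolution
# low-frequency volume is where the memory and compute savings come from
```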

Effective Adversarial Regularization for Neural Machine Translation

Title Effective Adversarial Regularization for Neural Machine Translation
Authors Motoki Sato, Jun Suzuki, Shun Kiyono
Abstract A regularization technique based on adversarial perturbation, which was initially developed in the field of image processing, has been successfully applied to text classification tasks and has yielded attractive improvements. We aim to further leverage this promising methodology into more sophisticated and critical neural models in the natural language processing field, i.e., neural machine translation (NMT) models. However, it is not trivial to apply this methodology to such models. Thus, this paper investigates the effectiveness of several possible configurations of applying the adversarial perturbation and reveals that the adversarial regularization technique can significantly and consistently improve the performance of widely used NMT models, such as LSTM-based and Transformer-based models.
Tasks Machine Translation, Text Classification
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1020/
PDF https://www.aclweb.org/anthology/P19-1020
PWC https://paperswithcode.com/paper/effective-adversarial-regularization-for
Repo https://github.com/pfnet-research/vat_nmt
Framework none
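
A hedged sketch of the underlying regularizer: compute the gradient of the loss with respect to the input embeddings, add a norm-bounded perturbation in that direction, and train on the sum of the clean and perturbed losses. The toy stand-in model, the (batch, tokens, dim) embedding shape, and `epsilon` are illustrative assumptions; the paper's actual study compares several configurations for injecting the perturbation into LSTM- and Transformer-based NMT models.

```python
import torch
import torch.nn.functional as F

def adversarial_regularization(model, embeds, targets, epsilon=1.0):
    """Sketch of adversarial regularization on already looked-up embeddings.
    `model` is assumed to map embeddings of shape (B, T, D) to logits; the
    encoder-decoder plumbing of a real NMT system is omitted."""
    embeds = embeds.detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(embeds), targets)
    grad, = torch.autograd.grad(clean_loss, embeds, retain_graph=True)
    # worst-case perturbation of bounded L2 norm in the gradient direction
    r_adv = epsilon * grad / grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1)
    adv_loss = F.cross_entropy(model(embeds + r_adv.detach()), targets)
    return clean_loss + adv_loss

# toy usage with a stand-in "model" that flattens token embeddings and classifies
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(6 * 16, 4))
embeds = torch.randn(8, 6, 16)
targets = torch.randint(0, 4, (8,))
adversarial_regularization(toy_model, embeds, targets).backward()
```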

Difference of Convolution for Deep Compressive Sensing

Title Difference of Convolution for Deep Compressive Sensing
Authors Thuong Nguyen Canh, Byeungwoo Jeon
Abstract Deep learning-based compressive sensing (DCS) has improved on conventional compressive sensing (CS) with fast, high-quality reconstruction. Researchers have further extended it to multi-scale DCS, which improves reconstruction quality through wavelet decomposition. In this work, we mimic the Difference of Gaussians via convolution and propose a scheme named Difference of Convolution-based multi-scale DCS (DoC-DCS). Unlike multi-scale DCS based on well-designed filters in the wavelet domain, the proposed DoC-DCS learns the decomposition and thereby outperforms other state-of-the-art compressive sensing methods.
Tasks Compressive Sensing
Published 2019-09-22
URL https://github.com/AtenaKid/DoC-DCS
PDF https://github.com/AtenaKid/DoC-DCS
PWC https://paperswithcode.com/paper/difference-of-convolution-for-deep
Repo https://github.com/AtenaKid/DoC-DCS
Framework none
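
A hedged sketch of the "difference of convolution" idea named in the title: two learned convolutions with different receptive fields stand in for the two blurs of a Difference-of-Gaussians filter, and their difference acts as a learned band-pass decomposition. This only illustrates that decomposition idea; the actual DoC-DCS sensing and reconstruction architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class DiffOfConv(nn.Module):
    """Learned DoG-like decomposition: a wide convolution gives the base
    (low-frequency) component, and the difference with a narrow convolution
    gives a band-pass (detail) component."""
    def __init__(self, ch=1):
        super().__init__()
        self.narrow = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.wide = nn.Conv2d(ch, ch, kernel_size=7, padding=3)

    def forward(self, x):
        base = self.wide(x)
        detail = self.narrow(x) - base          # learned difference-of-convolution band
        return base, detail

base, detail = DiffOfConv()(torch.randn(1, 1, 64, 64))
```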

Automated characterization of noise distributions in diffusion MRI data

Title Automated characterization of noise distributions in diffusion MRI data
Authors Samuel St-Jean, Alberto De Luca, Chantal M. W. Tax, Max A. Viergever, Alexander Leemans
Abstract Purpose: To understand and characterize noise distributions in parallel imaging for diffusion MRI. Theory and Methods: Two new automated methods using the moments and the maximum likelihood equations of the Gamma distribution were developed. Simulations using stationary and spatially varying noncentral chi noise distributions were created for two diffusion weightings with SENSE or GRAPPA reconstruction and 8, 12 or 32 receiver coils. Furthermore, MRI data of a water phantom with different combinations of multiband and SENSE acceleration were acquired on a 3T scanner along with noise-only measurements. Finally, an in vivo dataset was acquired at 3T using multiband acceleration and GRAPPA reconstruction. Estimation of the noise distribution was performed with the proposed methods and compared with 3 other existing algorithms. Results: Simulations showed that assuming a Rician distribution can lead to misestimation in parallel imaging. Results on the acquired datasets showed that signal leakage in multiband can lead to a misestimation of the parameters. Noise maps are robust to these artifacts, but may misestimate parameters in some cases. The algorithms proposed herein can estimate both parameters of the noise distribution, are robust to signal leakage artifacts and perform best when used on acquired noise maps. Conclusion: Misestimation of the correct noise distribution can hamper further processing such as bias correction and denoising, especially when the measured distribution differs too much from the actual signal distribution, e.g., due to artifacts. The use of noise maps can yield more robust estimates than the use of diffusion-weighted images as input for algorithms.
Tasks Denoising
Published 2019-06-28
URL https://arxiv.org/abs/1906.12121
PDF https://arxiv.org/pdf/1906.12121.pdf
PWC https://paperswithcode.com/paper/automated-characterization-of-noise
Repo https://github.com/samuelstjean/autodmri
Framework none
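
A hedged sketch of fitting a Gamma distribution to noise-only samples by the method of moments, with SciPy's maximum-likelihood fit as a cross-check. The paper's actual equations additionally tie the fitted parameters to the noise standard deviation and the effective number of coils, which this generic sketch does not do.

```python
import numpy as np
from scipy import stats

def gamma_moments(samples):
    """Method-of-moments fit of a Gamma(shape k, scale theta) distribution:
    k = mean^2 / var, theta = var / mean."""
    m, v = samples.mean(), samples.var()
    return m * m / v, v / m

rng = np.random.default_rng(0)
noise = rng.gamma(shape=8.0, scale=2.0, size=100_000)      # stand-in for noise-map samples
k_mom, theta_mom = gamma_moments(noise)
k_mle, _, theta_mle = stats.gamma.fit(noise, floc=0)       # MLE counterpart via scipy
print(k_mom, theta_mom, k_mle, theta_mle)
```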

R2D2: Reliable and Repeatable Detector and Descriptor

Title R2D2: Reliable and Repeatable Detector and Descriptor
Authors Jerome Revaud, Cesar De Souza, Martin Humenberger, Philippe Weinzaepfel
Abstract Interest point detection and local feature description are fundamental steps in many computer vision applications. Classical approaches are based on a detect-then-describe paradigm where separate handcrafted methods are used to first identify repeatable keypoints and then represent them with a local descriptor. Neural networks trained with metric learning losses have recently caught up with these techniques, focusing on learning repeatable saliency maps for keypoint detection or learning descriptors at the detected keypoint locations. In this work, we argue that repeatable regions are not necessarily discriminative and can therefore lead to the selection of suboptimal keypoints. Furthermore, we claim that descriptors should be learned only in regions for which matching can be performed with high confidence. We thus propose to jointly learn keypoint detection and description together with a predictor of the local descriptor discriminativeness. This allows us to avoid ambiguous areas and thus leads to reliable keypoint detection and description. Our detection-and-description approach simultaneously outputs sparse, repeatable and reliable keypoints that outperform state-of-the-art detectors and descriptors on the HPatches dataset and on the recent Aachen Day-Night localization benchmark.
Tasks Interest Point Detection, Keypoint Detection, Metric Learning
Published 2019-12-01
URL http://papers.nips.cc/paper/9407-r2d2-reliable-and-repeatable-detector-and-descriptor
PDF http://papers.nips.cc/paper/9407-r2d2-reliable-and-repeatable-detector-and-descriptor.pdf
PWC https://paperswithcode.com/paper/r2d2-reliable-and-repeatable-detector-and
Repo https://github.com/naver/r2d2
Framework pytorch
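
A hedged sketch of the three dense outputs the abstract describes: unit-norm local descriptors plus per-pixel repeatability and reliability maps, with keypoints taken where both maps are high. The tiny backbone is illustrative and the training losses are omitted; only the output structure follows the abstract.

```python
import torch
import torch.nn as nn

class TinyDetectorDescriptor(nn.Module):
    """Joint detection-and-description head: dense descriptors plus
    repeatability (where keypoints repeat across views) and reliability
    (where descriptor matching can be trusted) maps."""
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, padding=1),
        )
        self.repeatability = nn.Conv2d(dim, 1, 1)
        self.reliability = nn.Conv2d(dim, 1, 1)

    def forward(self, img):
        feat = self.backbone(img)
        desc = nn.functional.normalize(feat, dim=1)        # unit-norm descriptors per pixel
        rep = torch.sigmoid(self.repeatability(feat))
        rel = torch.sigmoid(self.reliability(feat))
        return desc, rep, rel

desc, rep, rel = TinyDetectorDescriptor()(torch.randn(1, 3, 64, 64))
# keypoints would be selected where both repeatability and reliability are high
```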

Sampling Networks and Aggregate Simulation for Online POMDP Planning

Title Sampling Networks and Aggregate Simulation for Online POMDP Planning
Authors Hao (Jackson) Cui, Roni Khardon
Abstract The paper introduces a new algorithm for planning in partially observable Markov decision processes (POMDP) based on the idea of aggregate simulation. The algorithm uses product distributions to approximate the belief state and shows how to build a representation graph of an approximate action-value function over belief space. The graph captures the result of simulating the model in aggregate under independence assumptions, giving a symbolic representation of the value function. The algorithm supports large observation spaces using sampling networks, a representation of the process of sampling values of observations, which is integrated into the graph representation. Following previous work on MDPs, this approach enables action selection in POMDPs through gradient optimization over the graph representation. This approach complements recent algorithms for POMDPs which are based on particle representations of belief states and an explicit search for action selection. Our approach enables scaling to large factored action spaces in addition to large state spaces and observation spaces. An experimental evaluation demonstrates that the algorithm provides excellent performance relative to the state of the art on large POMDP problems.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9121-sampling-networks-and-aggregate-simulation-for-online-pomdp-planning
PDF http://papers.nips.cc/paper/9121-sampling-networks-and-aggregate-simulation-for-online-pomdp-planning.pdf
PWC https://paperswithcode.com/paper/sampling-networks-and-aggregate-simulation
Repo https://github.com/hcui01/SNAP
Framework none

Park: An Open Platform for Learning-Augmented Computer Systems

Title Park: An Open Platform for Learning-Augmented Computer Systems
Authors Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani Shirkoohi, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska, Mohammad Alizadeh
Abstract We present Park, a platform for researchers to experiment with Reinforcement Learning (RL) for computer systems. Using RL to improve the performance of systems has a lot of potential, but is also in many ways very different from, for example, using RL for games. Thus, in this work we first discuss the unique challenges that RL for systems presents, and then propose Park, an open, extensible platform that makes it easier for ML researchers to work on systems problems. Currently, Park consists of 12 real-world system-centric optimization problems with one common, easy-to-use interface. Finally, we present the performance of existing RL approaches on those 12 problems and outline potential areas of future work.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8519-park-an-open-platform-for-learning-augmented-computer-systems
PDF http://papers.nips.cc/paper/8519-park-an-open-platform-for-learning-augmented-computer-systems.pdf
PWC https://paperswithcode.com/paper/park-an-open-platform-for-learning-augmented
Repo https://github.com/park-project/park
Framework tf
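
A hedged sketch of interacting with a Park environment; the gym-style `park.make` / `reset` / `step` interface and the 'load_balance' environment name are assumptions based on the project README rather than a guaranteed API.

```python
# Sketch only: assumes Park exposes a gym-like environment interface.
import park

env = park.make('load_balance')        # one of the system-centric problems
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()             # replace with an RL agent's policy
    obs, reward, done, info = env.step(action)
```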