October 20, 2019

Paper Group AWR 208

Attentive Neural Network for Named Entity Recognition in Vietnamese


Title	Attentive Neural Network for Named Entity Recognition in Vietnamese
Authors	Kim Anh Nguyen, Ngan Dong, Cam-Tu Nguyen
Abstract	We propose an attentive neural network for the task of named entity recognition in Vietnamese. The proposed attentive neural model makes use of character-based language models and word embeddings to encode words as vector representations. A neural network architecture of encoder, attention, and decoder layers is then utilized to encode knowledge of input sentences and to label entity tags. The experimental results show that the proposed attentive neural network achieves the state-of-the-art results on the benchmark named entity recognition datasets in Vietnamese in comparison to both hand-crafted features based models and neural models.
Tasks	Named Entity Recognition, Named Entity Recognition In Vietnamese, Word Embeddings
Published	2018-10-31
URL	https://arxiv.org/abs/1810.13097v2
PDF	https://arxiv.org/pdf/1810.13097v2.pdf
PWC	https://paperswithcode.com/paper/attentive-neural-network-for-named-entity
Repo	https://github.com/minhpqn/vietner
Framework	none

Disentangling Disentanglement in Variational Autoencoders


Title	Disentangling Disentanglement in Variational Autoencoders
Authors	Emile Mathieu, Tom Rainforth, N. Siddharth, Yee Whye Teh
Abstract	We develop a generalisation of disentanglement in VAEs—decomposition of the latent representation—characterising it as the fulfilment of two factors: a) the latent encodings of the data having an appropriate level of overlap, and b) the aggregate encoding of the data conforming to a desired structure, represented through the prior. Decomposition permits disentanglement, i.e. explicit independence between latents, as a special case, but also allows for a much richer class of properties to be imposed on the learnt representation, such as sparsity, clustering, independent subspaces, or even intricate hierarchical dependency relationships. We show that the $\beta$-VAE varies from the standard VAE predominantly in its control of latent overlap and that for the standard choice of an isotropic Gaussian prior, its objective is invariant to rotations of the latent representation. Viewed from the decomposition perspective, breaking this invariance with simple manipulations of the prior can yield better disentanglement with little or no detriment to reconstructions. We further demonstrate how other choices of prior can assist in producing different decompositions and introduce an alternative training objective that allows the control of both decomposition factors in a principled manner.
Tasks
Published	2018-12-06
URL	https://arxiv.org/abs/1812.02833v3
PDF	https://arxiv.org/pdf/1812.02833v3.pdf
PWC	https://paperswithcode.com/paper/disentangling-disentanglement-in-variational
Repo	https://github.com/iffsid/disentangling-disentanglement
Framework	pytorch
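
The rotation invariance mentioned in the abstract above can be illustrated on the KL term alone. The following is a minimal sketch, assuming a single 2-D Gaussian encoding with a full covariance matrix and a standard normal prior; the helper names are hypothetical and are not taken from the authors' repository. It checks numerically that KL(N(mu, Sigma) || N(0, I)) does not change when the latent space is rotated.

```python
import math


def kl_to_standard_normal_2d(mu, cov):
    """KL( N(mu, cov) || N(0, I) ) for a 2-D Gaussian."""
    (a, b), (c, d) = cov
    trace = a + d
    det = a * d - b * c
    sq_norm = mu[0] ** 2 + mu[1] ** 2
    return 0.5 * (trace + sq_norm - 2.0 - math.log(det))


def rotate(mu, cov, theta):
    """Rotate the latent space: mu -> R mu, cov -> R cov R^T."""
    c, s = math.cos(theta), math.sin(theta)
    r = ((c, -s), (s, c))
    new_mu = (r[0][0] * mu[0] + r[0][1] * mu[1],
              r[1][0] * mu[0] + r[1][1] * mu[1])
    # tmp = R @ cov
    tmp = [[sum(r[i][k] * cov[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    # new_cov = tmp @ R^T
    new_cov = [[sum(tmp[i][k] * r[j][k] for k in range(2)) for j in range(2)]
               for i in range(2)]
    return new_mu, new_cov


# Illustrative values only (not from the paper).
mu = (1.0, -0.5)
cov = ((0.5, 0.0), (0.0, 2.0))

kl_before = kl_to_standard_normal_2d(mu, cov)
kl_after = kl_to_standard_normal_2d(*rotate(mu, cov, 0.7))
```

The trace, determinant, and squared norm are all preserved by a rotation, which is why the two values agree. This covers only the KL term for one encoding; the abstract's claim concerns the full objective.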

The Information Autoencoding Family: A Lagrangian Perspective on Latent Variable Generative Models


Title	The Information Autoencoding Family: A Lagrangian Perspective on Latent Variable Generative Models
Authors	Shengjia Zhao, Jiaming Song, Stefano Ermon
Abstract	A large number of objectives have been proposed to train latent variable generative models. We show that many of them are Lagrangian dual functions of the same primal optimization problem. The primal problem optimizes the mutual information between latent and visible variables, subject to the constraints of accurately modeling the data distribution and performing correct amortized inference. By choosing to maximize or minimize mutual information, and choosing different Lagrange multipliers, we obtain different objectives including InfoGAN, ALI/BiGAN, ALICE, CycleGAN, beta-VAE, adversarial autoencoders, AVB, AS-VAE and InfoVAE. Based on this observation, we provide an exhaustive characterization of the statistical and computational trade-offs made by all the training objectives in this class of Lagrangian duals. Next, we propose a dual optimization method where we optimize model parameters as well as the Lagrange multipliers. This method achieves Pareto optimal solutions in terms of optimizing information and satisfying the constraints.
Tasks
Published	2018-06-18
URL	http://arxiv.org/abs/1806.06514v2
PDF	http://arxiv.org/pdf/1806.06514v2.pdf
PWC	https://paperswithcode.com/paper/the-information-autoencoding-family-a
Repo	https://github.com/ermongroup/lagvae
Framework	tf
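
The idea of optimizing model parameters together with the Lagrange multipliers can be sketched on a toy problem. The example below is not the paper's objective: it assumes the hypothetical problem "minimize x^2 subject to x >= 1", with Lagrangian L(x, lam) = x^2 + lam * (1 - x), and alternates gradient descent on the primal variable with projected gradient ascent on the multiplier.

```python
def primal_dual_solve(step=0.1, iterations=2000):
    """Gradient descent on x and projected gradient ascent on lam.

    Toy problem (an assumption for illustration):
        minimize x**2  subject to  1 - x <= 0
    Lagrangian: L(x, lam) = x**2 + lam * (1 - x), with lam >= 0.
    """
    x, lam = 0.0, 0.0
    for _ in range(iterations):
        grad_x = 2.0 * x - lam      # dL/dx
        grad_lam = 1.0 - x          # dL/dlam (the constraint violation)
        x -= step * grad_x          # primal step: decrease L
        lam = max(0.0, lam + step * grad_lam)  # dual step: increase L, keep lam >= 0
    return x, lam


x_opt, lam_opt = primal_dual_solve()
```

For this toy problem the iterates approach the constrained optimum x = 1 with multiplier lam = 2. The multiplier grows while the constraint is violated and stops changing once it is satisfied.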

Implicit Argument Prediction with Event Knowledge


Title	Implicit Argument Prediction with Event Knowledge
Authors	Pengxiang Cheng, Katrin Erk
Abstract	Implicit arguments are not syntactically connected to their predicates, and are therefore hard to extract. Previous work has used models with large numbers of features, evaluated on very small datasets. We propose to train models for implicit argument prediction on a simple cloze task, for which data can be generated automatically at scale. This allows us to use a neural model, which draws on narrative coherence and entity salience for predictions. We show that our model has superior performance on both synthetic and natural data.
Tasks
Published	2018-02-20
URL	http://arxiv.org/abs/1802.07226v2
PDF	http://arxiv.org/pdf/1802.07226v2.pdf
PWC	https://paperswithcode.com/paper/implicit-argument-prediction-with-event
Repo	https://github.com/pxch/event_imp_arg
Framework	none

Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning


Title	Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning
Authors	Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, Will Constable, Christian Convey, Leona Cook, Omar Kanawi, Robert Kimball, Jason Knight, Nikolay Korovaiko, Varun Kumar, Yixing Lao, Christopher R. Lishka, Jaikrishnan Menon, Jennifer Myers, Sandeep Aswath Narayana, Adam Procter, Tristan J. Webb
Abstract	The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call “direct optimization”, requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires $\mathcal{O}(fp)$ effort; where $f$ is the number of frameworks and $p$ is the number of platforms. While optimized kernels for deep-learning primitives are provided via libraries like Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our experience creating neon (a fast deep learning library on GPUs), we developed Intel nGraph, a soon to be open-sourced C++ library to simplify the realization of optimized deep learning performance across frameworks and hardware platforms. Initially-supported frameworks include TensorFlow, MXNet, and Intel neon framework. Initial backends are Intel Architecture CPUs (CPU), the Intel(R) Nervana Neural Network Processor(R) (NNP), and NVIDIA GPUs. Currently supported compiler optimizations include efficient memory management and data layout abstraction. In this paper, we describe our overall architecture and its core components. In the future, we envision extending nGraph API support to a wider range of frameworks, hardware (including FPGAs and ASICs), and compiler optimizations (training versus inference optimizations, multi-node and multi-device scaling via efficient sub-graph partitioning, and HW-specific compounding of operations).
Tasks	graph partitioning
Published	2018-01-24
URL	http://arxiv.org/abs/1801.08058v2
PDF	http://arxiv.org/pdf/1801.08058v2.pdf
PWC	https://paperswithcode.com/paper/intel-ngraph-an-intermediate-representation
Repo	https://github.com/NervanaSystems/ngraph-python
Framework	none
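
The $\mathcal{O}(fp)$ effort described in the abstract above can be made concrete with a small count. This sketch assumes that a shared intermediate representation needs one bridge per framework and one backend per platform, i.e. $f + p$ pieces of work; that additive model is an assumption used for illustration, and the function names are hypothetical.

```python
def direct_optimization_effort(frameworks, platforms):
    """One hand-tuned integration per (framework, platform) pair."""
    return len(frameworks) * len(platforms)


def shared_ir_effort(frameworks, platforms):
    """Assumed model: one bridge per framework plus one backend per platform."""
    return len(frameworks) + len(platforms)


# The initially supported frameworks and backends named in the abstract.
frameworks = ["TensorFlow", "MXNet", "neon"]
platforms = ["CPU", "NNP", "GPU"]

direct = direct_optimization_effort(frameworks, platforms)
shared = shared_ir_effort(frameworks, platforms)
```

With three frameworks and three backends the gap is 9 versus 6, and it widens as either list grows.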

Overcoming Catastrophic Forgetting by Soft Parameter Pruning


Title	Overcoming Catastrophic Forgetting by Soft Parameter Pruning
Authors	Jian Peng, Jiang Hao, Zhuo Li, Enqiang Guo, Xiaohong Wan, Deng Min, Qing Zhu, Haifeng Li
Abstract	Catastrophic forgetting is a challenging issue in continual learning: a deep neural network forgets the knowledge acquired from former tasks after learning subsequent tasks. Existing methods try to find the joint distribution of parameters shared across all tasks. This idea can be questionable because such a joint distribution may not exist when the number of tasks increases. It also leads to a “long-term” memory issue when the network capacity is limited, since adding tasks will “eat” the network capacity. In this paper, we propose a Soft Parameters Pruning (SPP) strategy to reach a trade-off between the short-term and long-term profit of a learning model by freeing those parameters that contribute less to remembering former task domain knowledge so that they can learn future tasks, while preserving memories of previous tasks via those parameters that effectively encode knowledge about those tasks. SPP also measures the importance of parameters by information entropy in a label-free manner. Experiments on several tasks show that the SPP model achieves the best performance compared with other state-of-the-art methods. Experimental results also indicate that our method is less sensitive to hyperparameters and generalizes better. Our research suggests that a softer strategy, i.e. an approximately optimal or sub-optimal solution, helps alleviate the dilemma of memory. The source code is available at https://github.com/lehaifeng/Learning_by_memory.
Tasks	Continual Learning
Published	2018-12-04
URL	http://arxiv.org/abs/1812.01640v1
PDF	http://arxiv.org/pdf/1812.01640v1.pdf
PWC	https://paperswithcode.com/paper/overcoming-catastrophic-forgetting-by-soft
Repo	https://github.com/lehaifeng/Learning_by_memory
Framework	tf

Gaining Free or Low-Cost Transparency with Interpretable Partial Substitute


Title	Gaining Free or Low-Cost Transparency with Interpretable Partial Substitute
Authors	Tong Wang
Abstract	This work addresses the situation where a black-box model with good predictive performance is chosen over its interpretable competitors, and we show interpretability is still achievable in this case. Our solution is to find an interpretable substitute on a subset of data where the black-box model is overkill or nearly overkill while leaving the rest to the black-box. This transparency is obtained at minimal cost or no cost of the predictive performance. Under this framework, we develop a Hybrid Rule Sets (HyRS) model that uses decision rules to capture the subspace of data where the rules are as accurate or almost as accurate as the black-box provided. To train a HyRS, we devise an efficient search algorithm that iteratively finds the optimal model and exploits theoretically grounded strategies to reduce computation. Our framework is agnostic to the black-box during training. Experiments on structured and text data show that HyRS obtains an effective trade-off between transparency and interpretability.
Tasks	Decision Making, Interpretable Machine Learning
Published	2018-02-12
URL	https://arxiv.org/abs/1802.04346v2
PDF	https://arxiv.org/pdf/1802.04346v2.pdf
PWC	https://paperswithcode.com/paper/hybrid-decision-making-when-interpretable
Repo	https://github.com/wangtongada/HyRS
Framework	none

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks


Title	ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
Authors	Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang
Abstract	The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won the first place in the PIRM2018-SR Challenge. The code is available at https://github.com/xinntao/ESRGAN .
Tasks	Image Super-Resolution, Super-Resolution
Published	2018-09-01
URL	http://arxiv.org/abs/1809.00219v2
PDF	http://arxiv.org/pdf/1809.00219v2.pdf
PWC	https://paperswithcode.com/paper/esrgan-enhanced-super-resolution-generative
Repo	https://github.com/xinntao/ESRGAN
Framework	pytorch
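
The "relative realness" idea in the abstract above can be sketched with the relativistic average discriminator loss. This is a minimal pure-Python illustration, assuming the standard relativistic average form D_Ra(x_r, x_f) = sigmoid(C(x_r) - mean(C(x_f))); the function names are hypothetical and the logits are made-up numbers, not outputs of the released model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean(values):
    return sum(values) / len(values)


def relativistic_average_d_loss(real_logits, fake_logits):
    """Discriminator loss that scores each sample relative to the other batch.

    real_logits / fake_logits are raw discriminator outputs C(x).
    """
    mean_real = mean(real_logits)
    mean_fake = mean(fake_logits)
    # Real samples should look more realistic than the average fake sample.
    real_term = mean([math.log(sigmoid(c - mean_fake)) for c in real_logits])
    # Fake samples should look less realistic than the average real sample.
    fake_term = mean([math.log(1.0 - sigmoid(c - mean_real)) for c in fake_logits])
    return -(real_term + fake_term)


# Indistinguishable batches: every relative score is 0.5.
loss_equal = relativistic_average_d_loss([0.3, 0.3], [0.3, 0.3])
# Well-separated batches: the discriminator loss is close to zero.
loss_separated = relativistic_average_d_loss([5.0, 5.0], [-5.0, -5.0])
```

Only the difference between the two batches matters here: adding the same constant to every logit leaves the loss unchanged, unlike a standard discriminator that scores each image on an absolute scale.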

A Survey on Deep Learning for Named Entity Recognition


Title	A Survey on Deep Learning for Named Entity Recognition
Authors	Jing Li, Aixin Sun, Jianglei Han, Chenliang Li
Abstract	Named entity recognition (NER) is the task of identifying mentions of rigid designators in text that belong to predefined semantic types such as person, location, and organization. NER always serves as the foundation for many natural language applications such as question answering, text summarization, and machine translation. Early NER systems achieved good performance at the cost of human engineering in designing domain-specific features and rules. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review of existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recent applied techniques of deep learning in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.
Tasks	Machine Translation, Named Entity Recognition, Question Answering, Semantic Composition, Text Summarization
Published	2018-12-22
URL	https://arxiv.org/abs/1812.09449v3
PDF	https://arxiv.org/pdf/1812.09449v3.pdf
PWC	https://paperswithcode.com/paper/a-survey-on-deep-learning-for-named-entity
Repo	https://github.com/DA-southampton/ner
Framework	none

LF-Net: Learning Local Features from Images


Title	LF-Net: Learning Local Features from Images
Authors	Yuki Ono, Eduard Trulls, Pascal Fua, Kwang Moo Yi
Abstract	We present a novel deep architecture and a training strategy to learn a local feature pipeline from scratch, using collections of images without the need for human supervision. To do so we exploit depth and relative camera pose cues to create a virtual target that the network should achieve on one image, provided the outputs of the network for the other image. While this process is inherently non-differentiable, we show that we can optimize the network in a two-branch setup by confining it to one branch, while preserving differentiability in the other. We train our method on both indoor and outdoor datasets, with depth data from 3D sensors for the former, and depth estimates from an off-the-shelf Structure-from-Motion solution for the latter. Our models outperform the state of the art on sparse feature matching on both datasets, while running at 60+ fps for QVGA images.
Tasks
Published	2018-05-24
URL	http://arxiv.org/abs/1805.09662v2
PDF	http://arxiv.org/pdf/1805.09662v2.pdf
PWC	https://paperswithcode.com/paper/lf-net-learning-local-features-from-images
Repo	https://github.com/vcg-uvic/lf-net-release
Framework	tf

Learning short-term past as predictor of human behavior in commercial buildings


Title	Learning short-term past as predictor of human behavior in commercial buildings
Authors	Romana Markovic, Jérôme Frisch, Christoph van Treeck
Abstract	This paper addresses the question of identifying the time window in the short-term past from which the occupant’s future window opening actions and the resulting window states in buildings can be predicted. The addressed sequence duration ranged between 30 and 240 time-steps of indoor climate data, where the applied temporal discretization was one minute. For that purpose, a deep neural network is trained to predict the window states, where the input sequence duration is handled as an additional hyperparameter. Eventually, the relationship between the prediction accuracy and the time-lag of the predicted window state in the future is analyzed. The results showed that the optimal predictive performance was achieved when 60 time-steps of indoor climate data were used as input. Additionally, the results showed that very long sequences (120-240 time-steps) could be addressed efficiently, given the right hyperparameters. However, using memory over previous hours of high-resolution indoor climate data did not improve the predictive performance compared with the case where 30/60-minute indoor sequences were used. The prediction accuracy, measured as F1 score, dropped from 0.51 to 0.27 when the prediction target was shifted from 10 to 60 minutes into the future.
Tasks
Published	2018-09-17
URL	http://arxiv.org/abs/1809.10020v1
PDF	http://arxiv.org/pdf/1809.10020v1.pdf
PWC	https://paperswithcode.com/paper/learning-short-term-past-as-predictor-of
Repo	https://github.com/littlejiao/figureextract
Framework	none
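
The abstract above reports prediction quality as an F1 score. As a reminder of what that number measures, here is a minimal sketch for binary window states (1 = open, 0 = closed); the labels are made-up illustrative data, not results from the paper, and the function name is hypothetical.

```python
def f1_score_binary(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Illustrative window states per time-step (assumed data).
observed = [1, 0, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
score = f1_score_binary(observed, predicted)
```

Because F1 ignores true negatives, it suits targets like window opening, where the "closed" state typically dominates and plain accuracy would look deceptively high.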

Quality Diversity Through Surprise


Title	Quality Diversity Through Surprise
Authors	Daniele Gravina, Antonios Liapis, Georgios N. Yannakakis
Abstract	Quality diversity is a recent family of evolutionary search algorithms which focus on finding several well-performing (quality) yet different (diversity) solutions with the aim to maintain an appropriate balance between divergence and convergence during search. While quality diversity has already delivered promising results in complex problems, the capacity of divergent search variants for quality diversity remains largely unexplored. Inspired by the notion of surprise as an effective driver of divergent search and its orthogonal nature to novelty, this paper investigates the impact of the former on quality diversity performance. For that purpose we introduce three new quality diversity algorithms which employ surprise as a diversity measure, either on its own or combined with novelty, and compare their performance against novelty search with local competition, the state-of-the-art quality diversity algorithm. The algorithms are tested in a robot navigation task across 60 highly deceptive mazes. Our findings suggest that allowing surprise and novelty to operate synergistically for divergence and in combination with local competition leads to quality diversity algorithms of significantly higher efficiency, speed and robustness.
Tasks	Robot Navigation
Published	2018-07-06
URL	http://arxiv.org/abs/1807.02397v4
PDF	http://arxiv.org/pdf/1807.02397v4.pdf
PWC	https://paperswithcode.com/paper/quality-diversity-through-surprise
Repo	https://github.com/DanieleGravina/divergence-and-quality-diversity
Framework	none

Explain to Fix: A Framework to Interpret and Correct DNN Object Detector Predictions


Title	Explain to Fix: A Framework to Interpret and Correct DNN Object Detector Predictions
Authors	Denis Gudovskiy, Alec Hodgkinson, Takuya Yamaguchi, Yasunori Ishii, Sotaro Tsukizawa
Abstract	Explaining predictions of deep neural networks (DNNs) is an important and nontrivial task. In this paper, we propose a practical approach to interpret decisions made by a DNN object detector that has fidelity comparable to state-of-the-art methods and sufficient computational efficiency to process large datasets. Our method relies on recent theory and approximates Shapley feature importance values. We qualitatively and quantitatively show that the proposed explanation method can be used to find image features which cause failures in DNN object detection. The developed software tool combined into the “Explain to Fix” (E2X) framework has a factor of 10 higher computational efficiency than prior methods and can be used for cluster processing using graphics processing units (GPUs). Lastly, we propose a potential extension of the E2X framework where the discovered missing features can be added into training dataset to overcome failures after model retraining.
Tasks	Feature Importance, Object Detection
Published	2018-11-19
URL	http://arxiv.org/abs/1811.08011v1
PDF	http://arxiv.org/pdf/1811.08011v1.pdf
PWC	https://paperswithcode.com/paper/explain-to-fix-a-framework-to-interpret-and
Repo	https://github.com/gudovskiy/e2x
Framework	none
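
The abstract above relies on Shapley feature importance values, which the paper approximates. For intuition, the sketch below computes exact Shapley values by enumerating all coalitions of a tiny three-feature toy game. This is not the paper's approximation method; the value function, weights, and names are hypothetical, and exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_players, value):
    """Exact Shapley values; `value` maps a frozenset of player indices to a number."""
    players = range(n_players)
    result = []
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Probability that exactly this coalition precedes player i.
            weight = (factorial(size) * factorial(n_players - size - 1)
                      / factorial(n_players))
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        result.append(total)
    return result


WEIGHTS = [1.0, 2.0, 3.0]  # assumed per-feature contributions


def toy_value(coalition):
    """Additive contributions plus a bonus when features 0 and 1 are both present."""
    bonus = 1.0 if {0, 1} <= coalition else 0.0
    return sum(WEIGHTS[p] for p in coalition) + bonus


phi = shapley_values(3, toy_value)
```

The interaction bonus is split evenly between features 0 and 1, and the values sum to the payoff of the full coalition. The cost grows exponentially with the number of features, which is why practical explanation tools use approximations.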

Missing Data Reconstruction in Remote Sensing image with a Unified Spatial-Temporal-Spectral Deep Convolutional Neural Network


Title	Missing Data Reconstruction in Remote Sensing image with a Unified Spatial-Temporal-Spectral Deep Convolutional Neural Network
Authors	Qiang Zhang, Qiangqiang Yuan, Chao Zeng, Xinghua Li, Yancong Wei
Abstract	Because of the internal malfunction of satellite sensors and poor atmospheric conditions such as thick cloud, the acquired remote sensing data often suffer from missing information, i.e., the data usability is greatly reduced. In this paper, a novel method of missing information reconstruction in remote sensing images is proposed. The unified spatial-temporal-spectral framework based on a deep convolutional neural network (STS-CNN) employs a unified deep convolutional neural network combined with spatial-temporal-spectral supplementary information. In addition, to address the fact that most methods can only deal with a single missing information reconstruction task, the proposed approach can solve three typical missing information reconstruction tasks: 1) dead lines in Aqua MODIS band 6; 2) the Landsat ETM+ Scan Line Corrector (SLC)-off problem; and 3) thick cloud removal. It should be noted that the proposed model can use multi-source data (spatial, spectral, and temporal) as the input of the unified framework. The results of both simulated and real-data experiments demonstrate that the proposed model exhibits high effectiveness in the three missing information reconstruction tasks listed above.
Tasks
Published	2018-02-23
URL	http://arxiv.org/abs/1802.08369v1
PDF	http://arxiv.org/pdf/1802.08369v1.pdf
PWC	https://paperswithcode.com/paper/missing-data-reconstruction-in-remote-sensing
Repo	https://github.com/WHUQZhang/STS-CNN
Framework	none

Hyperspectral Image Denoising Employing a Spatial-Spectral Deep Residual Convolutional Neural Network


Title	Hyperspectral Image Denoising Employing a Spatial-Spectral Deep Residual Convolutional Neural Network
Authors	Qiangqiang Yuan, Qiang Zhang, Jie Li, Huanfeng Shen, Liangpei Zhang
Abstract	Hyperspectral image (HSI) denoising is a crucial preprocessing procedure to improve the performance of the subsequent HSI interpretation and applications. In this paper, a novel deep learning-based method for this task is proposed, by learning a non-linear end-to-end mapping between the noisy and clean HSIs with a combined spatial-spectral deep convolutional neural network (HSID-CNN). Both the spatial and spectral information are simultaneously assigned to the proposed network. In addition, multi-scale feature extraction and multi-level feature representation are employed, respectively, to capture the multi-scale spatial-spectral features and to fuse the feature representations from different levels for the final restoration. The simulated and real-data experiments demonstrate that the proposed HSID-CNN outperforms many of the mainstream methods in quantitative evaluation indexes, visual effects, and HSI classification accuracy.
Tasks	Denoising, Image Denoising
Published	2018-06-01
URL	http://arxiv.org/abs/1806.00183v3
PDF	http://arxiv.org/pdf/1806.00183v3.pdf
PWC	https://paperswithcode.com/paper/hyperspectral-image-denoising-employing-a
Repo	https://github.com/WHUQZhang/HSID-CNN
Framework	none