Paper Group AWR 208
Attentive Neural Network for Named Entity Recognition in Vietnamese. Disentangling Disentanglement in Variational Autoencoders. The Information Autoencoding Family: A Lagrangian Perspective on Latent Variable Generative Models. Implicit Argument Prediction with Event Knowledge. Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning. …
Attentive Neural Network for Named Entity Recognition in Vietnamese
Title | Attentive Neural Network for Named Entity Recognition in Vietnamese |
Authors | Kim Anh Nguyen, Ngan Dong, Cam-Tu Nguyen |
Abstract | We propose an attentive neural network for the task of named entity recognition in Vietnamese. The proposed attentive neural model makes use of character-based language models and word embeddings to encode words as vector representations. A neural network architecture of encoder, attention, and decoder layers is then utilized to encode knowledge of input sentences and to label entity tags. The experimental results show that the proposed attentive neural network achieves state-of-the-art results on the benchmark Vietnamese named entity recognition datasets, outperforming both models based on hand-crafted features and neural models. |
Tasks | Named Entity Recognition, Named Entity Recognition In Vietnamese, Word Embeddings |
Published | 2018-10-31 |
URL | https://arxiv.org/abs/1810.13097v2 |
PDF | https://arxiv.org/pdf/1810.13097v2.pdf |
PWC | https://paperswithcode.com/paper/attentive-neural-network-for-named-entity |
Repo | https://github.com/minhpqn/vietner |
Framework | none |
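
A minimal sketch of the encoder-attention-decoder tagging architecture described in the abstract above, in PyTorch. The BiLSTM encoder, multi-head self-attention, LSTM decoder, and all dimensions are assumptions for illustration; the paper's exact layer choices may differ, and the character-based LM features are assumed to be already folded into the word vectors.

```python
import torch
import torch.nn as nn

class AttentiveTagger(nn.Module):
    """Encoder-attention-decoder sequence tagger (a sketch; the paper's
    exact layer choices may differ)."""
    def __init__(self, emb_dim=300, hidden=256, num_tags=9):
        super().__init__()
        # The paper combines character-based LM features and word
        # embeddings; here we assume word vectors are precomputed.
        self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        self.decoder = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_tags)

    def forward(self, word_vecs):              # (batch, seq, emb_dim)
        enc, _ = self.encoder(word_vecs)       # (batch, seq, 2*hidden)
        ctx, _ = self.attn(enc, enc, enc)      # self-attention over the sentence
        dec, _ = self.decoder(ctx)
        return self.out(dec)                   # per-token tag logits

tagger = AttentiveTagger()
logits = tagger(torch.randn(2, 12, 300))       # 2 sentences, 12 tokens each
print(logits.shape)                            # torch.Size([2, 12, 9])
```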
Disentangling Disentanglement in Variational Autoencoders
Title | Disentangling Disentanglement in Variational Autoencoders |
Authors | Emile Mathieu, Tom Rainforth, N. Siddharth, Yee Whye Teh |
Abstract | We develop a generalisation of disentanglement in VAEs—decomposition of the latent representation—characterising it as the fulfilment of two factors: a) the latent encodings of the data having an appropriate level of overlap, and b) the aggregate encoding of the data conforming to a desired structure, represented through the prior. Decomposition permits disentanglement, i.e. explicit independence between latents, as a special case, but also allows for a much richer class of properties to be imposed on the learnt representation, such as sparsity, clustering, independent subspaces, or even intricate hierarchical dependency relationships. We show that the $\beta$-VAE varies from the standard VAE predominantly in its control of latent overlap and that for the standard choice of an isotropic Gaussian prior, its objective is invariant to rotations of the latent representation. Viewed from the decomposition perspective, breaking this invariance with simple manipulations of the prior can yield better disentanglement with little or no detriment to reconstructions. We further demonstrate how other choices of prior can assist in producing different decompositions and introduce an alternative training objective that allows the control of both decomposition factors in a principled manner. |
Tasks | |
Published | 2018-12-06 |
URL | https://arxiv.org/abs/1812.02833v3 |
PDF | https://arxiv.org/pdf/1812.02833v3.pdf |
PWC | https://paperswithcode.com/paper/disentangling-disentanglement-in-variational |
Repo | https://github.com/iffsid/disentangling-disentanglement |
Framework | pytorch |
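
A hedged sketch of the two levers the abstract identifies: beta scales the KL term (controlling latent overlap), and replacing the isotropic Gaussian prior with an anisotropic one breaks the rotational invariance of the objective. The paper's full objective controls the two decomposition factors more finely; this only shows a beta-weighted ELBO with a configurable prior.

```python
import torch
import torch.distributions as D

def beta_elbo(recon_logp, mu, logvar, beta=4.0, prior_scales=None):
    """Negative beta-ELBO: beta scales the KL term, which controls the
    overlap of latent encodings. Non-uniform prior_scales give an
    anisotropic Gaussian prior, breaking the rotational invariance the
    paper identifies for the isotropic case. (A sketch; the paper's
    objective controls the two decomposition factors separately.)"""
    q = D.Normal(mu, (0.5 * logvar).exp())
    scales = prior_scales if prior_scales is not None else torch.ones_like(mu)
    p = D.Normal(torch.zeros_like(mu), scales)
    kl = D.kl_divergence(q, p).sum(-1)         # per-example KL to the prior
    return -(recon_logp - beta * kl).mean()

mu, logvar = torch.randn(8, 10), torch.randn(8, 10)
recon_logp = torch.randn(8)                    # stand-in log-likelihoods
loss = beta_elbo(recon_logp, mu, logvar,
                 prior_scales=torch.linspace(0.5, 2.0, 10))
print(loss.item())
```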
The Information Autoencoding Family: A Lagrangian Perspective on Latent Variable Generative Models
Title | The Information Autoencoding Family: A Lagrangian Perspective on Latent Variable Generative Models |
Authors | Shengjia Zhao, Jiaming Song, Stefano Ermon |
Abstract | A large number of objectives have been proposed to train latent variable generative models. We show that many of them are Lagrangian dual functions of the same primal optimization problem. The primal problem optimizes the mutual information between latent and visible variables, subject to the constraints of accurately modeling the data distribution and performing correct amortized inference. By choosing to maximize or minimize mutual information, and choosing different Lagrange multipliers, we obtain different objectives including InfoGAN, ALI/BiGAN, ALICE, CycleGAN, beta-VAE, adversarial autoencoders, AVB, AS-VAE and InfoVAE. Based on this observation, we provide an exhaustive characterization of the statistical and computational trade-offs made by all the training objectives in this class of Lagrangian duals. Next, we propose a dual optimization method where we optimize model parameters as well as the Lagrange multipliers. This method achieves Pareto optimal solutions in terms of optimizing information and satisfying the constraints. |
Tasks | |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06514v2 |
PDF | http://arxiv.org/pdf/1806.06514v2.pdf |
PWC | https://paperswithcode.com/paper/the-information-autoencoding-family-a |
Repo | https://github.com/ermongroup/lagvae |
Framework | tf |
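
The primal/dual structure the abstract describes can be written out roughly as follows (notation is paraphrased from the abstract; the paper specifies the constraint set and divergences $D_i$ in more detail):

```latex
% Primal: maximize or minimize mutual information between latent z and
% visible x, subject to consistency constraints on the two joints.
\begin{align}
  \max_{\theta} \ \pm\, I_{\theta}(x; z)
  \quad \text{s.t.} \quad
  D_i\big(q_\theta(x, z),\, p_\theta(x, z)\big) \le \epsilon_i,
  \quad i = 1, \dots, k \\
% Lagrangian dual: different signs and multipliers \lambda_i recover
% InfoGAN, ALI/BiGAN, ALICE, beta-VAE, InfoVAE, etc.
  \mathcal{L}(\theta, \lambda) = \pm\, I_{\theta}(x; z)
  - \sum_{i=1}^{k} \lambda_i
    \Big( D_i\big(q_\theta(x, z),\, p_\theta(x, z)\big) - \epsilon_i \Big)
\end{align}
```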
Implicit Argument Prediction with Event Knowledge
Title | Implicit Argument Prediction with Event Knowledge |
Authors | Pengxiang Cheng, Katrin Erk |
Abstract | Implicit arguments are not syntactically connected to their predicates, and are therefore hard to extract. Previous work has used models with large numbers of features, evaluated on very small datasets. We propose to train models for implicit argument prediction on a simple cloze task, for which data can be generated automatically at scale. This allows us to use a neural model, which draws on narrative coherence and entity salience for predictions. We show that our model has superior performance on both synthetic and natural data. |
Tasks | |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07226v2 |
PDF | http://arxiv.org/pdf/1802.07226v2.pdf |
PWC | https://paperswithcode.com/paper/implicit-argument-prediction-with-event |
Repo | https://github.com/pxch/event_imp_arg |
Framework | none |
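
A sketch of the automatic cloze-data generation the abstract describes: remove one argument from an event in a chain and ask the model to recover it from the rest of the narrative. The event representation below is hypothetical.

```python
def make_cloze_examples(event_chain):
    """Generate argument-cloze training examples from an event chain
    (a sketch of the automatic, at-scale data generation the abstract
    describes; the event format here is hypothetical)."""
    examples = []
    for i, (pred, args) in enumerate(event_chain):
        for role, filler in args.items():
            context = event_chain[:i] + event_chain[i + 1:]
            query = (pred, {r: a for r, a in args.items() if r != role})
            # Target: predict `filler` for the removed `role`, drawing
            # on narrative coherence and entity salience.
            examples.append({"context": context, "query": query,
                             "role": role, "target": filler})
    return examples

chain = [("arrest", {"subj": "police", "obj": "suspect"}),
         ("charge", {"subj": "prosecutor", "obj": "suspect"})]
for ex in make_cloze_examples(chain)[:2]:
    print(ex["role"], "->", ex["target"])
```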
Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning
Title | Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning |
Authors | Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, Will Constable, Christian Convey, Leona Cook, Omar Kanawi, Robert Kimball, Jason Knight, Nikolay Korovaiko, Varun Kumar, Yixing Lao, Christopher R. Lishka, Jaikrishnan Menon, Jennifer Myers, Sandeep Aswath Narayana, Adam Procter, Tristan J. Webb |
Abstract | The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call “direct optimization”, requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires $\mathcal{O}(fp)$ effort, where $f$ is the number of frameworks and $p$ is the number of platforms. While optimized kernels for deep-learning primitives are provided via libraries like Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our experience creating neon (a fast deep learning library on GPUs), we developed Intel nGraph, a soon-to-be-open-sourced C++ library to simplify the realization of optimized deep learning performance across frameworks and hardware platforms. Initially supported frameworks include TensorFlow, MXNet, and Intel neon framework. Initial backends are Intel Architecture CPUs (CPU), the Intel(R) Nervana Neural Network Processor(R) (NNP), and NVIDIA GPUs. Currently supported compiler optimizations include efficient memory management and data layout abstraction. In this paper, we describe our overall architecture and its core components. In the future, we envision extending nGraph API support to a wider range of frameworks, hardware (including FPGAs and ASICs), and compiler optimizations (training versus inference optimizations, multi-node and multi-device scaling via efficient sub-graph partitioning, and HW-specific compounding of operations). |
Tasks | Graph Partitioning |
Published | 2018-01-24 |
URL | http://arxiv.org/abs/1801.08058v2 |
PDF | http://arxiv.org/pdf/1801.08058v2.pdf |
PWC | https://paperswithcode.com/paper/intel-ngraph-an-intermediate-representation |
Repo | https://github.com/NervanaSystems/ngraph-python |
Framework | none |
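
The $\mathcal{O}(fp)$ argument in concrete numbers, using the frameworks and backends named in the abstract (the counts are illustrative only):

```python
# Direct optimization needs one integration effort per
# (framework, platform) pair; a shared IR needs one frontend per
# framework plus one backend per platform.
frameworks = ["TensorFlow", "MXNet", "neon"]
platforms = ["CPU", "NNP", "GPU"]

direct = len(frameworks) * len(platforms)        # O(f*p)
with_ir = len(frameworks) + len(platforms)       # O(f+p)
print(f"direct optimization: {direct} integrations")   # 9
print(f"shared IR (nGraph):  {with_ir} integrations")  # 6
```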
Overcoming Catastrophic Forgetting by Soft Parameter Pruning
Title | Overcoming Catastrophic Forgetting by Soft Parameter Pruning |
Authors | Jian Peng, Jiang Hao, Zhuo Li, Enqiang Guo, Xiaohong Wan, Deng Min, Qing Zhu, Haifeng Li |
Abstract | Catastrophic forgetting is a challenging issue in continual learning, where a deep neural network forgets the knowledge acquired from a former task after learning subsequent tasks. Existing methods try to find a joint distribution of parameters shared across all tasks. This idea can be questionable, because such a joint distribution may not exist as the number of tasks increases. It also leads to a “long-term” memory issue when network capacity is limited, since adding tasks “eats” the network capacity. In this paper, we propose a Soft Parameter Pruning (SPP) strategy to reach a trade-off between the short-term and long-term profit of a learning model: parameters that contribute little to remembering former task domain knowledge are freed to learn future tasks, while memories of previous tasks are preserved via the parameters that effectively encode knowledge about those tasks. SPP measures the importance of parameters by information entropy in a label-free manner. Experiments on several tasks show that the SPP model achieves the best performance compared with other state-of-the-art methods. The results also indicate that our method is less sensitive to hyper-parameters and generalizes better. Our research suggests that a softer strategy, i.e. approximate optimization or a sub-optimal solution, helps alleviate the dilemma of memory. The source codes are available at https://github.com/lehaifeng/Learning_by_memory. |
Tasks | Continual Learning |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01640v1 |
PDF | http://arxiv.org/pdf/1812.01640v1.pdf |
PWC | https://paperswithcode.com/paper/overcoming-catastrophic-forgetting-by-soft |
Repo | https://github.com/lehaifeng/Learning_by_memory |
Framework | tf |
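
A hedged sketch of the soft-pruning idea: convert per-parameter importance into a soft mask that nearly freezes parameters encoding previous-task knowledge while leaving unimportant ones free. The sigmoid mapping and masked-gradient update are assumptions; the paper derives importance from information entropy, which the stand-in scores below do not implement.

```python
import torch

def soft_prune_mask(importance, temperature=1.0):
    """Map per-parameter importance scores to a soft mask in (0, 1):
    important parameters (which encode previous-task knowledge) get
    small mask values and are nearly frozen; unimportant ones stay
    free to learn the new task. A sketch only."""
    return torch.sigmoid(-(importance - importance.mean()) / temperature)

# During new-task training, scale gradients by the soft mask instead
# of hard-pruning parameters:
param = torch.randn(100, requires_grad=True)
importance = torch.rand(100)          # stand-in importance estimates
mask = soft_prune_mask(importance)
loss = (param ** 2).sum()
loss.backward()
with torch.no_grad():
    param -= 0.1 * mask * param.grad  # soft-masked SGD step
```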
Gaining Free or Low-Cost Transparency with Interpretable Partial Substitute
Title | Gaining Free or Low-Cost Transparency with Interpretable Partial Substitute |
Authors | Tong Wang |
Abstract | This work addresses the situation where a black-box model with good predictive performance is chosen over its interpretable competitors, and we show interpretability is still achievable in this case. Our solution is to find an interpretable substitute on a subset of the data where the black-box model is overkill or nearly overkill, while leaving the rest to the black-box. This transparency is obtained at minimal or no cost to predictive performance. Under this framework, we develop a Hybrid Rule Sets (HyRS) model that uses decision rules to capture the subspace of data where the rules are as accurate, or almost as accurate, as the black-box. To train a HyRS, we devise an efficient search algorithm that iteratively finds the optimal model and exploits theoretically grounded strategies to reduce computation. Our framework is agnostic to the black-box during training. Experiments on structured and text data show that HyRS obtains an effective trade-off between transparency and predictive performance. |
Tasks | Decision Making, Interpretable Machine Learning |
Published | 2018-02-12 |
URL | https://arxiv.org/abs/1802.04346v2 |
PDF | https://arxiv.org/pdf/1802.04346v2.pdf |
PWC | https://paperswithcode.com/paper/hybrid-decision-making-when-interpretable |
Repo | https://github.com/wangtongada/HyRS |
Framework | none |
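
A minimal sketch of the hybrid prediction scheme: inputs covered by the interpretable rule set get a transparent prediction, everything else is deferred to the black-box. The rule representation ((predicate, label) pairs) is hypothetical; training the rules is the paper's actual contribution and is omitted here.

```python
def hybrid_predict(x, rules, blackbox):
    """Hybrid Rule Set prediction (a sketch): `rules` covers the
    subspace where the rules are (almost) as accurate as the
    black-box; everything else is deferred to the black-box."""
    for condition, label in rules:
        if condition(x):
            return label, "rule"        # transparent prediction
    return blackbox(x), "blackbox"      # opaque fallback

rules = [(lambda x: x["age"] < 25 and x["income"] > 50_000, 1)]
blackbox = lambda x: 0                  # stand-in for any opaque model
print(hybrid_predict({"age": 22, "income": 60_000}, rules, blackbox))
print(hybrid_predict({"age": 40, "income": 30_000}, rules, blackbox))
```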
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
Title | ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks |
Authors | Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang |
Abstract | The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied by unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss - and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which provides stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality, with more realistic and natural textures, than SRGAN and won first place in the PIRM2018-SR Challenge. The code is available at https://github.com/xinntao/ESRGAN . |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-09-01 |
URL | http://arxiv.org/abs/1809.00219v2 |
PDF | http://arxiv.org/pdf/1809.00219v2.pdf |
PWC | https://paperswithcode.com/paper/esrgan-enhanced-super-resolution-generative |
Repo | https://github.com/xinntao/ESRGAN |
Framework | pytorch |
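
A sketch of the relativistic discriminator loss the abstract mentions: following relativistic average GAN, each sample is scored relative to the mean score of the opposite set rather than in absolute terms. The RRDB generator and the pre-activation perceptual loss are omitted.

```python
import torch
import torch.nn.functional as F

def relativistic_d_loss(real_logits, fake_logits):
    """Relativistic average GAN loss for the discriminator: each
    sample is judged relative to the mean score of the opposite set
    instead of getting an absolute real/fake score. (Loss term only;
    the generator gets the symmetric counterpart.)"""
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return (F.binary_cross_entropy_with_logits(real_rel, torch.ones_like(real_rel))
            + F.binary_cross_entropy_with_logits(fake_rel, torch.zeros_like(fake_rel)))

d_loss = relativistic_d_loss(torch.randn(4), torch.randn(4))
print(d_loss.item())
```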
A Survey on Deep Learning for Named Entity Recognition
Title | A Survey on Deep Learning for Named Entity Recognition |
Authors | Jing Li, Aixin Sun, Jianglei Han, Chenliang Li |
Abstract | Named entity recognition (NER) is the task of identifying mentions of rigid designators in text belonging to predefined semantic types such as person, location, and organization. NER serves as the foundation for many natural language applications such as question answering, text summarization, and machine translation. Early NER systems achieved great success, attaining good performance at the cost of human engineering to design domain-specific features and rules. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review of existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recently applied deep learning techniques in new NER problem settings and applications. Finally, we present the challenges faced by NER systems and outline future directions in this area. |
Tasks | Machine Translation, Named Entity Recognition, Question Answering, Semantic Composition, Text Summarization |
Published | 2018-12-22 |
URL | https://arxiv.org/abs/1812.09449v3 |
PDF | https://arxiv.org/pdf/1812.09449v3.pdf |
PWC | https://paperswithcode.com/paper/a-survey-on-deep-learning-for-named-entity |
Repo | https://github.com/DA-southampton/ner |
Framework | none |
LF-Net: Learning Local Features from Images
Title | LF-Net: Learning Local Features from Images |
Authors | Yuki Ono, Eduard Trulls, Pascal Fua, Kwang Moo Yi |
Abstract | We present a novel deep architecture and a training strategy to learn a local feature pipeline from scratch, using collections of images without the need for human supervision. To do so we exploit depth and relative camera pose cues to create a virtual target that the network should achieve on one image, provided the outputs of the network for the other image. While this process is inherently non-differentiable, we show that we can optimize the network in a two-branch setup by confining it to one branch, while preserving differentiability in the other. We train our method on both indoor and outdoor datasets, with depth data from 3D sensors for the former, and depth estimates from an off-the-shelf Structure-from-Motion solution for the latter. Our models outperform the state of the art on sparse feature matching on both datasets, while running at 60+ fps for QVGA images. |
Tasks | |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09662v2 |
PDF | http://arxiv.org/pdf/1805.09662v2.pdf |
PWC | https://paperswithcode.com/paper/lf-net-learning-local-features-from-images |
Repo | https://github.com/vcg-uvic/lf-net-release |
Framework | tf |
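
A sketch of the two-branch trick described in the abstract: the output on one image is turned into a fixed virtual target (no gradients), and only the other branch is optimized, preserving differentiability despite the non-differentiable target construction. The `warp` function stands in for the depth/pose-based correspondence and is hypothetical.

```python
import torch

def two_branch_step(net, image_i, image_j, warp, loss_fn):
    """Two-branch training (a sketch of the confinement the abstract
    describes): the network output on image_j becomes a fixed virtual
    target, and only the branch on image_i receives gradients."""
    with torch.no_grad():               # non-differentiable branch
        target = warp(net(image_j))     # virtual target in image_i's frame
    pred = net(image_i)                 # differentiable branch
    return loss_fn(pred, target)

net = torch.nn.Conv2d(1, 1, 3, padding=1)   # stand-in feature network
loss = two_branch_step(net, torch.randn(1, 1, 32, 32),
                       torch.randn(1, 1, 32, 32),
                       warp=lambda t: t,    # identity warp, illustrative
                       loss_fn=torch.nn.functional.mse_loss)
loss.backward()
```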
Learning short-term past as predictor of human behavior in commercial buildings
Title | Learning short-term past as predictor of human behavior in commercial buildings |
Authors | Romana Markovic, Jérôme Frisch, Christoph van Treeck |
Abstract | This paper addresses the question of identifying the time window in the short-term past from which information about an occupant’s future window opening actions, and the resulting window states in buildings, can be predicted. The addressed sequence duration was in the range between 30 and 240 time-steps of indoor climate data, with a temporal discretization of one minute. For that purpose, a deep neural network is trained to predict the window states, where the input sequence duration is handled as an additional hyperparameter. Eventually, the relationship between the prediction accuracy and the time lag of the predicted window state in the future is analyzed. The results showed that the optimal predictive performance was achieved when 60 time-steps of the indoor climate data were used as input. Additionally, the results showed that very long sequences (120-240 time-steps) could be handled efficiently, given the right hyperparameters. Hence, using memory over previous hours of high-resolution indoor climate data did not improve the predictive performance compared to the case where 30/60-minute input sequences were used. The F1 score of the predictions dropped from 0.51 to 0.27 when the prediction target was shifted from 10 to 60 minutes into the future. |
Tasks | |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.10020v1 |
PDF | http://arxiv.org/pdf/1809.10020v1.pdf |
PWC | https://paperswithcode.com/paper/learning-short-term-past-as-predictor-of |
Repo | https://github.com/littlejiao/figureextract |
Framework | none |
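
A sketch of the data construction implied by the abstract: fixed-length input sequences of one-minute indoor climate data, with the sequence length as the tuned hyperparameter and the window state at a future time lag as the label. The feature dimensions below are made up.

```python
import numpy as np

def make_sequences(climate, window_state, seq_len=60, lag=10):
    """Build (input, label) pairs from one-minute indoor climate data:
    each input is the last `seq_len` time-steps, and the label is the
    window state `lag` minutes in the future. `seq_len` is the
    hyperparameter the paper tunes (30-240 steps)."""
    X, y = [], []
    for t in range(seq_len, len(climate) - lag):
        X.append(climate[t - seq_len:t])
        y.append(window_state[t + lag])
    return np.array(X), np.array(y)

climate = np.random.rand(1000, 4)       # e.g. temperature, humidity, CO2, ...
state = np.random.randint(0, 2, 1000)   # window open/closed
X, y = make_sequences(climate, state, seq_len=60, lag=10)
print(X.shape, y.shape)                 # (930, 60, 4) (930,)
```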
Quality Diversity Through Surprise
Title | Quality Diversity Through Surprise |
Authors | Daniele Gravina, Antonios Liapis, Georgios N. Yannakakis |
Abstract | Quality diversity is a recent family of evolutionary search algorithms which focus on finding several well-performing (quality) yet different (diversity) solutions, with the aim of maintaining an appropriate balance between divergence and convergence during search. While quality diversity has already delivered promising results in complex problems, the capacity of divergent search variants for quality diversity remains largely unexplored. Inspired by the notion of surprise as an effective driver of divergent search, and by its orthogonal nature to novelty, this paper investigates the impact of the former on quality diversity performance. For that purpose we introduce three new quality diversity algorithms which employ surprise as a diversity measure, either on its own or combined with novelty, and compare their performance against novelty search with local competition, the state-of-the-art quality diversity algorithm. The algorithms are tested in a robot navigation task across 60 highly deceptive mazes. Our findings suggest that allowing surprise and novelty to operate synergistically for divergence, in combination with local competition, leads to quality diversity algorithms of significantly higher efficiency, speed and robustness. |
Tasks | Robot Navigation |
Published | 2018-07-06 |
URL | http://arxiv.org/abs/1807.02397v4 |
PDF | http://arxiv.org/pdf/1807.02397v4.pdf |
PWC | https://paperswithcode.com/paper/quality-diversity-through-surprise |
Repo | https://github.com/DanieleGravina/divergence-and-quality-diversity |
Framework | none |
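
A hedged sketch contrasting the two divergence drivers: novelty rewards distance from behaviors seen so far, while surprise rewards deviation from a behavior *predicted* from past generations. The linear-extrapolation predictive model below is an assumption; the paper's surprise model may differ.

```python
import numpy as np

def novelty(b, archive, k=5):
    """Novelty: mean distance to the k nearest behaviors seen so far."""
    d = np.sort(np.linalg.norm(archive - b, axis=1))
    return d[:k].mean()

def surprise(b, history):
    """Surprise (a sketch): deviation from a behavior predicted from
    past generations, here a linear extrapolation of the last two
    population centroids."""
    predicted = history[-1] + (history[-1] - history[-2])
    return np.linalg.norm(b - predicted)

archive = np.random.rand(50, 2)   # past behaviors (e.g. maze positions)
history = [archive[:25].mean(0), archive[25:].mean(0)]
b = np.random.rand(2)             # candidate behavior
print(novelty(b, archive), surprise(b, history))
```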
Explain to Fix: A Framework to Interpret and Correct DNN Object Detector Predictions
Title | Explain to Fix: A Framework to Interpret and Correct DNN Object Detector Predictions |
Authors | Denis Gudovskiy, Alec Hodgkinson, Takuya Yamaguchi, Yasunori Ishii, Sotaro Tsukizawa |
Abstract | Explaining predictions of deep neural networks (DNNs) is an important and nontrivial task. In this paper, we propose a practical approach to interpret decisions made by a DNN object detector that has fidelity comparable to state-of-the-art methods and sufficient computational efficiency to process large datasets. Our method relies on recent theory and approximates Shapley feature importance values. We qualitatively and quantitatively show that the proposed explanation method can be used to find image features which cause failures in DNN object detection. The developed software tool, combined into the “Explain to Fix” (E2X) framework, is a factor of 10 more computationally efficient than prior methods and can be used for cluster processing using graphics processing units (GPUs). Lastly, we propose a potential extension of the E2X framework in which the discovered missing features can be added to the training dataset to overcome failures after model retraining. |
Tasks | Feature Importance, Object Detection |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.08011v1 |
PDF | http://arxiv.org/pdf/1811.08011v1.pdf |
PWC | https://paperswithcode.com/paper/explain-to-fix-a-framework-to-interpret-and |
Repo | https://github.com/gudovskiy/e2x |
Framework | none |
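
The abstract says the method approximates Shapley feature importance values; for context, the sketch below shows the classic Monte-Carlo Shapley estimator. E2X's own, faster approximation is not reproduced here.

```python
import random

def shapley_sample(model, x, baseline, feature, n_samples=100):
    """Monte-Carlo Shapley value of one feature: average the marginal
    contribution of `feature` over random feature orderings, with
    absent features replaced by `baseline` values. (The classic
    estimator, not E2X's faster approximation.)"""
    idx = list(range(len(x)))
    total = 0.0
    for _ in range(n_samples):
        random.shuffle(idx)
        pos = idx.index(feature)
        present = set(idx[:pos])                  # features "before" ours
        with_f = [x[i] if i in present or i == feature else baseline[i]
                  for i in range(len(x))]
        without_f = [x[i] if i in present else baseline[i]
                     for i in range(len(x))]
        total += model(with_f) - model(without_f)
    return total / n_samples

model = lambda v: 2 * v[0] + v[1]                 # toy linear model
print(shapley_sample(model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], feature=0))
# ≈ 2.0: the exact Shapley value of feature 0 for this linear model
```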
Missing Data Reconstruction in Remote Sensing image with a Unified Spatial-Temporal-Spectral Deep Convolutional Neural Network
Title | Missing Data Reconstruction in Remote Sensing image with a Unified Spatial-Temporal-Spectral Deep Convolutional Neural Network |
Authors | Qiang Zhang, Qiangqiang Yuan, Chao Zeng, Xinghua Li, Yancong Wei |
Abstract | Because of the internal malfunction of satellite sensors and poor atmospheric conditions such as thick cloud, acquired remote sensing data often suffer from missing information, i.e., the data usability is greatly reduced. In this paper, a novel method for missing information reconstruction in remote sensing images is proposed. The proposed spatial-temporal-spectral framework (STS-CNN) employs a unified deep convolutional neural network combined with spatial-temporal-spectral supplementary information. In addition, to address the fact that most methods can only deal with a single missing information reconstruction task, the proposed approach can solve three typical missing information reconstruction tasks: 1) dead lines in Aqua MODIS band 6; 2) the Landsat ETM+ Scan Line Corrector (SLC)-off problem; and 3) thick cloud removal. It should be noted that the proposed model can use multi-source data (spatial, spectral, and temporal) as the input of the unified framework. The results of both simulated and real-data experiments demonstrate that the proposed model exhibits high effectiveness in the three missing information reconstruction tasks listed above. |
Tasks | |
Published | 2018-02-23 |
URL | http://arxiv.org/abs/1802.08369v1 |
PDF | http://arxiv.org/pdf/1802.08369v1.pdf |
PWC | https://paperswithcode.com/paper/missing-data-reconstruction-in-remote-sensing |
Repo | https://github.com/WHUQZhang/STS-CNN |
Framework | none |
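
A sketch of the unified multi-source input the abstract describes: the corrupted target band is stacked with spectral/temporal auxiliary bands (and, as an assumption here, a missing-data mask) as CNN input channels. The tiny network below is purely illustrative; the real STS-CNN is deeper and multi-scale.

```python
import torch
import torch.nn as nn

# Unified multi-source input (a sketch): target band + auxiliary
# spectral/temporal bands + missing-data mask as input channels.
net = nn.Sequential(
    nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, 3, padding=1),
)
corrupted = torch.randn(1, 1, 64, 64)       # band with dead lines / cloud
auxiliary = torch.randn(1, 2, 64, 64)       # e.g. one spectral + one temporal band
mask = torch.ones(1, 1, 64, 64)             # 1 = valid, 0 = missing (assumed)
x = torch.cat([corrupted, auxiliary, mask], dim=1)
reconstructed = net(x)
print(reconstructed.shape)                  # torch.Size([1, 1, 64, 64])
```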
Hyperspectral Image Denoising Employing a Spatial-Spectral Deep Residual Convolutional Neural Network
Title | Hyperspectral Image Denoising Employing a Spatial-Spectral Deep Residual Convolutional Neural Network |
Authors | Qiangqiang Yuan, Qiang Zhang, Jie Li, Huanfeng Shen, Liangpei Zhang |
Abstract | Hyperspectral image (HSI) denoising is a crucial preprocessing procedure for improving the performance of subsequent HSI interpretation and applications. In this paper, a novel deep learning-based method for this task is proposed, which learns a non-linear end-to-end mapping between noisy and clean HSIs with a combined spatial-spectral deep convolutional neural network (HSID-CNN). Both spatial and spectral information are simultaneously fed to the proposed network. In addition, multi-scale feature extraction and multi-level feature representation are employed to capture multi-scale spatial-spectral features and to fuse feature representations from different levels for the final restoration. Simulated and real-data experiments demonstrate that the proposed HSID-CNN outperforms many mainstream methods in quantitative evaluation indexes, visual effect, and HSI classification accuracy. |
Tasks | Denoising, Image Denoising |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00183v3 |
PDF | http://arxiv.org/pdf/1806.00183v3.pdf |
PWC | https://paperswithcode.com/paper/hyperspectral-image-denoising-employing-a |
Repo | https://github.com/WHUQZhang/HSID-CNN |
Framework | none |
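
A sketch of spatial-spectral residual denoising in the spirit of the title: the network takes one noisy band plus K adjacent spectral bands and predicts the noise to subtract. Layer counts and K are assumptions; the paper's multi-scale extraction and multi-level feature fusion are omitted.

```python
import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    """Spatial-spectral residual denoising (a sketch of the idea in
    HSID-CNN): input is one noisy band plus K adjacent spectral bands;
    the network predicts the noise and subtracts it."""
    def __init__(self, k_bands=24):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1 + k_bands, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, noisy_band, adjacent_bands):
        x = torch.cat([noisy_band, adjacent_bands], dim=1)
        return noisy_band - self.body(x)    # residual learning

net = ResidualDenoiser(k_bands=24)
clean = net(torch.randn(1, 1, 32, 32), torch.randn(1, 24, 32, 32))
print(clean.shape)                          # torch.Size([1, 1, 32, 32])
```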