Paper Group AWR 34
Sequence-to-Sequence Learning as Beam-Search Optimization
Title | Sequence-to-Sequence Learning as Beam-Search Optimization |
Authors | Sam Wiseman, Alexander M. Rush |
Abstract | Sequence-to-Sequence (seq2seq) modeling has rapidly become an important general-purpose NLP tool that has proven effective for many text-generation and sequence-labeling tasks. Seq2seq builds on deep neural language modeling and inherits its remarkable accuracy in estimating local, next-word distributions. In this work, we introduce a model and beam-search training scheme, based on the work of Daume III and Marcu (2005), that extends seq2seq to learn global sequence scores. This structured approach avoids classical biases associated with local training and unifies the training loss with the test-time usage, while preserving the proven model architecture of seq2seq and its efficient training approach. We show that our system outperforms a highly-optimized attention-based seq2seq system and other baselines on three different sequence-to-sequence tasks: word ordering, parsing, and machine translation. |
Tasks | Language Modelling, Machine Translation, Text Generation |
Published | 2016-06-09 |
URL | http://arxiv.org/abs/1606.02960v2 |
PDF | http://arxiv.org/pdf/1606.02960v2.pdf |
PWC | https://paperswithcode.com/paper/sequence-to-sequence-learning-as-beam-search |
Repo | https://github.com/juliakreutzer/joeynmt |
Framework | pytorch |
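
The training signal described in the abstract is a margin criterion applied inside beam search: the gold prefix must outscore the k-th ranked beam hypothesis at every step. Below is a minimal, hedged sketch of that per-step hinge loss, assuming the cumulative scores have already been computed; the paper's full scheme (beam re-insertion of the gold sequence, scheduled search) is not reproduced here.

```python
import torch

def bso_margin_loss(gold_scores, kth_beam_scores, margin=1.0):
    """Hedged sketch of the beam-search optimization (BSO) objective:
    penalize each decoding step at which the gold prefix fails to outscore
    the k-th ranked beam hypothesis by at least `margin`.
    gold_scores, kth_beam_scores: (T,) cumulative sequence scores per step.
    """
    violations = torch.clamp(margin - gold_scores + kth_beam_scores, min=0.0)
    return violations.sum()
```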
Key-Value Memory Networks for Directly Reading Documents
Title | Key-Value Memory Networks for Directly Reading Documents |
Authors | Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, Jason Weston |
Abstract | Directly reading documents and being able to answer questions from them is an unsolved challenge. To avoid its inherent difficulty, question answering (QA) has been directed towards using Knowledge Bases (KBs) instead, which has proven effective. Unfortunately, KBs often suffer from being too restrictive, as the schema cannot support certain types of answers, and too sparse, e.g. Wikipedia contains much more information than Freebase. In this work we introduce a new method, Key-Value Memory Networks, that makes reading documents more viable by utilizing different encodings in the addressing and output stages of the memory read operation. To compare using KBs, information extraction, or Wikipedia documents directly in a single framework, we construct an analysis tool, WikiMovies, a QA dataset that contains raw text alongside a preprocessed KB, in the domain of movies. Our method reduces the gap between all three settings. It also achieves state-of-the-art results on the existing WikiQA benchmark. |
Tasks | Question Answering |
Published | 2016-06-09 |
URL | http://arxiv.org/abs/1606.03126v2 |
PDF | http://arxiv.org/pdf/1606.03126v2.pdf |
PWC | https://paperswithcode.com/paper/key-value-memory-networks-for-directly |
Repo | https://github.com/jojonki/key-value-memory-networks |
Framework | none |
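
The core mechanism is the split read described in the abstract: keys address the memory, values produce the output. A minimal sketch of one read hop follows; the names and shapes are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def kv_memory_read(query, keys, values):
    """One key-value memory read (sketch).
    query: (d,) embedded question; keys, values: (n, d) memory slots.
    Addressing uses the key encodings; the output mixes the value encodings.
    """
    attn = F.softmax(keys @ query, dim=0)  # relevance of each key to the query
    return attn @ values                   # weighted sum over value embeddings
```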
RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism
Title | RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism |
Authors | Edward Choi, Mohammad Taha Bahadori, Joshua A. Kulas, Andy Schuetz, Walter F. Stewart, Jimeng Sun |
Abstract | Accuracy and interpretability are two dominant features of successful predictive models. Typically, a choice must be made in favor of complex black box models such as recurrent neural networks (RNN) for accuracy versus less accurate but more interpretable traditional models such as logistic regression. This tradeoff poses challenges in medicine where both accuracy and interpretability are important. We addressed this challenge by developing the REverse Time AttentIoN model (RETAIN) for application to Electronic Health Records (EHR) data. RETAIN achieves high accuracy while remaining clinically interpretable and is based on a two-level neural attention model that detects influential past visits and significant clinical variables within those visits (e.g. key diagnoses). RETAIN mimics physician practice by attending to the EHR data in reverse time order so that recent clinical visits are likely to receive higher attention. RETAIN was tested on a large health system EHR dataset with 14 million visits completed by 263K patients over an 8-year period and demonstrated predictive accuracy and computational scalability comparable to state-of-the-art methods such as RNN, and ease of interpretability comparable to traditional models. |
Tasks | Disease Trajectory Forecasting |
Published | 2016-08-19 |
URL | http://arxiv.org/abs/1608.05745v4 |
PDF | http://arxiv.org/pdf/1608.05745v4.pdf |
PWC | https://paperswithcode.com/paper/retain-an-interpretable-predictive-model-for |
Repo | https://github.com/mp2893/retain |
Framework | none |
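
RETAIN's two-level attention combines a scalar visit-level weight and a per-variable weight, both produced by RNNs run in reverse time order. The sketch below shows only how the interpretable context vector is assembled once those weights exist; the RNNs that produce alpha and beta are omitted, and all names are illustrative.

```python
import torch

def retain_context(visits, alpha, beta):
    """RETAIN-style context vector (sketch).
    visits: (T, d) embedded visits in time order;
    alpha: (T,) visit-level attention; beta: (T, d) variable-level attention.
    c = sum_t alpha_t * (beta_t * v_t), so every coefficient is inspectable.
    """
    return (alpha.unsqueeze(1) * beta * visits).sum(dim=0)
```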
Deep Feature Interpolation for Image Content Changes
Title | Deep Feature Interpolation for Image Content Changes |
Authors | Paul Upchurch, Jacob Gardner, Geoff Pleiss, Robert Pless, Noah Snavely, Kavita Bala, Kilian Weinberger |
Abstract | We propose Deep Feature Interpolation (DFI), a new data-driven baseline for automatic high-resolution image transformation. As the name suggests, it relies only on simple linear interpolation of deep convolutional features from pre-trained convnets. We show that despite its simplicity, DFI can perform high-level semantic transformations like “make older/younger”, “make bespectacled”, “add smile”, among others, surprisingly well, sometimes even matching or outperforming the state-of-the-art. This is particularly unexpected as DFI requires no specialized network architecture or even any deep network to be trained for these tasks. DFI can therefore be used as a new baseline to evaluate more complex algorithms and provides a practical answer to the question of which image transformation tasks are still challenging amid the rise of deep learning. |
Tasks | |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05507v2 |
PDF | http://arxiv.org/pdf/1611.05507v2.pdf |
PWC | https://paperswithcode.com/paper/deep-feature-interpolation-for-image-content |
Repo | https://github.com/Berndinio/ORIU |
Framework | pytorch |
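
Since DFI is literally linear interpolation in deep feature space, the feature edit compresses to a few lines. A hedged sketch follows, assuming precomputed convnet features for source and target attribute sets; mapping the edited features back to pixels (an optimization step in the paper) is omitted.

```python
import numpy as np

def dfi_edit(phi_x, phi_source, phi_target, alpha=1.0):
    """Deep Feature Interpolation (sketch): shift an image's deep features
    along the mean difference between target-attribute and source-attribute
    feature sets, e.g. smiling vs. non-smiling faces.
    phi_x: (d,); phi_source, phi_target: (n, d) feature sets."""
    w = phi_target.mean(axis=0) - phi_source.mean(axis=0)
    w /= np.linalg.norm(w)            # normalized attribute direction
    return phi_x + alpha * w
```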
Range Loss for Deep Face Recognition with Long-tail
Title | Range Loss for Deep Face Recognition with Long-tail |
Authors | Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao |
Abstract | Convolutional neural networks have achieved great improvements on face recognition in recent years because of their extraordinary ability to learn discriminative features for people with different identities. Training such a well-designed deep network requires tremendous amounts of data. A long-tail distribution refers to the fact that a small number of generic entities appear frequently while most others appear far less often. Given the long-tail distribution of real-world data, large but uniformly distributed datasets are usually hard to obtain. Empirical experience and analysis show that classes with more samples have a greater impact on feature learning and, conversely, cripple the model’s ability to extract features from tail-part data. In contrast to most existing works, which alleviate this problem by simply discarding the tailed data to obtain uniform distributions across classes, this paper proposes a new loss function, called range loss, to effectively utilize the entire long-tailed dataset during training. More specifically, range loss is designed to simultaneously reduce overall intra-personal variation and enlarge inter-personal differences within each mini-batch, even when facing extremely unbalanced data. The optimization objective of range loss combines the harmonic mean of the $k$ greatest ranges within one class and the shortest inter-class distance within one batch. Extensive experiments on two famous and challenging face recognition benchmarks (Labeled Faces in the Wild (LFW) and YouTube Faces (YTF)) not only demonstrate the effectiveness of the proposed approach in overcoming the long-tail effect but also show its good generalization ability. |
Tasks | Face Recognition |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.08976v1 |
PDF | http://arxiv.org/pdf/1611.08976v1.pdf |
PWC | https://paperswithcode.com/paper/range-loss-for-deep-face-recognition-with |
Repo | https://github.com/DongDem/RangeLoss_Marginal_Loss_Tensorflow |
Framework | tf |
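
The intra-class term the abstract describes is the harmonic mean of the k greatest pairwise distances ("ranges") among one identity's features in a mini-batch; the inter-class term pushes apart the two closest class centers. A sketch of the intra-class term only, with illustrative shapes and k:

```python
import torch

def range_loss_intra(features, k=2):
    """Intra-class range loss term (sketch): harmonic mean of the k greatest
    pairwise distances among one class's features in the mini-batch.
    features: (n, d) for a single identity, with n*(n-1)/2 >= k."""
    d = torch.cdist(features, features)
    pair_dists = d[torch.triu(torch.ones_like(d), diagonal=1).bool()]
    top_k = pair_dists.topk(k).values
    return k / (1.0 / top_k).sum()    # harmonic mean of the top-k ranges
```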
To Frontalize or Not To Frontalize: Do We Really Need Elaborate Pre-processing To Improve Face Recognition?
Title | To Frontalize or Not To Frontalize: Do We Really Need Elaborate Pre-processing To Improve Face Recognition? |
Authors | Sandipan Banerjee, Joel Brogan, Janez Krizaj, Aparna Bharati, Brandon RichardWebster, Vitomir Struc, Patrick Flynn, Walter Scheirer |
Abstract | Face recognition performance has improved remarkably in the last decade. Much of this success can be attributed to the development of deep learning techniques such as convolutional neural networks (CNNs). While CNNs have pushed the state-of-the-art forward, their training process requires a large amount of clean and correctly labelled training data. If a CNN is intended to tolerate facial pose, then we face an important question: should this training data be diverse in its pose distribution, or should face images be normalized to a single pose in a pre-processing step? To address this question, we evaluate a number of popular facial landmarking and pose correction algorithms to understand their effect on facial recognition performance. Additionally, we introduce a new, automatic, single-image frontalization scheme that exceeds the performance of current algorithms. CNNs trained using sets of different pre-processing methods are used to extract features from the Point and Shoot Challenge (PaSC) and CMU Multi-PIE datasets. We assert that the subsequent verification and recognition performance serves to quantify the effectiveness of each pose correction scheme. |
Tasks | Face Recognition |
Published | 2016-10-16 |
URL | http://arxiv.org/abs/1610.04823v4 |
PDF | http://arxiv.org/pdf/1610.04823v4.pdf |
PWC | https://paperswithcode.com/paper/to-frontalize-or-not-to-frontalize-do-we |
Repo | https://github.com/joelb92/ND_Frontalization_Project |
Framework | none |
MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes
Title | MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes |
Authors | Ethan Rudd, Manuel Günther, Terrance Boult |
Abstract | Attribute recognition, particularly facial, extracts many labels for each image. While some multi-task vision problems can be decomposed into separate tasks and stages, e.g., training independent models for each task, for a growing set of problems joint optimization across all tasks has been shown to improve performance. We show that for deep convolutional neural network (DCNN) facial attribute extraction, multi-task optimization is better. Unfortunately, it can be difficult to apply joint optimization to DCNNs when training data is imbalanced, and re-balancing multi-label data directly is structurally infeasible, since adding/removing data to balance one label will change the sampling of the other labels. This paper addresses the multi-label imbalance problem by introducing a novel mixed objective optimization network (MOON) with a loss function that mixes multiple task objectives with domain adaptive re-weighting of propagated loss. Experiments demonstrate that not only does MOON advance the state of the art in facial attribute recognition, but it also outperforms independently trained DCNNs using the same data. When using facial attributes for the LFW face recognition task, we show that our balanced (domain adapted) network outperforms the unbalanced trained network. |
Tasks | Face Recognition |
Published | 2016-03-22 |
URL | http://arxiv.org/abs/1603.07027v2 |
PDF | http://arxiv.org/pdf/1603.07027v2.pdf |
PWC | https://paperswithcode.com/paper/moon-a-mixed-objective-optimization-network |
Repo | https://github.com/feiyunzhang/person_attribute_mxnet |
Framework | mxnet |
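
The mixed objective amounts to one loss term per attribute, re-weighted so that under-represented label values contribute proportionally more. A hedged sketch using a weighted Euclidean loss in the spirit of the paper; how the weights are derived from the source and target label distributions is left out, and all names are illustrative.

```python
import torch

def moon_loss(preds, targets, weights):
    """MOON-style mixed objective (sketch): a squared-error term per facial
    attribute, re-weighted per attribute and label value so that training
    mimics a balanced distribution.
    preds, targets, weights: (batch, n_attrs)."""
    return (weights * (preds - targets) ** 2).sum(dim=1).mean()
```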
A Dual Embedding Space Model for Document Ranking
Title | A Dual Embedding Space Model for Document Ranking |
Authors | Bhaskar Mitra, Eric Nalisnick, Nick Craswell, Rich Caruana |
Abstract | A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words without being relevant. We investigate neural word embeddings as a source of evidence in document ranking. We train a word2vec embedding model on a large unlabelled query corpus, but in contrast to how the model is commonly used, we retain both the input and the output projections, allowing us to leverage both the embedding spaces to derive richer distributional relationships. During ranking we map the query words into the input space and the document words into the output space, and compute a query-document relevance score by aggregating the cosine similarities across all the query-document word pairs. We postulate that the proposed Dual Embedding Space Model (DESM) captures evidence on whether a document is about a query term in addition to what is modelled by traditional term-frequency based approaches. Our experiments show that the DESM can re-rank top documents returned by a commercial Web search engine, like Bing, better than a term-matching based signal like TF-IDF. However, when ranking a larger set of candidate documents, we find the embeddings-based approach is prone to false positives, retrieving documents that are only loosely related to the query. We demonstrate that this problem can be solved effectively by ranking based on a linear mixture of the DESM and the word counting features. |
Tasks | Document Ranking, Word Embeddings |
Published | 2016-02-02 |
URL | http://arxiv.org/abs/1602.01137v1 |
PDF | http://arxiv.org/pdf/1602.01137v1.pdf |
PWC | https://paperswithcode.com/paper/a-dual-embedding-space-model-for-document |
Repo | https://github.com/jonanem/Loan |
Framework | none |
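
The DESM score in the abstract reduces to cosine similarities between query words in the IN space and the document centroid in the OUT space. A sketch under the assumption that both embedding matrices are already looked up:

```python
import numpy as np

def desm_score(query_in, doc_out):
    """Dual Embedding Space Model score (sketch): average cosine similarity
    between each query word's IN embedding and the normalized centroid of
    the document's OUT embeddings. query_in: (m, d); doc_out: (n, d)."""
    centroid = (doc_out / np.linalg.norm(doc_out, axis=1, keepdims=True)).mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    q = query_in / np.linalg.norm(query_in, axis=1, keepdims=True)
    return float((q @ centroid).mean())
```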
On Complex Valued Convolutional Neural Networks
Title | On Complex Valued Convolutional Neural Networks |
Authors | Nitzan Guberman |
Abstract | Convolutional neural networks (CNNs) are the cutting edge model for supervised machine learning in computer vision. In recent years CNNs have outperformed traditional approaches in many computer vision tasks such as object detection, image classification and face recognition. CNNs are vulnerable to overfitting, and a lot of research focuses on finding regularization methods to overcome it. One approach is designing task specific models based on prior knowledge. Several works have shown that properties of natural images can be easily captured using complex numbers. Motivated by these works, we present a variation of the CNN model with complex valued input and weights. We construct the complex model as a generalization of the real model. Lack of order over the complex field raises several difficulties both in the definition and in the training of the network. We address these issues and suggest possible solutions. The resulting model is shown to be a restricted form of a real valued CNN with twice the parameters. It is sensitive to phase structure, and we suggest it serves as a regularized model for problems where such structure is important. This suggestion is verified empirically by comparing the performance of a complex and a real network in the problem of cell detection. The two networks achieve comparable results, and although the complex model is hard to train, it is significantly less vulnerable to overfitting. We also demonstrate that the complex network detects meaningful phase structure in the data. |
Tasks | Face Recognition, Image Classification, Object Detection |
Published | 2016-02-29 |
URL | http://arxiv.org/abs/1602.09046v1 |
PDF | http://arxiv.org/pdf/1602.09046v1.pdf |
PWC | https://paperswithcode.com/paper/on-complex-valued-convolutional-neural |
Repo | https://github.com/Doyosae/Deep_Complex_Networks |
Framework | none |
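
The abstract's observation that the complex model is a restricted real-valued CNN with twice the parameters follows directly from expanding complex multiplication. A sketch of a complex convolution realized with two real convolutions; this is the generic construction, not the thesis code.

```python
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution via real tensors (sketch):
    (W_r + iW_i)(x_r + ix_i) = (W_r x_r - W_i x_i) + i(W_r x_i + W_i x_r)."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size)
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size)

    def forward(self, x_r, x_i):
        return (self.conv_r(x_r) - self.conv_i(x_i),
                self.conv_r(x_i) + self.conv_i(x_r))
```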
Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation
Title | Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation |
Authors | Alexander Kolesnikov, Christoph H. Lampert |
Abstract | We introduce a new loss function for the weakly-supervised training of semantic image segmentation models based on three guiding principles: to seed with weak localization cues, to expand objects based on the information about which classes can occur in an image, and to constrain the segmentations to coincide with object boundaries. We show experimentally that training a deep convolutional neural network using the proposed loss function leads to substantially better segmentations than previous state-of-the-art methods on the challenging PASCAL VOC 2012 dataset. We furthermore give insight into the working mechanism of our method by a detailed experimental study that illustrates how the segmentation quality is affected by each term of the proposed loss function as well as their combinations. |
Tasks | Semantic Segmentation |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06098v3 |
PDF | http://arxiv.org/pdf/1603.06098v3.pdf |
PWC | https://paperswithcode.com/paper/seed-expand-and-constrain-three-principles |
Repo | https://github.com/kolesman/SEC |
Framework | none |
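
Of the three loss terms, the seeding principle is the simplest to illustrate: cross-entropy evaluated only at the sparse pixels where weak localization cues fire. A sketch of that term follows; the expansion and constrain terms involve global pooling and a CRF and are omitted.

```python
import torch

def seeding_loss(log_probs, seeds):
    """SEC seeding term (sketch): average negative log-likelihood over the
    pixels selected by weak localization cues.
    log_probs: (C, H, W) per-pixel log-softmax;
    seeds: (C, H, W) binary float mask of localization cues."""
    return -(seeds * log_probs).sum() / seeds.sum().clamp(min=1)
```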
Stacked Hourglass Networks for Human Pose Estimation
Title | Stacked Hourglass Networks for Human Pose Estimation |
Authors | Alejandro Newell, Kaiyu Yang, Jia Deng |
Abstract | This work introduces a novel convolutional network architecture for the task of human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. We show how repeated bottom-up, top-down processing used in conjunction with intermediate supervision is critical to improving the performance of the network. We refer to the architecture as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions. State-of-the-art results are achieved on the FLIC and MPII benchmarks outcompeting all recent methods. |
Tasks | Pose Estimation |
Published | 2016-03-22 |
URL | http://arxiv.org/abs/1603.06937v2 |
PDF | http://arxiv.org/pdf/1603.06937v2.pdf |
PWC | https://paperswithcode.com/paper/stacked-hourglass-networks-for-human-pose |
Repo | https://github.com/neherh/HyperStackNet |
Framework | torch |
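
The hourglass block is a recursive pool-process-upsample unit with a skip branch at every resolution; stacking several such blocks with intermediate supervision gives the full network. A minimal sketch using plain convolutions where the paper uses residual modules:

```python
import torch.nn as nn

class Hourglass(nn.Module):
    """Minimal hourglass block (sketch): recurse to lower resolutions and
    merge each upsampled result with a same-resolution skip branch."""
    def __init__(self, depth, ch):
        super().__init__()
        self.skip = nn.Conv2d(ch, ch, 3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.down = nn.Conv2d(ch, ch, 3, padding=1)
        self.inner = (Hourglass(depth - 1, ch) if depth > 1
                      else nn.Conv2d(ch, ch, 3, padding=1))
        self.up = nn.Upsample(scale_factor=2)

    def forward(self, x):  # spatial size must be divisible by 2**depth
        return self.skip(x) + self.up(self.inner(self.down(self.pool(x))))
```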
Advances in All-Neural Speech Recognition
Title | Advances in All-Neural Speech Recognition |
Authors | G. Zweig, C. Yu, J. Droppo, A. Stolcke |
Abstract | This paper advances the design of CTC-based all-neural (or end-to-end) speech recognizers. We propose a novel symbol inventory, and a novel iterated-CTC method in which a second system is used to transform a noisy initial output into a cleaner version. We present a number of stabilization and initialization methods we have found useful in training these networks. We evaluate our system on the commonly used NIST 2000 conversational telephony test set, and significantly exceed the previously published performance of similar systems, both with and without the use of an external language model and decoding technology. |
Tasks | Language Modelling, Speech Recognition |
Published | 2016-09-19 |
URL | http://arxiv.org/abs/1609.05935v2 |
PDF | http://arxiv.org/pdf/1609.05935v2.pdf |
PWC | https://paperswithcode.com/paper/advances-in-all-neural-speech-recognition |
Repo | https://github.com/knlee-voice/PaperNotes |
Framework | none |
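
The systems described are trained with the standard CTC criterion: per-frame symbol posteriors (plus a blank) are marginalized over all alignments of the target sequence. A sketch of that loss using PyTorch's built-in CTC, with placeholder tensors instead of a real acoustic model; the paper's symbol inventory and iterated-CTC stage are not reproduced.

```python
import torch
import torch.nn as nn

# Per-frame log-probabilities over 29 symbols + blank, for a batch of 4
# utterances of 100 frames each (random placeholders stand in for a model).
log_probs = torch.randn(100, 4, 30).log_softmax(-1)   # (frames, batch, symbols)
targets = torch.randint(1, 30, (4, 20))               # symbol ids; blank = 0

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((4,), 100),
           target_lengths=torch.full((4,), 20))
```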
Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks
Title | Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks |
Authors | Zhen Li, Yizhou Yu |
Abstract | Protein secondary structure prediction is an important problem in bioinformatics. Inspired by the recent successes of deep neural networks, in this paper, we propose an end-to-end deep network that predicts protein secondary structures from integrated local and global contextual features. Our deep architecture leverages convolutional neural networks with different kernel sizes to extract multiscale local contextual features. In addition, considering long-range dependencies existing in amino acid sequences, we set up a bidirectional neural network consisting of gated recurrent units to capture global contextual features. Furthermore, multi-task learning is utilized to predict secondary structure labels and amino-acid solvent accessibility simultaneously. Our proposed deep network demonstrates its effectiveness by achieving state-of-the-art performance, i.e., 69.7% Q8 accuracy on the public benchmark CB513, 76.9% Q8 accuracy on CASP10 and 73.1% Q8 accuracy on CASP11. Our model and results are publicly available. |
Tasks | Multi-Task Learning, Protein Secondary Structure Prediction |
Published | 2016-04-25 |
URL | http://arxiv.org/abs/1604.07176v1 |
PDF | http://arxiv.org/pdf/1604.07176v1.pdf |
PWC | https://paperswithcode.com/paper/protein-secondary-structure-prediction-using |
Repo | https://github.com/icemansina/IJCAI2016 |
Framework | none |
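
The cascaded design is parallel convolutions with different kernel sizes for multiscale local context, followed by a bidirectional GRU for long-range dependencies. A sketch of that cascade; the layer sizes and kernel widths are illustrative, and the multi-task solvent-accessibility head is omitted.

```python
import torch
import torch.nn as nn

class MultiscaleConvBGRU(nn.Module):
    """Sketch: multiscale Conv1d feature extraction over the residue
    sequence, then a bidirectional GRU, then per-residue label logits."""
    def __init__(self, d_in, d_h, n_labels):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(d_in, d_h, k, padding=k // 2) for k in (3, 7, 11))
        self.gru = nn.GRU(3 * d_h, d_h, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * d_h, n_labels)

    def forward(self, x):                  # x: (batch, length, d_in)
        h = x.transpose(1, 2)              # Conv1d expects (batch, d_in, length)
        h = torch.cat([conv(h) for conv in self.convs], dim=1)
        h, _ = self.gru(h.transpose(1, 2))
        return self.out(h)                 # (batch, length, n_labels)
```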
Neural Text Generation from Structured Data with Application to the Biography Domain
Title | Neural Text Generation from Structured Data with Application to the Biography Domain |
Authors | Remi Lebret, David Grangier, Michael Auli |
Abstract | This paper introduces a neural model for concept-to-text generation that scales to large, rich domains. We experiment with a new dataset of biographies from Wikipedia that is an order of magnitude larger than existing resources with over 700k samples. The dataset is also vastly more diverse with a 400k vocabulary, compared to a few hundred words for Weathergov or Robocup. Our model builds upon recent work on conditional neural language models for text generation. To deal with the large vocabulary, we extend these models to mix a fixed vocabulary with copy actions that transfer sample-specific words from the input database to the generated output sentence. Our neural model significantly outperforms a classical Kneser-Ney language model adapted to this task by nearly 15 BLEU. |
Tasks | Concept-To-Text Generation, Language Modelling, Table-to-Text Generation, Text Generation |
Published | 2016-03-24 |
URL | http://arxiv.org/abs/1603.07771v3 |
PDF | http://arxiv.org/pdf/1603.07771v3.pdf |
PWC | https://paperswithcode.com/paper/neural-text-generation-from-structured-data |
Repo | https://github.com/tyliupku/wiki2bio |
Framework | tf |
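
The copy-action extension mixes the fixed-vocabulary softmax with probability mass scattered onto the vocabulary ids of the input-table tokens. A per-step sketch (batch size 1); the names and gating scheme are illustrative, not the paper's exact parameterization.

```python
import torch

def mix_copy_distribution(p_vocab, p_copy, copy_gate, src_token_ids):
    """Sketch of mixing generation with copy actions at one decoding step.
    p_vocab: (V,) softmax over the fixed vocabulary;
    p_copy: (n_src,) attention over input tokens;
    copy_gate: scalar in [0, 1];
    src_token_ids: (n_src,) vocabulary ids of the input tokens."""
    out = (1 - copy_gate) * p_vocab
    return out.scatter_add(0, src_token_ids, copy_gate * p_copy)
```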
Unbounded Human Learning: Optimal Scheduling for Spaced Repetition
Title | Unbounded Human Learning: Optimal Scheduling for Spaced Repetition |
Authors | Siddharth Reddy, Igor Labutov, Siddhartha Banerjee, Thorsten Joachims |
Abstract | In the study of human learning, there is broad evidence that our ability to retain information improves with repeated exposure and decays with delay since last exposure. This plays a crucial role in the design of educational software, leading to a trade-off between teaching new material and reviewing what has already been taught. A common way to balance this trade-off is spaced repetition, which uses periodic review of content to improve long-term retention. Though spaced repetition is widely used in practice, e.g., in electronic flashcard software, there is little formal understanding of the design of these systems. Our paper addresses this gap in three ways. First, we mine log data from spaced repetition software to establish the functional dependence of retention on reinforcement and delay. Second, we use this memory model to develop a stochastic model for spaced repetition systems. We propose a queueing network model of the Leitner system for reviewing flashcards, along with a heuristic approximation that admits a tractable optimization problem for review scheduling. Finally, we empirically evaluate our queueing model through a Mechanical Turk experiment, verifying a key qualitative prediction of our model: the existence of a sharp phase transition in learning outcomes upon increasing the rate of new item introductions. |
Tasks | |
Published | 2016-02-23 |
URL | http://arxiv.org/abs/1602.07032v2 |
PDF | http://arxiv.org/pdf/1602.07032v2.pdf |
PWC | https://paperswithcode.com/paper/unbounded-human-learning-optimal-scheduling |
Repo | https://github.com/rddy/leitnerq |
Framework | none |
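
The memory model fitted in the paper's first step is a forgetting curve: recall decays with delay since the last review and improves with accumulated item strength. A toy sketch of that functional form; the exponential parameterization is illustrative, whereas the paper fits its model to flashcard log data.

```python
import math

def recall_probability(strength, delay):
    """Forgetting-curve sketch: probability of recalling an item `delay`
    time units after its last review, given its current memory strength
    built up through repeated reinforcement."""
    return math.exp(-delay / max(strength, 1e-9))
```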