Paper Group AWR 34
Sequence-to-Sequence Learning as Beam-Search Optimization
Title | Sequence-to-Sequence Learning as Beam-Search Optimization |
Authors | Sam Wiseman, Alexander M. Rush |
Abstract | Sequence-to-Sequence (seq2seq) modeling has rapidly become an important general-purpose NLP tool that has proven effective for many text-generation and sequence-labeling tasks. Seq2seq builds on deep neural language modeling and inherits its remarkable accuracy in estimating local, next-word distributions. In this work, we introduce a model and beam-search training scheme, based on the work of Daume III and Marcu (2005), that extends seq2seq to learn global sequence scores. This structured approach avoids classical biases associated with local training and unifies the training loss with the test-time usage, while preserving the proven model architecture of seq2seq and its efficient training approach. We show that our system outperforms a highly-optimized attention-based seq2seq system and other baselines on three different sequence-to-sequence tasks: word ordering, parsing, and machine translation. |
Tasks | Language Modelling, Machine Translation, Text Generation |
Published | 2016-06-09 |
URL | http://arxiv.org/abs/1606.02960v2 |
PDF | http://arxiv.org/pdf/1606.02960v2.pdf |
PWC | https://paperswithcode.com/paper/sequence-to-sequence-learning-as-beam-search |
Repo | https://github.com/juliakreutzer/joeynmt |
Framework | pytorch |
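
The training signal described in the abstract is a margin criterion applied inside beam search: the gold prefix must outscore the k-th ranked beam hypothesis at every step. Below is a minimal, hedged sketch of that per-step hinge loss, assuming the cumulative scores have already been computed; the paper's full scheme (beam re-insertion of the gold sequence, scheduled search) is not reproduced here.

```python
import torch

def bso_margin_loss(gold_scores, kth_beam_scores, margin=1.0):
    """Hedged sketch of the beam-search optimization (BSO) objective:
    penalize each decoding step at which the gold prefix fails to outscore
    the k-th ranked beam hypothesis by at least `margin`.
    gold_scores, kth_beam_scores: (T,) cumulative sequence scores per step.
    """
    violations = torch.clamp(margin - gold_scores + kth_beam_scores, min=0.0)
    return violations.sum()
```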
Key-Value Memory Networks for Directly Reading Documents
Title | Key-Value Memory Networks for Directly Reading Documents |
Authors | Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, Jason Weston |
Abstract | Directly reading documents and being able to answer questions from them is an unsolved challenge. To avoid its inherent difficulty, question answering (QA) has been directed towards using Knowledge Bases (KBs) instead, which has proven effective. Unfortunately, KBs often suffer from being too restrictive, as the schema cannot support certain types of answers, and too sparse, e.g. Wikipedia contains much more information than Freebase. In this work we introduce a new method, Key-Value Memory Networks, that makes reading documents more viable by utilizing different encodings in the addressing and output stages of the memory read operation. To compare using KBs, information extraction, or Wikipedia documents directly in a single framework, we construct an analysis tool, WikiMovies, a QA dataset that contains raw text alongside a preprocessed KB, in the domain of movies. Our method reduces the gap between all three settings. It also achieves state-of-the-art results on the existing WikiQA benchmark. |
Tasks | Question Answering |
Published | 2016-06-09 |
URL | http://arxiv.org/abs/1606.03126v2 |
PDF | http://arxiv.org/pdf/1606.03126v2.pdf |
PWC | https://paperswithcode.com/paper/key-value-memory-networks-for-directly |
Repo | https://github.com/jojonki/key-value-memory-networks |
Framework | none |
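
The core mechanism is the split read described in the abstract: keys address the memory, values produce the output. A minimal sketch of one read hop follows; the names and shapes are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def kv_memory_read(query, keys, values):
    """One key-value memory read (sketch).
    query: (d,) embedded question; keys, values: (n, d) memory slots.
    Addressing uses the key encodings; the output mixes the value encodings.
    """
    attn = F.softmax(keys @ query, dim=0)  # relevance of each key to the query
    return attn @ values                   # weighted sum over value embeddings
```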
RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism
Title | RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism |
Authors | Edward Choi, Mohammad Taha Bahadori, Joshua A. Kulas, Andy Schuetz, Walter F. Stewart, Jimeng Sun |
Abstract | Accuracy and interpretability are two dominant features of successful predictive models. Typically, a choice must be made in favor of complex black box models such as recurrent neural networks (RNN) for accuracy versus less accurate but more interpretable traditional models such as logistic regression. This tradeoff poses challenges in medicine where both accuracy and interpretability are important. We addressed this challenge by developing the REverse Time AttentIoN model (RETAIN) for application to Electronic Health Records (EHR) data. RETAIN achieves high accuracy while remaining clinically interpretable and is based on a two-level neural attention model that detects influential past visits and significant clinical variables within those visits (e.g. key diagnoses). RETAIN mimics physician practice by attending to the EHR data in reverse time order so that recent clinical visits are likely to receive higher attention. RETAIN was tested on a large health system EHR dataset with 14 million visits completed by 263K patients over an 8-year period and demonstrated predictive accuracy and computational scalability comparable to state-of-the-art methods such as RNN, and ease of interpretability comparable to traditional models. |
Tasks | Disease Trajectory Forecasting |
Published | 2016-08-19 |
URL | http://arxiv.org/abs/1608.05745v4 |
PDF | http://arxiv.org/pdf/1608.05745v4.pdf |
PWC | https://paperswithcode.com/paper/retain-an-interpretable-predictive-model-for |
Repo | https://github.com/mp2893/retain |
Framework | none |
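
RETAIN's two-level attention combines a scalar visit-level weight and a per-variable weight, both produced by RNNs run in reverse time order. The sketch below shows only how the interpretable context vector is assembled once those weights exist; the RNNs that produce alpha and beta are omitted, and all names are illustrative.

```python
import torch

def retain_context(visits, alpha, beta):
    """RETAIN-style context vector (sketch).
    visits: (T, d) embedded visits in time order;
    alpha: (T,) visit-level attention; beta: (T, d) variable-level attention.
    c = sum_t alpha_t * (beta_t * v_t), so every coefficient is inspectable.
    """
    return (alpha.unsqueeze(1) * beta * visits).sum(dim=0)
```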
Deep Feature Interpolation for Image Content Changes
Title | Deep Feature Interpolation for Image Content Changes |
Authors | Paul Upchurch, Jacob Gardner, Geoff Pleiss, Robert Pless, Noah Snavely, Kavita Bala, Kilian Weinberger |
Abstract | We propose Deep Feature Interpolation (DFI), a new data-driven baseline for automatic high-resolution image transformation. As the name suggests, it relies only on simple linear interpolation of deep convolutional features from pre-trained convnets. We show that despite its simplicity, DFI can perform high-level semantic transformations like “make older/younger”, “make bespectacled”, “add smile”, among others, surprisingly well, sometimes even matching or outperforming the state-of-the-art. This is particularly unexpected as DFI requires no specialized network architecture or even any deep network to be trained for these tasks. DFI can therefore be used as a new baseline to evaluate more complex algorithms and provides a practical answer to the question of which image transformation tasks are still challenging amid the rise of deep learning. |
Tasks | |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05507v2 |
PDF | http://arxiv.org/pdf/1611.05507v2.pdf |
PWC | https://paperswithcode.com/paper/deep-feature-interpolation-for-image-content |
Repo | https://github.com/Berndinio/ORIU |
Framework | pytorch |
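
Since DFI is literally linear interpolation in deep feature space, the feature edit compresses to a few lines. A hedged sketch follows, assuming precomputed convnet features for source and target attribute sets; mapping the edited features back to pixels (an optimization step in the paper) is omitted.

```python
import numpy as np

def dfi_edit(phi_x, phi_source, phi_target, alpha=1.0):
    """Deep Feature Interpolation (sketch): shift an image's deep features
    along the mean difference between target-attribute and source-attribute
    feature sets, e.g. smiling vs. non-smiling faces.
    phi_x: (d,); phi_source, phi_target: (n, d) feature sets."""
    w = phi_target.mean(axis=0) - phi_source.mean(axis=0)
    w /= np.linalg.norm(w)            # normalized attribute direction
    return phi_x + alpha * w
```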
Range Loss for Deep Face Recognition with Long-tail
Title | Range Loss for Deep Face Recognition with Long-tail |
Authors | Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao |
Abstract | Convolutional neural networks have achieved great improvements on face recognition in recent years because of their extraordinary ability to learn discriminative features for people with different identities. Training such a well-designed deep network requires tremendous amounts of data. A long-tail distribution refers to the fact that a small number of generic entities appear frequently while most others appear far less often. Given the long-tail distribution of real-world data, large but uniformly distributed datasets are usually hard to obtain. Empirical experience and analysis show that classes with more samples have a greater impact on feature learning and, conversely, cripple the model’s ability to extract features from tail-part data. In contrast to most existing works, which alleviate this problem by simply discarding the tailed data to obtain uniform distributions across classes, this paper proposes a new loss function, called range loss, to effectively utilize the entire long-tailed dataset during training. More specifically, range loss is designed to simultaneously reduce overall intra-personal variation and enlarge inter-personal differences within each mini-batch, even when facing extremely unbalanced data. The optimization objective of range loss combines the harmonic mean of the $k$ greatest ranges within one class and the shortest inter-class distance within one batch. Extensive experiments on two famous and challenging face recognition benchmarks (Labeled Faces in the Wild (LFW) and YouTube Faces (YTF)) not only demonstrate the effectiveness of the proposed approach in overcoming the long-tail effect but also show its good generalization ability. |
Tasks | Face Recognition |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.08976v1 |
PDF | http://arxiv.org/pdf/1611.08976v1.pdf |
PWC | https://paperswithcode.com/paper/range-loss-for-deep-face-recognition-with |
Repo | https://github.com/DongDem/RangeLoss_Marginal_Loss_Tensorflow |
Framework | tf |
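
The intra-class term the abstract describes is the harmonic mean of the k greatest pairwise distances ("ranges") among one identity's features in a mini-batch; the inter-class term pushes apart the two closest class centers. A sketch of the intra-class term only, with illustrative shapes and k:

```python
import torch

def range_loss_intra(features, k=2):
    """Intra-class range loss term (sketch): harmonic mean of the k greatest
    pairwise distances among one class's features in the mini-batch.
    features: (n, d) for a single identity, with n*(n-1)/2 >= k."""
    d = torch.cdist(features, features)
    pair_dists = d[torch.triu(torch.ones_like(d), diagonal=1).bool()]
    top_k = pair_dists.topk(k).values
    return k / (1.0 / top_k).sum()    # harmonic mean of the top-k ranges
```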
To Frontalize or Not To Frontalize: Do We Really Need Elaborate Pre-processing To Improve Face Recognition?
Title | To Frontalize or Not To Frontalize: Do We Really Need Elaborate Pre-processing To Improve Face Recognition? |
Authors | Sandipan Banerjee, Joel Brogan, Janez Krizaj, Aparna Bharati, Brandon RichardWebster, Vitomir Struc, Patrick Flynn, Walter Scheirer |
Abstract | Face recognition performance has improved remarkably in the last decade. Much of this success can be attributed to the development of deep learning techniques such as convolutional neural networks (CNNs). While CNNs have pushed the state-of-the-art forward, their training process requires a large amount of clean and correctly labelled training data. If a CNN is intended to tolerate facial pose, then we face an important question: should this training data be diverse in its pose distribution, or should face images be normalized to a single pose in a pre-processing step? To address this question, we evaluate a number of popular facial landmarking and pose correction algorithms to understand their effect on facial recognition performance. Additionally, we introduce a new, automatic, single-image frontalization scheme that exceeds the performance of current algorithms. CNNs trained using sets of different pre-processing methods are used to extract features from the Point and Shoot Challenge (PaSC) and CMU Multi-PIE datasets. We assert that the subsequent verification and recognition performance serves to quantify the effectiveness of each pose correction scheme. |
Tasks | Face Recognition |
Published | 2016-10-16 |
URL | http://arxiv.org/abs/1610.04823v4 |
PDF | http://arxiv.org/pdf/1610.04823v4.pdf |
PWC | https://paperswithcode.com/paper/to-frontalize-or-not-to-frontalize-do-we |
Repo | https://github.com/joelb92/ND_Frontalization_Project |
Framework | none |
MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes
Title | MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes |
Authors | Ethan Rudd, Manuel Günther, Terrance Boult |
Abstract | Attribute recognition, particularly facial, extracts many labels for each image. While some multi-task vision problems can be decomposed into separate tasks and stages, e.g., training independent models for each task, for a growing set of problems joint optimization across all tasks has been shown to improve performance. We show that for deep convolutional neural network (DCNN) facial attribute extraction, multi-task optimization is better. Unfortunately, it can be difficult to apply joint optimization to DCNNs when training data is imbalanced, and re-balancing multi-label data directly is structurally infeasible, since adding/removing data to balance one label will change the sampling of the other labels. This paper addresses the multi-label imbalance problem by introducing a novel mixed objective optimization network (MOON) with a loss function that mixes multiple task objectives with domain adaptive re-weighting of propagated loss. Experiments demonstrate that not only does MOON advance the state of the art in facial attribute recognition, but it also outperforms independently trained DCNNs using the same data. When using facial attributes for the LFW face recognition task, we show that our balanced (domain adapted) network outperforms the unbalanced trained network. |
Tasks | Face Recognition |
Published | 2016-03-22 |
URL | http://arxiv.org/abs/1603.07027v2 |
PDF | http://arxiv.org/pdf/1603.07027v2.pdf |
PWC | https://paperswithcode.com/paper/moon-a-mixed-objective-optimization-network |
Repo | https://github.com/feiyunzhang/person_attribute_mxnet |
Framework | mxnet |
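
The mixed objective amounts to one loss term per attribute, re-weighted so that under-represented label values contribute proportionally more. A hedged sketch using a weighted Euclidean loss in the spirit of the paper; how the weights are derived from the source and target label distributions is left out, and all names are illustrative.

```python
import torch

def moon_loss(preds, targets, weights):
    """MOON-style mixed objective (sketch): a squared-error term per facial
    attribute, re-weighted per attribute and label value so that training
    mimics a balanced distribution.
    preds, targets, weights: (batch, n_attrs)."""
    return (weights * (preds - targets) ** 2).sum(dim=1).mean()
```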
A Dual Embedding Space Model for Document Ranking
Title | A Dual Embedding Space Model for Document Ranking |
Authors | Bhaskar Mitra, Eric Nalisnick, Nick Craswell, Rich Caruana |
Abstract | A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words without being relevant. We investigate neural word embeddings as a source of evidence in document ranking. We train a word2vec embedding model on a large unlabelled query corpus, but in contrast to how the model is commonly used, we retain both the input and the output projections, allowing us to leverage both the embedding spaces to derive richer distributional relationships. During ranking we map the query words into the input space and the document words into the output space, and compute a query-document relevance score by aggregating the cosine similarities across all the query-document word pairs. We postulate that the proposed Dual Embedding Space Model (DESM) captures evidence on whether a document is about a query term in addition to what is modelled by traditional term-frequency based approaches. Our experiments show that the DESM can re-rank top documents returned by a commercial Web search engine, like Bing, better than a term-matching based signal like TF-IDF. However, when ranking a larger set of candidate documents, we find the embeddings-based approach is prone to false positives, retrieving documents that are only loosely related to the query. We demonstrate that this problem can be solved effectively by ranking based on a linear mixture of the DESM and the word counting features. |
Tasks | Document Ranking, Word Embeddings |
Published | 2016-02-02 |
URL | http://arxiv.org/abs/1602.01137v1 |
PDF | http://arxiv.org/pdf/1602.01137v1.pdf |
PWC | https://paperswithcode.com/paper/a-dual-embedding-space-model-for-document |
Repo | https://github.com/jonanem/Loan |
Framework | none |
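
The DESM score in the abstract reduces to cosine similarities between query words in the IN space and the document centroid in the OUT space. A sketch under the assumption that both embedding matrices are already looked up:

```python
import numpy as np

def desm_score(query_in, doc_out):
    """Dual Embedding Space Model score (sketch): average cosine similarity
    between each query word's IN embedding and the normalized centroid of
    the document's OUT embeddings. query_in: (m, d); doc_out: (n, d)."""
    centroid = (doc_out / np.linalg.norm(doc_out, axis=1, keepdims=True)).mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    q = query_in / np.linalg.norm(query_in, axis=1, keepdims=True)
    return float((q @ centroid).mean())
```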
On Complex Valued Convolutional Neural Networks
Title | On Complex Valued Convolutional Neural Networks |
Authors | Nitzan Guberman |
Abstract | Convolutional neural networks (CNNs) are the cutting edge model for supervised machine learning in computer vision. In recent years CNNs have outperformed traditional approaches in many computer vision tasks such as object detection, image classification and face recognition. CNNs are vulnerable to overfitting, and a lot of research focuses on finding regularization methods to overcome it. One approach is designing task specific models based on prior knowledge. Several works have shown that properties of natural images can be easily captured using complex numbers. Motivated by these works, we present a variation of the CNN model with complex valued input and weights. We construct the complex model as a generalization of the real model. Lack of order over the complex field raises several difficulties both in the definition and in the training of the network. We address these issues and suggest possible solutions. The resulting model is shown to be a restricted form of a real valued CNN with twice the parameters. It is sensitive to phase structure, and we suggest it serves as a regularized model for problems where such structure is important. This suggestion is verified empirically by comparing the performance of a complex and a real network in the problem of cell detection. The two networks achieve comparable results, and although the complex model is hard to train, it is significantly less vulnerable to overfitting. We also demonstrate that the complex network detects meaningful phase structure in the data. |
Tasks | Face Recognition, Image Classification, Object Detection |
Published | 2016-02-29 |
URL | http://arxiv.org/abs/1602.09046v1 |
PDF | http://arxiv.org/pdf/1602.09046v1.pdf |
PWC | https://paperswithcode.com/paper/on-complex-valued-convolutional-neural |
Repo | https://github.com/Doyosae/Deep_Complex_Networks |
Framework | none |
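
The abstract's observation that the complex model is a restricted real-valued CNN with twice the parameters follows directly from expanding complex multiplication. A sketch of a complex convolution realized with two real convolutions; this is the generic construction, not the thesis code.

```python
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution via real tensors (sketch):
    (W_r + iW_i)(x_r + ix_i) = (W_r x_r - W_i x_i) + i(W_r x_i + W_i x_r)."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size)
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size)

    def forward(self, x_r, x_i):
        return (self.conv_r(x_r) - self.conv_i(x_i),
                self.conv_r(x_i) + self.conv_i(x_r))
```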
Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation
Title | Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation |
Authors | Alexander Kolesnikov, Christoph H. Lampert |
Abstract | We introduce a new loss function for the weakly-supervised training of semantic image segmentation models based on three guiding principles: to seed with weak localization cues, to expand objects based on the information about which classes can occur in an image, and to constrain the segmentations to coincide with object boundaries. We show experimentally that training a deep convolutional neural network using the proposed loss function leads to substantially better segmentations than previous state-of-the-art methods on the challenging PASCAL VOC 2012 dataset. We furthermore give insight into the working mechanism of our method by a detailed experimental study that illustrates how the segmentation quality is affected by each term of the proposed loss function as well as their combinations. |
Tasks | Semantic Segmentation |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06098v3 |
PDF | http://arxiv.org/pdf/1603.06098v3.pdf |
PWC | https://paperswithcode.com/paper/seed-expand-and-constrain-three-principles |
Repo | https://github.com/kolesman/SEC |
Framework | none |
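
Of the three loss terms, the seeding principle is the simplest to illustrate: cross-entropy evaluated only at the sparse pixels where weak localization cues fire. A sketch of that term follows; the expansion and constrain terms involve global pooling and a CRF and are omitted.

```python
import torch

def seeding_loss(log_probs, seeds):
    """SEC seeding term (sketch): average negative log-likelihood over the
    pixels selected by weak localization cues.
    log_probs: (C, H, W) per-pixel log-softmax;
    seeds: (C, H, W) binary float mask of localization cues."""
    return -(seeds * log_probs).sum() / seeds.sum().clamp(min=1)
```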
Stacked Hourglass Networks for Human Pose Estimation
Title | Stacked Hourglass Networks for Human Pose Estimation |
Authors | Alejandro Newell, Kaiyu Yang, Jia Deng |
Abstract | This work introduces a novel convolutional network architecture for the task of human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. We show how repeated bottom-up, top-down processing used in conjunction with intermediate supervision is critical to improving the performance of the network. We refer to the architecture as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions. State-of-the-art results are achieved on the FLIC and MPII benchmarks outcompeting all recent methods. |
Tasks | Pose Estimation |
Published | 2016-03-22 |
URL | http://arxiv.org/abs/1603.06937v2 |
PDF | http://arxiv.org/pdf/1603.06937v2.pdf |
PWC | https://paperswithcode.com/paper/stacked-hourglass-networks-for-human-pose |
Repo | https://github.com/neherh/HyperStackNet |
Framework | torch |
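
The hourglass block is a recursive pool-process-upsample unit with a skip branch at every resolution; stacking several such blocks with intermediate supervision gives the full network. A minimal sketch using plain convolutions where the paper uses residual modules:

```python
import torch.nn as nn

class Hourglass(nn.Module):
    """Minimal hourglass block (sketch): recurse to lower resolutions and
    merge each upsampled result with a same-resolution skip branch."""
    def __init__(self, depth, ch):
        super().__init__()
        self.skip = nn.Conv2d(ch, ch, 3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.down = nn.Conv2d(ch, ch, 3, padding=1)
        self.inner = (Hourglass(depth - 1, ch) if depth > 1
                      else nn.Conv2d(ch, ch, 3, padding=1))
        self.up = nn.Upsample(scale_factor=2)

    def forward(self, x):  # spatial size must be divisible by 2**depth
        return self.skip(x) + self.up(self.inner(self.down(self.pool(x))))
```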
Advances in All-Neural Speech Recognition
Title | Advances in All-Neural Speech Recognition |
Authors | G. Zweig, C. Yu, J. Droppo, A. Stolcke |
Abstract | This paper advances the design of CTC-based all-neural (or end-to-end) speech recognizers. We propose a novel symbol inventory, and a novel iterated-CTC method in which a second system is used to transform a noisy initial output into a cleaner version. We present a number of stabilization and initialization methods we have found useful in training these networks. We evaluate our system on the commonly used NIST 2000 conversational telephony test set, and significantly exceed the previously published performance of similar systems, both with and without the use of an external language model and decoding technology. |
Tasks | Language Modelling, Speech Recognition |
Published | 2016-09-19 |
URL | http://arxiv.org/abs/1609.05935v2 |
PDF | http://arxiv.org/pdf/1609.05935v2.pdf |
PWC | https://paperswithcode.com/paper/advances-in-all-neural-speech-recognition |
Repo | https://github.com/knlee-voice/PaperNotes |
Framework | none |
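
The systems described are trained with the standard CTC criterion: per-frame symbol posteriors (plus a blank) are marginalized over all alignments of the target sequence. A sketch of that loss using PyTorch's built-in CTC, with placeholder tensors instead of a real acoustic model; the paper's symbol inventory and iterated-CTC stage are not reproduced.

```python
import torch
import torch.nn as nn

# Per-frame log-probabilities over 29 symbols + blank, for a batch of 4
# utterances of 100 frames each (random placeholders stand in for a model).
log_probs = torch.randn(100, 4, 30).log_softmax(-1)   # (frames, batch, symbols)
targets = torch.randint(1, 30, (4, 20))               # symbol ids; blank = 0

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((4,), 100),
           target_lengths=torch.full((4,), 20))
```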
Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks
Title | Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks |
Authors | Zhen Li, Yizhou Yu |
Abstract | Protein secondary structure prediction is an important problem in bioinformatics. Inspired by the recent successes of deep neural networks, in this paper, we propose an end-to-end deep network that predicts protein secondary structures from integrated local and global contextual features. Our deep architecture leverages convolutional neural networks with different kernel sizes to extract multiscale local contextual features. In addition, considering long-range dependencies existing in amino acid sequences, we set up a bidirectional neural network consisting of gated recurrent units to capture global contextual features. Furthermore, multi-task learning is utilized to predict secondary structure labels and amino-acid solvent accessibility simultaneously. Our proposed deep network demonstrates its effectiveness by achieving state-of-the-art performance, i.e., 69.7% Q8 accuracy on the public benchmark CB513, 76.9% Q8 accuracy on CASP10 and 73.1% Q8 accuracy on CASP11. Our model and results are publicly available. |
Tasks | Multi-Task Learning, Protein Secondary Structure Prediction |
Published | 2016-04-25 |
URL | http://arxiv.org/abs/1604.07176v1 |
PDF | http://arxiv.org/pdf/1604.07176v1.pdf |
PWC | https://paperswithcode.com/paper/protein-secondary-structure-prediction-using |
Repo | https://github.com/icemansina/IJCAI2016 |
Framework | none |
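
The cascaded design is parallel convolutions with different kernel sizes for multiscale local context, followed by a bidirectional GRU for long-range dependencies. A sketch of that cascade; the layer sizes and kernel widths are illustrative, and the multi-task solvent-accessibility head is omitted.

```python
import torch
import torch.nn as nn

class MultiscaleConvBGRU(nn.Module):
    """Sketch: multiscale Conv1d feature extraction over the residue
    sequence, then a bidirectional GRU, then per-residue label logits."""
    def __init__(self, d_in, d_h, n_labels):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(d_in, d_h, k, padding=k // 2) for k in (3, 7, 11))
        self.gru = nn.GRU(3 * d_h, d_h, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * d_h, n_labels)

    def forward(self, x):                  # x: (batch, length, d_in)
        h = x.transpose(1, 2)              # Conv1d expects (batch, d_in, length)
        h = torch.cat([conv(h) for conv in self.convs], dim=1)
        h, _ = self.gru(h.transpose(1, 2))
        return self.out(h)                 # (batch, length, n_labels)
```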
Neural Text Generation from Structured Data with Application to the Biography Domain
Title | Neural Text Generation from Structured Data with Application to the Biography Domain |
Authors | Remi Lebret, David Grangier, Michael Auli |
Abstract | This paper introduces a neural model for concept-to-text generation that scales to large, rich domains. We experiment with a new dataset of biographies from Wikipedia that is an order of magnitude larger than existing resources with over 700k samples. The dataset is also vastly more diverse with a 400k vocabulary, compared to a few hundred words for Weathergov or Robocup. Our model builds upon recent work on conditional neural language models for text generation. To deal with the large vocabulary, we extend these models to mix a fixed vocabulary with copy actions that transfer sample-specific words from the input database to the generated output sentence. Our neural model significantly outperforms a classical Kneser-Ney language model adapted to this task by nearly 15 BLEU. |
Tasks | Concept-To-Text Generation, Language Modelling, Table-to-Text Generation, Text Generation |
Published | 2016-03-24 |
URL | http://arxiv.org/abs/1603.07771v3 |
PDF | http://arxiv.org/pdf/1603.07771v3.pdf |
PWC | https://paperswithcode.com/paper/neural-text-generation-from-structured-data |
Repo | https://github.com/tyliupku/wiki2bio |
Framework | tf |
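
The copy-action extension mixes the fixed-vocabulary softmax with probability mass scattered onto the vocabulary ids of the input-table tokens. A per-step sketch (batch size 1); the names and gating scheme are illustrative, not the paper's exact parameterization.

```python
import torch

def mix_copy_distribution(p_vocab, p_copy, copy_gate, src_token_ids):
    """Sketch of mixing generation with copy actions at one decoding step.
    p_vocab: (V,) softmax over the fixed vocabulary;
    p_copy: (n_src,) attention over input tokens;
    copy_gate: scalar in [0, 1];
    src_token_ids: (n_src,) vocabulary ids of the input tokens."""
    out = (1 - copy_gate) * p_vocab
    return out.scatter_add(0, src_token_ids, copy_gate * p_copy)
```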
Unbounded Human Learning: Optimal Scheduling for Spaced Repetition
Title | Unbounded Human Learning: Optimal Scheduling for Spaced Repetition |
Authors | Siddharth Reddy, Igor Labutov, Siddhartha Banerjee, Thorsten Joachims |
Abstract | In the study of human learning, there is broad evidence that our ability to retain information improves with repeated exposure and decays with delay since last exposure. This plays a crucial role in the design of educational software, leading to a trade-off between teaching new material and reviewing what has already been taught. A common way to balance this trade-off is spaced repetition, which uses periodic review of content to improve long-term retention. Though spaced repetition is widely used in practice, e.g., in electronic flashcard software, there is little formal understanding of the design of these systems. Our paper addresses this gap in three ways. First, we mine log data from spaced repetition software to establish the functional dependence of retention on reinforcement and delay. Second, we use this memory model to develop a stochastic model for spaced repetition systems. We propose a queueing network model of the Leitner system for reviewing flashcards, along with a heuristic approximation that admits a tractable optimization problem for review scheduling. Finally, we empirically evaluate our queueing model through a Mechanical Turk experiment, verifying a key qualitative prediction of our model: the existence of a sharp phase transition in learning outcomes upon increasing the rate of new item introductions. |
Tasks | |
Published | 2016-02-23 |
URL | http://arxiv.org/abs/1602.07032v2 |
PDF | http://arxiv.org/pdf/1602.07032v2.pdf |
PWC | https://paperswithcode.com/paper/unbounded-human-learning-optimal-scheduling |
Repo | https://github.com/rddy/leitnerq |
Framework | none |
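
The memory model fitted in the paper's first step is a forgetting curve: recall decays with delay since the last review and improves with accumulated item strength. A toy sketch of that functional form; the exponential parameterization is illustrative, whereas the paper fits its model to flashcard log data.

```python
import math

def recall_probability(strength, delay):
    """Forgetting-curve sketch: probability of recalling an item `delay`
    time units after its last review, given its current memory strength
    built up through repeated reinforcement."""
    return math.exp(-delay / max(strength, 1e-9))
```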