July 30, 2019


Paper Group AWR 65

Mixed Precision Training. Hidden Two-Stream Convolutional Networks for Action Recognition. Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations. Stacked Thompson Bandits. Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks. Order-Planning Neural Text Generation From Structured Data. VIDOS …

Mixed Precision Training


Title	Mixed Precision Training
Authors	Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu
Abstract	Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increase. We introduce a technique to train deep neural networks using half precision floating point numbers. In our technique, weights, activations and gradients are stored in IEEE half-precision format. Half-precision floating numbers have limited numerical range compared to single-precision numbers. We propose two techniques to handle this loss of information. Firstly, we recommend maintaining a single-precision copy of the weights that accumulates the gradients after each optimizer step. This single-precision copy is rounded to half-precision format during training. Secondly, we propose scaling the loss appropriately to handle the loss of information with half-precision gradients. We demonstrate that this approach works for a wide variety of models including convolution neural networks, recurrent neural networks and generative adversarial networks. This technique works for large scale models with more than 100 million parameters trained on large datasets. Using this approach, we can reduce the memory consumption of deep learning models by nearly 2x. In future processors, we can also expect a significant computation speedup using half-precision hardware units.
Tasks
Published	2017-10-10
URL	http://arxiv.org/abs/1710.03740v3
PDF	http://arxiv.org/pdf/1710.03740v3.pdf
PWC	https://paperswithcode.com/paper/mixed-precision-training
Repo	https://github.com/NVIDIA/DeepRecommender
Framework	pytorch
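
The two techniques named in the abstract (a higher-precision master copy of the weights, and loss scaling) can be illustrated with a small numeric sketch. This is not the authors' code: it assumes Python's `struct` format `"e"` as a stand-in for IEEE half precision, uses ordinary Python floats in place of single precision, and the gradient, learning rate, and scale factor are made-up values. Scaling a single gradient value here stands in for scaling the loss before backpropagation.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


# 1) Loss scaling: a tiny gradient underflows to zero in half precision,
#    but survives if it is scaled up first and unscaled in higher precision.
tiny_grad = 1e-8           # illustrative value, below the fp16 subnormal range
loss_scale = 1024.0        # illustrative scale factor
unscaled = to_fp16(tiny_grad)                            # flushed to 0.0
rescued = to_fp16(tiny_grad * loss_scale) / loss_scale   # close to 1e-8

# 2) Master weights: an update much smaller than the fp16 spacing near 1.0
#    is lost if the weight itself is kept only in half precision.
lr, grad = 1e-4, 1.0       # illustrative values
w_fp16_only = to_fp16(1.0)
w_master = 1.0             # higher-precision master copy
for _ in range(10):
    w_fp16_only = to_fp16(w_fp16_only - to_fp16(lr * grad))  # rounds back to 1.0
    w_master -= lr * grad                                    # accumulates the updates
w_fp16_copy = to_fp16(w_master)  # half-precision copy used for compute

print(unscaled, rescued, w_fp16_only, w_fp16_copy)
```

In this sketch the half-precision-only weight never moves, while the half-precision copy derived from the master weight does, which is the motivation the abstract gives for keeping the master copy.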

Hidden Two-Stream Convolutional Networks for Action Recognition


Title	Hidden Two-Stream Convolutional Networks for Action Recognition
Authors	Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann
Abstract	Analyzing videos of human actions involves understanding the temporal relationships among video frames. State-of-the-art action recognition approaches rely on traditional optical flow estimation methods to pre-compute motion information for CNNs. Such a two-stage approach is computationally expensive, storage demanding, and not end-to-end trainable. In this paper, we present a novel CNN architecture that implicitly captures motion information between adjacent frames. We name our approach hidden two-stream CNNs because it only takes raw video frames as input and directly predicts action classes without explicitly computing optical flow. Our end-to-end approach is 10x faster than its two-stage baseline. Experimental results on four challenging action recognition datasets: UCF101, HMDB51, THUMOS14 and ActivityNet v1.2 show that our approach significantly outperforms the previous best real-time approaches.
Tasks	Action Recognition In Videos, Optical Flow Estimation, Temporal Action Localization
Published	2017-04-02
URL	http://arxiv.org/abs/1704.00389v4
PDF	http://arxiv.org/pdf/1704.00389v4.pdf
PWC	https://paperswithcode.com/paper/hidden-two-stream-convolutional-networks-for
Repo	https://github.com/bryanyzhu/two-stream-pytorch
Framework	pytorch

Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations


Title	Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations
Authors	Andrew Slavin Ross, Michael C. Hughes, Finale Doshi-Velez
Abstract	Neural networks are among the most accurate supervised learning methods in use today, but their opacity makes them difficult to trust in critical applications, especially when conditions in training differ from those in test. Recent work on explanations for black-box models has produced tools (e.g. LIME) to show the implicit rules behind predictions, which can help us identify when models are right for the wrong reasons. However, these methods do not scale to explaining entire datasets and cannot correct the problems they reveal. We introduce a method for efficiently explaining and regularizing differentiable models by examining and selectively penalizing their input gradients, which provide a normal to the decision boundary. We apply these penalties both based on expert annotation and in an unsupervised fashion that encourages diverse models with qualitatively different decision boundaries for the same classification problem. On multiple datasets, we show our approach generates faithful explanations and models that generalize much better when conditions differ between training and test.
Tasks
Published	2017-03-10
URL	http://arxiv.org/abs/1703.03717v2
PDF	http://arxiv.org/pdf/1703.03717v2.pdf
PWC	https://paperswithcode.com/paper/right-for-the-right-reasons-training
Repo	https://github.com/dtak/rrr
Framework	none
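
The idea of selectively penalizing input gradients can be sketched for the simplest differentiable model, a logistic regression, where the input gradient has a closed form. The abstract does not state the exact loss, so the penalty form below (squared input gradients of the summed log-probabilities, restricted by an annotation mask) is an assumption, and all names and values are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(w, b, x):
    """Gradient of log p(y=0|x) + log p(y=1|x) with respect to x.

    For logistic regression with z = w.x + b and p = sigmoid(z):
    d/dz log p = 1 - p and d/dz log(1 - p) = -p, so the sum is (1 - 2p) * w.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(1.0 - 2.0 * p) * wi for wi in w]


def explanation_penalty(w, b, x, mask):
    """Sum of squared input gradients on features marked irrelevant (mask == 1)."""
    grad = input_gradient(w, b, x)
    return sum((m * g) ** 2 for m, g in zip(mask, grad))


x = [1.0, 2.0]
mask = [0, 1]                      # hypothetical annotation: feature 1 should not matter
relies_on_masked = [0.5, 1.5]      # model that uses the masked feature
ignores_masked = [0.5, 0.0]        # model that ignores it

penalty_bad = explanation_penalty(relies_on_masked, 0.0, x, mask)
penalty_good = explanation_penalty(ignores_masked, 0.0, x, mask)
print(penalty_bad, penalty_good)
```

Adding such a term to the usual classification loss, weighted by a regularization strength, would push the model away from the masked features during training.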

Stacked Thompson Bandits


Title	Stacked Thompson Bandits
Authors	Lenz Belzner, Thomas Gabor
Abstract	We introduce Stacked Thompson Bandits (STB) for efficiently generating plans that are likely to satisfy a given bounded temporal logic requirement. STB uses a simulation for evaluation of plans, and takes a Bayesian approach to using the resulting information to guide its search. In particular, we show that stacking multiarmed bandits and using Thompson sampling to guide the action selection process for each bandit enables STB to generate plans that satisfy requirements with a high probability while only searching a fraction of the search space.
Tasks
Published	2017-02-28
URL	http://arxiv.org/abs/1702.08726v1
PDF	http://arxiv.org/pdf/1702.08726v1.pdf
PWC	https://paperswithcode.com/paper/stacked-thompson-bandits
Repo	https://github.com/jazzbob/stb
Framework	none
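
A rough sketch of the stacking idea: one Bernoulli bandit per plan step, each choosing its action by Thompson sampling from Beta posteriors, with the simulated outcome of the whole plan fed back to every chosen arm. This is not the code from the linked repository; the `simulate` function, the target plan, and the 0.9 success rate are hypothetical stand-ins for checking a bounded temporal logic requirement in simulation.

```python
import random


def sample_plan(posteriors, rng):
    """Thompson sampling: draw from each arm's Beta posterior, pick the best per step."""
    plan = []
    for arms in posteriors:
        draws = [rng.betavariate(a, b) for a, b in arms]
        plan.append(draws.index(max(draws)))
    return plan


def stacked_thompson(simulate, horizon, n_actions, iterations, rng):
    # One Beta(1, 1) prior per (plan step, action), stored as [alpha, beta].
    posteriors = [[[1, 1] for _ in range(n_actions)] for _ in range(horizon)]
    for _ in range(iterations):
        plan = sample_plan(posteriors, rng)
        satisfied = simulate(plan)
        for step, action in enumerate(plan):
            # Success increments alpha, failure increments beta.
            posteriors[step][action][0 if satisfied else 1] += 1
    # Report the plan with the highest posterior mean at each step.
    best = []
    for arms in posteriors:
        means = [a / (a + b) for a, b in arms]
        best.append(means.index(max(means)))
    return posteriors, best


rng = random.Random(0)
target = [1, 0, 2]  # hypothetical plan that satisfies the requirement


def simulate(plan):
    # Noisy simulation: the target plan satisfies the requirement 90% of the time.
    return plan == target and rng.random() < 0.9


posteriors, best_plan = stacked_thompson(simulate, horizon=3, n_actions=3,
                                         iterations=500, rng=rng)
print(best_plan)
```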

Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks


Title	Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks
Authors	Federico Monti, Michael M. Bronstein, Xavier Bresson
Abstract	Matrix completion models are among the most common formulations of recommender systems. Recent works have shown a boost of performance of these techniques when introducing the pairwise relationships between users/items in the form of graphs, and imposing smoothness priors on these graphs. However, such techniques do not fully exploit the local stationarity structures of user/item graphs, and the number of parameters to learn is linear w.r.t. the number of users and items. We propose a novel approach to overcome these limitations by using geometric deep learning on graphs. Our matrix completion architecture combines graph convolutional neural networks and recurrent neural networks to learn meaningful statistical graph-structured patterns and the non-linear diffusion process that generates the known ratings. This neural network system requires a constant number of parameters independent of the matrix size. We apply our method on both synthetic and real datasets, showing that it outperforms state-of-the-art techniques.
Tasks	Matrix Completion, Recommendation Systems
Published	2017-04-22
URL	http://arxiv.org/abs/1704.06803v1
PDF	http://arxiv.org/pdf/1704.06803v1.pdf
PWC	https://paperswithcode.com/paper/geometric-matrix-completion-with-recurrent
Repo	https://github.com/fmonti/mgcnn
Framework	tf

Order-Planning Neural Text Generation From Structured Data


Title	Order-Planning Neural Text Generation From Structured Data
Authors	Lei Sha, Lili Mou, Tianyu Liu, Pascal Poupart, Sujian Li, Baobao Chang, Zhifang Sui
Abstract	Generating texts from structured data (e.g., a table) is important for various natural language processing tasks such as question answering and dialog systems. In recent studies, researchers use neural language models and encoder-decoder frameworks for table-to-text generation. However, these neural network-based approaches do not model the order of contents during text generation. When a human writes a summary based on a given table, he or she would probably consider the content order before wording. In a biography, for example, the nationality of a person is typically mentioned before occupation. In this paper, we propose an order-planning text generation model to capture the relationship between different fields and use such relationship to make the generated text more fluent and smooth. We conducted experiments on the WikiBio dataset and achieve significantly higher performance than previous methods in terms of BLEU, ROUGE, and NIST scores.
Tasks	Question Answering, Table-to-Text Generation, Text Generation
Published	2017-09-01
URL	http://arxiv.org/abs/1709.00155v1
PDF	http://arxiv.org/pdf/1709.00155v1.pdf
PWC	https://paperswithcode.com/paper/order-planning-neural-text-generation-from
Repo	https://github.com/anindyasarkarIITH/Structure_data_to_summary
Framework	tf

VIDOSAT: High-dimensional Sparsifying Transform Learning for Online Video Denoising


Title	VIDOSAT: High-dimensional Sparsifying Transform Learning for Online Video Denoising
Authors	Bihan Wen, Saiprasad Ravishankar, Yoram Bresler
Abstract	Techniques exploiting the sparsity of images in a transform domain have been effective for various applications in image and video processing. Transform learning methods involve cheap computations and have been demonstrated to perform well in applications such as image denoising and medical image reconstruction. Recently, we proposed methods for online learning of sparsifying transforms from streaming signals, which enjoy good convergence guarantees, and involve lower computational costs than online synthesis dictionary learning. In this work, we apply online transform learning to video denoising. We present a novel framework for online video denoising based on high-dimensional sparsifying transform learning for spatio-temporal patches. The patches are constructed either from corresponding 2D patches in successive frames or using an online block matching technique. The proposed online video denoising requires little memory, and offers efficient processing. Numerical experiments compare the performance of the proposed methods with that of the same video denoising scheme with the transform fixed to the 3D DCT, as well as with prior schemes such as dictionary learning-based schemes and the state-of-the-art VBM3D and VBM4D, on several video data sets, demonstrating the promising performance of the proposed methods.
Tasks	Denoising, Dictionary Learning, Image Denoising, Image Reconstruction, Video Denoising
Published	2017-10-03
URL	http://arxiv.org/abs/1710.00947v1
PDF	http://arxiv.org/pdf/1710.00947v1.pdf
PWC	https://paperswithcode.com/paper/vidosat-high-dimensional-sparsifying
Repo	https://github.com/wenbihan/vidosat_icip2015
Framework	none

Hand Keypoint Detection in Single Images using Multiview Bootstrapping


Title	Hand Keypoint Detection in Single Images using Multiview Bootstrapping
Authors	Tomas Simon, Hanbyul Joo, Iain Matthews, Yaser Sheikh
Abstract	We present an approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. We call this procedure multiview bootstrapping: first, an initial keypoint detector is used to produce noisy labels in multiple views of the hand. The noisy detections are then triangulated in 3D using multiview geometry or marked as outliers. Finally, the reprojected triangulations are used as new labeled training data to improve the detector. We repeat this process, generating more labeled data in each iteration. We derive a result analytically relating the minimum number of views to achieve target true and false positive rates for a given detector. The method is used to train a hand keypoint detector for single images. The resulting keypoint detector runs in realtime on RGB images and has accuracy comparable to methods that use depth sensors. The single view detector, triangulated over multiple views, enables 3D markerless hand motion capture with complex object interactions.
Tasks	Keypoint Detection, Motion Capture
Published	2017-04-25
URL	http://arxiv.org/abs/1704.07809v1
PDF	http://arxiv.org/pdf/1704.07809v1.pdf
PWC	https://paperswithcode.com/paper/hand-keypoint-detection-in-single-images
Repo	https://github.com/laobaiswag/openpose1
Framework	pytorch

Direct Multitype Cardiac Indices Estimation via Joint Representation and Regression Learning


Title	Direct Multitype Cardiac Indices Estimation via Joint Representation and Regression Learning
Authors	Wufeng Xue, Ali Islam, Mousumi Bhaduri, Shuo Li
Abstract	Cardiac indices estimation is of great importance during identification and diagnosis of cardiac disease in clinical routine. However, estimation of multitype cardiac indices with consistently reliable and high accuracy is still a great challenge due to the high variability of cardiac structures and complexity of temporal dynamics in cardiac MR sequences. While efforts have been devoted to cardiac volumes estimation through feature engineering followed by an independent regression model, these methods suffer from the vulnerable feature representation and incompatible regression model. In this paper, we propose a semi-automated method for multitype cardiac indices estimation. After manual labelling of two landmarks for ROI cropping, an integrated deep neural network Indices-Net is designed to jointly learn the representation and regression models. It comprises two tightly-coupled networks: a deep convolution autoencoder (DCAE) for cardiac image representation, and a multiple output convolution neural network (CNN) for indices regression. Joint learning of the two networks effectively enhances the expressiveness of image representation with respect to cardiac indices, and the compatibility between image representation and indices regression, thus leading to accurate and reliable estimations for all the cardiac indices. When applied with five-fold cross validation on MR images of 145 subjects, Indices-Net achieves consistently low estimation error for LV wall thicknesses (1.44$\pm$0.71mm) and areas of cavity and myocardium (204$\pm$133mm$^2$). It outperforms, with significant error reductions, segmentation method (55.1% and 17.4%) and two-phase direct volume-only methods (12.7% and 14.6%) for wall thicknesses and areas, respectively. These advantages endow the proposed method with great potential in clinical cardiac function assessment.
Tasks	Feature Engineering
Published	2017-05-25
URL	http://arxiv.org/abs/1705.09307v1
PDF	http://arxiv.org/pdf/1705.09307v1.pdf
PWC	https://paperswithcode.com/paper/direct-multitype-cardiac-indices-estimation
Repo	https://github.com/alejandrodebus/IndicesNet
Framework	pytorch

DeepCCI: End-to-end Deep Learning for Chemical-Chemical Interaction Prediction


Title	DeepCCI: End-to-end Deep Learning for Chemical-Chemical Interaction Prediction
Authors	Sunyoung Kwon, Sungroh Yoon
Abstract	Chemical-chemical interaction (CCI) plays a key role in predicting candidate drugs, toxicity, therapeutic effects, and biological functions. In various types of chemical analyses, computational approaches are often required due to the amount of data that needs to be handled. The recent remarkable growth and outstanding performance of deep learning have attracted considerable research attention. However, even in state-of-the-art drug analysis methods, deep learning continues to be used only as a classifier, although deep learning is capable of not only simple classification but also automated feature extraction. In this paper, we propose the first end-to-end learning method for CCI, named DeepCCI. Hidden features are derived from a simplified molecular input line entry system (SMILES), which is a string notation representing the chemical structure, instead of learning from crafted features. To discover hidden representations for the SMILES strings, we use convolutional neural networks (CNNs). To guarantee the commutative property for homogeneous interaction, we apply model sharing and hidden representation merging techniques. The performance of DeepCCI was compared with a plain deep classifier and conventional machine learning methods. The proposed DeepCCI showed the best performance in all seven evaluation metrics used. In addition, the commutative property was experimentally validated. The automatically extracted features through end-to-end SMILES learning alleviate the significant efforts required for manual feature engineering. It is expected to improve prediction performance in drug analyses.
Tasks	Feature Engineering
Published	2017-04-27
URL	http://arxiv.org/abs/1704.08432v3
PDF	http://arxiv.org/pdf/1704.08432v3.pdf
PWC	https://paperswithcode.com/paper/deepcci-end-to-end-deep-learning-for-chemical
Repo	https://github.com/cool21th/ai_drug_discovery
Framework	none
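
The commutative property mentioned in the abstract follows from two design choices: the same encoder is applied to both inputs (model sharing), and the two hidden representations are merged with an order-independent operation. The toy sketch below only illustrates that argument. The character-count `encode` function is a hypothetical placeholder for the paper's CNN over SMILES strings, and elementwise addition is one assumed choice of symmetric merge.

```python
def encode(smiles: str, dim: int = 8) -> list:
    """Toy shared encoder: a position-weighted character histogram.

    A placeholder for a learned CNN; the same function is used for both inputs.
    """
    vec = [0.0] * dim
    for pos, ch in enumerate(smiles):
        vec[ord(ch) % dim] += 1.0 / (1 + pos)
    return vec


def interaction_score(a: str, b: str) -> float:
    """Score a pair using a shared encoder and a symmetric (additive) merge."""
    ha, hb = encode(a), encode(b)
    merged = [x + y for x, y in zip(ha, hb)]  # x + y == y + x, so order is irrelevant
    total = sum(merged)
    return total / (1.0 + total)              # squash to (0, 1); purely illustrative


ethanol, benzene = "CCO", "c1ccccc1"
score_ab = interaction_score(ethanol, benzene)
score_ba = interaction_score(benzene, ethanol)
print(score_ab, score_ba)
```

Concatenating the two representations instead of adding them would generally break this symmetry, which is why the merge operation matters.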

Sentiment Analysis of Citations Using Word2vec


Title	Sentiment Analysis of Citations Using Word2vec
Authors	Haixia Liu
Abstract	Citation sentiment analysis is an important task in scientific paper analysis. Existing machine learning techniques for citation sentiment analysis are focusing on labor-intensive feature engineering, which requires a large annotated corpus. As an automatic feature extraction tool, word2vec has been successfully applied to sentiment analysis of short texts. In this work, I conducted empirical research with the question: how well does word2vec work on the sentiment analysis of citations? The proposed method constructed sentence vectors (sent2vec) by averaging the word embeddings, which were learned from Anthology Collections (ACL-Embeddings). I also investigated polarity-specific word embeddings (PS-Embeddings) for classifying positive and negative citations. The sentence vectors formed a feature space, to which the examined citation sentence was mapped. Those features were input into classifiers (support vector machines) for supervised classification. Using a 10-fold cross-validation scheme, evaluation was conducted on a set of annotated citations. The results showed that word embeddings are effective on classifying positive and negative citations. However, hand-crafted features performed better for the overall classification.
Tasks	Feature Engineering, Sentiment Analysis, Word Embeddings
Published	2017-04-01
URL	http://arxiv.org/abs/1704.00177v1
PDF	http://arxiv.org/pdf/1704.00177v1.pdf
PWC	https://paperswithcode.com/paper/sentiment-analysis-of-citations-using
Repo	https://github.com/liuhaixiachina/Sentiment-Analysis-of-Citations-Using-Word2vec
Framework	none
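
The sentence-vector construction described in the abstract (averaging word embeddings) is short enough to sketch directly. The two-dimensional `toy_embeddings` table below is made up for illustration; in the paper the embeddings are learned with word2vec, and skipping out-of-vocabulary tokens is an assumption of this sketch.

```python
def sentence_vector(tokens, embeddings):
    """Average the embeddings of the tokens that are in the vocabulary."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token of the sentence is in the vocabulary")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 2-d embeddings; real word2vec vectors have hundreds of dimensions.
toy_embeddings = {
    "method": [1.0, 0.0],
    "fails": [0.0, 1.0],
}

# "the" is out of vocabulary and is skipped.
vec = sentence_vector(["the", "method", "fails"], toy_embeddings)
print(vec)  # [0.5, 0.5]
```

The resulting fixed-length vectors are what would then be passed to a classifier such as a support vector machine.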

Automatic Argumentative-Zoning Using Word2vec


Title	Automatic Argumentative-Zoning Using Word2vec
Authors	Haixia Liu
Abstract	In comparison with document summarization on the articles from social media and newswire, argumentative zoning (AZ) is an important task in scientific paper analysis. Traditional methodology to carry on this task relies on feature engineering from different levels. In this paper, three models of generating sentence vectors for the task of sentence classification were explored and compared. The proposed approach builds sentence representations using learned embeddings based on neural network. The learned word embeddings formed a feature space, to which the examined sentence is mapped. Those features are input into the classifiers for supervised classification. Using a 10-fold cross-validation scheme, evaluation was conducted on the Argumentative-Zoning (AZ) annotated articles. The results showed that simply averaging the word vectors in a sentence works better than the paragraph to vector algorithm, and that integrating specific cue words into the loss function of the neural network can improve the classification performance. In comparison with the hand-crafted features, the word2vec method won for most of the categories. However, the hand-crafted features showed their strength on classifying some of the categories.
Tasks	Document Summarization, Feature Engineering, Sentence Classification, Word Embeddings
Published	2017-03-29
URL	http://arxiv.org/abs/1703.10152v1
PDF	http://arxiv.org/pdf/1703.10152v1.pdf
PWC	https://paperswithcode.com/paper/automatic-argumentative-zoning-using-word2vec
Repo	https://github.com/abstatic/ire_project18
Framework	tf

A Mention-Ranking Model for Abstract Anaphora Resolution


Title	A Mention-Ranking Model for Abstract Anaphora Resolution
Authors	Ana Marasović, Leo Born, Juri Opitz, Anette Frank
Abstract	Resolving abstract anaphora is an important, but difficult task for text understanding. Yet, with recent advances in representation learning this task becomes a more tangible aim. A central property of abstract anaphora is that it establishes a relation between the anaphor embedded in the anaphoric sentence and its (typically non-nominal) antecedent. We propose a mention-ranking model that learns how abstract anaphors relate to their antecedents with an LSTM-Siamese Net. We overcome the lack of training data by generating artificial anaphoric sentence–antecedent pairs. Our model outperforms state-of-the-art results on shell noun resolution. We also report first benchmark results on an abstract anaphora subset of the ARRAU corpus. This corpus presents a greater challenge due to a mixture of nominal and pronominal anaphors and a greater range of confounders. We found model variants that outperform the baselines for nominal anaphors, without training on individual anaphor data, but still lag behind for pronominal anaphors. Our model selects syntactically plausible candidates and – if disregarding syntax – discriminates candidates using deeper features.
Tasks	Abstract Anaphora Resolution, Representation Learning
Published	2017-06-07
URL	http://arxiv.org/abs/1706.02256v2
PDF	http://arxiv.org/pdf/1706.02256v2.pdf
PWC	https://paperswithcode.com/paper/a-mention-ranking-model-for-abstract-anaphora
Repo	https://github.com/amarasovic/neural-abstract-anaphora
Framework	tf

Towards Deep Learning Models Resistant to Adversarial Attacks


Title	Towards Deep Learning Models Resistant to Adversarial Attacks
Authors	Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
Abstract	Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples—inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at https://github.com/MadryLab/mnist_challenge and https://github.com/MadryLab/cifar10_challenge.
Tasks	Adversarial Defense
Published	2017-06-19
URL	https://arxiv.org/abs/1706.06083v4
PDF	https://arxiv.org/pdf/1706.06083v4.pdf
PWC	https://paperswithcode.com/paper/towards-deep-learning-models-resistant-to
Repo	https://github.com/MadryLab/cifar10_challenge
Framework	tf
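
The abstract speaks of a first-order adversary without naming an attack. The paper is commonly associated with projected gradient descent (PGD) under an L-infinity bound, and the sketch below shows that attack on a logistic regression, where the input gradient is available in closed form. The model, the input, and the values of `eps`, `step`, and `iters` are illustrative assumptions, and the random starting point often used with PGD is omitted for brevity.

```python
import math


def loss_and_input_grad(w, b, x, y):
    """Cross-entropy loss of a logistic regression and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    grad = [(p - y) * wi for wi in w]  # d loss / d x = (p - y) * w
    return loss, grad


def pgd_linf(w, b, x, y, eps, step, iters):
    """Ascend the loss with signed gradient steps, projecting onto the eps-ball around x."""
    x_adv = list(x)
    for _ in range(iters):
        _, grad = loss_and_input_grad(w, b, x_adv, y)
        # Signed gradient step (no move where the gradient is exactly zero).
        x_adv = [xa + step * math.copysign(1.0, g) if g != 0.0 else xa
                 for xa, g in zip(x_adv, grad)]
        # Projection: clip each coordinate back into [x_i - eps, x_i + eps].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
eps = 0.1
x_adv = pgd_linf(w, b, x, y, eps=eps, step=0.05, iters=5)

clean_loss, _ = loss_and_input_grad(w, b, x, y)
adv_loss, _ = loss_and_input_grad(w, b, x_adv, y)
print(clean_loss, adv_loss)
```

Adversarial training in the robust-optimization view would then minimize the loss on `x_adv` rather than on `x`.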

Solving internal covariate shift in deep learning with linked neurons


Title	Solving internal covariate shift in deep learning with linked neurons
Authors	Carles Roger Riera Molina, Oriol Pujol Vila
Abstract	This work proposes a novel solution to the problem of internal covariate shift and dying neurons using the concept of linked neurons. We define the neuron linkage in terms of two constraints: first, all neuron activations in the linkage must have the same operating point. That is to say, all of them share input weights. Secondly, a set of neurons is linked if and only if there is at least one member of the linkage that has a non-zero gradient in regard to the input of the activation function. This means that for any input in the activation function, there is at least one member of the linkage that operates in a non-flat and non-zero area. This simple change has profound implications in the network learning dynamics. In this article we explore the consequences of this proposal and show that by using this kind of units, internal covariate shift is implicitly solved. As a result of this, the use of linked neurons allows training arbitrarily large networks without any architectural or algorithmic trick, effectively removing the need of using re-normalization schemes such as Batch Normalization, which leads to halving the required training time. It also solves the problem of the need for standardized input data. Results show that the units using the linkage not only do effectively solve the aforementioned problems, but are also a competitive alternative with respect to state-of-the-art with very promising results.
Tasks
Published	2017-12-07
URL	http://arxiv.org/abs/1712.02609v1
PDF	http://arxiv.org/pdf/1712.02609v1.pdf
PWC	https://paperswithcode.com/paper/solving-internal-covariate-shift-in-deep
Repo	https://github.com/blauigris/linked_neurons
Framework	tf