July 30, 2019

3262 words 16 mins read

Paper Group AWR 35

SurvivalNet: Predicting patient survival from diffusion weighted magnetic resonance images using cascaded fully convolutional and 3D convolutional neural networks. InterpNET: Neural Introspection for Interpretable Deep Learning. To Index or Not to Index: Optimizing Exact Maximum Inner Product Search. Deep Architectures for Neural Machine Translatio …

SurvivalNet: Predicting patient survival from diffusion weighted magnetic resonance images using cascaded fully convolutional and 3D convolutional neural networks

Title SurvivalNet: Predicting patient survival from diffusion weighted magnetic resonance images using cascaded fully convolutional and 3D convolutional neural networks
Authors Patrick Ferdinand Christ, Florian Ettlinger, Georgios Kaissis, Sebastian Schlecht, Freba Ahmaddy, Felix Grün, Alexander Valentinitsch, Seyed-Ahmad Ahmadi, Rickmer Braren, Bjoern Menze
Abstract Automatic non-invasive assessment of hepatocellular carcinoma (HCC) malignancy has the potential to substantially enhance tumor treatment strategies for HCC patients. In this work we present a novel framework to automatically characterize the malignancy of HCC lesions from DWI images. We predict HCC malignancy in two steps: As a first step we automatically segment HCC tumor lesions using cascaded fully convolutional neural networks (CFCN). A 3D neural network (SurvivalNet) then predicts the HCC lesions’ malignancy from the HCC tumor segmentation. We formulate this task as a classification problem with classes being “low risk” and “high risk” represented by longer or shorter survival times than the median survival. We evaluated our method on DWI of 31 HCC patients. Our proposed framework achieves an end-to-end accuracy of 65% with a Dice score for the automatic lesion segmentation of 69% and an accuracy of 68% for tumor malignancy classification based on expert annotations. We compared the SurvivalNet to classical handcrafted features such as Histogram and Haralick and show experimentally that SurvivalNet outperforms the handcrafted features in HCC malignancy classification. End-to-end assessment of tumor malignancy based on our proposed fully automatic framework corresponds to assessment based on expert annotations with high significance (p>0.95).
Tasks Lesion Segmentation
Published 2017-02-20
URL http://arxiv.org/abs/1702.05941v1
PDF http://arxiv.org/pdf/1702.05941v1.pdf
PWC https://paperswithcode.com/paper/survivalnet-predicting-patient-survival-from
Repo https://github.com/FelixGruen/tensorflow-u-net
Framework tf
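
Two quantities anchoring the evaluation above — the median-survival split that defines the “low risk”/“high risk” classes and the Dice score used for lesion segmentation — are easy to make concrete. Below is a minimal NumPy sketch with toy numbers and illustrative function names; the cascaded FCN and the 3D SurvivalNet themselves are in the linked repository, not reproduced here.

```python
import numpy as np

def risk_labels(survival_days):
    """Binarize survival times: below the median -> 'high risk' (1), else 'low risk' (0)."""
    survival_days = np.asarray(survival_days, dtype=float)
    return (survival_days < np.median(survival_days)).astype(int)

def dice_score(pred_mask, true_mask, eps=1e-8):
    """Dice coefficient between two binary lesion masks (2-D or 3-D)."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    inter = np.logical_and(pred, true).sum()
    return 2.0 * inter / (pred.sum() + true.sum() + eps)

days = [120, 800, 365, 90, 1500, 400, 230]
print(risk_labels(days))                 # 1 = high risk (below-median survival)

a = np.zeros((16, 16, 8), dtype=bool); a[4:10, 4:10, 2:6] = True
b = np.zeros_like(a);                  b[5:11, 5:11, 2:6] = True
print(round(dice_score(a, b), 3))        # overlap of two toy 3-D masks, ~0.694
```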

InterpNET: Neural Introspection for Interpretable Deep Learning

Title InterpNET: Neural Introspection for Interpretable Deep Learning
Authors Shane Barratt
Abstract Humans are able to explain their reasoning. In contrast, deep neural networks are not. This paper attempts to bridge this gap by introducing a new way to design interpretable neural networks for classification, inspired by physiological evidence of the human visual system’s inner workings. This paper proposes a neural network design paradigm, termed InterpNET, which can be combined with any existing classification architecture to generate natural language explanations of the classifications. The success of the module relies on the assumption that the network’s computation and reasoning are represented in its internal layer activations. While in principle InterpNET could be applied to any existing classification architecture, it is evaluated via an image classification and explanation task. Experiments on a CUB bird classification and explanation dataset show qualitatively and quantitatively that the model is able to generate high-quality explanations. While the current state-of-the-art METEOR score on this dataset is 29.2, InterpNET achieves a much higher METEOR score of 37.9.
Tasks Image Classification
Published 2017-10-26
URL http://arxiv.org/abs/1710.09511v2
PDF http://arxiv.org/pdf/1710.09511v2.pdf
PWC https://paperswithcode.com/paper/interpnet-neural-introspection-for
Repo https://github.com/sbarratt/interpnet
Framework tf
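
InterpNET’s central assumption is that the explanation module should condition on the classifier’s internal layer activations rather than only on its output. The following toy NumPy sketch shows just that interface — random stand-in weights, no actual language decoder, and illustrative names throughout.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def forward_with_activations(x, weights):
    """Run a small MLP classifier and keep every hidden-layer activation."""
    activations, h = [], x
    for W, b in weights[:-1]:
        h = relu(h @ W + b)
        activations.append(h)
    W_out, b_out = weights[-1]
    return h @ W_out + b_out, activations

# Toy classifier over 32-dim inputs; weights are random stand-ins for a trained model.
dims = [32, 64, 64, 10]
weights = [(0.1 * rng.normal(size=(dims[i], dims[i + 1])), np.zeros(dims[i + 1]))
           for i in range(len(dims) - 1)]

x = rng.normal(size=(1, 32))
logits, acts = forward_with_activations(x, weights)

# InterpNET's interface: the explanation generator is fed the network's internal
# activations (here simply concatenated with the output), not just the prediction.
explanation_features = np.concatenate([a.ravel() for a in acts] + [logits.ravel()])
print(logits.shape, explanation_features.shape)   # (1, 10) (138,)
```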

To Index or Not to Index: Optimizing Exact Maximum Inner Product Search

Title To Index or Not to Index: Optimizing Exact Maximum Inner Product Search
Authors Firas Abuzaid, Geet Sethi, Peter Bailis, Matei Zaharia
Abstract Exact Maximum Inner Product Search (MIPS) is an important task that is widely pertinent to recommender systems and high-dimensional similarity search. The brute-force approach to solving exact MIPS is computationally expensive, thus spurring recent development of novel indexes and pruning techniques for this task. In this paper, we show that a hardware-efficient brute-force approach, blocked matrix multiply (BMM), can outperform the state-of-the-art MIPS solvers by over an order of magnitude, for some – but not all – inputs. We also present a novel MIPS solution, MAXIMUS, that takes advantage of hardware efficiency and pruning of the search space. Like BMM, MAXIMUS is faster than other solvers by up to an order of magnitude, but again only for some inputs. Since no single solution offers the best runtime performance for all inputs, we introduce a new data-dependent optimizer, OPTIMUS, that selects online with minimal overhead the best MIPS solver for a given input. Together, OPTIMUS and MAXIMUS outperform state-of-the-art MIPS solvers by 3.2$\times$ on average, and up to 10.9$\times$, on widely studied MIPS datasets.
Tasks Recommendation Systems
Published 2017-06-05
URL http://arxiv.org/abs/1706.01449v3
PDF http://arxiv.org/pdf/1706.01449v3.pdf
PWC https://paperswithcode.com/paper/to-index-or-not-to-index-optimizing-exact
Repo https://github.com/stanford-futuredata/optimus-maximus
Framework none
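
The hardware-efficient brute-force baseline the paper starts from, blocked matrix multiply (BMM), is simple: score a block of queries against all items with one dense matrix product and keep the exact top-k. Here is a NumPy sketch (block size and data are illustrative; the paper’s MAXIMUS and OPTIMUS add pruning and an input-dependent choice of solver on top of this).

```python
import numpy as np

def exact_mips_bmm(queries, items, k=10, block=4096):
    """Exact top-k maximum inner product search via blocked matrix multiply."""
    q = np.ascontiguousarray(queries, dtype=np.float32)
    v = np.ascontiguousarray(items, dtype=np.float32)
    top_idx = np.empty((q.shape[0], k), dtype=np.int64)
    for start in range(0, q.shape[0], block):
        stop = min(start + block, q.shape[0])
        scores = q[start:stop] @ v.T                      # dense BLAS-backed block
        part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
        order = np.argsort(np.take_along_axis(-scores, part, axis=1), axis=1)
        top_idx[start:stop] = np.take_along_axis(part, order, axis=1)
    return top_idx

rng = np.random.default_rng(0)
users, catalog = rng.normal(size=(1000, 64)), rng.normal(size=(50000, 64))
print(exact_mips_bmm(users, catalog, k=5)[:2])            # exact top-5 item ids per query
```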

Deep Architectures for Neural Machine Translation

Title Deep Architectures for Neural Machine Translation
Authors Antonio Valerio Miceli Barone, Jindřich Helcl, Rico Sennrich, Barry Haddow, Alexandra Birch
Abstract It has been shown that increasing model depth improves the quality of neural machine translation. However, different architectural variants to increase model depth have been proposed, and so far, there has been no thorough comparative study. In this work, we describe and evaluate several existing approaches to introduce depth in neural machine translation. Additionally, we explore novel architectural variants, including deep transition RNNs, and we vary how attention is used in the deep decoder. We introduce a novel “BiDeep” RNN architecture that combines deep transition RNNs and stacked RNNs. Our evaluation is carried out on the English to German WMT news translation dataset, using a single-GPU machine for both training and inference. We find that several of our proposed architectures improve upon existing approaches in terms of speed and translation quality. We obtain best improvements with a BiDeep RNN of combined depth 8, obtaining an average improvement of 1.5 BLEU over a strong shallow baseline. We release our code for ease of adoption.
Tasks Machine Translation
Published 2017-07-24
URL http://arxiv.org/abs/1707.07631v1
PDF http://arxiv.org/pdf/1707.07631v1.pdf
PWC https://paperswithcode.com/paper/deep-architectures-for-neural-machine
Repo https://github.com/bhaddow/dev-nematus
Framework tf
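
The “BiDeep” idea combines two ways of adding depth: stacking recurrent layers and deepening the recurrent transition itself with extra GRU steps between consecutive states. Below is a minimal NumPy sketch of one time step of such a cell, using random weights and the simplification that transition GRUs receive no external input; the parameterization differs from the actual Nematus implementation in the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, p):
    """One GRU transition; p holds parameter matrices Wz, Uz, Wr, Ur, Wh, Uh."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])
    h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])
    return (1.0 - z) * h + z * h_tilde

def make_params(d_in, d_h):
    return {k: rng.normal(scale=0.1, size=(d_in if k.startswith("W") else d_h, d_h))
            for k in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]}

def bideep_step(x_t, states, stack_params, transition_params):
    """One time step of a 'BiDeep' cell: a stack of layers, where each layer also
    applies extra (input-less) GRU transitions to deepen the recurrence."""
    new_states, inp = [], x_t
    for layer, (p_in, p_trans_list) in enumerate(zip(stack_params, transition_params)):
        h = gru_cell(inp, states[layer], p_in)
        for p_trans in p_trans_list:               # deep transition: extra GRU steps
            h = gru_cell(np.zeros_like(h), h, p_trans)
        new_states.append(h)
        inp = h                                    # stacked: feed upward
    return new_states

d_x, d_h, stack_depth, transition_depth = 16, 32, 2, 2
stack_params = [make_params(d_x if i == 0 else d_h, d_h) for i in range(stack_depth)]
transition_params = [[make_params(d_h, d_h) for _ in range(transition_depth)]
                     for _ in range(stack_depth)]
states = [np.zeros((1, d_h)) for _ in range(stack_depth)]
for t in range(5):
    states = bideep_step(rng.normal(size=(1, d_x)), states, stack_params, transition_params)
print([s.shape for s in states])
```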

DeepWheat: Estimating Phenotypic Traits from Crop Images with Deep Learning

Title DeepWheat: Estimating Phenotypic Traits from Crop Images with Deep Learning
Authors Shubhra Aich, Anique Josuttes, Ilya Ovsyannikov, Keegan Strueby, Imran Ahmed, Hema Sudhakar Duddu, Curtis Pozniak, Steve Shirtliffe, Ian Stavness
Abstract In this paper, we investigate estimating emergence and biomass traits from color images and elevation maps of wheat field plots. We employ a state-of-the-art deconvolutional network for segmentation and convolutional architectures, with residual and Inception-like layers, to estimate traits via high dimensional nonlinear regression. Evaluation was performed on two different species of wheat, grown in field plots for an experimental plant breeding study. Our framework achieves satisfactory performance with mean and standard deviation of absolute difference of 1.05 and 1.40 counts for emergence and 1.45 and 2.05 for biomass estimation. Our results for counting wheat plants from field images are better than the accuracy reported for the similar, but arguably less difficult, task of counting leaves from indoor images of rosette plants. Our results for biomass estimation, even with a very small dataset, improve upon all previously proposed approaches in the literature.
Tasks
Published 2017-09-30
URL http://arxiv.org/abs/1710.00241v2
PDF http://arxiv.org/pdf/1710.00241v2.pdf
PWC https://paperswithcode.com/paper/deepwheat-estimating-phenotypic-traits-from
Repo https://github.com/p2irc/deepwheat_WACV-2018
Framework none
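
The reported trait-estimation numbers are the mean and standard deviation of the absolute difference between predicted and ground-truth values. A tiny sketch of that metric with made-up counts:

```python
import numpy as np

def abs_diff_stats(predicted, ground_truth):
    """Mean and standard deviation of |prediction - ground truth| (e.g. plant counts)."""
    diff = np.abs(np.asarray(predicted, dtype=float) - np.asarray(ground_truth, dtype=float))
    return diff.mean(), diff.std()

pred = [23, 31, 18, 40, 27]   # toy predicted emergence counts
true = [24, 30, 20, 39, 29]   # toy ground-truth counts
print(abs_diff_stats(pred, true))   # (1.4, ~0.49)
```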

MoleculeNet: A Benchmark for Molecular Machine Learning

Title MoleculeNet: A Benchmark for Molecular Machine Learning
Authors Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande
Abstract Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.
Tasks
Published 2017-03-02
URL http://arxiv.org/abs/1703.00564v3
PDF http://arxiv.org/pdf/1703.00564v3.pdf
PWC https://paperswithcode.com/paper/moleculenet-a-benchmark-for-molecular-machine
Repo https://github.com/LeeJunHyun/The-Databases-for-Drug-Discovery
Framework tf
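
One recurring comparison in the benchmark is between learnable representations and conventional fixed featurizations. The sketch below shows a baseline of the latter kind — Morgan (circular) fingerprints plus a random forest — assuming RDKit and scikit-learn are available; the SMILES strings and labels are toy stand-ins, not a MoleculeNet dataset, and this is not the DeepChem implementation the paper releases.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def morgan_features(smiles_list, radius=2, n_bits=1024):
    """Fixed circular-fingerprint featurization (a conventional, non-learnable baseline)."""
    feats = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
        arr = np.zeros((n_bits,), dtype=np.float64)
        DataStructs.ConvertToNumpyArray(fp, arr)
        feats.append(arr)
    return np.stack(feats)

# Toy stand-in data; a real run would use one of the curated MoleculeNet datasets.
smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)O", "CCCC", "c1ccncc1", "CCOC", "CCCl"]
labels = np.array([0, 0, 1, 0, 0, 1, 0, 1])

X = morgan_features(smiles)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:6], labels[:6])
print(roc_auc_score(labels[6:], clf.predict_proba(X[6:])[:, 1]))
```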

Riemannian stochastic variance reduced gradient algorithm with retraction and vector transport

Title Riemannian stochastic variance reduced gradient algorithm with retraction and vector transport
Authors Hiroyuki Sato, Hiroyuki Kasai, Bamdev Mishra
Abstract In recent years, stochastic variance reduction algorithms have attracted considerable attention for minimizing the average of a large but finite number of loss functions. This paper proposes a novel Riemannian extension of the Euclidean stochastic variance reduced gradient (R-SVRG) algorithm to a manifold search space. The key challenges of averaging, adding, and subtracting multiple gradients are addressed with retraction and vector transport. For the proposed algorithm, we present a global convergence analysis with a decaying step size as well as a local convergence rate analysis with a fixed step size under some natural assumptions. In addition, the proposed algorithm is applied to the computation problem of the Riemannian centroid on the symmetric positive definite (SPD) manifold as well as the principal component analysis and low-rank matrix completion problems on the Grassmann manifold. The results show that the proposed algorithm outperforms the standard Riemannian stochastic gradient descent algorithm in each case.
Tasks Low-Rank Matrix Completion, Matrix Completion
Published 2017-02-18
URL https://arxiv.org/abs/1702.05594v3
PDF https://arxiv.org/pdf/1702.05594v3.pdf
PWC https://paperswithcode.com/paper/riemannian-stochastic-variance-reduced-1
Repo https://github.com/hiroyuki-kasai/RSOpt
Framework none
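
The algorithm’s ingredients are exactly the three manifold operations named in the title: a Riemannian gradient (tangent-space projection), a retraction, and a vector transport, combined in the usual SVRG variance-reduced update. The following NumPy sketch runs it on the unit sphere for a leading-eigenvector problem; the step size, problem, and convergence check are illustrative, and the SPD and Grassmann experiments from the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(x, g):                 # tangent-space projection on the sphere
    return g - (x @ g) * x

def retract(x, v):                 # retraction: step, then renormalize
    y = x + v
    return y / np.linalg.norm(y)

def transport(y, v):               # vector transport by projection onto T_y
    return v - (y @ v) * y

def rgrad_i(x, a_i):               # Riemannian gradient of f_i(x) = -(a_i^T x)^2
    return project(x, -2.0 * (a_i @ x) * a_i)

def riemannian_svrg(A, eta=0.05, outer=30, inner=None):
    """Minimize -(1/n) sum_i (a_i^T x)^2 over the unit sphere (leading eigenvector)."""
    n, d = A.shape
    inner = inner or n
    x = rng.normal(size=d); x /= np.linalg.norm(x)
    for _ in range(outer):
        x_ref = x.copy()
        full_grad = project(x_ref, np.mean([-2.0 * (a @ x_ref) * a for a in A], axis=0))
        for _ in range(inner):
            i = rng.integers(n)
            correction = transport(x, rgrad_i(x_ref, A[i])) - transport(x, full_grad)
            v = rgrad_i(x, A[i]) - correction          # variance-reduced direction
            x = retract(x, -eta * v)
    return x

A = rng.normal(size=(200, 10))
x_hat = riemannian_svrg(A)
_, _, Vt = np.linalg.svd(A, full_matrices=False)
print(abs(x_hat @ Vt[0]))          # close to 1 if the solver converged
```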

The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants

Title The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants
Authors Ivan Habernal, Henning Wachsmuth, Iryna Gurevych, Benno Stein
Abstract Reasoning is a crucial part of natural language argumentation. To comprehend an argument, one must analyze its warrant, which explains why its claim follows from its premises. As arguments are highly contextualized, warrants are usually presupposed and left implicit. Thus, comprehension requires not only language understanding and logic skills, but also common sense. In this paper we develop a methodology for reconstructing warrants systematically. We operationalize it in a scalable crowdsourcing process, resulting in a freely licensed dataset with warrants for 2k authentic arguments from news comments. On this basis, we present a new challenging task, the argument reasoning comprehension task. Given an argument with a claim and a premise, the goal is to choose the correct implicit warrant from two options. Both warrants are plausible and lexically close, but lead to contradicting claims. A solution to this task will define a substantial step towards automatic warrant reconstruction. However, experiments with several neural attention and language models reveal that current approaches do not suffice.
Tasks Common Sense Reasoning
Published 2017-08-04
URL http://arxiv.org/abs/1708.01425v4
PDF http://arxiv.org/pdf/1708.01425v4.pdf
PWC https://paperswithcode.com/paper/the-argument-reasoning-comprehension-task
Repo https://github.com/hongking9/SemEval-2018-task12
Framework none
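
The task format is concrete: given a claim and a premise, score two candidate warrants and pick the higher-scoring one. Below is a minimal sketch of that evaluation loop with a single toy instance and a placeholder scorer standing in for a real model; the field names are illustrative, not the dataset’s schema.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ArcInstance:
    claim: str
    premise: str
    warrants: List[str]   # exactly two candidate warrants
    label: int            # index of the correct (implicit) warrant

def accuracy(instances: List[ArcInstance], score: Callable[[str, str, str], float]) -> float:
    """Pick, for each instance, the warrant the scorer prefers and compare to the label."""
    correct = sum(
        int(max(range(2), key=lambda i: score(inst.claim, inst.premise, inst.warrants[i]))
            == inst.label)
        for inst in instances
    )
    return correct / len(instances)

# One toy instance and a trivial scorer; a real system would use a trained model here.
data = [ArcInstance(
    claim="We should ban plastic bags",
    premise="Plastic bags frequently end up in waterways",
    warrants=["litter in waterways harms ecosystems",
              "waterways are rarely affected by litter"],
    label=0)]
print(accuracy(data, lambda c, p, w: -len(w)))   # accuracy of the toy scorer on the toy data
```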

Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks

Title Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks
Authors Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen
Abstract In this paper we propose the utterance-level Permutation Invariant Training (uPIT) technique. uPIT is a practically applicable, end-to-end, deep learning based solution for speaker independent multi-talker speech separation. Specifically, uPIT extends the recently proposed Permutation Invariant Training (PIT) technique with an utterance-level cost function, hence eliminating the need for solving an additional permutation problem during inference, which is otherwise required by frame-level PIT. We achieve this using Recurrent Neural Networks (RNNs) that, during training, minimize the utterance-level separation error, hence forcing separated frames belonging to the same speaker to be aligned to the same output stream. In practice, this allows RNNs, trained with uPIT, to separate multi-talker mixed speech without any prior knowledge of signal duration, number of speakers, speaker identity or gender. We evaluated uPIT on the WSJ0 and Danish two- and three-talker mixed-speech separation tasks and found that uPIT outperforms techniques based on Non-negative Matrix Factorization (NMF) and Computational Auditory Scene Analysis (CASA), and compares favorably with Deep Clustering (DPCL) and the Deep Attractor Network (DANet). Furthermore, we found that models trained with uPIT generalize well to unseen speakers and languages. Finally, we found that a single model, trained with uPIT, can handle both two-speaker, and three-speaker speech mixtures.
Tasks Speech Separation
Published 2017-03-18
URL http://arxiv.org/abs/1703.06284v2
PDF http://arxiv.org/pdf/1703.06284v2.pdf
PWC https://paperswithcode.com/paper/multi-talker-speech-separation-with-utterance
Repo https://github.com/snsun/pit-speech-separation
Framework tf
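
The utterance-level PIT cost can be written in a few lines: evaluate every output-to-speaker assignment over the whole utterance and keep the cheapest one, so the assignment is fixed per utterance rather than per frame. Here is a NumPy sketch with plain MSE as the per-speaker loss; the paper works on magnitude spectra with mask-based objectives inside an RNN training loop.

```python
import numpy as np
from itertools import permutations

def upit_loss(estimates, targets):
    """Utterance-level PIT: one output-to-speaker assignment for the whole utterance,
    chosen to minimize the summed per-speaker MSE."""
    S = targets.shape[0]                           # number of speakers
    best, best_perm = np.inf, None
    for perm in permutations(range(S)):
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best:
            best, best_perm = loss, perm
    return best, best_perm

rng = np.random.default_rng(0)
targets = rng.normal(size=(2, 100, 257))           # (speakers, frames, frequency bins)
estimates = targets[::-1] + 0.01 * rng.normal(size=targets.shape)  # outputs in swapped order
loss, perm = upit_loss(estimates, targets)
print(loss, perm)                                  # small loss, permutation (1, 0)
```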

Collaborative Low-Rank Subspace Clustering

Title Collaborative Low-Rank Subspace Clustering
Authors Stephen Tierney, Yi Guo, Junbin Gao
Abstract In this paper we present Collaborative Low-Rank Subspace Clustering. Given multiple observations of a phenomenon, we learn a unified representation matrix. This unified matrix incorporates the features from all the observations, thus increasing the discriminative power compared with learning the representation matrix on each observation separately. Experimental evaluation shows that our method outperforms subspace clustering on separate observations and the state-of-the-art collaborative learning algorithm.
Tasks
Published 2017-04-13
URL http://arxiv.org/abs/1704.03966v1
PDF http://arxiv.org/pdf/1704.03966v1.pdf
PWC https://paperswithcode.com/paper/collaborative-low-rank-subspace-clustering
Repo https://github.com/sjtrny/collab_lrsc
Framework none
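
A rough sketch of the collaborative idea: solve one shared self-representation matrix Z for all observations and cluster the affinity it induces. In the sketch below a ridge (Frobenius-norm) penalty stands in for the paper’s low-rank objective, scikit-learn’s spectral clustering is assumed for the final step, and the data are two toy observations of points drawn from two subspaces.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def unified_representation(views, lam=0.1):
    """Shared self-representation across observations:
    Z = argmin sum_i ||X_i - X_i Z||_F^2 + lam ||Z||_F^2
    (a convex ridge stand-in for the paper's low-rank/nuclear-norm formulation)."""
    n = views[0].shape[1]                          # views: list of (d_i, n) matrices
    G = sum(X.T @ X for X in views)
    return np.linalg.solve(G + lam * np.eye(n), G)

rng = np.random.default_rng(0)
# Two "observations" of the same 60 samples drawn from two different 2-D subspaces.
basis = [rng.normal(size=(10, 2)) for _ in range(2)]
coords = rng.normal(size=(2, 60))
X1 = np.hstack([basis[0] @ coords[:, :30], basis[1] @ coords[:, 30:]])
X2 = X1 + 0.05 * rng.normal(size=X1.shape)         # a second, noisier observation

Z = unified_representation([X1, X2])
affinity = np.abs(Z) + np.abs(Z.T)
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(labels[:30].tolist(), labels[30:].tolist())  # ideally two clean groups
```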

Efficient Data Representation by Selecting Prototypes with Importance Weights

Title Efficient Data Representation by Selecting Prototypes with Importance Weights
Authors Karthik S. Gurumoorthy, Amit Dhurandhar, Guillermo Cecchi, Charu Aggarwal
Abstract Prototypical examples that best summarize and compactly represent an underlying complex data distribution communicate meaningful insights to humans in domains where simple explanations are hard to extract. In this paper we present algorithms with strong theoretical guarantees to mine these data sets and select prototypes, a.k.a. representatives, that optimally describe them. Our work notably generalizes the recent work by Kim et al. (2016) in that, in addition to selecting prototypes, we also associate non-negative weights which are indicative of their importance. This extension provides a single coherent framework under which both prototypes and criticisms (i.e. outliers) can be found. Furthermore, our framework works for any symmetric positive definite kernel, thus addressing one of the key open questions laid out in Kim et al. (2016). By establishing that our objective function enjoys the key property of weak submodularity, we present a fast ProtoDash algorithm and also derive approximation guarantees for the same. We demonstrate the efficacy of our method on diverse domains such as retail, digit recognition (MNIST) and 40 publicly available health questionnaires obtained from the Center for Disease Control (CDC) website maintained by the US Dept. of Health. We validate the results quantitatively as well as qualitatively based on expert feedback and recently published scientific studies on public health, thus showcasing the power of our technique in providing actionability (for retail), utility (for MNIST) and insight (on CDC datasets), which arguably are the hallmarks of an effective data mining method.
Tasks
Published 2017-07-05
URL https://arxiv.org/abs/1707.01212v4
PDF https://arxiv.org/pdf/1707.01212v4.pdf
PWC https://paperswithcode.com/paper/protodash-fast-interpretable-prototype
Repo https://github.com/IBM/AIX360
Framework pytorch
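
The algorithm greedily grows a prototype set: at each step it adds the candidate with the largest gradient of the kernel objective and then refits non-negative importance weights. Below is a simplified NumPy sketch with an RBF kernel, where the non-negative weight fit is approximated by clipped least squares; the paper solves a proper non-negative quadratic program and proves weak-submodularity guarantees, and the released AIX360 implementation differs in detail.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def protodash_greedy(X, m, gamma=0.5):
    """Greedy prototype selection with non-negative importance weights (simplified).
    Objective: maximize w^T mu - 0.5 w^T K w, where mu_j is the mean kernel similarity
    of candidate j to the whole dataset and K is the kernel among chosen prototypes."""
    n = X.shape[0]
    mu = rbf_kernel(X, X, gamma).mean(axis=0)          # mean similarity to the data
    chosen, w = [], np.zeros(0)
    for _ in range(m):
        K_cs = rbf_kernel(X[chosen], X, gamma) if chosen else np.zeros((0, n))
        grad = mu - w @ K_cs                           # gradient of the objective at w
        grad[chosen] = -np.inf                         # do not re-select
        chosen.append(int(np.argmax(grad)))
        K = rbf_kernel(X[chosen], X[chosen], gamma)
        # Simplification: least-squares weights clipped at zero (the paper uses a
        # non-negative quadratic program here).
        w = np.clip(np.linalg.solve(K + 1e-6 * np.eye(len(chosen)), mu[chosen]), 0, None)
    return chosen, w

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in (-3, 0, 3)])
idx, weights = protodash_greedy(X, m=3)
print(idx, np.round(weights, 3))                       # roughly one prototype per cluster
```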

Train longer, generalize better: closing the generalization gap in large batch training of neural networks

Title Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Authors Elad Hoffer, Itay Hubara, Daniel Soudry
Abstract Background: Deep learning models are typically trained using stochastic gradient descent or one of its variants. These methods update the weights using their gradient, estimated from a small fraction of the training data. It has been observed that when using large batch sizes there is a persistent degradation in generalization performance - known as the “generalization gap” phenomenon. Identifying the origin of this gap and closing it has remained an open problem. Contributions: We examine the initial high learning rate training phase. We find that the weight distance from its initialization grows logarithmically with the number of weight updates. We therefore propose a “random walk on random landscape” statistical model which is known to exhibit similar “ultra-slow” diffusion behavior. Following this hypothesis we conducted experiments to show empirically that the “generalization gap” stems from the relatively small number of updates rather than the batch size, and can be completely eliminated by adapting the training regime used. We further investigate different techniques to train models in the large-batch regime and present a novel algorithm named “Ghost Batch Normalization” which enables a significant decrease in the generalization gap without increasing the number of updates. To validate our findings we conduct several additional experiments on MNIST, CIFAR-10, CIFAR-100 and ImageNet. Finally, we reassess common practices and beliefs concerning training of deep models and suggest they may not be optimal to achieve good generalization.
Tasks
Published 2017-05-24
URL http://arxiv.org/abs/1705.08741v2
PDF http://arxiv.org/pdf/1705.08741v2.pdf
PWC https://paperswithcode.com/paper/train-longer-generalize-better-closing-the
Repo https://github.com/eladhoffer/bigBatch
Framework pytorch
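
Ghost Batch Normalization keeps the large compute batch but computes normalization statistics over small virtual (“ghost”) batches inside it. The NumPy sketch below covers only the basic normalization step over a single feature axis; learnable per-channel gamma/beta parameters and running statistics for inference are omitted.

```python
import numpy as np

def ghost_batch_norm(x, ghost_size=32, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalize each small 'ghost' batch with its own statistics instead of using
    the statistics of the full large batch."""
    out = np.empty_like(x, dtype=float)
    for start in range(0, x.shape[0], ghost_size):
        chunk = x[start:start + ghost_size]
        mean = chunk.mean(axis=0, keepdims=True)
        var = chunk.var(axis=0, keepdims=True)
        out[start:start + ghost_size] = gamma * (chunk - mean) / np.sqrt(var + eps) + beta
    return out

rng = np.random.default_rng(0)
large_batch = rng.normal(loc=5.0, scale=2.0, size=(4096, 128))
normalized = ghost_batch_norm(large_batch, ghost_size=32)
print(normalized[:32].mean().round(4), normalized[:32].std().round(4))  # ~0 and ~1 per ghost batch
```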

Multi-way Interacting Regression via Factorization Machines

Title Multi-way Interacting Regression via Factorization Machines
Authors Mikhail Yurochkin, XuanLong Nguyen, Nikolaos Vasiloglou
Abstract We propose a Bayesian regression method that accounts for multi-way interactions of arbitrary orders among the predictor variables. Our model makes use of a factorization mechanism for representing the regression coefficients of interactions among the predictors, while the interaction selection is guided by a prior distribution on random hypergraphs, a construction which generalizes the Finite Feature Model. We present a posterior inference algorithm based on Gibbs sampling, and establish posterior consistency of our regression model. Our method is evaluated with extensive experiments on simulated data and demonstrated to be able to identify meaningful interactions in applications in genetics and retail demand forecasting.
Tasks
Published 2017-09-27
URL http://arxiv.org/abs/1709.09301v1
PDF http://arxiv.org/pdf/1709.09301v1.pdf
PWC https://paperswithcode.com/paper/multi-way-interacting-regression-via
Repo https://github.com/moonfolk/MiFM
Framework none
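
The factorization-machine backbone the paper builds on represents every pairwise interaction weight as an inner product of per-feature factors, which makes prediction O(kd) instead of O(d^2). A NumPy sketch of that second-order form follows; the paper’s MiFM extends this to arbitrary-order interactions with a Bayesian hypergraph prior and Gibbs sampling, which is not shown here.

```python
import numpy as np

def fm_predict(X, w0, w, V):
    """Second-order factorization machine: the interaction weight of features i and j
    is <v_i, v_j>, computed via the identity
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [ (X V)_{.f}^2 - (X^2) (V^2)_{.f} ]."""
    linear = w0 + X @ w
    xv = X @ V                                             # (n, k)
    pairwise = 0.5 * (xv ** 2 - (X ** 2) @ (V ** 2)).sum(axis=1)
    return linear + pairwise

rng = np.random.default_rng(0)
n, d, k = 5, 8, 3
X = rng.normal(size=(n, d))
w0, w, V = 0.1, rng.normal(size=d), rng.normal(size=(d, k))

# Check the factorized form against the explicit double sum for one row.
x = X[0]
explicit = w0 + x @ w + sum((V[i] @ V[j]) * x[i] * x[j]
                            for i in range(d) for j in range(i + 1, d))
print(np.allclose(fm_predict(X, w0, w, V)[0], explicit))   # True
```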

Self-Supervised Damage-Avoiding Manipulation Strategy Optimization via Mental Simulation

Title Self-Supervised Damage-Avoiding Manipulation Strategy Optimization via Mental Simulation
Authors Tobias Doernbach
Abstract Everyday robotics is challenged to deal with autonomous product handling in applications like logistics or retail, possibly causing damage to the items during manipulation. Traditionally, most approaches try to minimize physical interaction with goods. However, this paper proposes to take into account any unintended object motion and to learn damage-minimizing manipulation strategies in a self-supervised way. The presented approach consists of a simulation-based planning method for an optimal manipulation sequence with respect to possible damage. The planned manipulation sequences are generalized to new, unseen scenes in the same application scenario using machine learning. This learned manipulation strategy is continuously refined in a self-supervised, simulation-in-the-loop optimization cycle during load-free times of the system, commonly known as mental simulation. In parallel, the generated manipulation strategies can be deployed in near-real time in an anytime fashion. The approach is validated on an industrial container-unloading scenario and on a retail shelf-replenishment scenario.
Tasks
Published 2017-12-20
URL https://arxiv.org/abs/1712.07452v2
PDF https://arxiv.org/pdf/1712.07452v2.pdf
PWC https://paperswithcode.com/paper/self-supervised-damage-avoiding-manipulation
Repo https://github.com/jacobs-robotics/gazebo-mental-simulation-meta-example
Framework none

Understanding Synthetic Gradients and Decoupled Neural Interfaces

Title Understanding Synthetic Gradients and Decoupled Neural Interfaces
Authors Wojciech Marian Czarnecki, Grzegorz Świrszcz, Max Jaderberg, Simon Osindero, Oriol Vinyals, Koray Kavukcuoglu
Abstract When training neural networks, the use of Synthetic Gradients (SG) allows layers or modules to be trained without update locking - without waiting for a true error gradient to be backpropagated - resulting in Decoupled Neural Interfaces (DNIs). This ability to update parts of a neural network asynchronously and with only local information was demonstrated to work empirically in Jaderberg et al. (2016). However, there has been very little demonstration of what changes DNIs and SGs impose from a functional, representational, and learning dynamics point of view. In this paper, we study DNIs through the use of synthetic gradients on feed-forward networks to better understand their behaviour and elucidate their effect on optimisation. We show that the incorporation of SGs does not affect the representational strength of the learning system for a neural network, and prove the convergence of the learning system for linear and deep linear models. On practical problems we investigate the mechanism by which synthetic gradient estimators approximate the true loss, and, surprisingly, how that leads to drastically different layer-wise representations. Finally, we also expose the relationship of using synthetic gradients to other error approximation techniques and find a unifying language for discussion and comparison.
Tasks
Published 2017-03-01
URL http://arxiv.org/abs/1703.00522v1
PDF http://arxiv.org/pdf/1703.00522v1.pdf
PWC https://paperswithcode.com/paper/understanding-synthetic-gradients-and
Repo https://github.com/quangvu0702/Synthetic-Gradients
Framework pytorch
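
A synthetic-gradient module is simply a small model trained to predict dL/dh for a layer, so the layer below can update without waiting for backpropagation. Below is a minimal NumPy sketch of that mechanism on a linear toy problem with a linear SG module; the dimensions, step sizes, and update schedule are illustrative and not the paper’s experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: y = x @ W_true, learned by a two-layer linear network
# whose first layer is updated from *synthetic* gradients only.
n, d_in, d_h, d_out = 128, 8, 8, 4
X = rng.normal(size=(n, d_in))
W_true = rng.normal(size=(d_in, d_out))
Y = X @ W_true

W1, _ = np.linalg.qr(rng.normal(size=(d_in, d_h)))   # well-conditioned first layer
W2 = np.zeros((d_h, d_out))
A = np.zeros((d_h, d_h)); b = np.zeros(d_h)          # linear synthetic-gradient module

lr1, lr2, sg_lr = 0.01, 0.1, 0.1
for step in range(2000):
    h = X @ W1
    err = (h @ W2 - Y) / n                           # dLoss/dy_hat for 0.5*sum-of-squares / n

    # Decoupled update: the first layer steps on the SG module's *prediction* of
    # dLoss/dh, without waiting for the true gradient to be backpropagated.
    dh_synth = h @ A + b
    W1 -= lr1 * (X.T @ dh_synth)

    # When the true gradient arrives, it trains the top layer and the SG module.
    dh_true = err @ W2.T
    W2 -= lr2 * (h.T @ err)
    sg_err = dh_synth - dh_true
    A -= sg_lr * (h.T @ sg_err) / n
    b -= sg_lr * sg_err.mean(axis=0)

final_loss = 0.5 * np.mean(np.sum((X @ W1 @ W2 - Y) ** 2, axis=1))
print(round(final_loss, 6))                          # close to zero if training succeeded
```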