Paper Group AWR 35
SurvivalNet: Predicting patient survival from diffusion weighted magnetic resonance images using cascaded fully convolutional and 3D convolutional neural networks. InterpNET: Neural Introspection for Interpretable Deep Learning. To Index or Not to Index: Optimizing Exact Maximum Inner Product Search. Deep Architectures for Neural Machine Translatio …
SurvivalNet: Predicting patient survival from diffusion weighted magnetic resonance images using cascaded fully convolutional and 3D convolutional neural networks
Title | SurvivalNet: Predicting patient survival from diffusion weighted magnetic resonance images using cascaded fully convolutional and 3D convolutional neural networks |
Authors | Patrick Ferdinand Christ, Florian Ettlinger, Georgios Kaissis, Sebastian Schlecht, Freba Ahmaddy, Felix Grün, Alexander Valentinitsch, Seyed-Ahmad Ahmadi, Rickmer Braren, Bjoern Menze |
Abstract | Automatic non-invasive assessment of hepatocellular carcinoma (HCC) malignancy has the potential to substantially enhance tumor treatment strategies for HCC patients. In this work we present a novel framework to automatically characterize the malignancy of HCC lesions from DWI images. We predict HCC malignancy in two steps: As a first step we automatically segment HCC tumor lesions using cascaded fully convolutional neural networks (CFCN). A 3D neural network (SurvivalNet) then predicts the HCC lesions’ malignancy from the HCC tumor segmentation. We formulate this task as a classification problem with classes being “low risk” and “high risk” represented by longer or shorter survival times than the median survival. We evaluated our method on DWI of 31 HCC patients. Our proposed framework achieves an end-to-end accuracy of 65% with a Dice score for the automatic lesion segmentation of 69% and an accuracy of 68% for tumor malignancy classification based on expert annotations. We compared the SurvivalNet to classical handcrafted features such as Histogram and Haralick and show experimentally that SurvivalNet outperforms the handcrafted features in HCC malignancy classification. End-to-end assessment of tumor malignancy based on our proposed fully automatic framework corresponds to assessment based on expert annotations with high significance (p>0.95). |
Tasks | Lesion Segmentation |
Published | 2017-02-20 |
URL | http://arxiv.org/abs/1702.05941v1 |
PDF | http://arxiv.org/pdf/1702.05941v1.pdf |
PWC | https://paperswithcode.com/paper/survivalnet-predicting-patient-survival-from |
Repo | https://github.com/FelixGruen/tensorflow-u-net |
Framework | tf |
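The two-class formulation described in the abstract (patients labeled "high risk" or "low risk" by whether their survival falls below or above the median) can be sketched in a few lines. This is an illustrative helper only, not code from the linked repository; `risk_labels` is an invented name:

```python
import numpy as np

def risk_labels(survival_days):
    """Label patients 1 ('high risk') if survival is shorter than the
    median survival time, else 0 ('low risk')."""
    survival_days = np.asarray(survival_days, dtype=float)
    median = np.median(survival_days)
    return (survival_days < median).astype(int)

# Toy cohort: median of [100, 400, 250, 900] is 325 days.
print(risk_labels([100, 400, 250, 900]))  # [1 0 1 0]
```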
InterpNET: Neural Introspection for Interpretable Deep Learning
Title | InterpNET: Neural Introspection for Interpretable Deep Learning |
Authors | Shane Barratt |
Abstract | Humans are able to explain their reasoning. In contrast, deep neural networks are not. This paper attempts to bridge this gap by introducing a new way to design interpretable neural networks for classification, inspired by physiological evidence of the human visual system’s inner workings. This paper proposes a neural network design paradigm, termed InterpNET, which can be combined with any existing classification architecture to generate natural language explanations of the classifications. The success of the module relies on the assumption that the network’s computation and reasoning is represented in its internal layer activations. While in principle InterpNET could be applied to any existing classification architecture, it is evaluated via an image classification and explanation task. Experiments on a CUB bird classification and explanation dataset show qualitatively and quantitatively that the model is able to generate high-quality explanations. While the current state-of-the-art METEOR score on this dataset is 29.2, InterpNET achieves a much higher METEOR score of 37.9. |
Tasks | Image Classification |
Published | 2017-10-26 |
URL | http://arxiv.org/abs/1710.09511v2 |
PDF | http://arxiv.org/pdf/1710.09511v2.pdf |
PWC | https://paperswithcode.com/paper/interpnet-neural-introspection-for |
Repo | https://github.com/sbarratt/interpnet |
Framework | tf |
To Index or Not to Index: Optimizing Exact Maximum Inner Product Search
Title | To Index or Not to Index: Optimizing Exact Maximum Inner Product Search |
Authors | Firas Abuzaid, Geet Sethi, Peter Bailis, Matei Zaharia |
Abstract | Exact Maximum Inner Product Search (MIPS) is an important task that is widely pertinent to recommender systems and high-dimensional similarity search. The brute-force approach to solving exact MIPS is computationally expensive, thus spurring recent development of novel indexes and pruning techniques for this task. In this paper, we show that a hardware-efficient brute-force approach, blocked matrix multiply (BMM), can outperform the state-of-the-art MIPS solvers by over an order of magnitude, for some – but not all – inputs. In this paper, we also present a novel MIPS solution, MAXIMUS, that takes advantage of hardware efficiency and pruning of the search space. Like BMM, MAXIMUS is faster than other solvers by up to an order of magnitude, but again only for some inputs. Since no single solution offers the best runtime performance for all inputs, we introduce a new data-dependent optimizer, OPTIMUS, that selects online with minimal overhead the best MIPS solver for a given input. Together, OPTIMUS and MAXIMUS outperform state-of-the-art MIPS solvers by 3.2$\times$ on average, and up to 10.9$\times$, on widely studied MIPS datasets. |
Tasks | Recommendation Systems |
Published | 2017-06-05 |
URL | http://arxiv.org/abs/1706.01449v3 |
PDF | http://arxiv.org/pdf/1706.01449v3.pdf |
PWC | https://paperswithcode.com/paper/to-index-or-not-to-index-optimizing-exact |
Repo | https://github.com/stanford-futuredata/optimus-maximus |
Framework | none |
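The hardware-efficient brute-force baseline (BMM) described in the abstract is essentially a blocked dense matrix multiply followed by a top-k scan of the scores. Here is a minimal NumPy sketch of that idea, not the authors' optimized implementation; `exact_mips_bmm` and the toy data are invented for illustration:

```python
import numpy as np

def exact_mips_bmm(queries, items, k=1, block_size=1024):
    """Exact top-k maximum inner product search via blocked matrix multiply.

    Processing queries in blocks keeps each score matrix small, which is
    the hardware-efficiency idea behind the BMM approach."""
    results = []
    for start in range(0, queries.shape[0], block_size):
        block = queries[start:start + block_size]
        scores = block @ items.T                    # (b, n) inner products
        top = np.argsort(-scores, axis=1)[:, :k]    # indices of k largest
        results.append(top)
    return np.vstack(results)

# Toy example: item 2 has the largest inner product with the query.
items = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
query = np.array([[1.0, 1.0]])
print(exact_mips_bmm(query, items, k=1))  # [[2]]
```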
Deep Architectures for Neural Machine Translation
Title | Deep Architectures for Neural Machine Translation |
Authors | Antonio Valerio Miceli Barone, Jindřich Helcl, Rico Sennrich, Barry Haddow, Alexandra Birch |
Abstract | It has been shown that increasing model depth improves the quality of neural machine translation. However, different architectural variants to increase model depth have been proposed, and so far, there has been no thorough comparative study. In this work, we describe and evaluate several existing approaches to introduce depth in neural machine translation. Additionally, we explore novel architectural variants, including deep transition RNNs, and we vary how attention is used in the deep decoder. We introduce a novel “BiDeep” RNN architecture that combines deep transition RNNs and stacked RNNs. Our evaluation is carried out on the English to German WMT news translation dataset, using a single-GPU machine for both training and inference. We find that several of our proposed architectures improve upon existing approaches in terms of speed and translation quality. We obtain best improvements with a BiDeep RNN of combined depth 8, obtaining an average improvement of 1.5 BLEU over a strong shallow baseline. We release our code for ease of adoption. |
Tasks | Machine Translation |
Published | 2017-07-24 |
URL | http://arxiv.org/abs/1707.07631v1 |
PDF | http://arxiv.org/pdf/1707.07631v1.pdf |
PWC | https://paperswithcode.com/paper/deep-architectures-for-neural-machine |
Repo | https://github.com/bhaddow/dev-nematus |
Framework | tf |
DeepWheat: Estimating Phenotypic Traits from Crop Images with Deep Learning
Title | DeepWheat: Estimating Phenotypic Traits from Crop Images with Deep Learning |
Authors | Shubhra Aich, Anique Josuttes, Ilya Ovsyannikov, Keegan Strueby, Imran Ahmed, Hema Sudhakar Duddu, Curtis Pozniak, Steve Shirtliffe, Ian Stavness |
Abstract | In this paper, we investigate estimating emergence and biomass traits from color images and elevation maps of wheat field plots. We employ a state-of-the-art deconvolutional network for segmentation and convolutional architectures, with residual and Inception-like layers, to estimate traits via high dimensional nonlinear regression. Evaluation was performed on two different species of wheat, grown in field plots for an experimental plant breeding study. Our framework achieves satisfactory performance with mean and standard deviation of absolute difference of 1.05 and 1.40 counts for emergence and 1.45 and 2.05 for biomass estimation. Our results for counting wheat plants from field images are better than the accuracy reported for the similar, but arguably less difficult, task of counting leaves from indoor images of rosette plants. Our results for biomass estimation, even with a very small dataset, improve upon all previously proposed approaches in the literature. |
Tasks | |
Published | 2017-09-30 |
URL | http://arxiv.org/abs/1710.00241v2 |
PDF | http://arxiv.org/pdf/1710.00241v2.pdf |
PWC | https://paperswithcode.com/paper/deepwheat-estimating-phenotypic-traits-from |
Repo | https://github.com/p2irc/deepwheat_WACV-2018 |
Framework | none |
MoleculeNet: A Benchmark for Molecular Machine Learning
Title | MoleculeNet: A Benchmark for Molecular Machine Learning |
Authors | Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande |
Abstract | Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of learning algorithm. |
Tasks | |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00564v3 |
PDF | http://arxiv.org/pdf/1703.00564v3.pdf |
PWC | https://paperswithcode.com/paper/moleculenet-a-benchmark-for-molecular-machine |
Repo | https://github.com/LeeJunHyun/The-Databases-for-Drug-Discovery |
Framework | tf |
Riemannian stochastic variance reduced gradient algorithm with retraction and vector transport
Title | Riemannian stochastic variance reduced gradient algorithm with retraction and vector transport |
Authors | Hiroyuki Sato, Hiroyuki Kasai, Bamdev Mishra |
Abstract | In recent years, stochastic variance reduction algorithms have attracted considerable attention for minimizing the average of a large but finite number of loss functions. This paper proposes a novel Riemannian extension of the Euclidean stochastic variance reduced gradient (R-SVRG) algorithm to a manifold search space. The key challenges of averaging, adding, and subtracting multiple gradients are addressed with retraction and vector transport. For the proposed algorithm, we present a global convergence analysis with a decaying step size as well as a local convergence rate analysis with a fixed step size under some natural assumptions. In addition, the proposed algorithm is applied to the computation problem of the Riemannian centroid on the symmetric positive definite (SPD) manifold as well as the principal component analysis and low-rank matrix completion problems on the Grassmann manifold. The results show that the proposed algorithm outperforms the standard Riemannian stochastic gradient descent algorithm in each case. |
Tasks | Low-Rank Matrix Completion, Matrix Completion |
Published | 2017-02-18 |
URL | https://arxiv.org/abs/1702.05594v3 |
PDF | https://arxiv.org/pdf/1702.05594v3.pdf |
PWC | https://paperswithcode.com/paper/riemannian-stochastic-variance-reduced-1 |
Repo | https://github.com/hiroyuki-kasai/RSOpt |
Framework | none |
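The ingredients named in the abstract — stochastic variance reduction plus retraction and vector transport — can be illustrated on the simplest manifold, the unit sphere, where retraction is "step and renormalize" and tangent-space projection serves as a cheap vector transport. This is a hand-rolled sketch under those assumptions, not the authors' R-SVRG code; all function names and the toy problem (leading eigenvector of an averaged matrix) are invented for illustration:

```python
import numpy as np

def proj(x, v):
    """Project v onto the tangent space of the unit sphere at x;
    also reused here as a simple vector transport between nearby points."""
    return v - (x @ v) * x

def retract(x, v):
    """Retraction on the sphere: step along the tangent vector, renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

def rsvrg_sphere(As, x0, step=0.05, outer=30, inner=50, seed=0):
    """R-SVRG sketch for min_x -(1/N) sum_i x^T A_i x on the unit sphere,
    whose minimizer is the leading eigenvector of the average matrix."""
    rng = np.random.default_rng(seed)
    rgrad = lambda A, x: proj(x, -2.0 * (A @ x))   # Riemannian gradient of -x^T A x
    x = x0 / np.linalg.norm(x0)
    for _ in range(outer):
        x_snap = x.copy()                           # snapshot point
        full = sum(rgrad(A, x_snap) for A in As) / len(As)
        for _ in range(inner):
            A = As[rng.integers(len(As))]
            # variance-reduced direction: snapshot terms transported to x
            v = rgrad(A, x) - proj(x, rgrad(A, x_snap) - full)
            x = retract(x, -step * v)
    return x

# Toy problem: noisy symmetric copies of diag(3, 2, 1), leading eigenvector ~ e1.
rng = np.random.default_rng(1)
As = []
for _ in range(20):
    E = 0.1 * rng.standard_normal((3, 3))
    As.append(np.diag([3.0, 2.0, 1.0]) + (E + E.T) / 2)
x = rsvrg_sphere(As, np.ones(3))
print(np.abs(x))  # roughly aligned with e1 = [1, 0, 0]
```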
The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants
Title | The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants |
Authors | Ivan Habernal, Henning Wachsmuth, Iryna Gurevych, Benno Stein |
Abstract | Reasoning is a crucial part of natural language argumentation. To comprehend an argument, one must analyze its warrant, which explains why its claim follows from its premises. As arguments are highly contextualized, warrants are usually presupposed and left implicit. Thus, comprehension requires not only language understanding and logical skills, but also common sense. In this paper we develop a methodology for reconstructing warrants systematically. We operationalize it in a scalable crowdsourcing process, resulting in a freely licensed dataset with warrants for 2k authentic arguments from news comments. On this basis, we present a new challenging task, the argument reasoning comprehension task. Given an argument with a claim and a premise, the goal is to choose the correct implicit warrant from two options. Both warrants are plausible and lexically close, but lead to contradicting claims. A solution to this task will define a substantial step towards automatic warrant reconstruction. However, experiments with several neural attention and language models reveal that current approaches do not suffice. |
Tasks | Common Sense Reasoning |
Published | 2017-08-04 |
URL | http://arxiv.org/abs/1708.01425v4 |
PDF | http://arxiv.org/pdf/1708.01425v4.pdf |
PWC | https://paperswithcode.com/paper/the-argument-reasoning-comprehension-task |
Repo | https://github.com/hongking9/SemEval-2018-task12 |
Framework | none |
Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks
Title | Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks |
Authors | Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen |
Abstract | In this paper we propose the utterance-level Permutation Invariant Training (uPIT) technique. uPIT is a practically applicable, end-to-end, deep learning based solution for speaker independent multi-talker speech separation. Specifically, uPIT extends the recently proposed Permutation Invariant Training (PIT) technique with an utterance-level cost function, hence eliminating the need for solving an additional permutation problem during inference, which is otherwise required by frame-level PIT. We achieve this using Recurrent Neural Networks (RNNs) that, during training, minimize the utterance-level separation error, hence forcing separated frames belonging to the same speaker to be aligned to the same output stream. In practice, this allows RNNs, trained with uPIT, to separate multi-talker mixed speech without any prior knowledge of signal duration, number of speakers, speaker identity or gender. We evaluated uPIT on the WSJ0 and Danish two- and three-talker mixed-speech separation tasks and found that uPIT outperforms techniques based on Non-negative Matrix Factorization (NMF) and Computational Auditory Scene Analysis (CASA), and compares favorably with Deep Clustering (DPCL) and the Deep Attractor Network (DANet). Furthermore, we found that models trained with uPIT generalize well to unseen speakers and languages. Finally, we found that a single model, trained with uPIT, can handle both two-speaker, and three-speaker speech mixtures. |
Tasks | Speech Separation |
Published | 2017-03-18 |
URL | http://arxiv.org/abs/1703.06284v2 |
PDF | http://arxiv.org/pdf/1703.06284v2.pdf |
PWC | https://paperswithcode.com/paper/multi-talker-speech-separation-with-utterance |
Repo | https://github.com/snsun/pit-speech-separation |
Framework | tf |
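The utterance-level cost function described in the abstract amounts to choosing one speaker permutation for the whole utterance — the one minimizing the total separation error — instead of re-solving the assignment per frame. A minimal sketch of that loss (invented helper name, MSE as the per-frame error, toy shapes), not the linked implementation:

```python
import itertools
import numpy as np

def upit_loss(estimates, targets):
    """Utterance-level PIT loss sketch: evaluate the total MSE under every
    permutation of the output streams and keep the best one, so the same
    assignment is used for all frames of the utterance.

    estimates, targets: arrays of shape (speakers, frames, features).
    Returns (best_loss, best_permutation)."""
    n = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Two-speaker toy case where the network's outputs come out swapped:
t = np.stack([np.ones((4, 3)), np.zeros((4, 3))])
e = np.stack([np.zeros((4, 3)), np.ones((4, 3))])
loss, perm = upit_loss(e, t)
print(loss, perm)  # 0.0 (1, 0)
```

Because the permutation is fixed per utterance, frames from the same speaker are forced onto the same output stream, which is what removes the inference-time permutation problem of frame-level PIT.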
Collaborative Low-Rank Subspace Clustering
Title | Collaborative Low-Rank Subspace Clustering |
Authors | Stephen Tierney, Yi Guo, Junbin Gao |
Abstract | In this paper we present Collaborative Low-Rank Subspace Clustering. Given multiple observations of a phenomenon we learn a unified representation matrix. This unified matrix incorporates the features from all the observations, thus increasing the discriminative power compared with learning the representation matrix on each observation separately. Experimental evaluation shows that our method outperforms subspace clustering on separate observations and the state-of-the-art collaborative learning algorithm. |
Tasks | |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.03966v1 |
PDF | http://arxiv.org/pdf/1704.03966v1.pdf |
PWC | https://paperswithcode.com/paper/collaborative-low-rank-subspace-clustering |
Repo | https://github.com/sjtrny/collab_lrsc |
Framework | none |
Efficient Data Representation by Selecting Prototypes with Importance Weights
Title | Efficient Data Representation by Selecting Prototypes with Importance Weights |
Authors | Karthik S. Gurumoorthy, Amit Dhurandhar, Guillermo Cecchi, Charu Aggarwal |
Abstract | Prototypical examples that best summarize and compactly represent an underlying complex data distribution communicate meaningful insights to humans in domains where simple explanations are hard to extract. In this paper we present algorithms with strong theoretical guarantees to mine these data sets and select prototypes, a.k.a. representatives, that optimally describe them. Our work notably generalizes the recent work by Kim et al. (2016) where in addition to selecting prototypes, we also associate non-negative weights which are indicative of their importance. This extension provides a single coherent framework under which both prototypes and criticisms (i.e. outliers) can be found. Furthermore, our framework works for any symmetric positive definite kernel, thus addressing one of the key open questions laid out in Kim et al. (2016). By establishing that our objective function enjoys the key property of weak submodularity, we present a fast ProtoDash algorithm and also derive approximation guarantees for the same. We demonstrate the efficacy of our method on diverse domains such as retail, digit recognition (MNIST) and on 40 publicly available health questionnaires obtained from the Center for Disease Control (CDC) website maintained by the US Dept. of Health. We validate the results quantitatively as well as qualitatively based on expert feedback and recently published scientific studies on public health, thus showcasing the power of our technique in providing actionability (for retail), utility (for MNIST) and insight (on CDC datasets), which arguably are the hallmarks of an effective data mining method. |
Tasks | |
Published | 2017-07-05 |
URL | https://arxiv.org/abs/1707.01212v4 |
PDF | https://arxiv.org/pdf/1707.01212v4.pdf |
PWC | https://paperswithcode.com/paper/protodash-fast-interpretable-prototype |
Repo | https://github.com/IBM/AIX360 |
Framework | pytorch |
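The selection scheme in the abstract — greedily growing a prototype set while attaching non-negative importance weights — can be sketched with a kernel mean-matching objective f(w) = wᵀμ − ½ wᵀKw. This is a simplified illustration, not the ProtoDash algorithm from the AIX360 repo (which uses a constrained solver and carries approximation guarantees); the function names, the plain clipping step, and the toy data are all invented:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian RBF kernel matrix between row sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def greedy_prototypes(X, m, gamma=0.5):
    """Greedy prototype selection sketch: repeatedly add the candidate with
    the largest gradient of f(w) = w^T mu - 0.5 w^T K w, where mu_j is the
    mean kernel similarity of candidate j to the whole dataset, then refit
    non-negative weights on the chosen support."""
    K = rbf_kernel(X, X, gamma)
    mu = K.mean(axis=0)
    S, w = [], np.array([])
    for _ in range(m):
        grad = mu - (K[:, S] @ w if S else np.zeros_like(mu))
        j = max((c for c in range(len(X)) if c not in S), key=lambda c: grad[c])
        S.append(j)
        # refit weights on the support; clip to keep them non-negative
        w = np.clip(np.linalg.solve(K[np.ix_(S, S)], mu[S]), 0.0, None)
    return S, w

# Two well-separated clusters: the first two prototypes should cover both.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(5, 0.3, (10, 2))])
S, w = greedy_prototypes(X, 2)
print(sorted(j // 10 for j in S))  # [0, 1] -> one prototype per cluster
```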
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Title | Train longer, generalize better: closing the generalization gap in large batch training of neural networks |
Authors | Elad Hoffer, Itay Hubara, Daniel Soudry |
Abstract | Background: Deep learning models are typically trained using stochastic gradient descent or one of its variants. These methods update the weights using their gradient, estimated from a small fraction of the training data. It has been observed that when using large batch sizes there is a persistent degradation in generalization performance - known as the “generalization gap” phenomenon. Identifying the origin of this gap and closing it had remained an open problem. Contributions: We examine the initial high learning rate training phase. We find that the weight distance from its initialization grows logarithmically with the number of weight updates. We therefore propose a “random walk on random landscape” statistical model which is known to exhibit similar “ultra-slow” diffusion behavior. Following this hypothesis, we conducted experiments to show empirically that the “generalization gap” stems from the relatively small number of updates rather than the batch size, and can be completely eliminated by adapting the training regime used. We further investigate different techniques to train models in the large-batch regime and present a novel algorithm named “Ghost Batch Normalization” which enables a significant decrease in the generalization gap without increasing the number of updates. To validate our findings we conduct several additional experiments on MNIST, CIFAR-10, CIFAR-100 and ImageNet. Finally, we reassess common practices and beliefs concerning training of deep models and suggest they may not be optimal to achieve good generalization. |
Tasks | |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08741v2 |
PDF | http://arxiv.org/pdf/1705.08741v2.pdf |
PWC | https://paperswithcode.com/paper/train-longer-generalize-better-closing-the |
Repo | https://github.com/eladhoffer/bigBatch |
Framework | pytorch |
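The "Ghost Batch Normalization" idea named in the abstract is to compute normalization statistics over small virtual ("ghost") sub-batches within a large batch. A minimal training-mode sketch of that statistic computation — no learned scale/shift parameters and no running statistics, and `ghost_batch_norm` is an invented name, not code from the linked repo:

```python
import numpy as np

def ghost_batch_norm(x, ghost_size=32, eps=1e-5):
    """Normalize each 'ghost' sub-batch of a large batch with its own
    mean/variance instead of the full-batch statistics.
    x has shape (batch, features)."""
    out = np.empty_like(x, dtype=float)
    for start in range(0, x.shape[0], ghost_size):
        chunk = x[start:start + ghost_size]
        mu = chunk.mean(axis=0)
        var = chunk.var(axis=0)
        out[start:start + ghost_size] = (chunk - mu) / np.sqrt(var + eps)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((128, 8)) * 3 + 5     # large batch of 128
y = ghost_batch_norm(x, ghost_size=32)
# Each 32-sample ghost batch is normalized to ~zero mean, ~unit variance.
print(np.allclose(y[:32].mean(axis=0), 0, atol=1e-6))  # True
```

The point of the per-sub-batch statistics is to recover some of the regularizing noise of small-batch training while still doing large-batch updates.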
Multi-way Interacting Regression via Factorization Machines
Title | Multi-way Interacting Regression via Factorization Machines |
Authors | Mikhail Yurochkin, XuanLong Nguyen, Nikolaos Vasiloglou |
Abstract | We propose a Bayesian regression method that accounts for multi-way interactions of arbitrary orders among the predictor variables. Our model makes use of a factorization mechanism for representing the regression coefficients of interactions among the predictors, while the interaction selection is guided by a prior distribution on random hypergraphs, a construction which generalizes the Finite Feature Model. We present a posterior inference algorithm based on Gibbs sampling, and establish posterior consistency of our regression model. Our method is evaluated with extensive experiments on simulated data and demonstrated to be able to identify meaningful interactions in applications in genetics and retail demand forecasting. |
Tasks | |
Published | 2017-09-27 |
URL | http://arxiv.org/abs/1709.09301v1 |
PDF | http://arxiv.org/pdf/1709.09301v1.pdf |
PWC | https://paperswithcode.com/paper/multi-way-interacting-regression-via |
Repo | https://github.com/moonfolk/MiFM |
Framework | none |
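The paper generalizes factorization machines to interactions of arbitrary order; the standard second-order FM it builds on can be sketched as follows. This is a generic illustration of the factorization mechanism, not the MiFM model or the linked code:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction. Each pairwise
    interaction weight is factorized as <V[i], V[j]>, which admits the
    O(k*d) identity:
        sum_{i<j} <V_i, V_j> x_i x_j
          = 0.5 * (||V^T x||^2 - sum_i ||V_i||^2 x_i^2)."""
    linear = w0 + w @ x
    interactions = 0.5 * (np.sum((V.T @ x) ** 2) - np.sum((V ** 2).T @ (x ** 2)))
    return linear + interactions

# Toy check: with x = (1, 1, 1) and 1-d factors (1, 2, 3), the pairwise
# terms are 1*2 + 1*3 + 2*3 = 11.
x = np.array([1.0, 1.0, 1.0])
V = np.array([[1.0], [2.0], [3.0]])
print(fm_predict(x, 0.0, np.zeros(3), V))  # 11.0
```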
Self-Supervised Damage-Avoiding Manipulation Strategy Optimization via Mental Simulation
Title | Self-Supervised Damage-Avoiding Manipulation Strategy Optimization via Mental Simulation |
Authors | Tobias Doernbach |
Abstract | Everyday robotics is challenged to deal with autonomous product handling in applications like logistics or retail, possibly causing damage to the items during manipulation. Traditionally, most approaches try to minimize physical interaction with goods. However, this paper proposes to take into account any unintended object motion and to learn damage-minimizing manipulation strategies in a self-supervised way. The presented approach consists of a simulation-based planning method for an optimal manipulation sequence with respect to possible damage. The planned manipulation sequences are generalized to new, unseen scenes in the same application scenario using machine learning. This learned manipulation strategy is continuously refined in a self-supervised, simulation-in-the-loop optimization cycle during load-free times of the system, commonly known as mental simulation. In parallel, the generated manipulation strategies can be deployed in near-real time in an anytime fashion. The approach is validated on an industrial container-unloading scenario and on a retail shelf-replenishment scenario. |
Tasks | |
Published | 2017-12-20 |
URL | https://arxiv.org/abs/1712.07452v2 |
PDF | https://arxiv.org/pdf/1712.07452v2.pdf |
PWC | https://paperswithcode.com/paper/self-supervised-damage-avoiding-manipulation |
Repo | https://github.com/jacobs-robotics/gazebo-mental-simulation-meta-example |
Framework | none |
Understanding Synthetic Gradients and Decoupled Neural Interfaces
Title | Understanding Synthetic Gradients and Decoupled Neural Interfaces |
Authors | Wojciech Marian Czarnecki, Grzegorz Świrszcz, Max Jaderberg, Simon Osindero, Oriol Vinyals, Koray Kavukcuoglu |
Abstract | When training neural networks, the use of Synthetic Gradients (SG) allows layers or modules to be trained without update locking - without waiting for a true error gradient to be backpropagated - resulting in Decoupled Neural Interfaces (DNIs). This unlocked ability of being able to update parts of a neural network asynchronously and with only local information was demonstrated to work empirically in Jaderberg et al (2016). However, there has been very little demonstration of what changes DNIs and SGs impose from a functional, representational, and learning dynamics point of view. In this paper, we study DNIs through the use of synthetic gradients on feed-forward networks to better understand their behaviour and elucidate their effect on optimisation. We show that the incorporation of SGs does not affect the representational strength of the learning system for a neural network, and prove the convergence of the learning system for linear and deep linear models. On practical problems we investigate the mechanism by which synthetic gradient estimators approximate the true loss, and, surprisingly, how that leads to drastically different layer-wise representations. Finally, we also expose the relationship of using synthetic gradients to other error approximation techniques and find a unifying language for discussion and comparison. |
Tasks | |
Published | 2017-03-01 |
URL | http://arxiv.org/abs/1703.00522v1 |
PDF | http://arxiv.org/pdf/1703.00522v1.pdf |
PWC | https://paperswithcode.com/paper/understanding-synthetic-gradients-and |
Repo | https://github.com/quangvu0702/Synthetic-Gradients |
Framework | pytorch |
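The object the paper studies — a module that predicts a layer's error gradient from its activations alone, so the layer can update without waiting for backpropagation — can be illustrated on a toy quadratic loss. This is a hand-rolled sketch of the idea, not the paper's setup: with loss 0.5·||y − t||², the true gradient is y − t, which a linear synthetic-gradient module ĝ = M·y + b can represent exactly (M → I, b → −t), so training the module online should recover those values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Loss 0.5 * ||y - t||^2 over activations y, so the true gradient is y - t.
# The synthetic-gradient module g_hat = M @ y + b is trained online to
# match it, which would let a layer update from g_hat without waiting for
# the backpropagated signal (no update locking).
t = np.array([1.0, -1.0])
M = np.zeros((2, 2))
b = np.zeros(2)
lr = 0.05

for _ in range(2000):
    y = rng.standard_normal(2)       # activations observed during training
    g_true = y - t                   # true gradient w.r.t. y
    err = (M @ y + b) - g_true
    M -= lr * np.outer(err, y)       # SGD on ||g_hat - g_true||^2
    b -= lr * err

print(np.round(M, 1))  # ~ identity matrix
print(np.round(b, 1))  # ~ -t = [-1, 1]
```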