July 29, 2019

Paper Group AWR 95

Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification


Title	Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification
Authors	Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, Xiaogang Wang
Abstract	Multi-label image classification is a fundamental but challenging task in computer vision. In recent years, great progress has been made by exploiting semantic relations between labels. However, conventional approaches cannot model the underlying spatial relations between labels in multi-label images, because spatial annotations of the labels are generally not provided. In this paper, we propose a unified deep neural network that exploits both semantic and spatial relations between labels with only image-level supervision. Given a multi-label image, our proposed Spatial Regularization Network (SRN) generates attention maps for all labels and captures the underlying relations between them via learnable convolutions. By aggregating the regularized classification results with the original results from a ResNet-101 network, the classification performance is consistently improved. The whole deep neural network is trained end-to-end with only image-level annotations and thus requires no additional annotation effort. Extensive evaluations on 3 public datasets with different types of labels show that our approach significantly outperforms state-of-the-art methods and has strong generalization capability. Analysis of the learned SRN model demonstrates that it can effectively capture both semantic and spatial relations of labels for improving classification performance.
Tasks	Image Classification
Published	2017-02-20
URL	http://arxiv.org/abs/1702.05891v2
PDF	http://arxiv.org/pdf/1702.05891v2.pdf
PWC	https://paperswithcode.com/paper/learning-spatial-regularization-with-image
Repo	https://github.com/Enjia/Spatial-Regularization-Network-in-Tensorflow
Framework	tf

Prosodic Features from Large Corpora of Child-Directed Speech as Predictors of the Age of Acquisition of Words


Title	Prosodic Features from Large Corpora of Child-Directed Speech as Predictors of the Age of Acquisition of Words
Authors	Lea Frermann, Michael C. Frank
Abstract	The impressive ability of children to acquire language is a widely studied phenomenon, and the factors influencing the pace and patterns of word learning remain a subject of active research. Although many models predicting the age of acquisition of words have been proposed, little emphasis has been placed on the raw input children receive. In this work we present a comparatively large-scale multi-modal corpus of prosody-text aligned child-directed speech. Our corpus contains automatically extracted word-level prosodic features, and we investigate the utility of this information as predictors of age of acquisition. We show that prosodic features boost predictive power in a regularized regression, and demonstrate their utility in the context of multi-modal factorized language models trained and tested on child-directed speech.
Tasks
Published	2017-09-27
URL	http://arxiv.org/abs/1709.09443v1
PDF	http://arxiv.org/pdf/1709.09443v1.pdf
PWC	https://paperswithcode.com/paper/prosodic-features-from-large-corpora-of-child
Repo	https://github.com/ColiLea/prosodyAOA
Framework	torch

Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation


Title	Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation
Authors	Qihang Yu, Lingxi Xie, Yan Wang, Yuyin Zhou, Elliot K. Fishman, Alan L. Yuille
Abstract	We aim at segmenting small organs (e.g., the pancreas) from abdominal CT scans. As the target often occupies a relatively small region in the input image, deep neural networks can be easily confused by the complex and variable background. To alleviate this, researchers proposed a coarse-to-fine approach, which used the prediction from the first (coarse) stage to indicate a smaller input region for the second (fine) stage. Despite its effectiveness, this algorithm dealt with the two stages individually, so it did not optimize a global energy function and was limited in its ability to incorporate multi-stage visual cues. The missing contextual information led to unsatisfying convergence in iterations, and the fine stage sometimes produced even lower segmentation accuracy than the coarse stage. This paper presents a Recurrent Saliency Transformation Network. The key innovation is a saliency transformation module, which repeatedly converts the segmentation probability map from the previous iteration into spatial weights and applies these weights to the current iteration. This brings two benefits. In training, it allows joint optimization over the deep networks dealing with different input scales. In testing, it propagates multi-stage visual information throughout iterations to improve segmentation accuracy. Experiments on the NIH pancreas segmentation dataset demonstrate state-of-the-art accuracy, outperforming the previous best by an average of over 2%. Much higher accuracies are also reported on several small organs in a larger dataset collected by ourselves. In addition, our approach enjoys better convergence properties, making it more efficient and reliable in practice.
Tasks	Pancreas Segmentation
Published	2017-09-13
URL	http://arxiv.org/abs/1709.04518v4
PDF	http://arxiv.org/pdf/1709.04518v4.pdf
PWC	https://paperswithcode.com/paper/recurrent-saliency-transformation-network
Repo	https://github.com/twni2016/OrganSegRSTN_PyTorch
Framework	pytorch
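The saliency transformation can be pictured with a toy sketch. Assumptions: images are small nested lists rather than CT slices, and the previous probability map is used directly as the spatial weight (an identity transform), whereas the paper learns this transformation. `saliency_transform` and `coarse_to_fine` are hypothetical names, not part of the RSTN code.

```python
from typing import Callable, List

Grid = List[List[float]]


def saliency_transform(image: Grid, prob_map: Grid) -> Grid:
    """Reweight each pixel by the previous iteration's foreground probability.

    Assumption: identity transform of the probability map; the paper learns this mapping.
    """
    return [
        [pixel * weight for pixel, weight in zip(img_row, prob_row)]
        for img_row, prob_row in zip(image, prob_map)
    ]


def coarse_to_fine(image: Grid, segment: Callable[[Grid], Grid], iterations: int) -> Grid:
    """Run a segmenter repeatedly, feeding each probability map back as spatial weights."""
    prob_map = segment(image)  # coarse stage: the whole input, no weighting
    for _ in range(iterations):
        weighted = saliency_transform(image, prob_map)
        prob_map = segment(weighted)  # fine stage sees the reweighted input
    return prob_map
```

With a toy segmenter that merely clips intensities to [0, 1], each iteration further suppresses low-probability background pixels; in the real system the segmenter is a trained network and the weighting function is learned jointly with it.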

Sequence-to-Sequence Models Can Directly Translate Foreign Speech


Title	Sequence-to-Sequence Models Can Directly Translate Foreign Speech
Authors	Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen
Abstract	We present a recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another. The model does not explicitly transcribe the speech into text in the source language, nor does it require supervision from the ground truth source language transcription during training. We apply a slightly modified attention-based sequence-to-sequence architecture that has previously been used for speech recognition and show that it can be repurposed for this more complex task, illustrating the power of attention-based models. A single model trained end-to-end obtains state-of-the-art performance on the Fisher Callhome Spanish-English speech translation task, outperforming a cascade of independently trained sequence-to-sequence speech recognition and machine translation models by 1.8 BLEU points on the Fisher test set. In addition, we find that making use of the training data in both languages by multi-task training of sequence-to-sequence speech translation and recognition models with a shared encoder network can improve performance by a further 1.4 BLEU points.
Tasks	Machine Translation, Sequence-To-Sequence Speech Recognition, Speech Recognition
Published	2017-03-24
URL	http://arxiv.org/abs/1703.08581v2
PDF	http://arxiv.org/pdf/1703.08581v2.pdf
PWC	https://paperswithcode.com/paper/sequence-to-sequence-models-can-directly
Repo	https://github.com/colaprograms/speechify
Framework	tf
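The attention step that lets the decoder look at encoded source audio directly can be sketched as below. This is a simplified stand-in: it scores encoder states with a plain dot product, which may differ from the attention function used in the paper, and `attend` is a hypothetical helper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """Return (attention weights, context vector) for one decoder step."""
    # Dot-product score between the decoder query and every encoder state.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context = attention-weighted average of the encoder states.
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context
```

In speech translation the encoder states would summarize stretches of source-language audio, and the query would be the decoder state before it emits the next target-language token; no source transcript is needed at any point.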

End-to-end Video-level Representation Learning for Action Recognition


Title	End-to-end Video-level Representation Learning for Action Recognition
Authors	Jiagang Zhu, Wei Zou, Zheng Zhu
Abstract	From frame/clip-level feature learning to video-level representation building, deep learning methods for action recognition have developed rapidly in recent years. However, current methods suffer from confusion caused by training on partial observations, lack end-to-end learning, or are restricted to modeling a single temporal scale. In this paper, we build upon two-stream ConvNets and propose Deep networks with Temporal Pyramid Pooling (DTPP), an end-to-end video-level representation learning approach, to address these problems. Specifically, RGB images and optical flow stacks are first sparsely sampled across the whole video. Then a temporal pyramid pooling layer is used to aggregate the frame-level features, which consist of spatial and temporal cues. Lastly, the trained model has a compact video-level representation with multiple temporal scales, which is both global and sequence-aware. Experimental results show that DTPP achieves state-of-the-art performance on two challenging video action datasets, UCF101 and HMDB51, with either ImageNet pre-training or Kinetics pre-training.
Tasks	Optical Flow Estimation, Representation Learning, Temporal Action Localization
Published	2017-11-11
URL	http://arxiv.org/abs/1711.04161v7
PDF	http://arxiv.org/pdf/1711.04161v7.pdf
PWC	https://paperswithcode.com/paper/end-to-end-video-level-representation
Repo	https://github.com/zhujiagang/DTPP
Framework	none
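Temporal pyramid pooling turns any number of sampled frames into a fixed-length video descriptor. A minimal sketch, assuming max pooling and pyramid levels (1, 2, 4); the paper's exact pooling operator and levels may differ, and `temporal_pyramid_pool` is a hypothetical name.

```python
from typing import List, Sequence

Feature = List[float]


def temporal_pyramid_pool(frames: Sequence[Feature], levels: Sequence[int] = (1, 2, 4)) -> Feature:
    """Concatenate max-pooled features over 1, 2, 4, ... equal temporal segments.

    Requires len(frames) >= max(levels) so that every segment is non-empty.
    """
    n = len(frames)
    if n < max(levels):
        raise ValueError("need at least as many frames as the finest pyramid level")
    pooled: Feature = []
    for level in levels:
        for seg in range(level):
            # Split the frame sequence into `level` contiguous, nearly equal chunks.
            start = seg * n // level
            end = (seg + 1) * n // level
            segment = frames[start:end]
            # Element-wise max over the frames in this chunk.
            pooled.extend(max(values) for values in zip(*segment))
    return pooled
```

For frames with two features each, the output always has 2 * (1 + 2 + 4) = 14 values, independent of video length; the coarse level is global, while the finer levels keep some temporal order.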

Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking


Title	Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking
Authors	Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, Mounia Lalmas
Abstract	Machine-learned models are often described as “black boxes”. In many real-world applications, however, models may have to sacrifice predictive power in favour of human interpretability. When this is the case, feature engineering becomes a crucial task, which requires significant and time-consuming human effort. Whilst some features are inherently static, representing properties that cannot be influenced (e.g., the age of an individual), others capture characteristics that could be adjusted (e.g., the daily amount of carbohydrates consumed). Nonetheless, once a model is learned from the data, each prediction it makes on new instances is irreversible, since every instance is assumed to be a static point located in the chosen feature space. There are many circumstances, however, where it is important to understand (i) why a model outputs a certain prediction on a given instance, (ii) which adjustable features of that instance should be modified, and finally (iii) how to alter such a prediction when the mutated instance is input back to the model. In this paper, we present a technique that exploits the internals of a tree-based ensemble classifier to offer recommendations for transforming true negative instances into positively predicted ones. We demonstrate the validity of our approach using an online advertising application. First, we design a Random Forest classifier that effectively separates two types of ads: low-quality (negative) and high-quality (positive) ads (instances). Then, we introduce an algorithm that provides recommendations that aim to transform a low-quality ad (negative instance) into a high-quality one (positive instance). Finally, we evaluate our approach on a subset of the active inventory of a large ad network, Yahoo Gemini.
Tasks	Feature Engineering
Published	2017-06-20
URL	http://arxiv.org/abs/1706.06691v1
PDF	http://arxiv.org/pdf/1706.06691v1.pdf
PWC	https://paperswithcode.com/paper/interpretable-predictions-of-tree-based
Repo	https://github.com/katokohaku/featureTweakR
Framework	none
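One way to read the tweaking procedure: for every tree that currently votes negative, nudge the instance just across each threshold on one of that tree's positive root-to-leaf paths, keep the candidate only if the whole forest now predicts positive, and return the cheapest candidate. The sketch follows that reading under stated assumptions: trees are hand-written lists of leaf paths, the forest uses majority voting, cost is Euclidean distance, and `feature_tweak` and the other helpers are hypothetical, not the featureTweakR API.

```python
import math
from typing import List, Optional, Tuple

Condition = Tuple[int, float, bool]  # (feature index, threshold, True if the test is x[f] <= t)
Leaf = Tuple[List[Condition], int]   # (conditions on the root-to-leaf path, label in {0, 1})
Tree = List[Leaf]                    # a tree flattened into its leaf paths


def satisfies(x: List[float], conditions: List[Condition]) -> bool:
    return all((x[f] <= t) == leq for f, t, leq in conditions)


def tree_predict(tree: Tree, x: List[float]) -> int:
    for conditions, label in tree:
        if satisfies(x, conditions):
            return label
    raise ValueError("leaf paths do not cover x")


def forest_predict(forest: List[Tree], x: List[float]) -> int:
    votes = sum(tree_predict(tree, x) for tree in forest)
    return int(votes * 2 > len(forest))  # strict majority vote


def tweak_to_path(x: List[float], conditions: List[Condition], eps: float) -> List[float]:
    """Move x just across every threshold it violates on the given path."""
    x_new = list(x)
    for f, t, leq in conditions:
        if leq and x_new[f] > t:
            x_new[f] = t - eps
        elif not leq and x_new[f] <= t:
            x_new[f] = t + eps
    return x_new


def feature_tweak(forest: List[Tree], x: List[float], eps: float = 0.1,
                  frozen: Tuple[int, ...] = ()) -> Optional[List[float]]:
    """Return the cheapest tweaked copy of x that the forest labels positive, or None."""
    best: Optional[List[float]] = None
    best_cost = math.inf
    for tree in forest:
        if tree_predict(tree, x) == 1:
            continue  # only trees currently voting negative need to be flipped
        for conditions, label in tree:
            if label != 1:
                continue
            candidate = tweak_to_path(x, conditions, eps)
            if any(candidate[f] != x[f] for f in frozen):
                continue  # path would require changing a non-adjustable feature
            if forest_predict(forest, candidate) != 1:
                continue
            cost = math.dist(x, candidate)
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best
```

Listing static attributes (like the age example in the abstract) in `frozen` keeps recommendations actionable; if no positive path can be reached without touching them, the function returns `None`.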

R$^3$: Reinforced Reader-Ranker for Open-Domain Question Answering


Title	R$^3$: Reinforced Reader-Ranker for Open-Domain Question Answering
Authors	Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bowen Zhou, Jing Jiang
Abstract	In recent years researchers have achieved considerable success applying neural network methods to question answering (QA). These approaches have achieved state-of-the-art results in simplified closed-domain settings such as the SQuAD (Rajpurkar et al., 2016) dataset, which provides a pre-selected passage from which the answer to a given question may be extracted. More recently, researchers have begun to tackle open-domain QA, in which the model is given a question and access to a large corpus (e.g., Wikipedia) instead of a pre-selected passage (Chen et al., 2017a). This setting is more complex, as it requires large-scale search for relevant passages by an information retrieval component, combined with a reading comprehension model that “reads” the passages to generate an answer to the question. Performance in this setting lags considerably behind closed-domain performance. In this paper, we present a novel open-domain QA system called Reinforced Ranker-Reader $(R^3)$, based on two algorithmic innovations. First, we propose a new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages in terms of their likelihood of generating the ground-truth answer to a given question. Second, we propose a novel method that jointly trains the Ranker along with an answer-generation Reader model, based on reinforcement learning. We report extensive experimental results showing that our method significantly improves on the state of the art for multiple open-domain QA datasets.
Tasks	Information Retrieval, Open-Domain Question Answering, Question Answering, Reading Comprehension
Published	2017-08-31
URL	http://arxiv.org/abs/1709.00023v2
PDF	http://arxiv.org/pdf/1709.00023v2.pdf
PWC	https://paperswithcode.com/paper/r3-reinforced-reader-ranker-for-open-domain
Repo	https://github.com/shuohangwang/mprc
Framework	torch

Multiplicative Normalizing Flows for Variational Bayesian Neural Networks


Title	Multiplicative Normalizing Flows for Variational Bayesian Neural Networks
Authors	Christos Louizos, Max Welling
Abstract	We reinterpret multiplicative noise in neural networks as auxiliary random variables that augment the approximate posterior in a variational setting for Bayesian neural networks. We show that through this interpretation it is both efficient and straightforward to improve the approximation by employing normalizing flows while still allowing for local reparametrizations and a tractable lower bound. In experiments we show that with this new approximation we can significantly improve upon classical mean field for Bayesian neural networks on both predictive accuracy as well as predictive uncertainty.
Tasks
Published	2017-03-06
URL	http://arxiv.org/abs/1703.01961v2
PDF	http://arxiv.org/pdf/1703.01961v2.pdf
PWC	https://paperswithcode.com/paper/multiplicative-normalizing-flows-for
Repo	https://github.com/AMLab-Amsterdam/MNF_VBNN
Framework	tf
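The local reparametrization the abstract mentions can be sketched for a single dense layer. Following the setup as we understand it, each weight is Gaussian with its mean scaled by a multiplicative noise variable, W_ij ~ N(z_i * mu_ij, sigma_ij^2); given z, the pre-activations are themselves Gaussian and can be sampled directly without sampling W. Here z is passed in as a fixed vector, whereas MNF draws it from a normalizing flow; `local_reparam_preactivations` is a hypothetical name.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape: inputs x outputs


def local_reparam_preactivations(x: Vector, z: Vector, w_mean: Matrix, w_var: Matrix,
                                 rng: random.Random) -> Vector:
    """Sample b = x W for W_ij ~ N(z_i * mu_ij, sigma_ij^2) without sampling W itself."""
    n_out = len(w_mean[0])
    out = []
    for j in range(n_out):
        # Mean of b_j: sum_i x_i * z_i * mu_ij (the multiplicative noise rescales inputs).
        mean = sum(xi * zi * w_mean[i][j] for i, (xi, zi) in enumerate(zip(x, z)))
        # Variance of b_j: sum_i x_i^2 * sigma_ij^2.
        var = sum(xi ** 2 * w_var[i][j] for i, xi in enumerate(x))
        out.append(mean + math.sqrt(var) * rng.gauss(0.0, 1.0))
    return out
```

Sampling pre-activations instead of the full weight matrix is the usual motivation for local reparametrization: it is cheaper per data point and yields lower-variance gradient estimates.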

Recurrent Neural Networks for Semantic Instance Segmentation


Title	Recurrent Neural Networks for Semantic Instance Segmentation
Authors	Amaia Salvador, Miriam Bellver, Victor Campos, Manel Baradad, Ferran Marques, Jordi Torres, Xavier Giro-i-Nieto
Abstract	We present a recurrent model for semantic instance segmentation that sequentially generates binary masks and their associated class probabilities for every object in an image. Our proposed system is trainable end-to-end from an input image to a sequence of labeled masks and, compared to methods relying on object proposals, does not require post-processing steps on its output. We study the suitability of our recurrent model on three different instance segmentation benchmarks, namely Pascal VOC 2012, CVPPP Plant Leaf Segmentation and Cityscapes. Further, we analyze the object sorting patterns generated by our model and observe that it learns to follow a consistent pattern, which correlates with the activations learned in the encoder part of our network. Source code and models are available at https://imatge-upc.github.io/rsis/
Tasks	Instance Segmentation, Semantic Segmentation
Published	2017-12-02
URL	http://arxiv.org/abs/1712.00617v4
PDF	http://arxiv.org/pdf/1712.00617v4.pdf
PWC	https://paperswithcode.com/paper/recurrent-neural-networks-for-semantic
Repo	https://github.com/imatge-upc/rsis
Framework	pytorch

Auto-Encoding Sequential Monte Carlo


Title	Auto-Encoding Sequential Monte Carlo
Authors	Tuan Anh Le, Maximilian Igl, Tom Rainforth, Tom Jin, Frank Wood
Abstract	We build on auto-encoding sequential Monte Carlo (AESMC): a method for model and proposal learning based on maximizing the lower bound to the log marginal likelihood in a broad family of structured probabilistic models. Our approach relies on the efficiency of sequential Monte Carlo (SMC) for performing inference in structured probabilistic models and the flexibility of deep neural networks to model complex conditional probability distributions. We develop additional theoretical insights and introduce a new training procedure which improves both model and proposal learning. We demonstrate that our approach provides a fast, easy-to-implement and scalable means for simultaneous model learning and proposal adaptation in deep generative models.
Tasks
Published	2017-05-29
URL	http://arxiv.org/abs/1705.10306v2
PDF	http://arxiv.org/pdf/1705.10306v2.pdf
PWC	https://paperswithcode.com/paper/auto-encoding-sequential-monte-carlo
Repo	https://github.com/amoretti86/PSVO
Framework	tf
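The objective AESMC maximizes is built from the SMC estimate of the marginal likelihood: the product over time steps of the average unnormalized particle weight, whose expected logarithm lower-bounds log p(y). The sketch computes that estimate for a toy 1-D linear Gaussian model with a bootstrap particle filter (proposal = transition prior) and multinomial resampling. The model, its parameters, and the function names are illustrative assumptions; AESMC additionally learns the model and a neural proposal by gradient ascent on this bound.

```python
import math
import random
from typing import List


def normal_logpdf(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values: List[float]) -> float:
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def smc_log_evidence(ys: List[float], num_particles: int = 100, a: float = 0.9,
                     q: float = 1.0, r: float = 1.0, seed: int = 0) -> float:
    """Bootstrap-particle-filter estimate of log p(y_1:T).

    Assumed model: z_1 ~ N(0, q), z_t ~ N(a * z_{t-1}, q), y_t ~ N(z_t, r).
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, math.sqrt(q)) for _ in range(num_particles)]
    log_z = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            # Propagate through the transition prior (the bootstrap proposal).
            particles = [rng.gauss(a * z, math.sqrt(q)) for z in particles]
        log_w = [normal_logpdf(y, z, r) for z in particles]
        norm = logsumexp(log_w)
        # One factor of the evidence estimate: log of the mean unnormalized weight.
        log_z += norm - math.log(num_particles)
        # Multinomial resampling in proportion to the normalized weights.
        probs = [math.exp(lw - norm) for lw in log_w]
        particles = rng.choices(particles, weights=probs, k=num_particles)
    return log_z
```

For a single observation the estimate can be compared with the exact value log N(y; 0, q + r); with more particles the gap shrinks, which is also why the lower bound tightens as the particle count grows.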

Deep Mean-Shift Priors for Image Restoration


Title	Deep Mean-Shift Priors for Image Restoration
Authors	Siavash Arjomand Bigdeli, Meiguang Jin, Paolo Favaro, Matthias Zwicker
Abstract	In this paper we introduce a natural image prior that directly represents a Gaussian-smoothed version of the natural image distribution. We include our prior in a formulation of image restoration as a Bayes estimator that also allows us to solve noise-blind image restoration problems. We show that the gradient of our prior corresponds to the mean-shift vector on the natural image distribution. In addition, we learn the mean-shift vector field using denoising autoencoders, and use it in a gradient descent approach to perform Bayes risk minimization. We demonstrate competitive results for noise-blind deblurring, super-resolution, and demosaicing.
Tasks	Deblurring, Demosaicking, Denoising, Image Restoration, Image Super-Resolution, Super-Resolution
Published	2017-09-12
URL	http://arxiv.org/abs/1709.03749v2
PDF	http://arxiv.org/pdf/1709.03749v2.pdf
PWC	https://paperswithcode.com/paper/deep-mean-shift-priors-for-image-restoration
Repo	https://github.com/siavashbigdeli/DMSP
Framework	tf
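The link between a Gaussian-smoothed prior and the mean-shift vector can be checked numerically in one dimension. For p_sigma(x) = (1/N) sum_i N(x; x_i, sigma^2), the mean-shift vector m(x) - x equals sigma^2 times the gradient of log p_sigma(x), where m(x) is the kernel-weighted mean of the samples. The sketch uses a few hypothetical scalar samples in place of natural images and computes m(x) exactly, whereas the paper learns the mean-shift vector field with a denoising autoencoder.

```python
import math
from typing import List


def mean_shift_vector(x: float, samples: List[float], sigma: float) -> float:
    """m(x) - x for a Gaussian kernel; equals sigma**2 * d/dx log p_sigma(x)."""
    log_w = [-(x - s) ** 2 / (2 * sigma ** 2) for s in samples]
    peak = max(log_w)
    weights = [math.exp(lw - peak) for lw in log_w]
    m = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
    return m - x


def log_smoothed_density(x: float, samples: List[float], sigma: float) -> float:
    """log p_sigma(x) for p_sigma = (1/N) * sum_i N(x; s_i, sigma^2)."""
    log_terms = [-(x - s) ** 2 / (2 * sigma ** 2) for s in samples]
    peak = max(log_terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_sum - math.log(len(samples)) - 0.5 * math.log(2 * math.pi * sigma ** 2)


def ascend(x: float, samples: List[float], sigma: float, step: float, iterations: int) -> float:
    """Gradient ascent on log p_sigma; the effective step size is step * sigma**2."""
    for _ in range(iterations):
        x = x + step * mean_shift_vector(x, samples, sigma)
    return x
```

With `step = 1` the update becomes the classic mean-shift iteration x <- m(x). In the paper, the prior's gradient is used inside a gradient-descent scheme for Bayes risk minimization rather than on its own.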

Modeling Relational Data with Graph Convolutional Networks


Title	Modeling Relational Data with Graph Convolutional Networks
Authors	Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, Max Welling
Abstract	Knowledge graphs enable a wide variety of applications, including question answering and information retrieval. Despite the great effort invested in their creation and maintenance, even the largest (e.g., Yago, DBPedia or Wikidata) remain incomplete. We introduce Relational Graph Convolutional Networks (R-GCNs) and apply them to two standard knowledge base completion tasks: link prediction (recovery of missing facts, i.e. subject-predicate-object triples) and entity classification (recovery of missing entity attributes). R-GCNs are related to a recent class of neural networks operating on graphs, and are developed specifically to deal with the highly multi-relational data characteristic of realistic knowledge bases. We demonstrate the effectiveness of R-GCNs as a stand-alone model for entity classification. We further show that factorization models for link prediction such as DistMult can be significantly improved by enriching them with an encoder model to accumulate evidence over multiple inference steps in the relational graph, demonstrating a large improvement of 29.8% on FB15k-237 over a decoder-only baseline.
Tasks	Graph Classification, Information Retrieval, Knowledge Base Completion, Knowledge Graphs, Link Prediction, Node Classification
Published	2017-03-17
URL	http://arxiv.org/abs/1703.06103v4
PDF	http://arxiv.org/pdf/1703.06103v4.pdf
PWC	https://paperswithcode.com/paper/modeling-relational-data-with-graph
Repo	https://github.com/tkipf/gae
Framework	tf
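A single R-GCN layer and the DistMult decoder can be sketched in plain Python. The layer follows the R-GCN propagation rule in simplified form, h_i' = ReLU(W_0 h_i + sum over relations r and incoming neighbours j of W_r h_j / |N_i^r|), with no basis decomposition and without the inverse relations the paper also adds. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row per output dimension


def matvec(w: Matrix, v: Vector) -> Vector:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def rgcn_layer(h: Dict[str, Vector], edges: List[Tuple[str, str, str]],
               w_self: Matrix, w_rel: Dict[str, Matrix]) -> Dict[str, Vector]:
    """One R-GCN layer; edges are (subject, relation, object) and messages flow subject -> object."""
    neighbours: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for subj, rel, obj in edges:
        neighbours[(obj, rel)].append(subj)
    out = {}
    for node, vec in h.items():
        total = matvec(w_self, vec)  # self-connection W_0 h_i
        for rel, w in w_rel.items():
            senders = neighbours.get((node, rel), [])
            for j in senders:
                # Relation-specific message, normalized by c_{i,r} = |N_i^r|.
                msg = matvec(w, h[j])
                total = [t + m / len(senders) for t, m in zip(total, msg)]
        out[node] = [max(0.0, t) for t in total]  # ReLU
    return out


def distmult_score(e_s: Vector, r: Vector, e_o: Vector) -> float:
    """DistMult decoder: e_s^T diag(r) e_o."""
    return sum(a * b * c for a, b, c in zip(e_s, r, e_o))
```

In a full model, entity embeddings produced by stacked layers would be passed to `distmult_score` to rank candidate (subject, relation, object) triples for link prediction.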

TensorQuant - A Simulation Toolbox for Deep Neural Network Quantization


Title	TensorQuant - A Simulation Toolbox for Deep Neural Network Quantization
Authors	Dominik Marek Loroch, Norbert Wehn, Franz-Josef Pfreundt, Janis Keuper
Abstract	Recent research implies that training and inference of deep neural networks (DNNs) can be computed with low-precision numerical representations of the training/test data, weights and gradients without a general loss in accuracy. The benefit of such compact representations is twofold: they allow a significant reduction of the communication bottleneck in distributed DNN training and faster neural network implementations on hardware accelerators like FPGAs. Several quantization methods have been proposed to map the original 32-bit floating-point problem to low-bit representations. While most related publications validate the proposed approach on a single DNN topology, it appears evident that the optimal choice of the quantization method and number of coding bits is topology-dependent. So far, there is no general theory that would allow users to derive the optimal quantization during the design of a DNN topology. In this paper, we present a quantization toolbox for the TensorFlow framework. TensorQuant allows a transparent quantization simulation of existing DNN topologies during training and inference. TensorQuant supports generic quantization methods and allows experimental evaluation of the impact of quantization on single layers as well as on the full topology. In a first series of experiments with TensorQuant, we show an analysis of fixed-point quantizations of popular CNN topologies.
Tasks	Quantization
Published	2017-10-13
URL	http://arxiv.org/abs/1710.05758v1
PDF	http://arxiv.org/pdf/1710.05758v1.pdf
PWC	https://paperswithcode.com/paper/tensorquant-a-simulation-toolbox-for-deep
Repo	https://github.com/Spiritator/DNN-fault-simulator
Framework	tf
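The kind of fixed-point rounding TensorQuant simulates can be sketched for a single value. Assumptions: a signed format with `word_bits` total bits, of which `frac_bits` are fractional, round-to-nearest, and saturation at the range limits. Python's built-in `round` breaks ties toward the nearest even integer. `quantize_fixed_point` is a hypothetical name, not the TensorQuant API.

```python
def quantize_fixed_point(x: float, word_bits: int = 8, frac_bits: int = 4) -> float:
    """Round x to the nearest signed fixed-point value and saturate to the representable range."""
    scale = 2 ** frac_bits                 # one LSB = 1 / scale
    q_min = -(2 ** (word_bits - 1))        # most negative integer code
    q_max = 2 ** (word_bits - 1) - 1       # most positive integer code
    q = round(x * scale)                   # round-to-nearest (ties to even)
    q = max(q_min, min(q_max, q))          # saturate instead of wrapping around
    return q / scale
```

With 8 bits and 4 fractional bits the representable range is -8.0 to 7.9375 in steps of 0.0625, so 1.3 becomes 1.3125 and 100.0 saturates to 7.9375. Applying such a function to the weights or activations of one layer at a time is the kind of per-layer experiment the abstract describes.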

NeuroNER: an easy-to-use program for named-entity recognition based on neural networks


Title	NeuroNER: an easy-to-use program for named-entity recognition based on neural networks
Authors	Franck Dernoncourt, Ji Young Lee, Peter Szolovits
Abstract	Named-entity recognition (NER) aims at identifying entities of interest in a text. Artificial neural networks (ANNs) have recently been shown to outperform existing NER systems. However, ANNs remain challenging to use for non-expert users. In this paper, we present NeuroNER, an easy-to-use named-entity recognition tool based on ANNs. Users can annotate entities using a graphical web-based user interface (BRAT); the annotations are then used to train an ANN, which in turn predicts entities’ locations and categories in new texts. NeuroNER makes this annotation-training-prediction flow smooth and accessible to anyone.
Tasks	Named Entity Recognition
Published	2017-05-16
URL	http://arxiv.org/abs/1705.05487v1
PDF	http://arxiv.org/pdf/1705.05487v1.pdf
PWC	https://paperswithcode.com/paper/neuroner-an-easy-to-use-program-for-named
Repo	https://github.com/Franck-Dernoncourt/NeuroNER
Framework	tf
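NeuroNER's annotate-train-predict loop starts from BRAT annotations. As a hedged illustration of what that conversion involves, the sketch reads BRAT standoff `.ann` lines of the form `T1<TAB>Type start end<TAB>text` and turns them into token-level BIO tags. It assumes whitespace tokenization and contiguous spans only, and the helpers are hypothetical, not NeuroNER's own converter.

```python
import re
from typing import List, Tuple


def parse_brat_entities(ann_text: str) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) for each contiguous text-bound 'T' annotation."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, events, attributes and notes
        _, type_and_span, _ = line.split("\t", 2)
        label, offsets = type_and_span.split(" ", 1)
        if ";" in offsets:
            continue  # discontinuous spans are not handled in this sketch
        start, end = (int(v) for v in offsets.split())
        entities.append((label, start, end))
    return entities


def bio_tags(text: str, entities: List[Tuple[str, int, int]]) -> List[Tuple[str, str]]:
    """Whitespace-tokenize text and tag each token B-/I-/O from character offsets."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for label, start, end in entities:
            if start <= match.start() and match.end() <= end:
                tag = ("B-" if match.start() == start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged
```

Token/tag pairs like these are the usual training input for a neural sequence tagger; predictions can be mapped back to character offsets in the same way to display them in BRAT.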

Automatic Knee Osteoarthritis Diagnosis from Plain Radiographs: A Deep Learning-Based Approach


Title	Automatic Knee Osteoarthritis Diagnosis from Plain Radiographs: A Deep Learning-Based Approach
Authors	Aleksei Tiulpin, Jérôme Thevenot, Esa Rahtu, Petri Lehenkari, Simo Saarakkala
Abstract	Knee osteoarthritis (OA) is the most common musculoskeletal disorder. OA diagnosis is currently conducted by assessing symptoms and evaluating plain radiographs, but this process suffers from subjectivity. In this study, we present a new transparent computer-aided diagnosis method based on a Deep Siamese Convolutional Neural Network to automatically score knee OA severity according to the Kellgren-Lawrence grading scale. We trained our method using data solely from the Multicenter Osteoarthritis Study and validated it on 3,000 randomly selected subjects (5,960 knees) from the Osteoarthritis Initiative dataset. Our method yielded a quadratic Kappa coefficient of 0.83 and an average multiclass accuracy of 66.71% compared to the annotations given by a committee of clinical experts. We also report a radiological OA diagnosis area under the ROC curve of 0.93. We also present attention maps (given as a class probability distribution) highlighting the radiological features affecting the network decision. This information makes the decision process transparent for the practitioner, which helps build trust in automatic methods. We believe that our model is useful for clinical decision making and for OA research; therefore, we openly release our training code and the dataset created in this study.
Tasks	Decision Making
Published	2017-10-29
URL	http://arxiv.org/abs/1710.10589v1
PDF	http://arxiv.org/pdf/1710.10589v1.pdf
PWC	https://paperswithcode.com/paper/automatic-knee-osteoarthritis-diagnosis-from
Repo	https://github.com/lext/DeepKnee
Framework	pytorch
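The quadratic weighted kappa reported above can be computed as follows. A minimal sketch, assuming integer Kellgren-Lawrence grades 0 to 4 and the standard quadratic disagreement weights (i - j)^2 / (K - 1)^2; `quadratic_weighted_kappa` is a hypothetical helper, not taken from the DeepKnee repository.

```python
from typing import Sequence


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int = 5) -> float:
    """Cohen's kappa with quadratic disagreement weights, for grades 0..num_classes-1.

    Undefined (division by zero) if both raters use a single identical grade for every item.
    """
    n = len(y_true)
    observed = [[0.0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]
    num = den = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = true_hist[i] * pred_hist[j] / n  # chance agreement from the marginals
            num += weight * observed[i][j]
            den += weight * expected
    return 1.0 - num / den
```

Kappa is 1 for perfect agreement and 0 for chance-level agreement. Unlike plain accuracy, the quadratic weights penalize predicting grade 4 for a grade-0 knee far more than an off-by-one grade, which suits an ordinal scale like Kellgren-Lawrence.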