Paper Group AWR 95
Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification. Prosodic Features from Large Corpora of Child-Directed Speech as Predictors of the Age of Acquisition of Words. Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation. Sequence-to-Sequence Models …
Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification
Title | Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification |
Authors | Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, Xiaogang Wang |
Abstract | Multi-label image classification is a fundamental but challenging task in computer vision. Great progress has been achieved in recent years by exploiting semantic relations between labels. However, conventional approaches are unable to model the underlying spatial relations between labels in multi-label images, because spatial annotations of the labels are generally not provided. In this paper, we propose a unified deep neural network that exploits both semantic and spatial relations between labels with only image-level supervision. Given a multi-label image, our proposed Spatial Regularization Network (SRN) generates attention maps for all labels and captures the underlying relations between them via learnable convolutions. By aggregating the regularized classification results with the original results of a ResNet-101 network, the classification performance can be consistently improved. The whole deep neural network is trained end-to-end with only image-level annotations and thus requires no additional annotation effort. Extensive evaluations on 3 public datasets with different types of labels show that our approach significantly outperforms state-of-the-art methods and has strong generalization capability. Analysis of the learned SRN model demonstrates that it can effectively capture both semantic and spatial relations of labels for improving classification performance. |
Tasks | Image Classification |
Published | 2017-02-20 |
URL | http://arxiv.org/abs/1702.05891v2, http://arxiv.org/pdf/1702.05891v2.pdf |
PWC | https://paperswithcode.com/paper/learning-spatial-regularization-with-image |
Repo | https://github.com/Enjia/Spatial-Regularization-Network-in-Tensorflow |
Framework | tf |
Prosodic Features from Large Corpora of Child-Directed Speech as Predictors of the Age of Acquisition of Words
Title | Prosodic Features from Large Corpora of Child-Directed Speech as Predictors of the Age of Acquisition of Words |
Authors | Lea Frermann, Michael C. Frank |
Abstract | The impressive ability of children to acquire language is a widely studied phenomenon, and the factors influencing the pace and patterns of word learning remain a subject of active research. Although many models predicting the age of acquisition of words have been proposed, little emphasis has been placed on the raw input children receive. In this work we present a comparatively large-scale multi-modal corpus of prosody-text aligned child-directed speech. Our corpus contains automatically extracted word-level prosodic features, and we investigate the utility of this information as a predictor of age of acquisition. We show that prosody features boost predictive power in a regularized regression, and demonstrate their utility in the context of multi-modal factorized language models trained and tested on child-directed speech. |
Tasks | |
Published | 2017-09-27 |
URL | http://arxiv.org/abs/1709.09443v1, http://arxiv.org/pdf/1709.09443v1.pdf |
PWC | https://paperswithcode.com/paper/prosodic-features-from-large-corpora-of-child |
Repo | https://github.com/ColiLea/prosodyAOA |
Framework | torch |
Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation
Title | Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation |
Authors | Qihang Yu, Lingxi Xie, Yan Wang, Yuyin Zhou, Elliot K. Fishman, Alan L. Yuille |
Abstract | We aim at segmenting small organs (e.g., the pancreas) from abdominal CT scans. As the target often occupies a relatively small region in the input image, deep neural networks can be easily confused by the complex and variable background. To alleviate this, researchers proposed a coarse-to-fine approach, which used the prediction from the first (coarse) stage to indicate a smaller input region for the second (fine) stage. Despite its effectiveness, this algorithm dealt with the two stages individually, without optimizing a global energy function, which limited its ability to incorporate multi-stage visual cues. Missing contextual information led to unsatisfying convergence over iterations, and the fine stage sometimes produced even lower segmentation accuracy than the coarse stage. This paper presents a Recurrent Saliency Transformation Network. The key innovation is a saliency transformation module, which repeatedly converts the segmentation probability map from the previous iteration into spatial weights and applies these weights to the current iteration. This brings two-fold benefits. In training, it allows joint optimization over the deep networks dealing with different input scales. In testing, it propagates multi-stage visual information throughout iterations to improve segmentation accuracy. Experiments on the NIH pancreas segmentation dataset demonstrate state-of-the-art accuracy, outperforming the previous best by an average of over 2%. Much higher accuracies are also reported on several small organs in a larger dataset collected by ourselves. In addition, our approach enjoys better convergence properties, making it more efficient and reliable in practice. |
Tasks | Pancreas Segmentation |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04518v4, http://arxiv.org/pdf/1709.04518v4.pdf |
PWC | https://paperswithcode.com/paper/recurrent-saliency-transformation-network |
Repo | https://github.com/twni2016/OrganSegRSTN_PyTorch |
Framework | pytorch |
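The saliency transformation idea in the abstract above (turn the previous iteration's probability map into spatial weights and apply them to the current input) can be illustrated with a minimal sketch. It assumes the simplest possible transformation, using the probabilities directly as per-pixel weights; the paper's actual module may be learned, and the function name `apply_saliency_weights` is hypothetical.

```python
def apply_saliency_weights(image, prob_map):
    """Weight each pixel of `image` by the previous iteration's probability.

    Assumption: the probability map is used as-is (an identity
    transformation); a learned transformation would replace this step.
    """
    return [
        [pixel * weight for pixel, weight in zip(img_row, prob_row)]
        for img_row, prob_row in zip(image, prob_map)
    ]


# Toy 2x2 "image" and a probability map from a previous (coarse) pass.
image = [[2.0, 4.0], [6.0, 8.0]]
prob_map = [[0.0, 0.5], [1.0, 0.25]]
weighted = apply_saliency_weights(image, prob_map)
# weighted == [[0.0, 2.0], [6.0, 2.0]]
```

Pixels the previous pass considered background are suppressed, so the next pass focuses on the likely organ region.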
Sequence-to-Sequence Models Can Directly Translate Foreign Speech
Title | Sequence-to-Sequence Models Can Directly Translate Foreign Speech |
Authors | Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen |
Abstract | We present a recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another. The model does not explicitly transcribe the speech into text in the source language, nor does it require supervision from the ground truth source language transcription during training. We apply a slightly modified sequence-to-sequence with attention architecture that has previously been used for speech recognition and show that it can be repurposed for this more complex task, illustrating the power of attention-based models. A single model trained end-to-end obtains state-of-the-art performance on the Fisher Callhome Spanish-English speech translation task, outperforming a cascade of independently trained sequence-to-sequence speech recognition and machine translation models by 1.8 BLEU points on the Fisher test set. In addition, we find that making use of the training data in both languages by multi-task training sequence-to-sequence speech translation and recognition models with a shared encoder network can improve performance by a further 1.4 BLEU points. |
Tasks | Machine Translation, Sequence-To-Sequence Speech Recognition, Speech Recognition |
Published | 2017-03-24 |
URL | http://arxiv.org/abs/1703.08581v2, http://arxiv.org/pdf/1703.08581v2.pdf |
PWC | https://paperswithcode.com/paper/sequence-to-sequence-models-can-directly |
Repo | https://github.com/colaprograms/speechify |
Framework | tf |
End-to-end Video-level Representation Learning for Action Recognition
Title | End-to-end Video-level Representation Learning for Action Recognition |
Authors | Jiagang Zhu, Wei Zou, Zheng Zhu |
Abstract | From frame/clip-level feature learning to video-level representation building, deep learning methods in action recognition have developed rapidly in recent years. However, current methods suffer from the confusion caused by partial-observation training, lack end-to-end learning, or are restricted to modeling a single temporal scale, among other issues. In this paper, we build upon two-stream ConvNets and propose Deep networks with Temporal Pyramid Pooling (DTPP), an end-to-end video-level representation learning approach, to address these problems. Specifically, RGB images and optical flow stacks are first sparsely sampled across the whole video. Then a temporal pyramid pooling layer is used to aggregate the frame-level features, which consist of spatial and temporal cues. Lastly, the trained model has a compact video-level representation with multiple temporal scales, which is both global and sequence-aware. Experimental results show that DTPP achieves state-of-the-art performance on two challenging video action datasets, UCF101 and HMDB51, with either ImageNet or Kinetics pre-training. |
Tasks | Optical Flow Estimation, Representation Learning, Temporal Action Localization |
Published | 2017-11-11 |
URL | http://arxiv.org/abs/1711.04161v7, http://arxiv.org/pdf/1711.04161v7.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-video-level-representation |
Repo | https://github.com/zhujiagang/DTPP |
Framework | none |
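To make the temporal pyramid pooling step in the abstract above concrete, here is a minimal pure-Python sketch. It assumes max pooling and pyramid levels of 1, 2 and 4 temporal bins; the abstract does not state the pooling operator or the levels, and the function name `temporal_pyramid_pool` is hypothetical.

```python
def temporal_pyramid_pool(frames, levels=(1, 2, 4)):
    """Aggregate per-frame feature vectors into one fixed-length vector.

    frames: list of equal-length feature vectors, one per sampled frame.
    levels: number of temporal bins per pyramid level (assumed values).
    Output length is dim * sum(levels), regardless of the frame count.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    pooled = []
    for bins in levels:
        for b in range(bins):
            # Split the frame axis into `bins` contiguous segments.
            start = (b * num_frames) // bins
            end = max(start + 1, ((b + 1) * num_frames) // bins)
            segment = frames[start:end]
            # Max-pool each feature dimension within the segment.
            pooled.extend(max(f[d] for f in segment) for d in range(dim))
    return pooled


frames = [[1, 0], [2, 5], [3, 1], [0, 4]]
video_vector = temporal_pyramid_pool(frames, levels=(1, 2))
# video_vector == [3, 5, 2, 5, 3, 4]
```

The first level summarizes the whole video (global), while finer levels keep coarse ordering information (sequence-aware), which matches the two properties the abstract attributes to the representation.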
Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking
Title | Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking |
Authors | Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, Mounia Lalmas |
Abstract | Machine-learned models are often described as “black boxes”. In many real-world applications, however, models may have to sacrifice predictive power in favour of human-interpretability. When this is the case, feature engineering becomes a crucial task, which requires significant and time-consuming human effort. Whilst some features are inherently static, representing properties that cannot be influenced (e.g., the age of an individual), others capture characteristics that could be adjusted (e.g., the daily amount of carbohydrates taken). Nonetheless, once a model is learned from the data, each prediction it makes on new instances is irreversible - assuming every instance to be a static point located in the chosen feature space. There are many circumstances, however, where it is important to understand (i) why a model outputs a certain prediction on a given instance, (ii) which adjustable features of that instance should be modified, and finally (iii) how to alter such a prediction when the mutated instance is input back to the model. In this paper, we present a technique that exploits the internals of a tree-based ensemble classifier to offer recommendations for transforming true negative instances into positively predicted ones. We demonstrate the validity of our approach using an online advertising application. First, we design a Random Forest classifier that effectively separates two types of ads: low (negative) and high (positive) quality ads (instances). Then, we introduce an algorithm that provides recommendations that aim to transform a low quality ad (negative instance) into a high quality one (positive instance). Finally, we evaluate our approach on a subset of the active inventory of a large ad network, Yahoo Gemini. |
Tasks | Feature Engineering |
Published | 2017-06-20 |
URL | http://arxiv.org/abs/1706.06691v1, http://arxiv.org/pdf/1706.06691v1.pdf |
PWC | https://paperswithcode.com/paper/interpretable-predictions-of-tree-based |
Repo | https://github.com/katokohaku/featureTweakR |
Framework | none |
R$^3$: Reinforced Reader-Ranker for Open-Domain Question Answering
Title | R$^3$: Reinforced Reader-Ranker for Open-Domain Question Answering |
Authors | Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bowen Zhou, Jing Jiang |
Abstract | In recent years researchers have achieved considerable success applying neural network methods to question answering (QA). These approaches have achieved state-of-the-art results in simplified closed-domain settings such as the SQuAD (Rajpurkar et al., 2016) dataset, which provides a pre-selected passage, from which the answer to a given question may be extracted. More recently, researchers have begun to tackle open-domain QA, in which the model is given a question and access to a large corpus (e.g., Wikipedia) instead of a pre-selected passage (Chen et al., 2017a). This setting is more complex as it requires large-scale search for relevant passages by an information retrieval component, combined with a reading comprehension model that “reads” the passages to generate an answer to the question. Performance in this setting lags considerably behind closed-domain performance. In this paper, we present a novel open-domain QA system called Reinforced Ranker-Reader $(R^3)$, based on two algorithmic innovations. First, we propose a new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages in terms of likelihood of generating the ground-truth answer to a given question. Second, we propose a novel method that jointly trains the Ranker along with an answer-generation Reader model, based on reinforcement learning. We report extensive experimental results showing that our method significantly improves on the state of the art for multiple open-domain QA datasets. |
Tasks | Information Retrieval, Open-Domain Question Answering, Question Answering, Reading Comprehension |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1709.00023v2, http://arxiv.org/pdf/1709.00023v2.pdf |
PWC | https://paperswithcode.com/paper/r3-reinforced-reader-ranker-for-open-domain |
Repo | https://github.com/shuohangwang/mprc |
Framework | torch |
Multiplicative Normalizing Flows for Variational Bayesian Neural Networks
Title | Multiplicative Normalizing Flows for Variational Bayesian Neural Networks |
Authors | Christos Louizos, Max Welling |
Abstract | We reinterpret multiplicative noise in neural networks as auxiliary random variables that augment the approximate posterior in a variational setting for Bayesian neural networks. We show that through this interpretation it is both efficient and straightforward to improve the approximation by employing normalizing flows while still allowing for local reparametrizations and a tractable lower bound. In experiments we show that with this new approximation we can significantly improve upon classical mean field for Bayesian neural networks on both predictive accuracy and predictive uncertainty. |
Tasks | |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.01961v2, http://arxiv.org/pdf/1703.01961v2.pdf |
PWC | https://paperswithcode.com/paper/multiplicative-normalizing-flows-for |
Repo | https://github.com/AMLab-Amsterdam/MNF_VBNN |
Framework | tf |
Recurrent Neural Networks for Semantic Instance Segmentation
Title | Recurrent Neural Networks for Semantic Instance Segmentation |
Authors | Amaia Salvador, Miriam Bellver, Victor Campos, Manel Baradad, Ferran Marques, Jordi Torres, Xavier Giro-i-Nieto |
Abstract | We present a recurrent model for semantic instance segmentation that sequentially generates binary masks and their associated class probabilities for every object in an image. Our proposed system is trainable end-to-end from an input image to a sequence of labeled masks and, compared to methods relying on object proposals, does not require post-processing steps on its output. We study the suitability of our recurrent model on three different instance segmentation benchmarks, namely Pascal VOC 2012, CVPPP Plant Leaf Segmentation and Cityscapes. Further, we analyze the object sorting patterns generated by our model and observe that it learns to follow a consistent pattern, which correlates with the activations learned in the encoder part of our network. Source code and models are available at https://imatge-upc.github.io/rsis/ |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2017-12-02 |
URL | http://arxiv.org/abs/1712.00617v4, http://arxiv.org/pdf/1712.00617v4.pdf |
PWC | https://paperswithcode.com/paper/recurrent-neural-networks-for-semantic |
Repo | https://github.com/imatge-upc/rsis |
Framework | pytorch |
Auto-Encoding Sequential Monte Carlo
Title | Auto-Encoding Sequential Monte Carlo |
Authors | Tuan Anh Le, Maximilian Igl, Tom Rainforth, Tom Jin, Frank Wood |
Abstract | We build on auto-encoding sequential Monte Carlo (AESMC): a method for model and proposal learning based on maximizing the lower bound to the log marginal likelihood in a broad family of structured probabilistic models. Our approach relies on the efficiency of sequential Monte Carlo (SMC) for performing inference in structured probabilistic models and the flexibility of deep neural networks to model complex conditional probability distributions. We develop additional theoretical insights and introduce a new training procedure which improves both model and proposal learning. We demonstrate that our approach provides a fast, easy-to-implement and scalable means for simultaneous model learning and proposal adaptation in deep generative models. |
Tasks | |
Published | 2017-05-29 |
URL | http://arxiv.org/abs/1705.10306v2, http://arxiv.org/pdf/1705.10306v2.pdf |
PWC | https://paperswithcode.com/paper/auto-encoding-sequential-monte-carlo |
Repo | https://github.com/amoretti86/PSVO |
Framework | tf |
Deep Mean-Shift Priors for Image Restoration
Title | Deep Mean-Shift Priors for Image Restoration |
Authors | Siavash Arjomand Bigdeli, Meiguang Jin, Paolo Favaro, Matthias Zwicker |
Abstract | In this paper we introduce a natural image prior that directly represents a Gaussian-smoothed version of the natural image distribution. We include our prior in a formulation of image restoration as a Bayes estimator that also allows us to solve noise-blind image restoration problems. We show that the gradient of our prior corresponds to the mean-shift vector on the natural image distribution. In addition, we learn the mean-shift vector field using denoising autoencoders, and use it in a gradient descent approach to perform Bayes risk minimization. We demonstrate competitive results for noise-blind deblurring, super-resolution, and demosaicing. |
Tasks | Deblurring, Demosaicking, Denoising, Image Restoration, Image Super-Resolution, Super-Resolution |
Published | 2017-09-12 |
URL | http://arxiv.org/abs/1709.03749v2, http://arxiv.org/pdf/1709.03749v2.pdf |
PWC | https://paperswithcode.com/paper/deep-mean-shift-priors-for-image-restoration |
Repo | https://github.com/siavashbigdeli/DMSP |
Framework | tf |
Modeling Relational Data with Graph Convolutional Networks
Title | Modeling Relational Data with Graph Convolutional Networks |
Authors | Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, Max Welling |
Abstract | Knowledge graphs enable a wide variety of applications, including question answering and information retrieval. Despite the great effort invested in their creation and maintenance, even the largest (e.g., Yago, DBPedia or Wikidata) remain incomplete. We introduce Relational Graph Convolutional Networks (R-GCNs) and apply them to two standard knowledge base completion tasks: Link prediction (recovery of missing facts, i.e. subject-predicate-object triples) and entity classification (recovery of missing entity attributes). R-GCNs are related to a recent class of neural networks operating on graphs, and are developed specifically to deal with the highly multi-relational data characteristic of realistic knowledge bases. We demonstrate the effectiveness of R-GCNs as a stand-alone model for entity classification. We further show that factorization models for link prediction such as DistMult can be significantly improved by enriching them with an encoder model to accumulate evidence over multiple inference steps in the relational graph, demonstrating a large improvement of 29.8% on FB15k-237 over a decoder-only baseline. |
Tasks | Graph Classification, Information Retrieval, Knowledge Base Completion, Knowledge Graphs, Link Prediction, Node Classification |
Published | 2017-03-17 |
URL | http://arxiv.org/abs/1703.06103v4, http://arxiv.org/pdf/1703.06103v4.pdf |
PWC | https://paperswithcode.com/paper/modeling-relational-data-with-graph |
Repo | https://github.com/tkipf/gae |
Framework | tf |
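The abstract above mentions DistMult as the factorization model used for link prediction. As a rough illustration of that decoder only (not of the R-GCN encoder), the sketch below scores a subject-predicate-object triple as the sum over dimensions of the element-wise product of the three embeddings. The embedding values and the function name `distmult_score` are made up for the example; in the paper's setup the entity embeddings would come from the encoder.

```python
def distmult_score(subject_emb, relation_emb, object_emb):
    """DistMult triple score: sum_k s[k] * r[k] * o[k].

    The relation acts as a diagonal matrix, so the score is symmetric
    in subject and object.
    """
    return sum(s * r * o for s, r, o in zip(subject_emb, relation_emb, object_emb))


# Hypothetical 2-dimensional embeddings.
score = distmult_score([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
# score == 63.0  (1*3*5 + 2*4*6)
```

A higher score means the model considers the triple more plausible; candidate objects for a query can be ranked by this value.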
TensorQuant - A Simulation Toolbox for Deep Neural Network Quantization
Title | TensorQuant - A Simulation Toolbox for Deep Neural Network Quantization |
Authors | Dominik Marek Loroch, Norbert Wehn, Franz-Josef Pfreundt, Janis Keuper |
Abstract | Recent research implies that training and inference of deep neural networks (DNNs) can be computed with low-precision numerical representations of the training/test data, weights and gradients without a general loss in accuracy. The benefit of such compact representations is twofold: they allow a significant reduction of the communication bottleneck in distributed DNN training and faster neural network implementations on hardware accelerators like FPGAs. Several quantization methods have been proposed to map the original 32-bit floating point problem to low-bit representations. While most related publications validate the proposed approach on a single DNN topology, it appears evident that the optimal choice of the quantization method and number of coding bits is topology dependent. So far, there is no general theory available that would allow users to derive the optimal quantization during the design of a DNN topology. In this paper, we present a quantization toolbox for the TensorFlow framework. TensorQuant allows a transparent quantization simulation of existing DNN topologies during training and inference. TensorQuant supports generic quantization methods and allows experimental evaluation of the impact of the quantization on single layers as well as on the full topology. In a first series of experiments with TensorQuant, we show an analysis of fixed-point quantizations of popular CNN topologies. |
Tasks | Quantization |
Published | 2017-10-13 |
URL | http://arxiv.org/abs/1710.05758v1, http://arxiv.org/pdf/1710.05758v1.pdf |
PWC | https://paperswithcode.com/paper/tensorquant-a-simulation-toolbox-for-deep |
Repo | https://github.com/Spiritator/DNN-fault-simulator |
Framework | tf |
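As a rough illustration of what simulating fixed-point quantization means, the sketch below rounds a floating-point value onto a signed fixed-point grid and saturates at the representable range. This is a generic example, not TensorQuant's API: the function name `quantize_fixed_point`, the default bit widths and the rounding mode are all assumptions.

```python
def quantize_fixed_point(x, total_bits=8, frac_bits=4):
    """Simulate signed fixed-point quantization of a float.

    total_bits: word length including the sign bit (assumed default).
    frac_bits:  number of fractional bits (assumed default).
    The result is still a float, but restricted to the fixed-point grid.
    """
    scale = 1 << frac_bits                   # 2 ** frac_bits steps per unit
    q_min = -(1 << (total_bits - 1))         # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1      # most positive integer code
    q = round(x * scale)                     # Python rounds ties to even
    q = max(q_min, min(q_max, q))            # saturate instead of wrapping
    return q / scale


print(quantize_fixed_point(0.3))     # 0.3125
print(quantize_fixed_point(100.0))   # 7.9375 (saturated)
print(quantize_fixed_point(-100.0))  # -8.0 (saturated)
```

Applying such a function to weights or activations while keeping the computation in floating point is one common way to study accuracy loss at different bit widths without dedicated hardware.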
NeuroNER: an easy-to-use program for named-entity recognition based on neural networks
Title | NeuroNER: an easy-to-use program for named-entity recognition based on neural networks |
Authors | Franck Dernoncourt, Ji Young Lee, Peter Szolovits |
Abstract | Named-entity recognition (NER) aims at identifying entities of interest in a text. Artificial neural networks (ANNs) have recently been shown to outperform existing NER systems. However, ANNs remain challenging to use for non-expert users. In this paper, we present NeuroNER, an easy-to-use named-entity recognition tool based on ANNs. Users can annotate entities using a graphical web-based user interface (BRAT): the annotations are then used to train an ANN, which in turn predicts entities’ locations and categories in new texts. NeuroNER makes this annotation-training-prediction flow smooth and accessible to anyone. |
Tasks | Named Entity Recognition |
Published | 2017-05-16 |
URL | http://arxiv.org/abs/1705.05487v1, http://arxiv.org/pdf/1705.05487v1.pdf |
PWC | https://paperswithcode.com/paper/neuroner-an-easy-to-use-program-for-named |
Repo | https://github.com/Franck-Dernoncourt/NeuroNER |
Framework | tf |
Automatic Knee Osteoarthritis Diagnosis from Plain Radiographs: A Deep Learning-Based Approach
Title | Automatic Knee Osteoarthritis Diagnosis from Plain Radiographs: A Deep Learning-Based Approach |
Authors | Aleksei Tiulpin, Jérôme Thevenot, Esa Rahtu, Petri Lehenkari, Simo Saarakkala |
Abstract | Knee osteoarthritis (OA) is the most common musculoskeletal disorder. OA diagnosis is currently conducted by assessing symptoms and evaluating plain radiographs, but this process suffers from subjectivity. In this study, we present a new transparent computer-aided diagnosis method based on the Deep Siamese Convolutional Neural Network to automatically score knee OA severity according to the Kellgren-Lawrence grading scale. We trained our method using data solely from the Multicenter Osteoarthritis Study and validated it on 3,000 randomly selected subjects (5,960 knees) from the Osteoarthritis Initiative dataset. Our method yielded a quadratic Kappa coefficient of 0.83 and an average multiclass accuracy of 66.71% compared to the annotations given by a committee of clinical experts. Here, we also report a radiological OA diagnosis area under the ROC curve of 0.93. We also present attention maps – given as a class probability distribution – highlighting the radiological features affecting the network decision. This information makes the decision process transparent for the practitioner, which builds better trust toward automatic methods. We believe that our model is useful for clinical decision making and for OA research; therefore, we openly release our training codes and the data set created in this study. |
Tasks | Decision Making |
Published | 2017-10-29 |
URL | http://arxiv.org/abs/1710.10589v1, http://arxiv.org/pdf/1710.10589v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-knee-osteoarthritis-diagnosis-from |
Repo | https://github.com/lext/DeepKnee |
Framework | pytorch |
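The quadratic Kappa coefficient reported in the abstract above measures agreement between two sets of ordinal grades, penalizing disagreements by the squared distance between grades. The sketch below shows the standard quadratic weighted kappa computation, assuming integer grades from 0 to `num_classes - 1` (Kellgren-Lawrence grades run from 0 to 4); the function name and the toy data are hypothetical and unrelated to the paper's results.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """Quadratic weighted kappa between two equally long lists of grades."""
    n = len(rater_a)
    # Confusion matrix of observed grade pairs.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_classes))
              for j in range(num_classes)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * hist_a[i] * hist_b[j] / n
    if den == 0.0:
        # Both raters used a single identical grade; treat as full agreement.
        return 1.0
    return 1.0 - num / den


perfect = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5)
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], 2)
# perfect == 1.0, chance == 0.0
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, which gives context for the 0.83 reported against the expert committee.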