May 7, 2019

3295 words 16 mins read

Paper Group AWR 38

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks

Title Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks
Authors Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, Yu Qiao
Abstract Face detection and alignment in unconstrained environments are challenging due to various poses, illuminations and occlusions. Recent studies show that deep learning approaches can achieve impressive performance on these two tasks. In this paper, we propose a deep cascaded multi-task framework which exploits the inherent correlation between them to boost their performance. In particular, our framework adopts a cascaded structure with three stages of carefully designed deep convolutional networks that predict face and landmark location in a coarse-to-fine manner. In addition, in the learning process, we propose a new online hard sample mining strategy that improves performance automatically, without manual sample selection. Our method achieves superior accuracy over state-of-the-art techniques on the challenging FDDB and WIDER FACE benchmarks for face detection, and the AFLW benchmark for face alignment, while keeping real-time performance.
Tasks Face Alignment, Face Detection
Published 2016-04-11
URL http://arxiv.org/abs/1604.02878v1
PDF http://arxiv.org/pdf/1604.02878v1.pdf
PWC https://paperswithcode.com/paper/joint-face-detection-and-alignment-using
Repo https://github.com/wanjinchang/MTCNN_USE_TF_E2E
Framework tf
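
The online hard sample mining step is simple enough to show concretely. Below is a minimal PyTorch sketch, assuming a classification head with per-sample cross-entropy; the 70% keep ratio follows the paper, while the function name and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def ohem_loss(logits, targets, keep_ratio=0.7):
    """Cross-entropy over only the hardest keep_ratio fraction of the mini-batch."""
    losses = F.cross_entropy(logits, targets, reduction="none")  # per-sample loss
    n_keep = max(1, int(keep_ratio * losses.numel()))
    hard, _ = torch.topk(losses, n_keep)                         # largest losses only
    return hard.mean()

# usage with a face / non-face classification head
logits = torch.randn(32, 2, requires_grad=True)
targets = torch.randint(0, 2, (32,))
ohem_loss(logits, targets).backward()
```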

Perceptual Losses for Real-Time Style Transfer and Super-Resolution

Title Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Authors Justin Johnson, Alexandre Alahi, Li Fei-Fei
Abstract We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
Tasks Image Super-Resolution, Nuclear Segmentation, Style Transfer, Super-Resolution
Published 2016-03-27
URL http://arxiv.org/abs/1603.08155v1
PDF http://arxiv.org/pdf/1603.08155v1.pdf
PWC https://paperswithcode.com/paper/perceptual-losses-for-real-time-style
Repo https://github.com/DmitryUlyanov/texture_nets
Framework torch
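
The core idea is to measure loss in the feature space of a fixed, pretrained network rather than in pixel space. A hedged PyTorch sketch, assuming torchvision's VGG-16 and the relu2_2 layer (one of the layers the paper uses; using it alone is a simplification, and the helper name is mine):

```python
import torch.nn.functional as F
from torchvision.models import vgg16

# fixed loss network; 'pretrained=True' may warn on newer torchvision versions
features = vgg16(pretrained=True).features[:9].eval()   # layers through relu2_2
for p in features.parameters():
    p.requires_grad = False                              # loss network stays frozen

def perceptual_loss(output_img, target_img):
    """MSE between VGG feature maps of the two images, not between raw pixels."""
    return F.mse_loss(features(output_img), features(target_img))
```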

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

Title SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
Authors Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, Tat-Seng Chua
Abstract Visual attention has been successfully applied in structural prediction tasks such as visual captioning and question answering. Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image. However, we argue that such spatial attention does not necessarily conform to the attention mechanism, that is, a dynamic feature extractor that combines contextual fixations over time, as CNN features are naturally spatial, channel-wise and multi-layer. In this paper, we introduce a novel convolutional neural network dubbed SCA-CNN that incorporates Spatial and Channel-wise Attentions in a CNN. In the task of image captioning, SCA-CNN dynamically modulates the sentence generation context in multi-layer feature maps, encoding where (i.e., attentive spatial locations at multiple layers) and what (i.e., attentive channels) the visual attention is. We evaluate the proposed SCA-CNN architecture on three benchmark image captioning datasets: Flickr8K, Flickr30K, and MSCOCO. It is consistently observed that SCA-CNN significantly outperforms state-of-the-art visual attention-based image captioning methods.
Tasks Image Captioning
Published 2016-11-17
URL http://arxiv.org/abs/1611.05594v2
PDF http://arxiv.org/pdf/1611.05594v2.pdf
PWC https://paperswithcode.com/paper/sca-cnn-spatial-and-channel-wise-attention-in
Repo https://github.com/zjuchenlong/sca-cnn
Framework none
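
A minimal PyTorch sketch of the channel-then-spatial attention idea, with illustrative (not the paper's) gating networks and dimensions: channels are re-weighted first, from their pooled activations and the decoder state, then spatial locations are re-weighted from the modulated features:

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, hidden_dim):
        super().__init__()
        self.channel_gate = nn.Linear(hidden_dim + channels, channels)
        self.spatial_gate = nn.Linear(hidden_dim + channels, 1)

    def forward(self, V, h):                 # V: (B, C, H, W), h: (B, D)
        B, C, H, W = V.shape
        # channel attention: score each channel from its pooled activation and h
        pooled = V.mean(dim=(2, 3))                                   # (B, C)
        beta = torch.softmax(self.channel_gate(torch.cat([h, pooled], 1)), dim=1)
        V = V * beta.view(B, C, 1, 1)                                 # re-weight channels
        # spatial attention: score each location from its feature vector and h
        loc = V.flatten(2).transpose(1, 2)                            # (B, H*W, C)
        h_exp = h.unsqueeze(1).expand(-1, H * W, -1)                  # (B, H*W, D)
        scores = self.spatial_gate(torch.cat([h_exp, loc], 2)).squeeze(-1)
        alpha = torch.softmax(scores, dim=1)                          # (B, H*W)
        return V * alpha.view(B, 1, H, W)                             # re-weight locations

att = ChannelSpatialAttention(channels=512, hidden_dim=256)
out = att(torch.randn(2, 512, 7, 7), torch.randn(2, 256))
```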

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

Title Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning
Authors Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi
Abstract Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new target goals, and (2) data inefficiency, i.e., the model requires several (and often costly) episodes of trial and error to converge, which makes it impractical to apply to real-world scenarios. In this paper, we address these two issues and apply our model to the task of target-driven visual navigation. To address the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows it to generalize better. To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. Our framework enables agents to take actions and interact with objects. Hence, we can collect a huge number of training samples efficiently. We show that our proposed method (1) converges faster than the state-of-the-art deep reinforcement learning methods, (2) generalizes across targets and across scenes, (3) generalizes to a real robot scenario with a small amount of fine-tuning (although the model is trained in simulation), (4) is end-to-end trainable and does not need feature engineering, feature matching between frames or 3D reconstruction of the environment. The supplementary video can be accessed at the following link: https://youtu.be/SmBxMDiOrvs.
Tasks 3D Reconstruction, Feature Engineering, Visual Navigation
Published 2016-09-16
URL http://arxiv.org/abs/1609.05143v1
PDF http://arxiv.org/pdf/1609.05143v1.pdf
PWC https://paperswithcode.com/paper/target-driven-visual-navigation-in-indoor
Repo https://github.com/shamanez/Target-Driven-Visual-Navigation-with-Distributed-PPO
Framework tf
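
The key architectural point, a policy that takes the goal as an input, fits in a few lines. A hedged PyTorch sketch with illustrative layer sizes; the paper's siamese ResNet feature pipeline and scene-specific layers are omitted here:

```python
import torch
import torch.nn as nn

class GoalConditionedActorCritic(nn.Module):
    def __init__(self, feat_dim=2048, embed_dim=512, n_actions=4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, embed_dim)       # shared (siamese) embedding
        self.fuse = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim), nn.ReLU())
        self.policy = nn.Linear(embed_dim, n_actions)     # actor head
        self.value = nn.Linear(embed_dim, 1)              # critic head

    def forward(self, obs_feat, goal_feat):
        # a new goal is just a new input, not a new network
        joint = self.fuse(torch.cat([self.embed(obs_feat),
                                     self.embed(goal_feat)], dim=1))
        return torch.log_softmax(self.policy(joint), dim=1), self.value(joint)
```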

Fast L1-NMF for Multiple Parametric Model Estimation

Title Fast L1-NMF for Multiple Parametric Model Estimation
Authors Mariano Tepper, Guillermo Sapiro
Abstract In this work we introduce a comprehensive algorithmic pipeline for multiple parametric model estimation. The proposed approach analyzes the information produced by a random sampling algorithm (e.g., RANSAC) from a machine learning/optimization perspective, using a parameterless biclustering algorithm based on L1 nonnegative matrix factorization (L1-NMF). The proposed framework exploits consistent patterns that naturally arise during the RANSAC execution, while explicitly avoiding spurious inconsistencies. Contrary to the main trends in the literature, the proposed technique does not impose non-intersecting parametric models. A new accelerated algorithm to compute L1-NMFs allows us to handle medium-sized problems faster, while also extending the usability of the algorithm to much larger datasets. This accelerated algorithm has applications in any other context where an L1-NMF is needed, beyond the biclustering approach to parameter estimation addressed here. We accompany the algorithmic presentation with theoretical foundations and numerous and diverse examples.
Tasks
Published 2016-10-18
URL http://arxiv.org/abs/1610.05712v2
PDF http://arxiv.org/pdf/1610.05712v2.pdf
PWC https://paperswithcode.com/paper/fast-l1-nmf-for-multiple-parametric-model
Repo https://github.com/marianotepper/arse
Framework none
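
The pipeline's input is easy to illustrate: a preference matrix recording which points are inliers to which random hypotheses. A toy NumPy sketch for 2-D line fitting, with illustrative sizes and threshold; the L1-NMF biclustering step that consumes this matrix is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(-1, 1, size=(200, 2))      # toy 2-D data
n_hypotheses, threshold = 500, 0.05

pref = np.zeros((len(points), n_hypotheses), dtype=np.uint8)
for j in range(n_hypotheses):
    p, q = points[rng.choice(len(points), 2, replace=False)]
    v = q - p                                   # line through two random points
    d = np.abs(v[0] * (points[:, 1] - p[1])
               - v[1] * (points[:, 0] - p[0])) / np.linalg.norm(v)
    pref[:, j] = d < threshold                  # 1 if point i is an inlier to model j

# pref is the nonnegative matrix handed to the biclustering (L1-NMF) stage
```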

Bayesian latent structure discovery from multi-neuron recordings

Title Bayesian latent structure discovery from multi-neuron recordings
Authors Scott W. Linderman, Ryan P. Adams, Jonathan W. Pillow
Abstract Neural circuits contain heterogeneous groups of neurons that differ in type, location, connectivity, and basic response properties. However, traditional methods for dimensionality reduction and clustering are ill-suited to recovering the structure underlying the organization of neural circuits. In particular, they do not take advantage of the rich temporal dependencies in multi-neuron recordings and fail to account for the noise in neural spike trains. Here we describe new tools for inferring latent structure from simultaneously recorded spike train data using a hierarchical extension of a multi-neuron point process model commonly known as the generalized linear model (GLM). Our approach combines the GLM with flexible graph-theoretic priors governing the relationship between latent features and neural connectivity patterns. Fully Bayesian inference via Pólya-gamma augmentation of the resulting model allows us to classify neurons and infer latent dimensions of circuit organization from correlated spike trains. We demonstrate the effectiveness of our method with applications to synthetic data and multi-neuron recordings in primate retina, revealing latent patterns of neural types and locations from spike trains alone.
Tasks Bayesian Inference, Dimensionality Reduction
Published 2016-10-26
URL http://arxiv.org/abs/1610.08465v1
PDF http://arxiv.org/pdf/1610.08465v1.pdf
PWC https://paperswithcode.com/paper/bayesian-latent-structure-discovery-from
Repo https://github.com/slinderman/pypolyagamma
Framework none
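
A much-simplified NumPy sketch of the generative flavor: latent neuron types set the prior over connection weights (stochastic-block style), and spikes follow a Bernoulli GLM driven by the previous time step. The paper's full point-process model and Pólya-gamma inference are far richer; this only illustrates the latent-structure idea, and all sizes here are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_types, T = 20, 2, 500
types = rng.integers(0, n_types, n_neurons)      # latent neuron types
mu = np.array([[1.0, -1.0], [-1.0, 1.0]])        # type-to-type mean weight
W = rng.normal(mu[types][:, types], 0.3)         # (N, N) connection weights
b = -2.0                                         # baseline log-odds

spikes = np.zeros((T, n_neurons), dtype=int)
for t in range(1, T):
    logits = b + spikes[t - 1] @ W               # drive from the previous step
    spikes[t] = rng.random(n_neurons) < 1 / (1 + np.exp(-logits))
```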

CAS-CNN: A Deep Convolutional Neural Network for Image Compression Artifact Suppression

Title CAS-CNN: A Deep Convolutional Neural Network for Image Compression Artifact Suppression
Authors Lukas Cavigelli, Pascal Hager, Luca Benini
Abstract Lossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media. However, we pay for their high compression rate with visual artifacts degrading the user experience. Deep convolutional neural networks have become a widespread tool to address high-level computer vision tasks very successfully. Recently, they have found their way into the areas of low-level computer vision and image processing to solve regression problems, mostly with relatively shallow networks. We present a novel 12-layer deep convolutional network for image compression artifact suppression with hierarchical skip connections and a multi-scale loss function. We achieve a boost of up to 1.79 dB in PSNR over ordinary JPEG and an improvement of up to 0.36 dB over the best previous ConvNet result. We show that a network trained for a specific quality factor (QF) is resilient to the QF used to compress the input image: a single network trained for QF 60 provides a PSNR gain of more than 1.5 dB over the wide QF range from 40 to 76.
Tasks Image Compression
Published 2016-11-22
URL http://arxiv.org/abs/1611.07233v1
PDF http://arxiv.org/pdf/1611.07233v1.pdf
PWC https://paperswithcode.com/paper/cas-cnn-a-deep-convolutional-neural-network
Repo https://github.com/ShakedDovrat/JpegArtifactRemoval
Framework none
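
The multi-scale loss idea can be sketched independently of the 12-layer architecture: supervise an intermediate, lower-resolution reconstruction as well as the final output. A toy PyTorch version with illustrative layers and loss weights, not the paper's network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyArtifactNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = nn.Conv2d(1, 16, 3, stride=2, padding=1)
        self.mid_out = nn.Conv2d(16, 1, 3, padding=1)    # half-resolution output
        self.up = nn.ConvTranspose2d(16, 16, 4, stride=2, padding=1)
        self.full_out = nn.Conv2d(16, 1, 3, padding=1)   # full-resolution output

    def forward(self, x):
        h = F.relu(self.down(x))
        return self.mid_out(h), self.full_out(F.relu(self.up(h)))

def multiscale_loss(mid, full, target, w_mid=0.5):
    target_mid = F.avg_pool2d(target, 2)                 # match half resolution
    return w_mid * F.mse_loss(mid, target_mid) + F.mse_loss(full, target)

net = TinyArtifactNet()
mid, full = net(torch.randn(1, 1, 32, 32))
loss = multiscale_loss(mid, full, torch.randn(1, 1, 32, 32))
```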

Building a comprehensive syntactic and semantic corpus of Chinese clinical texts

Title Building a comprehensive syntactic and semantic corpus of Chinese clinical texts
Authors Bin He, Bin Dong, Yi Guan, Jinfeng Yang, Zhipeng Jiang, Qiubin Yu, Jianyi Cheng, Chunyan Qu
Abstract Objective: To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts with corresponding annotation guidelines and methods, as well as to develop tools trained on the annotated corpus, which supplies baselines for research on Chinese texts in the clinical domain. Materials and methods: An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, by using annotation quality assurance measures, a comprehensive corpus was built, containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality, and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus. Results: The syntactic corpus consists of 138 Chinese clinical documents with 47,424 tokens and 2553 full parsing trees, while the semantic corpus includes 992 documents in which 39,511 entities, their assertions, and 7695 relations were annotated. IAA evaluation shows that this comprehensive corpus is of good quality, and the system modules are effective. Discussion: The annotated corpus makes a considerable contribution to natural language processing (NLP) research into Chinese texts in the clinical domain. However, this corpus has a number of limitations. Some additional types of clinical text should be introduced to improve corpus coverage, and active learning methods should be utilized to promote annotation efficiency. Conclusions: In this study, several annotation guidelines and an annotation method for Chinese clinical texts were proposed, and a comprehensive corpus and its NLP modules were constructed, providing a foundation for further study of applying NLP techniques to Chinese texts in the clinical domain.
Tasks Active Learning
Published 2016-11-07
URL http://arxiv.org/abs/1611.02091v2
PDF http://arxiv.org/pdf/1611.02091v2.pdf
PWC https://paperswithcode.com/paper/building-a-comprehensive-syntactic-and
Repo https://github.com/WILAB-HIT/Resources
Framework none
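
A small sketch of the kind of inter-annotator agreement computation used to validate such a corpus; Cohen's kappa between two annotators is one standard choice (the paper may use other IAA measures):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# usage: POS tags assigned by two annotators to the same four tokens
print(cohens_kappa(["NN", "VV", "NN", "AD"], ["NN", "VV", "JJ", "AD"]))
```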

Nested Mini-Batch K-Means

Title Nested Mini-Batch K-Means
Authors James Newling, François Fleuret
Abstract A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already used data should preferentially be reused. To this end we propose using nested mini-batches, whereby data in a mini-batch at iteration t is automatically reused at iteration t+1. Using nested mini-batches presents two difficulties. The first is that unbalanced use of data can bias estimates, which we resolve by ensuring that each data sample contributes exactly once to centroids. The second is in choosing mini-batch sizes, which we address by balancing premature fine-tuning of centroids with redundancy-induced slow-down. Experiments show that the resulting nmbatch algorithm is very effective, often arriving within 1% of the empirical minimum 100 times earlier than the standard mini-batch algorithm.
Tasks
Published 2016-02-09
URL http://arxiv.org/abs/1602.02934v5
PDF http://arxiv.org/pdf/1602.02934v5.pdf
PWC https://paperswithcode.com/paper/nested-mini-batch-k-means
Repo https://github.com/idiap/eakmeans
Framework none
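
A deliberately crude NumPy sketch of the nesting idea: the mini-batch at iteration t is a prefix of the one at t+1 (here it doubles), and each sample contributes exactly once to the running centroid sums. Unlike the real algorithm, this version never revisits old samples and omits the Elkan-style distance bounds entirely:

```python
import numpy as np

def nested_minibatch_kmeans(X, k, n_iters=12, seed=0):
    rng = np.random.default_rng(seed)
    X = X[rng.permutation(len(X))]              # fix a random order once
    centroids = X[:k].copy()
    sums = np.zeros((k, X.shape[1]))
    counts = np.zeros(k, dtype=int)
    used, batch = 0, k                          # samples consumed so far
    for _ in range(n_iters):
        batch = min(2 * batch, len(X))          # nested: the prefix grows
        new = X[used:batch]                     # only previously unseen samples
        if len(new):
            d = np.linalg.norm(new[:, None] - centroids[None], axis=2)
            a = d.argmin(axis=1)                # assign each new sample once
            np.add.at(sums, a, new)
            np.add.at(counts, a, 1)
            used = batch
        nonzero = counts > 0
        centroids[nonzero] = sums[nonzero] / counts[nonzero, None]
    return centroids

centroids = nested_minibatch_kmeans(np.random.default_rng(1).normal(size=(5000, 8)), k=10)
```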

Recurrent Orthogonal Networks and Long-Memory Tasks

Title Recurrent Orthogonal Networks and Long-Memory Tasks
Authors Mikael Henaff, Arthur Szlam, Yann LeCun
Abstract Although RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long term dependencies is still an active area of research. In this work, we carefully analyze two synthetic datasets originally outlined in (Hochreiter and Schmidhuber, 1997) which are used to evaluate the ability of RNNs to store information over many time steps. We explicitly construct RNN solutions to these problems, and using these constructions, illuminate both the problems themselves and the way in which RNNs store different types of information in their hidden states. These constructions furthermore explain the success of recent methods that specify unitary initializations or constraints on the transition matrices.
Tasks
Published 2016-02-22
URL http://arxiv.org/abs/1602.06662v2
PDF http://arxiv.org/pdf/1602.06662v2.pdf
PWC https://paperswithcode.com/paper/recurrent-orthogonal-networks-and-long-memory
Repo https://github.com/solgaardlab/neurophox
Framework tf
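
The property these constructions exploit is easy to verify numerically: an orthogonal transition matrix preserves the hidden-state norm, so a linear recurrence neither explodes nor vanishes over many time steps. A short NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))    # random orthogonal matrix

h = rng.normal(size=n)
norms = []
for _ in range(1000):                           # 1000 linear recurrence steps
    h = Q @ h
    norms.append(np.linalg.norm(h))
print(min(norms), max(norms))                   # identical up to rounding error
```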

Towards Automated Melanoma Screening: Proper Computer Vision & Reliable Results

Title Towards Automated Melanoma Screening: Proper Computer Vision & Reliable Results
Authors Michel Fornaciali, Micael Carvalho, Flávia Vasques Bittencourt, Sandra Avila, Eduardo Valle
Abstract In this paper we survey, analyze and criticize the current art on automated melanoma screening, reimplementing a baseline technique and proposing two novel ones. Melanoma, although highly curable when detected early, ends up as one of the most dangerous types of cancer, due to delayed diagnosis and treatment. Its incidence is soaring, much faster than the number of trained professionals able to diagnose it. Automated screening appears as an alternative to make the most of those professionals, focusing their time on the patients at risk while safely discharging the other patients. However, the potential of automated melanoma diagnosis is currently unfulfilled, due to the emphasis of the current literature on outdated computer vision models. Even more problematic is the irreproducibility of the current art. We show how streamlined pipelines based upon current computer vision outperform conventional models: a model based on an advanced bag of words reaches an AUC of 84.6%, and a model based on deep neural networks reaches 89.3%, while the baseline (a classical bag of words) stays at 81.2%. We also initiate a dialog to improve reproducibility in our community.
Tasks
Published 2016-04-14
URL http://arxiv.org/abs/1604.04024v3
PDF http://arxiv.org/pdf/1604.04024v3.pdf
PWC https://paperswithcode.com/paper/towards-automated-melanoma-screening-proper
Repo https://github.com/learningtitans/data-depth-design
Framework tf

A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search

Title A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search
Authors Deng Cai
Abstract Approximate Nearest Neighbor Search (ANNS) is a fundamental problem in many areas of machine learning and data mining. During the past decade, numerous hashing algorithms have been proposed to solve this problem, and every proposed algorithm claims to outperform other state-of-the-art hashing methods. However, the evaluation of these hashing papers was not thorough enough, and those claims should be re-examined. The ultimate goal of an ANNS method is to return the most accurate answers (nearest neighbors) in the shortest time. If implemented correctly, almost all hashing methods will have their performance improve as the code length increases. However, many existing hashing papers only report performance with code lengths shorter than 128. In this paper, we carefully revisit the problem of search with a hash index and analyze the pros and cons of two popular hash index search procedures. We then propose a very simple but effective two-level index structure and make a thorough comparison of eleven popular hashing algorithms. Surprisingly, random-projection-based Locality Sensitive Hashing (LSH) is the best-performing algorithm, which contradicts the claims in all the other ten hashing papers. Despite the extreme simplicity of random-projection-based LSH, our results show that the capability of this algorithm has been far underestimated. For the sake of reproducibility, all the code used in the paper is released on GitHub, which can be used as a testing platform for a fair comparison between various hashing algorithms.
Tasks
Published 2016-12-22
URL https://arxiv.org/abs/1612.07545v6
PDF https://arxiv.org/pdf/1612.07545v6.pdf
PWC https://paperswithcode.com/paper/a-revisit-of-hashing-algorithms-for
Repo https://github.com/ZJULearning/hashingSearch
Framework none
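
Random-projection LSH, the method the paper finds surprisingly strong, fits in a dozen lines. A hedged NumPy sketch with a single hash table and illustrative sizes; practical systems use multiple tables, and the paper builds its two-level index on top of ideas like this:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 128))               # database vectors
P = rng.normal(size=(128, 8))                   # 8 random projections -> 8-bit codes

table = defaultdict(list)
for i, c in enumerate(X @ P > 0):               # code = signs of the projections
    table[c.tobytes()].append(i)                # bucket points by binary code

q = rng.normal(size=128)
bucket = table.get((q @ P > 0).tobytes(), [])   # candidates colliding with q
if bucket:                                      # rerank candidates by true distance
    best = min(bucket, key=lambda i: np.linalg.norm(X[i] - q))
```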

Learning Multiagent Communication with Backpropagation

Title Learning Multiagent Communication with Backpropagation
Authors Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus
Abstract Many tasks in AI require the collaboration of multiple agents. Typically, the communication protocol between agents is manually specified and not altered during training. In this paper we explore a simple neural model, called CommNet, that uses continuous communication for fully cooperative tasks. The model consists of multiple agents and the communication between them is learned alongside their policy. We apply this model to a diverse set of tasks, demonstrating the ability of the agents to learn to communicate amongst themselves, yielding improved performance over non-communicative agents and baselines. In some cases, it is possible to interpret the language devised by the agents, revealing simple but effective strategies for solving the task at hand.
Tasks
Published 2016-05-25
URL http://arxiv.org/abs/1605.07736v2
PDF http://arxiv.org/pdf/1605.07736v2.pdf
PWC https://paperswithcode.com/paper/learning-multiagent-communication-with
Repo https://github.com/MUmarJaved/MultiAgent-Distributed-Reinforcement-Learning
Framework tf
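
The communication step itself is compact. A minimal PyTorch sketch of one CommNet-style round, where each agent mixes its own state with the mean of the other agents' states through learned linear maps, so the protocol is differentiable end to end:

```python
import torch
import torch.nn as nn

class CommStep(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.H = nn.Linear(dim, dim, bias=False)        # self transform
        self.C = nn.Linear(dim, dim, bias=False)        # communication transform

    def forward(self, h):                               # h: (n_agents, dim)
        n = h.shape[0]
        comm = (h.sum(0, keepdim=True) - h) / (n - 1)   # mean of the other agents
        return torch.tanh(self.H(h) + self.C(comm))

h = torch.randn(4, 32)                                  # 4 agents, 32-dim states
h = CommStep(32)(h)                                     # one communication round
```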

EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph

Title EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph
Authors Cong Fu, Deng Cai
Abstract Approximate nearest neighbor (ANN) search is a fundamental problem in many areas of data mining, machine learning and computer vision. The performance of traditional hierarchical structure (tree) based methods decreases as the dimensionality of data grows, while hashing based methods usually lack efficiency in practice. Recently, graph based methods have drawn considerable attention. The main idea is that a neighbor of a neighbor is also likely to be a neighbor, which we refer to as NN-expansion. These methods construct a k-nearest neighbor (kNN) graph offline. At the online search stage, they find candidate neighbors of a query point in some way (e.g., random selection), and then iteratively check the neighbors of these candidate neighbors for closer ones. Despite some promising results, there are two main problems with these approaches: 1) they tend to converge to local optima, and 2) constructing a kNN graph is time consuming. We find that both problems can be nicely solved when we provide a good initialization for NN-expansion. In this paper, we propose EFANNA, an extremely fast approximate nearest neighbor search algorithm based on a kNN graph. EFANNA nicely combines the advantages of hierarchical structure based methods and nearest-neighbor-graph based methods. Extensive experiments have shown that EFANNA outperforms state-of-the-art algorithms both on approximate nearest neighbor search and approximate nearest neighbor graph construction. To the best of our knowledge, EFANNA is the fastest algorithm so far both on approximate nearest neighbor graph construction and approximate nearest neighbor search. A library, EFANNA, based on this research is released on GitHub.
Tasks graph construction
Published 2016-09-23
URL http://arxiv.org/abs/1609.07228v3
PDF http://arxiv.org/pdf/1609.07228v3.pdf
PWC https://paperswithcode.com/paper/efanna-an-extremely-fast-approximate-nearest
Repo https://github.com/fc731097343/efanna
Framework none
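
A hedged NumPy sketch of NN-expansion over a prebuilt kNN graph. EFANNA's actual contribution, a fast tree-based graph construction and a good search initialization, is replaced here by brute-force construction and random seeds, so only the expansion loop itself is faithful to the idea:

```python
import numpy as np

def nn_expansion(X, graph, q, n_cand=10, n_iters=10, seed=0):
    rng = np.random.default_rng(seed)
    cand = set(rng.choice(len(X), n_cand, replace=False).tolist())
    for _ in range(n_iters):
        frontier = set()
        for i in cand:
            frontier.update(graph[i])           # neighbors of current candidates
        cand.update(frontier)
        # keep only the closest candidates to the query
        cand = set(sorted(cand, key=lambda i: np.linalg.norm(X[i] - q))[:n_cand])
    return sorted(cand, key=lambda i: np.linalg.norm(X[i] - q))

# graph[i] holds i's k nearest neighbors, built offline (brute force here)
X = np.random.default_rng(1).normal(size=(1000, 16))
d = np.linalg.norm(X[:, None] - X[None], axis=2)
graph = np.argsort(d, axis=1)[:, 1:11]          # k = 10, excluding self
print(nn_expansion(X, graph, X[0]))
```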

A Latent Variable Recurrent Neural Network for Discourse Relation Language Models

Title A Latent Variable Recurrent Neural Network for Discourse Relation Language Models
Authors Yangfeng Ji, Gholamreza Haffari, Jacob Eisenstein
Abstract This paper presents a novel latent variable recurrent neural network architecture for jointly modeling sequences of words and (possibly latent) discourse relations between adjacent sentences. A recurrent neural network generates individual words, thus reaping the benefits of discriminatively-trained vector representations. The discourse relations are represented with a latent variable, which can be predicted or marginalized, depending on the task. The resulting model can therefore employ a training objective that includes not only discourse relation classification, but also word prediction. As a result, it outperforms state-of-the-art alternatives for two tasks: implicit discourse relation classification in the Penn Discourse Treebank, and dialog act classification in the Switchboard corpus. Furthermore, by marginalizing over latent discourse relations at test time, we obtain a discourse informed language model, which improves over a strong LSTM baseline.
Tasks Dialog Act Classification, Implicit Discourse Relation Classification, Language Modelling, Relation Classification
Published 2016-03-07
URL http://arxiv.org/abs/1603.01913v2
PDF http://arxiv.org/pdf/1603.01913v2.pdf
PWC https://paperswithcode.com/paper/a-latent-variable-recurrent-neural-network
Repo https://github.com/jiyfeng/drlm
Framework none
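
The test-time marginalization is a one-liner once the conditional distributions exist: sum the word distribution over latent relations, weighted by the relation posterior. A toy PyTorch sketch with illustrative shapes standing in for the model's actual RNN outputs:

```python
import torch

n_relations, vocab = 4, 1000
relation_logits = torch.randn(n_relations)          # p(z | context), unnormalized
word_logits = torch.randn(n_relations, vocab)       # p(word | context, z), unnormalized

log_pz = torch.log_softmax(relation_logits, dim=0)  # (Z,)
log_pw = torch.log_softmax(word_logits, dim=1)      # (Z, V)
# marginal log-probability of each next word: logsumexp over latent relations
log_marginal = torch.logsumexp(log_pz[:, None] + log_pw, dim=0)   # (V,)
```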