May 7, 2019


Paper Group AWR 64


Is the deconvolution layer the same as a convolutional layer?


Title	Is the deconvolution layer the same as a convolutional layer?
Authors	Wenzhe Shi, Jose Caballero, Lucas Theis, Ferenc Huszar, Andrew Aitken, Christian Ledig, Zehan Wang
Abstract	In this note, we want to focus on aspects related to two questions most people asked us at CVPR about the network we presented. Firstly, what is the relationship between our proposed layer and the deconvolution layer? And secondly, why are convolutions in low-resolution (LR) space a better choice? These are key questions we tried to answer in the paper, but we were not able to go into as much depth and clarity as we would have liked in the space allowance. To better answer these questions in this note, we first discuss the relationships between the deconvolution layer in the forms of the transposed convolution layer, the sub-pixel convolutional layer and our efficient sub-pixel convolutional layer. We will refer to our efficient sub-pixel convolutional layer as a convolutional layer in LR space to distinguish it from the common sub-pixel convolutional layer. We will then show that for a fixed computational budget and complexity, a network with convolutions exclusively in LR space has more representation power at the same speed than a network that first upsamples the input in high resolution space.
Tasks
Published	2016-09-22
URL	http://arxiv.org/abs/1609.07009v1
PDF	http://arxiv.org/pdf/1609.07009v1.pdf
PWC	https://paperswithcode.com/paper/is-the-deconvolution-layer-the-same-as-a
Repo	https://github.com/zsdonghao/SRGAN
Framework	tf
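
The sub-pixel rearrangement that the abstract contrasts with the deconvolution layer can be illustrated with a small sketch. This is not the authors' code: it assumes the channel ordering used by common "pixel shuffle" implementations (output pixel `(y, x)` of channel `c` reads input channel `c*r*r + (y % r)*r + (x % r)`), and the name `pixel_shuffle` and the nested-list layout are chosen here for illustration only.

```python
def pixel_shuffle(lr, r):
    """Rearrange a (C*r*r, H, W) nested list into a (C, H*r, W*r) nested list.

    Assumed channel ordering: src = c*r*r + (y % r)*r + (x % r).
    """
    channels_in = len(lr)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(lr[0]), len(lr[0][0])
    hr = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for y in range(h * r):
            for x in range(w * r):
                # The sub-pixel offset (y % r, x % r) selects which LR
                # channel supplies this HR pixel.
                src = c * r * r + (y % r) * r + (x % r)
                hr[c][y][x] = lr[src][y // r][x // r]
    return hr


# Four 1x2 LR feature maps become one 2x4 HR map for r = 2.
lr = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
hr = pixel_shuffle(lr, 2)
```

All learned filtering happens on the small LR maps; the rearrangement itself only moves values, which is the basis of the efficiency argument in the abstract.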

3D Bounding Box Estimation Using Deep Learning and Geometry


Title	3D Bounding Box Estimation Using Deep Learning and Geometry
Authors	Arsalan Mousavian, Dragomir Anguelov, John Flynn, Jana Kosecka
Abstract	We present a method for 3D object detection and pose estimation from a single image. In contrast to current techniques that only regress the 3D orientation of an object, our method first regresses relatively stable 3D object properties using a deep convolutional neural network and then combines these estimates with geometric constraints provided by a 2D object bounding box to produce a complete 3D bounding box. The first network output estimates the 3D object orientation using a novel hybrid discrete-continuous loss, which significantly outperforms the L2 loss. The second output regresses the 3D object dimensions, which have relatively little variance compared to alternatives and can often be predicted for many object types. These estimates, combined with the geometric constraints on translation imposed by the 2D bounding box, enable us to recover a stable and accurate 3D object pose. We evaluate our method on the challenging KITTI object detection benchmark both on the official metric of 3D orientation estimation and also on the accuracy of the obtained 3D bounding boxes. Although conceptually simple, our method outperforms more complex and computationally expensive approaches that leverage semantic segmentation, instance level segmentation and flat ground priors and sub-category detection. Our discrete-continuous loss also produces state of the art results for 3D viewpoint estimation on the Pascal 3D+ dataset.
Tasks	3D Object Detection, Object Detection, Pose Estimation, Semantic Segmentation, Viewpoint Estimation
Published	2016-12-01
URL	http://arxiv.org/abs/1612.00496v2
PDF	http://arxiv.org/pdf/1612.00496v2.pdf
PWC	https://paperswithcode.com/paper/3d-bounding-box-estimation-using-deep
Repo	https://github.com/ZhixinLai/3D-detection-with-monocular-RGB-image
Framework	none

The Language of Generalization


Title	The Language of Generalization
Authors	Michael Henry Tessler, Noah D. Goodman
Abstract	Language provides simple ways of communicating generalizable knowledge to each other (e.g., “Birds fly”, “John hikes”, “Fire makes smoke”). Though found in every language and emerging early in development, the language of generalization is philosophically puzzling and has resisted precise formalization. Here, we propose the first formal account of generalizations conveyed with language that makes quantitative predictions about human understanding. We test our model in three diverse domains: generalizations about categories (generic language), events (habitual language), and causes (causal language). The model explains the gradience in human endorsement through the interplay between a simple truth-conditional semantic theory and diverse beliefs about properties, formalized in a probabilistic model of language understanding. This work opens the door to understanding precisely how abstract knowledge is learned from language.
Tasks
Published	2016-08-09
URL	http://arxiv.org/abs/1608.02926v4
PDF	http://arxiv.org/pdf/1608.02926v4.pdf
PWC	https://paperswithcode.com/paper/the-language-of-generalization
Repo	https://github.com/mhtess/genlang-paper
Framework	none

A Fast Ellipse Detector Using Projective Invariant Pruning


Title	A Fast Ellipse Detector Using Projective Invariant Pruning
Authors	Qi Jia, Xin Fan, Zhongxuan Luo, Lianbo Song, Tie Qiu
Abstract	Detecting elliptical objects from an image is a central task in robot navigation and industrial diagnosis where the detection time is always a critical issue. Existing methods are hardly applicable to these real-time scenarios of limited hardware resources due to the huge number of fragment candidates (edges or arcs) for fitting ellipse equations. In this paper, we present a fast algorithm for detecting ellipses with high accuracy. The algorithm leverages a newly developed projective invariant to significantly prune the undesired candidates and to pick out elliptical ones. The invariant is able to reflect the intrinsic geometry of a planar curve, giving the value of -1 on any three collinear points and +1 for any six points on an ellipse. Thus, we apply the pruning and picking by simply comparing these binary values. Moreover, the calculation of the invariant only involves the determinant of a 3×3 matrix. Extensive experiments on three challenging data sets with 650 images demonstrate that our detector runs 20%-50% faster than the state-of-the-art algorithms with comparable or higher precision.
Tasks	Robot Navigation
Published	2016-08-26
URL	http://arxiv.org/abs/1608.07470v1
PDF	http://arxiv.org/pdf/1608.07470v1.pdf
PWC	https://paperswithcode.com/paper/a-fast-ellipse-detector-using-projective
Repo	https://github.com/TomsonBoylett/Real-Time-Ellipse-Detection
Framework	none
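
The abstract notes that the invariant needs only a 3×3 determinant. The paper's invariant itself is not given in the abstract, so the sketch below shows only the standard, related fact that three points are collinear exactly when the determinant of their homogeneous coordinates is zero. The function names and the tolerance `tol` are assumptions made for this illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def are_collinear(p, q, r, tol=1e-9):
    """Return True if 2D points p, q, r lie on one line.

    Each point (x, y) is lifted to homogeneous coordinates (x, y, 1);
    the determinant of the stacked rows is twice the signed triangle area.
    """
    m = [[p[0], p[1], 1.0], [q[0], q[1], 1.0], [r[0], r[1], 1.0]]
    return abs(det3(m)) <= tol


on_a_line = are_collinear((0, 0), (1, 1), (2, 2))
triangle = are_collinear((0, 0), (1, 0), (0, 1))
```

A test of this kind costs a handful of multiplications per candidate, which suggests why a determinant-based criterion suits pruning large numbers of edge fragments.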

Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning


Title	Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning
Authors	William Lotter, Gabriel Kreiman, David Cox
Abstract	While great strides have been made in using deep learning algorithms to solve supervised learning tasks, the problem of unsupervised learning - leveraging unlabeled examples to learn about the structure of a domain - remains a difficult unsolved challenge. Here, we explore prediction of future frames in a video sequence as an unsupervised learning rule for learning about the structure of the visual world. We describe a predictive neural network (“PredNet”) architecture that is inspired by the concept of “predictive coding” from the neuroscience literature. These networks learn to predict future frames in a video sequence, with each layer in the network making local predictions and only forwarding deviations from those predictions to subsequent network layers. We show that these networks are able to robustly learn to predict the movement of synthetic (rendered) objects, and that in doing so, the networks learn internal representations that are useful for decoding latent object parameters (e.g. pose) that support object recognition with fewer training views. We also show that these networks can scale to complex natural image streams (car-mounted camera videos), capturing key aspects of both egocentric movement and the movement of objects in the visual scene, and the representation learned in this setting is useful for estimating the steering angle. Altogether, these results suggest that prediction represents a powerful framework for unsupervised learning, allowing for implicit learning of object and scene structure.
Tasks	Object Recognition, Video Prediction
Published	2016-05-25
URL	http://arxiv.org/abs/1605.08104v5
PDF	http://arxiv.org/pdf/1605.08104v5.pdf
PWC	https://paperswithcode.com/paper/deep-predictive-coding-networks-for-video
Repo	https://github.com/kunimasa-kawasaki/keras-prednet
Framework	none

Deep Joint Rain Detection and Removal from a Single Image


Title	Deep Joint Rain Detection and Removal from a Single Image
Authors	Wenhan Yang, Robby T. Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, Shuicheng Yan
Abstract	In this paper, we address a rain removal problem from a single image, even in the presence of heavy rain and rain streak accumulation. Our core ideas lie in the new rain image models and a novel deep learning architecture. We first modify an existing model comprising a rain streak layer and a background layer, by adding a binary map that locates rain streak regions. Second, we create a new model consisting of a component representing rain streak accumulation (where individual streaks cannot be seen, and thus visually similar to mist or fog), and another component representing various shapes and directions of overlapping rain streaks, which usually happen in heavy rain. Based on the first model, we develop a multi-task deep learning architecture that learns the binary rain streak map, the appearance of rain streaks, and the clean background, which is our ultimate output. The additional binary map is critically beneficial, since its loss function can provide additional strong information to the network. To handle rain streak accumulation (again, a phenomenon visually similar to mist or fog) and various shapes and directions of overlapping rain streaks, we propose a recurrent rain detection and removal network that removes rain streaks and clears up the rain accumulation iteratively and progressively. In each recurrence of our method, a new contextualized dilated network is developed to exploit regional contextual information and outputs better representation for rain detection. The evaluation on real images, particularly on heavy rain, shows the effectiveness of our novel models and architecture, outperforming the state-of-the-art methods significantly. Our codes and data sets will be publicly available.
Tasks	Rain Removal
Published	2016-09-25
URL	http://arxiv.org/abs/1609.07769v3
PDF	http://arxiv.org/pdf/1609.07769v3.pdf
PWC	https://paperswithcode.com/paper/deep-joint-rain-detection-and-removal-from-a
Repo	https://github.com/ZhangXinNan/RainDetectionAndRemoval
Framework	none

Automatic Text Scoring Using Neural Networks


Title	Automatic Text Scoring Using Neural Networks
Authors	Dimitrios Alikaniotis, Helen Yannakoudakis, Marek Rei
Abstract	Automated Text Scoring (ATS) provides a cost-effective and consistent alternative to human marking. However, in order to achieve good performance, the predictive features of the system need to be manually engineered by human experts. We introduce a model that forms word representations by learning the extent to which specific words contribute to the text’s score. Using Long Short-Term Memory networks to represent the meaning of texts, we demonstrate that a fully automated framework is able to achieve excellent results over similar approaches. In an attempt to make our results more interpretable, and inspired by recent advances in visualizing neural networks, we introduce a novel method for identifying the regions of the text that the model has found more discriminative.
Tasks
Published	2016-06-14
URL	http://arxiv.org/abs/1606.04289v2
PDF	http://arxiv.org/pdf/1606.04289v2.pdf
PWC	https://paperswithcode.com/paper/automatic-text-scoring-using-neural-networks
Repo	https://github.com/mankadronit/Automated-Essay--Scoring
Framework	tf

Variational Bayes In Private Settings (VIPS)


Title	Variational Bayes In Private Settings (VIPS)
Authors	Mijung Park, James Foulds, Kamalika Chaudhuri, Max Welling
Abstract	Many applications of Bayesian data analysis involve sensitive information, motivating methods which ensure that privacy is protected. We introduce a general privacy-preserving framework for Variational Bayes (VB), a widely used optimization-based Bayesian inference method. Our framework respects differential privacy, the gold-standard privacy criterion, and encompasses a large class of probabilistic models, called the Conjugate Exponential (CE) family. We observe that we can straightforwardly privatise VB’s approximate posterior distributions for models in the CE family, by perturbing the expected sufficient statistics of the complete-data likelihood. For a broadly-used class of non-CE models, those with binomial likelihoods, we show how to bring such models into the CE family, such that inferences in the modified model resemble the private variational Bayes algorithm as closely as possible, using the Polya-Gamma data augmentation scheme. The iterative nature of variational Bayes presents a further challenge since iterations increase the amount of noise needed. We overcome this by combining: (1) an improved composition method for differential privacy, called the moments accountant, which provides a tight bound on the privacy cost of multiple VB iterations and thus significantly decreases the amount of additive noise; and (2) the privacy amplification effect of subsampling mini-batches from large-scale data in stochastic learning. We empirically demonstrate the effectiveness of our method in CE and non-CE models including latent Dirichlet allocation, Bayesian logistic regression, and sigmoid belief networks, evaluated on real-world datasets.
Tasks	Bayesian Inference, Data Augmentation
Published	2016-11-01
URL	http://arxiv.org/abs/1611.00340v5
PDF	http://arxiv.org/pdf/1611.00340v5.pdf
PWC	https://paperswithcode.com/paper/variational-bayes-in-private-settings-vips
Repo	https://github.com/mijungi/vips_code
Framework	none
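
The core step described in the abstract, perturbing expected sufficient statistics with noise, can be sketched with the classic Gaussian mechanism. This is a minimal illustration, not the paper's algorithm: it uses the textbook single-release calibration (valid for 0 < epsilon < 1) rather than the moments accountant or subsampling amplification, and the function names, the example statistics, and the sensitivity value are all hypothetical.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale for the classic (epsilon, delta) Gaussian mechanism.

    Uses sigma = sensitivity * sqrt(2 ln(1.25 / delta)) / epsilon,
    which holds for 0 < epsilon < 1.
    """
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize_statistics(stats, sensitivity, epsilon, delta, rng):
    """Add independent Gaussian noise to each expected sufficient statistic."""
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return [s + rng.gauss(0.0, sigma) for s in stats]


# Hypothetical statistics with an assumed L2 sensitivity of 1.0.
rng = random.Random(0)
noisy = privatize_statistics([10.0, 4.0], 1.0, 0.5, 1e-5, rng)
```

Because every variational Bayes iteration releases statistics again, the privacy cost accumulates across iterations; this is the problem the abstract addresses with tighter composition.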

Feynman Machine: The Universal Dynamical Systems Computer


Title	Feynman Machine: The Universal Dynamical Systems Computer
Authors	Eric Laukien, Richard Crowder, Fergal Byrne
Abstract	Efforts at understanding the computational processes in the brain have met with limited success, despite their importance and potential uses in building intelligent machines. We propose a simple new model which draws on recent findings in Neuroscience and the Applied Mathematics of interacting Dynamical Systems. The Feynman Machine is a Universal Computer for Dynamical Systems, analogous to the Turing Machine for symbolic computing, but with several important differences. We demonstrate that networks and hierarchies of simple interacting Dynamical Systems, each adaptively learning to forecast its evolution, are capable of automatically building sensorimotor models of the external and internal world. We identify such networks in mammalian neocortex, and show how existing theories of cortical computation combine with our model to explain the power and flexibility of mammalian intelligence. These findings lead directly to new architectures for machine intelligence. A suite of software implementations has been built based on these principles, and applied to a number of spatiotemporal learning tasks.
Tasks
Published	2016-09-13
URL	http://arxiv.org/abs/1609.03971v1
PDF	http://arxiv.org/pdf/1609.03971v1.pdf
PWC	https://paperswithcode.com/paper/feynman-machine-the-universal-dynamical
Repo	https://github.com/ogmacorp/OgmaNeo
Framework	none

Clearing the Skies: A deep network architecture for single-image rain removal


Title	Clearing the Skies: A deep network architecture for single-image rain removal
Authors	Xueyang Fu, Jiabin Huang, Xinghao Ding, Yinghao Liao, John Paisley
Abstract	We introduce a deep network architecture called DerainNet for removing rain streaks from an image. Based on the deep convolutional neural network (CNN), we directly learn the mapping relationship between rainy and clean image detail layers from data. Because we do not possess the ground truth corresponding to real-world rainy images, we synthesize images with rain for training. In contrast to other common strategies that increase depth or breadth of the network, we use image processing domain knowledge to modify the objective function and improve deraining with a modestly-sized CNN. Specifically, we train our DerainNet on the detail (high-pass) layer rather than in the image domain. Though DerainNet is trained on synthetic data, we find that the learned network translates very effectively to real-world images for testing. Moreover, we augment the CNN framework with image enhancement to improve the visual results. Compared with state-of-the-art single image de-raining methods, our method has improved rain removal and much faster computation time after network training.
Tasks	Image Enhancement, Rain Removal
Published	2016-09-07
URL	http://arxiv.org/abs/1609.02087v2
PDF	http://arxiv.org/pdf/1609.02087v2.pdf
PWC	https://paperswithcode.com/paper/clearing-the-skies-a-deep-network
Repo	https://github.com/jinnovation/rainy-image-dataset
Framework	tf
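
Training on the detail (high-pass) layer presupposes splitting an image into a smooth base layer and a residual detail layer. The sketch below shows one simple way to do that split. It assumes a plain box blur as the low-pass filter, which is a stand-in chosen for brevity and not necessarily the filter used in the paper; the function names are likewise illustrative.

```python
def box_blur(img, radius=1):
    """Mean filter over a (2*radius+1)^2 window, shrinking the window at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def split_base_detail(img, radius=1):
    """Return (base, detail) with detail = img - base (the high-pass layer)."""
    base = box_blur(img, radius)
    detail = [[img[y][x] - base[y][x] for x in range(len(img[0]))]
              for y in range(len(img))]
    return base, detail


# A flat gray image with one bright pixel standing in for a rain streak.
img = [[0.5, 0.5, 0.5],
       [0.5, 1.0, 0.5],
       [0.5, 0.5, 0.5]]
base, detail = split_base_detail(img)
```

In this toy image the bright pixel stands out in `detail` while the flat background is largely absorbed by `base`, which illustrates why the detail layer is a sparser target for a modestly sized network.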

Reset-free Trial-and-Error Learning for Robot Damage Recovery


Title	Reset-free Trial-and-Error Learning for Robot Damage Recovery
Authors	Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Jean-Baptiste Mouret
Abstract	The high probability of hardware failures prevents many advanced robots (e.g., legged robots) from being confidently deployed in real-world situations (e.g., post-disaster rescue). Instead of attempting to diagnose the failures, robots could adapt by trial-and-error in order to be able to complete their tasks. In this situation, damage recovery can be seen as a Reinforcement Learning (RL) problem. However, the best RL algorithms for robotics require the robot and the environment to be reset to an initial state after each episode, that is, the robot is not learning autonomously. In addition, most of the RL methods for robotics do not scale well with complex robots (e.g., walking robots) and either cannot be used at all or take too long to converge to a solution (e.g., hours of learning). In this paper, we introduce a novel learning algorithm called “Reset-free Trial-and-Error” (RTE) that (1) breaks the complexity by pre-generating hundreds of possible behaviors with a dynamics simulator of the intact robot, and (2) allows complex robots to quickly recover from damage while completing their tasks and taking the environment into account. We evaluate our algorithm on a simulated wheeled robot, a simulated six-legged robot, and a real six-legged walking robot that are damaged in several ways (e.g., a missing leg, a shortened leg, faulty motor, etc.) and whose objective is to reach a sequence of targets in an arena. Our experiments show that the robots can recover most of their locomotion abilities in an environment with obstacles, and without any human intervention.
Tasks	Legged Robots
Published	2016-10-13
URL	http://arxiv.org/abs/1610.04213v4
PDF	http://arxiv.org/pdf/1610.04213v4.pdf
PWC	https://paperswithcode.com/paper/reset-free-trial-and-error-learning-for-robot
Repo	https://github.com/resibots/chatzilygeroudis_2018_rte
Framework	tf

Learning Global Features for Coreference Resolution


Title	Learning Global Features for Coreference Resolution
Authors	Sam Wiseman, Alexander M. Rush, Stuart M. Shieber
Abstract	There is compelling evidence that coreference prediction would benefit from modeling global information about entity-clusters. Yet, state-of-the-art performance can be achieved with systems treating each mention prediction independently, which we attribute to the inherent difficulty of crafting informative cluster-level features. We instead propose to use recurrent neural networks (RNNs) to learn latent, global representations of entity clusters directly from their mentions. We show that such representations are especially useful for the prediction of pronominal mentions, and can be incorporated into an end-to-end coreference system that outperforms the state of the art without requiring any additional search.
Tasks	Coreference Resolution
Published	2016-04-11
URL	http://arxiv.org/abs/1604.03035v1
PDF	http://arxiv.org/pdf/1604.03035v1.pdf
PWC	https://paperswithcode.com/paper/learning-global-features-for-coreference
Repo	https://github.com/swiseman/nn_coref
Framework	torch

Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging


Title	Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging
Authors	Jiren Jin, Hideki Nakayama
Abstract	Automatic image annotation has been an important research topic in facilitating large scale image management and retrieval. Existing methods focus on learning image-tag correlation or correlation between tags to improve annotation accuracy. However, most of these methods evaluate their performance using top-k retrieval performance, where k is fixed. Although such a setting is convenient for comparing different methods, it is not the natural way that humans annotate images. The number of annotated tags should depend on image contents. Inspired by the recent progress in machine translation and image captioning, we propose a novel Recurrent Image Annotator (RIA) model that formulates the image annotation task as a sequence generation problem so that RIA can natively predict the proper number of tags according to image contents. We evaluate the proposed model on various image annotation datasets. In addition to comparing our model with existing methods using the conventional top-k evaluation measures, we also provide our model as a high quality baseline for the arbitrary length image tagging task. Moreover, the results of our experiments show that the order of tags in the training phase has a great impact on the final annotation performance.
Tasks	Image Captioning, Machine Translation
Published	2016-04-18
URL	http://arxiv.org/abs/1604.05225v3
PDF	http://arxiv.org/pdf/1604.05225v3.pdf
PWC	https://paperswithcode.com/paper/annotation-order-matters-recurrent-image
Repo	https://github.com/jinjiren/recurrent-image-annotator-web-demo
Framework	none

Globally Consistent Multi-People Tracking using Motion Patterns


Title	Globally Consistent Multi-People Tracking using Motion Patterns
Authors	Andrii Maksai, Xinchao Wang, Francois Fleuret, Pascal Fua
Abstract	Many state-of-the-art approaches to people tracking rely on detecting them in each frame independently, grouping detections into short but reliable trajectory segments, and then further grouping them into full trajectories. This grouping typically relies on imposing local smoothness constraints but almost never on enforcing more global constraints on the trajectories. In this paper, we propose an approach to imposing global consistency by first inferring behavioral patterns from the ground truth and then using them to guide the tracking algorithm. When used in conjunction with several state-of-the-art algorithms, this further increases their already good performance. Furthermore, we propose an unsupervised scheme that yields nearly the same improvements without the need for ground truth.
Tasks
Published	2016-12-02
URL	http://arxiv.org/abs/1612.00604v1
PDF	http://arxiv.org/pdf/1612.00604v1.pdf
PWC	https://paperswithcode.com/paper/globally-consistent-multi-people-tracking
Repo	https://github.com/maksay/ptrack_cpp
Framework	none

Top-N Recommendation with Novel Rank Approximation


Title	Top-N Recommendation with Novel Rank Approximation
Authors	Zhao Kang, Qiang Cheng
Abstract	The importance of accurate recommender systems has been widely recognized by academia and industry. However, the recommendation quality is still rather low. Recently, a linear sparse and low-rank representation of the user-item matrix has been applied to produce Top-N recommendations. This approach uses the nuclear norm as a convex relaxation for the rank function and has achieved better recommendation accuracy than the state-of-the-art methods. In the past several years, solving rank minimization problems by leveraging nonconvex relaxations has received increasing attention. Some empirical results demonstrate that it can provide a better approximation to original problems than convex relaxation. In this paper, we propose a novel rank approximation to enhance the performance of Top-N recommendation systems, where the approximation error is controllable. Experimental results on real data show that the proposed rank approximation improves the Top-N recommendation accuracy substantially.
Tasks	Recommendation Systems
Published	2016-02-25
URL	http://arxiv.org/abs/1602.07783v2
PDF	http://arxiv.org/pdf/1602.07783v2.pdf
PWC	https://paperswithcode.com/paper/top-n-recommendation-with-novel-rank
Repo	https://github.com/sckangz/SDM16
Framework	none