Paper Group AWR 64
Is the deconvolution layer the same as a convolutional layer? 3D Bounding Box Estimation Using Deep Learning and Geometry. The Language of Generalization. A Fast Ellipse Detector Using Projective Invariant Pruning. Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning. Deep Joint Rain Detection and Removal from a Single Image. Automatic Text Scoring Using Neural Networks. Variational Bayes In Private Settings (VIPS). Feynman Machine: The Universal Dynamical Systems Computer. Clearing the Skies: A deep network architecture for single-image rain removal. Reset-free Trial-and-Error Learning for Robot Damage Recovery. Learning Global Features for Coreference Resolution. Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging. Globally Consistent Multi-People Tracking using Motion Patterns. Top-N Recommendation with Novel Rank Approximation.
Is the deconvolution layer the same as a convolutional layer?
Title | Is the deconvolution layer the same as a convolutional layer? |
Authors | Wenzhe Shi, Jose Caballero, Lucas Theis, Ferenc Huszar, Andrew Aitken, Christian Ledig, Zehan Wang |
Abstract | In this note, we want to focus on aspects related to two questions most people asked us at CVPR about the network we presented. Firstly, what is the relationship between our proposed layer and the deconvolution layer? And secondly, why are convolutions in low-resolution (LR) space a better choice? These are key questions we tried to answer in the paper, but we were not able to go into as much depth and clarity as we would have liked in the space allowance. To better answer these questions in this note, we first discuss the relationships between the deconvolution layer in the forms of the transposed convolution layer, the sub-pixel convolutional layer and our efficient sub-pixel convolutional layer. We will refer to our efficient sub-pixel convolutional layer as a convolutional layer in LR space to distinguish it from the common sub-pixel convolutional layer. We will then show that for a fixed computational budget and complexity, a network with convolutions exclusively in LR space has more representation power at the same speed than a network that first upsamples the input in high-resolution space. |
Tasks | |
Published | 2016-09-22 |
URL | http://arxiv.org/abs/1609.07009v1 |
PDF | http://arxiv.org/pdf/1609.07009v1.pdf |
PWC | https://paperswithcode.com/paper/is-the-deconvolution-layer-the-same-as-a |
Repo | https://github.com/zsdonghao/SRGAN |
Framework | tf |
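The note's central equivalence is that a strided deconvolution can be replaced by an ordinary convolution in LR space that emits r² times as many channels, followed by a fixed periodic shuffle of those channels into spatial positions. This is easy to check numerically; below is a minimal NumPy sketch of the periodic-shuffle step (the array layout and names are ours, not the authors' code):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) array into (C, H*r, W*r).

    Each group of r*r channels is read as an r x r block of sub-pixels,
    so a convolution producing C*r^2 channels in LR space yields an
    r-times upscaled HR output after this rearrangement.
    """
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

lr_features = np.random.rand(4 * 9, 16, 16)   # C = 4, r = 3
hr = pixel_shuffle(lr_features, r=3)
print(hr.shape)   # (4, 48, 48)
```

Because the shuffle is a fixed permutation with no parameters, all learnable weights stay in LR space, which is where the speed and representation-power argument in the abstract comes from.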
3D Bounding Box Estimation Using Deep Learning and Geometry
Title | 3D Bounding Box Estimation Using Deep Learning and Geometry |
Authors | Arsalan Mousavian, Dragomir Anguelov, John Flynn, Jana Kosecka |
Abstract | We present a method for 3D object detection and pose estimation from a single image. In contrast to current techniques that only regress the 3D orientation of an object, our method first regresses relatively stable 3D object properties using a deep convolutional neural network and then combines these estimates with geometric constraints provided by a 2D object bounding box to produce a complete 3D bounding box. The first network output estimates the 3D object orientation using a novel hybrid discrete-continuous loss, which significantly outperforms the L2 loss. The second output regresses the 3D object dimensions, which have relatively little variance compared to alternatives and can often be predicted for many object types. These estimates, combined with the geometric constraints on translation imposed by the 2D bounding box, enable us to recover a stable and accurate 3D object pose. We evaluate our method on the challenging KITTI object detection benchmark, both on the official metric of 3D orientation estimation and on the accuracy of the obtained 3D bounding boxes. Although conceptually simple, our method outperforms more complex and computationally expensive approaches that leverage semantic segmentation, instance-level segmentation, flat ground priors, and sub-category detection. Our discrete-continuous loss also produces state-of-the-art results for 3D viewpoint estimation on the Pascal 3D+ dataset. |
Tasks | 3D Object Detection, Object Detection, Pose Estimation, Semantic Segmentation, Viewpoint Estimation |
Published | 2016-12-01 |
URL | http://arxiv.org/abs/1612.00496v2 |
PDF | http://arxiv.org/pdf/1612.00496v2.pdf |
PWC | https://paperswithcode.com/paper/3d-bounding-box-estimation-using-deep |
Repo | https://github.com/ZhixinLai/3D-detection-with-monocular-RGB-image |
Framework | none |
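The "hybrid discrete-continuous loss" is the MultiBin formulation: the orientation range is split into bins, and the network predicts a confidence per bin plus a sine/cosine residual relative to each bin's center. A hedged NumPy sketch of the decoding step (the bin layout and two-bin setting are illustrative, not the paper's exact configuration):

```python
import numpy as np

def decode_multibin(conf, sin_res, cos_res):
    """Recover an angle from per-bin confidences and residuals.

    conf, sin_res, cos_res: arrays of shape (n_bins,).
    The angle is the most confident bin's center plus that bin's
    continuous residual, recovered via arctan2.
    """
    n_bins = len(conf)
    centers = 2 * np.pi * np.arange(n_bins) / n_bins   # evenly spaced centers
    i = int(np.argmax(conf))                           # most confident bin
    residual = np.arctan2(sin_res[i], cos_res[i])      # continuous offset
    return (centers[i] + residual) % (2 * np.pi)

theta = decode_multibin(np.array([0.2, 0.8]),
                        np.array([0.5, 0.1]),
                        np.array([0.9, 0.99]))
print(theta)   # angle in [0, 2*pi)
```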
The Language of Generalization
Title | The Language of Generalization |
Authors | Michael Henry Tessler, Noah D. Goodman |
Abstract | Language provides simple ways of communicating generalizable knowledge to each other (e.g., “Birds fly”, “John hikes”, “Fire makes smoke”). Though found in every language and emerging early in development, the language of generalization is philosophically puzzling and has resisted precise formalization. Here, we propose the first formal account of generalizations conveyed with language that makes quantitative predictions about human understanding. We test our model in three diverse domains: generalizations about categories (generic language), events (habitual language), and causes (causal language). The model explains the gradience in human endorsement through the interplay between a simple truth-conditional semantic theory and diverse beliefs about properties, formalized in a probabilistic model of language understanding. This work opens the door to understanding precisely how abstract knowledge is learned from language. |
Tasks | |
Published | 2016-08-09 |
URL | http://arxiv.org/abs/1608.02926v4 |
PDF | http://arxiv.org/pdf/1608.02926v4.pdf |
PWC | https://paperswithcode.com/paper/the-language-of-generalization |
Repo | https://github.com/mhtess/genlang-paper |
Framework | none |
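The "simple truth-conditional semantic theory" is an uncertain-threshold account: a generic like "Ks have F" is true when the prevalence of F among Ks exceeds a threshold θ, and θ itself is uncertain. A toy NumPy sketch of how graded endorsement can fall out of that (the Beta priors over prevalence are invented for illustration, not the paper's elicited priors):

```python
import numpy as np

rng = np.random.default_rng(0)

def endorsement(prevalence_samples, n_theta=2000):
    """P(generic is true) under an uncertain-threshold semantics.

    Truth condition: prevalence > theta. Averaging the indicator over a
    uniform prior on theta and over beliefs about prevalence yields a
    graded endorsement value in [0, 1].
    """
    theta = rng.uniform(0.0, 1.0, n_theta)
    return float((prevalence_samples[:, None] > theta[None, :]).mean())

# Illustrative beliefs about prevalence for two properties:
flies = rng.beta(9, 2, 5000)   # most birds fly: high, concentrated prevalence
rare = rng.beta(2, 6, 5000)    # a property few category members have
print(endorsement(flies), endorsement(rare))   # high vs. low endorsement
```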
A Fast Ellipse Detector Using Projective Invariant Pruning
Title | A Fast Ellipse Detector Using Projective Invariant Pruning |
Authors | Qi Jia, Xin Fan, Zhongxuan Luo, Lianbo Song, Tie Qiu |
Abstract | Detecting elliptical objects in an image is a central task in robot navigation and industrial diagnosis, where detection time is always a critical issue. Existing methods are hardly applicable to such real-time scenarios with limited hardware resources due to the huge number of fragment candidates (edges or arcs) for fitting ellipse equations. In this paper, we present a fast algorithm that detects ellipses with high accuracy. The algorithm leverages a newly developed projective invariant to significantly prune the undesired candidates and to pick out elliptical ones. The invariant is able to reflect the intrinsic geometry of a planar curve, giving the value of -1 on any three collinear points and +1 for any six points on an ellipse. Thus, we apply the pruning and picking by simply comparing these binary values. Moreover, the calculation of the invariant only involves the determinant of a 3×3 matrix. Extensive experiments on three challenging data sets with 650 images demonstrate that our detector runs 20%-50% faster than the state-of-the-art algorithms with comparable or higher precision. |
Tasks | Robot Navigation |
Published | 2016-08-26 |
URL | http://arxiv.org/abs/1608.07470v1 |
PDF | http://arxiv.org/pdf/1608.07470v1.pdf |
PWC | https://paperswithcode.com/paper/a-fast-ellipse-detector-using-projective |
Repo | https://github.com/TomsonBoylett/Real-Time-Ellipse-Detection |
Framework | none |
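The pruning hinges on determinants of 3×3 matrices of homogeneous point coordinates: three points are collinear exactly when that determinant vanishes. A minimal sketch of this basic ingredient (the paper's full six-point characteristic number combines several such determinants, which we do not reproduce here):

```python
import numpy as np

def collinear(p1, p2, p3, tol=1e-9):
    """Three 2D points are collinear iff the determinant of their
    homogeneous-coordinate matrix vanishes."""
    m = np.array([[p1[0], p1[1], 1.0],
                  [p2[0], p2[1], 1.0],
                  [p3[0], p3[1], 1.0]])
    return abs(np.linalg.det(m)) < tol

print(collinear((0, 0), (1, 1), (2, 2)))  # True: all on the line y = x
print(collinear((0, 0), (1, 1), (2, 3)))  # False
```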
Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning
Title | Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning |
Authors | William Lotter, Gabriel Kreiman, David Cox |
Abstract | While great strides have been made in using deep learning algorithms to solve supervised learning tasks, the problem of unsupervised learning - leveraging unlabeled examples to learn about the structure of a domain - remains a difficult unsolved challenge. Here, we explore prediction of future frames in a video sequence as an unsupervised learning rule for learning about the structure of the visual world. We describe a predictive neural network (“PredNet”) architecture that is inspired by the concept of “predictive coding” from the neuroscience literature. These networks learn to predict future frames in a video sequence, with each layer in the network making local predictions and only forwarding deviations from those predictions to subsequent network layers. We show that these networks are able to robustly learn to predict the movement of synthetic (rendered) objects, and that in doing so, the networks learn internal representations that are useful for decoding latent object parameters (e.g. pose) that support object recognition with fewer training views. We also show that these networks can scale to complex natural image streams (car-mounted camera videos), capturing key aspects of both egocentric movement and the movement of objects in the visual scene, and the representation learned in this setting is useful for estimating the steering angle. Altogether, these results suggest that prediction represents a powerful framework for unsupervised learning, allowing for implicit learning of object and scene structure. |
Tasks | Object Recognition, Video Prediction |
Published | 2016-05-25 |
URL | http://arxiv.org/abs/1605.08104v5 |
PDF | http://arxiv.org/pdf/1605.08104v5.pdf |
PWC | https://paperswithcode.com/paper/deep-predictive-coding-networks-for-video |
Repo | https://github.com/kunimasa-kawasaki/keras-prednet |
Framework | none |
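The key architectural element is the error unit: each layer predicts its input, and only the rectified positive and negative deviations are forwarded upward. A minimal NumPy sketch of that unit (the convolutional LSTM that produces the prediction, and the layer stacking, are omitted):

```python
import numpy as np

def prediction_error(actual, predicted):
    """PredNet-style error unit: concatenate ReLU(actual - predicted)
    and ReLU(predicted - actual) along the channel axis, so the layer
    above sees only deviations from the prediction."""
    relu = lambda z: np.maximum(z, 0.0)
    return np.concatenate([relu(actual - predicted),
                           relu(predicted - actual)], axis=0)

frame = np.random.rand(3, 8, 8)   # (channels, H, W)
pred = np.random.rand(3, 8, 8)
err = prediction_error(frame, pred)
print(err.shape)                  # (6, 8, 8): positive and negative errors
```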
Deep Joint Rain Detection and Removal from a Single Image
Title | Deep Joint Rain Detection and Removal from a Single Image |
Authors | Wenhan Yang, Robby T. Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, Shuicheng Yan |
Abstract | In this paper, we address the rain removal problem from a single image, even in the presence of heavy rain and rain streak accumulation. Our core ideas lie in the new rain image models and a novel deep learning architecture. We first modify an existing model comprising a rain streak layer and a background layer, by adding a binary map that locates rain streak regions. Second, we create a new model consisting of a component representing rain streak accumulation (where individual streaks cannot be seen, and thus are visually similar to mist or fog), and another component representing various shapes and directions of overlapping rain streaks, which usually appear in heavy rain. Based on the first model, we develop a multi-task deep learning architecture that learns the binary rain streak map, the appearance of rain streaks, and the clean background, which is our ultimate output. The additional binary map is critically beneficial, since its loss function can provide additional strong information to the network. To handle rain streak accumulation (again, a phenomenon visually similar to mist or fog) and various shapes and directions of overlapping rain streaks, we propose a recurrent rain detection and removal network that removes rain streaks and clears up the rain accumulation iteratively and progressively. In each recurrence of our method, a new contextualized dilated network is developed to exploit regional contextual information and to output better representations for rain detection. The evaluation on real images, particularly on heavy rain, shows the effectiveness of our novel models and architecture, outperforming the state-of-the-art methods significantly. Our code and data sets will be made publicly available. |
Tasks | Rain Removal |
Published | 2016-09-25 |
URL | http://arxiv.org/abs/1609.07769v3 |
PDF | http://arxiv.org/pdf/1609.07769v3.pdf |
PWC | https://paperswithcode.com/paper/deep-joint-rain-detection-and-removal-from-a |
Repo | https://github.com/ZhangXinNan/RainDetectionAndRemoval |
Framework | none |
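The first rain model augments the additive decomposition O = B + S (observed image = background + streak layer) with a binary map marking streak pixels, which gives the network an extra, strongly supervised target. A toy NumPy sketch of generating such a training triple (the streak synthesis is a crude stand-in, not the paper's rendering pipeline):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_training_triple(background, streak_density=0.02):
    """Return (rainy, streak_layer, binary_map) following O = B + S."""
    h, w = background.shape
    streaks = np.zeros((h, w))
    mask = rng.random((h, w)) < streak_density
    streaks[mask] = rng.uniform(0.3, 0.8, mask.sum())  # streak intensities
    rainy = np.clip(background + streaks, 0.0, 1.0)
    binary_map = (streaks > 0).astype(np.float32)      # extra supervision target
    return rainy, streaks, binary_map

bg = rng.random((64, 64))
rainy, s, m = make_training_triple(bg)
print(rainy.shape, m.mean())   # fraction of pixels labeled as streak
```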
Automatic Text Scoring Using Neural Networks
Title | Automatic Text Scoring Using Neural Networks |
Authors | Dimitrios Alikaniotis, Helen Yannakoudakis, Marek Rei |
Abstract | Automated Text Scoring (ATS) provides a cost-effective and consistent alternative to human marking. However, in order to achieve good performance, the predictive features of the system need to be manually engineered by human experts. We introduce a model that forms word representations by learning the extent to which specific words contribute to the text’s score. Using Long Short-Term Memory (LSTM) networks to represent the meaning of texts, we demonstrate that a fully automated framework is able to achieve excellent results over similar approaches. In an attempt to make our results more interpretable, and inspired by recent advances in visualizing neural networks, we introduce a novel method for identifying the regions of the text that the model has found most discriminative. |
Tasks | |
Published | 2016-06-14 |
URL | http://arxiv.org/abs/1606.04289v2 |
PDF | http://arxiv.org/pdf/1606.04289v2.pdf |
PWC | https://paperswithcode.com/paper/automatic-text-scoring-using-neural-networks |
Repo | https://github.com/mankadronit/Automated-Essay--Scoring |
Framework | tf |
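Concretely, the scoring model reads the token sequence with an LSTM and regresses a scalar score. A minimal PyTorch sketch of that reading (the dimensions and the use of the final hidden state are our assumptions, not the paper's exact setup, which also learns score-specific word embeddings):

```python
import torch
import torch.nn as nn

class EssayScorer(nn.Module):
    """Toy LSTM regressor: token ids -> scalar score."""
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        x = self.embed(tokens)
        _, (h, _) = self.lstm(x)               # h: (1, batch, hidden)
        return self.score(h[-1]).squeeze(-1)   # (batch,)

model = EssayScorer()
scores = model(torch.randint(0, 10000, (2, 50)))
print(scores.shape)   # torch.Size([2])
```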
Variational Bayes In Private Settings (VIPS)
Title | Variational Bayes In Private Settings (VIPS) |
Authors | Mijung Park, James Foulds, Kamalika Chaudhuri, Max Welling |
Abstract | Many applications of Bayesian data analysis involve sensitive information, motivating methods which ensure that privacy is protected. We introduce a general privacy-preserving framework for Variational Bayes (VB), a widely used optimization-based Bayesian inference method. Our framework respects differential privacy, the gold-standard privacy criterion, and encompasses a large class of probabilistic models, called the Conjugate Exponential (CE) family. We observe that we can straightforwardly privatise VB’s approximate posterior distributions for models in the CE family, by perturbing the expected sufficient statistics of the complete-data likelihood. For a broadly-used class of non-CE models, those with binomial likelihoods, we show how to bring such models into the CE family, such that inferences in the modified model resemble the private variational Bayes algorithm as closely as possible, using the Polya-Gamma data augmentation scheme. The iterative nature of variational Bayes presents a further challenge since iterations increase the amount of noise needed. We overcome this by combining: (1) an improved composition method for differential privacy, called the moments accountant, which provides a tight bound on the privacy cost of multiple VB iterations and thus significantly decreases the amount of additive noise; and (2) the privacy amplification effect of subsampling mini-batches from large-scale data in stochastic learning. We empirically demonstrate the effectiveness of our method in CE and non-CE models including latent Dirichlet allocation, Bayesian logistic regression, and sigmoid belief networks, evaluated on real-world datasets. |
Tasks | Bayesian Inference, Data Augmentation |
Published | 2016-11-01 |
URL | http://arxiv.org/abs/1611.00340v5 |
PDF | http://arxiv.org/pdf/1611.00340v5.pdf |
PWC | https://paperswithcode.com/paper/variational-bayes-in-private-settings-vips |
Repo | https://github.com/mijungi/vips_code |
Framework | none |
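The basic privatisation step is the Gaussian mechanism applied to the expected sufficient statistics before each VB update; the moments accountant then tracks the total privacy cost across iterations. A hedged NumPy sketch of the single-step perturbation (the noise multiplier and sensitivity are placeholders; the accountant bookkeeping is not shown):

```python
import numpy as np

rng = np.random.default_rng(2)

def privatise_suff_stats(stats, sensitivity, sigma):
    """Gaussian-mechanism perturbation of expected sufficient statistics.

    sigma is the noise multiplier chosen by the privacy accountant for a
    target (epsilon, delta); the noise std is sigma * sensitivity.
    """
    noise = rng.normal(0.0, sigma * sensitivity, size=np.shape(stats))
    return np.asarray(stats) + noise

suff = np.array([120.0, 34.5, 8.2])   # e.g. summed per-record statistics
print(privatise_suff_stats(suff, sensitivity=1.0, sigma=1.2))
```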
Feynman Machine: The Universal Dynamical Systems Computer
Title | Feynman Machine: The Universal Dynamical Systems Computer |
Authors | Eric Laukien, Richard Crowder, Fergal Byrne |
Abstract | Efforts at understanding the computational processes in the brain have met with limited success, despite their importance and potential uses in building intelligent machines. We propose a simple new model which draws on recent findings in Neuroscience and the Applied Mathematics of interacting Dynamical Systems. The Feynman Machine is a Universal Computer for Dynamical Systems, analogous to the Turing Machine for symbolic computing, but with several important differences. We demonstrate that networks and hierarchies of simple interacting Dynamical Systems, each adaptively learning to forecast its evolution, are capable of automatically building sensorimotor models of the external and internal world. We identify such networks in mammalian neocortex, and show how existing theories of cortical computation combine with our model to explain the power and flexibility of mammalian intelligence. These findings lead directly to new architectures for machine intelligence. A suite of software implementations has been built based on these principles, and applied to a number of spatiotemporal learning tasks. |
Tasks | |
Published | 2016-09-13 |
URL | http://arxiv.org/abs/1609.03971v1 |
PDF | http://arxiv.org/pdf/1609.03971v1.pdf |
PWC | https://paperswithcode.com/paper/feynman-machine-the-universal-dynamical |
Repo | https://github.com/ogmacorp/OgmaNeo |
Framework | none |
Clearing the Skies: A deep network architecture for single-image rain removal
Title | Clearing the Skies: A deep network architecture for single-image rain removal |
Authors | Xueyang Fu, Jiabin Huang, Xinghao Ding, Yinghao Liao, John Paisley |
Abstract | We introduce a deep network architecture called DerainNet for removing rain streaks from an image. Based on the deep convolutional neural network (CNN), we directly learn the mapping relationship between rainy and clean image detail layers from data. Because we do not possess the ground truth corresponding to real-world rainy images, we synthesize images with rain for training. In contrast to other common strategies that increase depth or breadth of the network, we use image processing domain knowledge to modify the objective function and improve deraining with a modestly-sized CNN. Specifically, we train our DerainNet on the detail (high-pass) layer rather than in the image domain. Though DerainNet is trained on synthetic data, we find that the learned network translates very effectively to real-world images for testing. Moreover, we augment the CNN framework with image enhancement to improve the visual results. Compared with state-of-the-art single image de-raining methods, our method has improved rain removal and much faster computation time after network training. |
Tasks | Image Enhancement, Rain Removal |
Published | 2016-09-07 |
URL | http://arxiv.org/abs/1609.02087v2 |
PDF | http://arxiv.org/pdf/1609.02087v2.pdf |
PWC | https://paperswithcode.com/paper/clearing-the-skies-a-deep-network |
Repo | https://github.com/jinnovation/rainy-image-dataset |
Framework | tf |
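Training on the detail layer means decomposing each image into a smooth base layer and a high-frequency detail layer, and asking the CNN only to map rainy detail to clean detail. A minimal sketch of that decomposition using a Gaussian low-pass as a stand-in (the paper uses guided filtering; this substitution is ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def split_layers(image, sigma=3.0):
    """Decompose an image into base (low-pass) + detail (high-pass).

    The CNN is trained on the detail layer; the base layer is added back
    to the de-rained detail to form the final output.
    """
    base = gaussian_filter(image, sigma=sigma)
    detail = image - base
    return base, detail

img = np.random.rand(64, 64)
base, detail = split_layers(img)
print(np.allclose(base + detail, img))   # True: the decomposition is lossless
```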
Reset-free Trial-and-Error Learning for Robot Damage Recovery
Title | Reset-free Trial-and-Error Learning for Robot Damage Recovery |
Authors | Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Jean-Baptiste Mouret |
Abstract | The high probability of hardware failures prevents many advanced robots (e.g., legged robots) from being confidently deployed in real-world situations (e.g., post-disaster rescue). Instead of attempting to diagnose the failures, robots could adapt by trial-and-error in order to be able to complete their tasks. In this situation, damage recovery can be seen as a Reinforcement Learning (RL) problem. However, the best RL algorithms for robotics require the robot and the environment to be reset to an initial state after each episode, that is, the robot is not learning autonomously. In addition, most of the RL methods for robotics do not scale well with complex robots (e.g., walking robots) and either cannot be used at all or take too long to converge to a solution (e.g., hours of learning). In this paper, we introduce a novel learning algorithm called “Reset-free Trial-and-Error” (RTE) that (1) breaks the complexity by pre-generating hundreds of possible behaviors with a dynamics simulator of the intact robot, and (2) allows complex robots to quickly recover from damage while completing their tasks and taking the environment into account. We evaluate our algorithm on a simulated wheeled robot, a simulated six-legged robot, and a real six-legged walking robot that are damaged in several ways (e.g., a missing leg, a shortened leg, faulty motor, etc.) and whose objective is to reach a sequence of targets in an arena. Our experiments show that the robots can recover most of their locomotion abilities in an environment with obstacles, and without any human intervention. |
Tasks | Legged Robots |
Published | 2016-10-13 |
URL | http://arxiv.org/abs/1610.04213v4 |
PDF | http://arxiv.org/pdf/1610.04213v4.pdf |
PWC | https://paperswithcode.com/paper/reset-free-trial-and-error-learning-for-robot |
Repo | https://github.com/resibots/chatzilygeroudis_2018_rte |
Framework | tf |
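At execution time, RTE repeatedly picks from the pre-generated repertoire the behavior whose predicted outcome, corrected by real-world observations, best advances the robot toward its target. A stripped-down sketch of that loop (the paper corrects simulator predictions with Gaussian processes; a plain additive correction stands in here):

```python
import numpy as np

rng = np.random.default_rng(3)

# Pre-generated repertoire: behavior id -> simulator-predicted 2D displacement
# of the intact robot (computed offline in simulation).
repertoire = {i: rng.uniform(-1.0, 1.0, 2) for i in range(100)}
correction = {i: np.zeros(2) for i in repertoire}   # learned from real trials

def choose_behavior(position, target):
    """Pick the behavior minimising the predicted distance to the target."""
    def predicted_gap(i):
        outcome = repertoire[i] + correction[i]     # corrected prediction
        return np.linalg.norm(position + outcome - target)
    return min(repertoire, key=predicted_gap)

def update(i, observed):
    """After a real trial on the (possibly damaged) robot, shift the
    prediction toward the observed displacement."""
    predicted = repertoire[i] + correction[i]
    correction[i] += 0.5 * (observed - predicted)

best = choose_behavior(np.zeros(2), np.array([2.0, 1.0]))
update(best, observed=np.array([0.4, 0.1]))         # damaged robot fell short
print(best, repertoire[best] + correction[best])
```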
Learning Global Features for Coreference Resolution
Title | Learning Global Features for Coreference Resolution |
Authors | Sam Wiseman, Alexander M. Rush, Stuart M. Shieber |
Abstract | There is compelling evidence that coreference prediction would benefit from modeling global information about entity-clusters. Yet, state-of-the-art performance can be achieved with systems treating each mention prediction independently, which we attribute to the inherent difficulty of crafting informative cluster-level features. We instead propose to use recurrent neural networks (RNNs) to learn latent, global representations of entity clusters directly from their mentions. We show that such representations are especially useful for the prediction of pronominal mentions, and can be incorporated into an end-to-end coreference system that outperforms the state of the art without requiring any additional search. |
Tasks | Coreference Resolution |
Published | 2016-04-11 |
URL | http://arxiv.org/abs/1604.03035v1 |
PDF | http://arxiv.org/pdf/1604.03035v1.pdf |
PWC | https://paperswithcode.com/paper/learning-global-features-for-coreference |
Repo | https://github.com/swiseman/nn_coref |
Framework | torch |
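The cluster representation is a recurrent state folded over the embeddings of the mentions assigned to a cluster so far. A minimal PyTorch sketch of that idea (the dimensions and the GRU cell are our choices, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class ClusterState(nn.Module):
    """Running RNN summary of an entity cluster's mentions."""
    def __init__(self, mention_dim=64, state_dim=64):
        super().__init__()
        self.cell = nn.GRUCell(mention_dim, state_dim)
        self.state_dim = state_dim

    def forward(self, mentions):               # (n_mentions, mention_dim)
        h = torch.zeros(1, self.state_dim)
        for m in mentions:
            h = self.cell(m.unsqueeze(0), h)   # fold each mention in
        return h                               # global cluster representation

cluster = ClusterState()
print(cluster(torch.randn(5, 64)).shape)       # torch.Size([1, 64])
```

The payoff named in the abstract is that this state carries cluster-level information (e.g., for resolving pronouns) that independent mention-pair features cannot.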
Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging
Title | Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging |
Authors | Jiren Jin, Hideki Nakayama |
Abstract | Automatic image annotation has been an important research topic in facilitating large-scale image management and retrieval. Existing methods focus on learning image-tag correlation or correlation between tags to improve annotation accuracy. However, most of these methods evaluate their performance using top-k retrieval performance, where k is fixed. Although such a setting gives convenience for comparing different methods, it is not the natural way that humans annotate images: the number of annotated tags should depend on image content. Inspired by the recent progress in machine translation and image captioning, we propose a novel Recurrent Image Annotator (RIA) model that formulates the image annotation task as a sequence generation problem, so that RIA can naturally predict the proper number of tags according to image content. We evaluate the proposed model on various image annotation datasets. In addition to comparing our model with existing methods using the conventional top-k evaluation measures, we also provide our model as a high-quality baseline for the arbitrary-length image tagging task. Moreover, the results of our experiments show that the order of tags in the training phase has a great impact on the final annotation performance. |
Tasks | Image Captioning, Machine Translation |
Published | 2016-04-18 |
URL | http://arxiv.org/abs/1604.05225v3 |
PDF | http://arxiv.org/pdf/1604.05225v3.pdf |
PWC | https://paperswithcode.com/paper/annotation-order-matters-recurrent-image |
Repo | https://github.com/jinjiren/recurrent-image-annotator-web-demo |
Framework | none |
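Framing annotation as sequence generation means decoding tags one at a time, conditioned on the image, until a stop symbol is produced, so the number of tags adapts to the image. A toy greedy-decoding sketch (the scoring function is a random stand-in for the RNN decoder; the vocabulary and stop token are placeholders):

```python
import numpy as np

rng = np.random.default_rng(4)
VOCAB = ["sky", "beach", "dog", "person", "tree", "<stop>"]

def score_next_tag(image_feat, emitted):
    """Stand-in for the RNN decoder: returns a score per vocabulary tag.
    A real model conditions on image features and previously emitted tags."""
    scores = rng.random(len(VOCAB))
    scores[[VOCAB.index(t) for t in emitted]] = -np.inf  # forbid repeats
    return scores

def annotate(image_feat, max_tags=10):
    tags = []
    for _ in range(max_tags):
        nxt = VOCAB[int(np.argmax(score_next_tag(image_feat, tags)))]
        if nxt == "<stop>":                  # tag count decided by the model
            break
        tags.append(nxt)
    return tags

print(annotate(np.zeros(512)))
```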
Globally Consistent Multi-People Tracking using Motion Patterns
Title | Globally Consistent Multi-People Tracking using Motion Patterns |
Authors | Andrii Maksai, Xinchao Wang, Francois Fleuret, Pascal Fua |
Abstract | Many state-of-the-art approaches to people tracking rely on detecting them in each frame independently, grouping detections into short but reliable trajectory segments, and then further grouping them into full trajectories. This grouping typically relies on imposing local smoothness constraints but almost never on enforcing more global constraints on the trajectories. In this paper, we propose an approach to imposing global consistency by first inferring behavioral patterns from the ground truth and then using them to guide the tracking algorithm. When used in conjunction with several state-of-the-art algorithms, this further increases their already good performance. Furthermore, we propose an unsupervised scheme that yields nearly the same improvements without the need for ground truth. |
Tasks | |
Published | 2016-12-02 |
URL | http://arxiv.org/abs/1612.00604v1 |
PDF | http://arxiv.org/pdf/1612.00604v1.pdf |
PWC | https://paperswithcode.com/paper/globally-consistent-multi-people-tracking |
Repo | https://github.com/maksay/ptrack_cpp |
Framework | none |
Top-N Recommendation with Novel Rank Approximation
Title | Top-N Recommendation with Novel Rank Approximation |
Authors | Zhao Kang, Qiang Cheng |
Abstract | The importance of accurate recommender systems has been widely recognized by academia and industry. However, the recommendation quality is still rather low. Recently, a linear sparse and low-rank representation of the user-item matrix has been applied to produce Top-N recommendations. This approach uses the nuclear norm as a convex relaxation for the rank function and has achieved better recommendation accuracy than the state-of-the-art methods. In the past several years, solving rank minimization problems by leveraging nonconvex relaxations has received increasing attention. Some empirical results demonstrate that it can provide a better approximation to original problems than convex relaxation. In this paper, we propose a novel rank approximation to enhance the performance of Top-N recommendation systems, where the approximation error is controllable. Experimental results on real data show that the proposed rank approximation improves the Top-N recommendation accuracy substantially. |
Tasks | Recommendation Systems |
Published | 2016-02-25 |
URL | http://arxiv.org/abs/1602.07783v2 |
PDF | http://arxiv.org/pdf/1602.07783v2.pdf |
PWC | https://paperswithcode.com/paper/top-n-recommendation-with-novel-rank |
Repo | https://github.com/sckangz/SDM16 |
Framework | none |
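The motivation for a nonconvex surrogate is that the nuclear norm Σσᵢ penalises large singular values as heavily as small ones, whereas the rank function counts them all equally. A small NumPy comparison on a synthetic spectrum (the surrogate σ/(σ+γ) is one common nonconvex choice shown for illustration; the paper proposes its own approximation with controllable error):

```python
import numpy as np

sigma = np.array([10.0, 5.0, 1.0, 0.1, 0.01])   # singular values

rank = np.sum(sigma > 1e-6)                     # rank function: counts nonzeros
nuclear = np.sum(sigma)                         # convex relaxation: 16.11
gamma = 0.1
surrogate = np.sum(sigma / (sigma + gamma))     # saturates near 1 per value

print(rank, nuclear, surrogate)                 # 5, 16.11, ~3.5
```

Unlike the nuclear norm, the surrogate is nearly insensitive to the magnitude of the dominant singular values, so minimising it behaves much more like minimising rank.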