January 25, 2020

3196 words 16 mins read

Paper Group NAWR 17

Realtime and Accurate 3D Eye Gaze Capture with DCNN-based Iris and Pupil Segmentation. Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters. ACIQ: Analytical Clipping for Integer Quantization of neural networks. Bayesian Hierarchical Dynamic Model for Human Action Recognition. Learning Generalizable Device Placement Algorit …

Realtime and Accurate 3D Eye Gaze Capture with DCNN-based Iris and Pupil Segmentation

Title Realtime and Accurate 3D Eye Gaze Capture with DCNN-based Iris and Pupil Segmentation
Authors Zhiyong Wang, Jinxiang Chai, Shihong Xia
Abstract This paper presents a realtime and accurate method for 3D eye gaze tracking with a monocular RGB camera. Our key idea is to train a deep convolutional neural network (DCNN) that automatically extracts the iris and pupil pixels of each eye from input images. To achieve this goal, we combine the power of U-Net and SqueezeNet to train an efficient convolutional neural network for pixel classification. In addition, we track the 3D eye gaze state in the Maximum A Posteriori (MAP) framework, which sequentially searches for the most likely state of the 3D eye gaze at each frame. When eye blinking occurs, the eye gaze tracker may produce inaccurate results. We further extend the convolutional neural network for eye-closure detection in order to improve the robustness and accuracy of the eye gaze tracker. Our system runs in realtime on desktop PCs and smartphones. We have evaluated our system on live videos and Internet videos, and our results demonstrate that the system is robust and accurate for various genders, races, lighting conditions, poses, shapes and facial expressions. A comparison against Wang et al. [3] shows that our method advances the state of the art in 3D eye tracking using a single RGB camera.
Tasks Eye Tracking
Published 2019-08-28
URL https://ieeexplore.ieee.org/document/8818661
PDF https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8818661
PWC https://paperswithcode.com/paper/realtime-and-accurate-3d-eye-gaze-capture
Repo https://github.com/1996scarlet/Laser-Eye
Framework mxnet
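
The segmentation network itself is beyond a short snippet, but one downstream step is easy to illustrate. Below is a minimal sketch (my own illustration, not the authors' pipeline) of estimating the 2D pupil center from a binary segmentation mask via its centroid, the kind of measurement a MAP-style 3D gaze estimator could consume; an empty mask stands in for a detected blink.

```python
import numpy as np

def pupil_center(mask):
    """mask: HxW array of 0/1 pupil pixels. Returns the (x, y) centroid, or None."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:          # no pupil pixels, e.g. the eye is closed
        return None
    return xs.mean(), ys.mean()

mask = np.zeros((8, 8))
mask[3:6, 2:5] = 1            # a toy 3x3 pupil blob
print(pupil_center(mask))     # (3.0, 4.0)
```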

Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters

Title Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters
Authors Alberto Maria Metelli, Amarildo Likmeta, Marcello Restelli
Abstract How does the uncertainty of the value function propagate when performing temporal difference learning? In this paper, we address this question by proposing a Bayesian framework in which we employ approximate posterior distributions to model the uncertainty of the value function and Wasserstein barycenters to propagate it across state-action pairs. Leveraging these tools, we present an algorithm, Wasserstein Q-Learning (WQL), first in the tabular case, and then show how it can be extended to continuous domains. Furthermore, we prove that, under mild assumptions, a slight variation of WQL enjoys desirable theoretical properties in the tabular setting. Finally, we present an experimental campaign to show the effectiveness of WQL on finite problems, compared to several RL algorithms, some of which are specifically designed for exploration, along with some preliminary results on Atari games.
Tasks Atari Games, Q-Learning
Published 2019-12-01
URL http://papers.nips.cc/paper/8685-propagating-uncertainty-in-reinforcement-learning-via-wasserstein-barycenters
PDF http://papers.nips.cc/paper/8685-propagating-uncertainty-in-reinforcement-learning-via-wasserstein-barycenters.pdf
PWC https://paperswithcode.com/paper/propagating-uncertainty-in-reinforcement
Repo https://github.com/albertometelli/wql
Framework none
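
To make the update rule concrete, here is a minimal tabular sketch of the idea described in the abstract, under assumptions of my own (Gaussian posteriors per state-action pair, Thompson-style exploration), not the authors' released code. For one-dimensional Gaussians, the 2-Wasserstein barycenter with weights (1 - lr, lr) interpolates both the means and the standard deviations, which is what the update below does.

```python
import numpy as np

n_states, n_actions = 5, 2
gamma, lr = 0.99, 0.1
mean = np.zeros((n_states, n_actions))   # posterior mean of Q(s, a)
std = np.ones((n_states, n_actions))     # posterior std: the propagated uncertainty
rng = np.random.default_rng(0)

def act(s):
    """Exploration by sampling one Q-value per action from its posterior."""
    return int(np.argmax(rng.normal(mean[s], std[s])))

def update(s, a, r, s_next):
    a_star = int(np.argmax(mean[s_next]))             # greedy action in the next state
    target_mean = r + gamma * mean[s_next, a_star]    # Gaussian TD target distribution
    target_std = gamma * std[s_next, a_star]
    # Wasserstein-2 barycenter of current posterior and target, weights (1 - lr, lr):
    mean[s, a] = (1 - lr) * mean[s, a] + lr * target_mean
    std[s, a] = (1 - lr) * std[s, a] + lr * target_std
```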

ACIQ: Analytical Clipping for Integer Quantization of neural networks

Title ACIQ: Analytical Clipping for Integer Quantization of neural networks
Authors Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry
Abstract We analyze the trade-off between quantization noise and clipping distortion in low precision networks. We identify the statistics of various tensors, and derive exact expressions for the mean-square-error degradation due to clipping. By optimizing these expressions, we show marked improvements over standard quantization schemes that normally avoid clipping. For example, simply by choosing accurate clipping values, an accuracy improvement of more than 40% is obtained when quantizing VGG-16 to 4 bits of precision. Our results have many applications for the quantization of neural networks at both training and inference time.
Tasks Quantization
Published 2019-05-01
URL https://openreview.net/forum?id=B1x33sC9KQ
PDF https://openreview.net/pdf?id=B1x33sC9KQ
PWC https://paperswithcode.com/paper/aciq-analytical-clipping-for-integer
Repo https://github.com/submission2019/AnalyticalScaleForIntegerQuantization
Framework pytorch
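
The trade-off the abstract refers to is easy to reproduce numerically. The following is a minimal sketch (not the paper's code; the analytical derivation is the paper's contribution) that clips a tensor to [-alpha, alpha], quantizes it to 4 bits, and reports the mean-square error as alpha varies: a small alpha means heavy clipping distortion, a large alpha means a coarse grid and heavy quantization noise, and some intermediate alpha minimizes the total error.

```python
import numpy as np

def clip_and_quantize(x, alpha, num_bits=4):
    """Symmetric uniform quantization of x after clipping to [-alpha, alpha]."""
    x_clipped = np.clip(x, -alpha, alpha)
    scale = alpha / (2 ** (num_bits - 1) - 1)   # 4 bits -> integer levels -7..7
    return np.round(x_clipped / scale) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)                # stand-in for a weight/activation tensor

for alpha in (1.0, 2.0, 3.0, 4.0, 5.0):
    mse = np.mean((x - clip_and_quantize(x, alpha)) ** 2)
    print(f"alpha={alpha:.1f}  mse={mse:.5f}")
```

ACIQ's point is that the minimizing alpha can be derived in closed form from the tensor's statistics instead of being searched for.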

Bayesian Hierarchical Dynamic Model for Human Action Recognition

Title Bayesian Hierarchical Dynamic Model for Human Action Recognition
Authors Rui Zhao, Wanru Xu, Hui Su, Qiang Ji
Abstract Human action recognition remains a challenging task, partially due to the presence of large variations in the execution of actions. To address this issue, we propose a probabilistic model called the Hierarchical Dynamic Model (HDM). Leveraging a Bayesian framework, the model parameters are allowed to vary across different sequences of data, which increases the capacity of the model to adapt to intra-class variations in both the spatial and temporal extent of actions. Meanwhile, the generative learning process allows the model to preserve the distinctive dynamic pattern for each action class. Through Bayesian inference, we are able to quantify the uncertainty of the classification, providing insight during the decision process. Compared to state-of-the-art methods, our method not only achieves competitive recognition performance within individual datasets but also shows better generalization capability across different datasets. Experiments conducted on data with missing values also show the robustness of the proposed method.
Tasks Bayesian Inference, Multimodal Activity Recognition, Skeleton Based Action Recognition, Temporal Action Localization
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Zhao_Bayesian_Hierarchical_Dynamic_Model_for_Human_Action_Recognition_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhao_Bayesian_Hierarchical_Dynamic_Model_for_Human_Action_Recognition_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/bayesian-hierarchical-dynamic-model-for-human
Repo https://github.com/rort1989/HDM
Framework none

Learning Generalizable Device Placement Algorithms for Distributed Machine Learning

Title Learning Generalizable Device Placement Algorithms for Distributed Machine Learning
Authors Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh
Abstract We present Placeto, a reinforcement learning (RL) approach to efficiently find device placements for distributed neural network training. Unlike prior approaches that only find a device placement for a specific computation graph, Placeto can learn generalizable device placement policies that can be applied to any graph. We propose two key ideas in our approach: (1) we represent the policy as performing iterative placement improvements, rather than outputting a placement in one shot; (2) we use graph embeddings to capture relevant information about the structure of the computation graph, without relying on node labels for indexing. These ideas allow Placeto to train efficiently and generalize to unseen graphs. Our experiments show that Placeto requires up to 6.1x fewer training steps to find placements that are on par with or better than the best placements found by prior approaches. Moreover, Placeto is able to learn a generalizable placement policy for any given family of graphs that can be used without any re-training to predict optimized placements for unseen graphs from the same family. This eliminates the huge overhead incurred by the prior RL approaches whose lack of generalizability necessitates re-training from scratch every time a new graph is to be placed.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8653-learning-generalizable-device-placement-algorithms-for-distributed-machine-learning
PDF http://papers.nips.cc/paper/8653-learning-generalizable-device-placement-algorithms-for-distributed-machine-learning.pdf
PWC https://paperswithcode.com/paper/learning-generalizable-device-placement
Repo https://github.com/aravic/generalizable-device-placement
Framework tf

Generative Models for Graph-Based Protein Design

Title Generative Models for Graph-Based Protein Design
Authors John Ingraham, Vikas Garg, Regina Barzilay, Tommi Jaakkola
Abstract Engineered proteins offer the potential to solve many problems in biomedicine, energy, and materials science, but creating designs that succeed is difficult in practice. A significant aspect of this challenge is the complex coupling between protein sequence and 3D structure, with the task of finding a viable design often referred to as the inverse protein folding problem. We develop relational language models for protein sequences that directly condition on a graph specification of the target structure. Our approach efficiently captures the complex dependencies in proteins by focusing on those that are long-range in sequence but local in 3D space. Our framework significantly improves in both speed and robustness over conventional and deep-learning-based methods for structure-based protein sequence design, and takes a step toward rapid and targeted biomolecular design with the aid of deep generative models.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9711-generative-models-for-graph-based-protein-design
PDF http://papers.nips.cc/paper/9711-generative-models-for-graph-based-protein-design.pdf
PWC https://paperswithcode.com/paper/generative-models-for-graph-based-protein
Repo https://github.com/jingraham/neurips19-graph-protein-design
Framework pytorch

An adaptive homeostatic algorithm for the unsupervised learning of visual features

Title An adaptive homeostatic algorithm for the unsupervised learning of visual features
Authors Victor Boutin, Angelo Franciosini, Laurent Perrinet
Abstract The formation of structure in the brain, that is, of the connections between cells within neural populations, is by and large an unsupervised learning process: the emergence of this architecture is mostly self-organized. In the primary visual cortex of mammals, for example, one may observe during development the formation of cells selective to localized, oriented features. This leads to the development of a rough representation of contours of the retinal image in area V1. We modeled these mechanisms using sparse Hebbian learning algorithms. These algorithms alternate a coding step to encode the information with a learning step to find the proper encoder. A major difficulty faced by these algorithms is to deduce a good representation while the encoders are still immature, and to learn good encoders from a non-optimal representation. To address this problem, we propose to introduce a new regulation process between learning and coding, called homeostasis. Our homeostasis is compatible with a neuro-mimetic architecture and allows for the fast emergence of localized filters sensitive to orientation. The key to this algorithm lies in a simple adaptation mechanism based on non-linear functions that reconciles the antagonistic processes that occur at the coding and learning time scales. We tested this unsupervised algorithm with this homeostasis rule for a range of existing unsupervised learning algorithms coupled with different neural coding algorithms. In addition, we propose a simplification of this optimal homeostasis rule by implementing a simple heuristic on the probability of activation of neurons. Compared to the optimal homeostasis rule, we show that this heuristic allows us to implement a more rapid unsupervised learning algorithm while retaining most of its effectiveness. These results demonstrate the potential application of such a strategy in machine learning, and we illustrate this with one result in a convolutional neural network.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=SyMras0cFQ
PDF https://openreview.net/pdf?id=SyMras0cFQ
PWC https://paperswithcode.com/paper/an-adaptive-homeostatic-algorithm-for-the
Repo https://github.com/bicv/SHL_scripts
Framework none
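
As a toy illustration of the heuristic variant mentioned at the end of the abstract, the sketch below (a simplification of mine, not the released SHL scripts) keeps a running estimate of each unit's activation probability and uses a multiplicative gain so that, in a winner-take-all competition with built-in bias, all units end up selected with roughly equal probability.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_steps, eta = 8, 5000, 0.01
target = 1.0 / n_units                    # desired activation probability per unit
gain = np.ones(n_units)                   # homeostatic gains
p_hat = np.full(n_units, target)          # running activation-probability estimates

for _ in range(n_steps):
    match = rng.random(n_units) * np.linspace(1.0, 2.0, n_units)  # biased "correlations"
    winner = np.argmax(gain * match)      # gain modulates the competition
    active = np.zeros(n_units)
    active[winner] = 1.0
    p_hat = (1 - eta) * p_hat + eta * active
    gain *= np.exp(eta * (target - p_hat))   # boost rarely active units, damp busy ones

print(np.round(p_hat, 3))   # roughly uniform despite the built-in bias
```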

Event Cameras, Contrast Maximization and Reward Functions: An Analysis

Title Event Cameras, Contrast Maximization and Reward Functions: An Analysis
Authors Timo Stoffregen, Lindsay Kleeman
Abstract Event cameras asynchronously report timestamped changes in pixel intensity and offer advantages over conventional raster scan cameras in terms of low-latency, low-redundancy sensing and high dynamic range. In recent years, much of the research in event-based vision has focused on performing tasks such as optic flow estimation, moving object segmentation, feature tracking, camera rotation estimation and more, through contrast maximization. In contrast maximization, events are warped along motion trajectories, whose parameters depend on the quantity being estimated, to some time t_ref. The parameters are then scored by some reward function of the accumulated events at t_ref. The versatility of this approach has led to a flurry of research in recent years, but no in-depth study of the reward chosen during optimization has yet been made. In this work we examine the choice of reward used in contrast maximization, propose a classification of different rewards, and show how a reward can be constructed that is more robust to noise and aperture uncertainty. We validate our work experimentally by predicting optical flow and comparing to ground-truth data.
Tasks Event-based vision, Optical Flow Estimation, Semantic Segmentation
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Stoffregen_Event_Cameras_Contrast_Maximization_and_Reward_Functions_An_Analysis_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Stoffregen_Event_Cameras_Contrast_Maximization_and_Reward_Functions_An_Analysis_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/event-cameras-contrast-maximization-and
Repo https://github.com/TimoStoff/events_contrast_maximization
Framework pytorch
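
The core loop of contrast maximization is compact enough to sketch. Below is a toy version (mine, not the authors' code) that warps simulated events to a reference time under a candidate optic-flow hypothesis, accumulates them into an image, and scores the hypothesis with one of the classic rewards the paper analyzes, the variance of the accumulated image; the correct flow yields the sharpest image and hence the highest reward.

```python
import numpy as np

def contrast_reward(xs, ys, ts, flow, t_ref=0.0, shape=(64, 64)):
    """Warp events to t_ref under flow = (vx, vy) px/s and return the image variance."""
    vx, vy = flow
    xw = np.round(xs - (ts - t_ref) * vx).astype(int)
    yw = np.round(ys - (ts - t_ref) * vy).astype(int)
    keep = (xw >= 0) & (xw < shape[1]) & (yw >= 0) & (yw < shape[0])
    img = np.zeros(shape)
    np.add.at(img, (yw[keep], xw[keep]), 1.0)   # image of warped events
    return img.var()

rng = np.random.default_rng(0)
ts = rng.uniform(0.0, 1.0, 5000)
xs = 32 + 3.0 * ts + rng.normal(0, 0.5, ts.size)   # a point moving at 3 px/s in x
ys = 32 + rng.normal(0, 0.5, ts.size)

flows = [(vx, vy) for vx in range(-5, 6) for vy in range(-5, 6)]
best = max(flows, key=lambda f: contrast_reward(xs, ys, ts, f))
print("estimated flow:", best)    # close to (3, 0)
```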

Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

Title Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders
Authors Edgar Schonfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata
Abstract Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space. As labeled images are expensive, one direction is to augment the dataset by generating either images or image features. However, the former misses fine-grained details and the latter requires learning a mapping associated with class embeddings. In this work, we take feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by modality-specific aligned variational autoencoders. This leaves us with the required discriminative information about the image and classes in the latent features, on which we train a softmax classifier. The key to our approach is that we align the distributions learned from images and from side-information to construct latent features that contain the essential multi-modal information associated with unseen classes. We evaluate our learned latent features on several benchmark datasets, i.e. CUB, SUN, AWA1 and AWA2, and establish a new state of the art on generalized zero-shot as well as on few-shot learning. Moreover, our results on ImageNet with various zero-shot splits show that our latent features generalize well in large-scale settings.
Tasks Few-Shot Learning, Zero-Shot Learning
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Schonfeld_Generalized_Zero-_and_Few-Shot_Learning_via_Aligned_Variational_Autoencoders_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Schonfeld_Generalized_Zero-_and_Few-Shot_Learning_via_Aligned_Variational_Autoencoders_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/generalized-zero-and-few-shot-learning-via-1
Repo https://github.com/edgarschnfld/CADA-VAE-PyTorch
Framework pytorch
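
A minimal sketch of the alignment objective suggested by the abstract follows (the exact loss form is my assumption, not a quote of the released code): two modality-specific encoders produce diagonal-Gaussian latents, and a distribution-alignment term, here the closed-form 2-Wasserstein distance between the two Gaussians, pulls the image-feature latent and the class-embedding latent of the same class toward each other.

```python
import numpy as np

def gaussian_w2(mu1, sigma1, mu2, sigma2):
    """2-Wasserstein distance between diagonal Gaussians N(mu1, diag(sigma1^2)), N(mu2, diag(sigma2^2))."""
    return np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((sigma1 - sigma2) ** 2))

# Toy latents from an image-feature encoder and a class-embedding encoder:
mu_img, sig_img = np.array([0.2, -0.1]), np.array([0.5, 0.4])
mu_cls, sig_cls = np.array([0.3, 0.0]), np.array([0.6, 0.3])
alignment_loss = gaussian_w2(mu_img, sig_img, mu_cls, sig_cls)
print(alignment_loss)   # minimized when the two latent distributions coincide
```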

Learning RoI Transformer for Oriented Object Detection in Aerial Images

Title Learning RoI Transformer for Oriented Object Detection in Aerial Images
Authors Jian Ding, Nan Xue, Yang Long, Gui-Song Xia, Qikai Lu
Abstract Object detection in aerial images is an active yet challenging task in computer vision because of the bird’s-eye view perspective, the highly complex backgrounds, and the variant appearances of objects. Especially when detecting densely packed objects in aerial images, methods relying on horizontal proposals for common object detection often introduce mismatches between the Regions of Interest (RoIs) and objects. This leads to the common misalignment between the final object classification confidence and localization accuracy. In this paper, we propose a RoI Transformer to address these problems. The core idea of the RoI Transformer is to apply spatial transformations on RoIs and learn the transformation parameters under the supervision of oriented bounding box (OBB) annotations. The RoI Transformer is lightweight and can be easily embedded into detectors for oriented object detection. Simply applying the RoI Transformer to Light-Head R-CNN achieves state-of-the-art performance on two common and challenging aerial datasets, i.e., DOTA and HRSC2016, with a negligible reduction in detection speed. Our RoI Transformer exceeds deformable Position-Sensitive RoI pooling when oriented bounding-box annotations are available. Extensive experiments have also validated the flexibility and effectiveness of our RoI Transformer.
Tasks Object Classification, Object Detection, Object Detection In Aerial Images
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Ding_Learning_RoI_Transformer_for_Oriented_Object_Detection_in_Aerial_Images_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Ding_Learning_RoI_Transformer_for_Oriented_Object_Detection_in_Aerial_Images_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/learning-roi-transformer-for-oriented-object
Repo https://github.com/dingjiansw101/RoITransformer_DOTA
Framework mxnet
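
To give a flavor of the supervised transformation step, here is a minimal sketch of decoding an oriented box from a horizontal RoI and a learned offset vector; the parameterization (center shifts scaled by the RoI size, log-scaled width and height, additive angle) is a common convention and an assumption of mine, not taken from the paper's code.

```python
import numpy as np

def decode_oriented(hroi, deltas):
    """hroi: horizontal RoI (cx, cy, w, h); deltas: learned offsets (dx, dy, dw, dh, dtheta)."""
    x, y, w, h = hroi
    dx, dy, dw, dh, dtheta = deltas
    cx = x + dx * w                      # shift the center relative to the RoI size
    cy = y + dy * h
    return cx, cy, w * np.exp(dw), h * np.exp(dh), dtheta   # oriented box (cx, cy, w, h, theta)

print(decode_oriented((50.0, 40.0, 20.0, 10.0), (0.1, -0.05, 0.2, 0.0, np.pi / 8)))
```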

Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks

Title Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
Authors Aaron Voelker, Ivana Kajić, Chris Eliasmith
Abstract We propose a novel memory cell for recurrent neural networks that dynamically maintains information across long windows of time using relatively few resources. The Legendre Memory Unit~(LMU) is mathematically derived to orthogonalize its continuous-time history – doing so by solving $d$ coupled ordinary differential equations~(ODEs), whose phase space linearly maps onto sliding windows of time via the Legendre polynomials up to degree $d - 1$. Backpropagation across LMUs outperforms equivalently-sized LSTMs on a chaotic time-series prediction task, improves memory capacity by two orders of magnitude, and significantly reduces training and inference times. LMUs can efficiently handle temporal dependencies spanning $100\text{,}000$ time-steps, converge rapidly, and use few internal state-variables to learn complex functions spanning long windows of time – exceeding state-of-the-art performance among RNNs on permuted sequential MNIST. These results are due to the network’s disposition to learn scale-invariant features independently of step size. Backpropagation through the ODE solver allows each layer to adapt its internal time-step, enabling the network to learn task-relevant time-scales. We demonstrate that LMU memory cells can be implemented using $m$ recurrently-connected Poisson spiking neurons, $\mathcal{O}( m )$ time and memory, with error scaling as $\mathcal{O}( d / \sqrt{m} )$. We discuss implementations of LMUs on analog and digital neuromorphic hardware.
Tasks Time Series, Time Series Prediction
Published 2019-12-01
URL http://papers.nips.cc/paper/9689-legendre-memory-units-continuous-time-representation-in-recurrent-neural-networks
PDF http://papers.nips.cc/paper/9689-legendre-memory-units-continuous-time-representation-in-recurrent-neural-networks.pdf
PWC https://paperswithcode.com/paper/legendre-memory-units-continuous-time
Repo https://github.com/abr/neurips2019
Framework tf
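
The linear memory at the heart of the LMU is small enough to write down. The sketch below builds the (A, B) matrices of the Legendre delay system as defined in the paper and advances the memory state with a simple Euler step; the Euler discretization is my simplification (the paper uses a more accurate discretization such as zero-order hold).

```python
import numpy as np

def lmu_matrices(order, theta):
    """Continuous-time (A, B) whose state holds Legendre coefficients of a sliding window of length theta."""
    q = np.arange(order)
    r = (2 * q + 1)[:, None] / theta
    i, j = np.meshgrid(q, q, indexing="ij")
    A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1)) * r
    B = ((-1.0) ** q)[:, None] * r
    return A, B

order, theta, dt = 6, 4.0, 0.01
A, B = lmu_matrices(order, theta)
x = np.zeros(order)                          # memory state
for t in range(1000):
    u = np.sin(0.05 * t)                     # toy input signal
    x = x + dt * (A @ x + B.ravel() * u)     # Euler step of x' = A x + B u
print(np.round(x, 3))                        # compressed history of the last theta time units
```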

Poincare Glove: Hyperbolic Word Embeddings

Title Poincare Glove: Hyperbolic Word Embeddings
Authors Alexandru Tifrea*, Gary Becigneul*, Octavian-Eugen Ganea*
Abstract Words are not created equal. In fact, they form an aristocratic graph with a latent hierarchical structure that the next generation of unsupervised learned word embeddings should reveal. In this paper, justified by the notion of delta-hyperbolicity or tree-likeliness of a space, we propose to embed words in a Cartesian product of hyperbolic spaces, which we theoretically connect to the Gaussian word embeddings and their Fisher geometry. This connection allows us to introduce a novel principled hypernymy score for word embeddings. Moreover, we adapt the well-known GloVe algorithm to learn unsupervised word embeddings in this type of Riemannian manifold. We further explain how to solve the analogy task using the Riemannian parallel transport that generalizes vector arithmetic to this new type of geometry. Empirically, based on extensive experiments, we show that our embeddings, trained unsupervised, are the first to simultaneously outperform strong and popular baselines on the tasks of similarity, analogy and hypernymy detection. In particular, for word hypernymy, we obtain a new state of the art in fully unsupervised WBLESS classification accuracy.
Tasks Learning Word Embeddings, Word Embeddings
Published 2019-05-01
URL https://openreview.net/forum?id=Ske5r3AqK7
PDF https://openreview.net/pdf?id=Ske5r3AqK7
PWC https://paperswithcode.com/paper/poincare-glove-hyperbolic-word-embeddings-1
Repo https://github.com/alex-tifrea/poincare_glove
Framework none
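
For readers unfamiliar with the geometry, here is a minimal sketch (illustrative, not the authors' training code) of the distance function on the Poincare ball in which the embeddings live; points near the boundary are exponentially far from the origin, which is what lets hierarchies (general words near the center, specific words near the rim) embed with low distortion.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points u, v strictly inside the unit ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / (denom + eps))

a = np.array([0.10, 0.20])
b = np.array([0.50, -0.30])
c = np.array([0.90, 0.40])          # close to the boundary
print(poincare_distance(a, b), poincare_distance(a, c))
```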

Numeracy-600K: Learning Numeracy for Detecting Exaggerated Information in Market Comments

Title Numeracy-600K: Learning Numeracy for Detecting Exaggerated Information in Market Comments
Authors Chung-Chi Chen, Hen-Hsen Huang, Hiroya Takamura, Hsin-Hsi Chen
Abstract In this paper, we attempt to answer the question of whether neural network models can learn numeracy, which is the ability to predict the magnitude of a numeral at some specific position in a text description. A large benchmark dataset, called Numeracy-600K, is provided for the novel task. We explore several neural network models including CNN, GRU, BiGRU, CRNN, CNN-capsule, GRU-capsule, and BiGRU-capsule in the experiments. The results show that the BiGRU model gets the best micro-averaged F1 score of 80.16%, and the GRU-capsule model gets the best macro-averaged F1 score of 64.71%. Besides discussing the challenges through comprehensive experiments, we also present an important application scenario, i.e., detecting exaggerated information, for the task.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1635/
PDF https://www.aclweb.org/anthology/P19-1635
PWC https://paperswithcode.com/paper/numeracy-600k-learning-numeracy-for-detecting
Repo https://github.com/aistairc/Numeracy-600K
Framework none
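
The task setup can be illustrated with how magnitude labels might be constructed; the bucketing below is my reading of "predict the magnitude of a numeral" and not the released preprocessing.

```python
import math

def magnitude_class(value, n_classes=8):
    """Order-of-magnitude bucket: 0 for values < 10, 1 for < 100, ..., capped at n_classes - 1."""
    if value < 1:
        return 0
    return min(int(math.floor(math.log10(value))), n_classes - 1)

for v in (7, 42, 380, 12_500):
    print(v, magnitude_class(v))   # 0, 1, 2, 4
```

A sequence model (e.g. the BiGRU mentioned above) is then trained to predict this class for a masked numeral from its surrounding market-comment text.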

Unsupervised Graph Association for Person Re-Identification

Title Unsupervised Graph Association for Person Re-Identification
Authors Jinlin Wu, Yang Yang, Hao Liu, Shengcai Liao, Zhen Lei, Stan Z. Li
Abstract In this paper, we propose an unsupervised graph association (UGA) framework to learn the underlying view-invariant representations from video pedestrian tracklets. The core of UGA is mining the underlying cross-view associations and reducing the damage caused by noisy associations. To this end, UGA adopts a two-stage training strategy: (1) an intra-camera learning stage and (2) an inter-camera learning stage. The former learns an intra-camera representation for each camera, while the latter builds a cross-view graph (CVG) to associate different cameras. By doing this, we can learn view-invariant representations for all persons. Extensive experiments and ablation studies on seven re-id datasets demonstrate the superiority of the proposed UGA over most state-of-the-art unsupervised and domain adaptation re-id methods.
Tasks Domain Adaptation, Person Re-Identification
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Wu_Unsupervised_Graph_Association_for_Person_Re-Identification_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Wu_Unsupervised_Graph_Association_for_Person_Re-Identification_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/unsupervised-graph-association-for-person-re
Repo https://github.com/yichuan9527/Unsupervised-Graph-Association-for-Person-Re-identification
Framework none

Optimal Pricing in Repeated Posted-Price Auctions with Different Patience of the Seller and the Buyer

Title Optimal Pricing in Repeated Posted-Price Auctions with Different Patience of the Seller and the Buyer
Authors Arsenii Vanunts, Alexey Drutsa
Abstract We study revenue-optimizing pricing algorithms for repeated posted-price auctions where a seller interacts with a single strategic buyer who holds a fixed private valuation. When the participants discount their cumulative utilities unequally, we show that the optimal constant pricing (which offers the Myerson price) is no longer optimal. In the case of a more patient seller, we propose a novel multidimensional optimization functional, a generalization of the one used to determine Myerson’s price. This functional allows us to find the optimal algorithm and to boost the revenue of the optimal static pricing via an efficient low-dimensional approximation. Numerical experiments are provided to support our results.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8380-optimal-pricing-in-repeated-posted-price-auctions-with-different-patience-of-the-seller-and-the-buyer
PDF http://papers.nips.cc/paper/8380-optimal-pricing-in-repeated-posted-price-auctions-with-different-patience-of-the-seller-and-the-buyer.pdf
PWC https://paperswithcode.com/paper/optimal-pricing-in-repeated-posted-price
Repo https://github.com/theonlybars/neurips-2019-rppa
Framework none
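
As a reference point for the pricing discussion, here is a minimal numeric sketch (illustrative only) of the Myerson price mentioned in the abstract: for a single buyer with valuation CDF F, the optimal static posted price maximizes the expected revenue p * (1 - F(p)).

```python
import numpy as np

def myerson_price(cdf, grid):
    """Grid search for the price maximizing p * (1 - F(p))."""
    revenue = grid * (1.0 - cdf(grid))
    return grid[np.argmax(revenue)]

grid = np.linspace(0.0, 1.0, 10_001)
uniform_cdf = lambda p: np.clip(p, 0.0, 1.0)   # valuations uniform on [0, 1]
print(myerson_price(uniform_cdf, grid))        # ~0.5
```

The paper's point is that when the seller and the buyer discount differently, repeating this constant price is no longer revenue-optimal.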