July 28, 2019

Paper Group ANR 236

Metric Learning in Codebook Generation of Bag-of-Words for Person Re-identification. Energy Storage Arbitrage in Real-Time Markets via Reinforcement Learning. The Promise and Peril of Human Evaluation for Model Interpretability. VAE Learning via Stein Variational Gradient Descent. 3D Face Reconstruction with Region Based Best Fit Blending Using Mob …

Metric Learning in Codebook Generation of Bag-of-Words for Person Re-identification


Title	Metric Learning in Codebook Generation of Bag-of-Words for Person Re-identification
Authors	Lu Tian, Shengjin Wang
Abstract	Person re-identification is generally divided into two parts: first, how to represent a pedestrian by discriminative visual descriptors, and second, how to compare them by suitable distance metrics. Conventional methods isolate these two parts, with the first part usually unsupervised and the second part supervised. The Bag-of-Words (BoW) model is a widely used image descriptor in part one. Its codebook is simply generated by clustering visual features in Euclidean space. In this paper, we propose to use the metric learning techniques of part two in the codebook generation phase of BoW. In particular, the proposed codebook is clustered under a Mahalanobis distance that is learned in a supervised manner. Extensive experiments prove that our proposed method is effective. With several low-level features extracted on superpixels and fused together, our method outperforms the state of the art on person re-identification benchmarks including VIPeR, PRID450S, and Market1501.
Tasks	Metric Learning, Person Re-Identification
Published	2017-04-08
URL	http://arxiv.org/abs/1704.02492v2
PDF	http://arxiv.org/pdf/1704.02492v2.pdf
PWC	https://paperswithcode.com/paper/metric-learning-in-codebook-generation-of-bag
Repo
Framework
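The codebook idea in this abstract can be illustrated with a small sketch: k-means style clustering in which the nearest codeword is chosen under a Mahalanobis distance instead of the Euclidean one. This is not the authors' code. The matrix `M` is assumed to be given (in the paper it would come from supervised metric learning), and the function names are hypothetical.

```python
import math

def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return math.sqrt(max(0.0, sum(d[i] * Md[i] for i in range(len(d)))))

def assign(features, codebook, M):
    """Index of the nearest codeword for each feature under the metric M."""
    return [
        min(range(len(codebook)), key=lambda k: mahalanobis(f, codebook[k], M))
        for f in features
    ]

def update(features, labels, codebook):
    """Recompute each codeword as the mean of the features assigned to it."""
    new_codebook = []
    for k, old in enumerate(codebook):
        members = [f for f, lab in zip(features, labels) if lab == k]
        if members:
            new_codebook.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_codebook.append(list(old))  # keep an empty cluster's codeword
    return new_codebook

def build_codebook(features, init, M, iters=10):
    """Alternate assignment and mean update for a fixed number of iterations."""
    codebook = [list(c) for c in init]
    for _ in range(iters):
        codebook = update(features, assign(features, codebook, M), codebook)
    return codebook
```

With `M` set to the identity matrix this reduces to ordinary Euclidean k-means, which makes the effect of a learned metric easy to compare on the same features.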

Energy Storage Arbitrage in Real-Time Markets via Reinforcement Learning


Title	Energy Storage Arbitrage in Real-Time Markets via Reinforcement Learning
Authors	Hao Wang, Baosen Zhang
Abstract	In this paper, we derive a temporal arbitrage policy for storage via reinforcement learning. Real-time price arbitrage is an important source of revenue for storage units, but designing good strategies has proven to be difficult because of the highly uncertain nature of the prices. Instead of current model predictive or dynamic programming approaches, we use reinforcement learning to design an optimal arbitrage policy. This policy is learned through repeated charge and discharge actions performed by the storage unit through updating a value matrix. We design a reward function that not only reflects the instant profit of charge/discharge decisions but also incorporates the history information. Simulation results demonstrate that our designed reward function leads to significant performance improvement compared with existing algorithms.
Tasks
Published	2017-11-08
URL	http://arxiv.org/abs/1711.03127v2
PDF	http://arxiv.org/pdf/1711.03127v2.pdf
PWC	https://paperswithcode.com/paper/energy-storage-arbitrage-in-real-time-markets
Repo
Framework
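The "value matrix" update mentioned in this abstract can be sketched as a tabular Q-learning step. This is a minimal illustration, not the paper's algorithm: the three-action set, the state encoding, the learning-rate and discount values, and the reward that compares the current price with a running average are all assumptions made here for the example.

```python
ACTIONS = (-1, 0, 1)  # discharge, idle, charge (one unit of energy per step)

def feasible(soc, capacity):
    """Actions that keep the state of charge within [0, capacity]."""
    return [a for a in ACTIONS if 0 <= soc + a <= capacity]

def reward(price, avg_price, action):
    """Hypothetical reward: value energy relative to a running average price.

    Charging pays off when the price is below the average,
    discharging when it is above.
    """
    return action * (avg_price - price)

def q_update(Q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step on a table stored as Q[state][action]."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (r + gamma * best_next - Q[state][action])
    return Q[state][action]
```

A training loop would repeatedly pick a feasible action (for example epsilon-greedy), observe the next price, and call `q_update` with the resulting reward.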

The Promise and Peril of Human Evaluation for Model Interpretability


Title	The Promise and Peril of Human Evaluation for Model Interpretability
Authors	Bernease Herman
Abstract	Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real world applications. This alone presents a challenge in many areas of artificial intelligence. In this position paper, we propose a distinction between descriptive and persuasive explanations. We discuss reasoning suggesting that functional interpretability may be correlated with cognitive function and user preferences. If this is indeed the case, evaluation and optimization using functional metrics could perpetuate implicit cognitive bias in explanations that threaten transparency. Finally, we propose two potential research directions to disambiguate cognitive function and explanation models, retaining control over the tradeoff between accuracy and interpretability.
Tasks	Interpretable Machine Learning
Published	2017-11-20
URL	https://arxiv.org/abs/1711.07414v2
PDF	https://arxiv.org/pdf/1711.07414v2.pdf
PWC	https://paperswithcode.com/paper/the-promise-and-peril-of-human-evaluation-for
Repo
Framework

VAE Learning via Stein Variational Gradient Descent


Title	VAE Learning via Stein Variational Gradient Descent
Authors	Yunchen Pu, Zhe Gan, Ricardo Henao, Chunyuan Li, Shaobo Han, Lawrence Carin
Abstract	A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent. A key advantage of this approach is that one need not make parametric assumptions about the form of the encoder distribution. Performance is further enhanced by integrating the proposed encoder with importance sampling. Excellent performance is demonstrated across multiple unsupervised and semi-supervised problems, including semi-supervised analysis of the ImageNet data, demonstrating the scalability of the model to large datasets.
Tasks
Published	2017-04-18
URL	http://arxiv.org/abs/1704.05155v3
PDF	http://arxiv.org/pdf/1704.05155v3.pdf
PWC	https://paperswithcode.com/paper/vae-learning-via-stein-variational-gradient
Repo
Framework


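3D Face Reconstruction with Region Based Best Fit Blending Using Mobile Phone for Virtual Reality Based Social Media


Title	3D Face Reconstruction with Region Based Best Fit Blending Using Mobile Phone for Virtual Reality Based Social Media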
Authors	Gholamreza Anbarjafari, Rain Eric Haamer, Iiris Lusi, Toomas Tikk, Lembit Valgma
Abstract	The use of virtual reality (VR) is exponentially increasing and due to that many researchers have started to work on developing new VR based social media. For this purpose it is important to have an avatar of the user which looks like them and can be easily generated by accessible devices, such as a mobile phone. In this paper, we propose a novel method of recreating a 3D human face model captured with a phone camera image or video data. The method focuses more on model shape than texture in order to make the face recognizable. We detect 68 facial feature points and use them to separate a face into four regions. For each area the best fitting models are found and are further morphed and combined to find the best fitting models for each area. These are then combined and further morphed in order to restore the original facial proportions. We also present a method of texturing the resulting model, where the aforementioned feature points are used to generate a texture for the resulting model.
Tasks	3D Face Reconstruction, Face Reconstruction
Published	2017-12-12
URL	http://arxiv.org/abs/1801.01089v1
PDF	http://arxiv.org/pdf/1801.01089v1.pdf
PWC	https://paperswithcode.com/paper/3d-face-reconstruction-with-region-based-best
Repo
Framework

Modeling Grasp Motor Imagery through Deep Conditional Generative Models


Title	Modeling Grasp Motor Imagery through Deep Conditional Generative Models
Authors	Matthew Veres, Medhat Moussa, Graham W. Taylor
Abstract	Grasping is a complex process involving knowledge of the object, the surroundings, and of oneself. While humans are able to integrate and process all of the sensory information required for performing this task, equipping machines with this capability is an extremely challenging endeavor. In this paper, we investigate how deep learning techniques can allow us to translate high-level concepts such as motor imagery to the problem of robotic grasp synthesis. We explore a paradigm based on generative models for learning integrated object-action representations, and demonstrate its capacity for capturing and generating multimodal, multi-finger grasp configurations on a simulated grasping dataset.
Tasks
Published	2017-01-11
URL	http://arxiv.org/abs/1701.03041v1
PDF	http://arxiv.org/pdf/1701.03041v1.pdf
PWC	https://paperswithcode.com/paper/modeling-grasp-motor-imagery-through-deep
Repo
Framework

Separation of time scales and direct computation of weights in deep neural networks


Title	Separation of time scales and direct computation of weights in deep neural networks
Authors	Nima Dehmamy, Neda Rohani, Aggelos Katsaggelos
Abstract	Artificial intelligence is revolutionizing our lives at an ever increasing pace. At the heart of this revolution are the recent advancements in deep neural networks (DNN), learning to perform sophisticated, high-level tasks. However, training DNNs requires massive amounts of data and is very computationally intensive. Gaining analytical understanding of the solutions found by DNNs can help us devise more efficient training algorithms, replacing the commonly used method of stochastic gradient descent (SGD). We analyze the dynamics of SGD and show that, indeed, direct computation of the solutions is possible in many cases. We show that a high performing setup used in DNNs introduces a separation of time-scales in the training dynamics, allowing SGD to train layers from the lowest (closest to input) to the highest. We then show that for each layer, the distribution of solutions found by SGD can be estimated using a class-based principal component analysis (PCA) of the layer’s input. This finding allows us to forgo SGD entirely and directly derive the DNN parameters using this class-based PCA, which can be well estimated using significantly less data than SGD. We implement these results on image datasets MNIST, CIFAR10 and CIFAR100 and find that, in fact, layers derived using our class-based PCA perform comparably or superior to neural networks of the same size and architecture trained using SGD. We also confirm that the class-based PCA often converges using a fraction of the data required for SGD. Thus, using our method training time can be reduced both by requiring less training data than SGD, and by eliminating layers in the costly backpropagation step of the training.
Tasks
Published	2017-03-14
URL	http://arxiv.org/abs/1703.04757v3
PDF	http://arxiv.org/pdf/1703.04757v3.pdf
PWC	https://paperswithcode.com/paper/separation-of-time-scales-and-direct
Repo
Framework

MarrNet: 3D Shape Reconstruction via 2.5D Sketches


Title	MarrNet: 3D Shape Reconstruction via 2.5D Sketches
Authors	Jiajun Wu, Yifan Wang, Tianfan Xue, Xingyuan Sun, William T Freeman, Joshua B Tenenbaum
Abstract	3D object reconstruction from a single image is a highly under-determined problem, requiring strong prior knowledge of plausible 3D shapes. This introduces challenges for learning-based approaches, as 3D object annotations are scarce in real images. Previous work chose to train on synthetic data with ground truth 3D information, but suffered from domain adaptation when tested on real data. In this work, we propose MarrNet, an end-to-end trainable model that sequentially estimates 2.5D sketches and 3D object shape. Our disentangled, two-step formulation has three advantages. First, compared to full 3D shape, 2.5D sketches are much easier to be recovered from a 2D image; models that recover 2.5D sketches are also more likely to transfer from synthetic to real data. Second, for 3D reconstruction from 2.5D sketches, systems can learn purely from synthetic data. This is because we can easily render realistic 2.5D sketches without modeling object appearance variations in real images, including lighting, texture, etc. This further relieves the domain adaptation problem. Third, we derive differentiable projective functions from 3D shape to 2.5D sketches; the framework is therefore end-to-end trainable on real images, requiring no human annotations. Our model achieves state-of-the-art performance on 3D shape reconstruction.
Tasks	3D Object Reconstruction, 3D Object Reconstruction From A Single Image, 3D Reconstruction, Domain Adaptation, Object Reconstruction
Published	2017-11-08
URL	http://arxiv.org/abs/1711.03129v1
PDF	http://arxiv.org/pdf/1711.03129v1.pdf
PWC	https://paperswithcode.com/paper/marrnet-3d-shape-reconstruction-via-25d
Repo
Framework

A Unified Query-based Generative Model for Question Generation and Question Answering


Title	A Unified Query-based Generative Model for Question Generation and Question Answering
Authors	Linfeng Song, Zhiguo Wang, Wael Hamza
Abstract	We propose a query-based generative model for solving both tasks of question generation (QG) and question answering (QA). The model follows the classic encoder-decoder framework. The encoder takes a passage and a query as input then performs query understanding by matching the query with the passage from multiple perspectives. The decoder is an attention-based Long Short Term Memory (LSTM) model with copy and coverage mechanisms. In the QG task, a question is generated from the system given the passage and the target answer, whereas in the QA task, the answer is generated given the question and the passage. During the training stage, we leverage a policy-gradient reinforcement learning algorithm to overcome exposure bias, a major problem resulted from sequence learning with cross-entropy loss. For the QG task, our experiments show higher performances than the state-of-the-art results. When used as additional training data, the automatically generated questions even improve the performance of a strong extractive QA system. In addition, our model shows better performance than the state-of-the-art baselines of the generative QA task.
Tasks	Question Answering, Question Generation
Published	2017-09-04
URL	http://arxiv.org/abs/1709.01058v2
PDF	http://arxiv.org/pdf/1709.01058v2.pdf
PWC	https://paperswithcode.com/paper/a-unified-query-based-generative-model-for
Repo
Framework

Smooth Sensitivity Based Approach for Differentially Private Principal Component Analysis


Title	Smooth Sensitivity Based Approach for Differentially Private Principal Component Analysis
Authors	Ran Gilad-Bachrach, Alon Gonen
Abstract	Currently known methods for this task either employ the computationally intensive exponential mechanism or require an access to the covariance matrix, and therefore fail to utilize potential sparsity of the data. The problem of designing simpler and more efficient methods for this task has been raised as an open problem in [kapralov2013differentially]. In this paper we address this problem by employing the output perturbation mechanism. Despite being arguably the simplest and most straightforward technique, it has been overlooked due to the large global sensitivity associated with publishing the leading eigenvector. We tackle this issue by adopting a smooth sensitivity based approach, which allows us to establish differential privacy (in a worst-case manner) and near-optimal sample complexity results under eigengap assumption. We consider both the pure and the approximate notions of differential privacy, and demonstrate a tradeoff between privacy level and sample complexity. We conclude by suggesting how our results can be extended to related problems.
Tasks
Published	2017-10-29
URL	https://arxiv.org/abs/1710.10556v4
PDF	https://arxiv.org/pdf/1710.10556v4.pdf
PWC	https://paperswithcode.com/paper/smooth-sensitivity-based-approach-for
Repo
Framework

Improved Bilinear Pooling with CNNs


Title	Improved Bilinear Pooling with CNNs
Authors	Tsung-Yu Lin, Subhransu Maji
Abstract	Bilinear pooling of Convolutional Neural Network (CNN) features [22, 23], and their compact variants [10], have been shown to be effective at fine-grained recognition, scene categorization, texture recognition, and visual question-answering tasks among others. The resulting representation captures second-order statistics of convolutional features in a translationally invariant manner. In this paper we investigate various ways of normalizing these statistics to improve their representation power. In particular we find that the matrix square-root normalization offers significant improvements and outperforms alternative schemes such as the matrix logarithm normalization when combined with elementwise square-root and l2 normalization. This improves the accuracy by 2-3% on a range of fine-grained recognition datasets leading to a new state of the art. We also investigate how the accuracy of matrix function computations affects network training and evaluation. In particular we compare against a technique for estimating matrix square-root gradients via solving a Lyapunov equation that is more numerically accurate than computing gradients via a Singular Value Decomposition (SVD). We find that while SVD gradients are numerically inaccurate the overall effect on the final accuracy is negligible once boundary cases are handled carefully. We present an alternative scheme for computing gradients that is faster and yet it offers improvements over the baseline model. Finally we show that the matrix square-root computed approximately using a few Newton iterations is just as accurate for the classification task but allows an order-of-magnitude faster GPU implementation compared to SVD decomposition.
Tasks	Question Answering, Visual Question Answering
Published	2017-07-21
URL	http://arxiv.org/abs/1707.06772v1
PDF	http://arxiv.org/pdf/1707.06772v1.pdf
PWC	https://paperswithcode.com/paper/improved-bilinear-pooling-with-cnns
Repo
Framework
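The closing claim of this abstract, that a matrix square root can be approximated with a few Newton iterations, can be illustrated with the coupled Newton-Schulz iteration, which uses only matrix products. Whether this variant matches the authors' implementation is an assumption. The sketch below is pure Python for clarity, expects a symmetric positive definite input, and uses a hypothetical default of 20 iterations.

```python
import math

def matmul(A, B):
    """Plain triple-loop matrix product for small dense matrices."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def sqrtm_newton_schulz(A, iters=20):
    """Approximate the square root of a symmetric positive definite matrix A."""
    n = len(A)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Scale by the Frobenius norm so the eigenvalues fall in (0, 1],
    # which is the region where the iteration converges.
    norm = math.sqrt(sum(v * v for row in A for v in row))
    Y = [[v / norm for v in row] for row in A]
    Z = [row[:] for row in eye]
    for _ in range(iters):
        ZY = matmul(Z, Y)
        T = [[0.5 * (3.0 * eye[i][j] - ZY[i][j]) for j in range(n)] for i in range(n)]
        Y = matmul(Y, T)  # Y converges to (A / norm)^(1/2)
        Z = matmul(T, Z)  # Z converges to (A / norm)^(-1/2)
    scale = math.sqrt(norm)
    return [[scale * v for v in row] for row in Y]
```

Because every step is a matrix multiplication, the same iteration maps naturally onto batched GPU kernels, which is the practical appeal over an SVD-based square root.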

Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks


Title	Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks
Authors	Rajarshi Das, Manzil Zaheer, Siva Reddy, Andrew McCallum
Abstract	Existing question answering methods infer answers either from a knowledge base or from raw text. While knowledge base (KB) methods are good at answering compositional questions, their performance is often affected by the incompleteness of the KB. Au contraire, web text contains millions of facts that are absent in the KB, however in an unstructured form. Universal schema can support reasoning on the union of both structured KBs and unstructured text by aligning them in a common embedded space. In this paper we extend universal schema to natural language question answering, employing memory networks to attend to the large body of facts in the combination of text and KB. Our models can be trained in an end-to-end fashion on question-answer pairs. Evaluation results on the SPADES fill-in-the-blank question answering dataset show that exploiting universal schema for question answering is better than using either a KB or text alone. This model also outperforms the current state-of-the-art by 8.5 F1 points. (Code and data available at https://rajarshd.github.io/TextKBQA)
Tasks	Question Answering
Published	2017-04-27
URL	http://arxiv.org/abs/1704.08384v1
PDF	http://arxiv.org/pdf/1704.08384v1.pdf
PWC	https://paperswithcode.com/paper/question-answering-on-knowledge-bases-and
Repo
Framework

EnergyNet: Energy-based Adaptive Structural Learning of Artificial Neural Network Architectures


Title	EnergyNet: Energy-based Adaptive Structural Learning of Artificial Neural Network Architectures
Authors	Gus Kristiansen, Xavi Gonzalvo
Abstract	We present EnergyNet, a new framework for analyzing and building artificial neural network architectures. Our approach adaptively learns the structure of the networks in an unsupervised manner. The methodology is based upon the theoretical guarantees of the energy function of restricted Boltzmann machines (RBM) of infinite number of nodes. We present experimental results to show that the final network adapts to the complexity of a given problem.
Tasks
Published	2017-11-08
URL	http://arxiv.org/abs/1711.03130v1
PDF	http://arxiv.org/pdf/1711.03130v1.pdf
PWC	https://paperswithcode.com/paper/energynet-energy-based-adaptive-structural
Repo
Framework

Unsupervised Place Discovery for Place-Specific Change Classifier


Title	Unsupervised Place Discovery for Place-Specific Change Classifier
Authors	Fei Xiaoxiao, Tanaka Kanji
Abstract	In this study, we address the problem of supervised change detection for robotic map learning applications, in which the aim is to train a place-specific change classifier (e.g., support vector machine (SVM)) to predict changes from a robot’s view image. An open question is the manner in which to partition a robot’s workspace into places (e.g., SVMs) to maximize the overall performance of change classifiers. This is a chicken-or-egg problem: if we have a well-trained change classifier, partitioning the robot’s workspace into places is rather easy. However, training a change classifier requires a set of place-specific training data. In this study, we address this novel problem, which we term unsupervised place discovery. In addition, we present a solution powered by convolutional-feature-based visual place recognition, and validate our approach by applying it to two place-specific change classifiers, namely, nuisance and anomaly predictors.
Tasks	Visual Place Recognition
Published	2017-06-07
URL	http://arxiv.org/abs/1706.02054v1
PDF	http://arxiv.org/pdf/1706.02054v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-place-discovery-for-place
Repo
Framework

Detecting Small Signs from Large Images


Title	Detecting Small Signs from Large Images
Authors	Zibo Meng, Xiaochuan Fan, Xin Chen, Min Chen, Yan Tong
Abstract	In the past decade, Convolutional Neural Networks (CNNs) have been demonstrated to be successful for object detection. However, the size of network input is limited by the amount of memory available on GPUs. Moreover, performance degrades when detecting small objects. To alleviate the memory usage and improve the performance of detecting small traffic signs, we propose an approach for detecting small traffic signs from large images under real world conditions. In particular, large images are broken into small patches as input to a Small-Object-Sensitive-CNN (SOS-CNN) modified from a Single Shot Multibox Detector (SSD) framework with a VGG-16 network as the base network to produce patch-level object detection results. Scale invariance is achieved by applying the SOS-CNN on an image pyramid. Then, image-level object detection is obtained by projecting all the patch-level detection results to the image at the original scale. Experimental results on a real-world conditioned traffic sign dataset have demonstrated the effectiveness of the proposed method in terms of detection accuracy and recall, especially for those with small sizes.
Tasks	Object Detection
Published	2017-06-26
URL	http://arxiv.org/abs/1706.08574v1
PDF	http://arxiv.org/pdf/1706.08574v1.pdf
PWC	https://paperswithcode.com/paper/detecting-small-signs-from-large-images
Repo
Framework
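The patch-and-project bookkeeping described in this abstract can be sketched without any detector: tile a (possibly rescaled) image into overlapping patches, then map a box found inside a patch back to coordinates in the original image. The patch size, stride, and function names below are hypothetical values chosen for the example, not the settings used by the authors, and the SOS-CNN itself is not modeled.

```python
def axis_positions(size, patch, stride):
    """Patch start offsets along one axis, always covering the far border."""
    last = max(size - patch, 0)
    positions = list(range(0, last + 1, stride))
    if positions[-1] != last:
        positions.append(last)  # extra patch so the image edge is not skipped
    return positions

def patch_origins(width, height, patch=200, stride=150):
    """Top-left corners of all patches, in row-major order."""
    return [
        (x, y)
        for y in axis_positions(height, patch, stride)
        for x in axis_positions(width, patch, stride)
    ]

def to_original(box, origin, scale):
    """Map a patch-level box (x0, y0, x1, y1) to original-image coordinates.

    `origin` is the patch's top-left corner in the rescaled image and
    `scale` is the pyramid factor that produced that rescaled image.
    """
    x0, y0, x1, y1 = box
    ox, oy = origin
    return ((x0 + ox) / scale, (y0 + oy) / scale, (x1 + ox) / scale, (y1 + oy) / scale)
```

Boxes collected from every patch and pyramid level would then typically be merged with non-maximum suppression at the original scale.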