May 7, 2019


Paper Group ANR 77

Automatic Generation of Grounded Visual Questions. Dynamic Pricing in High-dimensions. Variational limits of k-NN graph based functionals on data clouds. Network Inference by Learned Node-Specific Degree Prior. An FFT-based Synchronization Approach to Recognize Human Behaviors using STN-LFP Signal. Variable projection for nonsmooth models. Siamese …

Automatic Generation of Grounded Visual Questions


Title	Automatic Generation of Grounded Visual Questions
Authors	Shijie Zhang, Lizhen Qu, Shaodi You, Zhenglu Yang, Jiawan Zhang
Abstract	In this paper, we propose the first model able to generate visually grounded questions of diverse types for a single image. Visual question generation is an emerging topic which aims to ask questions in natural language based on visual input. To the best of our knowledge, there are no automatic methods that generate meaningful questions of various types for the same visual input. To address this problem, we propose a model that automatically generates visually grounded questions with varying types. Our model takes as input both images and the captions generated by a dense caption model, samples the most probable question types, and generates the questions in sequence. The experimental results on two real-world datasets show that our model outperforms the strongest baseline in terms of both correctness and diversity by a wide margin.
Tasks	Question Generation
Published	2016-12-20
URL	http://arxiv.org/abs/1612.06530v2
PDF	http://arxiv.org/pdf/1612.06530v2.pdf
PWC	https://paperswithcode.com/paper/automatic-generation-of-grounded-visual
Repo
Framework

Dynamic Pricing in High-dimensions


Title	Dynamic Pricing in High-dimensions
Authors	Adel Javanmard, Hamid Nazerzadeh
Abstract	We study the pricing problem faced by a firm that sells a large number of products, described via a wide range of features, to customers that arrive over time. Customers independently make purchasing decisions according to a general choice model that includes product features and customers’ characteristics, encoded as $d$-dimensional numerical vectors, as well as the price offered. The parameters of the choice model are a priori unknown to the firm, but can be learned as the (binary-valued) sales data accrues over time. The firm’s objective is to minimize the regret, i.e., the expected revenue loss against a clairvoyant policy that knows the parameters of the choice model in advance, and always offers the revenue-maximizing price. This setting is motivated in part by the prevalence of online marketplaces that allow for real-time pricing. We assume a structured choice model, parameters of which depend on $s_0$ out of the $d$ product features. We propose a dynamic policy, called Regularized Maximum Likelihood Pricing (RMLP), that leverages the (sparsity) structure of the high-dimensional model and obtains a logarithmic regret in $T$. More specifically, the regret of our algorithm is $O(s_0 \log d \cdot \log T)$. Furthermore, we show that no policy can obtain regret better than $O(s_0 (\log d + \log T))$.
Tasks
Published	2016-09-24
URL	http://arxiv.org/abs/1609.07574v4
PDF	http://arxiv.org/pdf/1609.07574v4.pdf
PWC	https://paperswithcode.com/paper/dynamic-pricing-in-high-dimensions
Repo
Framework

Variational limits of k-NN graph based functionals on data clouds


Title	Variational limits of k-NN graph based functionals on data clouds
Authors	Nicolas Garcia Trillos
Abstract	This paper studies the large sample asymptotics of data analysis procedures based on the optimization of functionals defined on $k$-NN graphs on point clouds. The paper is framed in the context of minimization of balanced cut functionals, but our techniques, ideas and results can be adapted to other functionals of relevance. We rigorously show that provided the number of neighbors in the graph $k:=k_n$ scales with the number of points in the cloud as $n \gg k_n \gg \log(n)$, then with probability one, the solution to the graph cut optimization problem converges towards the solution of an analogue variational problem at the continuum level.
Tasks
Published	2016-07-03
URL	http://arxiv.org/abs/1607.00696v3
PDF	http://arxiv.org/pdf/1607.00696v3.pdf
PWC	https://paperswithcode.com/paper/variational-limits-of-k-nn-graph-based
Repo
Framework
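
As a rough illustration of the objects in the abstract above, the sketch below builds a symmetrized $k$-NN graph on a point cloud and evaluates a Cheeger-type balanced cut for a given partition. The abstract does not say which balanced cut is used, so the balance term `min(|A|, |A^c|)` and the function names `knn_adjacency` and `cheeger_cut` are assumptions made for illustration only.

```python
import numpy as np

def knn_adjacency(points, k):
    """Symmetrized k-NN graph: i~j if j is among the k nearest
    neighbours of i, or i is among those of j (brute force, O(n^2))."""
    x = np.asarray(points, dtype=float)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest points
    n = len(x)
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    return adj | adj.T

def cheeger_cut(adj, in_a):
    """Edges crossing the partition divided by the size of the smaller side.
    Assumes both sides of the partition are non-empty."""
    in_a = np.asarray(in_a, dtype=bool)
    cut = adj[np.ix_(in_a, ~in_a)].sum()
    return float(cut / min(in_a.sum(), (~in_a).sum()))
```

For two well-separated clusters and a small `k`, the partition that follows the clusters cuts no edges, so its balanced cut value is zero, while a partition that splits a cluster has a positive value.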

Network Inference by Learned Node-Specific Degree Prior


Title	Network Inference by Learned Node-Specific Degree Prior
Authors	Qingming Tang, Lifu Tu, Weiran Wang, Jinbo Xu
Abstract	We propose a novel method for network inference from partially observed edges using a node-specific degree prior. The degree prior is derived from observed edges in the network to be inferred, and its hyper-parameters are determined by cross validation. Then we formulate network inference as a matrix completion problem regularized by our degree prior. Our theoretical analysis indicates that this prior favors a network following the learned degree distribution, and may lead to a better network recovery error bound than previous work. Experimental results on both simulated and real biological networks demonstrate the superior performance of our method in various settings.
Tasks	Matrix Completion
Published	2016-02-07
URL	http://arxiv.org/abs/1602.02386v1
PDF	http://arxiv.org/pdf/1602.02386v1.pdf
PWC	https://paperswithcode.com/paper/network-inference-by-learned-node-specific
Repo
Framework

An FFT-based Synchronization Approach to Recognize Human Behaviors using STN-LFP Signal


Title	An FFT-based Synchronization Approach to Recognize Human Behaviors using STN-LFP Signal
Authors	Hosein M. Golshan, Adam O. Hebb, Sara J. Hanrahan, Joshua Nedrud, Mohammad H. Mahoor
Abstract	Classification of human behavior is key to developing closed-loop Deep Brain Stimulation (DBS) systems, which may be able to decrease the power consumption and side effects of the existing systems. Recent studies have shown that the Local Field Potential (LFP) signals from both Subthalamic Nuclei (STN) of the brain can be used to recognize human behavior. Since the DBS leads implanted in each STN can collect three bipolar signals, the selection of a suitable pair of LFPs that achieves optimal recognition performance is still an open problem to address. Considering the presence of synchronized aggregate activity in the basal ganglia, this paper presents an FFT-based synchronization approach to automatically select a relevant pair of LFPs and use the pair together with an SVM-based MKL classifier for behavior recognition purposes. Our experiments on five subjects show the superiority of the proposed approach compared to other methods used for behavior classification.
Tasks
Published	2016-12-28
URL	http://arxiv.org/abs/1612.08780v1
PDF	http://arxiv.org/pdf/1612.08780v1.pdf
PWC	https://paperswithcode.com/paper/an-fft-based-synchronization-approach-to
Repo
Framework

Variable projection for nonsmooth models


Title	Variable projection for nonsmooth models
Authors	Aleksandr Aravkin, Dmitriy Drusvyatskiy, Tristan van Leeuwen
Abstract	Variable projection solves structured optimization problems by completely minimizing over a subset of the variables while iterating over the remaining variables. Over the last 30 years, the technique has been widely used, with empirical and theoretical results demonstrating both greater efficacy and greater stability compared to competing approaches. Classic examples have exploited closed form projections and smoothness of the objective function. We extend the approach to broader settings, where the projection subproblems can be nonsmooth, and can only be solved inexactly by iterative methods. We present a few case studies on problems occurring frequently in machine-learning and high-dimensional inference.
Tasks
Published	2016-01-19
URL	http://arxiv.org/abs/1601.05011v4
PDF	http://arxiv.org/pdf/1601.05011v4.pdf
PWC	https://paperswithcode.com/paper/variable-projection-for-nonsmooth-models
Repo
Framework
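
The classic smooth, closed-form case that the abstract above contrasts with can be sketched as follows. The example fits $y \approx a\,e^{-bt}$: for each fixed decay rate $b$ the amplitude $a$ is eliminated by a closed-form least-squares solve, and only $b$ is iterated over. The model, the golden-section outer loop, and all function names are assumptions chosen for illustration; this is not the nonsmooth, inexact setting the paper develops.

```python
import numpy as np

def inner_solve(b, t, y):
    """Closed-form least-squares amplitude a for a fixed decay rate b."""
    phi = np.exp(-b * t)
    return float(phi @ y) / float(phi @ phi)

def reduced_objective(b, t, y):
    """Squared residual after a has been minimized out (the projection)."""
    a = inner_solve(b, t, y)
    r = y - a * np.exp(-b * t)
    return float(r @ r)

def golden_section(f, lo, hi, tol=1e-8):
    """Derivative-free 1-D minimization; assumes f is unimodal on [lo, hi]."""
    inv_phi = (np.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)

def varpro_fit(t, y, lo=0.0, hi=5.0):
    """Outer loop over b only; a is recovered from the inner solve."""
    b = golden_section(lambda b: reduced_objective(b, t, y), lo, hi)
    return inner_solve(b, t, y), b
```

On noiseless data generated from this model, `varpro_fit` should return values close to the generating parameters, provided the true decay rate lies inside the search interval.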

Siamese Regression Networks with Efficient mid-level Feature Extraction for 3D Object Pose Estimation


Title	Siamese Regression Networks with Efficient mid-level Feature Extraction for 3D Object Pose Estimation
Authors	Andreas Doumanoglou, Vassileios Balntas, Rigas Kouskouridas, Tae-Kyun Kim
Abstract	In this paper we tackle the problem of estimating the 3D pose of object instances, using convolutional neural networks. State of the art methods usually solve the challenging problem of regression in angle space indirectly, focusing on learning discriminative features that are later fed into a separate architecture for 3D pose estimation. In contrast, we propose an end-to-end learning framework for directly regressing object poses by exploiting Siamese Networks. For a given image pair, we enforce a similarity measure between the representation of the sample images in the feature and pose space respectively, that is shown to boost regression performance. Furthermore, we argue that our pose-guided feature learning using our Siamese Regression Network generates more discriminative features that outperform the state of the art. Last, our feature learning formulation provides the ability of learning features that can perform under severe occlusions, demonstrating high performance on our novel hand-object dataset.
Tasks	3D Pose Estimation, Pose Estimation
Published	2016-07-08
URL	http://arxiv.org/abs/1607.02257v1
PDF	http://arxiv.org/pdf/1607.02257v1.pdf
PWC	https://paperswithcode.com/paper/siamese-regression-networks-with-efficient
Repo
Framework

Inferring the location of authors from words in their texts


Title	Inferring the location of authors from words in their texts
Authors	Max Berggren, Jussi Karlgren, Robert Östling, Mikael Parkvall
Abstract	For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their or their authors’ location. Many texts are locatable through explicit labels but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location-indicating words which then can be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are. We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language.
Tasks
Published	2016-12-20
URL	http://arxiv.org/abs/1612.06671v1
PDF	http://arxiv.org/pdf/1612.06671v1.pdf
PWC	https://paperswithcode.com/paper/inferring-the-location-of-authors-from-words
Repo
Framework
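
A much simplified sketch of the pipeline described in the abstract above: one Gaussian per word fitted from geotagged posts, a filter that keeps only tightly concentrated words, and a centroid over the retained words. The abstract does not define placeness, so the spread threshold `max_spread` is a hypothetical stand-in, as are the function names. The paper fits several Gaussians per word, and treating latitude and longitude as Euclidean coordinates is a further simplification.

```python
import numpy as np
from collections import defaultdict

def fit_word_gaussians(posts):
    """posts: iterable of (tokens, (lat, lon)).
    Returns word -> (mean location, total variance)."""
    coords = defaultdict(list)
    for tokens, loc in posts:
        for w in set(tokens):
            coords[w].append(loc)
    model = {}
    for w, pts in coords.items():
        pts = np.asarray(pts, dtype=float)
        if len(pts) < 2:
            continue                       # too few sightings to estimate spread
        model[w] = (pts.mean(axis=0), float(pts.var(axis=0).sum()))
    return model

def locate(tokens, model, max_spread=1.0):
    """Centroid of the means of words whose spread is below the threshold
    (hypothetical proxy for high placeness). Returns None if no word qualifies."""
    means = [model[w][0] for w in tokens
             if w in model and model[w][1] <= max_spread]
    if not means:
        return None
    return np.mean(means, axis=0)
```

Words that occur all over the map get a large variance and are filtered out, so a text is placed using only its geographically concentrated vocabulary.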

The Open World of Micro-Videos


Title	The Open World of Micro-Videos
Authors	Phuc Xuan Nguyen, Gregory Rogez, Charless Fowlkes, Deva Ramanan
Abstract	Micro-videos are six-second videos popular on social media networks with several unique properties. Firstly, because of the authoring process, they contain significantly more diversity and narrative structure than existing collections of video “snippets”. Secondly, because they are often captured by hand-held mobile cameras, they contain specialized viewpoints including third-person, egocentric, and self-facing views seldom seen in traditional produced video. Thirdly, due to their continuous production and publication on social networks, aggregate micro-video content contains interesting open-world dynamics that reflects the temporal evolution of tag topics. These aspects make micro-videos an appealing well of visual data for developing large-scale models for video understanding. We analyze a novel dataset of micro-videos labeled with 58 thousand tags. To analyze this data, we introduce viewpoint-specific and temporally-evolving models for video understanding, defined over state-of-the-art motion and deep visual features. We conclude that our dataset opens up new research opportunities for large-scale video analysis, novel viewpoints, and open-world dynamics.
Tasks	Video Understanding
Published	2016-03-31
URL	http://arxiv.org/abs/1603.09439v2
PDF	http://arxiv.org/pdf/1603.09439v2.pdf
PWC	https://paperswithcode.com/paper/the-open-world-of-micro-videos
Repo
Framework

A Minimax Optimal Algorithm for Crowdsourcing


Title	A Minimax Optimal Algorithm for Crowdsourcing
Authors	Thomas Bonald, Richard Combes
Abstract	We consider the problem of accurately estimating the reliability of workers based on noisy labels they provide, which is a fundamental question in crowdsourcing. We propose a novel lower bound on the minimax estimation error which applies to any estimation procedure. We further propose Triangular Estimation (TE), an algorithm for estimating the reliability of workers. TE has low complexity, may be implemented in a streaming setting when labels are provided by workers in real time, and does not rely on an iterative procedure. We further prove that TE is minimax optimal and matches our lower bound. We conclude by assessing the performance of TE and other state-of-the-art algorithms on both synthetic and real-world data sets.
Tasks
Published	2016-06-01
URL	http://arxiv.org/abs/1606.00226v2
PDF	http://arxiv.org/pdf/1606.00226v2.pdf
PWC	https://paperswithcode.com/paper/a-minimax-optimal-algorithm-for-crowdsourcing
Repo
Framework

Weakly supervised object detection using pseudo-strong labels


Title	Weakly supervised object detection using pseudo-strong labels
Authors	Ke Yang, Dongsheng Li, Yong Dou, Shaohe Lv, Qiang Wang
Abstract	Object detection is an important task in computer vision. A variety of methods have been proposed, but methods using weak labels still do not achieve satisfactory results. In this paper, we propose a new framework that uses the output of a weakly supervised method as pseudo-strong labels to train a strongly supervised model. One weakly supervised method is treated as a black box to generate class-specific bounding boxes on the training dataset. A de-noising method is then applied to the noisy bounding boxes. The de-noised pseudo-strong labels are then used to train a strongly supervised object detection network. The whole framework is still weakly supervised because the entire process uses only image-level labels. The experimental results on PASCAL VOC 2007 demonstrate the validity of our framework: we achieve 43.4% mean average precision, compared to 39.5% for the previous best result and 34.5% for the initial method. The framework is simple and distinct, and promises to be easily applicable to other methods.
Tasks	Object Detection, Weakly Supervised Object Detection
Published	2016-07-16
URL	http://arxiv.org/abs/1607.04731v1
PDF	http://arxiv.org/pdf/1607.04731v1.pdf
PWC	https://paperswithcode.com/paper/weakly-supervised-object-detection-using
Repo
Framework

Probabilistic Reasoning in the Description Logic ALCP with the Principle of Maximum Entropy (Full Version)


Title	Probabilistic Reasoning in the Description Logic ALCP with the Principle of Maximum Entropy (Full Version)
Authors	Rafael Peñaloza, Nico Potyka
Abstract	A central question for knowledge representation is how to encode and handle uncertain knowledge adequately. We introduce the probabilistic description logic ALCP that is designed for representing context-dependent knowledge, where the actual context taking place is uncertain. ALCP allows the expression of logical dependencies on the domain and probabilistic dependencies on the possible contexts. In order to draw probabilistic conclusions, we employ the principle of maximum entropy. We provide reasoning algorithms for this logic, and show that it satisfies several desirable properties of probabilistic logics.
Tasks
Published	2016-06-30
URL	http://arxiv.org/abs/1606.09521v1
PDF	http://arxiv.org/pdf/1606.09521v1.pdf
PWC	https://paperswithcode.com/paper/probabilistic-reasoning-in-the-description
Repo
Framework

Closed loop interactions between spiking neural network and robotic simulators based on MUSIC and ROS


Title	Closed loop interactions between spiking neural network and robotic simulators based on MUSIC and ROS
Authors	Philipp Weidel, Mikael Djurfeldt, Renato Duarte, Abigail Morrison
Abstract	In order to properly assess the function and computational properties of simulated neural systems, it is necessary to account for the nature of the stimuli that drive the system. However, providing stimuli that are rich and yet both reproducible and amenable to experimental manipulations is technically challenging, and even more so if a closed-loop scenario is required. In this work, we present a novel approach to solve this problem, connecting robotics and neural network simulators. We implement a middleware solution that bridges the Robot Operating System (ROS) to the Multi-Simulator Coordinator (MUSIC). This enables any robotic and neural simulators that implement the corresponding interfaces to be efficiently coupled, allowing real-time performance for a wide range of configurations. This work extends the toolset available for researchers in both neurorobotics and computational neuroscience, and creates the opportunity to perform closed-loop experiments of arbitrary complexity to address questions in multiple areas, including embodiment, agency, and reinforcement learning.
Tasks
Published	2016-04-16
URL	http://arxiv.org/abs/1604.04764v1
PDF	http://arxiv.org/pdf/1604.04764v1.pdf
PWC	https://paperswithcode.com/paper/closed-loop-interactions-between-spiking
Repo
Framework

Bridging Category-level and Instance-level Semantic Image Segmentation


Title	Bridging Category-level and Instance-level Semantic Image Segmentation
Authors	Zifeng Wu, Chunhua Shen, Anton van den Hengel
Abstract	We propose an approach to instance-level image segmentation that is built on top of category-level segmentation. Specifically, for each pixel in a semantic category mask, its corresponding instance bounding box is predicted using a deep fully convolutional regression network. Thus it follows a different pipeline to the popular detect-then-segment approaches that first predict instances’ bounding boxes, which are the current state-of-the-art in instance segmentation. We show that, by leveraging the strength of our state-of-the-art semantic segmentation models, the proposed method can achieve comparable or even better results to detect-then-segment approaches. We make the following contributions. (i) First, we propose a simple yet effective approach to semantic instance segmentation. (ii) Second, we propose an online bootstrapping method during training, which is critically important for achieving good performance for both semantic category segmentation and instance-level segmentation. (iii) As the performance of semantic category segmentation has a significant impact on the instance-level segmentation, which is the second step of our approach, we train fully convolutional residual networks to achieve the best semantic category segmentation accuracy. On the PASCAL VOC 2012 dataset, we obtain the currently best mean intersection-over-union score of 79.1%. (iv) We also achieve state-of-the-art results for instance-level segmentation.
Tasks	Instance Segmentation, Semantic Segmentation
Published	2016-05-23
URL	http://arxiv.org/abs/1605.06885v1
PDF	http://arxiv.org/pdf/1605.06885v1.pdf
PWC	https://paperswithcode.com/paper/bridging-category-level-and-instance-level
Repo
Framework
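
The mean intersection-over-union score quoted in the abstract above averages, over classes, the ratio of correctly labelled pixels to the union of predicted and ground-truth pixels for that class. A minimal sketch for a single pair of label maps follows. The function name `mean_iou` and the choice to skip classes absent from both maps are assumptions, and benchmark protocols typically accumulate counts over the whole test set rather than per image.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean per-class IoU between two integer label maps of equal shape."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue                      # class absent from both maps: skip it
        ious.append(np.logical_and(p, t).sum() / union)
    if not ious:
        return float("nan")
    return float(np.mean(ious))
```

Because each class contributes equally regardless of how many pixels it covers, small classes weigh as much as large ones in this score.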

Gamifying Video Object Segmentation


Title	Gamifying Video Object Segmentation
Authors	Simone Palazzo, Concetto Spampinato, Daniela Giordano
Abstract	Video object segmentation can be considered as one of the most challenging computer vision problems. Indeed, so far, no existing solution is able to effectively deal with the peculiarities of real-world videos, especially in cases of articulated motion and object occlusions; limitations that appear more evident when we compare their performance with the human one. However, manually segmenting objects in videos is largely impractical as it requires a lot of human time and concentration. To address this problem, in this paper we propose an interactive video object segmentation method, which exploits, on the one hand, the capability of humans to correctly identify objects in visual scenes, and on the other hand, the collective human brainpower to solve challenging tasks. In particular, our method relies on a web game to collect human inputs on object locations, followed by an accurate segmentation phase achieved by optimizing an energy function encoding spatial and temporal constraints between object regions as well as human-provided input. Performance analysis carried out on challenging video datasets with some users playing the game demonstrated that our method shows a better trade-off between annotation times and segmentation accuracy than interactive video annotation and automated video object segmentation approaches.
Tasks	Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published	2016-01-05
URL	http://arxiv.org/abs/1601.00825v1
PDF	http://arxiv.org/pdf/1601.00825v1.pdf
PWC	https://paperswithcode.com/paper/gamifying-video-object-segmentation
Repo
Framework