Paper Group ANR 77
Automatic Generation of Grounded Visual Questions
Title | Automatic Generation of Grounded Visual Questions |
Authors | Shijie Zhang, Lizhen Qu, Shaodi You, Zhenglu Yang, Jiawan Zhang |
Abstract | In this paper, we propose the first model able to generate visually grounded questions of diverse types for a single image. Visual question generation is an emerging topic which aims to ask questions in natural language based on visual input. To the best of our knowledge, there are no automatic methods for generating meaningful questions of various types for the same visual input. To address this problem, we propose a model that automatically generates visually grounded questions of varying types. Our model takes as input both images and the captions generated by a dense caption model, samples the most probable question types, and subsequently generates the questions. The experimental results on two real-world datasets show that our model outperforms the strongest baseline in terms of both correctness and diversity by a wide margin. |
Tasks | Question Generation |
Published | 2016-12-20 |
URL | http://arxiv.org/abs/1612.06530v2 |
http://arxiv.org/pdf/1612.06530v2.pdf | |
PWC | https://paperswithcode.com/paper/automatic-generation-of-grounded-visual |
Repo | |
Framework | |
Dynamic Pricing in High-dimensions
Title | Dynamic Pricing in High-dimensions |
Authors | Adel Javanmard, Hamid Nazerzadeh |
Abstract | We study the pricing problem faced by a firm that sells a large number of products, described via a wide range of features, to customers that arrive over time. Customers independently make purchasing decisions according to a general choice model that includes product features and customer characteristics, encoded as $d$-dimensional numerical vectors, as well as the price offered. The parameters of the choice model are a priori unknown to the firm, but can be learned as the (binary-valued) sales data accrues over time. The firm’s objective is to minimize the regret, i.e., the expected revenue loss against a clairvoyant policy that knows the parameters of the choice model in advance, and always offers the revenue-maximizing price. This setting is motivated in part by the prevalence of online marketplaces that allow for real-time pricing. We assume a structured choice model, the parameters of which depend on $s_0$ out of the $d$ product features. We propose a dynamic policy, called Regularized Maximum Likelihood Pricing (RMLP), that leverages the (sparsity) structure of the high-dimensional model and obtains a logarithmic regret in $T$. More specifically, the regret of our algorithm is $O(s_0 \log d \cdot \log T)$. Furthermore, we show that no policy can obtain regret better than $O(s_0 (\log d + \log T))$. |
Tasks | |
Published | 2016-09-24 |
URL | http://arxiv.org/abs/1609.07574v4 |
http://arxiv.org/pdf/1609.07574v4.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-pricing-in-high-dimensions |
Repo | |
Framework | |
Variational limits of k-NN graph based functionals on data clouds
Title | Variational limits of k-NN graph based functionals on data clouds |
Authors | Nicolas Garcia Trillos |
Abstract | This paper studies the large sample asymptotics of data analysis procedures based on the optimization of functionals defined on $k$-NN graphs on point clouds. The paper is framed in the context of minimization of balanced cut functionals, but our techniques, ideas and results can be adapted to other functionals of relevance. We rigorously show that provided the number of neighbors in the graph $k:=k_n$ scales with the number of points in the cloud as $n \gg k_n \gg \log(n)$, then with probability one, the solution to the graph cut optimization problem converges towards the solution of an analogue variational problem at the continuum level. |
Tasks | |
Published | 2016-07-03 |
URL | http://arxiv.org/abs/1607.00696v3 |
http://arxiv.org/pdf/1607.00696v3.pdf | |
PWC | https://paperswithcode.com/paper/variational-limits-of-k-nn-graph-based |
Repo | |
Framework | |
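The abstract above concerns balanced cut functionals on $k$-NN graphs. As a rough illustration only, the sketch below builds a symmetrized k-NN graph and evaluates a Cheeger-style cut (crossing edges divided by the size of the smaller part) for a given two-way partition. The choice of the Cheeger cut, the symmetrization rule, and the names `knn_graph` and `cheeger_cut` are assumptions made for this example; the paper's exact functional and normalization are not given in the abstract.

```python
import math
import random

def knn_graph(points, k):
    """Return the undirected edges (i, j), i < j, of the symmetrized k-NN graph.

    An edge is kept if either endpoint lists the other among its k nearest
    neighbors (an assumed symmetrization rule).
    """
    edges = set()
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def cheeger_cut(edges, side):
    """Edges crossing the partition divided by the size of the smaller part.

    side[i] is True or False; both parts must be non-empty.
    """
    crossing = sum(1 for i, j in edges if side[i] != side[j])
    size_a = sum(side)
    return crossing / min(size_a, len(side) - size_a)

# Two well-separated Gaussian blobs of 40 points each (synthetic data).
random.seed(0)
points = ([(random.gauss(-3, 0.5), random.gauss(0, 0.5)) for _ in range(40)]
          + [(random.gauss(3, 0.5), random.gauss(0, 0.5)) for _ in range(40)])
edges = knn_graph(points, k=5)

by_cluster = [i < 40 for i in range(80)]       # split along the blobs
alternating = [i % 2 == 0 for i in range(80)]  # arbitrary split
print(cheeger_cut(edges, by_cluster), cheeger_cut(edges, alternating))
```

The partition that follows the two blobs should score far lower than the arbitrary one. The abstract's condition $n \gg k_n \gg \log(n)$ governs how `k` should grow with the sample size for the graph minimizer to approach its continuum counterpart.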
Network Inference by Learned Node-Specific Degree Prior
Title | Network Inference by Learned Node-Specific Degree Prior |
Authors | Qingming Tang, Lifu Tu, Weiran Wang, Jinbo Xu |
Abstract | We propose a novel method for network inference from partially observed edges using a node-specific degree prior. The degree prior is derived from observed edges in the network to be inferred, and its hyper-parameters are determined by cross validation. Then we formulate network inference as a matrix completion problem regularized by our degree prior. Our theoretical analysis indicates that this prior favors a network following the learned degree distribution, and may lead to an improved network recovery error bound compared with previous work. Experimental results on both simulated and real biological networks demonstrate the superior performance of our method in various settings. |
Tasks | Matrix Completion |
Published | 2016-02-07 |
URL | http://arxiv.org/abs/1602.02386v1 |
http://arxiv.org/pdf/1602.02386v1.pdf | |
PWC | https://paperswithcode.com/paper/network-inference-by-learned-node-specific |
Repo | |
Framework | |
An FFT-based Synchronization Approach to Recognize Human Behaviors using STN-LFP Signal
Title | An FFT-based Synchronization Approach to Recognize Human Behaviors using STN-LFP Signal |
Authors | Hosein M. Golshan, Adam O. Hebb, Sara J. Hanrahan, Joshua Nedrud, Mohammad H. Mahoor |
Abstract | Classification of human behavior is key to developing closed-loop Deep Brain Stimulation (DBS) systems, which may be able to decrease the power consumption and side effects of the existing systems. Recent studies have shown that the Local Field Potential (LFP) signals from both Subthalamic Nuclei (STN) of the brain can be used to recognize human behavior. Since the DBS leads implanted in each STN can collect three bipolar signals, the selection of a suitable pair of LFPs that achieves optimal recognition performance is still an open problem to address. Considering the presence of synchronized aggregate activity in the basal ganglia, this paper presents an FFT-based synchronization approach to automatically select a relevant pair of LFPs and use the pair together with an SVM-based MKL classifier for behavior recognition purposes. Our experiments on five subjects show the superiority of the proposed approach compared to other methods used for behavior classification. |
Tasks | |
Published | 2016-12-28 |
URL | http://arxiv.org/abs/1612.08780v1 |
http://arxiv.org/pdf/1612.08780v1.pdf | |
PWC | https://paperswithcode.com/paper/an-fft-based-synchronization-approach-to |
Repo | |
Framework | |
Variable projection for nonsmooth models
Title | Variable projection for nonsmooth models |
Authors | Aleksandr Aravkin, Dmitriy Drusvyatskiy, Tristan van Leeuwen |
Abstract | Variable projection solves structured optimization problems by completely minimizing over a subset of the variables while iterating over the remaining variables. Over the last 30 years, the technique has been widely used, with empirical and theoretical results demonstrating both greater efficacy and greater stability compared to competing approaches. Classic examples have exploited closed form projections and smoothness of the objective function. We extend the approach to broader settings, where the projection subproblems can be nonsmooth, and can only be solved inexactly by iterative methods. We present a few case studies on problems occurring frequently in machine-learning and high-dimensional inference. |
Tasks | |
Published | 2016-01-19 |
URL | http://arxiv.org/abs/1601.05011v4 |
http://arxiv.org/pdf/1601.05011v4.pdf | |
PWC | https://paperswithcode.com/paper/variable-projection-for-nonsmooth-models |
Repo | |
Framework | |
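To make the idea of "completely minimizing over a subset of the variables" concrete, here is a minimal sketch of the classic smooth case with a closed-form projection, not the nonsmooth or inexact setting the paper extends to. It fits $y \approx c\,e^{-a t}$ by eliminating the linear coefficient $c$ in closed form and running a one-dimensional search over the rate $a$. The model, the golden-section search, and the names `projected_residual` and `fit_decay` are assumptions chosen for illustration.

```python
import math

def projected_residual(a, t, y):
    """For a fixed rate a, minimize over c in closed form.

    Returns (squared residual at the optimal c, optimal c).
    """
    basis = [math.exp(-a * ti) for ti in t]
    c = sum(b * yi for b, yi in zip(basis, y)) / sum(b * b for b in basis)
    residual = sum((yi - c * b) ** 2 for b, yi in zip(basis, y))
    return residual, c

def fit_decay(t, y, lo=0.0, hi=5.0, iters=60):
    """Golden-section search over a on the reduced (projected) objective.

    Assumes the reduced objective is unimodal on [lo, hi].
    """
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if projected_residual(x1, t, y)[0] < projected_residual(x2, t, y)[0]:
            hi = x2  # the minimizer lies in [lo, x2]
        else:
            lo = x1  # the minimizer lies in [x1, hi]
    a = (lo + hi) / 2
    return a, projected_residual(a, t, y)[1]

# Noise-free synthetic data with c = 2.0 and a = 1.3.
t = [0.1 * i for i in range(50)]
y = [2.0 * math.exp(-1.3 * ti) for ti in t]
a_hat, c_hat = fit_decay(t, y)
print(a_hat, c_hat)
```

The outer search only ever sees one variable, because `c` is recomputed exactly at every trial value of `a`. In the setting described in the abstract, that inner solve would instead be nonsmooth and carried out approximately by an iterative method.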
Siamese Regression Networks with Efficient mid-level Feature Extraction for 3D Object Pose Estimation
Title | Siamese Regression Networks with Efficient mid-level Feature Extraction for 3D Object Pose Estimation |
Authors | Andreas Doumanoglou, Vassileios Balntas, Rigas Kouskouridas, Tae-Kyun Kim |
Abstract | In this paper we tackle the problem of estimating the 3D pose of object instances, using convolutional neural networks. State of the art methods usually solve the challenging problem of regression in angle space indirectly, focusing on learning discriminative features that are later fed into a separate architecture for 3D pose estimation. In contrast, we propose an end-to-end learning framework for directly regressing object poses by exploiting Siamese Networks. For a given image pair, we enforce a similarity measure between the representation of the sample images in the feature and pose space respectively, that is shown to boost regression performance. Furthermore, we argue that our pose-guided feature learning using our Siamese Regression Network generates more discriminative features that outperform the state of the art. Last, our feature learning formulation provides the ability of learning features that can perform under severe occlusions, demonstrating high performance on our novel hand-object dataset. |
Tasks | 3D Pose Estimation, Pose Estimation |
Published | 2016-07-08 |
URL | http://arxiv.org/abs/1607.02257v1 |
http://arxiv.org/pdf/1607.02257v1.pdf | |
PWC | https://paperswithcode.com/paper/siamese-regression-networks-with-efficient |
Repo | |
Framework | |
Inferring the location of authors from words in their texts
Title | Inferring the location of authors from words in their texts |
Authors | Max Berggren, Jussi Karlgren, Robert Östling, Mikael Parkvall |
Abstract | For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their or their authors’ location. Many texts are locatable through explicit labels but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location-indicating words which then can be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are. We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language. |
Tasks | |
Published | 2016-12-20 |
URL | http://arxiv.org/abs/1612.06671v1 |
http://arxiv.org/pdf/1612.06671v1.pdf | |
PWC | https://paperswithcode.com/paper/inferring-the-location-of-authors-from-words |
Repo | |
Framework | |
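The pipeline described in the abstract (Gaussian word models, a placeness filter, a per-text centroid) can be sketched in a few lines. This is a simplified illustration under stated assumptions: it fits a single axis-aligned Gaussian per word rather than several, it treats low total variance as a stand-in for high placeness, and the threshold `max_spread` and the function names are hypothetical. The example posts and coordinates are invented.

```python
from collections import defaultdict
from statistics import fmean, pvariance

def fit_word_gaussians(posts):
    """posts: iterable of (text, (lat, lon)).

    Returns word -> (mean_lat, mean_lon, spread), where spread is the total
    variance of an axis-aligned Gaussian fitted to the word's occurrences.
    A word seen only once gets spread 0, so real data would need a minimum count.
    """
    coords = defaultdict(list)
    for text, location in posts:
        for word in set(text.lower().split()):
            coords[word].append(location)
    models = {}
    for word, locs in coords.items():
        lats = [lat for lat, _ in locs]
        lons = [lon for _, lon in locs]
        models[word] = (fmean(lats), fmean(lons), pvariance(lats) + pvariance(lons))
    return models

def locate(text, models, max_spread=1.0):
    """Centroid of the means of known words whose spread is at most max_spread."""
    hits = [models[w] for w in set(text.lower().split())
            if w in models and models[w][2] <= max_spread]
    if not hits:
        return None
    return fmean(h[0] for h in hits), fmean(h[1] for h in hits)

# Invented geotagged posts for illustration.
posts = [
    ("fika in stockholm", (59.33, 18.07)),
    ("stockholm rain today", (59.30, 18.00)),
    ("rain in goteborg", (57.71, 11.97)),
    ("goteborg fika today", (57.70, 12.00)),
]
models = fit_word_gaussians(posts)
print(locate("rain and fika in stockholm", models))
```

Words such as "rain" and "fika" occur in both cities, so their fitted spread is large and the filter drops them; only "stockholm" contributes to the centroid in this example.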
The Open World of Micro-Videos
Title | The Open World of Micro-Videos |
Authors | Phuc Xuan Nguyen, Gregory Rogez, Charless Fowlkes, Deva Ramanan |
Abstract | Micro-videos are six-second videos popular on social media networks with several unique properties. Firstly, because of the authoring process, they contain significantly more diversity and narrative structure than existing collections of video “snippets”. Secondly, because they are often captured by hand-held mobile cameras, they contain specialized viewpoints including third-person, egocentric, and self-facing views seldom seen in traditional produced video. Thirdly, due to their continuous production and publication on social networks, aggregate micro-video content contains interesting open-world dynamics that reflect the temporal evolution of tag topics. These aspects make micro-videos an appealing well of visual data for developing large-scale models for video understanding. We analyze a novel dataset of micro-videos labeled with 58 thousand tags. To analyze this data, we introduce viewpoint-specific and temporally-evolving models for video understanding, defined over state-of-the-art motion and deep visual features. We conclude that our dataset opens up new research opportunities for large-scale video analysis, novel viewpoints, and open-world dynamics. |
Tasks | Video Understanding |
Published | 2016-03-31 |
URL | http://arxiv.org/abs/1603.09439v2 |
http://arxiv.org/pdf/1603.09439v2.pdf | |
PWC | https://paperswithcode.com/paper/the-open-world-of-micro-videos |
Repo | |
Framework | |
A Minimax Optimal Algorithm for Crowdsourcing
Title | A Minimax Optimal Algorithm for Crowdsourcing |
Authors | Thomas Bonald, Richard Combes |
Abstract | We consider the problem of accurately estimating the reliability of workers based on noisy labels they provide, which is a fundamental question in crowdsourcing. We propose a novel lower bound on the minimax estimation error which applies to any estimation procedure. We further propose Triangular Estimation (TE), an algorithm for estimating the reliability of workers. TE has low complexity, may be implemented in a streaming setting when labels are provided by workers in real time, and does not rely on an iterative procedure. We further prove that TE is minimax optimal and matches our lower bound. We conclude by assessing the performance of TE and other state-of-the-art algorithms on both synthetic and real-world data sets. |
Tasks | |
Published | 2016-06-01 |
URL | http://arxiv.org/abs/1606.00226v2 |
http://arxiv.org/pdf/1606.00226v2.pdf | |
PWC | https://paperswithcode.com/paper/a-minimax-optimal-algorithm-for-crowdsourcing |
Repo | |
Framework | |
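The abstract does not spell out how Triangular Estimation works. The sketch below illustrates one non-iterative way to estimate worker reliability from triples of workers, and should be read as an assumption-laden illustration rather than the authors' procedure. It assumes binary labels in {-1, +1}, a one-coin model in which worker $i$ is correct with probability $(1+\theta_i)/2$ independently of the others, and positive reliabilities. Under that model $E[X_i X_j] = \theta_i \theta_j$ for $i \ne j$, so $\theta_i^2 = C_{ij} C_{ik} / C_{jk}$. All function names are hypothetical.

```python
import math
import random

def pairwise_correlation(labels, i, j):
    """Empirical mean of X_i * X_j over tasks; labels[t][w] is +1 or -1."""
    return sum(row[i] * row[j] for row in labels) / len(labels)

def triangle_estimate(labels, i, j, k):
    """Estimate |theta_i| from the triple (i, j, k) via C_ij * C_ik / C_jk.

    Fails with ZeroDivisionError if workers j and k are empirically uncorrelated.
    """
    c_ij = pairwise_correlation(labels, i, j)
    c_ik = pairwise_correlation(labels, i, k)
    c_jk = pairwise_correlation(labels, j, k)
    return math.sqrt(abs(c_ij * c_ik / c_jk))

# Simulate three workers with known reliabilities on 20000 tasks.
random.seed(1)
theta = [0.9, 0.6, 0.3]
labels = []
for _ in range(20000):
    truth = random.choice([-1, 1])
    labels.append([truth if random.random() < (1 + th) / 2 else -truth
                   for th in theta])

estimates = [
    triangle_estimate(labels, 0, 1, 2),
    triangle_estimate(labels, 1, 0, 2),
    triangle_estimate(labels, 2, 0, 1),
]
print(estimates)
```

Each correlation is a running average, so it can be updated as labels arrive, which is consistent with the streaming, non-iterative character the abstract describes.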
Weakly supervised object detection using pseudo-strong labels
Title | Weakly supervised object detection using pseudo-strong labels |
Authors | Ke Yang, Dongsheng Li, Yong Dou, Shaohe Lv, Qiang Wang |
Abstract | Object detection is an important task in computer vision. A variety of methods have been proposed, but methods using weak labels still do not give satisfactory results. In this paper, we propose a new framework that uses a weakly supervised method’s output as pseudo-strong labels to train a strongly supervised model. A weakly supervised method is treated as a black box to generate class-specific bounding boxes on the training dataset. A de-noising method is then applied to the noisy bounding boxes. The de-noised pseudo-strong labels are then used to train a strongly supervised object detection network. The whole framework is still weakly supervised because the entire process uses only image-level labels. The experimental results on PASCAL VOC 2007 demonstrate the validity of our framework: we obtain a mean average precision of 43.4%, compared to 39.5% for the previous best result and 34.5% for the initial method. The framework is simple and distinct, and promises to be easily applicable to other methods. |
Tasks | Object Detection, Weakly Supervised Object Detection |
Published | 2016-07-16 |
URL | http://arxiv.org/abs/1607.04731v1 |
http://arxiv.org/pdf/1607.04731v1.pdf | |
PWC | https://paperswithcode.com/paper/weakly-supervised-object-detection-using |
Repo | |
Framework | |
Probabilistic Reasoning in the Description Logic ALCP with the Principle of Maximum Entropy (Full Version)
Title | Probabilistic Reasoning in the Description Logic ALCP with the Principle of Maximum Entropy (Full Version) |
Authors | Rafael Peñaloza, Nico Potyka |
Abstract | A central question for knowledge representation is how to encode and handle uncertain knowledge adequately. We introduce the probabilistic description logic ALCP that is designed for representing context-dependent knowledge, where the actual context taking place is uncertain. ALCP allows the expression of logical dependencies on the domain and probabilistic dependencies on the possible contexts. In order to draw probabilistic conclusions, we employ the principle of maximum entropy. We provide reasoning algorithms for this logic, and show that it satisfies several desirable properties of probabilistic logics. |
Tasks | |
Published | 2016-06-30 |
URL | http://arxiv.org/abs/1606.09521v1 |
http://arxiv.org/pdf/1606.09521v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-reasoning-in-the-description |
Repo | |
Framework | |
Closed loop interactions between spiking neural network and robotic simulators based on MUSIC and ROS
Title | Closed loop interactions between spiking neural network and robotic simulators based on MUSIC and ROS |
Authors | Philipp Weidel, Mikael Djurfeldt, Renato Duarte, Abigail Morrison |
Abstract | In order to properly assess the function and computational properties of simulated neural systems, it is necessary to account for the nature of the stimuli that drive the system. However, providing stimuli that are rich and yet both reproducible and amenable to experimental manipulations is technically challenging, and even more so if a closed-loop scenario is required. In this work, we present a novel approach to solve this problem, connecting robotics and neural network simulators. We implement a middleware solution that bridges the Robot Operating System (ROS) to the Multi-Simulator Coordinator (MUSIC). This enables any robotic and neural simulators that implement the corresponding interfaces to be efficiently coupled, allowing real-time performance for a wide range of configurations. This work extends the toolset available for researchers in both neurorobotics and computational neuroscience, and creates the opportunity to perform closed-loop experiments of arbitrary complexity to address questions in multiple areas, including embodiment, agency, and reinforcement learning. |
Tasks | |
Published | 2016-04-16 |
URL | http://arxiv.org/abs/1604.04764v1 |
http://arxiv.org/pdf/1604.04764v1.pdf | |
PWC | https://paperswithcode.com/paper/closed-loop-interactions-between-spiking |
Repo | |
Framework | |
Bridging Category-level and Instance-level Semantic Image Segmentation
Title | Bridging Category-level and Instance-level Semantic Image Segmentation |
Authors | Zifeng Wu, Chunhua Shen, Anton van den Hengel |
Abstract | We propose an approach to instance-level image segmentation that is built on top of category-level segmentation. Specifically, for each pixel in a semantic category mask, its corresponding instance bounding box is predicted using a deep fully convolutional regression network. Thus it follows a different pipeline to the popular detect-then-segment approaches that first predict instances’ bounding boxes, which are the current state-of-the-art in instance segmentation. We show that, by leveraging the strength of our state-of-the-art semantic segmentation models, the proposed method can achieve comparable or even better results to detect-then-segment approaches. We make the following contributions. (i) First, we propose a simple yet effective approach to semantic instance segmentation. (ii) Second, we propose an online bootstrapping method during training, which is critically important for achieving good performance for both semantic category segmentation and instance-level segmentation. (iii) As the performance of semantic category segmentation has a significant impact on the instance-level segmentation, which is the second step of our approach, we train fully convolutional residual networks to achieve the best semantic category segmentation accuracy. On the PASCAL VOC 2012 dataset, we obtain the currently best mean intersection-over-union score of 79.1%. (iv) We also achieve state-of-the-art results for instance-level segmentation. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.06885v1 |
http://arxiv.org/pdf/1605.06885v1.pdf | |
PWC | https://paperswithcode.com/paper/bridging-category-level-and-instance-level |
Repo | |
Framework | |
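The abstract reports a mean intersection-over-union score. For readers unfamiliar with the metric, the following is a minimal sketch of how it can be computed from flat lists of predicted and ground-truth class labels. The function name `mean_iou` and the rule of skipping classes absent from both inputs are assumptions for this example; benchmark implementations such as the PASCAL VOC evaluation accumulate the counts over the whole dataset and may treat ignored pixels differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean over classes of intersection / union of predicted and true pixels.

    pred and target are equal-length sequences of integer class labels.
    Classes that occur in neither sequence are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Six "pixels", three classes (invented labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)  # per-class IoUs are 1/3, 2/3 and 1/2
```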
Gamifying Video Object Segmentation
Title | Gamifying Video Object Segmentation |
Authors | Simone Palazzo, Concetto Spampinato, Daniela Giordano |
Abstract | Video object segmentation can be considered one of the most challenging computer vision problems. Indeed, so far, no existing solution is able to effectively deal with the peculiarities of real-world videos, especially in cases of articulated motion and object occlusions; these limitations appear more evident when we compare their performance with that of humans. However, manually segmenting objects in videos is largely impractical as it requires a lot of human time and concentration. To address this problem, in this paper we propose an interactive video object segmentation method, which exploits, on one hand, the capability of humans to correctly identify objects in visual scenes, and on the other hand, the collective human brainpower to solve challenging tasks. In particular, our method relies on a web game to collect human inputs on object locations, followed by an accurate segmentation phase achieved by optimizing an energy function encoding spatial and temporal constraints between object regions as well as human-provided input. Performance analysis carried out on challenging video datasets with some users playing the game demonstrated that our method shows a better trade-off between annotation times and segmentation accuracy than interactive video annotation and automated video object segmentation approaches. |
Tasks | Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2016-01-05 |
URL | http://arxiv.org/abs/1601.00825v1 |
http://arxiv.org/pdf/1601.00825v1.pdf | |
PWC | https://paperswithcode.com/paper/gamifying-video-object-segmentation |
Repo | |
Framework | |