Paper Group ANR 182
Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction. Sequential Deep Trajectory Descriptor for Action Recognition with Three-stream CNN. Online shopping behavior study based on multi-granularity opinion mining: China vs. America. Neural Turing Machines: Convergence of Copy Tasks. Sketching for Large-Scale Learning of M …
Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction
Title | Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction |
Authors | Jianfeng Dong, Xirong Li, Cees G. M. Snoek |
Abstract | This paper strives to find the sentence best describing the content of an image or video. Different from existing works, which rely on a joint subspace for image/video to sentence matching, we propose to do so in a visual space only. We contribute Word2VisualVec, a deep neural network architecture that learns to predict a deep visual encoding of textual input based on sentence vectorization and a multi-layer perceptron. We thoroughly analyze its architectural design by varying the sentence vectorization strategy, the network depth, and the deep feature to predict for image to sentence matching. We also generalize Word2VisualVec to match a video to a sentence by extending its predictive abilities to 3-D ConvNet features as well as a visual-audio representation. Experiments on four challenging image and video benchmarks detail Word2VisualVec’s properties and its capabilities for image and video to sentence matching, and show state-of-the-art results on all datasets. |
Tasks | |
Published | 2016-04-23 |
URL | http://arxiv.org/abs/1604.06838v2 |
http://arxiv.org/pdf/1604.06838v2.pdf | |
PWC | https://paperswithcode.com/paper/word2visualvec-image-and-video-to-sentence |
Repo | |
Framework | |
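Matching in a visual space only can be sketched as follows. A multi-layer perceptron maps each candidate sentence vector to a predicted visual feature, and the sentence whose prediction lies closest to the image or video feature is selected. This is a minimal illustration and not the authors' code. The sentence vectorization is assumed to be precomputed. The ReLU activations, the use of cosine similarity, and the function names are assumptions.

```python
import math


def mlp_forward(x, layers):
    """Apply an MLP given as a list of (weights, bias) pairs.

    weights[j] is the row of input weights for output unit j. ReLU is applied
    after every layer except the last (an assumption of this sketch)."""
    h = list(x)
    for i, (weights, bias) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def best_sentence(visual_feature, sentence_vectors, layers):
    """Index of the sentence whose predicted visual feature is most similar
    to the given image or video feature."""
    scores = [cosine(visual_feature, mlp_forward(s, layers)) for s in sentence_vectors]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy usage: a single identity layer and two 2-D "sentence vectors".
identity = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
print(best_sentence([0.1, 0.9], [[1.0, 0.0], [0.0, 1.0]], identity))  # prints 1
```

Because the sentences are projected into the visual feature space, a set of candidate sentences only has to be encoded once. After that, each new image or video is scored against the stored predictions.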
Sequential Deep Trajectory Descriptor for Action Recognition with Three-stream CNN
Title | Sequential Deep Trajectory Descriptor for Action Recognition with Three-stream CNN |
Authors | Yemin Shi, Yonghong Tian, Yaowei Wang, Tiejun Huang |
Abstract | Learning the spatial-temporal representation of motion information is crucial to human action recognition. Nevertheless, most of the existing features or descriptors cannot capture motion information effectively, especially for long-term motion. To address this problem, this paper proposes a long-term motion descriptor called sequential Deep Trajectory Descriptor (sDTD). Specifically, we project dense trajectories into two-dimensional planes, and subsequently a CNN-RNN network is employed to learn an effective representation for long-term motion. Unlike the popular two-stream ConvNets, the sDTD stream is introduced into a three-stream framework so as to identify actions from a video sequence. Consequently, this three-stream framework can simultaneously capture static spatial features, short-term motion and long-term motion in the video. Extensive experiments were conducted on three challenging datasets: KTH, HMDB51 and UCF101. Experimental results show that our method achieves state-of-the-art performance on the KTH and UCF101 datasets, and is comparable to the state-of-the-art methods on the HMDB51 dataset. |
Tasks | Temporal Action Localization |
Published | 2016-09-10 |
URL | http://arxiv.org/abs/1609.03056v2 |
http://arxiv.org/pdf/1609.03056v2.pdf | |
PWC | https://paperswithcode.com/paper/sequential-deep-trajectory-descriptor-for |
Repo | |
Framework | |
Online shopping behavior study based on multi-granularity opinion mining: China vs. America
Title | Online shopping behavior study based on multi-granularity opinion mining: China vs. America |
Authors | Qingqing Zhou, Rui Xia, Chengzhi Zhang |
Abstract | With the development of e-commerce, many products are now being sold worldwide, and manufacturers are eager to obtain a better understanding of customer behavior in various regions. To achieve this goal, previous efforts have mostly relied on questionnaires, which are time-consuming and costly. The tremendous volume of product reviews on e-commerce websites has given rise to a new trend, whereby manufacturers attempt to understand user preferences by analyzing online reviews. Following this trend, this paper addresses the problem of studying customer behavior by exploiting recently developed opinion mining techniques. This work is novel for three reasons. First, questionnaire-based investigation is automated by employing algorithms for template-based question generation and opinion mining-based answer extraction. Using this system, manufacturers are able to obtain reports of customer behavior featuring a much larger sample size, more direct information, a higher degree of automation, and a lower cost. Second, international customer behavior study is made easier by integrating tools for multilingual opinion mining. Third, this is the first time an automatic questionnaire investigation has been conducted to compare customer behavior in China and America, where product reviews are written and read in Chinese and English, respectively. Our study on digital cameras, smartphones, and tablet computers yields three findings. First, Chinese customers follow the Doctrine of the Mean and often use euphemistic expressions, while American customers express their opinions more directly. Second, Chinese customers care more about general feelings, while American customers pay more attention to product details. Third, Chinese customers focus on external features, while American customers care more about the internal features of products. |
Tasks | Opinion Mining, Question Generation |
Published | 2016-03-26 |
URL | http://arxiv.org/abs/1603.08089v1 |
http://arxiv.org/pdf/1603.08089v1.pdf | |
PWC | https://paperswithcode.com/paper/online-shopping-behavior-study-based-on-multi |
Repo | |
Framework | |
Neural Turing Machines: Convergence of Copy Tasks
Title | Neural Turing Machines: Convergence of Copy Tasks |
Authors | Janez Aleš |
Abstract | The neural Turing machine architecture is differentiable end to end and trainable with gradient descent methods. Due to their large unfolded depth, neural Turing machines are hard to train, and because they access the complete memory linearly, they do not scale. Other architectures have been studied to overcome these difficulties. In this report we focus on improving the prediction quality of the original linear-memory architecture on copy and repeat copy tasks. Copy task predictions on sequences six times longer than those the neural Turing machine was trained on prove to be highly accurate. The same holds for repeat copy predictions on sequences with twice the repetition number and twice the sequence length used in training. |
Tasks | |
Published | 2016-12-07 |
URL | http://arxiv.org/abs/1612.02336v1 |
http://arxiv.org/pdf/1612.02336v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-turing-machines-convergence-of-copy |
Repo | |
Framework | |
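For reference, the copy task from the original neural Turing machine work can be generated as shown below. The abstract does not give the exact encoding used in this report. The bit width, the separate delimiter channel, and the function name `copy_task_example` are therefore assumptions of this sketch.

```python
import random


def copy_task_example(seq_len, width=8, rng=None):
    """Return (inputs, targets) for one copy-task episode.

    inputs: seq_len random binary vectors (with a 0 in the extra delimiter
    channel), one delimiter step with only the extra channel set, then
    seq_len blank steps during which the model must write its answer.
    targets: blanks until the delimiter has been seen, then the sequence."""
    rng = rng or random.Random()
    seq = [[rng.randint(0, 1) for _ in range(width)] for _ in range(seq_len)]
    inputs = (
        [bits + [0] for bits in seq]                     # data phase
        + [[0] * width + [1]]                            # delimiter step
        + [[0] * (width + 1) for _ in range(seq_len)]    # recall phase
    )
    targets = [[0] * width for _ in range(seq_len + 1)] + [bits[:] for bits in seq]
    return inputs, targets


inputs, targets = copy_task_example(seq_len=5, rng=random.Random(0))
print(len(inputs), len(targets))  # prints 11 11
```

To probe length generalization in the spirit of the report, one would train on short `seq_len` values and then evaluate on sequences several times longer.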
Sketching for Large-Scale Learning of Mixture Models
Title | Sketching for Large-Scale Learning of Mixture Models |
Authors | Nicolas Keriven, Anthony Bourrier, Rémi Gribonval, Patrick Pérez |
Abstract | Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. We propose a “compressive learning” framework where we estimate model parameters from a sketch of the training data. This sketch is a collection of generalized moments of the underlying probability distribution of the data. It can be computed in a single pass on the training set, and is easily computable on streams or distributed datasets. The proposed framework shares similarities with compressive sensing, which aims at drastically reducing the dimension of high-dimensional signals while preserving the ability to reconstruct them. To perform the estimation task, we derive an iterative algorithm analogous to sparse reconstruction algorithms in the context of linear inverse problems. We exemplify our framework with the compressive estimation of a Gaussian Mixture Model (GMM), providing heuristics on the choice of the sketching procedure and theoretical guarantees of reconstruction. We experimentally show on synthetic data that the proposed algorithm yields results comparable to the classical Expectation-Maximization (EM) technique while requiring significantly less memory and fewer computations when the number of database elements is large. We further demonstrate the potential of the approach on real large-scale data (over 10^8 training samples) for the task of model-based speaker verification. Finally, we draw some connections between the proposed framework and approximate Hilbert space embedding of probability distributions using random features. We show that the proposed sketching operator can be seen as an innovative method to design translation-invariant kernels adapted to the analysis of GMMs. We also use this theoretical framework to derive information preservation guarantees, in the spirit of infinite-dimensional compressive sensing. |
Tasks | Compressive Sensing, Speaker Verification |
Published | 2016-06-09 |
URL | http://arxiv.org/abs/1606.02838v2 |
http://arxiv.org/pdf/1606.02838v2.pdf | |
PWC | https://paperswithcode.com/paper/sketching-for-large-scale-learning-of-mixture |
Repo | |
Framework | |
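The sketch below computes such a summary in a single pass and allows summaries to be merged. It assumes that the generalized moments are samples of the empirical characteristic function, z_j = (1/N) * sum_i exp(i * <w_j, x_i>), taken at m random frequencies w_j. The Gaussian frequency distribution and the class and function names are illustrative assumptions. The paper's reconstruction algorithm is not shown.

```python
import cmath
import random


def draw_frequencies(m, dim, scale=1.0, seed=0):
    """m random frequency vectors; Gaussian draws are an assumption here."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(m)]


class Sketch:
    """Running sums of exp(i * <w_j, x>) for each frequency w_j."""

    def __init__(self, freqs):
        self.freqs = freqs
        self.sums = [0j] * len(freqs)
        self.count = 0

    def update(self, x):
        # Single pass: each sample touches every frequency once, then is discarded.
        for j, w in enumerate(self.freqs):
            phase = sum(wk * xk for wk, xk in zip(w, x))
            self.sums[j] += cmath.exp(1j * phase)
        self.count += 1

    def merge(self, other):
        # Sketches of disjoint data chunks combine by adding sums and counts.
        if self.freqs != other.freqs:
            raise ValueError("sketches must share the same frequencies")
        merged = Sketch(self.freqs)
        merged.sums = [a + b for a, b in zip(self.sums, other.sums)]
        merged.count = self.count + other.count
        return merged

    def value(self):
        """Empirical characteristic function at the chosen frequencies."""
        return [s / self.count for s in self.sums]


freqs = draw_frequencies(m=4, dim=2)
part_a, part_b = Sketch(freqs), Sketch(freqs)
for x in [[0.0, 1.0], [2.0, -1.0]]:
    part_a.update(x)
part_b.update([0.5, 0.5])
print(len(part_a.merge(part_b).value()))  # prints 4
```

Each chunk contributes only sums and a count. Sketches computed on separate machines or on successive stream segments can therefore be combined with `merge`, and the raw samples never need to be stored.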
BaTFLED: Bayesian Tensor Factorization Linked to External Data
Title | BaTFLED: Bayesian Tensor Factorization Linked to External Data |
Authors | Nathan H Lazar, Mehmet Gönen, Kemal Sönmez |
Abstract | The vast majority of current machine learning algorithms are designed to predict single responses or a vector of responses, yet many types of response are more naturally organized as matrices or higher-order tensor objects where characteristics are shared across modes. We present a new machine learning algorithm BaTFLED (Bayesian Tensor Factorization Linked to External Data) that predicts values in a three-dimensional response tensor using input features for each of the dimensions. BaTFLED uses a probabilistic Bayesian framework to learn projection matrices mapping input features for each mode into latent representations that multiply to form the response tensor. By utilizing a Tucker decomposition, the model can capture weights for interactions between latent factors for each mode in a small core tensor. Priors that encourage sparsity in the projection matrices and core tensor allow for feature selection and model regularization. This method is shown to far outperform elastic net and neural net models on ‘cold start’ tasks from data simulated in a three-mode structure. Additionally, we apply the model to predict dose-response curves in a panel of breast cancer cell lines treated with drug compounds that was used as a Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge. |
Tasks | Feature Selection |
Published | 2016-12-09 |
URL | http://arxiv.org/abs/1612.02965v2 |
http://arxiv.org/pdf/1612.02965v2.pdf | |
PWC | https://paperswithcode.com/paper/batfled-bayesian-tensor-factorization-linked |
Repo | |
Framework | |
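The mean prediction of a Tucker-structured model like the one described can be written as Y[i][j][k] = sum over a, b, c of G[a][b][c] * U1[i][a] * U2[j][b] * U3[k][c]. Here U_m = X_m A_m projects the mode-m input features to latent factors. The sketch below computes only this deterministic forward pass with plain lists. It omits BaTFLED's sparsity-inducing priors and Bayesian inference, and all names in it are hypothetical.

```python
def matmul(X, A):
    """Plain-list matrix product X @ A."""
    return [[sum(x * a for x, a in zip(row, col)) for col in zip(*A)] for row in X]


def tucker_predict(core, X1, A1, X2, A2, X3, A3):
    """Mean prediction of a 3-way response tensor from per-mode features.

    X_m is (n_m x p_m), A_m is (p_m x r_m), and the core G is (r1 x r2 x r3).
    Returns Y with Y[i][j][k] = sum_{a,b,c} G[a][b][c] U1[i][a] U2[j][b] U3[k][c]."""
    U1, U2, U3 = matmul(X1, A1), matmul(X2, A2), matmul(X3, A3)
    r1, r2, r3 = len(core), len(core[0]), len(core[0][0])
    return [
        [
            [
                sum(core[a][b][c] * u1[a] * u2[b] * u3[c]
                    for a in range(r1) for b in range(r2) for c in range(r3))
                for u3 in U3
            ]
            for u2 in U2
        ]
        for u1 in U1
    ]


# 1x1x1 toy case: latent factors 2, 3 and 5 give Y = [[[30.0]]].
print(tucker_predict([[[1.0]]], [[1.0]], [[2.0]], [[3.0]], [[1.0]], [[1.0]], [[5.0]]))
```

A new entity along any mode needs only its input features X_m to obtain latent factors. That property is what lets a feature-linked factorization like this one make "cold start" predictions.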
The Power of Arc Consistency for CSPs Defined by Partially-Ordered Forbidden Patterns
Title | The Power of Arc Consistency for CSPs Defined by Partially-Ordered Forbidden Patterns |
Authors | Martin C. Cooper, Stanislav Živný |
Abstract | Characterising tractable fragments of the constraint satisfaction problem (CSP) is an important challenge in theoretical computer science and artificial intelligence. Forbidding patterns (generic sub-instances) provides a means of defining CSP fragments which are neither exclusively language-based nor exclusively structure-based. It is known that the class of binary CSP instances in which the broken-triangle pattern (BTP) does not occur, a class which includes all tree-structured instances, is decided by arc consistency (AC), a ubiquitous reduction operation in constraint solvers. We provide a characterisation of simple partially-ordered forbidden patterns which have this AC-solvability property. It turns out that BTP is just one of five such AC-solvable patterns. The four other patterns allow us to exhibit new tractable classes. |
Tasks | |
Published | 2016-04-27 |
URL | http://arxiv.org/abs/1604.07981v4 |
http://arxiv.org/pdf/1604.07981v4.pdf | |
PWC | https://paperswithcode.com/paper/the-power-of-arc-consistency-for-csps-defined |
Repo | |
Framework | |
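Arc consistency removes every value that has no compatible value in some neighbouring domain. For a class that is decided by AC, an instance is satisfiable exactly when this process leaves every domain non-empty. The abstract does not fix a particular AC algorithm, so the sketch below uses the classic queue-based AC-3 procedure. The constraint representation, a dict of predicates keyed by directed arcs, is an assumption.

```python
from collections import deque


def ac3(domains, constraints):
    """Make a binary CSP arc consistent (AC-3).

    domains: dict mapping each variable to a set of values (pruned in place).
    constraints: dict mapping a directed arc (x, y) to a predicate
        allowed(vx, vy); include both (x, y) and (y, x) for each constraint.
    Returns False if some domain becomes empty, True otherwise."""
    incoming = {}
    for x, y in constraints:
        incoming.setdefault(y, set()).add(x)
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        allowed = constraints[(x, y)]
        # Values of x with no support in the current domain of y.
        unsupported = {vx for vx in domains[x]
                       if not any(allowed(vx, vy) for vy in domains[y])}
        if unsupported:
            domains[x] -= unsupported
            if not domains[x]:
                return False
            # x shrank, so every arc (z, x) with z != y must be rechecked.
            for z in incoming.get(x, ()):
                if z != y:
                    queue.append((z, x))
    return True


doms = {"x": {1, 2, 3}, "y": {1, 2, 3}}
cons = {("x", "y"): lambda a, b: a < b, ("y", "x"): lambda a, b: a > b}
print(ac3(doms, cons), doms)  # prints True {'x': {1, 2}, 'y': {2, 3}}
```

On instances outside an AC-solvable class, a True result only means that no inconsistency was detected. A solver would then still need search.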
Semantic Reasoning for Context-aware Internet of Things Applications
Title | Semantic Reasoning for Context-aware Internet of Things Applications |
Authors | Altti Ilari Maarala, Xiang Su, Jukka Riekki |
Abstract | Advances in ICT are bringing into reality the vision of a large number of uniquely identifiable, interconnected objects and things that gather information from diverse physical environments and deliver the information to a variety of innovative applications and services. These sensing objects and things form the Internet of Things (IoT), which can improve energy efficiency, cost efficiency and automation in many industries, such as transportation and logistics, health care and manufacturing, and can also facilitate everyday life. IoT applications rely on real-time context data and send information that drives user behavior in intelligent environments. |
Tasks | |
Published | 2016-04-28 |
URL | http://arxiv.org/abs/1604.08340v1 |
http://arxiv.org/pdf/1604.08340v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-reasoning-for-context-aware-internet |
Repo | |
Framework | |
Object Specific Deep Learning Feature and Its Application to Face Detection
Title | Object Specific Deep Learning Feature and Its Application to Face Detection |
Authors | Xianxu Hou, Ke Sun, Linlin Shen, Guoping Qiu |
Abstract | We present a method for discovering and exploiting object specific deep learning features and use face detection as a case study. Motivated by the observation that certain convolutional channels of a Convolutional Neural Network (CNN) exhibit object specific responses, we seek to discover and exploit the convolutional channels of a CNN in which neurons are activated by the presence of specific objects in the input image. We develop a method that explicitly fine-tunes a pre-trained CNN to induce an object specific channel (OSC) and systematically identifies this channel for the human face. Based on the basic OSC features, we introduce a multi-resolution approach to constructing robust face heatmaps for fast face detection in unconstrained settings. We show that multi-resolution OSC can be used to develop state-of-the-art face detectors which have the advantage of being simple and compact. |
Tasks | Face Detection |
Published | 2016-09-06 |
URL | http://arxiv.org/abs/1609.01366v1 |
http://arxiv.org/pdf/1609.01366v1.pdf | |
PWC | https://paperswithcode.com/paper/object-specific-deep-learning-feature-and-its |
Repo | |
Framework | |
Efficiency Evaluation of Character-level RNN Training Schedules
Title | Efficiency Evaluation of Character-level RNN Training Schedules |
Authors | Cedric De Boom, Sam Leroux, Steven Bohez, Pieter Simoens, Thomas Demeester, Bart Dhoedt |
Abstract | We present four training and prediction schedules for the same character-level recurrent neural network. We test the efficiency of these schedules in terms of model effectiveness as a function of training time and of the amount of training data seen. We show that the choice of training and prediction schedule can have a considerable impact on prediction effectiveness for a given training budget. |
Tasks | |
Published | 2016-05-09 |
URL | http://arxiv.org/abs/1605.02486v1 |
http://arxiv.org/pdf/1605.02486v1.pdf | |
PWC | https://paperswithcode.com/paper/efficiency-evaluation-of-character-level-rnn |
Repo | |
Framework | |
Exploring Strategies for Classification of External Stimuli Using Statistical Features of the Plant Electrical Response
Title | Exploring Strategies for Classification of External Stimuli Using Statistical Features of the Plant Electrical Response |
Authors | Shre Kumar Chatterjee, Saptarshi Das, Koushik Maharatna, Elisa Masi, Luisa Santopolo, Stefano Mancuso, Andrea Vitaletti |
Abstract | Plants sense their environment by producing electrical signals that essentially reflect changes in underlying physiological processes. When monitored, these electrical signals show both stochastic and deterministic dynamics. In this paper, we compute 11 statistical features from the raw non-stationary plant electrical signal time series to classify the applied stimulus that caused the signal. Using several discriminant-analysis-based classification techniques, we establish that the raw electrical signal contains enough information to classify the stimuli. In the process, we also propose two standard features that consistently give good classification results for three types of stimuli: sodium chloride (NaCl), sulphuric acid (H2SO4) and ozone (O3). This may reduce the complexity of computing all the features for online classification of similar external stimuli in the future. |
Tasks | Time Series |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09820v1 |
http://arxiv.org/pdf/1611.09820v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-strategies-for-classification-of |
Repo | |
Framework | |
Constitutional Precedent of Amicus Briefs
Title | Constitutional Precedent of Amicus Briefs |
Authors | Allen Huang, Lars Roemheld |
Abstract | We investigate shared language between U.S. Supreme Court majority opinions and interest groups’ corresponding amicus briefs. Specifically, we evaluate whether language that originated in an amicus brief acquired legal precedent status by being cited in the Court’s opinion. Using plagiarism detection software, automated querying of a large legal database, and manual analysis, we establish seven instances where interest group amici were able to formulate constitutional case law, setting binding legal precedent. We discuss several such instances for their implications in the Supreme Court’s creation of case law. |
Tasks | |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04672v2 |
http://arxiv.org/pdf/1606.04672v2.pdf | |
PWC | https://paperswithcode.com/paper/constitutional-precedent-of-amicus-briefs |
Repo | |
Framework | |
Controlling Robot Morphology from Incomplete Measurements
Title | Controlling Robot Morphology from Incomplete Measurements |
Authors | Martin Pecka, Karel Zimmermann, Michal Reinštein, Tomáš Svoboda |
Abstract | Mobile robots with complex morphology are essential for traversing rough terrains in Urban Search & Rescue (USAR) missions. Because teleoperating the complex morphology imposes a high cognitive load on the operator, the morphology is controlled autonomously. The autonomous control measures the robot state and the surrounding terrain, which is usually only partially observable, so the data are often incomplete. We marginalize the control over the missing measurements and evaluate an explicit safety condition. If the safety condition is violated, the body-mounted robotic arm explores the terrain by touch to gather the missing data. |
Tasks | |
Published | 2016-12-08 |
URL | http://arxiv.org/abs/1612.02739v1 |
http://arxiv.org/pdf/1612.02739v1.pdf | |
PWC | https://paperswithcode.com/paper/controlling-robot-morphology-from-incomplete |
Repo | |
Framework | |
Signs in time: Encoding human motion as a temporal image
Title | Signs in time: Encoding human motion as a temporal image |
Authors | Joon Son Chung, Andrew Zisserman |
Abstract | The goal of this work is to recognise and localise short temporal signals in image time series, where strong supervision is not available for training. To this end, we propose an image encoding that concisely represents human motion in a video sequence in a form that is suitable for learning with a ConvNet. The encoding reduces the pose information from an image to a single column, dramatically diminishing the input requirements for the network while retaining the essential information for recognition. The encoding is applied to the task of recognising and localising signed gestures in British Sign Language (BSL) videos. We demonstrate that, with the proposed encoding, signs as short as 10 frames in duration can be learnt from clips lasting hundreds of frames, using only weak (clip-level) supervision and despite considerable label noise. |
Tasks | Time Series |
Published | 2016-08-06 |
URL | http://arxiv.org/abs/1608.02059v1 |
http://arxiv.org/pdf/1608.02059v1.pdf | |
PWC | https://paperswithcode.com/paper/signs-in-time-encoding-human-motion-as-a |
Repo | |
Framework | |
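The abstract gives no details of the encoding. One plausible reading, stated here as an assumption, is that each frame's pose keypoints are stacked into a single column and the columns for consecutive frames are placed side by side. A clip then becomes a 2-D image whose horizontal axis is time. The keypoint format and the function name are hypothetical.

```python
def encode_pose_sequence(frames):
    """Turn a pose sequence into a 2-D 'temporal image'.

    frames: list of poses; each pose is a list of (x, y) keypoints, with the
    same number of keypoints in every frame.
    Returns rows x time: column t holds all x coordinates of frame t followed
    by all y coordinates."""
    if not frames:
        return []
    n_rows = 2 * len(frames[0])
    columns = []
    for pose in frames:
        if 2 * len(pose) != n_rows:
            raise ValueError("every frame needs the same number of keypoints")
        columns.append([x for x, _ in pose] + [y for _, y in pose])
    # Transpose so that time runs along the horizontal axis.
    return [[col[r] for col in columns] for r in range(n_rows)]


print(encode_pose_sequence([[(1, 2), (3, 4)], [(5, 6), (7, 8)]]))
# prints [[1, 5], [3, 7], [2, 6], [4, 8]]
```

Under this reading, a ConvNet sliding along the time axis would see a 10-frame sign as a narrow band of columns within a much wider clip image.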
Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection
Title | Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection |
Authors | Hisham Cholakkal, Jubin Johnson, Deepu Rajan |
Abstract | Top-down saliency models produce a probability map that peaks at target locations specified by a task/goal such as object detection. They are usually trained in a fully supervised setting involving pixel-level annotations of objects. We propose a weakly supervised top-down saliency framework using only binary labels that indicate the presence/absence of an object in an image. First, the probabilistic contribution of each image region to the confidence of a CNN-based image classifier is computed through a backtracking strategy to produce top-down saliency. From a set of saliency maps of an image produced by fast bottom-up saliency approaches, we select the saliency map best suited to the top-down task. The selected bottom-up saliency map is combined with the top-down saliency map. Features having high combined saliency are used to train a linear SVM classifier to estimate feature saliency. This is integrated with the combined saliency and further refined through multi-scale superpixel averaging of the saliency map. We evaluate the proposed weakly supervised top-down saliency framework and achieve performance comparable to fully supervised approaches. Experiments are carried out on seven challenging datasets, and quantitative results are compared with 40 closely related approaches across 4 different applications. |
Tasks | Object Detection, Salient Object Detection |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05345v3 |
http://arxiv.org/pdf/1611.05345v3.pdf | |
PWC | https://paperswithcode.com/paper/backtracking-spatial-pyramid-pooling-spp |
Repo | |
Framework | |
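The final refinement step, multi-scale superpixel averaging of a saliency map, can be sketched as follows. Each pixel takes the mean saliency of its superpixel, and the smoothed maps obtained from different segmentations are then averaged. The superpixel label maps are assumed to come from an external segmentation method. Weighting all scales equally is an assumption of this sketch, and the function names are hypothetical.

```python
def superpixel_average(saliency, labels):
    """Replace each pixel's saliency with the mean over its superpixel.

    saliency: 2-D list of floats; labels: 2-D list of superpixel ids, same shape."""
    totals, counts = {}, {}
    for srow, lrow in zip(saliency, labels):
        for s, lab in zip(srow, lrow):
            totals[lab] = totals.get(lab, 0.0) + s
            counts[lab] = counts.get(lab, 0) + 1
    return [[totals[lab] / counts[lab] for lab in lrow] for lrow in labels]


def multiscale_average(saliency, label_maps):
    """Average the superpixel-smoothed maps over several segmentations
    (one label map per scale, equal weights)."""
    maps = [superpixel_average(saliency, labels) for labels in label_maps]
    h, w = len(saliency), len(saliency[0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(w)] for i in range(h)]


sal = [[1.0, 3.0], [0.0, 0.0]]
print(multiscale_average(sal, [[[0, 0], [1, 1]], [[0, 1], [0, 1]]]))
# prints [[1.25, 1.75], [0.25, 0.75]]
```

Averaging within superpixels aligns the saliency map with image boundaries. Combining several segmentation scales reduces the dependence on any single superpixel size.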