May 6, 2019

Paper Group ANR 182

Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction


Title	Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction
Authors	Jianfeng Dong, Xirong Li, Cees G. M. Snoek
Abstract	This paper strives to find the sentence best describing the content of an image or video. Unlike existing works, which rely on a joint subspace for image / video to sentence matching, we propose to do so in a visual space only. We contribute Word2VisualVec, a deep neural network architecture that learns to predict a deep visual encoding of textual input based on sentence vectorization and a multi-layer perceptron. We thoroughly analyze its architectural design by varying the sentence vectorization strategy, the network depth and the deep feature to predict for image to sentence matching. We also generalize Word2VisualVec for matching a video to a sentence, by extending the predictive abilities to 3-D ConvNet features as well as a visual-audio representation. Experiments on four challenging image and video benchmarks detail Word2VisualVec’s properties and its capabilities for image and video to sentence matching, and show state-of-the-art results on all datasets.
Tasks
Published	2016-04-23
URL	http://arxiv.org/abs/1604.06838v2
PDF	http://arxiv.org/pdf/1604.06838v2.pdf
PWC	https://paperswithcode.com/paper/word2visualvec-image-and-video-to-sentence
Repo
Framework
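
The abstract above describes matching in the visual space: a sentence is vectorized, a multi-layer perceptron predicts a visual feature from it, and the sentence whose predicted feature lies closest to the image feature wins. The following is a minimal sketch of that idea only, not the authors' implementation. The bag-of-words vectorization, the cosine similarity, the toy vocabulary and the hand-set weights in `TOY_LAYERS` are all assumptions made for illustration; a real model would learn its weights from data.

```python
import math

def bag_of_words(sentence, vocab):
    """Count-based sentence vector over a fixed vocabulary (assumed vectorization)."""
    tokens = sentence.lower().split()
    return [float(tokens.count(word)) for word in vocab]

def mlp_forward(x, layers):
    """Apply (weights, bias) layers in order; ReLU on every layer except the last."""
    for index, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_sentence(image_feature, sentences, vocab, layers):
    """Return the sentence whose predicted visual vector best matches the image."""
    scored = [
        (cosine(mlp_forward(bag_of_words(s, vocab), layers), image_feature), s)
        for s in sentences
    ]
    return max(scored)[1]

# Hypothetical toy setup: a 4-word vocabulary and a 2-dimensional "visual" space.
TOY_VOCAB = ["dog", "cat", "runs", "sleeps"]
TOY_LAYERS = [
    # Hidden layer: identity weights, zero bias.
    ([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], [0, 0, 0, 0]),
    # Output layer: maps word counts to the 2-dimensional visual space.
    ([[1, 0, 0.5, 0], [0, 1, 0, 0.5]], [0, 0]),
]
TOY_SENTENCES = ["a dog runs", "a cat sleeps"]
```

With these toy values, `best_sentence([0.9, 0.1], TOY_SENTENCES, TOY_VOCAB, TOY_LAYERS)` selects "a dog runs", because its predicted vector points along the first visual dimension.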

Sequential Deep Trajectory Descriptor for Action Recognition with Three-stream CNN


Title	Sequential Deep Trajectory Descriptor for Action Recognition with Three-stream CNN
Authors	Yemin Shi, Yonghong Tian, Yaowei Wang, Tiejun Huang
Abstract	Learning the spatial-temporal representation of motion information is crucial to human action recognition. Nevertheless, most of the existing features or descriptors cannot capture motion information effectively, especially for long-term motion. To address this problem, this paper proposes a long-term motion descriptor called sequential Deep Trajectory Descriptor (sDTD). Specifically, we project dense trajectories into two-dimensional planes, and subsequently a CNN-RNN network is employed to learn an effective representation for long-term motion. Unlike the popular two-stream ConvNets, the sDTD stream is introduced into a three-stream framework so as to identify actions from a video sequence. Consequently, this three-stream framework can simultaneously capture static spatial features, short-term motion and long-term motion in the video. Extensive experiments were conducted on three challenging datasets: KTH, HMDB51 and UCF101. Experimental results show that our method achieves state-of-the-art performance on the KTH and UCF101 datasets, and is comparable to the state-of-the-art methods on the HMDB51 dataset.
Tasks	Temporal Action Localization
Published	2016-09-10
URL	http://arxiv.org/abs/1609.03056v2
PDF	http://arxiv.org/pdf/1609.03056v2.pdf
PWC	https://paperswithcode.com/paper/sequential-deep-trajectory-descriptor-for
Repo
Framework

Online shopping behavior study based on multi-granularity opinion mining: China vs. America


Title	Online shopping behavior study based on multi-granularity opinion mining: China vs. America
Authors	Qingqing Zhou, Rui Xia, Chengzhi Zhang
Abstract	With the development of e-commerce, many products are now being sold worldwide, and manufacturers are eager to obtain a better understanding of customer behavior in various regions. To achieve this goal, most previous efforts have focused mainly on questionnaires, which are time-consuming and costly. The tremendous volume of product reviews on e-commerce websites has seen a new trend emerge, whereby manufacturers attempt to understand user preferences by analyzing online reviews. Following this trend, this paper addresses the problem of studying customer behavior by exploiting recently developed opinion mining techniques. This work is novel for three reasons. First, questionnaire-based investigation is automatically enabled by employing algorithms for template-based question generation and opinion mining-based answer extraction. Using this system, manufacturers are able to obtain reports of customer behavior featuring a much larger sample size, more direct information, a higher degree of automation, and a lower cost. Second, international customer behavior study is made easier by integrating tools for multilingual opinion mining. Third, this is the first time an automatic questionnaire investigation has been conducted to compare customer behavior in China and America, where product reviews are written and read in Chinese and English, respectively. Our study on digital cameras, smartphones, and tablet computers yields three findings. First, Chinese customers follow the Doctrine of the Mean, and often use euphemistic expressions, while American customers express their opinions more directly. Second, Chinese customers care more about general feelings, while American customers pay more attention to product details. Third, Chinese customers focus on external features, while American customers care more about the internal features of products.
Tasks	Opinion Mining, Question Generation
Published	2016-03-26
URL	http://arxiv.org/abs/1603.08089v1
PDF	http://arxiv.org/pdf/1603.08089v1.pdf
PWC	https://paperswithcode.com/paper/online-shopping-behavior-study-based-on-multi
Repo
Framework

Neural Turing Machines: Convergence of Copy Tasks


Title	Neural Turing Machines: Convergence of Copy Tasks
Authors	Janez Aleš
Abstract	The architecture of neural Turing machines is differentiable end to end and is trainable with gradient descent methods. Due to their large unfolded depth, neural Turing machines are hard to train, and because they access the complete memory linearly, they do not scale. Other architectures have been studied to overcome these difficulties. In this report we focus on improving the prediction quality of the original linear memory architecture on copy and repeat copy tasks. Copy task predictions on sequences six times longer than those the neural Turing machine was trained on prove to be highly accurate, and so do predictions of repeat copy tasks for sequences with twice the repetition number and twice the sequence length the neural Turing machine was trained on.
Tasks
Published	2016-12-07
URL	http://arxiv.org/abs/1612.02336v1
PDF	http://arxiv.org/pdf/1612.02336v1.pdf
PWC	https://paperswithcode.com/paper/neural-turing-machines-convergence-of-copy
Repo
Framework

Sketching for Large-Scale Learning of Mixture Models


Title	Sketching for Large-Scale Learning of Mixture Models
Authors	Nicolas Keriven, Anthony Bourrier, Rémi Gribonval, Patrick Pérez
Abstract	Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. We propose a “compressive learning” framework where we estimate model parameters from a sketch of the training data. This sketch is a collection of generalized moments of the underlying probability distribution of the data. It can be computed in a single pass on the training set, and is easily computable on streams or distributed datasets. The proposed framework shares similarities with compressive sensing, which aims at drastically reducing the dimension of high-dimensional signals while preserving the ability to reconstruct them. To perform the estimation task, we derive an iterative algorithm analogous to sparse reconstruction algorithms in the context of linear inverse problems. We exemplify our framework with the compressive estimation of a Gaussian Mixture Model (GMM), providing heuristics on the choice of the sketching procedure and theoretical guarantees of reconstruction. We experimentally show on synthetic data that the proposed algorithm yields results comparable to the classical Expectation-Maximization (EM) technique while requiring significantly less memory and fewer computations when the number of database elements is large. We further demonstrate the potential of the approach on real large-scale data (over 10^8 training samples) for the task of model-based speaker verification. Finally, we draw some connections between the proposed framework and approximate Hilbert space embedding of probability distributions using random features. We show that the proposed sketching operator can be seen as an innovative method to design translation-invariant kernels adapted to the analysis of GMMs. We also use this theoretical framework to derive information preservation guarantees, in the spirit of infinite-dimensional compressive sensing.
Tasks	Compressive Sensing, Speaker Verification
Published	2016-06-09
URL	http://arxiv.org/abs/1606.02838v2
PDF	http://arxiv.org/pdf/1606.02838v2.pdf
PWC	https://paperswithcode.com/paper/sketching-for-large-scale-learning-of-mixture
Repo
Framework
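
The abstract above says the sketch is a collection of generalized moments that can be computed in a single pass and on streams or distributed datasets. The sketch below illustrates why such a summary has those properties. It assumes one particular kind of generalized moment, an average of complex exponentials at fixed frequencies; the class name `StreamingSketch`, the sign convention and the way frequencies are supplied are hypothetical choices for this example, and the parameter-estimation step from the sketch is not shown.

```python
import cmath

class StreamingSketch:
    """Running average of exp(-i * <w, x>) over the data, one entry per frequency w."""

    def __init__(self, frequencies):
        self.frequencies = frequencies          # list of frequency vectors
        self.sums = [0j] * len(frequencies)     # running sums, one per frequency
        self.count = 0                          # number of samples seen so far

    def update(self, x):
        """Fold one sample into the sketch; the sample can then be discarded."""
        for j, w in enumerate(self.frequencies):
            phase = sum(wi * xi for wi, xi in zip(w, x))
            self.sums[j] += cmath.exp(-1j * phase)
        self.count += 1

    def merge(self, other):
        """Combine with a sketch built on another data shard (same frequencies)."""
        self.sums = [a + b for a, b in zip(self.sums, other.sums)]
        self.count += other.count

    def value(self):
        """Return the empirical moments; requires at least one sample."""
        if self.count == 0:
            raise ValueError("sketch is empty")
        return [s / self.count for s in self.sums]

def sketch_dataset(data, frequencies):
    """Convenience helper: sketch an iterable of samples in a single pass."""
    sketch = StreamingSketch(frequencies)
    for x in data:
        sketch.update(x)
    return sketch
```

Because the sketch stores only sums and a count, its memory footprint depends on the number of frequencies rather than on the number of samples, and sketches computed on separate shards can be merged by addition.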

BaTFLED: Bayesian Tensor Factorization Linked to External Data


Title	BaTFLED: Bayesian Tensor Factorization Linked to External Data
Authors	Nathan H Lazar, Mehmet Gönen, Kemal Sönmez
Abstract	The vast majority of current machine learning algorithms are designed to predict single responses or a vector of responses, yet many types of response are more naturally organized as matrices or higher-order tensor objects where characteristics are shared across modes. We present a new machine learning algorithm BaTFLED (Bayesian Tensor Factorization Linked to External Data) that predicts values in a three-dimensional response tensor using input features for each of the dimensions. BaTFLED uses a probabilistic Bayesian framework to learn projection matrices mapping input features for each mode into latent representations that multiply to form the response tensor. By utilizing a Tucker decomposition, the model can capture weights for interactions between latent factors for each mode in a small core tensor. Priors that encourage sparsity in the projection matrices and core tensor allow for feature selection and model regularization. This method is shown to far outperform elastic net and neural net models on ‘cold start’ tasks from data simulated in a three-mode structure. Additionally, we apply the model to predict dose-response curves in a panel of breast cancer cell lines treated with drug compounds that was used as a Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge.
Tasks	Feature Selection
Published	2016-12-09
URL	http://arxiv.org/abs/1612.02965v2
PDF	http://arxiv.org/pdf/1612.02965v2.pdf
PWC	https://paperswithcode.com/paper/batfled-bayesian-tensor-factorization-linked
Repo
Framework
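
The abstract above describes projection matrices that map input features for each mode into latent representations, which are combined through a small Tucker core tensor to form the response tensor. The following sketch shows only that deterministic forward computation under stated assumptions. It does not include the Bayesian inference or the sparsity priors, and the function names and the tiny numeric values are hypothetical.

```python
def project(features, projection):
    """Map per-item input features to latent factors: latent = features x projection."""
    return [
        [sum(f * p for f, p in zip(row, column)) for column in zip(*projection)]
        for row in features
    ]

def tucker_reconstruct(core, a, b, c):
    """Response[i][j][k] = sum over p, q, r of core[p][q][r] * a[i][p] * b[j][q] * c[k][r]."""
    n_p, n_q, n_r = len(core), len(core[0]), len(core[0][0])
    return [
        [
            [
                sum(
                    core[p][q][r] * a_row[p] * b_row[q] * c_row[r]
                    for p in range(n_p)
                    for q in range(n_q)
                    for r in range(n_r)
                )
                for c_row in c
            ]
            for b_row in b
        ]
        for a_row in a
    ]

# Hypothetical toy example: one latent factor per mode and a 1x1x1 core.
mode1_latent = project([[1, 0], [0, 1]], [[2], [3]])   # two items, two features each
mode2_latent = [[1], [4]]
mode3_latent = [[5]]
toy_core = [[[0.5]]]
toy_response = tucker_reconstruct(toy_core, mode1_latent, mode2_latent, mode3_latent)
```

In this toy setup each response entry is the product of the three latent values scaled by the single core weight, so `toy_response[0][1][0]` is 0.5 * 2 * 4 * 5.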

The Power of Arc Consistency for CSPs Defined by Partially-Ordered Forbidden Patterns


Title	The Power of Arc Consistency for CSPs Defined by Partially-Ordered Forbidden Patterns
Authors	Martin C. Cooper, Stanislav Živný
Abstract	Characterising tractable fragments of the constraint satisfaction problem (CSP) is an important challenge in theoretical computer science and artificial intelligence. Forbidding patterns (generic sub-instances) provides a means of defining CSP fragments which are neither exclusively language-based nor exclusively structure-based. It is known that the class of binary CSP instances in which the broken-triangle pattern (BTP) does not occur, a class which includes all tree-structured instances, is decided by arc consistency (AC), a ubiquitous reduction operation in constraint solvers. We provide a characterisation of simple partially-ordered forbidden patterns which have this AC-solvability property. It turns out that BTP is just one of five such AC-solvable patterns. The four other patterns allow us to exhibit new tractable classes.
Tasks
Published	2016-04-27
URL	http://arxiv.org/abs/1604.07981v4
PDF	http://arxiv.org/pdf/1604.07981v4.pdf
PWC	https://paperswithcode.com/paper/the-power-of-arc-consistency-for-csps-defined
Repo
Framework
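
The abstract above refers to arc consistency (AC) as a reduction operation in constraint solvers. For readers unfamiliar with it, here is a minimal sketch of the classic AC-3 procedure on a binary CSP. It is a generic textbook illustration, not the paper's characterisation of forbidden patterns; the data layout (a dict of domains and a dict of directed constraints keyed by variable pairs) is an assumption made for this example.

```python
from collections import deque

def ac3(domains, constraints):
    """Enforce arc consistency in place.

    domains:     dict mapping each variable to a set of candidate values.
    constraints: dict mapping a directed pair (x, y) to a predicate allowed(vx, vy).
                 Each binary constraint must be listed in both directions.
    Returns False if some domain becomes empty, True otherwise.
    """
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        allowed = constraints[(x, y)]
        # Values of x with no supporting value in the domain of y.
        unsupported = {
            vx for vx in domains[x]
            if not any(allowed(vx, vy) for vy in domains[y])
        }
        if unsupported:
            domains[x] -= unsupported
            if not domains[x]:
                return False
            # Neighbours of x (other than y) may have lost their support.
            queue.extend(
                (z, w) for (z, w) in constraints if w == x and z != y
            )
    return True

# Toy instance: x < y < z with every domain equal to {1, 2, 3}.
toy_domains = {"x": {1, 2, 3}, "y": {1, 2, 3}, "z": {1, 2, 3}}
toy_constraints = {
    ("x", "y"): lambda a, b: a < b,
    ("y", "x"): lambda a, b: a > b,
    ("y", "z"): lambda a, b: a < b,
    ("z", "y"): lambda a, b: a > b,
}
toy_consistent = ac3(toy_domains, toy_constraints)
```

On this toy instance the pruning alone reduces every domain to a single value, so no search is needed. In general, arc consistency only removes unsupported values and does not by itself decide satisfiability.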

Semantic Reasoning for Context-aware Internet of Things Applications


Title	Semantic Reasoning for Context-aware Internet of Things Applications
Authors	Altti Ilari Maarala, Xiang Su, Jukka Riekki
Abstract	Advances in ICT are bringing into reality the vision of a large number of uniquely identifiable, interconnected objects and things that gather information from diverse physical environments and deliver the information to a variety of innovative applications and services. These sensing objects and things form the Internet of Things (IoT) that can improve energy and cost efficiency and automation in many different industry fields such as transportation and logistics, health care and manufacturing, and facilitate our everyday lives as well. IoT applications rely on real-time context data and allow sending information for driving the behaviors of users in intelligent environments.
Tasks
Published	2016-04-28
URL	http://arxiv.org/abs/1604.08340v1
PDF	http://arxiv.org/pdf/1604.08340v1.pdf
PWC	https://paperswithcode.com/paper/semantic-reasoning-for-context-aware-internet
Repo
Framework

Object Specific Deep Learning Feature and Its Application to Face Detection


Title	Object Specific Deep Learning Feature and Its Application to Face Detection
Authors	Xianxu Hou, Ke Sun, Linlin Shen, Guoping Qiu
Abstract	We present a method for discovering and exploiting object specific deep learning features and use face detection as a case study. Motivated by the observation that certain convolutional channels of a Convolutional Neural Network (CNN) exhibit object specific responses, we seek to discover and exploit the convolutional channels of a CNN in which neurons are activated by the presence of specific objects in the input image. A method for explicitly fine-tuning a pre-trained CNN to induce an object specific channel (OSC) and systematically identifying it for the human face object has been developed. Based on the basic OSC features, we introduce a multi-resolution approach to constructing robust face heatmaps for fast face detection in unconstrained settings. We show that multi-resolution OSC can be used to develop state of the art face detectors which have the advantage of being simple and compact.
Tasks	Face Detection
Published	2016-09-06
URL	http://arxiv.org/abs/1609.01366v1
PDF	http://arxiv.org/pdf/1609.01366v1.pdf
PWC	https://paperswithcode.com/paper/object-specific-deep-learning-feature-and-its
Repo
Framework

Efficiency Evaluation of Character-level RNN Training Schedules


Title	Efficiency Evaluation of Character-level RNN Training Schedules
Authors	Cedric De Boom, Sam Leroux, Steven Bohez, Pieter Simoens, Thomas Demeester, Bart Dhoedt
Abstract	We present four training and prediction schedules from the same character-level recurrent neural network. The efficiency of these schedules is tested in terms of model effectiveness as a function of training time and amount of training data seen. We show that the choice of training and prediction schedule potentially has a considerable impact on the prediction effectiveness for a given training budget.
Tasks
Published	2016-05-09
URL	http://arxiv.org/abs/1605.02486v1
PDF	http://arxiv.org/pdf/1605.02486v1.pdf
PWC	https://paperswithcode.com/paper/efficiency-evaluation-of-character-level-rnn
Repo
Framework

Exploring Strategies for Classification of External Stimuli Using Statistical Features of the Plant Electrical Response


Title	Exploring Strategies for Classification of External Stimuli Using Statistical Features of the Plant Electrical Response
Authors	Shre Kumar Chatterjee, Saptarshi Das, Koushik Maharatna, Elisa Masi, Luisa Santopolo, Stefano Mancuso, Andrea Vitaletti
Abstract	Plants sense their environment by producing electrical signals which in essence represent changes in underlying physiological processes. These electrical signals, when monitored, show both stochastic and deterministic dynamics. In this paper, we compute 11 statistical features from the raw non-stationary plant electrical signal time series to classify the stimulus applied (causing the electrical signal). By using different discriminant analysis based classification techniques, we successfully establish that there is enough information in the raw electrical signal to classify the stimuli. In the process, we also propose two standard features which consistently give good classification results for three types of stimuli - Sodium Chloride (NaCl), Sulphuric Acid (H2SO4) and Ozone (O3). This may facilitate reduction in the complexity involved in computing all the features for online classification of similar external stimuli in future.
Tasks	Time Series
Published	2016-11-29
URL	http://arxiv.org/abs/1611.09820v1
PDF	http://arxiv.org/pdf/1611.09820v1.pdf
PWC	https://paperswithcode.com/paper/exploring-strategies-for-classification-of
Repo
Framework
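
The abstract above describes computing statistical features from a raw time series and feeding them to a classifier. The sketch below shows what such a feature-extraction step could look like. The abstract does not list the 11 features used, so the four computed here (mean, variance, skewness, kurtosis) are an assumption chosen for illustration, and the function name is hypothetical.

```python
def statistical_features(signal):
    """Return a few moment-based features of a 1-D time series.

    Uses population (divide-by-n) moments. Kurtosis is the plain fourth
    standardized moment, so a normal distribution would score about 3.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    if m2 == 0:
        raise ValueError("constant signal: skewness and kurtosis are undefined")
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }
```

A feature vector built this way, one per recorded segment, could then be passed to any standard classifier.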

Constitutional Precedent of Amicus Briefs


Title	Constitutional Precedent of Amicus Briefs
Authors	Allen Huang, Lars Roemheld
Abstract	We investigate shared language between U.S. Supreme Court majority opinions and interest groups’ corresponding amicus briefs. Specifically, we evaluate whether language that originated in an amicus brief acquired legal precedent status by being cited in the Court’s opinion. Using plagiarism detection software, automated querying of a large legal database, and manual analysis, we establish seven instances where interest group amici were able to formulate constitutional case law, setting binding legal precedent. We discuss several such instances for their implications in the Supreme Court’s creation of case law.
Tasks
Published	2016-06-15
URL	http://arxiv.org/abs/1606.04672v2
PDF	http://arxiv.org/pdf/1606.04672v2.pdf
PWC	https://paperswithcode.com/paper/constitutional-precedent-of-amicus-briefs
Repo
Framework

Controlling Robot Morphology from Incomplete Measurements


Title	Controlling Robot Morphology from Incomplete Measurements
Authors	Martin Pecka, Karel Zimmermann, Michal Reinštein, Tomáš Svoboda
Abstract	Mobile robots with complex morphology are essential for traversing rough terrains in Urban Search & Rescue (USAR) missions. Since teleoperation of the complex morphology imposes a high cognitive load on the operator, the morphology is controlled autonomously. The autonomous control measures the robot state and the surrounding terrain, which is usually only partially observable, and thus the data are often incomplete. We marginalize the control over the missing measurements and evaluate an explicit safety condition. If the safety condition is violated, tactile terrain exploration by the body-mounted robotic arm gathers the missing data.
Tasks
Published	2016-12-08
URL	http://arxiv.org/abs/1612.02739v1
PDF	http://arxiv.org/pdf/1612.02739v1.pdf
PWC	https://paperswithcode.com/paper/controlling-robot-morphology-from-incomplete
Repo
Framework

Signs in time: Encoding human motion as a temporal image


Title	Signs in time: Encoding human motion as a temporal image
Authors	Joon Son Chung, Andrew Zisserman
Abstract	The goal of this work is to recognise and localise short temporal signals in image time series, where strong supervision is not available for training. To this end we propose an image encoding that concisely represents human motion in a video sequence in a form that is suitable for learning with a ConvNet. The encoding reduces the pose information from an image to a single column, dramatically diminishing the input requirements for the network, but retaining the essential information for recognition. The encoding is applied to the task of recognising and localising signed gestures in British Sign Language (BSL) videos. We demonstrate that using the proposed encoding, signs as short as 10 frames in duration can be learnt from clips lasting hundreds of frames using only weak (clip level) supervision and with considerable label noise.
Tasks	Time Series
Published	2016-08-06
URL	http://arxiv.org/abs/1608.02059v1
PDF	http://arxiv.org/pdf/1608.02059v1.pdf
PWC	https://paperswithcode.com/paper/signs-in-time-encoding-human-motion-as-a
Repo
Framework

Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection


Title	Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection
Authors	Hisham Cholakkal, Jubin Johnson, Deepu Rajan
Abstract	Top-down saliency models produce a probability map that peaks at target locations specified by a task/goal such as object detection. They are usually trained in a fully supervised setting involving pixel-level annotations of objects. We propose a weakly supervised top-down saliency framework using only binary labels that indicate the presence/absence of an object in an image. First, the probabilistic contribution of each image region to the confidence of a CNN-based image classifier is computed through a backtracking strategy to produce top-down saliency. From a set of saliency maps of an image produced by fast bottom-up saliency approaches, we select the best saliency map suitable for the top-down task. The selected bottom-up saliency map is combined with the top-down saliency map. Features having high combined saliency are used to train a linear SVM classifier to estimate feature saliency. This is integrated with combined saliency and further refined through a multi-scale superpixel-averaging of the saliency map. We evaluate the performance of the proposed weakly supervised top-down saliency and achieve comparable performance with fully supervised approaches. Experiments are carried out on seven challenging datasets and quantitative results are compared with 40 closely related approaches across 4 different applications.
Tasks	Object Detection, Salient Object Detection
Published	2016-11-16
URL	http://arxiv.org/abs/1611.05345v3
PDF	http://arxiv.org/pdf/1611.05345v3.pdf
PWC	https://paperswithcode.com/paper/backtracking-spatial-pyramid-pooling-spp
Repo
Framework