July 27, 2019


Paper Group ANR 656


Query-by-example Spoken Term Detection using Attention-based Multi-hop Networks


Title	Query-by-example Spoken Term Detection using Attention-based Multi-hop Networks
Authors	Chia-Wei Ao, Hung-yi Lee
Abstract	Retrieving spoken content with spoken queries, or query-by-example spoken term detection (STD), is attractive because it makes possible the matching of signals directly on the acoustic level without transcribing them into text. Here, we propose an end-to-end query-by-example STD model based on an attention-based multi-hop network, whose input is a spoken query and an audio segment containing several utterances; the output states whether the audio segment includes the query. The model can be trained in either a supervised scenario using labeled data, or in an unsupervised fashion. In the supervised scenario, we find that the attention mechanism and multiple hops improve performance, and that the attention weights indicate the time span of the detected terms. In the unsupervised setting, the model mimics the behavior of the existing query-by-example STD system, yielding performance comparable to the existing system but with a lower search time complexity.
Tasks
Published	2017-09-01
URL	http://arxiv.org/abs/1709.00354v2
PDF	http://arxiv.org/pdf/1709.00354v2.pdf
PWC	https://paperswithcode.com/paper/query-by-example-spoken-term-detection-using
Repo
Framework

Variants of RMSProp and Adagrad with Logarithmic Regret Bounds


Title	Variants of RMSProp and Adagrad with Logarithmic Regret Bounds
Authors	Mahesh Chandra Mukkamala, Matthias Hein
Abstract	Adaptive gradient methods have recently become very popular, in particular as they have been shown to be useful in the training of deep neural networks. In this paper we analyze RMSProp, originally proposed for the training of deep neural networks, in the context of online convex optimization and show $\sqrt{T}$-type regret bounds. Moreover, we propose two variants, SC-Adagrad and SC-RMSProp, for which we show logarithmic regret bounds for strongly convex functions. Finally, we demonstrate in the experiments that these new variants outperform other adaptive gradient techniques or stochastic gradient descent in the optimization of strongly convex functions as well as in the training of deep neural networks.
Tasks
Published	2017-06-17
URL	http://arxiv.org/abs/1706.05507v2
PDF	http://arxiv.org/pdf/1706.05507v2.pdf
PWC	https://paperswithcode.com/paper/variants-of-rmsprop-and-adagrad-with
Repo
Framework
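
As a rough illustration of the two base methods the abstract builds on, the sketch below implements the textbook Adagrad and RMSProp updates on a small strongly convex quadratic. It is not the paper's SC-Adagrad or SC-RMSProp, and the test function, step sizes, and the helper name `minimize` are assumptions chosen for the example.

```python
import math


def adagrad_step(theta, grad, v, lr=1.0, eps=1e-8):
    """Textbook Adagrad: accumulate squared gradients, then scale each coordinate."""
    v = [vi + g * g for vi, g in zip(v, grad)]
    theta = [t - lr * g / (math.sqrt(vi) + eps) for t, g, vi in zip(theta, grad, v)]
    return theta, v


def rmsprop_step(theta, grad, v, lr=0.05, beta=0.9, eps=1e-8):
    """Textbook RMSProp: exponential moving average of squared gradients."""
    v = [beta * vi + (1.0 - beta) * g * g for vi, g in zip(v, grad)]
    theta = [t - lr * g / (math.sqrt(vi) + eps) for t, g, vi in zip(theta, grad, v)]
    return theta, v


def minimize(step, steps=500):
    """Run `step` on the toy objective f(x) = sum_i (x_i - 1)^2 (an assumption)."""
    theta = [5.0, -3.0]
    v = [0.0, 0.0]
    for _ in range(steps):
        grad = [2.0 * (t - 1.0) for t in theta]  # gradient of the toy objective
        theta, v = step(theta, grad, v)
    return theta


adagrad_result = minimize(adagrad_step)
rmsprop_result = minimize(rmsprop_step)
```

With a constant step size, RMSProp tends to hover near the minimizer rather than settle on it, which is one reason regret analyses typically pair it with a decaying step-size schedule.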

ElasticPlay: Interactive Video Summarization with Dynamic Time Budgets


Title	ElasticPlay: Interactive Video Summarization with Dynamic Time Budgets
Authors	Haojian Jin, Yale Song, Koji Yatani
Abstract	Video consumption is being shifted from sit-and-watch to selective skimming. Existing video player interfaces, however, only provide indirect manipulation to support this emerging behavior. Video summarization alleviates this issue to some extent, shortening a video based on the desired length of a summary as an input variable. But an optimal length of a summarized video is often not available in advance. Moreover, the user cannot edit the summary once it is produced, limiting its practical applications. We argue that video summarization should be an interactive, mixed-initiative process in which users have control over the summarization procedure while algorithms help users achieve their goal via video understanding. In this paper, we introduce ElasticPlay, a mixed-initiative approach that combines an advanced video summarization technique with direct interface manipulation to help users control the video summarization process. Users can specify a time budget for the remaining content while watching a video; our system then immediately updates the playback plan using our proposed cut-and-forward algorithm, determining which parts to skip or to fast-forward. This interactive process allows users to fine-tune the summarization result with immediate feedback. We show that our system outperforms existing video summarization techniques on the TVSum50 dataset. We also report two lab studies (22 participants) and a Mechanical Turk deployment study (60 participants), and show that the participants responded favorably to ElasticPlay.
Tasks	Video Summarization, Video Understanding
Published	2017-08-23
URL	http://arxiv.org/abs/1708.06858v1
PDF	http://arxiv.org/pdf/1708.06858v1.pdf
PWC	https://paperswithcode.com/paper/elasticplay-interactive-video-summarization
Repo
Framework

Using Deep Neural Network Approximate Bayesian Network


Title	Using Deep Neural Network Approximate Bayesian Network
Authors	Jie Jia, Honggang Zhou, Yunchun Li
Abstract	We present a new method to approximate the posterior probabilities of a Bayesian network using a deep neural network. Experimental results on several public Bayesian network datasets show that a deep neural network is capable of learning the joint probability distribution of a Bayesian network with high accuracy by learning from a few pairs of observations and posterior probability distributions. Compared with the traditional approximate method, the likelihood weighting sampling algorithm, our method is much faster and achieves higher accuracy on medium-sized Bayesian networks. Another advantage of our method is that it can be parallelized much more easily on GPUs without extra effort. We also explored the connection between the accuracy of our model and the number of training examples. The results show that our model saturates as the number of training examples grows, and that we do not need many training examples to get reasonably good results. Another contribution of our work is showing that a discriminative model such as a deep neural network can approximate a generative model such as a Bayesian network.
Tasks
Published	2017-12-31
URL	http://arxiv.org/abs/1801.00282v2
PDF	http://arxiv.org/pdf/1801.00282v2.pdf
PWC	https://paperswithcode.com/paper/using-deep-neural-network-approximate
Repo
Framework

Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach


Title	Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach
Authors	Aidean Sharghi, Jacob S. Laurel, Boqing Gong
Abstract	Recent years have witnessed a resurgence of interest in video summarization. However, one of the main obstacles to the research on video summarization is the user subjectivity - users have various preferences over the summaries. The subjectiveness causes at least two problems. First, no single video summarizer fits all users unless it interacts with and adapts to the individual users. Second, it is very challenging to evaluate the performance of a video summarizer. To tackle the first problem, we explore the recently proposed query-focused video summarization which introduces user preferences in the form of text queries about the video into the summarization process. We propose a memory network parameterized sequential determinantal point process in order to attend the user query onto different video frames and shots. To address the second challenge, we contend that a good evaluation metric for video summarization should focus on the semantic information that humans can perceive rather than the visual features or temporal overlaps. To this end, we collect dense per-video-shot concept annotations, compile a new dataset, and suggest an efficient evaluation method defined upon the concept annotations. We conduct extensive experiments contrasting our video summarizer to existing ones and present detailed analyses about the dataset and the new evaluation method.
Tasks	Video Summarization
Published	2017-07-16
URL	http://arxiv.org/abs/1707.04960v1
PDF	http://arxiv.org/pdf/1707.04960v1.pdf
PWC	https://paperswithcode.com/paper/query-focused-video-summarization-dataset
Repo
Framework

Stochastic Optimization from Distributed, Streaming Data in Rate-limited Networks


Title	Stochastic Optimization from Distributed, Streaming Data in Rate-limited Networks
Authors	Matthew Nokleby, Waheed U. Bajwa
Abstract	Motivated by machine learning applications in networks of sensors, internet-of-things (IoT) devices, and autonomous agents, we propose techniques for distributed stochastic convex learning from high-rate data streams. The setup involves a network of nodes—each one of which has a stream of data arriving at a constant rate—that solve a stochastic convex optimization problem by collaborating with each other over rate-limited communication links. To this end, we present and analyze two algorithms—termed distributed stochastic approximation mirror descent (D-SAMD) and accelerated distributed stochastic approximation mirror descent (AD-SAMD)—that are based on two stochastic variants of mirror descent and in which nodes collaborate via approximate averaging of the local, noisy subgradients using distributed consensus. Our main contributions are (i) bounds on the convergence rates of D-SAMD and AD-SAMD in terms of the number of nodes, network topology, and ratio of the data streaming and communication rates, and (ii) sufficient conditions for order-optimum convergence of these algorithms. In particular, we show that for sufficiently well-connected networks, distributed learning schemes can obtain order-optimum convergence even if the communications rate is small. Further we find that the use of accelerated methods significantly enlarges the regime in which order-optimum convergence is achieved; this is in contrast to the centralized setting, where accelerated methods usually offer only a modest improvement. Finally, we demonstrate the effectiveness of the proposed algorithms using numerical experiments.
Tasks	Stochastic Optimization
Published	2017-04-25
URL	http://arxiv.org/abs/1704.07888v4
PDF	http://arxiv.org/pdf/1704.07888v4.pdf
PWC	https://paperswithcode.com/paper/stochastic-optimization-from-distributed
Repo
Framework
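
The abstract describes nodes that approximately average their local, noisy subgradients through distributed consensus. A minimal sketch of such consensus averaging follows, assuming scalar values, an undirected graph given as adjacency lists, and max-degree mixing weights. None of these choices are taken from the paper, and the mirror descent step of D-SAMD is not reproduced here.

```python
def consensus_rounds(values, neighbors, rounds):
    """Run `rounds` of consensus averaging over an undirected graph.

    values:    one scalar per node (e.g. one coordinate of a local subgradient)
    neighbors: neighbors[i] lists the nodes adjacent to node i
    The max-degree weights used here are an assumption for illustration; they
    form a symmetric, doubly stochastic mixing matrix, so the network-wide
    average is preserved at every round.
    """
    max_degree = max(len(nb) for nb in neighbors)
    w = 1.0 / (max_degree + 1)
    for _ in range(rounds):
        values = [
            v + w * sum(values[j] - v for j in neighbors[i])
            for i, v in enumerate(values)
        ]
    return values


# Hypothetical 4-node ring in which only node 0 holds a non-zero value.
ring = [[1, 3], [0, 2], [1, 3], [0, 2]]
one_round = consensus_rounds([4.0, 0.0, 0.0, 0.0], ring, rounds=1)
many_rounds = consensus_rounds([4.0, 0.0, 0.0, 0.0], ring, rounds=20)
```

Few rounds give only a coarse average, which matches the rate-limited setting the abstract describes: the number of consensus rounds available per incoming data sample is bounded by the ratio of communication rate to streaming rate.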

Query-Aware Sparse Coding for Multi-Video Summarization


Title	Query-Aware Sparse Coding for Multi-Video Summarization
Authors	Zhong Ji, Yaru Ma, Yanwei Pang, Xuelong Li
Abstract	Given the explosive growth of online videos, it is becoming increasingly important to relieve the tedious work of browsing and managing the video content of interest. Video summarization aims at providing such a technique by transforming one or multiple videos into a compact one. However, conventional multi-video summarization methods often fail to produce satisfying results as they ignore the user’s search intent. To this end, this paper proposes a novel query-aware approach by formulating multi-video summarization in a sparse coding framework, where the web images searched by the query are taken as the important preference information to reveal the query intent. To provide a user-friendly summarization, this paper also develops an event-keyframe presentation structure to present keyframes in groups of specific events related to the query by using an unsupervised multi-graph fusion method. We release a new public dataset named MVS1K, which contains about 1,000 videos from 10 queries and their video tags, manual annotations, and associated web images. Extensive experiments on the MVS1K dataset validate that our approaches produce superior objective and subjective results compared with several recently proposed approaches.
Tasks	Video Summarization
Published	2017-07-13
URL	http://arxiv.org/abs/1707.04021v1
PDF	http://arxiv.org/pdf/1707.04021v1.pdf
PWC	https://paperswithcode.com/paper/query-aware-sparse-coding-for-multi-video
Repo
Framework


A Correlative Denoising Autoencoder to Model Social Influence for Top-N Recommender System


Title	A Correlative Denoising Autoencoder to Model Social Influence for Top-N Recommender System
Authors	Yiteng Pan, Fazhi He, Haiping Yu
Abstract	In recent years, numerous works have been proposed that leverage deep learning techniques to improve social-aware recommendation performance. In most cases, a large amount of data is required to train a robust deep learning model, which contains many parameters to fit to the training data. However, both user rating data and social network data suffer from severe sparsity, which makes it difficult to train a robust deep neural network model. To address this problem, we propose a novel Correlative Denoising Autoencoder (CoDAE) method that takes correlations between users with multiple roles into account to learn robust representations from sparse inputs of ratings and social networks for recommendation. We develop the CoDAE model by utilizing three separate autoencoders to learn user features for the roles of rater, truster, and trustee, respectively. In particular, because each input unit of the user vectors for the truster and trustee roles corresponds to a particular user, we propose to utilize shared parameters to learn common information for the units that correspond to the same users. Moreover, we propose a related regularization term to learn correlations between the user features learned by the three subnetworks of the CoDAE model. We further conduct a series of experiments to evaluate the proposed method on two public datasets for the Top-N recommendation task. The experimental results demonstrate that the proposed model outperforms state-of-the-art algorithms on the rank-sensitive metrics MAP and NDCG.
Tasks	Denoising, Recommendation Systems
Published	2017-03-06
URL	https://arxiv.org/abs/1703.01760v3
PDF	https://arxiv.org/pdf/1703.01760v3.pdf
PWC	https://paperswithcode.com/paper/trust-aware-collaborative-denoising-auto
Repo
Framework

Monocular Depth Estimation with Hierarchical Fusion of Dilated CNNs and Soft-Weighted-Sum Inference


Title	Monocular Depth Estimation with Hierarchical Fusion of Dilated CNNs and Soft-Weighted-Sum Inference
Authors	Bo Li, Yuchao Dai, Mingyi He
Abstract	Monocular depth estimation is a challenging task in complex compositions depicting multiple objects of diverse scales. Despite the great recent progress enabled by deep convolutional neural networks (CNNs), state-of-the-art monocular depth estimation methods still fall short in handling such challenging real-world scenarios. In this paper, we propose a deep end-to-end learning framework to tackle these challenges, which learns the direct mapping from a color image to the corresponding depth map. First, we represent monocular depth estimation as a multi-category dense labeling task, in contrast to the regression-based formulation. In this way, we can build upon recent progress in dense labeling tasks such as semantic segmentation. Second, we fuse different side-outputs from our front-end dilated convolutional neural network in a hierarchical way to exploit multi-scale depth cues, which is critical for achieving scale-aware depth estimation. Third, we propose to utilize soft-weighted-sum inference instead of hard-max inference, transforming the discretized depth scores into continuous depth values. Thus, we reduce the influence of quantization error and improve the robustness of our method. Extensive experiments on the NYU Depth V2 and KITTI datasets show the superiority of our method compared with current state-of-the-art methods. Furthermore, experiments on the NYU V2 dataset reveal that our model is able to learn the probability distribution of depth.
Tasks	Depth Estimation, Monocular Depth Estimation, Quantization, Semantic Segmentation
Published	2017-08-02
URL	http://arxiv.org/abs/1708.02287v1
PDF	http://arxiv.org/pdf/1708.02287v1.pdf
PWC	https://paperswithcode.com/paper/monocular-depth-estimation-with-hierarchical
Repo
Framework
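
To make the contrast between hard-max and soft-weighted-sum inference concrete, here is a minimal per-pixel sketch. It assumes the network emits one score per depth bin and that each bin has a representative depth value. The bin centers, the scores, and the function names are hypothetical, and the paper's actual discretization scheme is not reproduced.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of per-bin scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def hard_max_depth(scores, bin_centers):
    """Pick the depth of the single highest-scoring bin (quantized output)."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return bin_centers[best]


def soft_weighted_sum_depth(scores, bin_centers):
    """Probability-weighted average of the bin depths (continuous output)."""
    probs = softmax(scores)
    return sum(p * d for p, d in zip(probs, bin_centers))


# Hypothetical pixel whose scores are split evenly between two adjacent bins.
scores = [0.0, 2.0, 2.0, 0.0]
bin_centers = [1.0, 2.0, 3.0, 4.0]  # assumed depths in meters
hard = hard_max_depth(scores, bin_centers)
soft = soft_weighted_sum_depth(scores, bin_centers)
```

When the scores are split between neighboring bins, the hard-max output snaps to one bin center, while the weighted sum lands between them. This is the sense in which the soft variant reduces quantization error.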


A Tutorial on Hawkes Processes for Events in Social Media


Title	A Tutorial on Hawkes Processes for Events in Social Media
Authors	Marian-Andrei Rizoiu, Young Lee, Swapnil Mishra, Lexing Xie
Abstract	This chapter provides an accessible introduction to point processes, and especially Hawkes processes, for modeling discrete, inter-dependent events over continuous time. We start by reviewing the definitions and the key concepts in point processes. We then introduce the Hawkes process, its event intensity function, as well as schemes for event simulation and parameter estimation. We also describe a practical example drawn from social media data - we show how to model retweet cascades using a Hawkes self-exciting process. We present a design of the memory kernel, and results on estimating parameters and predicting popularity. The code and sample event data are available as an online appendix.
Tasks	Point Processes
Published	2017-08-21
URL	http://arxiv.org/abs/1708.06401v2
PDF	http://arxiv.org/pdf/1708.06401v2.pdf
PWC	https://paperswithcode.com/paper/a-tutorial-on-hawkes-processes-for-events-in
Repo
Framework
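
The event intensity function and a simulation scheme mentioned in the abstract can be sketched as follows. This assumes an exponential memory kernel, so that the intensity is mu + sum over past events of alpha * exp(-delta * (t - t_i)), and uses the standard thinning approach for simulation. The parameter names and values are assumptions for illustration and are not taken from the chapter's online appendix.

```python
import math
import random


def intensity(t, events, mu, alpha, delta):
    """Conditional intensity at time t given events strictly before t."""
    return mu + sum(alpha * math.exp(-delta * (t - ti)) for ti in events if ti < t)


def simulate_hawkes(mu, alpha, delta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning.

    Requires mu > 0; alpha / delta < 1 keeps the process from exploding.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The kernel only decays, so the intensity just after time t is an
        # upper bound on the intensity until the next event arrives.
        lam_bar = mu + sum(alpha * math.exp(-delta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)  # candidate event time
        if t >= horizon:
            return events
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, delta):
            events.append(t)


sample = simulate_hawkes(mu=0.5, alpha=0.8, delta=1.0, horizon=20.0)
```

Each accepted event raises the intensity by `alpha`, which then decays at rate `delta`. This self-excitation is what makes the model a natural fit for cascades such as retweets.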

Exact MAP Inference by Avoiding Fractional Vertices


Title	Exact MAP Inference by Avoiding Fractional Vertices
Authors	Erik M. Lindgren, Alexandros G. Dimakis, Adam Klivans
Abstract	Given a graphical model, one essential problem is MAP inference, that is, finding the most likely configuration of states according to the model. Although this problem is NP-hard, large instances can be solved in practice. A major open question is to explain why this is true. We give a natural condition under which we can provably perform MAP inference in polynomial time. We require that the number of fractional vertices in the LP relaxation exceeding the optimal solution is bounded by a polynomial in the problem size. This resolves an open question by Dimakis, Gohari, and Wainwright. In contrast, for general LP relaxations of integer programs, known techniques can only handle a constant number of fractional vertices whose value exceeds the optimal solution. We experimentally verify this condition and demonstrate how efficient various integer programming methods are at removing fractional solutions.
Tasks
Published	2017-03-08
URL	http://arxiv.org/abs/1703.02689v1
PDF	http://arxiv.org/pdf/1703.02689v1.pdf
PWC	https://paperswithcode.com/paper/exact-map-inference-by-avoiding-fractional
Repo
Framework


Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction


Title	Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction
Authors	Benoît Massé, Silèye Ba, Radu Horaud
Abstract	The visual focus of attention (VFOA) has been recognized as a prominent conversational cue. We are interested in estimating and tracking the VFOAs associated with multi-party social interactions. We note that in this type of situation the participants either look at each other or at an object of interest; therefore their eyes are not always visible. Consequently, both gaze and VFOA estimation cannot be based on eye detection and tracking. We propose a method that exploits the correlation between eye gaze and head movements. Both VFOA and gaze are modeled as latent variables in a Bayesian switching state-space model. The proposed formulation leads to a tractable learning procedure and to an efficient algorithm that simultaneously tracks gaze and visual focus. The method is tested and benchmarked using two publicly available datasets that contain typical multi-party human-robot and human-human interactions.
Tasks
Published	2017-03-14
URL	http://arxiv.org/abs/1703.04727v2
PDF	http://arxiv.org/pdf/1703.04727v2.pdf
PWC	https://paperswithcode.com/paper/tracking-gaze-and-visual-focus-of-attention
Repo
Framework

Deep-FExt: Deep Feature Extraction for Vessel Segmentation and Centerline Prediction


Title	Deep-FExt: Deep Feature Extraction for Vessel Segmentation and Centerline Prediction
Authors	Giles Tetteh, Markus Rempfler, Bjoern H. Menze, Claus Zimmer
Abstract	Feature extraction is a crucial task in image and pixel (voxel) classification and regression in biomedical image modeling. In this work we present a machine-learning-based feature extraction scheme built on inception models for pixel classification tasks. We extract features under multi-scale and multi-layer schemes through convolutional operators. Layers of a Fully Convolutional Network are then stacked on these feature extraction layers and trained end-to-end for the purpose of classification. We test our model on the DRIVE and STARE public data sets for the purpose of segmentation and centerline detection, and it outperforms most existing hand-crafted or deterministic feature schemes found in the literature. We achieve an average maximum Dice of 0.85 on the DRIVE data set, which outperforms the scores from the second human annotator of this data set. We also achieve an average maximum Dice of 0.85 and kappa of 0.84 on the STARE data set. Though these datasets are mainly 2-D, we also propose ways of extending this feature extraction scheme to handle 3-D datasets.
Tasks
Published	2017-04-12
URL	http://arxiv.org/abs/1704.03743v1
PDF	http://arxiv.org/pdf/1704.03743v1.pdf
PWC	https://paperswithcode.com/paper/deep-fext-deep-feature-extraction-for-vessel
Repo
Framework
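
For readers unfamiliar with the Dice score reported in the abstract, the following sketch computes the standard Dice coefficient between a binary prediction and a binary ground-truth mask, both flattened to lists. How the paper aggregates this into an "average maximum Dice" is not specified in the abstract and is not reproduced; returning 1.0 for two empty masks is a convention assumed here.

```python
def dice(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Hypothetical 4-pixel example: 2 predicted vessel pixels, 1 true vessel pixel.
score = dice([1, 1, 0, 0], [1, 0, 0, 0])
```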

Multi-View Surveillance Video Summarization via Joint Embedding and Sparse Optimization


Title	Multi-View Surveillance Video Summarization via Joint Embedding and Sparse Optimization
Authors	Rameswar Panda, Amit K. Roy-Chowdhury
Abstract	Most traditional video summarization methods are designed to generate effective summaries for single-view videos, and thus they cannot fully exploit the complicated intra- and inter-view correlations in summarizing multi-view videos in a camera network. In this paper, with the aim of summarizing multi-view videos, we introduce a novel unsupervised framework via joint embedding and sparse representative selection. The objective function is two-fold. The first is to capture the multi-view correlations via an embedding, which helps in extracting a diverse set of representatives. The second is to use an $\ell_{2,1}$-norm to model the sparsity while selecting representative shots for the summary. We propose to jointly optimize both of the objectives, such that embedding can not only characterize the correlations, but also indicate the requirements of sparse representative selection. We present an efficient alternating algorithm based on half-quadratic minimization to solve the proposed non-smooth and non-convex objective with convergence analysis. A key advantage of the proposed approach with respect to the state-of-the-art is that it can summarize multi-view videos without assuming any prior correspondences/alignment between them, e.g., uncalibrated camera networks. Rigorous experiments on several multi-view datasets demonstrate that our approach clearly outperforms the state-of-the-art methods.
Tasks	Video Summarization
Published	2017-06-09
URL	http://arxiv.org/abs/1706.03121v1
PDF	http://arxiv.org/pdf/1706.03121v1.pdf
PWC	https://paperswithcode.com/paper/multi-view-surveillance-video-summarization
Repo
Framework
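
The sparsity term mentioned in the abstract can be illustrated with a small sketch of the $\ell_{2,1}$ norm, taken here as the sum of the Euclidean norms of a matrix's rows. The row-wise convention and the example matrix are assumptions; some papers define the norm over columns instead.

```python
import math


def l21_norm(matrix):
    """Sum of row-wise Euclidean norms of a matrix given as a list of rows."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


# Hypothetical selection matrix: each row corresponds to one candidate shot.
# The all-zero row contributes nothing, so penalizing this norm pushes whole
# rows to zero, i.e. it deselects entire shots rather than single entries.
selection = [[3.0, 4.0], [0.0, 0.0], [0.0, 5.0]]
value = l21_norm(selection)
```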

Reinforcement Learning Algorithm Selection


Title	Reinforcement Learning Algorithm Selection
Authors	Romain Laroche, Raphael Feraud
Abstract	This paper formalises the problem of online algorithm selection in the context of Reinforcement Learning. The setup is as follows: given an episodic task and a finite number of off-policy RL algorithms, a meta-algorithm has to decide which RL algorithm is in control during the next episode so as to maximize the expected return. The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection. Under some assumptions, a thorough theoretical analysis demonstrates its near-optimality considering the structural sampling budget limitations. ESBAS is first empirically evaluated on a dialogue task where it is shown to outperform each individual algorithm in most configurations. ESBAS is then adapted to a true online setting where algorithms update their policies after each transition, which we call SSBAS. SSBAS is evaluated on a fruit collection task where it is shown to adapt the stepsize parameter more efficiently than the classical hyperbolic decay, and on an Atari game, where it improves the performance by a wide margin.
Tasks
Published	2017-01-30
URL	http://arxiv.org/abs/1701.08810v3
PDF	http://arxiv.org/pdf/1701.08810v3.pdf
PWC	https://paperswithcode.com/paper/reinforcement-learning-algorithm-selection
Repo
Framework
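
The epochal scheme described in the abstract, with policies frozen during each epoch and a freshly reset bandit choosing which algorithm controls each episode, can be sketched roughly as below. This is a toy under several assumptions: epochs of doubling length, UCB1 as the stochastic bandit, and a made-up `ToyLearner` class standing in for a real off-policy RL algorithm. It is not the authors' implementation.

```python
import math
import random


class ToyLearner:
    """Hypothetical stand-in for an off-policy RL algorithm."""

    def __init__(self, ceiling, speed):
        self.ceiling = ceiling
        self.speed = speed
        self.skill = 0.0

    def update(self, num_episodes):
        # Policy quality depends on how much shared data has been collected.
        self.skill = self.ceiling * (1.0 - math.exp(-self.speed * num_episodes))

    def run_episode(self, rng):
        # Noisy episodic return of the currently frozen policy.
        return self.skill + rng.gauss(0.0, 0.1)


def ucb1_pick(counts, sums, t):
    """UCB1 arm selection: try every arm once, then maximize the upper bound."""
    for k, c in enumerate(counts):
        if c == 0:
            return k
    return max(
        range(len(counts)),
        key=lambda k: sums[k] / counts[k] + math.sqrt(2.0 * math.log(t) / counts[k]),
    )


def esbas_sketch(learners, num_epochs, seed=0):
    rng = random.Random(seed)
    episodes = 0
    choices = []
    for beta in range(num_epochs):
        for learner in learners:
            learner.update(episodes)  # update once, then freeze for the epoch
        counts = [0] * len(learners)  # bandit is reset at every epoch
        sums = [0.0] * len(learners)
        for t in range(1, 2 ** beta + 1):  # epoch beta lasts 2**beta episodes
            k = ucb1_pick(counts, sums, t)
            ret = learners[k].run_episode(rng)
            counts[k] += 1
            sums[k] += ret
            episodes += 1
            choices.append(k)
    return choices


learners = [ToyLearner(ceiling=0.5, speed=0.1), ToyLearner(ceiling=1.0, speed=0.01)]
choices = esbas_sketch(learners, num_epochs=10)
```

Freezing the policies within an epoch keeps each arm's return distribution stationary, which is the setting a stochastic bandit such as UCB1 is designed for. Resetting the bandit lets the selection follow whichever learner is best after the latest update.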
