May 6, 2019

Paper Group ANR 359

Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model. Online Dual Coordinate Ascent Learning. Natural Scene Image Segmentation Based on Multi-Layer Feature Extraction. Online Visual Multi-Object Tracking via Labeled Random Finite Set Filtering. A Recurrent Encoder-Decoder Network for Sequential Face Alignment. Comp …

Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model


Title	Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model
Authors	Shi Feng, Shujie Liu, Mu Li, Ming Zhou
Abstract	Neural machine translation has shown very promising results lately. Most NMT models follow the encoder-decoder framework. To make encoder-decoder models more flexible, the attention mechanism was introduced to machine translation and to other tasks such as speech recognition and image captioning. We observe that the translation quality of an attention-based encoder-decoder can be significantly damaged when the alignment is incorrect. We attribute these problems to the lack of distortion and fertility models. To resolve them, we propose new variations of the attention-based encoder-decoder and compare them with other models on machine translation. Our proposed method achieved an improvement of 2 BLEU points over the original attention-based encoder-decoder.
Tasks	Image Captioning, Machine Translation, Speech Recognition
Published	2016-01-13
URL	http://arxiv.org/abs/1601.03317v3
PDF	http://arxiv.org/pdf/1601.03317v3.pdf
PWC	https://paperswithcode.com/paper/implicit-distortion-and-fertility-models-for
Repo
Framework
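
To make the terms "attention" and "alignment" in the abstract above concrete, here is a minimal sketch of generic soft attention. It is not the model proposed in the paper: the function names (`softmax`, `attend`, `accumulated_attention`) are hypothetical, and summing the attention weights per source position is used here only as a rough, illustrative stand-in for how often each source word has been attended to.

```python
import math

def softmax(scores):
    """Turn raw alignment scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, source_vectors):
    """Return (weights, context): the soft alignment over source positions
    and the weighted average of the source vectors."""
    weights = softmax(scores)
    dim = len(source_vectors[0])
    context = [sum(w * v[k] for w, v in zip(weights, source_vectors))
               for k in range(dim)]
    return weights, context

def accumulated_attention(weight_rows):
    """Sum attention over decoder steps for each source position
    (illustrative only, not the paper's fertility model)."""
    n = len(weight_rows[0])
    return [sum(row[j] for row in weight_rows) for j in range(n)]
```

A source position whose accumulated attention stays near zero after decoding was effectively never aligned, which is the kind of alignment failure the abstract refers to.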

Online Dual Coordinate Ascent Learning


Title	Online Dual Coordinate Ascent Learning
Authors	Bicheng Ying, Kun Yuan, Ali H. Sayed
Abstract	The stochastic dual coordinate-ascent (S-DCA) technique is a useful alternative to the traditional stochastic gradient-descent algorithm for solving large-scale optimization problems due to its scalability to large data sets and strong theoretical guarantees. However, the available S-DCA formulation is limited to finite sample sizes and relies on performing multiple passes over the same data. This formulation is not well-suited for online implementations where data keep streaming in. In this work, we develop an online dual coordinate-ascent (O-DCA) algorithm that is able to respond to streaming data and does not need to revisit the past data. This feature embeds the resulting construction with continuous adaptation, learning, and tracking abilities, which are particularly attractive for online learning scenarios.
Tasks
Published	2016-02-24
URL	http://arxiv.org/abs/1602.07630v1
PDF	http://arxiv.org/pdf/1602.07630v1.pdf
PWC	https://paperswithcode.com/paper/online-dual-coordinate-ascent-learning
Repo
Framework
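
The abstract above contrasts O-DCA with the batch S-DCA formulation, which needs a finite data set and repeated passes over it. The sketch below illustrates only that batch baseline, for ridge regression with the loss 0.5*(w.x - y)^2, where the coordinate update has a closed form. It is not the O-DCA algorithm from the paper; the function name `sdca_ridge` and its parameters are assumptions made for illustration.

```python
import random

def sdca_ridge(xs, ys, lam, epochs=200, seed=0):
    """Batch S-DCA sketch for
    (1/n) * sum_i 0.5*(w.x_i - y_i)**2 + (lam/2)*||w||**2."""
    n, d = len(xs), len(xs[0])
    alpha = [0.0] * n  # one dual variable per training sample
    w = [0.0] * d      # kept equal to sum_i alpha_i * x_i / (lam * n)
    rng = random.Random(seed)
    for _ in range(epochs):        # multiple passes over the same data
        for _ in range(n):
            i = rng.randrange(n)   # pick a random dual coordinate
            x, y = xs[i], ys[i]
            pred = sum(wk * xk for wk, xk in zip(w, x))
            sq_norm = sum(xk * xk for xk in x)
            # Closed-form maximizer of the dual along coordinate i.
            delta = (y - pred - alpha[i]) / (1.0 + sq_norm / (lam * n))
            alpha[i] += delta
            for k in range(d):
                w[k] += delta * x[k] / (lam * n)
    return w, alpha
```

Note that the array `alpha` has one entry per stored sample, which is why this formulation does not carry over directly to an unbounded data stream.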

Natural Scene Image Segmentation Based on Multi-Layer Feature Extraction


Title	Natural Scene Image Segmentation Based on Multi-Layer Feature Extraction
Authors	Fariba Zohrizadeh, Mohsen Kheirandishfard, Farhad Kamangar
Abstract	This paper addresses the problem of natural image segmentation by extracting information from a multi-layer array which is constructed based on color, gradient, and statistical properties of the local neighborhoods in an image. A Gaussian Mixture Model (GMM) is used to improve the effectiveness of local spectral histogram features. Grouping these features forms a rough initial over-segmented layer which contains coherent regions of pixels. The regions are merged by using two proposed functions that calculate the distance between two neighboring regions and decide whether to merge them. Extensive experiments are performed on the Berkeley Segmentation Dataset to evaluate the performance of our proposed method and compare the results with recent state-of-the-art methods. The experimental results indicate that our method achieves a higher level of accuracy for natural images than recent methods.
Tasks	Semantic Segmentation
Published	2016-05-24
URL	http://arxiv.org/abs/1605.07586v2
PDF	http://arxiv.org/pdf/1605.07586v2.pdf
PWC	https://paperswithcode.com/paper/natural-scene-image-segmentation-based-on
Repo
Framework

Online Visual Multi-Object Tracking via Labeled Random Finite Set Filtering


Title	Online Visual Multi-Object Tracking via Labeled Random Finite Set Filtering
Authors	Du Yong Kim, Ba-Ngu Vo, Ba-Tuong Vo
Abstract	This paper proposes an online visual multi-object tracking algorithm using a top-down Bayesian formulation that seamlessly integrates state estimation, track management, clutter rejection, occlusion and mis-detection handling into a single recursion. This is achieved by modeling the multi-object state as a labeled random finite set and using the Bayes recursion to propagate the multi-object filtering density forward in time. The proposed filter updates tracks with detections but switches to image data when mis-detection occurs, thereby exploiting the efficiency of detection data and the accuracy of image data. Furthermore, the labeled random finite set framework enables the incorporation of prior knowledge that mis-detections of long tracks which occur in the middle of the scene are likely to be due to occlusions. Such prior knowledge can be exploited to improve occlusion handling, especially long occlusions that can lead to premature track termination in online multi-object tracking. Tracking performance is compared to state-of-the-art algorithms on well-known benchmark video datasets.
Tasks	Multi-Object Tracking, Object Tracking
Published	2016-11-18
URL	http://arxiv.org/abs/1611.06011v2
PDF	http://arxiv.org/pdf/1611.06011v2.pdf
PWC	https://paperswithcode.com/paper/online-visual-multi-object-tracking-via
Repo
Framework

A Recurrent Encoder-Decoder Network for Sequential Face Alignment


Title	A Recurrent Encoder-Decoder Network for Sequential Face Alignment
Authors	Xi Peng, Rogerio S. Feris, Xiaoyu Wang, Dimitris N. Metaxas
Abstract	We propose a novel recurrent encoder-decoder network model for real-time video-based face alignment. Our proposed model predicts 2D facial point maps regularized by a regression loss, while uniquely exploiting recurrent learning at both spatial and temporal dimensions. At the spatial level, we add a feedback loop connection between the combined output response map and the input, in order to enable iterative coarse-to-fine face alignment using a single network model. At the temporal level, we first decouple the features in the bottleneck of the network into temporal-variant factors, such as pose and expression, and temporal-invariant factors, such as identity information. Temporal recurrent learning is then applied to the decoupled temporal-variant features, yielding better generalization and significantly more accurate results at test time. We perform a comprehensive experimental analysis, showing the importance of each component of our proposed model, as well as superior results over the state-of-the-art in standard datasets.
Tasks	Face Alignment
Published	2016-08-19
URL	http://arxiv.org/abs/1608.05477v2
PDF	http://arxiv.org/pdf/1608.05477v2.pdf
PWC	https://paperswithcode.com/paper/a-recurrent-encoder-decoder-network-for
Repo
Framework

Comparison of several short-term traffic speed forecasting models


Title	Comparison of several short-term traffic speed forecasting models
Authors	John Boaz Lee, Kardi Teknomo
Abstract	The widespread adoption of smartphones in recent years has made it possible for us to collect large amounts of traffic data. Special software installed on the phones of drivers allows us to gather GPS trajectories of their vehicles on the road network. In this paper, we simulate the trajectories of multiple agents on a road network and use various models to forecast the short-term traffic speed of various links. Our results show that traditional techniques like multiple regression and artificial neural networks work well but simpler adaptive models that do not require prior training also perform comparatively well.
Tasks
Published	2016-09-06
URL	http://arxiv.org/abs/1609.02409v1
PDF	http://arxiv.org/pdf/1609.02409v1.pdf
PWC	https://paperswithcode.com/paper/comparison-of-several-short-term-traffic
Repo
Framework
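
The abstract above does not name the "simpler adaptive models" it evaluates. As one hedged illustration of a forecaster that needs no prior training, the sketch below uses exponential smoothing of the observed link speeds; the function name `ewma_forecast` and the smoothing factor `alpha` are assumptions, not taken from the paper.

```python
def ewma_forecast(speeds, alpha=0.5):
    """One-step-ahead speed forecasts by exponential smoothing.

    forecasts[i] is the prediction made for speeds[i] before observing it;
    the second return value is the forecast for the next, unseen interval.
    """
    if not speeds:
        raise ValueError("need at least one observed speed")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = speeds[0]  # initialise the running estimate with the first value
    forecasts = []
    for s in speeds:
        forecasts.append(level)
        # Adapt the estimate toward the newest observation.
        level = alpha * s + (1.0 - alpha) * level
    return forecasts, level
```

A larger `alpha` reacts faster to sudden speed changes on a link, at the cost of noisier forecasts.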

RGBD Datasets: Past, Present and Future


Title	RGBD Datasets: Past, Present and Future
Authors	Michael Firman
Abstract	Since the launch of the Microsoft Kinect, scores of RGBD datasets have been released. These have propelled advances in areas from reconstruction to gesture recognition. In this paper we explore the field, reviewing datasets across eight categories: semantics, object pose estimation, camera tracking, scene reconstruction, object tracking, human actions, faces and identification. By extracting relevant information in each category we help researchers to find appropriate data for their needs, and we consider which datasets have succeeded in driving computer vision forward and why. Finally, we examine the future of RGBD datasets. We identify key areas which are currently underexplored, and suggest that future directions may include synthetic data and dense reconstructions of static and dynamic scenes.
Tasks	Gesture Recognition, Object Tracking, Pose Estimation
Published	2016-04-04
URL	http://arxiv.org/abs/1604.00999v2
PDF	http://arxiv.org/pdf/1604.00999v2.pdf
PWC	https://paperswithcode.com/paper/rgbd-datasets-past-present-and-future
Repo
Framework

Joint Embedding of Hierarchical Categories and Entities for Concept Categorization and Dataless Classification


Title	Joint Embedding of Hierarchical Categories and Entities for Concept Categorization and Dataless Classification
Authors	Yuezhang Li, Ronghuo Zheng, Tian Tian, Zhiting Hu, Rahul Iyer, Katia Sycara
Abstract	Due to the lack of structured knowledge applied in learning distributed representation of categories, existing work cannot incorporate category hierarchies into entity information. We propose a framework that embeds entities and categories into a semantic space by integrating structured knowledge and taxonomy hierarchy from large knowledge bases. The framework makes it possible to compute meaningful semantic relatedness between entities and categories. Our framework can handle both single-word concepts and multiple-word concepts with superior performance on concept categorization and yields state-of-the-art results on dataless hierarchical classification.
Tasks
Published	2016-07-27
URL	http://arxiv.org/abs/1607.07956v1
PDF	http://arxiv.org/pdf/1607.07956v1.pdf
PWC	https://paperswithcode.com/paper/joint-embedding-of-hierarchical-categories
Repo
Framework

Semi Supervised Preposition-Sense Disambiguation using Multilingual Data


Title	Semi Supervised Preposition-Sense Disambiguation using Multilingual Data
Authors	Hila Gonen, Yoav Goldberg
Abstract	Prepositions are very common and very ambiguous, and understanding their sense is critical for understanding the meaning of the sentence. Supervised corpora for the preposition-sense disambiguation task are small, suggesting a semi-supervised approach to the task. We show that signals from unannotated multilingual data can be used to improve supervised preposition-sense disambiguation. Our approach pre-trains an LSTM encoder for predicting the translation of a preposition, and then incorporates the pre-trained encoder as a component in a supervised classification system, and fine-tunes it for the task. The multilingual signals consistently improve results on two preposition-sense datasets.
Tasks
Published	2016-11-27
URL	http://arxiv.org/abs/1611.08813v1
PDF	http://arxiv.org/pdf/1611.08813v1.pdf
PWC	https://paperswithcode.com/paper/semi-supervised-preposition-sense
Repo
Framework

A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images


Title	A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images
Authors	Jun Li, Reinhard Klein, Angela Yao
Abstract	Estimating depth from a single RGB image is an ill-posed and inherently ambiguous problem. State-of-the-art deep learning methods can now estimate accurate 2D depth maps, but when the maps are projected into 3D, they lack local detail and are often highly distorted. We propose a fast-to-train two-streamed CNN that predicts depth and depth gradients, which are then fused together into an accurate and detailed depth map. We also define a novel set loss over multiple images; by regularizing the estimation between a common set of images, the network is less prone to over-fitting and achieves better accuracy than competing methods. Experiments on the NYU Depth v2 dataset show that our depth predictions are competitive with the state of the art and lead to faithful 3D projections.
Tasks
Published	2016-07-04
URL	http://arxiv.org/abs/1607.00730v4
PDF	http://arxiv.org/pdf/1607.00730v4.pdf
PWC	https://paperswithcode.com/paper/a-two-streamed-network-for-estimating-fine
Repo
Framework

Fractal Dimension Pattern Based Multiresolution Analysis for Rough Estimator of Person-Dependent Audio Emotion Recognition


Title	Fractal Dimension Pattern Based Multiresolution Analysis for Rough Estimator of Person-Dependent Audio Emotion Recognition
Authors	Miao Cheng, Ah Chung Tsoi
Abstract	Audio is a general means of expression, and audio analysis and recognition have attracted much attention because of their wide real-world applications. Audio emotion recognition (AER) attempts to infer human emotional states from given utterance signals, and has been widely studied for its potential in building friendly human-machine interfaces. Unlike existing works, this paper models person-dependent patterns of audio emotions and computes fractal dimension features for acoustic feature extraction. Furthermore, the method is able to efficiently learn intrinsic characteristics of auditory emotions, with the utterance features learned from the fractal dimensions of each sub-band. Experimental results show that the proposed method provides comparable performance for audio emotion recognition.
Tasks	Emotion Recognition
Published	2016-07-01
URL	http://arxiv.org/abs/1607.00087v2
PDF	http://arxiv.org/pdf/1607.00087v2.pdf
PWC	https://paperswithcode.com/paper/fractal-dimension-pattern-based
Repo
Framework

Content-based Video Indexing and Retrieval Using Corr-LDA


Title	Content-based Video Indexing and Retrieval Using Corr-LDA
Authors	Rahul Radhakrishnan Iyer, Sanjeel Parekh, Vikas Mohandoss, Anush Ramsurat, Bhiksha Raj, Rita Singh
Abstract	Existing video indexing and retrieval methods on popular web-based multimedia sharing websites are based on user-provided sparse tagging. This paper proposes a very specific way of searching for video clips, based on the content of the video. We present our work on Content-based Video Indexing and Retrieval using the Correspondence-Latent Dirichlet Allocation (corr-LDA) probabilistic framework. This is a model that provides for auto-annotation of videos in a database with textual descriptors, and brings the added benefit of utilizing the semantic relations between the content of the video and text. We use the concept-level matching provided by corr-LDA to build correspondences between text and multimedia, with the objective of retrieving content with increased accuracy. In our experiments, we employ only the audio components of the individual recordings and compare our results with an SVM-based approach.
Tasks
Published	2016-02-27
URL	https://arxiv.org/abs/1602.08581v2
PDF	https://arxiv.org/pdf/1602.08581v2.pdf
PWC	https://paperswithcode.com/paper/content-based-video-indexing-and-retrieval
Repo
Framework

On the Modeling of Error Functions as High Dimensional Landscapes for Weight Initialization in Learning Networks


Title	On the Modeling of Error Functions as High Dimensional Landscapes for Weight Initialization in Learning Networks
Authors	Julius, Gopinath Mahale, Sumana T., C. S. Adityakrishna
Abstract	Next generation deep neural networks for classification hosted on embedded platforms will rely on fast, efficient, and accurate learning algorithms. Initialization of weights in learning networks has a great impact on the classification accuracy. In this paper we focus on deriving good initial weights by modeling the error function of a deep neural network as a high-dimensional landscape. We observe that due to the inherent complexity in its algebraic structure, such an error function may conform to general results of the statistics of large systems. To this end we apply some results from Random Matrix Theory to analyse these functions. We model the error function in terms of a Hamiltonian in N-dimensions and derive some theoretical results about its general behavior. These results are further used to make better initial guesses of weights for the learning algorithm.
Tasks
Published	2016-07-20
URL	http://arxiv.org/abs/1607.06011v1
PDF	http://arxiv.org/pdf/1607.06011v1.pdf
PWC	https://paperswithcode.com/paper/on-the-modeling-of-error-functions-as-high
Repo
Framework

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/ε)$


Title	Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/ε)$
Authors	Yi Xu, Yan Yan, Qihang Lin, Tianbao Yang
Abstract	In this paper, we develop a novel homotopy smoothing (HOPS) algorithm for solving a family of non-smooth problems that is composed of a non-smooth term with an explicit max-structure and a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems is $O(1/\epsilon)$ without any assumption on the strong convexity. In this work, we show that the proposed HOPS achieves a lower iteration complexity of $\widetilde O(1/\epsilon^{1-\theta})$, where $\widetilde O(\cdot)$ suppresses a logarithmic factor and $\theta\in(0,1]$ captures the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without a strong convexity assumption. The HOPS algorithm employs Nesterov’s smoothing technique and Nesterov’s accelerated gradient method and runs in stages, gradually decreasing the smoothing parameter in a stage-wise manner until it yields a sufficiently good approximation of the original function. We show that HOPS enjoys linear convergence for many well-known non-smooth problems (e.g., empirical risk minimization with a piece-wise linear loss function and $\ell_1$ norm regularizer, finding a point in a polyhedron, cone programming, etc.). Experimental results verify the effectiveness of HOPS in comparison with Nesterov’s smoothing algorithm and the primal-dual style of first-order methods.
Tasks
Published	2016-07-13
URL	http://arxiv.org/abs/1607.03815v2
PDF	http://arxiv.org/pdf/1607.03815v2.pdf
PWC	https://paperswithcode.com/paper/homotopy-smoothing-for-non-smooth-problems
Repo
Framework
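
The stage-wise idea in the abstract above (smooth the max-structured term, solve, shrink the smoothing parameter, warm start) can be illustrated on a toy problem. The sketch below minimizes |x - target|, using the fact that |x| = max over |u| <= 1 of u*x, whose Nesterov smoothing with the term (mu/2)*u^2 is the Huber function. It is only an illustration under stated assumptions: it uses plain gradient descent rather than Nesterov's accelerated method, and the function names, the shrink factor, and the per-stage step count are hypothetical choices tuned to this example, not the HOPS algorithm itself.

```python
def smoothed_abs(x, mu):
    """Nesterov smoothing of |x|: max over |u| <= 1 of u*x - (mu/2)*u**2."""
    if abs(x) <= mu:
        return x * x / (2.0 * mu)
    return abs(x) - mu / 2.0

def smoothed_abs_grad(x, mu):
    """Gradient of smoothed_abs: x/mu clipped to [-1, 1]."""
    return max(-1.0, min(1.0, x / mu))

def homotopy_minimize(target, x0=0.0, mu0=1.0, shrink=2.0, stages=8, steps=5):
    """Minimize |x - target| in stages, shrinking mu after each stage.

    Each stage runs gradient descent with step size mu (the inverse of the
    smoothed function's gradient Lipschitz constant 1/mu) and warm-starts
    from the previous stage's solution.
    """
    x, mu = x0, mu0
    for _ in range(stages):
        for _ in range(steps):
            x -= mu * smoothed_abs_grad(x - target, mu)
        mu /= shrink  # tighter approximation in the next stage
    return x
```

The smoothed function under-estimates |x| by at most mu/2, so shrinking `mu` stage by stage tightens the approximation while each stage remains a smooth problem.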

ORBSLAM-based Endoscope Tracking and 3D Reconstruction


Title	ORBSLAM-based Endoscope Tracking and 3D Reconstruction
Authors	Nader Mahmoud, Iñigo Cirauqui, Alexandre Hostettler, Christophe Doignon, Luc Soler, Jacques Marescaux, J. M. M. Montiel
Abstract	We aim to track the endoscope location inside the surgical scene and provide 3D reconstruction, in real-time, from the sole input of the image sequence captured by the monocular endoscope. This information offers new possibilities for developing surgical navigation and augmented reality applications. The main benefit of this approach is the lack of extra tracking elements which can disturb the surgeon's performance in the clinical routine. Our first contribution is to exploit ORBSLAM, one of the best performing monocular SLAM algorithms, to estimate both the endoscope location and the 3D structure of the surgical scene. However, the reconstructed 3D map poorly describes textureless soft organ surfaces such as the liver. Our second contribution is to extend ORBSLAM to be able to reconstruct a semi-dense map of soft organs. Experimental results on in-vivo pigs show robust endoscope tracking even with organ deformations and partial instrument occlusions. They also show the reconstruction density and accuracy against the ground truth surface obtained from CT.
Tasks	3D Reconstruction
Published	2016-08-29
URL	http://arxiv.org/abs/1608.08149v1
PDF	http://arxiv.org/pdf/1608.08149v1.pdf
PWC	https://paperswithcode.com/paper/orbslam-based-endoscope-tracking-and-3d
Repo
Framework