Paper Group ANR 359
Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model. Online Dual Coordinate Ascent Learning. Natural Scene Image Segmentation Based on Multi-Layer Feature Extraction. Online Visual Multi-Object Tracking via Labeled Random Finite Set Filtering. A Recurrent Encoder-Decoder Network for Sequential Face Alignment. Comp …
Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model
Title | Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model |
Authors | Shi Feng, Shujie Liu, Mu Li, Ming Zhou |
Abstract | Neural machine translation has shown very promising results lately. Most NMT models follow the encoder-decoder framework. To make encoder-decoder models more flexible, the attention mechanism was introduced to machine translation and to other tasks such as speech recognition and image captioning. We observe that the quality of translation by an attention-based encoder-decoder can be significantly damaged when the alignment is incorrect. We attribute these problems to the lack of distortion and fertility models. Aiming to resolve these problems, we propose new variations of the attention-based encoder-decoder and compare them with other models on machine translation. Our proposed method achieved an improvement of 2 BLEU points over the original attention-based encoder-decoder. |
Tasks | Image Captioning, Machine Translation, Speech Recognition |
Published | 2016-01-13 |
URL | http://arxiv.org/abs/1601.03317v3 |
http://arxiv.org/pdf/1601.03317v3.pdf | |
PWC | https://paperswithcode.com/paper/implicit-distortion-and-fertility-models-for |
Repo | |
Framework | |
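The abstract above ties translation quality to the alignment produced by attention. As a rough illustration only, the sketch below computes generic dot-product attention weights (a soft alignment) and sums the attention each source position receives across decoding steps. This is not the paper's distortion or fertility model: the function names, the toy vectors, and the use of summed attention as a stand-in for "how much a source word has been attended to" are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Scores are plain dot products; real NMT systems typically use a
    learned scoring function instead (assumption: dot product for brevity).
    """
    scores = [sum(d * h for d, h in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


# Hypothetical toy data: three source positions, two decoding steps.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_states = [[2.0, 0.0], [0.0, 2.0]]

# Accumulated attention per source position (an illustrative proxy only).
coverage = [0.0] * len(encoder_states)
for state in decoder_states:
    weights, context = attention_step(state, encoder_states)
    coverage = [c + w for c, w in zip(coverage, weights)]

print(coverage)
```

Each step's weights sum to one, so the accumulated values sum to the number of decoding steps. A source position whose total stays near zero was effectively never aligned, which is the kind of failure the abstract attributes to missing fertility information.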
Online Dual Coordinate Ascent Learning
Title | Online Dual Coordinate Ascent Learning |
Authors | Bicheng Ying, Kun Yuan, Ali H. Sayed |
Abstract | The stochastic dual coordinate-ascent (S-DCA) technique is a useful alternative to the traditional stochastic gradient-descent algorithm for solving large-scale optimization problems due to its scalability to large data sets and strong theoretical guarantees. However, the available S-DCA formulation is limited to finite sample sizes and relies on performing multiple passes over the same data. This formulation is not well-suited for online implementations where data keep streaming in. In this work, we develop an online dual coordinate-ascent (O-DCA) algorithm that is able to respond to streaming data and does not need to revisit the past data. This feature embeds the resulting construction with continuous adaptation, learning, and tracking abilities, which are particularly attractive for online learning scenarios. |
Tasks | |
Published | 2016-02-24 |
URL | http://arxiv.org/abs/1602.07630v1 |
http://arxiv.org/pdf/1602.07630v1.pdf | |
PWC | https://paperswithcode.com/paper/online-dual-coordinate-ascent-learning |
Repo | |
Framework | |
Natural Scene Image Segmentation Based on Multi-Layer Feature Extraction
Title | Natural Scene Image Segmentation Based on Multi-Layer Feature Extraction |
Authors | Fariba Zohrizadeh, Mohsen Kheirandishfard, Farhad Kamangar |
Abstract | This paper addresses the problem of natural image segmentation by extracting information from a multi-layer array which is constructed based on color, gradient, and statistical properties of the local neighborhoods in an image. A Gaussian Mixture Model (GMM) is used to improve the effectiveness of local spectral histogram features. Grouping these features forms a rough initial over-segmented layer which contains coherent regions of pixels. The regions are merged by using two proposed functions for calculating the distance between two neighboring regions and making decisions about their merging. Extensive experiments are performed on the Berkeley Segmentation Dataset to evaluate the performance of our proposed method and compare the results with recent state-of-the-art methods. The experimental results indicate that our method achieves a higher level of accuracy for natural images compared to recent methods. |
Tasks | Semantic Segmentation |
Published | 2016-05-24 |
URL | http://arxiv.org/abs/1605.07586v2 |
http://arxiv.org/pdf/1605.07586v2.pdf | |
PWC | https://paperswithcode.com/paper/natural-scene-image-segmentation-based-on |
Repo | |
Framework | |
Online Visual Multi-Object Tracking via Labeled Random Finite Set Filtering
Title | Online Visual Multi-Object Tracking via Labeled Random Finite Set Filtering |
Authors | Du Yong Kim, Ba-Ngu Vo, Ba-Tuong Vo |
Abstract | This paper proposes an online visual multi-object tracking algorithm using a top-down Bayesian formulation that seamlessly integrates state estimation, track management, clutter rejection, occlusion and mis-detection handling into a single recursion. This is achieved by modeling the multi-object state as a labeled random finite set and using the Bayes recursion to propagate the multi-object filtering density forward in time. The proposed filter updates tracks with detections but switches to image data when mis-detection occurs, thereby exploiting the efficiency of detection data and the accuracy of image data. Furthermore, the labeled random finite set framework enables the incorporation of prior knowledge that mis-detections of long tracks which occur in the middle of the scene are likely to be due to occlusions. Such prior knowledge can be exploited to improve occlusion handling, especially long occlusions that can lead to premature track termination in online multi-object tracking. Tracking performance is compared to state-of-the-art algorithms on well-known benchmark video datasets. |
Tasks | Multi-Object Tracking, Object Tracking |
Published | 2016-11-18 |
URL | http://arxiv.org/abs/1611.06011v2 |
http://arxiv.org/pdf/1611.06011v2.pdf | |
PWC | https://paperswithcode.com/paper/online-visual-multi-object-tracking-via |
Repo | |
Framework | |
A Recurrent Encoder-Decoder Network for Sequential Face Alignment
Title | A Recurrent Encoder-Decoder Network for Sequential Face Alignment |
Authors | Xi Peng, Rogerio S. Feris, Xiaoyu Wang, Dimitris N. Metaxas |
Abstract | We propose a novel recurrent encoder-decoder network model for real-time video-based face alignment. Our proposed model predicts 2D facial point maps regularized by a regression loss, while uniquely exploiting recurrent learning at both spatial and temporal dimensions. At the spatial level, we add a feedback loop connection between the combined output response map and the input, in order to enable iterative coarse-to-fine face alignment using a single network model. At the temporal level, we first decouple the features in the bottleneck of the network into temporal-variant factors, such as pose and expression, and temporal-invariant factors, such as identity information. Temporal recurrent learning is then applied to the decoupled temporal-variant features, yielding better generalization and significantly more accurate results at test time. We perform a comprehensive experimental analysis, showing the importance of each component of our proposed model, as well as superior results over the state-of-the-art in standard datasets. |
Tasks | Face Alignment |
Published | 2016-08-19 |
URL | http://arxiv.org/abs/1608.05477v2 |
http://arxiv.org/pdf/1608.05477v2.pdf | |
PWC | https://paperswithcode.com/paper/a-recurrent-encoder-decoder-network-for |
Repo | |
Framework | |
Comparison of several short-term traffic speed forecasting models
Title | Comparison of several short-term traffic speed forecasting models |
Authors | John Boaz Lee, Kardi Teknomo |
Abstract | The widespread adoption of smartphones in recent years has made it possible for us to collect large amounts of traffic data. Special software installed on the phones of drivers allows us to gather GPS trajectories of their vehicles on the road network. In this paper, we simulate the trajectories of multiple agents on a road network and use various models to forecast the short-term traffic speed of various links. Our results show that traditional techniques like multiple regression and artificial neural networks work well but simpler adaptive models that do not require prior training also perform comparatively well. |
Tasks | |
Published | 2016-09-06 |
URL | http://arxiv.org/abs/1609.02409v1 |
http://arxiv.org/pdf/1609.02409v1.pdf | |
PWC | https://paperswithcode.com/paper/comparison-of-several-short-term-traffic |
Repo | |
Framework | |
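The abstract above mentions simple adaptive models that need no prior training but does not name them. As one possible example of that family, the sketch below uses simple exponential smoothing to produce one-step-ahead speed forecasts for a single link. The choice of exponential smoothing, the function name `forecast_speeds`, the smoothing factor, and the sample speeds are assumptions for illustration, not details taken from the paper.

```python
def forecast_speeds(observations, alpha=0.5):
    """One-step-ahead forecasts via simple exponential smoothing.

    observations: link speeds in time order (e.g. km/h).
    alpha: smoothing factor in (0, 1]; higher values react faster.
    Returns (forecasts, level), where forecasts[i] is the prediction made
    before observations[i] was seen and level is the forecast for the
    next, still unseen, interval.
    """
    if not observations:
        raise ValueError("need at least one observation")
    forecasts = []
    level = observations[0]  # initialise from the first reading
    for speed in observations:
        forecasts.append(level)
        # Blend the newest reading into the running estimate.
        level = alpha * speed + (1 - alpha) * level
    return forecasts, level


# Hypothetical speeds for one road link over four intervals.
speeds = [40.0, 44.0, 36.0, 40.0]
forecasts, next_forecast = forecast_speeds(speeds, alpha=0.5)
print(forecasts, next_forecast)
```

The model keeps a single running estimate and updates it with each new reading, so it adapts as data streams in without a separate training phase.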
RGBD Datasets: Past, Present and Future
Title | RGBD Datasets: Past, Present and Future |
Authors | Michael Firman |
Abstract | Since the launch of the Microsoft Kinect, scores of RGBD datasets have been released. These have propelled advances in areas from reconstruction to gesture recognition. In this paper we explore the field, reviewing datasets across eight categories: semantics, object pose estimation, camera tracking, scene reconstruction, object tracking, human actions, faces and identification. By extracting relevant information in each category we help researchers to find appropriate data for their needs, and we consider which datasets have succeeded in driving computer vision forward and why. Finally, we examine the future of RGBD datasets. We identify key areas which are currently underexplored, and suggest that future directions may include synthetic data and dense reconstructions of static and dynamic scenes. |
Tasks | Gesture Recognition, Object Tracking, Pose Estimation |
Published | 2016-04-04 |
URL | http://arxiv.org/abs/1604.00999v2 |
http://arxiv.org/pdf/1604.00999v2.pdf | |
PWC | https://paperswithcode.com/paper/rgbd-datasets-past-present-and-future |
Repo | |
Framework | |
Joint Embedding of Hierarchical Categories and Entities for Concept Categorization and Dataless Classification
Title | Joint Embedding of Hierarchical Categories and Entities for Concept Categorization and Dataless Classification |
Authors | Yuezhang Li, Ronghuo Zheng, Tian Tian, Zhiting Hu, Rahul Iyer, Katia Sycara |
Abstract | Due to the lack of structured knowledge applied in learning distributed representations of categories, existing work cannot incorporate category hierarchies into entity information. We propose a framework that embeds entities and categories into a semantic space by integrating structured knowledge and taxonomy hierarchy from large knowledge bases. The framework makes it possible to compute meaningful semantic relatedness between entities and categories. Our framework can handle both single-word concepts and multiple-word concepts with superior performance on concept categorization and yields state-of-the-art results on dataless hierarchical classification. |
Tasks | |
Published | 2016-07-27 |
URL | http://arxiv.org/abs/1607.07956v1 |
http://arxiv.org/pdf/1607.07956v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-embedding-of-hierarchical-categories |
Repo | |
Framework | |
Semi Supervised Preposition-Sense Disambiguation using Multilingual Data
Title | Semi Supervised Preposition-Sense Disambiguation using Multilingual Data |
Authors | Hila Gonen, Yoav Goldberg |
Abstract | Prepositions are very common and very ambiguous, and understanding their sense is critical for understanding the meaning of the sentence. Supervised corpora for the preposition-sense disambiguation task are small, suggesting a semi-supervised approach to the task. We show that signals from unannotated multilingual data can be used to improve supervised preposition-sense disambiguation. Our approach pre-trains an LSTM encoder for predicting the translation of a preposition, and then incorporates the pre-trained encoder as a component in a supervised classification system, and fine-tunes it for the task. The multilingual signals consistently improve results on two preposition-sense datasets. |
Tasks | |
Published | 2016-11-27 |
URL | http://arxiv.org/abs/1611.08813v1 |
http://arxiv.org/pdf/1611.08813v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-preposition-sense |
Repo | |
Framework | |
A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images
Title | A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images |
Authors | Jun Li, Reinhard Klein, Angela Yao |
Abstract | Estimating depth from a single RGB image is an ill-posed and inherently ambiguous problem. State-of-the-art deep learning methods can now estimate accurate 2D depth maps, but when the maps are projected into 3D, they lack local detail and are often highly distorted. We propose a fast-to-train two-streamed CNN that predicts depth and depth gradients, which are then fused together into an accurate and detailed depth map. We also define a novel set loss over multiple images; by regularizing the estimation between a common set of images, the network is less prone to over-fitting and achieves better accuracy than competing methods. Experiments on the NYU Depth v2 dataset show that our depth predictions are competitive with the state of the art and lead to faithful 3D projections. |
Tasks | |
Published | 2016-07-04 |
URL | http://arxiv.org/abs/1607.00730v4 |
http://arxiv.org/pdf/1607.00730v4.pdf | |
PWC | https://paperswithcode.com/paper/a-two-streamed-network-for-estimating-fine |
Repo | |
Framework | |
Fractal Dimension Pattern Based Multiresolution Analysis for Rough Estimator of Person-Dependent Audio Emotion Recognition
Title | Fractal Dimension Pattern Based Multiresolution Analysis for Rough Estimator of Person-Dependent Audio Emotion Recognition |
Authors | Miao Cheng, Ah Chung Tsoi |
Abstract | As a general means of expression, audio has attracted much attention, and its analysis and recognition have wide real-world applications. Audio emotion recognition (AER) attempts to infer a person's emotional state from a given utterance signal, and has been widely studied for its role in developing friendly human-machine interfaces. Unlike existing work, this paper models person-dependent patterns of audio emotion and computes fractal dimension features for acoustic feature extraction. The method can efficiently learn intrinsic characteristics of auditory emotions, with utterance features learned from the fractal dimensions of each sub-band. Experimental results show that the proposed method provides comparable performance for audio emotion recognition. |
Tasks | Emotion Recognition |
Published | 2016-07-01 |
URL | http://arxiv.org/abs/1607.00087v2 |
http://arxiv.org/pdf/1607.00087v2.pdf | |
PWC | https://paperswithcode.com/paper/fractal-dimension-pattern-based |
Repo | |
Framework | |
Content-based Video Indexing and Retrieval Using Corr-LDA
Title | Content-based Video Indexing and Retrieval Using Corr-LDA |
Authors | Rahul Radhakrishnan Iyer, Sanjeel Parekh, Vikas Mohandoss, Anush Ramsurat, Bhiksha Raj, Rita Singh |
Abstract | Existing video indexing and retrieval methods on popular web-based multimedia sharing websites are based on user-provided sparse tagging. This paper proposes a very specific way of searching for video clips, based on the content of the video. We present our work on Content-based Video Indexing and Retrieval using the Correspondence-Latent Dirichlet Allocation (corr-LDA) probabilistic framework. This is a model that provides for auto-annotation of videos in a database with textual descriptors, and brings the added benefit of utilizing the semantic relations between the content of the video and text. We use the concept-level matching provided by corr-LDA to build correspondences between text and multimedia, with the objective of retrieving content with increased accuracy. In our experiments, we employ only the audio components of the individual recordings and compare our results with an SVM-based approach. |
Tasks | |
Published | 2016-02-27 |
URL | https://arxiv.org/abs/1602.08581v2 |
https://arxiv.org/pdf/1602.08581v2.pdf | |
PWC | https://paperswithcode.com/paper/content-based-video-indexing-and-retrieval |
Repo | |
Framework | |
On the Modeling of Error Functions as High Dimensional Landscapes for Weight Initialization in Learning Networks
Title | On the Modeling of Error Functions as High Dimensional Landscapes for Weight Initialization in Learning Networks |
Authors | Julius, Gopinath Mahale, Sumana T., C. S. Adityakrishna |
Abstract | Next generation deep neural networks for classification hosted on embedded platforms will rely on fast, efficient, and accurate learning algorithms. Initialization of weights in learning networks has a great impact on the classification accuracy. In this paper we focus on deriving good initial weights by modeling the error function of a deep neural network as a high-dimensional landscape. We observe that due to the inherent complexity in its algebraic structure, such an error function may conform to general results of the statistics of large systems. To this end we apply some results from Random Matrix Theory to analyse these functions. We model the error function in terms of a Hamiltonian in N-dimensions and derive some theoretical results about its general behavior. These results are further used to make better initial guesses of weights for the learning algorithm. |
Tasks | |
Published | 2016-07-20 |
URL | http://arxiv.org/abs/1607.06011v1 |
http://arxiv.org/pdf/1607.06011v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-modeling-of-error-functions-as-high |
Repo | |
Framework | |
Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/ε)$
Title | Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/ε)$ |
Authors | Yi Xu, Yan Yan, Qihang Lin, Tianbao Yang |
Abstract | In this paper, we develop a novel homotopy smoothing (HOPS) algorithm for solving a family of non-smooth problems that is composed of a non-smooth term with an explicit max-structure and a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems is $O(1/\epsilon)$ without any assumption on the strong convexity. In this work, we show that the proposed HOPS achieves a lower iteration complexity of $\widetilde O(1/\epsilon^{1-\theta})$ (where $\widetilde O()$ suppresses a logarithmic factor) with $\theta\in(0,1]$ capturing the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without a strong convexity assumption. The HOPS algorithm employs Nesterov’s smoothing technique and Nesterov’s accelerated gradient method and runs in stages, gradually decreasing the smoothing parameter in a stage-wise manner until it yields a sufficiently good approximation of the original function. We show that HOPS enjoys linear convergence for many well-known non-smooth problems (e.g., empirical risk minimization with a piece-wise linear loss function and $\ell_1$ norm regularizer, finding a point in a polyhedron, cone programming, etc.). Experimental results verify the effectiveness of HOPS in comparison with Nesterov’s smoothing algorithm and the primal-dual style of first-order methods. |
Tasks | |
Published | 2016-07-13 |
URL | http://arxiv.org/abs/1607.03815v2 |
http://arxiv.org/pdf/1607.03815v2.pdf | |
PWC | https://paperswithcode.com/paper/homotopy-smoothing-for-non-smooth-problems |
Repo | |
Framework | |
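The stage-wise idea in the abstract above (smooth the max-structured term, solve, shrink the smoothing parameter, warm-start the next stage) can be sketched on a one-dimensional toy problem. The example minimizes |x - target|, whose max-structure is max over u in [-1, 1] of u(x - target); Nesterov smoothing with the proximity term u^2/2 gives the Huber function. This is not the HOPS algorithm itself: plain gradient descent replaces Nesterov's accelerated method, and the halving schedule, iteration counts, and function names are assumptions chosen for illustration.

```python
def smoothed_abs_grad(z, mu):
    """Gradient of the mu-smoothed |z| (the Huber function).

    The maximiser of u*z - mu*u*u/2 over u in [-1, 1] is clip(z/mu, -1, 1),
    and that maximiser is the gradient of the smoothed function.
    """
    return max(-1.0, min(1.0, z / mu))


def staged_smoothing(target, x0=0.0, mu0=1.0, stages=5, iters_per_stage=10):
    """Minimise |x - target| by solving a sequence of smoothed problems."""
    x, mu = x0, mu0
    for _ in range(stages):
        # The smoothed objective has a (1/mu)-Lipschitz gradient,
        # so a step size of mu is safe for gradient descent.
        for _ in range(iters_per_stage):
            x -= mu * smoothed_abs_grad(x - target, mu)
        mu /= 2.0  # tighten the approximation; x is the warm start
    return x


solution = staged_smoothing(target=3.0)
print(solution)
```

Early stages use a large smoothing parameter, which permits large steps, while later stages refine the solution against a closer approximation of the original non-smooth function.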
ORBSLAM-based Endoscope Tracking and 3D Reconstruction
Title | ORBSLAM-based Endoscope Tracking and 3D Reconstruction |
Authors | Nader Mahmoud, Iñigo Cirauqui, Alexandre Hostettler, Christophe Doignon, Luc Soler, Jacques Marescaux, J. M. M. Montiel |
Abstract | We aim to track the endoscope location inside the surgical scene and provide 3D reconstruction, in real-time, from the sole input of the image sequence captured by the monocular endoscope. This information offers new possibilities for developing surgical navigation and augmented reality applications. The main benefit of this approach is the lack of extra tracking elements, which can disturb the surgeon's performance in the clinical routine. Our first contribution is to exploit ORBSLAM, one of the best performing monocular SLAM algorithms, to estimate both the endoscope location and the 3D structure of the surgical scene. However, the reconstructed 3D map poorly describes textureless soft organ surfaces such as the liver. Our second contribution is to extend ORBSLAM to reconstruct a semi-dense map of soft organs. Experimental results on in-vivo pigs show robust endoscope tracking even with organ deformations and partial instrument occlusions. They also show the reconstruction density and accuracy against a ground truth surface obtained from CT. |
Tasks | 3D Reconstruction |
Published | 2016-08-29 |
URL | http://arxiv.org/abs/1608.08149v1 |
http://arxiv.org/pdf/1608.08149v1.pdf | |
PWC | https://paperswithcode.com/paper/orbslam-based-endoscope-tracking-and-3d |
Repo | |
Framework | |