Paper Group ANR 92
Spectrum Estimation from a Few Entries
| Title | Spectrum Estimation from a Few Entries |
|---|---|
| Authors | Ashish Khetan, Sewoong Oh |
| Abstract | The singular values of data in matrix form provide insight into the structure of the data, its effective dimensionality, and the choice of hyper-parameters for higher-level data analysis tools. However, in many practical applications such as collaborative filtering and network analysis, we only get a partial observation. Under such scenarios, we consider the fundamental problem of recovering spectral properties of the underlying matrix from a sampling of its entries. We are particularly interested in directly recovering the spectrum, which is the set of singular values, and also in sample-efficient approaches for recovering a spectral sum function, which is an aggregate sum of the same function applied to each of the singular values. We propose first estimating the Schatten $k$-norms of a matrix, and then applying Chebyshev approximation to the spectral sum function or applying moment matching in Wasserstein distance to recover the singular values. The main technical challenge is in accurately estimating the Schatten norms from a sampling of a matrix. We introduce a novel unbiased estimator based on counting small structures in a graph and provide guarantees that match its empirical performance. Our theoretical analysis shows that Schatten norms can be recovered accurately from a strictly smaller number of samples than is needed to recover the underlying low-rank matrix. Numerical experiments suggest that we significantly improve upon a competing approach based on matrix completion methods. |
| Tasks | Matrix Completion |
| Published | 2017-03-18 |
| URL | http://arxiv.org/abs/1703.06327v1 |
| PDF | http://arxiv.org/pdf/1703.06327v1.pdf |
| PWC | https://paperswithcode.com/paper/spectrum-estimation-from-a-few-entries |
| Repo | |
| Framework | |
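To see why this problem is non-trivial, the minimal sketch below compares the true Schatten $k$-norm of a low-rank matrix with a naive plug-in estimate from a Bernoulli sample of its entries. This baseline is not the paper's graph-counting estimator, and all parameter choices here are assumptions for illustration.

```python
import numpy as np

# A rough, hedged sketch (not the paper's estimator): compare the true
# Schatten k-norm of a low-rank matrix against a naive plug-in estimate from
# a Bernoulli(p) sample of entries. The 1/p rescaling makes each observed
# entry unbiased in expectation, but the plug-in norm itself is still biased;
# the paper's graph-counting estimator is designed to do better here.
rng = np.random.default_rng(0)
d, r, k, p = 100, 5, 4, 0.3

M = rng.normal(size=(d, r)) @ rng.normal(size=(r, d))  # rank-r ground truth
mask = rng.random((d, d)) < p                          # observed entry pattern
M_obs = np.where(mask, M, 0.0) / p                     # rescaled zero-fill

def schatten_k(A, k):
    """Return sum_i sigma_i(A)^k, the k-th power of the Schatten k-norm."""
    return np.sum(np.linalg.svd(A, compute_uv=False) ** k)

print("true   :", schatten_k(M, k))
print("plug-in:", schatten_k(M_obs, k))
```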
Estimating the Coefficients of a Mixture of Two Linear Regressions by Expectation Maximization
| Title | Estimating the Coefficients of a Mixture of Two Linear Regressions by Expectation Maximization |
|---|---|
| Authors | Jason M. Klusowski, Dana Yang, W. D. Brinda |
| Abstract | We give convergence guarantees for estimating the coefficients of a symmetric mixture of two linear regressions by expectation maximization (EM). In particular, we show that the empirical EM iterates converge to the target parameter vector at the parametric rate, provided the algorithm is initialized in an unbounded cone. Specifically, if the initial guess has a sufficiently large cosine angle with the target parameter vector, a sample-splitting version of the EM algorithm converges to the true coefficient vector with high probability. Interestingly, our analysis borrows from tools used in the problem of estimating the centers of a symmetric mixture of two Gaussians by EM. We also show that the population EM operator for mixtures of two regressions is anti-contractive from the target parameter vector if the cosine angle between the input vector and the target parameter vector is too small, thereby establishing the necessity of our conic condition. Finally, we give empirical evidence supporting this theoretical observation, which suggests that the sample-based EM algorithm performs poorly when initial guesses are drawn accordingly. Our simulation study also suggests that the EM algorithm performs well even under model misspecification (i.e., when the covariate and error distributions violate the model assumptions). |
| Tasks | |
| Published | 2017-04-26 |
| URL | http://arxiv.org/abs/1704.08231v3 |
| PDF | http://arxiv.org/pdf/1704.08231v3.pdf |
| PWC | https://paperswithcode.com/paper/estimating-the-coefficients-of-a-mixture-of |
| Repo | |
| Framework | |
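The EM iteration analyzed here is short enough to state in code. Below is a minimal sketch of the standard EM operator for the symmetric two-component model, assuming a known noise level and without the paper's sample-splitting device.

```python
import numpy as np

# A minimal sketch of the EM iteration for a symmetric mixture of two linear
# regressions, y = R * <beta*, x> + eps with R uniform on {-1, +1}. The tanh
# weights are the E-step responsibilities and the M-step is a weighted least
# squares solve. The noise level sigma is assumed known for simplicity.
rng = np.random.default_rng(1)
n, d, sigma = 2000, 10, 0.5

beta_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
signs = rng.choice([-1.0, 1.0], size=n)
y = signs * (X @ beta_star) + sigma * rng.normal(size=n)

beta = rng.normal(size=d)  # theory requires a large enough cosine angle here
for _ in range(50):
    w = np.tanh(y * (X @ beta) / sigma**2)          # E-step: E[R | x, y, beta]
    beta = np.linalg.solve(X.T @ X, X.T @ (w * y))  # M-step

# The model is identifiable only up to sign, so compare against +/- beta_star.
print(min(np.linalg.norm(beta - beta_star), np.linalg.norm(beta + beta_star)))
```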
Graph Scaling Cut with L1-Norm for Classification of Hyperspectral Images
| Title | Graph Scaling Cut with L1-Norm for Classification of Hyperspectral Images |
|---|---|
| Authors | Ramanarayan Mohanty, S L Happy, Aurobinda Routray |
| Abstract | In this paper, we propose an L1-normalized graph-based dimensionality reduction method for hyperspectral images, called L1-Scaling Cut (L1-SC). The underlying idea of this method is to generate the optimal projection matrix by retaining the original distribution of the data. Although the L2-norm is generally preferred for computation, it is sensitive to noise and outliers, whereas the L1-norm is robust to them. Therefore, we obtain the optimal projection matrix by maximizing the ratio of between-class dispersion to within-class dispersion using the L1-norm. Furthermore, an iterative algorithm is described to solve the optimization problem. The experimental results of the HSI classification confirm the effectiveness of the proposed L1-SC method on both noisy and noiseless data. |
| Tasks | Classification Of Hyperspectral Images, Dimensionality Reduction |
| Published | 2017-09-09 |
| URL | http://arxiv.org/abs/1709.02920v1 |
| PDF | http://arxiv.org/pdf/1709.02920v1.pdf |
| PWC | https://paperswithcode.com/paper/graph-scaling-cut-with-l1-norm-for |
| Repo | |
| Framework | |
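The sketch below illustrates the kind of L1 dispersion-ratio criterion the abstract describes; it reflects our reading rather than the authors' exact scaling-cut objective or their iterative solver.

```python
import numpy as np

# A hedged sketch of an L1 dispersion-ratio criterion (our reading, not the
# authors' exact algorithm): score a candidate projection w by between-class
# over within-class L1 dispersion of the projected samples, so that outliers
# are not squared and amplified as they would be under the L2-norm.
def l1_dispersion_ratio(X, labels, w):
    proj = X @ w
    between = within = 0.0
    for c in np.unique(labels):
        pc, po = proj[labels == c], proj[labels != c]
        between += np.abs(pc[:, None] - po[None, :]).sum()  # cross-class pairs
        within += np.abs(pc[:, None] - pc[None, :]).sum()   # same-class pairs
    return between / within

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(2, 1, (50, 5))])
labels = np.array([0] * 50 + [1] * 50)
print(l1_dispersion_ratio(X, labels, rng.normal(size=5)))
```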
The Helsinki Neural Machine Translation System
| Title | The Helsinki Neural Machine Translation System |
|---|---|
| Authors | Robert Östling, Yves Scherrer, Jörg Tiedemann, Gongbo Tang, Tommi Nieminen |
| Abstract | We introduce the Helsinki Neural Machine Translation system (HNMT) and describe how it is applied in the news translation task at WMT 2017, where it ranked first in both the human and automatic evaluations for English–Finnish. We discuss the success of English–Finnish translations and the overall advantage of NMT over a strong SMT baseline. We also discuss our submissions for English–Latvian, English–Chinese and Chinese–English. |
| Tasks | Machine Translation |
| Published | 2017-08-20 |
| URL | http://arxiv.org/abs/1708.05942v1 |
| PDF | http://arxiv.org/pdf/1708.05942v1.pdf |
| PWC | https://paperswithcode.com/paper/the-helsinki-neural-machine-translation |
| Repo | |
| Framework | |
Learning from Clinical Judgments: Semi-Markov-Modulated Marked Hawkes Processes for Risk Prognosis
| Title | Learning from Clinical Judgments: Semi-Markov-Modulated Marked Hawkes Processes for Risk Prognosis |
|---|---|
| Authors | Ahmed M. Alaa, Scott Hu, Mihaela van der Schaar |
| Abstract | Critically ill patients in regular wards are vulnerable to unanticipated adverse events which require prompt transfer to the intensive care unit (ICU). To allow for accurate prognosis of deteriorating patients, we develop a novel continuous-time probabilistic model for a monitored patient’s temporal sequence of physiological data. Our model captures “informatively sampled” patient episodes: the clinicians’ decisions on when to observe a hospitalized patient’s vital signs and lab tests over time are represented by a marked Hawkes process, with intensity parameters that are modulated by the patient’s latent clinical states, and with observable physiological data (mark process) modeled as a switching multi-task Gaussian process. In addition, our model captures “informatively censored” patient episodes by representing the patient’s latent clinical states as an absorbing semi-Markov jump process. The model parameters are learned from offline patient episodes in the electronic health records via an EM-based algorithm. Experiments conducted on a cohort of patients admitted to a major medical center over a 3-year period show that risk prognosis based on our model significantly outperforms the currently deployed medical risk scores and other baseline machine learning algorithms. |
| Tasks | |
| Published | 2017-05-15 |
| URL | http://arxiv.org/abs/1705.05267v1 |
| PDF | http://arxiv.org/pdf/1705.05267v1.pdf |
| PWC | https://paperswithcode.com/paper/learning-from-clinical-judgments-semi-markov |
| Repo | |
| Framework | |
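One building block of this model is easy to make concrete: the self-exciting intensity of a Hawkes process over observation times. The sketch below uses a fixed exponential kernel with scalar parameters, whereas in the paper these are modulated by the latent clinical state.

```python
import numpy as np

# A minimal sketch of one ingredient of the model: the self-exciting
# intensity of a Hawkes process with an exponential kernel. In the paper,
# the parameters are modulated by the patient's latent semi-Markov clinical
# state; here mu, alpha and beta are fixed scalars purely for illustration.
def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
    """lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))."""
    past = event_times[event_times < t]
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))

vitals_times = np.array([0.5, 1.2, 1.3, 2.8])  # hypothetical sampling times
print(hawkes_intensity(3.0, vitals_times))     # higher after bursts of events
```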
Sparse Ternary Codes for similarity search have higher coding gain than dense binary codes
| Title | Sparse Ternary Codes for similarity search have higher coding gain than dense binary codes |
|---|---|
| Authors | Sohrab Ferdowsi, Slava Voloshynovskiy, Dimche Kostadinov, Taras Holotyak |
| Abstract | This paper addresses the problem of Approximate Nearest Neighbor (ANN) search in pattern recognition, where feature vectors in a database are encoded as compact codes in order to speed up similarity search in large-scale databases. Considering the ANN problem from an information-theoretic perspective, we interpret it as an encoding which maps the original feature vectors to a less entropic sparse representation while requiring them to be as informative as possible. We then define the coding gain for ANN search using information-theoretic measures. We next show that the classical approach to this problem, which consists of binarizing the projected vectors, is sub-optimal. Instead, a properly designed ternary encoding achieves higher coding gains and lower complexity. |
| Tasks | |
| Published | 2017-01-26 |
| URL | http://arxiv.org/abs/1701.07675v2 |
| PDF | http://arxiv.org/pdf/1701.07675v2.pdf |
| PWC | https://paperswithcode.com/paper/sparse-ternary-codes-for-similarity-search |
| Repo | |
| Framework | |
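The contrast between the two encodings is simple to show in code. Below is a minimal sketch in which the random projection and the threshold are illustrative assumptions, not the paper's tuned design.

```python
import numpy as np

# A minimal sketch contrasting dense binary codes with sparse ternary codes
# for randomly projected features. The threshold tau controls sparsity: only
# large-magnitude components keep their sign, the rest are zeroed out.
rng = np.random.default_rng(3)
d, m, tau = 128, 64, 0.8

x = rng.normal(size=d)        # feature vector from the database
W = rng.normal(size=(m, d))   # random projection
z = W @ x

binary_code = np.sign(z)                       # classical dense binarization
ternary_code = np.sign(z) * (np.abs(z) > tau)  # sparse ternary code in {-1,0,1}
print("nonzeros:", np.count_nonzero(ternary_code), "of", m)
```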
Object category understanding via eye fixations on freehand sketches
| Title | Object category understanding via eye fixations on freehand sketches |
|---|---|
| Authors | Ravi Kiran Sarvadevabhatla, Sudharshan Suresh, R. Venkatesh Babu |
| Abstract | The study of eye gaze fixations on photographic images is an active research area. In contrast, the image subcategory of freehand sketches has not received as much attention for such studies. In this paper, we analyze the results of a free-viewing gaze fixation study conducted on 3904 freehand sketches distributed across 160 object categories. Our analysis shows that fixation sequences exhibit marked consistency within a sketch, across sketches of a category and even across suitably grouped sets of categories. This multi-level consistency is remarkable given the variability in depiction and extreme image content sparsity that characterizes hand-drawn object sketches. In our paper, we show that the multi-level consistency in the fixation data can be exploited to (a) predict a test sketch’s category given only its fixation sequence and (b) build a computational model which predicts part-labels underlying fixations on objects. We hope that our findings motivate the community to deem sketch-like representations worthy of gaze-based studies vis-a-vis photographic images. |
| Tasks | |
| Published | 2017-03-20 |
| URL | http://arxiv.org/abs/1703.06554v1 |
| PDF | http://arxiv.org/pdf/1703.06554v1.pdf |
| PWC | https://paperswithcode.com/paper/object-category-understanding-via-eye |
| Repo | |
| Framework | |
High-Dimensional Materials and Process Optimization using Data-driven Experimental Design with Well-Calibrated Uncertainty Estimates
| Title | High-Dimensional Materials and Process Optimization using Data-driven Experimental Design with Well-Calibrated Uncertainty Estimates |
|---|---|
| Authors | Julia Ling, Max Hutchinson, Erin Antono, Sean Paradiso, Bryce Meredig |
| Abstract | The optimization of composition and processing to obtain materials that exhibit desirable characteristics has historically relied on a combination of scientist intuition, trial and error, and luck. We propose a methodology that can accelerate this process by fitting data-driven models to experimental data as it is collected to suggest which experiment should be performed next. This methodology can guide the scientist to test the most promising candidates earlier, and can supplement scientific intuition and knowledge with data-driven insights. A key strength of the proposed framework is that it scales to high-dimensional parameter spaces, as are typical in materials discovery applications. Importantly, the data-driven models incorporate uncertainty analysis, so that new experiments are proposed based on a combination of exploring high-uncertainty candidates and exploiting high-performing regions of parameter space. Over four materials science test cases, our methodology led to the optimal candidate being found with, on average, three times fewer measurements than random guessing. |
| Tasks | |
| Published | 2017-04-21 |
| URL | http://arxiv.org/abs/1704.07423v2 |
| PDF | http://arxiv.org/pdf/1704.07423v2.pdf |
| PWC | https://paperswithcode.com/paper/high-dimensional-materials-and-process |
| Repo | |
| Framework | |
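The loop the abstract describes, fit a model with uncertainty, then propose the next experiment by balancing exploration and exploitation, can be sketched in a few lines. The objective function and candidate pool below are stand-ins, not the paper's materials data or model.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# A minimal sketch of the sequential design loop described in the abstract:
# fit a model with uncertainty estimates to the experiments run so far, then
# propose the candidate with the best upper confidence bound (exploit +
# explore). The objective f and the candidate grid are stand-ins.
rng = np.random.default_rng(4)
f = lambda x: -np.sum((x - 0.3) ** 2, axis=-1)    # hypothetical target property
candidates = rng.random((500, 6))                 # 6-dim composition/process space

X = candidates[rng.choice(500, size=5, replace=False)]  # initial experiments
y = f(X)
for _ in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mean, std = gp.predict(candidates, return_std=True)
    pick = int(np.argmax(mean + 1.0 * std))       # UCB acquisition
    X = np.vstack([X, candidates[pick]])          # "run" the suggested experiment
    y = np.append(y, f(candidates[pick]))

print("best property value found:", y.max())
```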
TAPAS: Two-pass Approximate Adaptive Sampling for Softmax
| Title | TAPAS: Two-pass Approximate Adaptive Sampling for Softmax |
|---|---|
| Authors | Yu Bai, Sally Goldman, Li Zhang |
| Abstract | TAPAS is a novel adaptive sampling method for the softmax model. It uses a two-pass sampling strategy where the examples used to approximate the gradient of the partition function are first sampled according to a squashed population distribution and then resampled adaptively using the context and the current model. We describe an efficient distributed implementation of TAPAS. We show, on both synthetic data and a large real dataset, that TAPAS has low computational overhead and works well for minimizing the rank loss for multi-class classification problems with a very large label space. |
| Tasks | |
| Published | 2017-07-10 |
| URL | http://arxiv.org/abs/1707.03073v2 |
| PDF | http://arxiv.org/pdf/1707.03073v2.pdf |
| PWC | https://paperswithcode.com/paper/tapas-two-pass-approximate-adaptive-sampling |
| Repo | |
| Framework | |
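The two-pass idea can be sketched concretely. In the sketch below, the squashing exponent, set sizes, and resampling details are our assumptions in the spirit of the abstract, not the paper's exact procedure.

```python
import numpy as np

# A hedged sketch of a two-pass candidate selection step in the spirit of
# TAPAS: pass 1 draws a large candidate set from a squashed label-frequency
# distribution without touching the model; pass 2 resamples adaptively using
# the current model's logits for the given context, keeping a small set for
# approximating the partition-function gradient.
rng = np.random.default_rng(5)
V, d, n1, n2 = 100_000, 64, 4096, 256

counts = rng.pareto(1.0, V) + 1.0
q = counts ** 0.5                          # squashed population distribution
q /= q.sum()

W = 0.01 * rng.normal(size=(V, d))         # output embeddings (current model)
h = rng.normal(size=d)                     # context vector

first = rng.choice(V, size=n1, replace=False, p=q)  # pass 1: cheap, model-free
logits = W[first] @ h                               # score only the candidates
p2 = np.exp(logits - logits.max())
p2 /= p2.sum()
second = first[rng.choice(n1, size=n2, replace=False, p=p2)]  # pass 2: adaptive
print("final candidate set size:", second.size)
```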
MemexQA: Visual Memex Question Answering
| Title | MemexQA: Visual Memex Question Answering |
|---|---|
| Authors | Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann |
| Abstract | This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection. Towards solving the task, we 1) present the MemexQA dataset, a large, realistic multimodal dataset consisting of real personal photos and crowd-sourced questions/answers, 2) propose MemexNet, a unified, end-to-end trainable network architecture for image, text and video question answering. Experimental results on the MemexQA dataset demonstrate that MemexNet outperforms strong baselines and yields the state-of-the-art on this novel and challenging task. The promising results on TextQA and VideoQA suggest MemexNet’s efficacy and scalability across various QA tasks. |
| Tasks | Memex Question Answering, Question Answering, Video Question Answering |
| Published | 2017-08-04 |
| URL | http://arxiv.org/abs/1708.01336v1 |
| PDF | http://arxiv.org/pdf/1708.01336v1.pdf |
| PWC | https://paperswithcode.com/paper/memexqa-visual-memex-question-answering |
| Repo | |
| Framework | |
Second-order Temporal Pooling for Action Recognition
| Title | Second-order Temporal Pooling for Action Recognition |
|---|---|
| Authors | Anoop Cherian, Stephen Gould |
| Abstract | Deep learning models for video-based action recognition usually generate features for short clips (consisting of a few frames); such clip-level features are aggregated to video-level representations by computing statistics on these features. Typically, zeroth-order (max) or first-order (average) statistics are used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel end-to-end learnable feature aggregation scheme, dubbed temporal correlation pooling, that generates an action descriptor for a video sequence by capturing the similarities between the temporal evolution of clip-level CNN features computed across the video. Such a descriptor, while being computationally cheap, also naturally encodes the co-activations of multiple CNN features, thereby providing a richer characterization of actions than their first-order counterparts. We also propose higher-order extensions of this scheme by computing correlations after embedding the CNN features in a reproducing kernel Hilbert space. We provide experiments on benchmark datasets such as HMDB-51 and UCF-101, fine-grained datasets such as MPII Cooking Activities and JHMDB, as well as the recent Kinetics-600. Our results demonstrate the advantages of higher-order pooling schemes, which, when combined with hand-crafted features (as is standard practice), achieve state-of-the-art accuracy. |
| Tasks | Temporal Action Localization |
| Published | 2017-04-23 |
| URL | http://arxiv.org/abs/1704.06925v2 |
| PDF | http://arxiv.org/pdf/1704.06925v2.pdf |
| PWC | https://paperswithcode.com/paper/second-order-temporal-pooling-for-action |
| Repo | |
| Framework | |
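The difference between first- and second-order pooling is compact enough to show directly. The sketch below uses the plain correlation matrix of clip features; the paper's exact normalization and its kernelized higher-order variants are omitted.

```python
import numpy as np

# A minimal sketch of second-order temporal pooling: given clip-level CNN
# features for one video (T clips by d dimensions), pool the co-activations
# across time instead of only a max or an average over clips.
rng = np.random.default_rng(6)
T, d = 30, 512
F = rng.normal(size=(T, d))          # clip-level CNN features for one video

avg_pool = F.mean(axis=0)            # first-order descriptor: d values
second_order = (F.T @ F) / T         # second-order: d x d co-activations
descriptor = second_order[np.triu_indices(d)]  # vectorized symmetric matrix
print(avg_pool.shape, descriptor.shape)
```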
Frame Interpolation with Multi-Scale Deep Loss Functions and Generative Adversarial Networks
| Title | Frame Interpolation with Multi-Scale Deep Loss Functions and Generative Adversarial Networks |
|---|---|
| Authors | Joost van Amersfoort, Wenzhe Shi, Alejandro Acosta, Francisco Massa, Johannes Totz, Zehan Wang, Jose Caballero |
| Abstract | Frame interpolation attempts to synthesise frames given one or more consecutive video frames. In recent years, deep learning approaches, and notably convolutional neural networks, have succeeded at tackling low- and high-level computer vision problems including frame interpolation. These techniques often tackle two problems, namely algorithm efficiency and reconstruction quality. In this paper, we present a multi-scale generative adversarial network for frame interpolation (FIGAN). To maximise the efficiency of our network, we propose a novel multi-scale residual estimation module where the predicted flow and synthesised frame are constructed in a coarse-to-fine fashion. To improve the quality of synthesised intermediate video frames, our network is jointly supervised at different levels with a perceptual loss function that consists of an adversarial and two content losses. We evaluate the proposed approach using a collection of 60 fps videos from YouTube-8m. Our results improve the state-of-the-art accuracy and provide subjective visual quality comparable to the best performing interpolation method at 47x faster runtime. |
| Tasks | |
| Published | 2017-11-16 |
| URL | http://arxiv.org/abs/1711.06045v2 |
| PDF | http://arxiv.org/pdf/1711.06045v2.pdf |
| PWC | https://paperswithcode.com/paper/frame-interpolation-with-multi-scale-deep |
| Repo | |
| Framework | |
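The loss structure mentioned in the abstract, one adversarial term plus two content terms, can be sketched as a weighted sum. All weights, the adversarial form, and the stand-in feature extractor below are illustrative assumptions, not FIGAN's exact losses.

```python
import numpy as np

# A hedged sketch of the joint supervision described in the abstract: an
# adversarial term plus two content terms, combined with scalar weights.
def perceptual_loss(pred, target, feat, adv_score,
                    w_adv=0.01, w_pix=1.0, w_feat=0.1):
    l_pix = np.mean((pred - target) ** 2)               # content loss 1: pixels
    l_feat = np.mean((feat(pred) - feat(target)) ** 2)  # content loss 2: features
    l_adv = -np.log(adv_score + 1e-8)                   # adversarial term
    return w_adv * l_adv + w_pix * l_pix + w_feat * l_feat

rng = np.random.default_rng(8)
pred, target = rng.random((32, 32, 3)), rng.random((32, 32, 3))
feat = lambda im: im.mean(axis=-1)   # stand-in for a pretrained CNN layer
print(perceptual_loss(pred, target, feat, adv_score=0.7))
```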
Full-Page Text Recognition: Learning Where to Start and When to Stop
| Title | Full-Page Text Recognition: Learning Where to Start and When to Stop |
|---|---|
| Authors | Bastien Moysset, Christopher Kermorvant, Christian Wolf |
| Abstract | Text line detection and localization is a crucial step for full-page document analysis, but it still suffers from the heterogeneity of real-life documents. In this paper, we present a new approach for full-page text recognition. Localization of the text lines is based on regressions with Fully Convolutional Neural Networks and Multidimensional Long Short-Term Memory as contextual layers. In order to increase the efficiency of this localization method, only the position of the left side of the text lines is predicted. The text recognizer is then in charge of predicting the end of the text to recognize. This method has shown good results for full-page text recognition on the highly heterogeneous Maurdor dataset. |
| Tasks | |
| Published | 2017-04-27 |
| URL | http://arxiv.org/abs/1704.08628v1 |
| PDF | http://arxiv.org/pdf/1704.08628v1.pdf |
| PWC | https://paperswithcode.com/paper/full-page-text-recognition-learning-where-to |
| Repo | |
| Framework | |
Multimodal Content Analysis for Effective Advertisements on YouTube
| Title | Multimodal Content Analysis for Effective Advertisements on YouTube |
|---|---|
| Authors | Nikhita Vedula, Wei Sun, Hyunhwan Lee, Harsh Gupta, Mitsunori Ogihara, Joseph Johnson, Gang Ren, Srinivasan Parthasarathy |
| Abstract | The rapid advances in e-commerce and Web 2.0 technologies have greatly increased the impact of commercial advertisements on the general public. As a key enabling technology, a multitude of recommender systems exist that analyze user features and browsing patterns to recommend appealing advertisements to users. In this work, we seek to study the attributes that characterize an effective advertisement and to recommend a useful set of features to aid the design and production of commercial advertisements. We analyze the temporal patterns in the multimedia content of advertisement videos, including auditory, visual and textual components, and study their individual roles and synergies in the success of an advertisement. The objective of this work is then to measure the effectiveness of an advertisement and to equip advertisement designers with features that make it more successful and approachable to users. Our proposed framework employs the signal processing technique of cross-modality feature learning, where data streams from different components are used to train separate neural network models and are then fused together to learn a shared representation. Subsequently, a neural network model trained on this joint feature embedding representation is utilized as a classifier to predict advertisement effectiveness. We validate our approach using subjective ratings from a dedicated user study, the sentiment strength of online viewer comments, and a viewer opinion metric of the ratio of Likes to Views received by each advertisement from an online platform. |
| Tasks | Recommendation Systems |
| Published | 2017-09-12 |
| URL | http://arxiv.org/abs/1709.03946v1 |
| PDF | http://arxiv.org/pdf/1709.03946v1.pdf |
| PWC | https://paperswithcode.com/paper/multimodal-content-analysis-for-effective |
| Repo | |
| Framework | |
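The fusion step described in the abstract, per-modality representations concatenated into a shared embedding and fed to a classifier, can be sketched simply. The features, labels, and classifier below are stand-ins, not the paper's learned encodings.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# A hedged sketch of late fusion in the spirit of the paper's cross-modality
# pipeline: per-modality features are concatenated into a shared
# representation, and a classifier predicts advertisement effectiveness.
rng = np.random.default_rng(7)
n = 300
audio = rng.normal(size=(n, 40))     # hypothetical auditory features
visual = rng.normal(size=(n, 128))   # hypothetical visual features
text = rng.normal(size=(n, 64))      # hypothetical textual features
effective = rng.integers(0, 2, n)    # hypothetical effectiveness labels

joint = np.hstack([audio, visual, text])   # shared (fused) representation
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(joint, effective)
print("training accuracy:", clf.score(joint, effective))
```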
Unsupervised Discovery of Structured Acoustic Tokens with Applications to Spoken Term Detection
| Title | Unsupervised Discovery of Structured Acoustic Tokens with Applications to Spoken Term Detection |
|---|---|
| Authors | Cheng-Tao Chung, Lin-Shan Lee |
| Abstract | In this paper, we compare two paradigms for unsupervised discovery of structured acoustic tokens directly from speech corpora without any human annotation. The Multigranular Paradigm seeks to capture all available information in the corpora with multiple sets of tokens for different model granularities. The Hierarchical Paradigm attempts to jointly learn several levels of signal representations in a hierarchical structure. The two paradigms are unified within a theoretical framework in this paper. Query-by-Example Spoken Term Detection (QbE-STD) experiments on the QUESST dataset of MediaEval 2015 verify the competitiveness of the acoustic tokens. The Enhanced Relevance Score (ERS) proposed in this work improves both paradigms for the task of QbE-STD. We also list results on the ABX evaluation task of the Zero Resource Challenge 2015 to compare the two paradigms. |
| Tasks | |
| Published | 2017-11-28 |
| URL | http://arxiv.org/abs/1711.10133v1 |
| PDF | http://arxiv.org/pdf/1711.10133v1.pdf |
| PWC | https://paperswithcode.com/paper/unsupervised-discovery-of-structured-acoustic |
| Repo | |
| Framework | |