May 6, 2019


Paper Group ANR 229

Unified View of Matrix Completion under General Structural Constraints. Automatic Face Reenactment. Deep Supervised Hashing with Triplet Labels. Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning. Large-Scale Detection of Non-Technical Losses in Imbalanced Data Sets. Detecting Dominant Vanishing P …

Unified View of Matrix Completion under General Structural Constraints


Title	Unified View of Matrix Completion under General Structural Constraints
Authors	Suriya Gunasekar, Arindam Banerjee, Joydeep Ghosh
Abstract	In this paper, we present a unified analysis of matrix completion under general low-dimensional structural constraints induced by any norm regularization. We consider two estimators for the general problem of structured matrix completion, and provide unified upper bounds on the sample complexity and the estimation error. Our analysis relies on results from generic chaining, and we establish two intermediate results of independent interest: (a) in characterizing the size or complexity of low dimensional subsets in high dimensional ambient space, a certain partial complexity measure encountered in the analysis of matrix completion problems is characterized in terms of a well understood complexity measure of Gaussian widths, and (b) it is shown that a form of restricted strong convexity holds for matrix completion problems under general norm regularization. Further, we provide several non-trivial examples of structures included in our framework, notably the recently proposed spectral $k$-support norm.
Tasks	Matrix Completion
Published	2016-03-29
URL	http://arxiv.org/abs/1603.08708v2
PDF	http://arxiv.org/pdf/1603.08708v2.pdf
PWC	https://paperswithcode.com/paper/unified-view-of-matrix-completion-under
Repo
Framework

Automatic Face Reenactment


Title	Automatic Face Reenactment
Authors	Pablo Garrido, Levi Valgaerts, Ole Rehmsen, Thorsten Thormaehlen, Patrick Perez, Christian Theobalt
Abstract	We propose an image-based, facial reenactment system that replaces the face of an actor in an existing target video with the face of a user from a source video, while preserving the original target performance. Our system is fully automatic and does not require a database of source expressions. Instead, it is able to produce convincing reenactment results from a short source video captured with an off-the-shelf camera, such as a webcam, where the user performs arbitrary facial gestures. Our reenactment pipeline is conceived as part image retrieval and part face transfer: The image retrieval is based on temporal clustering of target frames and a novel image matching metric that combines appearance and motion to select candidate frames from the source video, while the face transfer uses a 2D warping strategy that preserves the user’s identity. Our system excels in simplicity as it does not rely on a 3D face model, it is robust under head motion and does not require the source and target performance to be similar. We show convincing reenactment results for videos that we recorded ourselves and for low-quality footage taken from the Internet.
Tasks	Face Reenactment, Face Transfer, Image Retrieval
Published	2016-02-08
URL	http://arxiv.org/abs/1602.02651v1
PDF	http://arxiv.org/pdf/1602.02651v1.pdf
PWC	https://paperswithcode.com/paper/automatic-face-reenactment
Repo
Framework

Deep Supervised Hashing with Triplet Labels


Title	Deep Supervised Hashing with Triplet Labels
Authors	Xiaofang Wang, Yi Shi, Kris M. Kitani
Abstract	Hashing is one of the most popular and powerful approximate nearest neighbor search techniques for large-scale image retrieval. Most traditional hashing methods first represent images as off-the-shelf visual features and then produce hashing codes in a separate stage. However, off-the-shelf visual features may not be optimally compatible with the hash code learning procedure, which may result in sub-optimal hash codes. Recently, deep hashing methods have been proposed to simultaneously learn image features and hash codes using deep neural networks and have shown superior performance over traditional hashing methods. Most deep hashing methods are given supervised information in the form of pairwise labels or triplet labels. The current state-of-the-art deep hashing method DPSH, which is based on pairwise labels, performs image feature learning and hash code learning simultaneously by maximizing the likelihood of pairwise similarities. Inspired by DPSH, we propose a triplet label based deep hashing method which aims to maximize the likelihood of the given triplet labels. Experimental results show that our method outperforms all the baselines on CIFAR-10 and NUS-WIDE datasets, including the state-of-the-art method DPSH and all the previous triplet label based deep hashing methods.
Tasks	Image Retrieval
Published	2016-12-12
URL	http://arxiv.org/abs/1612.03900v1
PDF	http://arxiv.org/pdf/1612.03900v1.pdf
PWC	https://paperswithcode.com/paper/deep-supervised-hashing-with-triplet-labels
Repo
Framework
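
The triplet-likelihood idea described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the likelihood of a triplet (query, positive, negative) is a sigmoid of the gap between half inner products of real-valued codes, minus a margin. The function names and the default margin are hypothetical, and no neural network is involved here.

```python
import math

def half_inner(u, v):
    """Half the inner product of two real-valued code vectors."""
    return 0.5 * sum(a * b for a, b in zip(u, v))

def triplet_nll(q, p, n, margin=1.0):
    """Negative log-likelihood of one triplet: -log(sigmoid(gap)).

    gap is large when the query is closer to the positive than to the
    negative by more than the (assumed) margin.
    """
    gap = half_inner(q, p) - half_inner(q, n) - margin
    # Numerically stable form of log(1 + exp(-gap)).
    if gap > 0:
        return math.log1p(math.exp(-gap))
    return -gap + math.log1p(math.exp(gap))

def binarize(u):
    """Quantize a real-valued code to a {-1, +1} hash code."""
    return [1 if a >= 0 else -1 for a in u]

def hamming(a, b):
    """Number of positions where two hash codes differ."""
    return sum(x != y for x, y in zip(a, b))

# Toy codes (illustrative values only).
query = [1.0, 1.0, -1.0, -1.0]
positive = [0.9, 0.8, -1.0, -0.7]
negative = [-1.0, -1.0, 1.0, 1.0]

loss_correct = triplet_nll(query, positive, negative)
loss_swapped = triplet_nll(query, negative, positive)
```

A well-ordered triplet yields a small loss and a swapped one a large loss. After binarization, retrieval reduces to comparing Hamming distances between codes.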

Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning


Title	Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning
Authors	Tiancheng Zhao, Maxine Eskenazi
Abstract	This paper presents an end-to-end framework for task-oriented dialog systems using a variant of Deep Recurrent Q-Networks (DRQN). The model is able to interface with a relational database and jointly learn policies for both language understanding and dialog strategy. Moreover, we propose a hybrid algorithm that combines the strength of reinforcement learning and supervised learning to achieve faster learning speed. We evaluated the proposed model on a 20 Question Game conversational game simulator. Results show that the proposed method outperforms the modular-based baseline and learns a distributed representation of the latent dialog state.
Tasks
Published	2016-06-08
URL	http://arxiv.org/abs/1606.02560v2
PDF	http://arxiv.org/pdf/1606.02560v2.pdf
PWC	https://paperswithcode.com/paper/towards-end-to-end-learning-for-dialog-state
Repo
Framework

Large-Scale Detection of Non-Technical Losses in Imbalanced Data Sets


Title	Large-Scale Detection of Non-Technical Losses in Imbalanced Data Sets
Authors	Patrick O. Glauner, Andre Boechat, Lautaro Dolberg, Radu State, Franck Bettinger, Yves Rangoni, Diogo Duarte
Abstract	Non-technical losses (NTL) such as electricity theft cause significant harm to our economies, as in some countries they may range up to 40% of the total electricity distributed. Detecting NTLs requires costly on-site inspections. Accurate prediction of NTLs for customers using machine learning is therefore crucial. To date, related research largely ignores that the two classes of regular and non-regular customers are highly imbalanced and that NTL proportions may change, and it mostly considers small data sets, which often prevents the results from being deployed in production. In this paper, we present a comprehensive approach to assess three NTL detection models for different NTL proportions in large real world data sets of 100Ks of customers: Boolean rules, fuzzy logic and Support Vector Machine. This work has resulted in appreciable results that are about to be deployed in a leading industry solution. We believe that the considerations and observations made in this contribution are necessary for future smart meter research in order to report their effectiveness on imbalanced and large real world data sets.
Tasks
Published	2016-02-26
URL	http://arxiv.org/abs/1602.08350v2
PDF	http://arxiv.org/pdf/1602.08350v2.pdf
PWC	https://paperswithcode.com/paper/large-scale-detection-of-non-technical-losses
Repo
Framework
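
The class-imbalance concern raised in the abstract above can be made concrete with a small Python sketch. The labels below are synthetic toy values chosen for illustration (they are not data from the paper), and the helper name is hypothetical. The point is only that plain accuracy can look strong while every NTL case is missed.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, and recall on the positive (NTL) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

# Synthetic toy labels: 95 regular customers (0) and 5 NTL customers (1).
y_true = [0] * 95 + [1] * 5

# A trivial "detector" that always predicts the majority class.
y_always_regular = [0] * 100

accuracy, recall = evaluate(y_true, y_always_regular)
```

Here the trivial predictor reaches 95% accuracy with zero recall on the NTL class, which is why evaluation on imbalanced data usually needs class-sensitive metrics.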

Detecting Dominant Vanishing Points in Natural Scenes with Application to Composition-Sensitive Image Retrieval


Title	Detecting Dominant Vanishing Points in Natural Scenes with Application to Composition-Sensitive Image Retrieval
Authors	Zihan Zhou, Farshid Farhat, James Z. Wang
Abstract	Linear perspective is widely used in landscape photography to create the impression of depth on a 2D photo. Automated understanding of linear perspective in landscape photography has several real-world applications, including aesthetics assessment, image retrieval, and on-site feedback for photo composition, yet adequate automated understanding has been elusive. We address this problem by detecting the dominant vanishing point and the associated line structures in a photo. However, natural landscape scenes pose great technical challenges because the number of strong edges converging to the dominant vanishing point is often inadequate. To overcome this difficulty, we propose a novel vanishing point detection method that exploits global structures in the scene via contour detection. We show that our method significantly outperforms state-of-the-art methods on a public ground truth landscape image dataset that we have created. Based on the detection results, we further demonstrate how our approach to linear perspective understanding provides on-site guidance to amateur photographers on their work through a novel viewpoint-specific image retrieval system.
Tasks	Contour Detection, Image Retrieval
Published	2016-08-15
URL	http://arxiv.org/abs/1608.04267v2
PDF	http://arxiv.org/pdf/1608.04267v2.pdf
PWC	https://paperswithcode.com/paper/detecting-dominant-vanishing-points-in
Repo
Framework
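
The geometric notion behind the abstract above, edges converging to a vanishing point, can be illustrated with homogeneous coordinates. This Python sketch shows only the standard intersection of two image lines. It is not the contour-based detection method the paper proposes, and the function names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def intersection(l1, l2, eps=1e-12):
    """Intersection of two homogeneous lines, or None if they are
    parallel in the image (the meeting point lies at infinity)."""
    x, y, w = cross(l1, l2)
    if abs(w) < eps:
        return None
    return (x / w, y / w)

# Two edge segments whose extensions meet at a candidate vanishing point.
edge_a = line_through((0.0, 0.0), (1.0, 1.0))   # the line y = x
edge_b = line_through((0.0, 4.0), (1.0, 3.0))   # the line y = 4 - x
vanishing_point = intersection(edge_a, edge_b)
```

In practice many noisy edges vote for a point, so a detector has to aggregate such intersections robustly. With few strong edges, as the abstract notes for natural scenes, this simple pairwise approach becomes unreliable.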

Voronoi Region-Based Adaptive Unsupervised Color Image Segmentation


Title	Voronoi Region-Based Adaptive Unsupervised Color Image Segmentation
Authors	R. Hettiarachchi, J. F. Peters
Abstract	Color image segmentation is a crucial step in many computer vision and pattern recognition applications. This article introduces an adaptive and unsupervised clustering approach based on Voronoi regions, which can be applied to solve the color image segmentation problem. The proposed method performs region splitting and merging within Voronoi regions of the Dirichlet Tessellated image (also called a Voronoi diagram), which improves the efficiency and the accuracy of the number of clusters and cluster centroids estimation process. Furthermore, the proposed method uses cluster centroid proximity to merge proximal clusters in order to find the final number of clusters and cluster centroids. In contrast to the existing adaptive unsupervised cluster-based image segmentation algorithms, the proposed method uses the K-means clustering algorithm in place of the Fuzzy C-means algorithm to find the final segmented image. The proposed method was evaluated on three different unsupervised image segmentation evaluation benchmarks and its results were compared with two other adaptive unsupervised cluster-based image segmentation algorithms. The experimental results reported in this article confirm that the proposed method outperforms the existing algorithms in terms of the quality of image segmentation results. Also, the proposed method results in the lowest average execution time per image compared to the existing methods reported in this article.
Tasks	Semantic Segmentation
Published	2016-04-02
URL	http://arxiv.org/abs/1604.00533v1
PDF	http://arxiv.org/pdf/1604.00533v1.pdf
PWC	https://paperswithcode.com/paper/voronoi-region-based-adaptive-unsupervised
Repo
Framework
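
The final K-means step mentioned in the abstract above can be sketched as plain Lloyd iterations over RGB pixels. This is an illustration under stated assumptions: the initial centroids are simply passed in by the caller, whereas the paper estimates them by splitting and merging within Voronoi regions, which is not reproduced here. All names and pixel values are hypothetical.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between a pixel and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(p, c))

def kmeans(pixels, initial_centroids, iterations=10):
    """Lloyd's algorithm on color tuples.

    Returns the centroids and, for each pixel, the index of the
    centroid it was assigned to in the last assignment step.
    """
    centroids = [tuple(float(v) for v in c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every pixel.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in pixels
        ]
        # Update step: mean color of each non-empty cluster.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                centroids[k] = tuple(
                    sum(channel) / len(members) for channel in zip(*members)
                )
    return centroids, labels

# Toy "image": two reddish and two bluish pixels.
pixels = [(250, 0, 0), (255, 10, 5), (0, 0, 250), (5, 10, 255)]
centroids, labels = kmeans(pixels, [(200, 0, 0), (0, 0, 200)])
```

Replacing each pixel with the color of its assigned centroid gives the segmented image.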

Binary Quadratic Programing for Online Tracking of Hundreds of People in Extremely Crowded Scenes


Title	Binary Quadratic Programing for Online Tracking of Hundreds of People in Extremely Crowded Scenes
Authors	Afshin Dehghan, Mubarak Shah
Abstract	Multi-object tracking has been studied for decades. However, when it comes to tracking pedestrians in extremely crowded scenes, we are limited to only a few works. This is an important problem which gives rise to several challenges. Pre-trained object detectors fail to localize targets in crowded sequences. This consequently limits the use of data-association based multi-target tracking methods which rely on the outcome of an object detector. Additionally, the small apparent target size makes it challenging to extract features to discriminate targets from their surroundings. Finally, the large number of targets greatly increases computational complexity which in turn makes it hard to extend existing multi-target tracking approaches to high-density crowd scenarios. In this paper, we propose a tracker that addresses the aforementioned problems and is capable of tracking hundreds of people efficiently. We formulate online crowd tracking as Binary Quadratic Programing. Our formulation employs each target’s individual information in the form of appearance and motion as well as contextual cues in the form of neighborhood motion, spatial proximity and grouping constraints, and solves detection and data association simultaneously. In order to solve the proposed quadratic optimization efficiently, where state-of-the-art commercial quadratic programing solvers fail to find the answer in a reasonable amount of time, we propose to use the most recent version of the Modified Frank Wolfe algorithm, which takes advantage of SWAP-steps to speed up the optimization. We show that the proposed formulation can track hundreds of targets efficiently and improves state-of-the-art results by significant margins on eleven challenging high density crowd sequences.
Tasks	Multi-Object Tracking, Object Tracking
Published	2016-03-30
URL	http://arxiv.org/abs/1603.09240v1
PDF	http://arxiv.org/pdf/1603.09240v1.pdf
PWC	https://paperswithcode.com/paper/binary-quadratic-programing-for-online
Repo
Framework

Penambahan emosi menggunakan metode manipulasi prosodi untuk sistem text to speech bahasa Indonesia


Title	Penambahan emosi menggunakan metode manipulasi prosodi untuk sistem text to speech bahasa Indonesia
Authors	Salita Ulitia Prini, Ary Setijadi Prihatmanto
Abstract	Adding emotion using a prosody manipulation method for an Indonesian text-to-speech system. Text To Speech (TTS) is a system that converts text in a given language into speech, in accordance with how the text is read in that language. The focus of this research is the concept of natural-sounding speech, that is, to “humanize” the pronunciation of a Text To Speech voice synthesis system. Humans have emotions and intonation that may affect the sound produced. The main components used for the Text To Speech system in this research are eSpeak, the MBROLA database id1, and a Human Speech Corpus database built from a website that summarizes the words with the highest frequency (Most Common Words) used in a country. Three basic types of emotion / intonation are designed: happy, angry, and sad. The emotional filter is developed by manipulating the relevant prosody features (especially pitch and duration values) using a predetermined rate factor, established by analyzing the differences between the standard Text To Speech output and voice recordings with a particular emotional prosody / intonation. In the perception tests of the Human Speech Corpus, the results are 95% for the happy emotion, 96.25% for the angry emotion, and 98.75% for the sad emotion. The system perception test consists of intelligibility and naturalness tests. In the intelligibility test, the accuracy of the sound with respect to the original sentence is 93.3%, and the clarity rate for each sentence is 62.8%. For naturalness, the accuracy of emotion selection is 75.6% for the happy emotion, 73.3% for the angry emotion, and 60% for the sad emotion.
Tasks
Published	2016-06-29
URL	http://arxiv.org/abs/1606.09222v1
PDF	http://arxiv.org/pdf/1606.09222v1.pdf
PWC	https://paperswithcode.com/paper/penambahan-emosi-menggunakan-metode
Repo
Framework

Homotopy Analysis for Tensor PCA


Title	Homotopy Analysis for Tensor PCA
Authors	Anima Anandkumar, Yuan Deng, Rong Ge, Hossein Mobahi
Abstract	Developing efficient and guaranteed nonconvex algorithms has been an important challenge in modern machine learning. Algorithms with good empirical performance such as stochastic gradient descent often lack theoretical guarantees. In this paper, we analyze the class of homotopy or continuation methods for global optimization of nonconvex functions. These methods start from an objective function that is efficient to optimize (e.g. convex), and progressively modify it to obtain the required objective, and the solutions are passed along the homotopy path. For the challenging problem of tensor PCA, we prove global convergence of the homotopy method in the “high noise” regime. The signal-to-noise requirement for our algorithm is tight in the sense that it matches the recovery guarantee for the best degree-4 sum-of-squares algorithm. In addition, we prove a phase transition along the homotopy path for tensor PCA. This allows the homotopy method to be simplified to a local search algorithm, viz., tensor power iterations, with a specific initialization and a noise injection procedure, while retaining the theoretical guarantees.
Tasks
Published	2016-10-28
URL	http://arxiv.org/abs/1610.09322v4
PDF	http://arxiv.org/pdf/1610.09322v4.pdf
PWC	https://paperswithcode.com/paper/homotopy-analysis-for-tensor-pca
Repo
Framework
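
The tensor power iteration that the abstract above arrives at can be sketched on a tiny noise-free example. This is only an illustration of the basic update u ← T(I, u, u) / ‖T(I, u, u)‖ for a symmetric order-3 tensor. The specific initialization and noise injection procedure from the paper are not reproduced, and all names and values here are hypothetical.

```python
import math

def normalize(u):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in u))
    return [a / norm for a in u]

def rank_one_tensor(v, scale=1.0):
    """Build the order-3 tensor scale * (v outer v outer v)."""
    d = len(v)
    return [[[scale * v[i] * v[j] * v[k] for k in range(d)]
             for j in range(d)] for i in range(d)]

def power_step(T, u):
    """One power update: contract T with u along two modes, then normalize."""
    d = len(u)
    w = [
        sum(T[i][j][k] * u[j] * u[k] for j in range(d) for k in range(d))
        for i in range(d)
    ]
    return normalize(w)

def tensor_power_iteration(T, u0, steps=20):
    """Repeat the power update from a starting vector u0."""
    u = normalize(u0)
    for _ in range(steps):
        u = power_step(T, u)
    return u

# Planted direction and a noise-free rank-one tensor built from it.
v = normalize([1.0, 2.0, 2.0])
T = rank_one_tensor(v, scale=2.0)
u = tensor_power_iteration(T, [1.0, 0.0, 0.0], steps=5)
alignment = sum(a * b for a, b in zip(u, v))
```

Without noise the iteration aligns with the planted direction after a single step, provided the start is not orthogonal to it. The hard regime analyzed in the paper is the one where noise dominates and the starting point matters.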


Experiments with POS Tagging Code-mixed Indian Social Media Text


Title	Experiments with POS Tagging Code-mixed Indian Social Media Text
Authors	Prakash B. Pimpale, Raj Nath Patel
Abstract	This paper presents Centre for Development of Advanced Computing Mumbai’s (CDACM) submission to the NLP Tools Contest on Part-Of-Speech (POS) Tagging For Code-mixed Indian Social Media Text (POSCMISMT) 2015 (collocated with ICON 2015). We submitted results for Hindi (hi), Bengali (bn), and Telugu (te) languages mixed with English (en). In this paper, we describe the POS tagging techniques we exploited for this task. Machine learning has been used to POS tag the mixed language text. For POS tagging, distributed representations of words in vector space (word2vec) for feature extraction and Log-linear models have been tried. We report our work on all three languages hi, bn, and te mixed with en.
Tasks	Part-Of-Speech Tagging
Published	2016-10-31
URL	http://arxiv.org/abs/1610.09799v1
PDF	http://arxiv.org/pdf/1610.09799v1.pdf
PWC	https://paperswithcode.com/paper/experiments-with-pos-tagging-code-mixed
Repo
Framework


Correlation Hashing Network for Efficient Cross-Modal Retrieval


Title	Correlation Hashing Network for Efficient Cross-Modal Retrieval
Authors	Yue Cao, Mingsheng Long, Jianmin Wang, Philip S. Yu
Abstract	Hashing is widely applied to approximate nearest neighbor search for large-scale multimodal retrieval with storage and computation efficiency. Cross-modal hashing improves the quality of hash coding by exploiting semantic correlations across different modalities. Existing cross-modal hashing methods first transform data into low-dimensional feature vectors, and then generate binary codes by another separate quantization step. However, suboptimal hash codes may be generated since the quantization error is not explicitly minimized and the feature representation is not jointly optimized with the binary codes. This paper presents a Correlation Hashing Network (CHN) approach to cross-modal hashing, which jointly learns good data representation tailored to hash coding and formally controls the quantization error. The proposed CHN is a hybrid deep architecture that comprises a convolutional neural network for learning good image representations, a multilayer perceptron for learning good text representations, two hashing layers for generating compact binary codes, and a structured max-margin loss that integrates all things together to enable learning similarity-preserving and high-quality hash codes. Extensive empirical study shows that CHN yields state of the art cross-modal retrieval performance on standard benchmarks.
Tasks	Cross-Modal Retrieval, Quantization
Published	2016-02-22
URL	http://arxiv.org/abs/1602.06697v2
PDF	http://arxiv.org/pdf/1602.06697v2.pdf
PWC	https://paperswithcode.com/paper/correlation-hashing-network-for-efficient
Repo
Framework

Heart Rate Variability and Respiration Signal as Diagnostic Tools for Late Onset Sepsis in Neonatal Intensive Care Units


Title	Heart Rate Variability and Respiration Signal as Diagnostic Tools for Late Onset Sepsis in Neonatal Intensive Care Units
Authors	Yuan Wang, Guy Carrault, Alain Beuchee, Nathalie Costet, Huazhong Shu, Lotfi Senhadji
Abstract	Apnea-bradycardia is one of the major clinical early indicators of late-onset sepsis occurring in approximately 7% to 10% of all neonates and in more than 25% of very low birth weight infants in NICU. The objective of this paper was to determine if HRV, respiration and their relationships help to diagnose infection in premature infants via non-invasive ways in NICU. Therefore, we implement Mono-Channel (MC) and Bi-Channel (BC) Analysis in two groups: sepsis (S) vs. non-sepsis (NS). Firstly, we studied RR series not only by linear methods: time domain and frequency domain, but also by non-linear methods: chaos theory and information theory. The results show that alpha Slow, alpha Fast and Sample Entropy are significant parameters to distinguish S from NS. Secondly, the question about the functional coupling of HRV and nasal respiration is addressed. The local linear correlation coefficient $r^2_{t,f}$ has been explored, while the non-linear regression coefficient $h^2$ was calculated in two directions. It is obvious that $r^2_{t,f}$ within the third frequency band ($0.2 < f < 0.4$ Hz) and $h^2$ in two directions were complementary approaches to diagnose sepsis. Thirdly, a feasibility study is carried out on the candidate parameters selected from MC and BC respectively. We discovered that the proposed test based on optimal fusion of 6 features shows good performance with the largest AUC and a reduced probability of false alarm (PFA).
Tasks	Heart Rate Variability
Published	2016-05-12
URL	http://arxiv.org/abs/1605.05247v1
PDF	http://arxiv.org/pdf/1605.05247v1.pdf
PWC	https://paperswithcode.com/paper/heart-rate-variability-and-respiration-signal
Repo
Framework

Surprising properties of dropout in deep networks


Title	Surprising properties of dropout in deep networks
Authors	David P. Helmbold, Philip M. Long
Abstract	We analyze dropout in deep networks with rectified linear units and the quadratic loss. Our results expose surprising differences between the behavior of dropout and more traditional regularizers like weight decay. For example, on some simple data sets dropout training produces negative weights even though the output is the sum of the inputs. This provides a counterpoint to the suggestion that dropout discourages co-adaptation of weights. We also show that the dropout penalty can grow exponentially in the depth of the network while the weight-decay penalty remains essentially linear, and that dropout is insensitive to various re-scalings of the input features, outputs, and network weights. This last insensitivity implies that there are no isolated local minima of the dropout training criterion. Our work uncovers new properties of dropout, extends our understanding of why dropout succeeds, and lays the foundation for further progress.
Tasks
Published	2016-02-14
URL	http://arxiv.org/abs/1602.04484v5
PDF	http://arxiv.org/pdf/1602.04484v5.pdf
PWC	https://paperswithcode.com/paper/surprising-properties-of-dropout-in-deep
Repo
Framework

Deeply Semantic Inductive Spatio-Temporal Learning


Title	Deeply Semantic Inductive Spatio-Temporal Learning
Authors	Jakob Suchan, Mehul Bhatt, Carl Schultz
Abstract	We present an inductive spatio-temporal learning framework rooted in inductive logic programming. With an emphasis on visuo-spatial language, logic, and cognition, the framework supports learning with relational spatio-temporal features identifiable in a range of domains involving the processing and interpretation of dynamic visuo-spatial imagery. We present a prototypical system, and an example application in the domain of computing for visual arts and computational cognitive science.
Tasks
Published	2016-08-09
URL	http://arxiv.org/abs/1608.02693v1
PDF	http://arxiv.org/pdf/1608.02693v1.pdf
PWC	https://paperswithcode.com/paper/deeply-semantic-inductive-spatio-temporal
Repo
Framework