Paper Group ANR 979
A method to Suppress Facial Expression in Posed and Spontaneous Videos
Title | A method to Suppress Facial Expression in Posed and Spontaneous Videos |
Authors | Ghada Zamzmi, Gabriel Ruiz, Matthew Shreve, Dmitry Goldgof, Rangachar Kasturi, Sudeep Sarkar |
Abstract | We address the problem of suppressing facial expressions in videos because expressions can hinder the retrieval of important information in applications such as face recognition. To achieve this, we present an optical strain suppression method that removes any facial expression without requiring training for a specific expression. For each frame in a video, an optical strain map that provides the strain magnitude value at each pixel is generated; this strain map is then utilized to neutralize the expression by replacing pixels of high strain values with pixels from a reference face frame. Experimental results on various expressions, namely happiness, sadness, and anger, from two publicly available datasets (BU-4DFE and AM-FED) show the ability of our method to suppress facial expressions. |
Tasks | Face Recognition |
Published | 2018-10-04 |
URL | http://arxiv.org/abs/1810.02401v1 |
http://arxiv.org/pdf/1810.02401v1.pdf | |
PWC | https://paperswithcode.com/paper/a-method-to-suppress-facial-expression-in |
Repo | |
Framework | |
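A minimal sketch of the optical-strain idea summarised in the abstract: dense optical flow from a neutral reference frame, a per-pixel strain magnitude from the flow derivatives, and pixel replacement where the strain is high. OpenCV's Farneback flow, its parameters, and the `strain_threshold` value are illustrative assumptions, not settings from the paper.

```python
import cv2
import numpy as np

def suppress_expression(reference_gray, frame_gray, strain_threshold=0.05):
    """Replace high-strain pixels in `frame_gray` with pixels from `reference_gray`.

    Rough sketch: optical flow -> strain magnitude -> pixel replacement.
    Threshold and flow parameters are illustrative, not the paper's settings.
    """
    # Dense optical flow from the (neutral) reference frame to the current frame.
    flow = cv2.calcOpticalFlowFarneback(
        reference_gray, frame_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    u, v = flow[..., 0], flow[..., 1]

    # Spatial derivatives of the flow field give the strain-tensor components.
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    exx, eyy, exy = du_dx, dv_dy, 0.5 * (du_dy + dv_dx)

    # Optical strain magnitude per pixel.
    strain = np.sqrt(exx**2 + eyy**2 + 2.0 * exy**2)

    # Neutralize the expression: copy reference pixels where the strain is large.
    out = frame_gray.copy()
    mask = strain > strain_threshold
    out[mask] = reference_gray[mask]
    return out
```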
SRP: Efficient class-aware embedding learning for large-scale data via supervised random projections
Title | SRP: Efficient class-aware embedding learning for large-scale data via supervised random projections |
Authors | Amir-Hossein Karimi, Alexander Wong, Ali Ghodsi |
Abstract | Supervised dimensionality reduction strategies have been of great interest. However, current supervised dimensionality reduction approaches are difficult to scale for situations characterized by large datasets given the high computational complexities associated with such methods. While stochastic approximation strategies have been explored for unsupervised dimensionality reduction to tackle this challenge, such approaches are not well-suited for accelerating computational speed for supervised dimensionality reduction. Motivated to tackle this challenge, in this study we explore a novel direction of directly learning optimal class-aware embeddings in a supervised manner via the notion of supervised random projections (SRP). The key idea behind SRP is that, rather than performing spectral decomposition (or approximations thereof) which are computationally prohibitive for large-scale data, we instead perform a direct decomposition by leveraging kernel approximation theory and the symmetry of the Hilbert-Schmidt Independence Criterion (HSIC) measure of dependence between the embedded data and the labels. Experimental results on five different synthetic and real-world datasets demonstrate that the proposed SRP strategy for class-aware embedding learning can be very promising in producing embeddings that are highly competitive with existing supervised dimensionality reduction methods (e.g., SPCA and KSPCA) while achieving 1-2 orders of magnitude better computational performance. As such, such an efficient approach to learning embeddings for dimensionality reduction can be a powerful tool for large-scale data analysis and visualization. |
Tasks | Dimensionality Reduction |
Published | 2018-11-07 |
URL | http://arxiv.org/abs/1811.03166v1 |
http://arxiv.org/pdf/1811.03166v1.pdf | |
PWC | https://paperswithcode.com/paper/srp-efficient-class-aware-embedding-learning |
Repo | |
Framework | |
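A small NumPy sketch of the "direct decomposition" idea: approximate the label kernel with an explicit feature map so the HSIC-maximizing projection can be read off a thin SVD instead of an n-by-n eigendecomposition. The one-hot label map (which caps the embedding dimension at the number of classes) and the SVD-based orthonormalization are simplifying assumptions, not the paper's random-feature construction.

```python
import numpy as np

def srp_embed(X, y, n_components=2):
    """Class-aware linear embedding in the spirit of supervised random projections.

    Sketch only: the one-hot label map stands in for the paper's random/kernel
    approximation of the label kernel.
    """
    Xc = X - X.mean(axis=0)               # centering plays the role of H
    Psi = np.eye(int(y.max()) + 1)[y]     # (n_samples, n_classes) label feature map
    # The HSIC-style objective tr(W^T X^T H K_y H X W) with K_y ~= Psi Psi^T is
    # maximized by the top left singular vectors of C = (H X)^T Psi, so no
    # n x n kernel matrix (or its eigendecomposition) is ever formed.
    C = Xc.T @ Psi                        # (n_features, n_classes)
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    return X @ U[:, :n_components]

# Usage sketch on toy data with a class-informative feature.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=500)
X = rng.normal(size=(500, 20))
X[:, 0] += 2.0 * y
print(srp_embed(X, y).shape)              # (500, 2)
```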
Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks
Title | Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks |
Authors | Henggang Cui, Vladan Radosavljevic, Fang-Chieh Chou, Tsung-Han Lin, Thi Nguyen, Tzu-Kuo Huang, Jeff Schneider, Nemanja Djuric |
Abstract | Autonomous driving presents one of the largest problems that the robotics and artificial intelligence communities are facing at the moment, both in terms of difficulty and potential societal impact. Self-driving vehicles (SDVs) are expected to prevent road accidents and save millions of lives while improving the livelihood and life quality of many more. However, despite large interest and a number of industry players working in the autonomous domain, there still remains more to be done to develop a system capable of operating at a level comparable to that of the best human drivers. One reason for this is the high uncertainty of traffic behavior and the large number of situations that an SDV may encounter on the roads, making it very difficult to create a fully generalizable system. To ensure safe and efficient operations, an autonomous vehicle is required to account for this uncertainty and to anticipate a multitude of possible behaviors of traffic actors in its surroundings. We address this critical problem and present a method to predict multiple possible trajectories of actors while also estimating their probabilities. The method encodes each actor’s surrounding context into a raster image, used as input by deep convolutional networks to automatically derive relevant features for the task. Following extensive offline evaluation and comparison to state-of-the-art baselines, the method was successfully tested on SDVs in closed-course tests. |
Tasks | Autonomous Driving |
Published | 2018-09-18 |
URL | http://arxiv.org/abs/1809.10732v2 |
http://arxiv.org/pdf/1809.10732v2.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-trajectory-predictions-for |
Repo | |
Framework | |
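A toy PyTorch sketch of the multimodal output described in the abstract: a small CNN over the rasterized actor context emits M candidate trajectories plus their mode probabilities. The raster size, backbone, number of modes, and horizon are placeholders, and the paper's mode-aware training loss is not shown.

```python
import torch
import torch.nn as nn

class MultimodalTrajectoryNet(nn.Module):
    """CNN over a rasterized scene -> M trajectories (T xy points each) + mode probs."""

    def __init__(self, in_channels=3, num_modes=3, horizon=30):
        super().__init__()
        self.num_modes, self.horizon = num_modes, horizon
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(64, num_modes * (2 * horizon) + num_modes)

    def forward(self, raster):
        out = self.head(self.backbone(raster))
        traj = out[:, : self.num_modes * 2 * self.horizon]
        traj = traj.view(-1, self.num_modes, self.horizon, 2)   # candidate trajectories
        mode_logits = out[:, self.num_modes * 2 * self.horizon :]
        return traj, mode_logits.softmax(dim=-1)                # trajectories + probabilities

# Usage sketch
net = MultimodalTrajectoryNet()
raster = torch.randn(4, 3, 128, 128)       # batch of rasterized actor contexts
trajs, probs = net(raster)
print(trajs.shape, probs.shape)            # (4, 3, 30, 2) (4, 3)
```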
Global Convergence to the Equilibrium of GANs using Variational Inequalities
Title | Global Convergence to the Equilibrium of GANs using Variational Inequalities |
Authors | Ian Gemp, Sridhar Mahadevan |
Abstract | In optimization, the negative gradient of a function denotes the direction of steepest descent. Furthermore, traveling in any direction orthogonal to the gradient maintains the value of the function. In this work, we show that these orthogonal directions that are ignored by gradient descent can be critical in equilibrium problems. Equilibrium problems have drawn heightened attention in machine learning due to the emergence of the Generative Adversarial Network (GAN). We use the framework of Variational Inequalities to analyze popular training algorithms for a fundamental GAN variant: the Wasserstein Linear-Quadratic GAN. We show that the steepest descent direction causes divergence from the equilibrium, and convergence to the equilibrium is achieved through following a particular orthogonal direction. We call this successful technique Crossing-the-Curl, named for its mathematical derivation as well as its intuition: identify the game’s axis of rotation and move “across” space in the direction towards smaller “curling”. |
Tasks | |
Published | 2018-08-04 |
URL | https://arxiv.org/abs/1808.01531v3 |
https://arxiv.org/pdf/1808.01531v3.pdf | |
PWC | https://paperswithcode.com/paper/global-convergence-to-the-equilibrium-of-gans |
Repo | |
Framework | |
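A toy NumPy illustration of the phenomenon the abstract describes, on the bilinear game min_x max_y xy rather than the authors' Wasserstein LQ-GAN: following the simultaneous-gradient field moves away from the equilibrium, while stepping orthogonally to it, against the field's rotation, converges.

```python
import numpy as np

# Toy bilinear game f(x, y) = x * y, equilibrium at the origin.
# Simultaneous-gradient field: v(x, y) = (df/dx, -df/dy) = (y, -x).
def grad_field(z):
    x, y = z
    return np.array([y, -x])

def crossed_field(z):
    # Step orthogonally to v, with the rotation sign chosen to oppose the
    # field's curl; for this game that direction is simply (x, y), i.e.
    # straight towards the equilibrium.
    v = grad_field(z)
    return np.array([-v[1], v[0]])

def simulate(field, z0=(1.0, 1.0), lr=0.1, steps=200):
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z - lr * field(z)
    return np.linalg.norm(z)

print(f"distance to equilibrium after 200 steps: "
      f"gradient {simulate(grad_field):.2f}, crossed {simulate(crossed_field):.2e}")
# The gradient iterate ends farther from the origin than it started (~1.41),
# while the 'crossed' iterate is essentially at the equilibrium.
```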
Mining Social Media for Newsgathering: A Review
Title | Mining Social Media for Newsgathering: A Review |
Authors | Arkaitz Zubiaga |
Abstract | Social media is becoming an increasingly important data source for learning about breaking news and for following the latest developments of ongoing news. This is in part possible thanks to the existence of mobile devices, which allow anyone with access to the Internet to post updates from anywhere, leading in turn to a growing presence of citizen journalism. Consequently, social media has become a go-to resource for journalists during the process of newsgathering. Use of social media for newsgathering is, however, challenging, and suitable tools are needed in order to facilitate access to useful information for reporting. In this paper, we provide an overview of research in data mining and natural language processing for mining social media for newsgathering. We discuss five different areas that researchers have worked on to mitigate the challenges inherent to social media newsgathering: news discovery, curation of news, validation and verification of content, newsgathering dashboards, and other tasks. We outline the progress made so far in the field, summarise the current challenges as well as discuss future directions in the use of computational journalism to assist with social media newsgathering. This review is relevant to computer scientists researching news in social media as well as to interdisciplinary researchers interested in the intersection of computer science and journalism. |
Tasks | |
Published | 2018-04-10 |
URL | https://arxiv.org/abs/1804.03540v2 |
https://arxiv.org/pdf/1804.03540v2.pdf | |
PWC | https://paperswithcode.com/paper/mining-social-media-for-newsgathering |
Repo | |
Framework | |
Unsupervised Neural Machine Translation Initialized by Unsupervised Statistical Machine Translation
Title | Unsupervised Neural Machine Translation Initialized by Unsupervised Statistical Machine Translation |
Authors | Benjamin Marie, Atsushi Fujita |
Abstract | Recent work achieved remarkable results in training neural machine translation (NMT) systems in a fully unsupervised way, with new and dedicated architectures that rely on monolingual corpora only. In this work, we propose to define unsupervised NMT (UNMT) as NMT trained with the supervision of synthetic bilingual data. Our approach straightforwardly enables the use of state-of-the-art architectures proposed for supervised NMT by replacing human-made bilingual data with synthetic bilingual data for training. We propose to initialize the training of UNMT with synthetic bilingual data generated by unsupervised statistical machine translation (USMT). The UNMT system is then incrementally improved using back-translation. Our preliminary experiments show that our approach achieves a new state-of-the-art for unsupervised machine translation on the WMT16 German–English news translation task, for both translation directions. |
Tasks | Machine Translation, Unsupervised Machine Translation |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12703v1 |
http://arxiv.org/pdf/1810.12703v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-neural-machine-translation-1 |
Repo | |
Framework | |
Shape-conditioned Image Generation by Learning Latent Appearance Representation from Unpaired Data
Title | Shape-conditioned Image Generation by Learning Latent Appearance Representation from Unpaired Data |
Authors | Yutaro Miyauchi, Yusuke Sugano, Yasuyuki Matsushita |
Abstract | Conditional image generation is effective for diverse tasks including training data synthesis for learning-based computer vision. However, despite the recent advances in generative adversarial networks (GANs), it is still a challenging task to generate images with detailed conditioning on object shapes. Existing methods for conditional image generation use category labels and/or keypoints and only give limited control over object categories. In this work, we present SCGAN, an architecture to generate images with a desired shape specified by an input normal map. The shape-conditioned image generation task is achieved by explicitly modeling the image appearance via a latent appearance vector. The network is trained using unpaired training samples of real images and rendered normal maps. This approach enables us to generate images of arbitrary object categories with the target shape and diverse image appearances. We show the effectiveness of our method through both qualitative and quantitative evaluation on training data generation tasks. |
Tasks | Conditional Image Generation, Image Generation |
Published | 2018-11-29 |
URL | http://arxiv.org/abs/1811.11991v1 |
http://arxiv.org/pdf/1811.11991v1.pdf | |
PWC | https://paperswithcode.com/paper/shape-conditioned-image-generation-by |
Repo | |
Framework | |
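A toy PyTorch sketch of the conditioning scheme described in the abstract: the generator takes a normal map that fixes the shape and a separate latent appearance vector that is broadcast and concatenated into the decoder. Layer sizes and the encoder-decoder layout are assumptions, not SCGAN's architecture, and the adversarial training loop is omitted.

```python
import torch
import torch.nn as nn

class ShapeConditionedGenerator(nn.Module):
    """Toy generator: normal map fixes the shape, a latent vector sets the appearance."""

    def __init__(self, appearance_dim=64):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),      # normal map in
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU())
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(64 + appearance_dim, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh())  # RGB out

    def forward(self, normal_map, appearance):
        h = self.encode(normal_map)                          # (B, 64, H/4, W/4)
        # Broadcast the appearance vector spatially and fuse it with the shape code.
        a = appearance[:, :, None, None].expand(-1, -1, h.shape[2], h.shape[3])
        return self.decode(torch.cat([h, a], dim=1))

# Usage sketch
gen = ShapeConditionedGenerator()
normals = torch.rand(2, 3, 64, 64) * 2 - 1          # stand-in normal maps in [-1, 1]
z_app = torch.randn(2, 64)                           # latent appearance vectors
print(gen(normals, z_app).shape)                     # torch.Size([2, 3, 64, 64])
```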
Efficient Active Learning for Image Classification and Segmentation using a Sample Selection and Conditional Generative Adversarial Network
Title | Efficient Active Learning for Image Classification and Segmentation using a Sample Selection and Conditional Generative Adversarial Network |
Authors | Dwarikanath Mahapatra, Behzad Bozorgtabar, Jean-Philippe Thiran, Mauricio Reyes |
Abstract | Training robust deep learning (DL) systems for medical image classification or segmentation is challenging due to limited images covering different disease types and severity. We propose an active learning (AL) framework to select the most informative samples and add them to the training data. We use conditional generative adversarial networks (cGANs) to generate realistic chest X-ray images with different disease characteristics by conditioning their generation on a real image sample. Informative samples to add to the training set are identified using a Bayesian neural network. Experiments show our proposed AL framework is able to achieve state-of-the-art performance by using about 35% of the full dataset, thus saving significant time and effort over conventional methods. |
Tasks | Active Learning, Image Classification |
Published | 2018-06-14 |
URL | https://arxiv.org/abs/1806.05473v4 |
https://arxiv.org/pdf/1806.05473v4.pdf | |
PWC | https://paperswithcode.com/paper/efficient-active-learning-for-image |
Repo | |
Framework | |
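A runnable sketch of the sample-selection step described in the abstract: score candidate images (random tensors below stand in for cGAN-generated chest X-rays) by the predictive uncertainty of a dropout network and keep the most informative ones. The small CNN, the MC-dropout entropy score, and the selection size are illustrative assumptions rather than the paper's Bayesian network and criterion.

```python
import torch
import torch.nn as nn

class SmallBayesianCNN(nn.Module):
    """Tiny dropout classifier used only to illustrate uncertainty-based selection."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(), nn.Dropout2d(0.3),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(), nn.Dropout2d(0.3),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x))

def informativeness(model, images, mc_samples=20):
    """Predictive entropy under MC dropout; higher means more informative."""
    model.train()                      # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([model(images).softmax(-1) for _ in range(mc_samples)])
    mean_probs = probs.mean(dim=0)
    return -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)

# Usage sketch: rank candidates and pick the top ones to add to the training set.
candidates = torch.randn(64, 1, 64, 64)        # stand-ins for cGAN outputs
scores = informativeness(SmallBayesianCNN(), candidates)
selected = candidates[scores.topk(8).indices]
print(selected.shape)                           # torch.Size([8, 1, 64, 64])
```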
Scalable Natural Gradient Langevin Dynamics in Practice
Title | Scalable Natural Gradient Langevin Dynamics in Practice |
Authors | Henri Palacci, Henry Hess |
Abstract | Stochastic Gradient Langevin Dynamics (SGLD) is a sampling scheme for Bayesian modeling adapted to large datasets and models. SGLD relies on the injection of Gaussian noise at each step of a Stochastic Gradient Descent (SGD) update. In this scheme, every component in the noise vector is independent and has the same scale, whereas the parameters we seek to estimate exhibit strong variations in scale and significant correlation structures, leading to poor convergence and mixing times. We compare different preconditioning approaches to the normalization of the noise vector and benchmark these approaches on the following criteria: 1) mixing times of the multivariate parameter vector, 2) regularizing effect on small datasets where it is easy to overfit, 3) covariate shift detection, and 4) resistance to adversarial examples. |
Tasks | |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02855v1 |
http://arxiv.org/pdf/1806.02855v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-natural-gradient-langevin-dynamics |
Repo | |
Framework | |
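A NumPy sketch of the kind of preconditioned SGLD update the abstract compares, using an RMSProp-style diagonal preconditioner (as in pSGLD) as one illustrative choice; the step size, decay rate, and toy Gaussian target are assumptions, and the curvature-correction term is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

def psgld_step(theta, stoch_grad, v, lr=1e-3, beta=0.99, eps=1e-5):
    """One SGLD step with an RMSProp-style diagonal preconditioner (pSGLD-like)."""
    v = beta * v + (1 - beta) * stoch_grad**2            # running 2nd-moment estimate
    precond = 1.0 / (np.sqrt(v) + eps)                    # diagonal preconditioner
    noise = rng.normal(size=theta.shape) * np.sqrt(lr * precond)
    theta = theta - 0.5 * lr * precond * stoch_grad + noise
    return theta, v

# Usage sketch: a 2-D Gaussian target whose coordinate scales differ by 100x,
# the setting where unpreconditioned SGLD mixes poorly.
target_precision = np.array([100.0, 0.01])                # posterior stds: 0.1 and 10
theta, v = np.array([1.0, 1.0]), np.zeros(2)
samples = []
for step in range(50_000):
    grad = target_precision * theta                       # gradient of -log density
    theta, v = psgld_step(theta, grad, v)
    if step >= 25_000:                                     # discard burn-in
        samples.append(theta.copy())
print(np.std(samples, axis=0))                            # approaches [0.1, 10]
```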
Human-Machine Inference Networks For Smart Decision Making: Opportunities and Challenges
Title | Human-Machine Inference Networks For Smart Decision Making: Opportunities and Challenges |
Authors | Aditya Vempaty, Bhavya Kailkhura, Pramod K. Varshney |
Abstract | The emerging paradigm of Human-Machine Inference Networks (HuMaINs) combines complementary cognitive strengths of humans and machines in an intelligent manner to tackle various inference tasks and achieves higher performance than either humans or machines by themselves. While inference performance optimization techniques for human-only or sensor-only networks are quite mature, HuMaINs require novel signal processing and machine learning solutions. In this paper, we present an overview of the HuMaINs architecture with a focus on three main issues that include architecture design, inference algorithms including security/privacy challenges, and application areas/use cases. |
Tasks | Decision Making |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09626v1 |
http://arxiv.org/pdf/1801.09626v1.pdf | |
PWC | https://paperswithcode.com/paper/human-machine-inference-networks-for-smart |
Repo | |
Framework | |
Surgical Phase Recognition of Short Video Shots Based on Temporal Modeling of Deep Features
Title | Surgical Phase Recognition of Short Video Shots Based on Temporal Modeling of Deep Features |
Authors | Constantinos Loukas |
Abstract | Recognizing the phases of a laparoscopic surgery (LS) operation from its video constitutes a fundamental step for efficient content representation, indexing and retrieval in surgical video databases. In the literature, most techniques focus on phase segmentation of the entire LS video using hand-crafted visual features, instrument usage signals, and recently convolutional neural networks (CNNs). In this paper we address the problem of phase recognition of short video shots (10s) of the operation, without utilizing information about the preceding/forthcoming video frames, their phase labels or the instruments used. We investigate four state-of-the-art CNN architectures (Alexnet, VGG19, GoogleNet, and ResNet101) for feature extraction via transfer learning. Visual saliency was employed for selecting the most informative region of the image as input to the CNN. Video shot representation was based on two temporal pooling mechanisms. Most importantly, we investigate the role of ‘elapsed time’ (from the beginning of the operation), and we show that inclusion of this feature can increase performance dramatically (69% vs. 75% mean accuracy). Finally, a long short-term memory (LSTM) network was trained for video shot classification based on the fusion of CNN features with ‘elapsed time’, increasing the accuracy to 86%. Our results highlight the prominent role of visual saliency, long-range temporal recursion and ‘elapsed time’ (a feature so far ignored), for surgical phase recognition. |
Tasks | Transfer Learning |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07853v4 |
http://arxiv.org/pdf/1807.07853v4.pdf | |
PWC | https://paperswithcode.com/paper/surgical-phase-recognition-of-short-video |
Repo | |
Framework | |
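A PyTorch sketch of the fusion step highlighted in the abstract: an LSTM consumes per-frame CNN features concatenated with the 'elapsed time' scalar and classifies the shot. The feature dimension, hidden size, number of phases, and frame rate are assumptions; saliency-based cropping and the specific CNN backbones are not shown.

```python
import torch
import torch.nn as nn

class ShotPhaseClassifier(nn.Module):
    """LSTM over per-frame CNN features fused with the elapsed-time scalar."""

    def __init__(self, cnn_feat_dim=2048, hidden=128, num_phases=7):
        super().__init__()
        self.lstm = nn.LSTM(cnn_feat_dim + 1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_phases)

    def forward(self, frame_features, elapsed_time):
        # frame_features: (B, T, cnn_feat_dim); elapsed_time: (B,), e.g. in minutes.
        t = elapsed_time[:, None, None].expand(-1, frame_features.shape[1], 1)
        x = torch.cat([frame_features, t], dim=-1)
        _, (h_n, _) = self.lstm(x)               # final hidden state summarizes the shot
        return self.out(h_n[-1])                 # phase logits

# Usage sketch: a 10-second shot sampled at 2.5 fps -> 25 frames of CNN features.
model = ShotPhaseClassifier()
feats = torch.randn(4, 25, 2048)
elapsed = torch.tensor([5.0, 30.0, 62.0, 110.0])
print(model(feats, elapsed).shape)               # torch.Size([4, 7])
```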
Optimal Sparse Singular Value Decomposition for High-dimensional High-order Data
Title | Optimal Sparse Singular Value Decomposition for High-dimensional High-order Data |
Authors | Anru Zhang, Rungang Han |
Abstract | In this article, we consider the sparse tensor singular value decomposition, which aims for dimension reduction on high-dimensional high-order data with certain sparsity structure. A method named sparse tensor alternating thresholding for singular value decomposition (STAT-SVD) is proposed. The proposed procedure features a novel double projection & thresholding scheme, which provides a sharp criterion for thresholding in each iteration. Compared with the regular tensor SVD model, STAT-SVD permits more robust estimation under weaker assumptions. Both the upper and lower bounds for estimation accuracy are developed. The proposed procedure is shown to be minimax rate-optimal in a general class of situations. Simulation studies show that STAT-SVD performs well under a variety of configurations. We also illustrate the merits of the proposed procedure on a longitudinal tensor dataset on European country mortality rates. |
Tasks | Dimensionality Reduction |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.01796v1 |
http://arxiv.org/pdf/1809.01796v1.pdf | |
PWC | https://paperswithcode.com/paper/optimal-sparse-singular-value-decomposition |
Repo | |
Framework | |
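A NumPy illustration of alternating thresholded power iterations for a sparse rank-1 SVD in the matrix (order-2) case; STAT-SVD's double projection & thresholding scheme for higher-order tensors and its data-driven thresholds are not reproduced here, and the fixed threshold and SVD warm start are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_svd_rank1(A, threshold, iters=50):
    """Leading sparse singular vectors via alternating hard-thresholded power steps.

    Matrix (order-2) illustration only, not the tensor STAT-SVD procedure.
    """
    u = np.linalg.svd(A)[0][:, 0]                # warm start from the plain SVD
    for _ in range(iters):
        v = A.T @ u
        v[np.abs(v) < threshold] = 0.0           # hard-threshold -> sparse right vector
        v /= np.linalg.norm(v) + 1e-12
        u = A @ v
        u[np.abs(u) < threshold] = 0.0           # hard-threshold -> sparse left vector
        u /= np.linalg.norm(u) + 1e-12
    return u, v

# Usage sketch: a sparse rank-1 signal buried in noise.
u0 = np.zeros(100); u0[:5] = 1 / np.sqrt(5)
v0 = np.zeros(80);  v0[:4] = 1 / np.sqrt(4)
A = 5.0 * np.outer(u0, v0) + 0.1 * rng.normal(size=(100, 80))
u, v = sparse_svd_rank1(A, threshold=0.5)
print(np.count_nonzero(u), np.count_nonzero(v))  # expected: the true supports, 5 and 4
```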
Multi-Band Covariance Interpolation with Applications in Massive MIMO
Title | Multi-Band Covariance Interpolation with Applications in Massive MIMO |
Authors | Saeid Haghighatshoar, Mahdi Barzegar Khalilsarai, Giuseppe Caire |
Abstract | In this paper, we study the problem of multi-band (frequency-variant) covariance interpolation with a particular emphasis on massive MIMO applications. In a massive MIMO system, the communication between each BS with $M \gg 1$ antennas and each single-antenna user occurs through a collection of scatterers in the environment, where the channel vector of each user at the BS antennas consists of a weighted linear combination of the array responses of the scatterers, where each scatterer has its own angle of arrival (AoA) and complex channel gain. The array response at a given AoA depends on the wavelength of the incoming planar wave and is naturally frequency dependent. This results in a frequency-dependent distortion where the second-order statistics, i.e., the covariance matrix, of the channel vectors vary with frequency. In this paper, we show that although this effect is generally negligible for a small number of antennas $M$, it results in a considerable distortion of the covariance matrix and especially its dominant signal subspace in the massive MIMO regime where $M \to \infty$, and can generally incur a serious degradation of performance, especially in frequency division duplexing (FDD) massive MIMO systems where the uplink (UL) and the downlink (DL) communication occur over different frequency bands. We propose a novel UL-DL covariance interpolation technique that is able to recover the covariance matrix in the DL from an estimate of the covariance matrix in the UL under a mild reciprocity condition on the angular power spread function (PSF) of the users. We analyze the performance of our proposed scheme mathematically and prove its robustness under a sufficiently large spatial oversampling of the array. We also propose several simple off-the-shelf algorithms for UL-DL covariance interpolation and evaluate their performance via numerical simulations. |
Tasks | |
Published | 2018-01-11 |
URL | http://arxiv.org/abs/1801.03714v1 |
http://arxiv.org/pdf/1801.03714v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-band-covariance-interpolation-with |
Repo | |
Framework | |
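A NumPy sketch of the frequency dependence the abstract describes: the array response of a uniform linear array depends on the carrier wavelength, so the spatial covariance induced by the same angular power profile differs between the UL and DL bands. The array size, band frequencies, antenna spacing, and Gaussian angular profile are assumptions.

```python
import numpy as np

def ula_response(theta_rad, freq_hz, num_antennas, spacing_m):
    """Array response of a uniform linear array at the given carrier frequency."""
    c = 3e8
    k = 2 * np.pi * freq_hz / c                          # wavenumber at this carrier
    n = np.arange(num_antennas)
    return np.exp(1j * k * spacing_m * n[:, None] * np.sin(theta_rad)[None, :])

def spatial_covariance(psf, thetas, freq_hz, num_antennas, spacing_m):
    A = ula_response(thetas, freq_hz, num_antennas, spacing_m)   # (M, n_angles)
    return (A * psf[None, :]) @ A.conj().T                        # sum_theta p(theta) a a^H

M = 64
spacing = 0.5 * 3e8 / 2.0e9                        # half-wavelength spacing at ~2 GHz
thetas = np.linspace(-np.pi / 3, np.pi / 3, 361)
psf = np.exp(-0.5 * ((thetas - 0.3) / 0.05) ** 2)  # narrow scattering cluster
psf /= psf.sum()

R_ul = spatial_covariance(psf, thetas, 1.9e9, M, spacing)
R_dl = spatial_covariance(psf, thetas, 2.1e9, M, spacing)
gap = np.linalg.norm(R_ul - R_dl) / np.linalg.norm(R_dl)
print(f"relative UL-DL covariance mismatch for M={M}: {gap:.2f}")
# Repeating with M = 4 gives a much smaller mismatch, matching the abstract's point
# that the distortion matters mainly in the massive MIMO regime.
```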
Safe Active Feature Selection for Sparse Learning
Title | Safe Active Feature Selection for Sparse Learning |
Authors | Shaogang Ren, Jianhua Z. Huang, Shuai Huang, Xiaoning Qian |
Abstract | We present safe active incremental feature selection (SAIF) to scale up the computation of LASSO solutions. SAIF does not require a solution from a heavier penalty parameter as in sequential screening or updating the full model for each iteration as in dynamic screening. Different from these existing screening methods, SAIF starts from a small number of features, incrementally recruits active features, and updates the significantly reduced model. Hence, it is much more computationally efficient and scalable with the number of features. More critically, SAIF is safe in the sense that it is guaranteed to converge to the optimal solution of the original full LASSO problem. Such an incremental procedure and theoretical convergence guarantee can be extended to fused LASSO problems. Compared with state-of-the-art screening methods as well as working set and homotopy methods, which may not always guarantee the optimal solution, SAIF can achieve superior or comparable efficiency and high scalability with the safe guarantee when facing extremely high dimensional data sets. Experiments with both synthetic and real-world data sets show that SAIF can be up to 50 times faster than dynamic screening, and hundreds of times faster than computing LASSO or fused LASSO solutions without screening. |
Tasks | Feature Selection, Sparse Learning |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.05817v2 |
http://arxiv.org/pdf/1806.05817v2.pdf | |
PWC | https://paperswithcode.com/paper/safe-active-feature-selection-for-sparse |
Repo | |
Framework | |
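A sketch of incremental feature recruitment in the spirit of SAIF, using scikit-learn's Lasso on the active subset and recruiting the features whose KKT condition is most violated; it omits SAIF's feature-deletion step and safe-screening rules, and the penalty, batch size, and synthetic data are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

def incremental_lasso(X, y, alpha, batch=10, max_rounds=50):
    """LASSO via incremental feature recruitment (sketch only).

    Starts from an empty active set, solves the reduced problem, and recruits
    the features whose KKT condition |x_j^T r| / n <= alpha is most violated.
    """
    n, d = X.shape
    active = np.array([], dtype=int)
    coef = np.zeros(d)
    for _ in range(max_rounds):
        residual = y - X[:, active] @ coef[active] if active.size else y.copy()
        scores = np.abs(X.T @ residual) / n
        scores[active] = 0.0
        violators = np.where(scores > alpha)[0]
        if violators.size == 0:                 # KKT holds: optimal for the full problem
            break
        new = violators[np.argsort(scores[violators])[-batch:]]
        active = np.concatenate([active, new])
        sub = Lasso(alpha=alpha, fit_intercept=False).fit(X[:, active], y)
        coef[:] = 0.0
        coef[active] = sub.coef_
    return coef

# Usage sketch on synthetic sparse data.
X = rng.normal(size=(200, 2000))
true = np.zeros(2000); true[:10] = 1.0
y = X @ true + 0.1 * rng.normal(size=200)
coef = incremental_lasso(X, y, alpha=0.1)
print(np.count_nonzero(coef), "nonzero features in the recovered model")
```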
Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance
Title | Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance |
Authors | Giulia Luise, Alessandro Rudi, Massimiliano Pontil, Carlo Ciliberto |
Abstract | Applications of optimal transport have recently gained remarkable attention thanks to the computational advantages of entropic regularization. However, in most situations the Sinkhorn approximation of the Wasserstein distance is replaced by a regularized version that is less accurate but easy to differentiate. In this work we characterize the differential properties of the original Sinkhorn distance, proving that it enjoys the same smoothness as its regularized version and we explicitly provide an efficient algorithm to compute its gradient. We show that this result benefits both theory and applications: on one hand, high order smoothness confers statistical guarantees to learning with Wasserstein approximations. On the other hand, the gradient formula allows us to efficiently solve learning and optimization problems in practice. Promising preliminary experiments complement our analysis. |
Tasks | |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11897v1 |
http://arxiv.org/pdf/1805.11897v1.pdf | |
PWC | https://paperswithcode.com/paper/differential-properties-of-sinkhorn |
Repo | |
Framework | |
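A NumPy sketch of the object the paper studies: standard Sinkhorn iterations produce the entropic-optimal plan, from which the "sharp" Sinkhorn cost <P, C> and the regularized objective are evaluated. This is only the forward computation under assumed grid, cost, and regularization values; the paper's gradient algorithm is not shown.

```python
import numpy as np

def sinkhorn_plan(a, b, C, reg, n_iter=1000):
    """Entropic-OT transport plan between histograms a and b for cost matrix C."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)          # alternately match the two marginals
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Usage sketch: two histograms on a 1-D grid.
x = np.linspace(0, 1, 50)
C = (x[:, None] - x[None, :]) ** 2                # squared-distance cost
a = np.exp(-((x - 0.2) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.02); b /= b.sum()

P = sinkhorn_plan(a, b, C, reg=0.05)
sharp = np.sum(P * C)                             # "sharp" Sinkhorn cost <P, C>
regularized = sharp + 0.05 * np.sum(P * (np.log(P + 1e-300) - 1))
print(f"sharp: {sharp:.4f}  regularized: {regularized:.4f}")
```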