January 30, 2020


Paper Group ANR 426

Self-Paced Video Data Augmentation with Dynamic Images Generated by Generative Adversarial Networks. RBED: Reward Based Epsilon Decay. The Source-Target Domain Mismatch Problem in Machine Translation. Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture. A Deep Generative Model of Speech Complex Spectrograms. Multimodal …

Self-Paced Video Data Augmentation with Dynamic Images Generated by Generative Adversarial Networks


Title	Self-Paced Video Data Augmentation with Dynamic Images Generated by Generative Adversarial Networks
Authors	Yumeng Zhang, Gaoguo Jia, Li Chen, Mingrui Zhang, Junhai Yong
Abstract	There is an urgent need for an effective video classification method that works with a small number of samples. The shortage of samples could be effectively alleviated by generating samples through Generative Adversarial Networks (GAN), but generating videos of a typical category remains underexplored because complex actions and changing viewpoints are difficult to simulate. In this paper, we propose a generative data augmentation method for the temporal stream of Temporal Segment Networks using the dynamic image. The dynamic image compresses the motion information of a video into a still image, removing interference factors such as the background. It is thus easier to generate images with categorical motion information using a GAN. We use the generated dynamic images to enhance the features, which also provides regularization, thereby achieving the effect of video augmentation. To deal with the uneven quality of generated images, we propose a Self-Paced Selection (SPS) method, which automatically selects the high-quality generated samples to be added to the network training. Our method is verified on two benchmark datasets, HMDB51 and UCF101. The experimental results show that the method can improve the accuracy of video classification when samples are insufficient and imbalanced.
Tasks	Data Augmentation, Video Classification
Published	2019-09-16
URL	https://arxiv.org/abs/1909.12929v1
PDF	https://arxiv.org/pdf/1909.12929v1.pdf
PWC	https://paperswithcode.com/paper/self-paced-video-data-augmentation-with
Repo
Framework
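
The abstract does not spell out how SPS decides which generated samples are "high-quality". The sketch below shows the generic self-paced learning rule, in which a sample is admitted when its current loss falls below a threshold that is relaxed as training proceeds. Treating low loss as a proxy for quality, and the names `self_paced_select` and `grow_threshold`, are assumptions for illustration and not the authors' exact method.

```python
def self_paced_select(losses, threshold):
    """Return indices of generated samples whose loss is below the threshold.

    Assumption: a low loss under the current classifier is used as a proxy
    for sample quality, as in generic self-paced learning.
    """
    return [i for i, loss in enumerate(losses) if loss < threshold]


def grow_threshold(threshold, factor=1.1):
    """Relax the threshold so harder samples are admitted in later rounds."""
    return threshold * factor


# Hypothetical per-sample losses for three generated dynamic images.
losses = [0.2, 1.5, 0.7]
early = self_paced_select(losses, 1.0)                       # easy samples only
later = self_paced_select(losses, grow_threshold(1.0, 2.0))  # threshold is now 2.0
```

Starting with a strict threshold keeps poor generations out of early training; raising it gradually lets more of the generated set in once the classifier is more reliable.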

RBED: Reward Based Epsilon Decay


Title	RBED: Reward Based Epsilon Decay
Authors	Aakash Maroti
Abstract	$\varepsilon$-greedy is a policy used to balance exploration and exploitation in many reinforcement learning settings. In cases where the agent uses some on-policy algorithm to learn optimal behaviour, it makes sense for the agent to explore more initially and eventually exploit more as it approaches the target behaviour. This shift from heavy exploration to heavy exploitation can be represented as decay in the $\varepsilon$ value, where $\varepsilon$ determines how much an agent is allowed to explore. This paper proposes a new approach to this $\varepsilon$ decay where the decay is based on feedback from the environment. It also compares one such approach, based on rewards, against standard exponential decay. The new approach, in the environments tested, produces more consistent results that on average perform better.
Tasks
Published	2019-10-30
URL	https://arxiv.org/abs/1910.13701v1
PDF	https://arxiv.org/pdf/1910.13701v1.pdf
PWC	https://paperswithcode.com/paper/rbed-reward-based-epsilon-decay
Repo
Framework
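
To make the contrast with exponential decay concrete, here is a minimal sketch of a reward-driven schedule. It assumes one plausible rule: $\varepsilon$ is lowered only when an episode's reward reaches a threshold, after which the threshold is raised. The class name, the fixed step size, and the threshold increment are illustrative assumptions and not details taken from the paper.

```python
import random


def exponential_decay(epsilon, rate=0.99, min_epsilon=0.01):
    """Standard baseline: shrink epsilon every episode regardless of reward."""
    return max(min_epsilon, epsilon * rate)


class RewardBasedEpsilon:
    """Decay epsilon only when the agent earns it (assumed rule, see lead-in)."""

    def __init__(self, epsilon=1.0, min_epsilon=0.01, decay_step=0.1,
                 reward_threshold=10.0, reward_increment=10.0):
        self.epsilon = epsilon
        self.min_epsilon = min_epsilon
        self.decay_step = decay_step
        self.reward_threshold = reward_threshold
        self.reward_increment = reward_increment

    def update(self, episode_reward):
        """Lower epsilon and raise the bar if the reward threshold was met."""
        if (self.epsilon > self.min_epsilon
                and episode_reward >= self.reward_threshold):
            self.epsilon = max(self.min_epsilon, self.epsilon - self.decay_step)
            self.reward_threshold += self.reward_increment
        return self.epsilon


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


schedule = RewardBasedEpsilon(decay_step=0.25)
history = [schedule.update(r) for r in (5.0, 12.0, 12.0)]
```

With this rule an agent that is not yet improving keeps exploring, whereas exponential decay would reduce exploration on a fixed timetable.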

The Source-Target Domain Mismatch Problem in Machine Translation


Title	The Source-Target Domain Mismatch Problem in Machine Translation
Authors	Jiajun Shen, Peng-Jen Chen, Matt Le, Junxian He, Jiatao Gu, Myle Ott, Michael Auli, Marc’Aurelio Ranzato
Abstract	While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures and many events we experience in our everyday life pertain only to the specific place we live in. As a result, people often talk about different things in different parts of the world. In this work we study the effect of local context in machine translation and postulate that, particularly in low-resource settings, this causes the domains of the source and target language to greatly mismatch, as the two languages are often spoken in regions of the world that are further apart, with more distinctive cultural traits and unrelated local events. We first propose a controlled setting to carefully analyze the source-target domain mismatch, and its dependence on the amount of parallel and monolingual data. Second, we test both a model trained with back-translation and one trained with self-training. The latter leverages in-domain source monolingual data but uses potentially incorrect target references. We found that these two approaches are often complementary to each other. For instance, on a low-resource Nepali-English dataset the combined approach improves upon the baseline using just parallel data by 2.5 BLEU points, and by 0.6 BLEU points when compared to back-translation.
Tasks	Machine Translation
Published	2019-09-28
URL	https://arxiv.org/abs/1909.13151v1
PDF	https://arxiv.org/pdf/1909.13151v1.pdf
PWC	https://paperswithcode.com/paper/the-source-target-domain-mismatch-problem-in
Repo
Framework

Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture


Title	Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture
Authors	Ning Yu, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, Michal Lukac
Abstract	This paper addresses the problem of interpolating visual textures. We formulate this problem by requiring (1) by-example controllability and (2) realistic and smooth interpolation among an arbitrary number of texture samples. To solve it we propose a neural network trained simultaneously on a reconstruction task and a generation task, which can project texture examples onto a latent space where they can be linearly interpolated and projected back onto the image domain, thus ensuring both intuitive control and realistic results. We show our method outperforms a number of baselines according to a comprehensive suite of metrics as well as a user study. We further show several applications based on our technique, which include texture brush, texture dissolve, and animal hybridization.
Tasks
Published	2019-01-11
URL	http://arxiv.org/abs/1901.03447v2
PDF	http://arxiv.org/pdf/1901.03447v2.pdf
PWC	https://paperswithcode.com/paper/texture-mixer-a-network-for-controllable
Repo
Framework
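
The key operation described in the abstract is linear interpolation in a latent space among an arbitrary number of texture samples. The sketch below shows only that step as a weighted convex combination of latent vectors. The encoder and decoder networks are not reproduced, and the function name and the flat-list representation of a latent code are assumptions for illustration.

```python
def interpolate_latents(latents, weights):
    """Blend latent codes with non-negative weights normalised to sum to 1.

    latents: list of equal-length lists of floats (hypothetical latent codes
             that an encoder would produce for each texture example).
    weights: one non-negative weight per latent code.
    """
    if not latents or len(latents) != len(weights):
        raise ValueError("need one weight per latent code")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalised = [w / total for w in weights]
    dim = len(latents[0])
    return [sum(w * z[i] for w, z in zip(normalised, latents))
            for i in range(dim)]


# Equal blend of two hypothetical 2-D latent codes.
blend = interpolate_latents([[0.0, 2.0], [4.0, 6.0]], [1, 1])
```

In a full pipeline the blended code would be passed to the decoder to produce the interpolated texture; varying the weights spatially is one way a "texture brush" or "texture dissolve" effect could be driven.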

A Deep Generative Model of Speech Complex Spectrograms


Title	A Deep Generative Model of Speech Complex Spectrograms
Authors	Aditya Arie Nugraha, Kouhei Sekiguchi, Kazuyoshi Yoshii
Abstract	This paper proposes an approach to the joint modeling of the short-time Fourier transform magnitude and phase spectrograms with a deep generative model. We assume that the magnitude follows a Gaussian distribution and the phase follows a von Mises distribution. To improve the consistency of the phase values in the time-frequency domain, we also apply the von Mises distribution to the phase derivatives, i.e., the group delay and the instantaneous frequency. Based on these assumptions, we explore and compare several combinations of loss functions for training our models. Built upon the variational autoencoder framework, our model consists of three convolutional neural networks acting as an encoder, a magnitude decoder, and a phase decoder. In addition to the latent variables, we propose to also condition the phase estimation on the estimated magnitude. Evaluated for a time-domain speech reconstruction task, our models could generate speech with a high perceptual quality and a high intelligibility.
Tasks
Published	2019-03-08
URL	http://arxiv.org/abs/1903.03269v1
PDF	http://arxiv.org/pdf/1903.03269v1.pdf
PWC	https://paperswithcode.com/paper/a-deep-generative-model-of-speech-complex
Repo
Framework
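
Because the phase is an angle, a loss on it has to be periodic, which is what the von Mises assumption provides. The sketch below evaluates the negative log-likelihood of a single phase value under a von Mises distribution with mean `mean` and concentration `kappa`. The function names are illustrative, the Bessel function is computed from its power series to keep the example dependency-free, and the paper's actual combination of loss terms is not reproduced.

```python
import math


def bessel_i0(kappa, terms=50):
    """Modified Bessel function of the first kind, order 0 (power series).

    The fixed number of terms is adequate for small to moderate kappa.
    """
    total = 1.0
    term = 1.0
    for m in range(1, terms):
        term *= (kappa / 2.0) ** 2 / (m * m)
        total += term
    return total


def von_mises_nll(phase, mean, kappa):
    """Negative log-likelihood of an angle under a von Mises distribution.

    The density is exp(kappa * cos(phase - mean)) / (2 * pi * I0(kappa)).
    """
    return (math.log(2.0 * math.pi * bessel_i0(kappa))
            - kappa * math.cos(phase - mean))


at_mean = von_mises_nll(0.3, 0.3, 2.0)
opposite = von_mises_nll(0.3 + math.pi, 0.3, 2.0)
```

The same function can be applied to phase differences along the frequency or time axis, which is how a loss on the group delay or the instantaneous frequency could be formed under the same distributional assumption.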

Multimodal Deep Network Embedding with Integrated Structure and Attribute Information


Title	Multimodal Deep Network Embedding with Integrated Structure and Attribute Information
Authors	Conghui Zheng, Li Pan, Peng Wu
Abstract	Network embedding is the process of learning low-dimensional representations for nodes in a network, while preserving node features. Existing studies only leverage network structure information and focus on preserving structural features. However, nodes in real-world networks often have a rich set of attributes providing extra semantic information. It has been demonstrated that both structural and attribute features are important for network analysis tasks. To preserve both features, we investigate the problem of integrating structure and attribute information to perform network embedding and propose a Multimodal Deep Network Embedding (MDNE) method. MDNE captures the non-linear network structures and the complex interactions among structures and attributes, using a deep model consisting of multiple layers of non-linear functions. Since structures and attributes are two different types of information, a multimodal learning method is adopted to pre-process them and help the model to better capture the correlations between node structure and attribute information. We employ both structural proximity and attribute proximity in the loss function to preserve the respective features, and the representations are obtained by minimizing the loss function. Results of extensive experiments on four real-world datasets show that the proposed method performs significantly better than baselines on a variety of tasks, which demonstrates the effectiveness and generality of our method.
Tasks	Network Embedding
Published	2019-03-28
URL	http://arxiv.org/abs/1903.12019v1
PDF	http://arxiv.org/pdf/1903.12019v1.pdf
PWC	https://paperswithcode.com/paper/multimodal-deep-network-embedding-with
Repo
Framework

Automatic Image Co-Segmentation: A Survey


Title	Automatic Image Co-Segmentation: A Survey
Authors	Xiabi Liu, Xin Duan
Abstract	Image co-segmentation is important because it alleviates the ill-posed nature of image segmentation by exploring the correlation between related images. Many automatic image co-segmentation algorithms have been developed in the last decade, and they are investigated comprehensively in this paper. We first analyze visual/semantic cues for guiding image co-segmentation, including object cues and correlation cues. Then we describe the traditional methods in three categories: those based on object elements, those based on object regions/contours, and those based on a common object model. In the next part, deep learning based methods are reviewed. Furthermore, widely used test datasets and evaluation criteria are introduced and the reported performances of the surveyed algorithms are compared with each other. Finally, we discuss the current challenges and possible future directions and conclude the paper. Hopefully, this comprehensive investigation will be helpful for the development of image co-segmentation techniques.
Tasks	Semantic Segmentation
Published	2019-11-18
URL	https://arxiv.org/abs/1911.07685v1
PDF	https://arxiv.org/pdf/1911.07685v1.pdf
PWC	https://paperswithcode.com/paper/automatic-image-co-segmentation-a-survey
Repo
Framework

Evolving ab initio trading strategies in heterogeneous environments


Title	Evolving ab initio trading strategies in heterogeneous environments
Authors	David Rushing Dewhurst, Yi Li, Alexander Bogdan, Jasmine Geng
Abstract	Securities markets are quintessential complex adaptive systems in which heterogeneous agents compete in an attempt to maximize returns. Species of trading agents are also subject to evolutionary pressure as entire classes of strategies become obsolete and new classes emerge. Using an agent-based model of interacting heterogeneous agents as a flexible environment that can endogenously model many diverse market conditions, we subject deep neural networks to evolutionary pressure to create dominant trading agents. After analyzing the performance of these agents and noting the emergence of anomalous superdiffusion through the evolutionary process, we construct a method to turn high-fitness agents into trading algorithms. We backtest these trading algorithms on real high-frequency foreign exchange data, demonstrating that elite trading algorithms are consistently profitable in a variety of market conditions, even though these algorithms had never before been exposed to real financial data. These results provide evidence to suggest that developing ab initio trading strategies by repeated simulation and evolution in a mechanistic market model may be a practical alternative to explicitly training models with past observed market data.
Tasks
Published	2019-12-19
URL	https://arxiv.org/abs/1912.09524v1
PDF	https://arxiv.org/pdf/1912.09524v1.pdf
PWC	https://paperswithcode.com/paper/evolving-ab-initio-trading-strategies-in
Repo
Framework

Efficient Inner Product Approximation in Hybrid Spaces


Title	Efficient Inner Product Approximation in Hybrid Spaces
Authors	Xiang Wu, Ruiqi Guo, David Simcha, Dave Dopson, Sanjiv Kumar
Abstract	Many emerging use cases of data mining and machine learning operate on large datasets with data from heterogeneous sources, specifically with both sparse and dense components. For example, dense deep neural network embedding vectors are often used in conjunction with sparse textual features to provide a high-dimensional hybrid representation of documents. Efficient search in such hybrid spaces is very challenging as the techniques that perform well for sparse vectors have little overlap with those that work well for dense vectors. Popular techniques like Locality Sensitive Hashing (LSH) and its data-dependent variants also do not give good accuracy in high-dimensional hybrid spaces. Even though hybrid scenarios are becoming more prevalent, there currently exist no techniques in the literature that are both fast and accurate. In this paper, we propose a technique that approximates the inner product computation in hybrid vectors, leading to substantial speedup in search while maintaining high accuracy. We also propose efficient data structures that exploit modern computer architectures, resulting in orders of magnitude faster search than the existing baselines. The performance of the proposed method is demonstrated on several datasets including a very large scale industrial dataset containing one billion vectors in a billion-dimensional space, achieving over 10x speedup and higher accuracy against competitive baselines.
Tasks	Network Embedding
Published	2019-03-20
URL	http://arxiv.org/abs/1903.08690v1
PDF	http://arxiv.org/pdf/1903.08690v1.pdf
PWC	https://paperswithcode.com/paper/efficient-inner-product-approximation-in
Repo
Framework
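
For orientation, the quantity being approximated is the inner product of two hybrid vectors, which splits exactly into a sparse part and a dense part. The sketch below computes that exact value. It is the reference a fast approximation would be measured against, not the paper's approximation or its data structures, and the dict-plus-list representation is an assumption for illustration.

```python
def hybrid_inner_product(q_sparse, q_dense, x_sparse, x_dense):
    """Exact inner product of two hybrid vectors.

    Sparse components are dicts mapping dimension index to a non-zero value;
    dense components are equal-length lists of floats.
    """
    if len(q_dense) != len(x_dense):
        raise ValueError("dense components must have the same length")
    # Iterate over the smaller sparse map so the cost follows the sparser side.
    small, large = ((q_sparse, x_sparse)
                    if len(q_sparse) <= len(x_sparse)
                    else (x_sparse, q_sparse))
    sparse_part = sum(v * large[i] for i, v in small.items() if i in large)
    dense_part = sum(a * b for a, b in zip(q_dense, x_dense))
    return sparse_part + dense_part


# Hypothetical query and document: sparse text features plus a dense embedding.
score = hybrid_inner_product({3: 2.0, 10: 1.0}, [1.0, 2.0],
                             {3: 0.5, 7: 4.0}, [3.0, 4.0])
```

The two parts have very different cost profiles: the sparse sum touches only shared non-zero dimensions, while the dense sum touches every dimension. This is why techniques tuned for one kind of vector transfer poorly to the other.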

Multi-Hot Compact Network Embedding


Title	Multi-Hot Compact Network Embedding
Authors	Chaozhuo Li, Senzhang Wang, Philip S. Yu, Zhoujun Li
Abstract	Network embedding, as a promising approach to network representation learning, is capable of supporting various subsequent network mining and analysis tasks, and has attracted growing research interest recently. Traditional approaches assign each node an independent continuous vector, which causes huge memory overhead for large networks. In this paper we propose a novel multi-hot compact embedding strategy to effectively reduce memory cost by learning partially shared embeddings. The insight is that a node embedding vector is composed of several basis vectors, which can significantly reduce the number of continuous vectors while maintaining similar data representation ability. Specifically, we propose an MCNE model to learn compact embeddings from pre-learned node features. A novel component named compressor is integrated into MCNE to tackle the challenge that popular back-propagation optimization cannot propagate through discrete samples. We further propose an end-to-end model MCNE$_{t}$ to learn compact embeddings from the input network directly. Empirically, we evaluate the proposed models over three real network datasets, and the results demonstrate that our proposals can save about 90% of the memory cost of network embeddings without significant performance decline.
Tasks	Network Embedding, Representation Learning
Published	2019-03-07
URL	https://arxiv.org/abs/1903.03213v2
PDF	https://arxiv.org/pdf/1903.03213v2.pdf
PWC	https://paperswithcode.com/paper/multi-hot-compact-network-embedding
Repo
Framework
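
The memory saving comes from storing a small shared set of basis vectors plus a few indices per node, instead of one full vector per node. The sketch below illustrates that idea, assuming a node's embedding is the sum of its selected basis vectors. The summation rule and all names are assumptions for illustration, and the learned compressor that chooses the indices is not reproduced.

```python
def compose_embedding(basis, hot_indices):
    """Build a node embedding by summing the selected shared basis vectors."""
    dim = len(basis[0])
    return [sum(basis[k][i] for k in hot_indices) for i in range(dim)]


def independent_float_count(num_nodes, dim):
    """Floats stored when every node has its own continuous vector."""
    return num_nodes * dim


def compact_float_count(num_basis, dim):
    """Floats stored when nodes share a basis (per-node indices are integers)."""
    return num_basis * dim


# Hypothetical basis of three 2-D vectors; the node selects entries 0 and 2.
basis = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
node_embedding = compose_embedding(basis, [0, 2])
```

The per-node cost in the compact scheme is a handful of integer indices, so the float storage no longer grows with the number of nodes.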

Representation Learning for Recommender Systems with Application to the Scientific Literature


Title	Representation Learning for Recommender Systems with Application to the Scientific Literature
Authors	Robin Brochier
Abstract	The scientific literature is a large information network linking various actors (laboratories, companies, institutions, etc.). The vast amount of data generated by this network constitutes a dynamic heterogeneous attributed network (HAN), in which new information is constantly produced and from which it is increasingly difficult to extract content of interest. In this article, I present my first thesis work, carried out in partnership with an industrial company, Digital Scientific Research Technology. The latter offers a scientific watch tool, Peerus, addressing various issues, such as the real-time recommendation of newly published papers or the search for active experts to start new collaborations. To tackle this diversity of applications, a common approach consists in learning representations of the nodes and attributes of this HAN and using them as features for a variety of recommendation tasks. However, most works on attributed network embedding pay too little attention to textual attributes and do not fully take advantage of recent natural language processing techniques. Moreover, proposed methods that jointly learn node and document representations do not provide a way to effectively infer representations for new documents for which network information is missing, which happens to be crucial in real-time recommender systems. Finally, the interplay between textual and graph data in text-attributed heterogeneous networks remains an open research direction.
Tasks	Network Embedding, Recommendation Systems, Representation Learning
Published	2019-02-28
URL	http://arxiv.org/abs/1902.11058v1
PDF	http://arxiv.org/pdf/1902.11058v1.pdf
PWC	https://paperswithcode.com/paper/representation-learning-for-recommender
Repo
Framework

Bio-Inspired Foveated Technique for Augmented-Range Vehicle Detection Using Deep Neural Networks


Title	Bio-Inspired Foveated Technique for Augmented-Range Vehicle Detection Using Deep Neural Networks
Authors	Pedro Azevedo, Sabrina S. Panceri, Rânik Guidolini, Vinicius B. Cardoso, Claudine Badue, Thiago Oliveira-Santos, Alberto F. De Souza
Abstract	We propose a bio-inspired foveated technique to detect cars in a long-range camera view using a deep convolutional neural network (DCNN) for the IARA self-driving car. The DCNN receives as input (i) an image, which is captured by a camera installed on IARA’s roof; and (ii) crops of the image, which are centered on the waypoints computed by IARA’s path planner and whose sizes increase with the distance from IARA. We employ an overlap filter to discard detections of the same car in different crops of the same image based on the percentage of overlap of the detections’ bounding boxes. We evaluated the performance of the proposed augmented-range vehicle detection system (ARVDS) using the hardware and software infrastructure available in the IARA self-driving car. Using IARA, we captured thousands of images of real traffic situations containing cars at long range. Experimental results show that ARVDS increases the Average Precision (AP) of long-range car detection from 29.51% (using a single whole image) to 63.15%.
Tasks
Published	2019-10-02
URL	https://arxiv.org/abs/1910.00944v1
PDF	https://arxiv.org/pdf/1910.00944v1.pdf
PWC	https://paperswithcode.com/paper/bio-inspired-foveated-technique-for-augmented
Repo
Framework
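
The overlap filter mentioned in the abstract can be sketched as follows. Detections are assumed to be tuples `(x1, y1, x2, y2, score)` already mapped into whole-image coordinates. "Percentage of overlap" is interpreted here as intersection area divided by the area of the smaller box, and the higher-scoring detection is kept. Both choices, the threshold value, and the function names are assumptions and not details confirmed by the source.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2, ...)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def overlap_ratio(a, b):
    """Intersection area divided by the smaller box's area (assumed metric)."""
    width = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    height = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min(box_area(a), box_area(b))
    return (width * height) / smaller if smaller > 0 else 0.0


def filter_overlaps(detections, threshold=0.5):
    """Keep the best-scoring detection among boxes that overlap too much."""
    kept = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        if all(overlap_ratio(det, k) < threshold for k in kept):
            kept.append(det)
    return kept


# The second box lies inside the first, as when one car is detected in two crops.
detections = [(0, 0, 10, 10, 0.9), (1, 1, 9, 9, 0.8), (20, 20, 30, 30, 0.7)]
unique_cars = filter_overlaps(detections)
```

Normalising by the smaller box instead of the union makes the filter catch a small box nested inside a large one, which is a likely pattern when the same car appears in crops of different sizes.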

Underexposed Image Correction via Hybrid Priors Navigated Deep Propagation


Title	Underexposed Image Correction via Hybrid Priors Navigated Deep Propagation
Authors	Risheng Liu, Long Ma, Yuxi Zhang, Xin Fan, Zhongxuan Luo
Abstract	Enhancing the visual quality of underexposed images is a widely studied task that plays an important role in various areas of multimedia and computer vision. Most existing methods often fail to generate high-quality results with appropriate luminance and abundant details. To address these issues, in this work we develop a novel framework that integrates both knowledge from physical principles and implicit distributions from data to solve the underexposed image correction task. More concretely, we propose a new perspective that formulates this task as an energy-inspired model with advanced hybrid priors. A propagation procedure navigated by the hybrid priors is designed to simultaneously propagate the reflectance and illumination toward the desired results. We conduct extensive experiments to verify the necessity of integrating both underlying principles (i.e., with knowledge) and distributions (i.e., from data) as navigated deep propagation. Extensive experimental results on underexposed image correction demonstrate that our proposed method performs favorably against the state-of-the-art methods on both subjective and objective assessments. Additionally, we perform face detection to further verify the naturalness and practical value of underexposed image correction. Moreover, we apply our method to single image haze removal, where the experimental results further demonstrate its advantages.
Tasks	Face Detection, Single Image Haze Removal
Published	2019-07-17
URL	https://arxiv.org/abs/1907.07408v1
PDF	https://arxiv.org/pdf/1907.07408v1.pdf
PWC	https://paperswithcode.com/paper/underexposed-image-correction-via-hybrid
Repo
Framework

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization


Title	Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
Authors	Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang
Abstract	Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, their poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization—stronger-than-typical $\ell_2$ regularization or early stopping—we achieve substantially higher worst-group accuracies, with 10-40 percentage point improvements on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is critical for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. Finally, we introduce and give convergence guarantees for a stochastic optimizer for the group DRO setting, underpinning the empirical study above.
Tasks	Natural Language Inference
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08731v1
PDF	https://arxiv.org/pdf/1911.08731v1.pdf
PWC	https://paperswithcode.com/paper/distributionally-robust-neural-networks-for
Repo
Framework
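
The group DRO objective described in the abstract replaces the average training loss with the worst per-group average loss, and the paper's finding is that this needs to be paired with stronger regularization. The sketch below computes that objective with an optional $\ell_2$ penalty. It shows only how the objective is evaluated. The stochastic optimizer from the paper is not reproduced, and the function names and the flat list of weights are assumptions for illustration.

```python
from collections import defaultdict


def group_losses(losses, groups):
    """Average the per-example losses within each pre-defined group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, group in zip(losses, groups):
        sums[group] += loss
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


def group_dro_objective(losses, groups, weights=(), l2_strength=0.0):
    """Worst-group average loss plus an L2 penalty on the model weights."""
    worst = max(group_losses(losses, groups).values())
    penalty = l2_strength * sum(w * w for w in weights)
    return worst + penalty


# Hypothetical per-example losses: group 1 is the atypical, harder group.
losses = [0.25, 0.75, 1.0, 2.0]
groups = [0, 0, 1, 1]
average_loss = sum(losses) / len(losses)
worst_group_loss = group_dro_objective(losses, groups)
regularized = group_dro_objective(losses, groups,
                                  weights=[1.0, 2.0], l2_strength=0.5)
```

In this toy example the average loss hides the gap between groups, while the worst-group objective is driven entirely by the harder group.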

Color inference from semantic labeling for person search in videos


Title	Color inference from semantic labeling for person search in videos
Authors	Jules Simon, Guillaume-Alexandre Bilodeau, David Steele, Harshad Mahadik
Abstract	We propose an explainable model to generate semantic color labels for person search. In this context, persons are described by their semantic parts, such as hat, shirt, etc. Person search consists of looking for people based on these descriptions. In this work, we aim to improve the accuracy of color labels for people. Our goal is to handle the high variability of human perception. Existing solutions are based on hand-crafted features or learnt features that are not explainable. Moreover, most of them only focus on a limited set of colors. We propose a method based on binary search trees and a large peer-labelled color name dataset. This allows us to synthesize the human perception of colors. Using semantic segmentation and our color labeling method, we label segments of pedestrians with their associated colors. We evaluate our solution on person search on datasets such as PCN, and show a precision as high as 80.4%.
Tasks	Person Search, Semantic Segmentation
Published	2019-11-29
URL	https://arxiv.org/abs/1911.13114v1
PDF	https://arxiv.org/pdf/1911.13114v1.pdf
PWC	https://paperswithcode.com/paper/color-inference-from-semantic-labeling-for
Repo
Framework