Paper Group ANR 426
Self-Paced Video Data Augmentation with Dynamic Images Generated by Generative Adversarial Networks. RBED: Reward Based Epsilon Decay. The Source-Target Domain Mismatch Problem in Machine Translation. Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture. A Deep Generative Model of Speech Complex Spectrograms. Multimodal …
Self-Paced Video Data Augmentation with Dynamic Images Generated by Generative Adversarial Networks
Title | Self-Paced Video Data Augmentation with Dynamic Images Generated by Generative Adversarial Networks |
Authors | Yumeng Zhang, Gaoguo Jia, Li Chen, Mingrui Zhang, Junhai Yong |
Abstract | There is an urgent need for effective video classification from a small number of samples. Sample scarcity can be alleviated by generating samples with Generative Adversarial Networks (GANs), but generating videos for a specific category remains underexplored because complex actions and changing viewpoints are difficult to simulate. In this paper, we propose a generative data augmentation method for the temporal stream of Temporal Segment Networks based on the dynamic image. The dynamic image compresses the motion information of a video into a still image and removes interference factors such as the background, which makes it easier to generate images with categorical motion information using a GAN. We use the generated dynamic images to enhance the features, which also acts as regularization, and thereby achieve the effect of video augmentation. To deal with the uneven quality of generated images, we propose a Self-Paced Selection (SPS) method, which automatically selects high-quality generated samples to add to network training. Our method is verified on two benchmark datasets, HMDB51 and UCF101. The experimental results show that the method improves video classification accuracy under sample insufficiency and sample imbalance. |
Tasks | Data Augmentation, Video Classification |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.12929v1 |
PDF | https://arxiv.org/pdf/1909.12929v1.pdf |
PWC | https://paperswithcode.com/paper/self-paced-video-data-augmentation-with |
Repo | |
Framework | |
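The abstract does not state how Self-Paced Selection scores generated samples. The following is a minimal sketch that assumes the usual self-paced learning scheme: a sample is admitted when its loss under the current classifier falls below a threshold, and the threshold grows each round. The function name, the loss values, and the `lam` and `growth` parameters are all hypothetical.

```python
import numpy as np


def self_paced_select(losses, lam):
    """Return indices of generated samples whose loss is below the pace threshold lam."""
    losses = np.asarray(losses, dtype=float)
    return np.flatnonzero(losses < lam)


# Hypothetical per-sample losses of generated dynamic images under the current classifier.
generated_losses = [0.2, 1.5, 0.7, 3.0, 0.4]

lam = 0.5      # assumed starting threshold: only the easiest samples pass
growth = 2.0   # assumed multiplicative growth of the threshold per round
for round_idx in range(3):
    chosen = self_paced_select(generated_losses, lam)
    print(f"round {round_idx}: lam={lam:.1f}, selected={chosen.tolist()}")
    lam *= growth
```

Raising the threshold gradually lets low-loss (presumably higher-quality) samples enter training first, while harder samples are admitted only in later rounds.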
RBED: Reward Based Epsilon Decay
Title | RBED: Reward Based Epsilon Decay |
Authors | Aakash Maroti |
Abstract | $\varepsilon$-greedy is a policy used to balance exploration and exploitation in many reinforcement learning settings. In cases where the agent uses some on-policy algorithm to learn optimal behaviour, it makes sense for the agent to explore more initially and eventually exploit more as it approaches the target behaviour. This shift from heavy exploration to heavy exploitation can be represented as a decay in the $\varepsilon$ value, where $\varepsilon$ determines how much an agent is allowed to explore. This paper proposes a new approach to $\varepsilon$ decay in which the decay is based on feedback from the environment. It presents one such approach based on rewards and compares it against standard exponential decay. The new approach, in the environments tested, produces more consistent results that on average perform better. |
Tasks | |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13701v1 |
PDF | https://arxiv.org/pdf/1910.13701v1.pdf |
PWC | https://paperswithcode.com/paper/rbed-reward-based-epsilon-decay |
Repo | |
Framework | |
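To make the contrast in the abstract concrete, here is a minimal sketch of the two schedules. The abstract only says the decay is driven by rewards from the environment. The specific rule used below (lower $\varepsilon$ by a fixed step when an episode's reward reaches a threshold, then raise the threshold) is an assumption, as are all names and default values.

```python
def exponential_decay(epsilon, rate=0.99, min_epsilon=0.01):
    """Standard schedule: shrink epsilon by a fixed factor every episode."""
    return max(min_epsilon, epsilon * rate)


class RewardBasedDecay:
    """Assumed rule: decay epsilon only when the agent reaches a reward threshold."""

    def __init__(self, epsilon=1.0, min_epsilon=0.01, step=0.1,
                 reward_threshold=10.0, threshold_increment=10.0):
        self.epsilon = epsilon
        self.min_epsilon = min_epsilon
        self.step = step
        self.reward_threshold = reward_threshold
        self.threshold_increment = threshold_increment

    def update(self, episode_reward):
        """Call once per episode; returns the epsilon to use for the next episode."""
        if episode_reward >= self.reward_threshold and self.epsilon > self.min_epsilon:
            # The agent met the current target: explore less and raise the bar.
            self.epsilon = max(self.min_epsilon, self.epsilon - self.step)
            self.reward_threshold += self.threshold_increment
        return self.epsilon


schedule = RewardBasedDecay()
for reward in [5, 12, 8, 25, 30]:  # hypothetical episode rewards
    print(reward, round(schedule.update(reward), 2))
```

Under this rule, exploration shrinks only when the agent demonstrates progress, whereas the exponential schedule shrinks it on every episode regardless of performance.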
The Source-Target Domain Mismatch Problem in Machine Translation
Title | The Source-Target Domain Mismatch Problem in Machine Translation |
Authors | Jiajun Shen, Peng-Jen Chen, Matt Le, Junxian He, Jiatao Gu, Myle Ott, Michael Auli, Marc’Aurelio Ranzato |
Abstract | While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures, and many events we experience in our everyday life pertain only to the specific place we live in. As a result, people often talk about different things in different parts of the world. In this work we study the effect of local context in machine translation and postulate that, particularly in low-resource settings, this causes the domains of the source and target languages to mismatch greatly, as the two languages are often spoken in regions of the world that are further apart, with more distinctive cultural traits and unrelated local events. We first propose a controlled setting to carefully analyze the source-target domain mismatch and its dependence on the amount of parallel and monolingual data. Second, we test both a model trained with back-translation and one trained with self-training. The latter leverages in-domain source monolingual data but uses potentially incorrect target references. We found that these two approaches are often complementary to each other. For instance, on a low-resource Nepali-English dataset the combined approach improves upon the baseline using just parallel data by 2.5 BLEU points, and by 0.6 BLEU points when compared to back-translation. |
Tasks | Machine Translation |
Published | 2019-09-28 |
URL | https://arxiv.org/abs/1909.13151v1 |
PDF | https://arxiv.org/pdf/1909.13151v1.pdf |
PWC | https://paperswithcode.com/paper/the-source-target-domain-mismatch-problem-in |
Repo | |
Framework | |
Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture
Title | Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture |
Authors | Ning Yu, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, Michal Lukac |
Abstract | This paper addresses the problem of interpolating visual textures. We formulate this problem by requiring (1) by-example controllability and (2) realistic and smooth interpolation among an arbitrary number of texture samples. To solve it we propose a neural network trained simultaneously on a reconstruction task and a generation task, which can project texture examples onto a latent space where they can be linearly interpolated and projected back onto the image domain, thus ensuring both intuitive control and realistic results. We show our method outperforms a number of baselines according to a comprehensive suite of metrics as well as a user study. We further show several applications based on our technique, which include texture brush, texture dissolve, and animal hybridization. |
Tasks | |
Published | 2019-01-11 |
URL | http://arxiv.org/abs/1901.03447v2 |
PDF | http://arxiv.org/pdf/1901.03447v2.pdf |
PWC | https://paperswithcode.com/paper/texture-mixer-a-network-for-controllable |
Repo | |
Framework | |
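The abstract describes projecting texture examples onto a latent space, interpolating linearly, and projecting back. The sketch below illustrates only the middle step for an arbitrary number of samples. The encoder and decoder networks are not shown, and the function name and the weight normalisation are assumptions made for illustration.

```python
import numpy as np


def interpolate_latents(latents, weights):
    """Blend latent codes with non-negative weights that are normalised to sum to 1."""
    latents = np.asarray(latents, dtype=float)   # shape (n_textures, latent_dim)
    weights = np.asarray(weights, dtype=float)   # shape (n_textures,)
    if weights.ndim != 1 or weights.shape[0] != latents.shape[0]:
        raise ValueError("need exactly one weight per latent code")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()
    return weights @ latents


# Hypothetical latent codes of three texture examples (as if produced by an encoder).
codes = [[0.0, 0.0], [2.0, 4.0], [4.0, 2.0]]
print(interpolate_latents(codes, [1, 1, 0]))   # halfway between the first two textures
print(interpolate_latents(codes, [1, 1, 1]))   # equal blend of all three
```

Because the blend is a convex combination, the result stays inside the region spanned by the example codes, which is one plausible reading of the "by-example controllability" requirement.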
A Deep Generative Model of Speech Complex Spectrograms
Title | A Deep Generative Model of Speech Complex Spectrograms |
Authors | Aditya Arie Nugraha, Kouhei Sekiguchi, Kazuyoshi Yoshii |
Abstract | This paper proposes an approach to the joint modeling of the short-time Fourier transform magnitude and phase spectrograms with a deep generative model. We assume that the magnitude follows a Gaussian distribution and the phase follows a von Mises distribution. To improve the consistency of the phase values in the time-frequency domain, we also apply the von Mises distribution to the phase derivatives, i.e., the group delay and the instantaneous frequency. Based on these assumptions, we explore and compare several combinations of loss functions for training our models. Built upon the variational autoencoder framework, our model consists of three convolutional neural networks acting as an encoder, a magnitude decoder, and a phase decoder. In addition to the latent variables, we propose to also condition the phase estimation on the estimated magnitude. Evaluated for a time-domain speech reconstruction task, our models could generate speech with a high perceptual quality and a high intelligibility. |
Tasks | |
Published | 2019-03-08 |
URL | http://arxiv.org/abs/1903.03269v1 |
PDF | http://arxiv.org/pdf/1903.03269v1.pdf |
PWC | https://paperswithcode.com/paper/a-deep-generative-model-of-speech-complex |
Repo | |
Framework | |
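Since the abstract assumes the phase follows a von Mises distribution, a training loss for the phase decoder can be built from the von Mises negative log-likelihood. The sketch below evaluates that quantity with NumPy. How the paper parameterises the concentration `kappa`, and how it combines this term with the magnitude and phase-derivative terms, is not stated in the abstract, so `kappa` is simply treated as an input here.

```python
import numpy as np


def von_mises_nll(phase, mu, kappa):
    """Element-wise negative log-likelihood of phase under von Mises(mu, kappa).

    The density is exp(kappa * cos(phase - mu)) / (2 * pi * I0(kappa)),
    where I0 is the modified Bessel function of the first kind, order 0.
    """
    phase = np.atleast_1d(np.asarray(phase, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    kappa = np.atleast_1d(np.asarray(kappa, dtype=float))
    if np.any(kappa < 0):
        raise ValueError("kappa must be non-negative")
    # np.i0 overflows for very large kappa; a scaled Bessel function would be
    # needed in that regime.
    log_norm = np.log(2.0 * np.pi * np.i0(kappa))
    return log_norm - kappa * np.cos(phase - mu)


target_phase = np.array([0.0, np.pi / 2, np.pi])
predicted_mean = np.array([0.0, 0.0, -np.pi])
print(von_mises_nll(target_phase, predicted_mean, kappa=2.0))
```

The cosine makes the loss periodic, so a target of $\pi$ and a prediction of $-\pi$ are treated as a perfect match, which a squared error on raw phase values would not do.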
Multimodal Deep Network Embedding with Integrated Structure and Attribute Information
Title | Multimodal Deep Network Embedding with Integrated Structure and Attribute Information |
Authors | Conghui Zheng, Li Pan, Peng Wu |
Abstract | Network embedding is the process of learning low-dimensional representations for nodes in a network while preserving node features. Existing studies only leverage network structure information and focus on preserving structural features. However, nodes in real-world networks often have a rich set of attributes providing extra semantic information. It has been demonstrated that both structural and attribute features are important for network analysis tasks. To preserve both features, we investigate the problem of integrating structure and attribute information to perform network embedding and propose a Multimodal Deep Network Embedding (MDNE) method. MDNE captures the non-linear network structures and the complex interactions among structures and attributes, using a deep model consisting of multiple layers of non-linear functions. Since structures and attributes are two different types of information, a multimodal learning method is adopted to pre-process them and help the model better capture the correlations between node structure and attribute information. We employ both structural proximity and attribute proximity in the loss function to preserve the respective features, and the representations are obtained by minimizing the loss function. Results of extensive experiments on four real-world datasets show that the proposed method performs significantly better than baselines on a variety of tasks, which demonstrates the effectiveness and generality of our method. |
Tasks | Network Embedding |
Published | 2019-03-28 |
URL | http://arxiv.org/abs/1903.12019v1 |
PDF | http://arxiv.org/pdf/1903.12019v1.pdf |
PWC | https://paperswithcode.com/paper/multimodal-deep-network-embedding-with |
Repo | |
Framework | |
Automatic Image Co-Segmentation: A Survey
Title | Automatic Image Co-Segmentation: A Survey |
Authors | Xiabi Liu, Xin Duan |
Abstract | Image co-segmentation is important because it alleviates the ill-posed nature of image segmentation by exploring the correlation between related images. Many automatic image co-segmentation algorithms have been developed in the last decade, and they are investigated comprehensively in this paper. We first analyze visual/semantic cues for guiding image co-segmentation, including object cues and correlation cues. Then we describe the traditional methods in three categories: object-element-based, object-region/contour-based, and common-object-model-based. In the next part, deep-learning-based methods are reviewed. Furthermore, widely used test datasets and evaluation criteria are introduced, and the reported performances of the surveyed algorithms are compared. Finally, we discuss the current challenges and possible future directions and conclude the paper. We hope this comprehensive investigation will be helpful for the development of image co-segmentation techniques. |
Tasks | Semantic Segmentation |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07685v1 |
PDF | https://arxiv.org/pdf/1911.07685v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-image-co-segmentation-a-survey |
Repo | |
Framework | |
Evolving ab initio trading strategies in heterogeneous environments
Title | Evolving ab initio trading strategies in heterogeneous environments |
Authors | David Rushing Dewhurst, Yi Li, Alexander Bogdan, Jasmine Geng |
Abstract | Securities markets are quintessential complex adaptive systems in which heterogeneous agents compete in an attempt to maximize returns. Species of trading agents are also subject to evolutionary pressure as entire classes of strategies become obsolete and new classes emerge. Using an agent-based model of interacting heterogeneous agents as a flexible environment that can endogenously model many diverse market conditions, we subject deep neural networks to evolutionary pressure to create dominant trading agents. After analyzing the performance of these agents and noting the emergence of anomalous superdiffusion through the evolutionary process, we construct a method to turn high-fitness agents into trading algorithms. We backtest these trading algorithms on real high-frequency foreign exchange data, demonstrating that elite trading algorithms are consistently profitable in a variety of market conditions, even though these algorithms had never before been exposed to real financial data. These results provide evidence to suggest that developing ab initio trading strategies by repeated simulation and evolution in a mechanistic market model may be a practical alternative to explicitly training models with past observed market data. |
Tasks | |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09524v1 |
PDF | https://arxiv.org/pdf/1912.09524v1.pdf |
PWC | https://paperswithcode.com/paper/evolving-ab-initio-trading-strategies-in |
Repo | |
Framework | |
Efficient Inner Product Approximation in Hybrid Spaces
Title | Efficient Inner Product Approximation in Hybrid Spaces |
Authors | Xiang Wu, Ruiqi Guo, David Simcha, Dave Dopson, Sanjiv Kumar |
Abstract | Many emerging use cases of data mining and machine learning operate on large datasets with data from heterogeneous sources, specifically with both sparse and dense components. For example, dense deep neural network embedding vectors are often used in conjunction with sparse textual features to provide a high-dimensional hybrid representation of documents. Efficient search in such hybrid spaces is very challenging, as the techniques that perform well for sparse vectors have little overlap with those that work well for dense vectors. Popular techniques like Locality Sensitive Hashing (LSH) and its data-dependent variants also do not give good accuracy in high-dimensional hybrid spaces. Even though hybrid scenarios are becoming more prevalent, there are currently no techniques in the literature that are both fast and accurate. In this paper, we propose a technique that approximates the inner product computation in hybrid vectors, leading to substantial speedup in search while maintaining high accuracy. We also propose efficient data structures that exploit modern computer architectures, resulting in orders of magnitude faster search than the existing baselines. The performance of the proposed method is demonstrated on several datasets, including a very large-scale industrial dataset containing one billion vectors in a billion-dimensional space, achieving over 10x speedup and higher accuracy against competitive baselines. |
Tasks | Network Embedding |
Published | 2019-03-20 |
URL | http://arxiv.org/abs/1903.08690v1 |
PDF | http://arxiv.org/pdf/1903.08690v1.pdf |
PWC | https://paperswithcode.com/paper/efficient-inner-product-approximation-in |
Repo | |
Framework | |
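For orientation, the quantity being approximated is the inner product of two hybrid vectors, which splits into a sparse part and a dense part. The sketch below computes it exactly. It is not the paper's approximation technique or its data structures, and the dictionary representation of the sparse component and all names are assumptions.

```python
import numpy as np


def sparse_dot(a, b):
    """Exact dot product of two sparse vectors stored as {dimension_index: value}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the vector with fewer non-zeros
    return sum(value * b[index] for index, value in a.items() if index in b)


def hybrid_inner_product(q_sparse, q_dense, x_sparse, x_dense):
    """Inner product of hybrid vectors: sparse contribution plus dense contribution."""
    return sparse_dot(q_sparse, x_sparse) + float(np.dot(q_dense, x_dense))


# Hypothetical query and document: sparse textual features plus a dense embedding.
query_sparse = {3: 2.0, 999_999_999: 1.0}
doc_sparse = {3: 0.5, 7: 4.0}
query_dense = np.array([1.0, 2.0])
doc_dense = np.array([3.0, 4.0])

print(hybrid_inner_product(query_sparse, query_dense, doc_sparse, doc_dense))
```

The sparse part costs time proportional to the number of non-zeros, while the dense part costs time proportional to the embedding dimension, which hints at why methods tuned for one component do not transfer directly to the other.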
Multi-Hot Compact Network Embedding
Title | Multi-Hot Compact Network Embedding |
Authors | Chaozhuo Li, Senzhang Wang, Philip S. Yu, Zhoujun Li |
Abstract | Network embedding, a promising approach to network representation learning, supports various downstream network mining and analysis tasks and has attracted growing research interest recently. Traditional approaches assign each node an independent continuous vector, which causes huge memory overhead for large networks. In this paper we propose a novel multi-hot compact embedding strategy to effectively reduce memory cost by learning partially shared embeddings. The insight is that a node embedding vector is composed of several basis vectors, which can significantly reduce the number of continuous vectors while maintaining similar data representation ability. Specifically, we propose an MCNE model to learn compact embeddings from pre-learned node features. A novel component named the compressor is integrated into MCNE to tackle the challenge that popular back-propagation optimization cannot propagate through discrete samples. We further propose an end-to-end model, MCNE$_{t}$, to learn compact embeddings from the input network directly. Empirically, we evaluate the proposed models on three real network datasets, and the results demonstrate that our proposals can save about 90% of the memory cost of network embeddings without significant performance decline. |
Tasks | Network Embedding, Representation Learning |
Published | 2019-03-07 |
URL | https://arxiv.org/abs/1903.03213v2 |
PDF | https://arxiv.org/pdf/1903.03213v2.pdf |
PWC | https://paperswithcode.com/paper/multi-hot-compact-network-embedding |
Repo | |
Framework | |
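The memory argument in the abstract can be illustrated with a small sketch in which each node stores a few integer indices into a shared table of basis vectors instead of its own dense vector. Composing the selected basis vectors by summation, the table sizes, and all names are assumptions. The learned compressor and the training procedure are not shown, and the byte counts below are for this toy configuration only.

```python
import numpy as np

rng = np.random.default_rng(0)

num_nodes, dim = 10_000, 128   # hypothetical network size and embedding width
num_basis, hot = 256, 4        # hypothetical shared basis size and indices per node

# Shared continuous basis vectors and the discrete multi-hot code of each node.
basis = rng.standard_normal((num_basis, dim)).astype(np.float32)
codes = rng.integers(0, num_basis, size=(num_nodes, hot), dtype=np.int16)


def node_embedding(node_id):
    """Assumed composition rule: sum the basis vectors selected by the node's code."""
    return basis[codes[node_id]].sum(axis=0)


dense_bytes = num_nodes * dim * 4            # one float32 vector per node
compact_bytes = basis.nbytes + codes.nbytes  # shared basis plus integer codes

print(node_embedding(0).shape)
print(f"dense: {dense_bytes} bytes, compact: {compact_bytes} bytes")
```

The saving comes from replacing one continuous vector per node with a handful of small integers per node, at the price of restricting embeddings to combinations of the shared basis.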
Representation Learning for Recommender Systems with Application to the Scientific Literature
Title | Representation Learning for Recommender Systems with Application to the Scientific Literature |
Authors | Robin Brochier |
Abstract | The scientific literature is a large information network linking various actors (laboratories, companies, institutions, etc.). The vast amount of data generated by this network constitutes a dynamic heterogeneous attributed network (HAN), in which new information is constantly produced and from which it is increasingly difficult to extract content of interest. In this article, I present my first thesis work, carried out in partnership with an industrial company, Digital Scientific Research Technology. The latter offers a scientific watch tool, Peerus, addressing various issues, such as the real-time recommendation of newly published papers or the search for active experts to start new collaborations. To tackle this diversity of applications, a common approach consists in learning representations of the nodes and attributes of this HAN and using them as features for a variety of recommendation tasks. However, most works on attributed network embedding pay too little attention to textual attributes and do not fully take advantage of recent natural language processing techniques. Moreover, proposed methods that jointly learn node and document representations do not provide a way to effectively infer representations for new documents for which network information is missing, which is crucial in real-time recommender systems. Finally, the interplay between textual and graph data in text-attributed heterogeneous networks remains an open research direction. |
Tasks | Network Embedding, Recommendation Systems, Representation Learning |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.11058v1 |
PDF | http://arxiv.org/pdf/1902.11058v1.pdf |
PWC | https://paperswithcode.com/paper/representation-learning-for-recommender |
Repo | |
Framework | |
Bio-Inspired Foveated Technique for Augmented-Range Vehicle Detection Using Deep Neural Networks
Title | Bio-Inspired Foveated Technique for Augmented-Range Vehicle Detection Using Deep Neural Networks |
Authors | Pedro Azevedo, Sabrina S. Panceri, Rânik Guidolini, Vinicius B. Cardoso, Claudine Badue, Thiago Oliveira-Santos, Alberto F. De Souza |
Abstract | We propose a bio-inspired foveated technique to detect cars in a long-range camera view using a deep convolutional neural network (DCNN) for the IARA self-driving car. The DCNN receives as input (i) an image, which is captured by a camera installed on IARA’s roof; and (ii) crops of the image, which are centered on the waypoints computed by IARA’s path planner and whose sizes increase with the distance from IARA. We employ an overlap filter to discard detections of the same car in different crops of the same image based on the percentage of overlap of the detections’ bounding boxes. We evaluated the performance of the proposed augmented-range vehicle detection system (ARVDS) using the hardware and software infrastructure available in the IARA self-driving car. Using IARA, we captured thousands of images of real traffic situations containing cars at long range. Experimental results show that ARVDS increases the Average Precision (AP) of long-range car detection from 29.51% (using a single whole image) to 63.15%. |
Tasks | |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00944v1 |
PDF | https://arxiv.org/pdf/1910.00944v1.pdf |
PWC | https://paperswithcode.com/paper/bio-inspired-foveated-technique-for-augmented |
Repo | |
Framework | |
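The overlap filter mentioned in the abstract can be sketched as a greedy pass over detections that have already been mapped back to full-image coordinates. The abstract does not define the overlap percentage or the tie-breaking rule, so the use of intersection over union, the 0.5 threshold, and the preference for higher-scoring detections are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlap_filter(detections, threshold=0.5):
    """Keep the highest-scoring detection among boxes that overlap more than threshold.

    detections: list of (box, score) pairs in full-image coordinates.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections: the first two are the same car seen in two crops.
detections = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]
print(overlap_filter(detections))
```

This is the same idea as standard non-maximum suppression, applied across crops rather than within a single detector output.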
Underexposed Image Correction via Hybrid Priors Navigated Deep Propagation
Title | Underexposed Image Correction via Hybrid Priors Navigated Deep Propagation |
Authors | Risheng Liu, Long Ma, Yuxi Zhang, Xin Fan, Zhongxuan Luo |
Abstract | Enhancing the visual quality of underexposed images is a widely studied task that plays an important role in various areas of multimedia and computer vision. Most existing methods often fail to generate high-quality results with appropriate luminance and abundant details. To address these issues, we develop a novel framework that integrates both knowledge from physical principles and implicit distributions from data to solve the underexposed image correction task. More concretely, we propose a new perspective that formulates this task as an energy-inspired model with advanced hybrid priors. A propagation procedure navigated by the hybrid priors is designed to simultaneously propagate the reflectance and illumination toward the desired results. We conduct extensive experiments to verify the necessity of integrating both underlying principles (i.e., with knowledge) and distributions (i.e., from data) as navigated deep propagation. Extensive experimental results on underexposed image correction demonstrate that our proposed method performs favorably against state-of-the-art methods on both subjective and objective assessments. Additionally, we run a face detection task to further verify the naturalness and practical value of underexposed image correction. Moreover, we apply our method to single image haze removal, where the experimental results further demonstrate its advantages. |
Tasks | Face Detection, Single Image Haze Removal |
Published | 2019-07-17 |
URL | https://arxiv.org/abs/1907.07408v1 |
PDF | https://arxiv.org/pdf/1907.07408v1.pdf |
PWC | https://paperswithcode.com/paper/underexposed-image-correction-via-hybrid |
Repo | |
Framework | |
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
Title | Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization |
Authors | Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang |
Abstract | Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, their poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization—stronger-than-typical $\ell_2$ regularization or early stopping—we achieve substantially higher worst-group accuracies, with 10-40 percentage point improvements on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is critical for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. Finally, we introduce and give convergence guarantees for a stochastic optimizer for the group DRO setting, underpinning the empirical study above. |
Tasks | Natural Language Inference |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08731v1 |
PDF | https://arxiv.org/pdf/1911.08731v1.pdf |
PWC | https://paperswithcode.com/paper/distributionally-robust-neural-networks-for |
Repo | |
Framework | |
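The group DRO objective described in the abstract, the worst-case loss over pre-defined groups, can be contrasted with the average loss in a few lines. The sketch below only evaluates the two quantities on given per-example losses. It does not implement the paper's stochastic optimizer or its regularization, and the function names and toy numbers are hypothetical.

```python
import numpy as np


def group_losses(losses, groups, num_groups):
    """Mean loss per group; groups holds an integer group id for each example."""
    losses = np.asarray(losses, dtype=float)
    groups = np.asarray(groups, dtype=int)
    sums = np.bincount(groups, weights=losses, minlength=num_groups)
    counts = np.bincount(groups, minlength=num_groups)
    # Groups with no examples are reported as 0 instead of dividing by zero.
    return np.divide(sums, counts, out=np.zeros(num_groups), where=counts > 0)


def worst_group_loss(losses, groups, num_groups):
    """Group DRO objective: the largest of the per-group mean losses."""
    return group_losses(losses, groups, num_groups).max()


# Hypothetical per-example losses: group 1 is a small, atypical group.
losses = [0.1, 0.2, 0.1, 0.2, 2.0, 1.0]
groups = [0, 0, 0, 0, 1, 1]

print("average loss:    ", np.mean(losses))
print("worst-group loss:", worst_group_loss(losses, groups, num_groups=2))
```

In this toy example the average is dominated by the large, easy group, while the worst-group loss exposes the poor fit on the atypical group, which is the gap the abstract is concerned with.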
Color inference from semantic labeling for person search in videos
Title | Color inference from semantic labeling for person search in videos |
Authors | Jules Simon, Guillaume-Alexandre Bilodeau, David Steele, Harshad Mahadik |
Abstract | We propose an explainable model to generate semantic color labels for person search. In this context, persons are described from their semantic parts, such as hat, shirt, etc. Person search consists in looking for people based on these descriptions. In this work, we aim to improve the accuracy of color labels for people. Our goal is to handle the high variability of human perception. Existing solutions are based on hand-crafted features or learnt features that are not explainable. Moreover most of them only focus on a limited set of colors. We propose a method based on binary search trees and a large peer-labelled color name dataset. This allows us to synthesize the human perception of colors. Using semantic segmentation and our color labeling method, we label segments of pedestrians with their associated colors. We evaluate our solution on person search on datasets such as PCN, and show a precision as high as 80.4%. |
Tasks | Person Search, Semantic Segmentation |
Published | 2019-11-29 |
URL | https://arxiv.org/abs/1911.13114v1 |
PDF | https://arxiv.org/pdf/1911.13114v1.pdf |
PWC | https://paperswithcode.com/paper/color-inference-from-semantic-labeling-for |
Repo | |
Framework | |