October 20, 2019


Paper Group AWR 247


Joint Monocular 3D Vehicle Detection and Tracking

Title Joint Monocular 3D Vehicle Detection and Tracking
Authors Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu
Abstract Vehicle 3D extents and trajectories are critical cues for predicting the future location of vehicles and planning future agent ego-motion based on those predictions. In this paper, we propose a novel online framework for 3D vehicle detection and tracking from monocular videos. The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform. Our method leverages 3D box depth-ordering matching for robust instance association and utilizes 3D trajectory prediction for re-identification of occluded vehicles. We also design a motion learning module based on an LSTM for more accurate long-term motion extrapolation. Our experiments on simulation, KITTI, and Argoverse datasets show that our 3D tracking pipeline offers robust data association and tracking. On Argoverse, our image-based method is significantly better for tracking 3D vehicles within 30 meters than the LiDAR-centric baseline methods.
Tasks 3D Object Detection, 3D Pose Estimation, Autonomous Vehicles, Multiple Object Tracking, Object Tracking, Online Multi-Object Tracking, Pose Estimation, Trajectory Prediction
Published 2018-11-26
URL https://arxiv.org/abs/1811.10742v3
PDF https://arxiv.org/pdf/1811.10742v3.pdf
PWC https://paperswithcode.com/paper/joint-monocular-3d-vehicle-detection-and
Repo https://github.com/ucbdrive/3d-vehicle-tracking
Framework pytorch
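
The association step in this pipeline builds on 3D box depth-ordering matching and LSTM motion extrapolation; as a hedged illustration of the underlying track-to-detection assignment only (not the authors' depth-ordering matcher), the Python sketch below pairs predicted track centroids with new 3D detections via the Hungarian algorithm. The function names and the distance gate are illustrative.

    # Minimal sketch of frame-to-frame association between existing tracks and
    # new 3D detections, using centroid distance and the Hungarian algorithm.
    # The paper's actual pipeline adds depth-ordering matching and an LSTM for
    # motion extrapolation; this shows only the generic assignment step.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def associate_tracks(track_centers, det_centers, max_dist=5.0):
        """track_centers: (T, 3) predicted 3D positions of existing tracks.
        det_centers: (D, 3) 3D positions of new detections.
        Returns (track_idx, det_idx) matches within max_dist meters."""
        if len(track_centers) == 0 or len(det_centers) == 0:
            return []
        # Pairwise Euclidean distances between predicted tracks and detections.
        cost = np.linalg.norm(track_centers[:, None, :] - det_centers[None, :, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)
        return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]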

StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks

Title StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks
Authors Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo
Abstract This paper proposes a method that allows non-parallel many-to-many voice conversion (VC) by using a variant of a generative adversarial network (GAN) called StarGAN. Our method, which we call StarGAN-VC, is noteworthy in that it (1) requires no parallel utterances, transcriptions, or time alignment procedures for speech generator training, (2) simultaneously learns many-to-many mappings across different attribute domains using a single generator network, (3) is able to generate converted speech signals quickly enough to allow real-time implementations and (4) requires only several minutes of training examples to generate reasonably realistic-sounding speech. Subjective evaluation experiments on a non-parallel many-to-many speaker identity conversion task revealed that the proposed method obtained higher sound quality and speaker similarity than a state-of-the-art method based on variational autoencoding GANs.
Tasks Voice Conversion
Published 2018-06-06
URL http://arxiv.org/abs/1806.02169v2
PDF http://arxiv.org/pdf/1806.02169v2.pdf
PWC https://paperswithcode.com/paper/stargan-vc-non-parallel-many-to-many-voice
Repo https://github.com/bajibabu/CycleGAN-VC
Framework pytorch
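
The core StarGAN-VC idea of one generator serving many source-target speaker pairs can be illustrated by conditioning the generator on a target-speaker code. The PyTorch sketch below is a minimal, hedged stand-in: layer sizes, feature dimensions, and names are illustrative and do not reproduce the authors' architecture or adversarial training.

    # Hedged sketch: a single generator converts acoustic features to a target
    # speaker by concatenating a broadcast one-hot speaker code to the input.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConditionalGenerator(nn.Module):
        """Maps (acoustic features, target-speaker id) -> converted features."""
        def __init__(self, n_mels=36, n_speakers=4, hidden=64):
            super().__init__()
            self.n_speakers = n_speakers
            self.net = nn.Sequential(
                nn.Conv1d(n_mels + n_speakers, hidden, kernel_size=5, padding=2), nn.ReLU(),
                nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
                nn.Conv1d(hidden, n_mels, kernel_size=5, padding=2),
            )

        def forward(self, x, speaker_id):
            # x: (batch, n_mels, frames); broadcast the speaker code over time.
            code = F.one_hot(speaker_id, self.n_speakers).float()
            code = code[:, :, None].expand(-1, -1, x.size(-1))
            return self.net(torch.cat([x, code], dim=1))

    # Toy usage: convert a 2-utterance batch to speakers 1 and 3.
    g = ConditionalGenerator()
    converted = g(torch.randn(2, 36, 128), torch.tensor([1, 3]))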

Jensen: An Easily-Extensible C++ Toolkit for Production-Level Machine Learning and Convex Optimization

Title Jensen: An Easily-Extensible C++ Toolkit for Production-Level Machine Learning and Convex Optimization
Authors Rishabh Iyer, John T. Halloran, Kai Wei
Abstract This paper introduces Jensen, an easily extensible and scalable toolkit for production-level machine learning and convex optimization. Jensen implements a framework of convex (or loss) functions, convex optimization algorithms (including Gradient Descent, L-BFGS, Stochastic Gradient Descent, Conjugate Gradient, etc.), and a family of machine learning classifiers and regressors (Logistic Regression, SVMs, Least Squares Regression, etc.). This framework makes it possible to deploy and train models with a few lines of code, and also to extend and build upon it by integrating new loss functions and optimization algorithms.
Tasks
Published 2018-07-17
URL http://arxiv.org/abs/1807.06574v1
PDF http://arxiv.org/pdf/1807.06574v1.pdf
PWC https://paperswithcode.com/paper/jensen-an-easily-extensible-c-toolkit-for
Repo https://github.com/rishabhk108/jensen
Framework none
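
Jensen's C++ API is not reproduced here; as a hedged, language-agnostic illustration of the convex-loss-plus-optimizer decomposition the abstract describes, the Python sketch below plugs a logistic loss and its gradient into a generic gradient-descent routine. All names are illustrative.

    # Illustrative sketch of the loss-function / optimizer split such a toolkit
    # organizes (this is not the Jensen C++ API).
    import numpy as np

    def logistic_loss_grad(w, X, y, l2=1e-3):
        """L2-regularized binary logistic loss; labels y in {-1, +1}."""
        margins = y * (X @ w)
        loss = np.mean(np.log1p(np.exp(-margins))) + 0.5 * l2 * w @ w
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y) + l2 * w
        return loss, grad

    def gradient_descent(loss_grad, w0, lr=0.1, steps=200):
        w = w0.copy()
        for _ in range(steps):
            _, g = loss_grad(w)
            w -= lr * g
        return w

    # Usage: w = gradient_descent(lambda w: logistic_loss_grad(w, X, y), np.zeros(X.shape[1]))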

Facial Landmark Detection for Manga Images

Title Facial Landmark Detection for Manga Images
Authors Marco Stricker, Olivier Augereau, Koichi Kise, Motoi Iwata
Abstract The topic of facial landmark detection has been widely covered for pictures of human faces, but it is still a challenge for drawings. Indeed, the proportions and symmetry of standard human faces are not always respected in comics or manga. The personal style of the author, the limited use of colors, etc., make landmark detection on faces in drawings a difficult task. Detecting landmarks on manga images will be useful for providing new services such as easily editing character faces, estimating character emotions, or automatically generating animations such as lip or eye movements. This paper contains two main contributions: 1) a new landmark annotation model for manga faces, and 2) a deep learning approach to detect these landmarks. We use the “Deep Alignment Network”, a multi-stage architecture in which the first stage makes an initial estimate that is refined in further stages. The first results show that the proposed method succeeds in accurately finding the landmarks in more than 80% of the cases.
Tasks Facial Landmark Detection
Published 2018-11-08
URL http://arxiv.org/abs/1811.03214v1
PDF http://arxiv.org/pdf/1811.03214v1.pdf
PWC https://paperswithcode.com/paper/facial-landmark-detection-for-manga-images
Repo https://github.com/oaugereau/FacialLandmarkManga
Framework none
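
The Deep Alignment Network's multi-stage idea, an initial landmark estimate refined by later stages, can be sketched as residual refinement. The PyTorch example below is a hedged simplification: the real DAN also feeds a landmark heatmap and a transformed image into later stages, and all layer sizes and names here are illustrative.

    import torch
    import torch.nn as nn

    class LandmarkStage(nn.Module):
        """One stage: image -> 2D coordinates for n_landmarks points."""
        def __init__(self, n_landmarks=60, feat_dim=256):
            super().__init__()
            self.n_landmarks = n_landmarks
            self.backbone = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                nn.Linear(32 * 4 * 4, feat_dim), nn.ReLU(),
            )
            self.head = nn.Linear(feat_dim, 2 * n_landmarks)

        def forward(self, image):
            return self.head(self.backbone(image)).view(-1, self.n_landmarks, 2)

    class MultiStageLandmarkNet(nn.Module):
        """Stage 1 gives an initial estimate; later stages add corrections."""
        def __init__(self, n_stages=2, n_landmarks=60):
            super().__init__()
            self.stages = nn.ModuleList(LandmarkStage(n_landmarks) for _ in range(n_stages))

        def forward(self, image):
            landmarks = self.stages[0](image)          # initial estimate
            for stage in self.stages[1:]:
                landmarks = landmarks + stage(image)   # residual refinement
            return landmarks

    # Toy usage on a grayscale 128x128 manga face crop -> (1, 60, 2) points.
    points = MultiStageLandmarkNet()(torch.randn(1, 1, 128, 128))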

Rotation Equivariance and Invariance in Convolutional Neural Networks

Title Rotation Equivariance and Invariance in Convolutional Neural Networks
Authors Benjamin Chidester, Minh N. Do, Jian Ma
Abstract Performance of neural networks can be significantly improved by encoding known invariance for particular tasks. Many image classification tasks, such as those related to cellular imaging, exhibit invariance to rotation. We present a novel scheme using the magnitude response of the 2D-discrete-Fourier transform (2D-DFT) to encode rotational invariance in neural networks, along with a new, efficient convolutional scheme for encoding rotational equivariance throughout convolutional layers. We implemented this scheme for several image classification tasks and demonstrated improved performance, in terms of classification accuracy, time required to train the model, and robustness to hyperparameter selection, over a standard CNN and another state-of-the-art method.
Tasks Image Classification
Published 2018-05-31
URL http://arxiv.org/abs/1805.12301v1
PDF http://arxiv.org/pdf/1805.12301v1.pdf
PWC https://paperswithcode.com/paper/rotation-equivariance-and-invariance-in
Repo https://github.com/bchidest/RiCNN
Framework tf
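
The invariance mechanism rests on a simple property: if rotating the input only cyclically shifts a stack of filter responses (equivariance), the DFT magnitude taken along that stack does not change. The numpy toy below demonstrates this shift-invariance in 1D; the paper applies a 2D-DFT to convolutional feature maps, so this is an illustration of the principle rather than the authors' scheme.

    # If rotation of the input corresponds to a cyclic shift of the stacked
    # responses, then |DFT| along the stack is a rotation-invariant descriptor.
    import numpy as np

    rng = np.random.default_rng(0)
    responses = rng.normal(size=8)      # responses under 8 rotated copies of a filter
    rotated = np.roll(responses, 3)     # rotating the input cyclically shifts the stack

    invariant = np.abs(np.fft.fft(responses))
    invariant_rot = np.abs(np.fft.fft(rotated))
    assert np.allclose(invariant, invariant_rot)   # identical descriptors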

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

Title Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
Authors Bowen Cheng, Yunchao Wei, Honghui Shi, Rogerio Feris, Jinjun Xiong, Thomas Huang
Abstract Recent region-based object detectors are usually built with separate classification and localization branches on top of shared feature extraction networks. In this paper, we analyze failure cases of state-of-the-art detectors and observe that most hard false positives result from classification rather than localization. We conjecture that: (1) the shared feature representation is not optimal due to the mismatched goals of feature learning for classification and localization; (2) multi-task learning helps, yet optimization of the multi-task loss may result in sub-optimal solutions for the individual tasks; (3) a large receptive field across different scales leads to redundant context information for small objects. We demonstrate the potential of the detector's classification power with a simple, effective, and widely applicable Decoupled Classification Refinement (DCR) network. DCR samples hard false positives from the base classifier in Faster RCNN and trains an RCNN-style strong classifier. Experiments show new state-of-the-art results on PASCAL VOC and COCO without any bells and whistles.
Tasks Multi-Task Learning
Published 2018-03-19
URL http://arxiv.org/abs/1803.06799v3
PDF http://arxiv.org/pdf/1803.06799v3.pdf
PWC https://paperswithcode.com/paper/revisiting-rcnn-on-awakening-the
Repo https://github.com/bowenc0221/Decoupled-Classification-Refinement
Framework tf
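
DCR trains its refinement head on hard false positives sampled from the base detector. As a hedged illustration of what such sampling can look like, the numpy sketch below flags confident detections that overlap no ground-truth box; the score and IoU thresholds are illustrative and the paper's exact sampling strategy may differ.

    import numpy as np

    def iou(boxes_a, boxes_b):
        """Pairwise IoU for boxes given as (x1, y1, x2, y2)."""
        x1 = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
        y1 = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
        x2 = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
        y2 = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
        area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
        return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

    def hard_false_positives(det_boxes, det_scores, gt_boxes, score_thr=0.5, iou_thr=0.5):
        """Indices of confident detections that match no ground-truth box."""
        overlaps = iou(det_boxes, gt_boxes).max(axis=1) if len(gt_boxes) else np.zeros(len(det_boxes))
        return np.where((det_scores >= score_thr) & (overlaps < iou_thr))[0]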

Maintaining Natural Image Statistics with the Contextual Loss

Title Maintaining Natural Image Statistics with the Contextual Loss
Authors Roey Mechrez, Itamar Talmi, Firas Shama, Lihi Zelnik-Manor
Abstract Maintaining natural image statistics is a crucial factor in the restoration and generation of realistic-looking images. When training CNNs, photorealism is usually attempted via adversarial training (GANs), which pushes the output images to lie on the manifold of natural images. GANs are very powerful, but not perfect. They are hard to train and the results still often suffer from artifacts. In this paper we propose a complementary approach, which can be applied with or without a GAN, whose goal is to train a feed-forward CNN to maintain natural internal statistics. We look explicitly at the distribution of features in an image and train the network to generate images with natural feature distributions. Our approach reduces by orders of magnitude the number of images required for training and achieves state-of-the-art results on both single-image super-resolution and high-resolution surface normal estimation.
Tasks Image Super-Resolution, Super-Resolution
Published 2018-03-13
URL http://arxiv.org/abs/1803.04626v3
PDF http://arxiv.org/pdf/1803.04626v3.pdf
PWC https://paperswithcode.com/paper/maintaining-natural-image-statistics-with-the
Repo https://github.com/idearibosome/tf-perceptual-eusr
Framework tf
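
The approach trains a network so that generated feature distributions match those of natural images, building on the authors' contextual loss. The PyTorch sketch below is a loose, hedged rendition of a contextual-style similarity between two feature sets; the bandwidth, epsilon, and normalization direction are illustrative and may differ from the published formulation.

    import torch

    def contextual_loss(x, y, h=0.5, eps=1e-5):
        """x, y: (N, C) and (M, C) feature sets (e.g. deep features of two images).
        Each target feature looks for its best-matching generated feature under
        normalized cosine distances; the loss rewards good matches for all of y."""
        x = x - y.mean(dim=0, keepdim=True)          # center w.r.t. target features
        y = y - y.mean(dim=0, keepdim=True)
        x = torch.nn.functional.normalize(x, dim=1)
        y = torch.nn.functional.normalize(y, dim=1)
        d = 1.0 - x @ y.t()                          # (N, M) cosine distances
        d_norm = d / (d.min(dim=0, keepdim=True).values + eps)   # relative distances
        w = torch.exp((1.0 - d_norm) / h)            # turn distances into affinities
        cx = w / w.sum(dim=0, keepdim=True)
        return -torch.log(cx.max(dim=0).values.mean() + eps)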

Combating Adversarial Attacks Using Sparse Representations

Title Combating Adversarial Attacks Using Sparse Representations
Authors Soorya Gopalakrishnan, Zhinus Marzi, Upamanyu Madhow, Ramtin Pedarsani
Abstract It is by now well-known that small adversarial perturbations can induce classification errors in deep neural networks (DNNs). In this paper, we make the case that sparse representations of the input data are a crucial tool for combating such attacks. For linear classifiers, we show that a sparsifying front end is provably effective against $\ell_{\infty}$-bounded attacks, reducing output distortion due to the attack by a factor of roughly $K / N$ where $N$ is the data dimension and $K$ is the sparsity level. We then extend this concept to DNNs, showing that a “locally linear” model can be used to develop a theoretical foundation for crafting attacks and defenses. Experimental results for the MNIST dataset show the efficacy of the proposed sparsifying front end.
Tasks
Published 2018-03-11
URL http://arxiv.org/abs/1803.03880v3
PDF http://arxiv.org/pdf/1803.03880v3.pdf
PWC https://paperswithcode.com/paper/combating-adversarial-attacks-using-sparse
Repo https://github.com/ZhinusMarzi/Adversarial-attack
Framework tf
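
The K/N distortion argument concerns a sparsifying front end placed before a linear classifier. The numpy sketch below is a hedged toy: it keeps the K largest-magnitude coefficients in an orthonormal basis and compares the residual distortion of an $\ell_{\infty}$-bounded perturbation with and without the front end. The random basis and sizes are illustrative.

    import numpy as np

    def sparsify(x, basis, k):
        """x: (N,) input, basis: (N, N) orthonormal columns, k: sparsity level."""
        coeffs = basis.T @ x                       # analysis transform
        keep = np.argsort(np.abs(coeffs))[-k:]     # indices of the K largest coefficients
        sparse_coeffs = np.zeros_like(coeffs)
        sparse_coeffs[keep] = coeffs[keep]
        return basis @ sparse_coeffs               # reconstruct in the input domain

    # Example with a random orthonormal basis and a truly K-sparse signal.
    rng = np.random.default_rng(0)
    n, k = 256, 16
    basis, _ = np.linalg.qr(rng.normal(size=(n, n)))
    x = basis[:, :k] @ rng.normal(size=k)
    x_adv = x + 0.1 * np.sign(rng.normal(size=n))  # ell_infinity-bounded perturbation
    print("distortion after the front end:", np.linalg.norm(sparsify(x_adv, basis, k) - x))
    print("distortion without it:         ", np.linalg.norm(x_adv - x))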

Leveraging Virtual and Real Person for Unsupervised Person Re-identification

Title Leveraging Virtual and Real Person for Unsupervised Person Re-identification
Authors Fengxiang Yang, Zhun Zhong, Zhiming Luo, Sheng Lian, Shaozi Li
Abstract Person re-identification (re-ID) is a challenging problem, especially when no labels are available for training. Although recent deep re-ID methods have achieved great improvements, it is still difficult to optimize a deep re-ID model without annotations in the training data. To address this problem, this study introduces a novel approach for unsupervised person re-ID by leveraging virtual and real data. Our approach includes two components: virtual person generation and training of the deep re-ID model. For virtual person generation, we learn a person generation model and a camera style transfer model using unlabeled real data to generate virtual persons with different poses and camera styles. The virtual data serves as labeled training data, enabling subsequent supervised training of the deep re-ID model. For training of the deep re-ID model, we divide it into three steps: 1) pre-training a coarse re-ID model using the virtual data; 2) collaborative-filtering-based positive pair mining from the real data; and 3) fine-tuning the coarse re-ID model by leveraging the mined positive pairs and the virtual data. The final re-ID model is obtained by iterating between steps 2 and 3 until convergence. Experimental results on two large-scale datasets, Market-1501 and DukeMTMC-reID, demonstrate the effectiveness of our approach and show that it achieves state-of-the-art performance in unsupervised person re-ID.
Tasks Person Re-Identification, Style Transfer, Unsupervised Person Re-Identification
Published 2018-11-05
URL http://arxiv.org/abs/1811.02074v1
PDF http://arxiv.org/pdf/1811.02074v1.pdf
PWC https://paperswithcode.com/paper/leveraging-virtual-and-real-person-for
Repo https://github.com/FlyingRoastDuck/PGPPM
Framework pytorch
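
Step 2 of the training recipe mines positive pairs from unlabeled real images. The paper uses a collaborative-filtering formulation; the numpy sketch below only illustrates the general idea with a simpler rule, treating mutual nearest neighbors in feature space as likely same-identity pairs.

    import numpy as np

    def mine_positive_pairs(features):
        """features: (N, D) L2-normalized embeddings of unlabeled images.
        Returns pairs (i, j) that are each other's nearest neighbor."""
        sims = features @ features.T
        np.fill_diagonal(sims, -np.inf)            # exclude self-matches
        nn = sims.argmax(axis=1)                   # nearest neighbor of each image
        return [(i, j) for i, j in enumerate(nn) if nn[j] == i and i < j]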

Answerer in Questioner’s Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog

Title Answerer in Questioner’s Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog
Authors Sang-Woo Lee, Yu-Jung Heo, Byoung-Tak Zhang
Abstract Goal-oriented dialog has received attention due to its numerous applications in artificial intelligence. Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct action to take. To ask adequate questions, deep learning and reinforcement learning have recently been applied. However, these approaches struggle to find a competent recurrent neural questioner, owing to the complexity of learning a series of sentences. Motivated by theory of mind, we propose “Answerer in Questioner’s Mind” (AQM), a novel information-theoretic algorithm for goal-oriented dialog. With AQM, a questioner asks and infers based on an approximated probabilistic model of the answerer. The questioner figures out the answerer’s intention by selecting a plausible question after explicitly calculating the information gain of the candidate intentions and possible answers to each question. We test our framework on two goal-oriented visual dialog tasks: “MNIST Counting Dialog” and “GuessWhat?!”. In our experiments, AQM outperforms comparative algorithms by a large margin.
Tasks Goal-Oriented Dialog, Visual Dialog
Published 2018-02-12
URL http://arxiv.org/abs/1802.03881v3
PDF http://arxiv.org/pdf/1802.03881v3.pdf
PWC https://paperswithcode.com/paper/answerer-in-questioners-mind-information
Repo https://github.com/naver/aqm-plus
Framework pytorch
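
AQM's question selection maximizes the information gain about the answerer's intention, I(c; a | q) = H(a | q) − E_c[H(a | c, q)], under an approximated answerer model. The numpy sketch below computes this quantity for toy distributions; the array shapes and names are illustrative, not the authors' implementation.

    import numpy as np

    def entropy(p, axis=-1, eps=1e-12):
        return -(p * np.log(p + eps)).sum(axis=axis)

    def select_question(p_class, p_answer):
        """p_class: (C,) posterior over intentions; p_answer: (Q, C, A) answerer model.
        Returns the index of the question with the largest information gain."""
        p_a_given_q = np.einsum("c,qca->qa", p_class, p_answer)      # marginal answers
        info_gain = entropy(p_a_given_q) - np.einsum("c,qc->q", p_class, entropy(p_answer))
        return int(np.argmax(info_gain))

    # Toy usage: 3 candidate questions, 4 intentions, 2 possible answers.
    rng = np.random.default_rng(0)
    p_answer = rng.dirichlet(np.ones(2), size=(3, 4))
    print(select_question(np.ones(4) / 4, p_answer))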

Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7

Title Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7
Authors Huda Alamri, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Jue Wang, Irfan Essa, Dhruv Batra, Devi Parikh, Anoop Cherian, Tim K. Marks, Chiori Hori
Abstract Scene-aware dialog systems will be able to have conversations with users about the objects and events around them. Progress on such systems can be made by integrating state-of-the-art technologies from multiple research areas, including end-to-end dialog systems, visual dialog, and video description. We introduce the Audio Visual Scene-Aware Dialog (AVSD) challenge and dataset. In this challenge, which is one track of the 7th Dialog System Technology Challenges (DSTC7) workshop, the task is to build a system that generates responses in a dialog about an input video.
Tasks Video Description, Visual Dialog
Published 2018-06-01
URL http://arxiv.org/abs/1806.00525v1
PDF http://arxiv.org/pdf/1806.00525v1.pdf
PWC https://paperswithcode.com/paper/audio-visual-scene-aware-dialog-avsd
Repo https://github.com/hudaAlamri/DSTC7-Audio-Visual-Scene-Aware-Dialog-AVSD-Challenge
Framework pytorch

Artificial Color Constancy via GoogLeNet with Angular Loss Function

Title Artificial Color Constancy via GoogLeNet with Angular Loss Function
Authors Oleksii Sidorov
Abstract Color constancy is the ability of the human visual system to perceive colors as unchanged independently of the illumination. Giving a machine this ability would be beneficial in many fields where chromatic information is used; in particular, it significantly improves scene understanding and object recognition. In this paper, we propose a transfer learning-based algorithm that has two main features: accuracy higher than many state-of-the-art algorithms and simplicity of implementation. Although GoogLeNet was used in the experiments, the proposed approach may be applied to any CNN. Additionally, we discuss the design of a new loss function oriented specifically to this problem and propose a few of the most suitable options.
Tasks Color Constancy, Object Recognition, Scene Understanding, Transfer Learning
Published 2018-11-20
URL https://arxiv.org/abs/1811.08456v2
PDF https://arxiv.org/pdf/1811.08456v2.pdf
PWC https://paperswithcode.com/paper/artificial-color-constancy-via-googlenet-with
Repo https://github.com/acecreamu/color-constancy-googlenet
Framework none
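
The angular error between the predicted and ground-truth illuminant is the standard color-constancy measure, and the loss discussed in the paper is oriented to it. The PyTorch sketch below implements a plain angular loss as a hedged illustration; the author's final loss variant may differ.

    import torch

    def angular_loss(pred, target, eps=1e-7):
        """pred, target: (batch, 3) illuminant RGB estimates.
        Returns the mean angle between them, in degrees."""
        cos = torch.nn.functional.cosine_similarity(pred, target, dim=1)
        cos = cos.clamp(-1.0 + eps, 1.0 - eps)       # keep arccos differentiable
        return torch.rad2deg(torch.acos(cos)).mean()

    # Usage: loss = angular_loss(torch.rand(8, 3), torch.rand(8, 3))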

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

Title Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Authors Ye Jia, Yu Zhang, Ron J. Weiss, Quan Wang, Jonathan Shen, Fei Ren, Zhifeng Chen, Patrick Nguyen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu
Abstract Clone a voice in 5 seconds to generate arbitrary speech in real-time
Tasks Speaker Verification, Speech Synthesis, Text-To-Speech Synthesis, Transfer Learning
Published 2018-06-12
URL http://arxiv.org/abs/1806.04558v4
PDF http://arxiv.org/pdf/1806.04558v4.pdf
PWC https://paperswithcode.com/paper/transfer-learning-from-speaker-verification
Repo https://github.com/CorentinJ/Real-Time-Voice-Cloning
Framework tf

Constrained Exploration and Recovery from Experience Shaping

Title Constrained Exploration and Recovery from Experience Shaping
Authors Tu-Hoa Pham, Giovanni De Magistris, Don Joven Agravante, Subhajit Chaudhury, Asim Munawar, Ryuki Tachibana
Abstract We consider the problem of reinforcement learning under safety requirements, in which an agent is trained to complete a given task, typically formalized as the maximization of a reward signal over time, while concurrently avoiding undesirable actions or states associated with lower rewards or penalties. The construction and balancing of different reward components can be difficult in the presence of multiple objectives, yet is crucial for producing a satisfying policy. For example, in reaching a target while avoiding obstacles, low collision penalties can lead to reckless movements, while high penalties can discourage exploration. To circumvent this limitation, we examine the effect of past actions in terms of safety to estimate which are acceptable or should be avoided in the future. We then actively reshape the action space of the agent during reinforcement learning, so that reward-driven exploration is constrained within safety limits. We propose an algorithm enabling the learning of such safety constraints in parallel with reinforcement learning and demonstrate its effectiveness in terms of both task completion and training time.
Tasks
Published 2018-09-21
URL http://arxiv.org/abs/1809.08925v1
PDF http://arxiv.org/pdf/1809.08925v1.pdf
PWC https://paperswithcode.com/paper/constrained-exploration-and-recovery-from
Repo https://github.com/IBM/constrained-rl
Framework tf
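
The method learns which actions are acceptable and reshapes the action space so that exploration stays within safety limits. The sketch below is a hedged, much simpler stand-in (not the authors' algorithm): it filters candidate actions through a learned safety estimator and falls back to the safest candidate when none passes the threshold. All names and the threshold are illustrative.

    import numpy as np

    def safe_action(policy_sample, safety_prob, state, n_candidates=16, threshold=0.9):
        """policy_sample(state) -> action; safety_prob(state, action) -> P(safe).
        Returns the first candidate the safety model accepts, else the safest one."""
        candidates = [policy_sample(state) for _ in range(n_candidates)]
        scores = np.array([safety_prob(state, a) for a in candidates])
        safe = np.where(scores >= threshold)[0]
        idx = safe[0] if len(safe) else int(np.argmax(scores))   # safest fallback
        return candidates[idx]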

Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories

Title Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories
Authors Daniil Sorokin, Iryna Gurevych
Abstract The first stage of every knowledge base question answering approach is to link entities in the input question. We investigate entity linking in the context of a question answering task and present a jointly optimized neural architecture for entity mention detection and entity disambiguation that models the surrounding context on different levels of granularity. We use the Wikidata knowledge base and available question answering datasets to create benchmarks for entity linking on question answering data. Our approach outperforms the previous state-of-the-art system on this data, resulting in an average 8% improvement of the final score. We further demonstrate that our model delivers a strong performance across different entity categories.
Tasks Entity Disambiguation, Entity Linking, Knowledge Base Question Answering, Question Answering
Published 2018-04-23
URL http://arxiv.org/abs/1804.08460v1
PDF http://arxiv.org/pdf/1804.08460v1.pdf
PWC https://paperswithcode.com/paper/mixing-context-granularities-for-improved
Repo https://github.com/UKPLab/starsem2018-entity-linking
Framework pytorch