Paper Group AWR 247
Joint Monocular 3D Vehicle Detection and Tracking
Title | Joint Monocular 3D Vehicle Detection and Tracking |
Authors | Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu |
Abstract | Vehicle 3D extents and trajectories are critical cues for predicting the future location of vehicles and planning future agent ego-motion based on those predictions. In this paper, we propose a novel online framework for 3D vehicle detection and tracking from monocular videos. The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform. Our method leverages 3D box depth-ordering matching for robust instance association and utilizes 3D trajectory prediction for re-identification of occluded vehicles. We also design a motion learning module based on an LSTM for more accurate long-term motion extrapolation. Our experiments on simulation, KITTI, and Argoverse datasets show that our 3D tracking pipeline offers robust data association and tracking. On Argoverse, our image-based method is significantly better for tracking 3D vehicles within 30 meters than the LiDAR-centric baseline methods. |
Tasks | 3D Object Detection, 3D Pose Estimation, Autonomous Vehicles, Multiple Object Tracking, Object Tracking, Online Multi-Object Tracking, Pose Estimation, Trajectory Prediction |
Published | 2018-11-26 |
URL | https://arxiv.org/abs/1811.10742v3 |
PDF | https://arxiv.org/pdf/1811.10742v3.pdf |
PWC | https://paperswithcode.com/paper/joint-monocular-3d-vehicle-detection-and |
Repo | https://github.com/ucbdrive/3d-vehicle-tracking |
Framework | pytorch |
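The abstract mentions an LSTM-based motion module used for long-term extrapolation of 3D trajectories. Below is a minimal PyTorch sketch of such a module, not the authors' implementation; the state dimensionality, hidden size, and the choice of predicting a per-step displacement are assumptions.

```python
import torch
import torch.nn as nn

class MotionLSTM(nn.Module):
    """Toy LSTM motion model: reads a history of 3D box centers and
    predicts the displacement to the next frame (assumed design)."""
    def __init__(self, state_dim=3, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, state_dim)

    def forward(self, history):
        # history: (batch, T, 3) past 3D centers in ego/world coordinates
        out, _ = self.lstm(history)
        delta = self.head(out[:, -1])      # predicted displacement
        return history[:, -1] + delta      # extrapolated next center

# usage sketch: extrapolate one step for 8 tracks with a 5-frame history
model = MotionLSTM()
pred = model(torch.randn(8, 5, 3))
print(pred.shape)  # torch.Size([8, 3])
```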
StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks
Title | StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks |
Authors | Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo |
Abstract | This paper proposes a method that allows non-parallel many-to-many voice conversion (VC) by using a variant of a generative adversarial network (GAN) called StarGAN. Our method, which we call StarGAN-VC, is noteworthy in that it (1) requires no parallel utterances, transcriptions, or time alignment procedures for speech generator training, (2) simultaneously learns many-to-many mappings across different attribute domains using a single generator network, (3) is able to generate converted speech signals quickly enough to allow real-time implementations and (4) requires only several minutes of training examples to generate reasonably realistic-sounding speech. Subjective evaluation experiments on a non-parallel many-to-many speaker identity conversion task revealed that the proposed method obtained higher sound quality and speaker similarity than a state-of-the-art method based on variational autoencoding GANs. |
Tasks | Voice Conversion |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02169v2 |
PDF | http://arxiv.org/pdf/1806.02169v2.pdf |
PWC | https://paperswithcode.com/paper/stargan-vc-non-parallel-many-to-many-voice |
Repo | https://github.com/bajibabu/CycleGAN-VC |
Framework | pytorch |
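StarGAN-VC's central point is a single generator shared across all speaker pairs, conditioned on a target-domain (speaker) code. The sketch below shows only that conditioning pattern plus a cycle-consistency term; the layer shapes, feature dimensions, and the way the code is broadcast over time are assumptions, and the adversarial and domain-classification losses of the full method are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondGenerator(nn.Module):
    """One generator for all speaker pairs: the target-speaker one-hot code
    is broadcast over time and concatenated with the acoustic features."""
    def __init__(self, n_feats=36, n_speakers=4, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_feats + n_speakers, hidden, 5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, n_feats, 5, padding=2),
        )

    def forward(self, feats, code):
        # feats: (B, n_feats, T); code: (B, n_speakers) one-hot
        code_map = code.unsqueeze(-1).expand(-1, -1, feats.size(-1))
        return self.net(torch.cat([feats, code_map], dim=1))

G = CondGenerator()
x = torch.randn(2, 36, 100)                        # source acoustic features
tgt = F.one_hot(torch.tensor([1, 3]), 4).float()   # target speakers
src = F.one_hot(torch.tensor([0, 2]), 4).float()   # source speakers
fake = G(x, tgt)                                   # convert to target speakers
cyc = G(fake, src)                                 # convert back to the source
cycle_loss = (cyc - x).abs().mean()                # L1 cycle-consistency (assumed)
```

Because the speaker identity enters only through the code, the same weights realize every source-to-target mapping, which is what makes the setup many-to-many.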
Jensen: An Easily-Extensible C++ Toolkit for Production-Level Machine Learning and Convex Optimization
Title | Jensen: An Easily-Extensible C++ Toolkit for Production-Level Machine Learning and Convex Optimization |
Authors | Rishabh Iyer, John T. Halloran, Kai Wei |
Abstract | This paper introduces Jensen, an easily extensible and scalable toolkit for production-level machine learning and convex optimization. Jensen implements a framework of convex (or loss) functions, convex optimization algorithms (including Gradient Descent, L-BFGS, Stochastic Gradient Descent, Conjugate Gradient, etc.), and a family of machine learning classifiers and regressors (Logistic Regression, SVMs, Least Squares Regression, etc.). This framework makes it possible to deploy and train models with a few lines of code, and also to extend and build upon it by integrating new loss functions and optimization algorithms. |
Tasks | |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06574v1 |
PDF | http://arxiv.org/pdf/1807.06574v1.pdf |
PWC | https://paperswithcode.com/paper/jensen-an-easily-extensible-c-toolkit-for |
Repo | https://github.com/rishabhk108/jensen |
Framework | none |
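Jensen itself is a C++ library and its exact API is not reproduced here. As a language-agnostic illustration of the kind of component the abstract lists — a convex loss paired with a first-order optimizer — here is a short NumPy sketch of plain gradient descent on an L2-regularized logistic loss; all names and constants are illustrative.

```python
import numpy as np

def logistic_loss_grad(w, X, y, lam=1e-2):
    """L2-regularized logistic loss and gradient; labels y in {-1, +1}."""
    m = X @ w * y
    loss = np.mean(np.log1p(np.exp(-m))) + 0.5 * lam * w @ w
    grad = X.T @ (-y / (1.0 + np.exp(m))) / len(y) + lam * w
    return loss, grad

def gradient_descent(X, y, lr=0.5, iters=200):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        _, g = logistic_loss_grad(w, X, y)
        w -= lr * g
    return w

# toy usage: two roughly separable Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.r_[-np.ones(50), np.ones(50)]
w = gradient_descent(X, y)
print(f"train accuracy: {np.mean(np.sign(X @ w) == y):.2f}")
```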
Facial Landmark Detection for Manga Images
Title | Facial Landmark Detection for Manga Images |
Authors | Marco Stricker, Olivier Augereau, Koichi Kise, Motoi Iwata |
Abstract | The topic of facial landmark detection has been widely covered for pictures of human faces, but it is still a challenge for drawings. Indeed, the proportions and symmetry of standard human faces are not always used for comics or mangas. The personal style of the author, the limited color palette, and similar factors make landmark detection on faces in drawings a difficult task. Detecting landmarks on manga images will be useful for providing new services such as easily editing character faces, estimating character emotions, or automatically generating animations such as lip or eye movements. This paper contains two main contributions: 1) a new landmark annotation model for manga faces, and 2) a deep learning approach to detect these landmarks. We use the “Deep Alignment Network”, a multi-stage architecture in which the first stage makes an initial estimate that is refined in further stages. The first results show that the proposed method succeeds in accurately finding the landmarks in more than 80% of the cases. |
Tasks | Facial Landmark Detection |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03214v1 |
PDF | http://arxiv.org/pdf/1811.03214v1.pdf |
PWC | https://paperswithcode.com/paper/facial-landmark-detection-for-manga-images |
Repo | https://github.com/oaugereau/FacialLandmarkManga |
Framework | none |
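The abstract describes the Deep Alignment Network idea: a first stage produces an initial landmark estimate that later stages refine. Below is a toy two-stage sketch of that refine-the-previous-estimate pattern; the backbone, the landmark count, and feeding the current estimate directly into each stage's head are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

N_LM = 30  # illustrative landmark count; the paper's annotation model differs

class Stage(nn.Module):
    """One refinement stage: looks at the image and the current landmark
    estimate and predicts a correction (DAN-style, simplified)."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32 + 2 * N_LM, 2 * N_LM)

    def forward(self, img, current):
        feats = self.cnn(img)
        return current + self.head(torch.cat([feats, current], dim=1))

stages = nn.ModuleList([Stage(), Stage()])
img = torch.randn(4, 1, 128, 128)      # grayscale manga faces
est = torch.zeros(4, 2 * N_LM)         # initial guess (e.g. a mean shape)
for stage in stages:                   # each stage refines the previous estimate
    est = stage(img, est)
print(est.shape)  # torch.Size([4, 60])
```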
Rotation Equivariance and Invariance in Convolutional Neural Networks
Title | Rotation Equivariance and Invariance in Convolutional Neural Networks |
Authors | Benjamin Chidester, Minh N. Do, Jian Ma |
Abstract | Performance of neural networks can be significantly improved by encoding known invariance for particular tasks. Many image classification tasks, such as those related to cellular imaging, exhibit invariance to rotation. We present a novel scheme using the magnitude response of the 2D-discrete-Fourier transform (2D-DFT) to encode rotational invariance in neural networks, along with a new, efficient convolutional scheme for encoding rotational equivariance throughout convolutional layers. We implemented this scheme for several image classification tasks and demonstrated improved performance, in terms of classification accuracy, time required to train the model, and robustness to hyperparameter selection, over a standard CNN and another state-of-the-art method. |
Tasks | Image Classification |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12301v1 |
PDF | http://arxiv.org/pdf/1805.12301v1.pdf |
PWC | https://paperswithcode.com/paper/rotation-equivariance-and-invariance-in |
Repo | https://github.com/bchidest/RiCNN |
Framework | tf |
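One way to read the abstract's 2D-DFT-magnitude idea: if the earlier layers are rotation-equivariant, rotating the input shows up as a cyclic shift of the feature representation, and the DFT magnitude is invariant to cyclic shifts. The toy check below demonstrates only that shift-invariance property; it is not the paper's transition layer, and the feature shapes are made up.

```python
import torch

# toy feature map; for a rotation-equivariant representation, rotating the
# input appears (approximately) as a cyclic shift of this map
feats = torch.randn(2, 8, 16, 16)
shifted = torch.roll(feats, shifts=(3, 5), dims=(-2, -1))

def dft_magnitude(x):
    # |2D-DFT| is invariant to cyclic shifts, so it removes that nuisance
    return torch.fft.fft2(x).abs()

diff = (dft_magnitude(feats) - dft_magnitude(shifted)).abs().max()
print(diff)  # ~1e-5, i.e. equal up to float error
```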
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
Title | Revisiting RCNN: On Awakening the Classification Power of Faster RCNN |
Authors | Bowen Cheng, Yunchao Wei, Honghui Shi, Rogerio Feris, Jinjun Xiong, Thomas Huang |
Abstract | Recent region-based object detectors are usually built with separate classification and localization branches on top of shared feature extraction networks. In this paper, we analyze failure cases of state-of-the-art detectors and observe that most hard false positives result from classification instead of localization. We conjecture that: (1) shared feature representation is not optimal due to the mismatched goals of feature learning for classification and localization; (2) multi-task learning helps, yet optimization of the multi-task loss may result in sub-optimal solutions for individual tasks; (3) a large receptive field for different scales leads to redundant context information for small objects. We demonstrate the potential of detector classification power with a simple, effective, and widely applicable Decoupled Classification Refinement (DCR) network. DCR samples hard false positives from the base classifier in Faster RCNN and trains an RCNN-style strong classifier. Experiments show new state-of-the-art results on PASCAL VOC and COCO without any bells and whistles. |
Tasks | Multi-Task Learning |
Published | 2018-03-19 |
URL | http://arxiv.org/abs/1803.06799v3 |
PDF | http://arxiv.org/pdf/1803.06799v3.pdf |
PWC | https://paperswithcode.com/paper/revisiting-rcnn-on-awakening-the |
Repo | https://github.com/bowenc0221/Decoupled-Classification-Refinement |
Framework | tf |
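The DCR recipe in the abstract starts by collecting detections that the base detector scores confidently but that overlap no ground-truth box (hard false positives), which then train a separate classifier. A schematic NumPy sketch of that mining step follows; the score and IoU thresholds and the array layout are assumptions.

```python
import numpy as np

def iou(box, gt_boxes):
    """IoU of one box (x1, y1, x2, y2) against an array of GT boxes."""
    x1 = np.maximum(box[0], gt_boxes[:, 0]); y1 = np.maximum(box[1], gt_boxes[:, 1])
    x2 = np.minimum(box[2], gt_boxes[:, 2]); y2 = np.minimum(box[3], gt_boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    gt_area = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area + gt_area - inter)

def mine_hard_false_positives(dets, scores, gt_boxes, score_thr=0.5, iou_thr=0.3):
    """Detections the base detector is confident about but that miss every GT box."""
    keep = [det for det, s in zip(dets, scores)
            if s >= score_thr and iou(det, gt_boxes).max() < iou_thr]
    return np.array(keep)

# usage: one confident detection far from the single GT box gets mined
dets = np.array([[0, 0, 10, 10], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8])
gt = np.array([[0, 0, 9, 9]], float)
print(mine_hard_false_positives(dets, scores, gt))  # [[50. 50. 60. 60.]]
```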
Maintaining Natural Image Statistics with the Contextual Loss
Title | Maintaining Natural Image Statistics with the Contextual Loss |
Authors | Roey Mechrez, Itamar Talmi, Firas Shama, Lihi Zelnik-Manor |
Abstract | Maintaining natural image statistics is a crucial factor in the restoration and generation of realistic-looking images. When training CNNs, photorealism is usually attempted through adversarial training (GAN), which pushes the output images to lie on the manifold of natural images. GANs are very powerful, but not perfect. They are hard to train and the results still often suffer from artifacts. In this paper we propose a complementary approach, which can be applied with or without a GAN, whose goal is to train a feed-forward CNN to maintain natural internal statistics. We look explicitly at the distribution of features in an image and train the network to generate images with natural feature distributions. Our approach reduces by orders of magnitude the number of images required for training and achieves state-of-the-art results on both single-image super-resolution and high-resolution surface normal estimation. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.04626v3 |
PDF | http://arxiv.org/pdf/1803.04626v3.pdf |
PWC | https://paperswithcode.com/paper/maintaining-natural-image-statistics-with-the |
Repo | https://github.com/idearibosome/tf-perceptual-eusr |
Framework | tf |
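The contextual loss named in the title compares the distributions of deep features in the generated and target images by matching each feature to its most contextually similar counterpart. The sketch below follows the published contextual-loss recipe as it is commonly implemented (cosine distances normalized per row, turned into similarities with a bandwidth h, then the best matches averaged); the bandwidth value, the feature source, and the exact normalization details are assumptions.

```python
import torch
import torch.nn.functional as F

def contextual_loss(gen_feats, tgt_feats, h=0.5, eps=1e-5):
    """Contextual-style loss between two sets of feature vectors,
    gen_feats: (N, C), tgt_feats: (M, C); h is an assumed bandwidth."""
    # center by the target mean, then compare with cosine similarity
    mu = tgt_feats.mean(dim=0, keepdim=True)
    g = F.normalize(gen_feats - mu, dim=1)
    t = F.normalize(tgt_feats - mu, dim=1)
    d = 1.0 - g @ t.t()                                 # cosine distances, (N, M)
    d_norm = d / (d.min(dim=1, keepdim=True).values + eps)
    w = torch.exp((1.0 - d_norm) / h)                   # distances -> similarities
    cx = w / w.sum(dim=1, keepdim=True)                 # row-normalized affinities
    score = cx.max(dim=0).values.mean()                 # keep each target's best match
    return -torch.log(score + eps)

loss = contextual_loss(torch.randn(64, 128), torch.randn(64, 128))
print(loss)
```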
Combating Adversarial Attacks Using Sparse Representations
Title | Combating Adversarial Attacks Using Sparse Representations |
Authors | Soorya Gopalakrishnan, Zhinus Marzi, Upamanyu Madhow, Ramtin Pedarsani |
Abstract | It is by now well-known that small adversarial perturbations can induce classification errors in deep neural networks (DNNs). In this paper, we make the case that sparse representations of the input data are a crucial tool for combating such attacks. For linear classifiers, we show that a sparsifying front end is provably effective against $\ell_{\infty}$-bounded attacks, reducing output distortion due to the attack by a factor of roughly $K / N$ where $N$ is the data dimension and $K$ is the sparsity level. We then extend this concept to DNNs, showing that a “locally linear” model can be used to develop a theoretical foundation for crafting attacks and defenses. Experimental results for the MNIST dataset show the efficacy of the proposed sparsifying front end. |
Tasks | |
Published | 2018-03-11 |
URL | http://arxiv.org/abs/1803.03880v3 |
PDF | http://arxiv.org/pdf/1803.03880v3.pdf |
PWC | https://paperswithcode.com/paper/combating-adversarial-attacks-using-sparse |
Repo | https://github.com/ZhinusMarzi/Adversarial-attack |
Framework | tf |
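The defense described in the abstract is a sparsifying front end: project the input onto a basis, keep only the K largest-magnitude coefficients, and reconstruct before classification, which attenuates an ℓ∞ perturbation roughly in proportion to K/N for a linear classifier. The NumPy sketch below shows the projection step; the basis here is a random orthonormal one purely for illustration, and K, the signal model, and the attack size are all made up.

```python
import numpy as np

def sparsifying_front_end(x, basis, k):
    """Project x onto an orthonormal basis, keep the k largest-magnitude
    coefficients, and reconstruct (the 'sparsify then classify' front end)."""
    coeffs = basis.T @ x
    coeffs[np.argsort(np.abs(coeffs))[:-k]] = 0.0   # zero all but the top-k
    return basis @ coeffs

rng = np.random.default_rng(0)
n, k = 784, 40                                      # e.g. MNIST dimension, sparsity level
basis, _ = np.linalg.qr(rng.normal(size=(n, n)))    # random orthonormal basis (illustrative)
x = basis[:, :k] @ rng.normal(size=k)               # a signal that is k-sparse in this basis
attack = 0.1 * rng.choice([-1.0, 1.0], size=n)      # an l_inf-bounded perturbation
out = sparsifying_front_end(x + attack, basis, k)
# most of the attack energy is removed by the front end
print(np.linalg.norm(out - x), "<", np.linalg.norm(attack))
```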
Leveraging Virtual and Real Person for Unsupervised Person Re-identification
Title | Leveraging Virtual and Real Person for Unsupervised Person Re-identification |
Authors | Fengxiang Yang, Zhun Zhong, Zhiming Luo, Sheng Lian, Shaozi Li |
Abstract | Person re-identification (re-ID) is a challenging problem, especially when no labels are available for training. Although recent deep re-ID methods have achieved great improvement, it is still difficult to optimize a deep re-ID model without annotations in the training data. To address this problem, this study introduces a novel approach for unsupervised person re-ID by leveraging virtual and real data. Our approach includes two components: virtual person generation and training of the deep re-ID model. For virtual person generation, we learn a person generation model and a camera style transfer model using unlabeled real data to generate virtual persons with different poses and camera styles. The virtual data serve as labeled training data, enabling subsequent supervised training of the deep re-ID model. We divide the training of the deep re-ID model into three steps: 1) pre-training a coarse re-ID model using virtual data; 2) collaborative-filtering-based positive pair mining from the real data; and 3) fine-tuning of the coarse re-ID model by leveraging the mined positive pairs and virtual data. The final re-ID model is obtained by iterating between steps 2 and 3 until convergence. Experimental results on two large-scale datasets, Market-1501 and DukeMTMC-reID, demonstrate the effectiveness of our approach and show that it achieves the state of the art in unsupervised person re-ID. |
Tasks | Person Re-Identification, Style Transfer, Unsupervised Person Re-Identification |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.02074v1 |
PDF | http://arxiv.org/pdf/1811.02074v1.pdf |
PWC | https://paperswithcode.com/paper/leveraging-virtual-and-real-person-for |
Repo | https://github.com/FlyingRoastDuck/PGPPM |
Framework | pytorch |
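Step 2 of the pipeline, mining pseudo-positive pairs from unlabeled real data, is the most algorithmic part of the abstract. The toy NumPy sketch below uses a mutual-nearest-neighbour rule as a stand-in for the collaborative-filtering mining the authors describe; the embeddings and the rule itself are purely illustrative.

```python
import numpy as np

def mine_positive_pairs(feats):
    """Toy stand-in for positive-pair mining: treat mutual nearest neighbours
    in embedding space as pseudo-positive pairs (the paper's collaborative
    filtering scheme is more involved)."""
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)
    nn_idx = sim.argmax(axis=1)                         # each sample's nearest neighbour
    return [(i, j) for i, j in enumerate(nn_idx) if nn_idx[j] == i and i < j]

# usage: embeddings produced by the coarse (virtual-data pre-trained) model
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 32))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
print(mine_positive_pairs(feats))   # mined pseudo-positive index pairs
```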
Answerer in Questioner’s Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog
Title | Answerer in Questioner’s Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog |
Authors | Sang-Woo Lee, Yu-Jung Heo, Byoung-Tak Zhang |
Abstract | Goal-oriented dialog has been given attention due to its numerous applications in artificial intelligence. Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct action to take. Deep learning and reinforcement learning have recently been applied to the problem of asking adequate questions. However, these approaches struggle to find a competent recurrent neural questioner, owing to the complexity of learning a series of sentences. Motivated by theory of mind, we propose “Answerer in Questioner’s Mind” (AQM), a novel information theoretic algorithm for goal-oriented dialog. With AQM, a questioner asks and infers based on an approximated probabilistic model of the answerer. The questioner figures out the answerer’s intention via selecting a plausible question by explicitly calculating the information gain of the candidate intentions and possible answers to each question. We test our framework on two goal-oriented visual dialog tasks: “MNIST Counting Dialog” and “GuessWhat?!”. In our experiments, AQM outperforms comparative algorithms by a large margin. |
Tasks | Goal-Oriented Dialog, Visual Dialog |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.03881v3 |
PDF | http://arxiv.org/pdf/1802.03881v3.pdf |
PWC | https://paperswithcode.com/paper/answerer-in-questioners-mind-information |
Repo | https://github.com/naver/aqm-plus |
Framework | pytorch |
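The AQM questioner picks the question with the highest information gain about the answerer's intention, computed from an approximate model of the answerer. A small NumPy sketch of that mutual-information computation over discrete intentions, questions, and answers is below; the toy distributions and the sizes of the intention/question/answer spaces are made up for illustration.

```python
import numpy as np

def information_gain(prior, answer_model):
    """I(intention; answer | question) for each candidate question.
    prior: (C,) belief over intentions; answer_model: (Q, C, A) with p(a | c, q)."""
    p_a_given_q = np.einsum('c,qca->qa', prior, answer_model)   # marginal answer dist per question
    ratio = answer_model / p_a_given_q[:, None, :]              # p(a|c,q) / p(a|q)
    return np.einsum('c,qca->q', prior, answer_model * np.log(ratio))

# toy setup: 3 candidate intentions, 2 candidate questions, binary answers
prior = np.array([0.5, 0.3, 0.2])
answer_model = np.array([
    [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],   # question 0 separates intentions well
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],   # question 1 is uninformative
])
gains = information_gain(prior, answer_model)
print(gains, "-> ask question", gains.argmax())   # question 0 has the higher gain
```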
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7
Title | Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7 |
Authors | Huda Alamri, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Jue Wang, Irfan Essa, Dhruv Batra, Devi Parikh, Anoop Cherian, Tim K. Marks, Chiori Hori |
Abstract | Scene-aware dialog systems will be able to have conversations with users about the objects and events around them. Progress on such systems can be made by integrating state-of-the-art technologies from multiple research areas, including end-to-end dialog systems, visual dialog, and video description. We introduce the Audio Visual Scene-Aware Dialog (AVSD) challenge and dataset. In this challenge, which is one track of the 7th Dialog System Technology Challenges (DSTC7) workshop, the task is to build a system that generates responses in a dialog about an input video. |
Tasks | Video Description, Visual Dialog |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00525v1 |
PDF | http://arxiv.org/pdf/1806.00525v1.pdf |
PWC | https://paperswithcode.com/paper/audio-visual-scene-aware-dialog-avsd |
Repo | https://github.com/hudaAlamri/DSTC7-Audio-Visual-Scene-Aware-Dialog-AVSD-Challenge |
Framework | pytorch |
Artificial Color Constancy via GoogLeNet with Angular Loss Function
Title | Artificial Color Constancy via GoogLeNet with Angular Loss Function |
Authors | Oleksii Sidorov |
Abstract | Color constancy is the ability of the human visual system to perceive colors as unchanged independently of the illumination. Giving a machine this capability will be beneficial in many fields where chromatic information is used. In particular, it significantly improves scene understanding and object recognition. In this paper, we propose a transfer learning-based algorithm with two main features: accuracy higher than that of many state-of-the-art algorithms and simplicity of implementation. Although GoogLeNet was used in the experiments, the approach may be applied to any CNN. Additionally, we discuss the design of a new loss function oriented specifically to this problem and propose a few of the most suitable options. |
Tasks | Color Constancy, Object Recognition, Scene Understanding, Transfer Learning |
Published | 2018-11-20 |
URL | https://arxiv.org/abs/1811.08456v2 |
PDF | https://arxiv.org/pdf/1811.08456v2.pdf |
PWC | https://paperswithcode.com/paper/artificial-color-constancy-via-googlenet-with |
Repo | https://github.com/acecreamu/color-constancy-googlenet |
Framework | none |
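The "angular loss" in the title presumably refers to the standard angular error between the predicted and ground-truth illuminant vectors, the usual evaluation metric in color constancy. A minimal PyTorch sketch is below; treating the network output as a 3-vector RGB illuminant estimate is an assumption.

```python
import torch
import torch.nn.functional as F

def angular_loss(pred, target, eps=1e-7):
    """Mean angle (in radians) between predicted and ground-truth illuminants.
    pred, target: (B, 3) RGB illuminant vectors."""
    cos = F.cosine_similarity(pred, target, dim=1)
    return torch.acos(cos.clamp(-1 + eps, 1 - eps)).mean()

pred = torch.tensor([[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]])
gt = torch.tensor([[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]])
print(angular_loss(pred, gt))  # ~ (0 + pi/2) / 2
```

Unlike a plain L2 loss on the illuminant, this objective is invariant to the overall brightness of the estimate, which matters because only the chromaticity of the light is recoverable.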
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Title | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis |
Authors | Ye Jia, Yu Zhang, Ron J. Weiss, Quan Wang, Jonathan Shen, Fei Ren, Zhifeng Chen, Patrick Nguyen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu |
Abstract | Clone a voice in 5 seconds to generate arbitrary speech in real-time |
Tasks | Speaker Verification, Speech Synthesis, Text-To-Speech Synthesis, Transfer Learning |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04558v4 |
PDF | http://arxiv.org/pdf/1806.04558v4.pdf |
PWC | https://paperswithcode.com/paper/transfer-learning-from-speaker-verification |
Repo | https://github.com/CorentinJ/Real-Time-Voice-Cloning |
Framework | tf |
Constrained Exploration and Recovery from Experience Shaping
Title | Constrained Exploration and Recovery from Experience Shaping |
Authors | Tu-Hoa Pham, Giovanni De Magistris, Don Joven Agravante, Subhajit Chaudhury, Asim Munawar, Ryuki Tachibana |
Abstract | We consider the problem of reinforcement learning under safety requirements, in which an agent is trained to complete a given task, typically formalized as the maximization of a reward signal over time, while concurrently avoiding undesirable actions or states associated with lower rewards or penalties. The construction and balancing of different reward components can be difficult in the presence of multiple objectives, yet is crucial for producing a satisfying policy. For example, in reaching a target while avoiding obstacles, low collision penalties can lead to reckless movements while high penalties can discourage exploration. To circumvent this limitation, we examine the effect of past actions in terms of safety to estimate which are acceptable or should be avoided in the future. We then actively reshape the action space of the agent during reinforcement learning, so that reward-driven exploration is constrained within safety limits. We propose an algorithm enabling the learning of such safety constraints in parallel with reinforcement learning and demonstrate its effectiveness in terms of both task completion and training time. |
Tasks | |
Published | 2018-09-21 |
URL | http://arxiv.org/abs/1809.08925v1 |
PDF | http://arxiv.org/pdf/1809.08925v1.pdf |
PWC | https://paperswithcode.com/paper/constrained-exploration-and-recovery-from |
Repo | https://github.com/IBM/constrained-rl |
Framework | tf |
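The mechanism described in the abstract is to learn from past experience which actions were unsafe in which states, and then constrain exploration to actions predicted safe. The NumPy sketch below shows that filter-the-action-proposal idea, with a nearest-neighbour memory of past violations standing in for the learned constraint model; the 2-D action space, rejection radius, and fallback action are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
unsafe_memory = []   # (state, action) pairs that previously led to a safety violation

def is_predicted_unsafe(state, action, radius=0.3):
    """Stand-in for a learned constraint model: flag actions close to a
    remembered unsafe (state, action) pair."""
    return any(np.linalg.norm(np.r_[state, action] - np.r_[s, a]) < radius
               for s, a in unsafe_memory)

def constrained_exploration(state, n_proposals=20):
    """Sample exploratory actions but keep only those predicted safe."""
    for _ in range(n_proposals):
        action = rng.uniform(-1.0, 1.0, size=2)
        if not is_predicted_unsafe(state, action):
            return action
    return np.zeros(2)   # fall back to a known-safe (null) action

# record one violation, then exploration avoids that region of action space
state = np.array([0.1, -0.2])
unsafe_memory.append((state, np.array([0.8, 0.8])))
print(constrained_exploration(state))
```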
Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories
Title | Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories |
Authors | Daniil Sorokin, Iryna Gurevych |
Abstract | The first stage of every knowledge base question answering approach is to link entities in the input question. We investigate entity linking in the context of a question answering task and present a jointly optimized neural architecture for entity mention detection and entity disambiguation that models the surrounding context on different levels of granularity. We use the Wikidata knowledge base and available question answering datasets to create benchmarks for entity linking on question answering data. Our approach outperforms the previous state-of-the-art system on this data, resulting in an average 8% improvement of the final score. We further demonstrate that our model delivers a strong performance across different entity categories. |
Tasks | Entity Disambiguation, Entity Linking, Knowledge Base Question Answering, Question Answering |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08460v1 |
PDF | http://arxiv.org/pdf/1804.08460v1.pdf |
PWC | https://paperswithcode.com/paper/mixing-context-granularities-for-improved |
Repo | https://github.com/UKPLab/starsem2018-entity-linking |
Framework | pytorch |
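The architecture in the abstract scores candidate Wikidata entities against context encoded at several granularities (the mention itself, its local token window, the full question). Below is a compact PyTorch sketch of that score-by-mixing-granularities idea; the encoders, dimensions, the use of pre-pooled context vectors, and the dot-product scoring are assumptions, not the authors' exact model.

```python
import torch
import torch.nn as nn

class GranularityMixer(nn.Module):
    """Encode the mention, its local token window, and the whole question
    separately, mix them, and score candidate entity embeddings."""
    def __init__(self, emb_dim=64):
        super().__init__()
        self.mention_enc = nn.Linear(emb_dim, emb_dim)
        self.window_enc = nn.Linear(emb_dim, emb_dim)
        self.question_enc = nn.Linear(emb_dim, emb_dim)
        self.mix = nn.Linear(3 * emb_dim, emb_dim)

    def forward(self, mention, window, question, candidates):
        # each context vector is assumed to be a pooled token embedding
        ctx = torch.cat([self.mention_enc(mention),
                         self.window_enc(window),
                         self.question_enc(question)], dim=-1)
        query = torch.tanh(self.mix(ctx))                      # (B, D)
        return torch.einsum('bd,bkd->bk', query, candidates)   # scores over K candidates

model = GranularityMixer()
scores = model(torch.randn(2, 64), torch.randn(2, 64), torch.randn(2, 64),
               torch.randn(2, 5, 64))   # 5 candidate entities per mention
print(scores.shape)  # torch.Size([2, 5])
```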