Paper Group AWR 247
Joint Monocular 3D Vehicle Detection and Tracking
Title | Joint Monocular 3D Vehicle Detection and Tracking |
Authors | Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu |
Abstract | Vehicle 3D extents and trajectories are critical cues for predicting the future location of vehicles and planning future agent ego-motion based on those predictions. In this paper, we propose a novel online framework for 3D vehicle detection and tracking from monocular videos. The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform. Our method leverages 3D box depth-ordering matching for robust instance association and utilizes 3D trajectory prediction for re-identification of occluded vehicles. We also design a motion learning module based on an LSTM for more accurate long-term motion extrapolation. Our experiments on simulation, KITTI, and Argoverse datasets show that our 3D tracking pipeline offers robust data association and tracking. On Argoverse, our image-based method is significantly better for tracking 3D vehicles within 30 meters than the LiDAR-centric baseline methods. |
Tasks | 3D Object Detection, 3D Pose Estimation, Autonomous Vehicles, Multiple Object Tracking, Object Tracking, Online Multi-Object Tracking, Pose Estimation, Trajectory Prediction |
Published | 2018-11-26 |
URL | https://arxiv.org/abs/1811.10742v3 |
PDF | https://arxiv.org/pdf/1811.10742v3.pdf |
PWC | https://paperswithcode.com/paper/joint-monocular-3d-vehicle-detection-and |
Repo | https://github.com/ucbdrive/3d-vehicle-tracking |
Framework | pytorch |
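The abstract mentions an LSTM-based motion module used for long-term extrapolation of 3D trajectories. Below is a minimal PyTorch sketch of such a module, not the authors' implementation; the state dimensionality, hidden size, and the choice of predicting a per-step displacement are assumptions.

```python
import torch
import torch.nn as nn

class MotionLSTM(nn.Module):
    """Toy LSTM motion model: reads a history of 3D box centers and
    predicts the displacement to the next frame (assumed design)."""
    def __init__(self, state_dim=3, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, state_dim)

    def forward(self, history):
        # history: (batch, T, 3) past 3D centers in ego/world coordinates
        out, _ = self.lstm(history)
        delta = self.head(out[:, -1])      # predicted displacement
        return history[:, -1] + delta      # extrapolated next center

# usage sketch: extrapolate one step for 8 tracks with a 5-frame history
model = MotionLSTM()
pred = model(torch.randn(8, 5, 3))
print(pred.shape)  # torch.Size([8, 3])
```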
StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks
Title | StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks |
Authors | Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo |
Abstract | This paper proposes a method that allows non-parallel many-to-many voice conversion (VC) by using a variant of a generative adversarial network (GAN) called StarGAN. Our method, which we call StarGAN-VC, is noteworthy in that it (1) requires no parallel utterances, transcriptions, or time alignment procedures for speech generator training, (2) simultaneously learns many-to-many mappings across different attribute domains using a single generator network, (3) is able to generate converted speech signals quickly enough to allow real-time implementations and (4) requires only several minutes of training examples to generate reasonably realistic-sounding speech. Subjective evaluation experiments on a non-parallel many-to-many speaker identity conversion task revealed that the proposed method obtained higher sound quality and speaker similarity than a state-of-the-art method based on variational autoencoding GANs. |
Tasks | Voice Conversion |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02169v2 |
PDF | http://arxiv.org/pdf/1806.02169v2.pdf |
PWC | https://paperswithcode.com/paper/stargan-vc-non-parallel-many-to-many-voice |
Repo | https://github.com/bajibabu/CycleGAN-VC |
Framework | pytorch |
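StarGAN-VC's central point is a single generator shared across all speaker pairs, conditioned on a target-domain (speaker) code. The sketch below shows only that conditioning pattern plus a cycle-consistency term; the layer shapes, feature dimensions, and the way the code is broadcast over time are assumptions, and the adversarial and domain-classification losses of the full method are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondGenerator(nn.Module):
    """One generator for all speaker pairs: the target-speaker one-hot code
    is broadcast over time and concatenated with the acoustic features."""
    def __init__(self, n_feats=36, n_speakers=4, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_feats + n_speakers, hidden, 5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, n_feats, 5, padding=2),
        )

    def forward(self, feats, code):
        # feats: (B, n_feats, T); code: (B, n_speakers) one-hot
        code_map = code.unsqueeze(-1).expand(-1, -1, feats.size(-1))
        return self.net(torch.cat([feats, code_map], dim=1))

G = CondGenerator()
x = torch.randn(2, 36, 100)                        # source acoustic features
tgt = F.one_hot(torch.tensor([1, 3]), 4).float()   # target speakers
src = F.one_hot(torch.tensor([0, 2]), 4).float()   # source speakers
fake = G(x, tgt)                                   # convert to target speakers
cyc = G(fake, src)                                 # convert back to the source
cycle_loss = (cyc - x).abs().mean()                # L1 cycle-consistency (assumed)
```

Because the speaker identity enters only through the code, the same weights realize every source-to-target mapping, which is what makes the setup many-to-many.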
Jensen: An Easily-Extensible C++ Toolkit for Production-Level Machine Learning and Convex Optimization
Title | Jensen: An Easily-Extensible C++ Toolkit for Production-Level Machine Learning and Convex Optimization |
Authors | Rishabh Iyer, John T. Halloran, Kai Wei |
Abstract | This paper introduces Jensen, an easily extensible and scalable toolkit for production-level machine learning and convex optimization. Jensen implements a framework of convex (or loss) functions, convex optimization algorithms (including Gradient Descent, L-BFGS, Stochastic Gradient Descent, Conjugate Gradient, etc.), and a family of machine learning classifiers and regressors (Logistic Regression, SVMs, Least Squares Regression, etc.). This framework makes it possible to deploy and train models with a few lines of code, and also to extend and build upon it by integrating new loss functions and optimization algorithms. |
Tasks | |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06574v1 |
PDF | http://arxiv.org/pdf/1807.06574v1.pdf |
PWC | https://paperswithcode.com/paper/jensen-an-easily-extensible-c-toolkit-for |
Repo | https://github.com/rishabhk108/jensen |
Framework | none |
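Jensen itself is a C++ library and its exact API is not reproduced here. As a language-agnostic illustration of the kind of component the abstract lists — a convex loss paired with a first-order optimizer — here is a short NumPy sketch of plain gradient descent on an L2-regularized logistic loss; all names and constants are illustrative.

```python
import numpy as np

def logistic_loss_grad(w, X, y, lam=1e-2):
    """L2-regularized logistic loss and gradient; labels y in {-1, +1}."""
    m = X @ w * y
    loss = np.mean(np.log1p(np.exp(-m))) + 0.5 * lam * w @ w
    grad = X.T @ (-y / (1.0 + np.exp(m))) / len(y) + lam * w
    return loss, grad

def gradient_descent(X, y, lr=0.5, iters=200):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        _, g = logistic_loss_grad(w, X, y)
        w -= lr * g
    return w

# toy usage: two roughly separable Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.r_[-np.ones(50), np.ones(50)]
w = gradient_descent(X, y)
print(f"train accuracy: {np.mean(np.sign(X @ w) == y):.2f}")
```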
Facial Landmark Detection for Manga Images
Title | Facial Landmark Detection for Manga Images |
Authors | Marco Stricker, Olivier Augereau, Koichi Kise, Motoi Iwata |
Abstract | The topic of facial landmark detection has been widely covered for pictures of human faces, but it is still a challenge for drawings. Indeed, the proportions and symmetry of standard human faces are not always used for comics or mangas. The personal style of the author, the limited color palette, and similar factors make landmark detection on faces in drawings a difficult task. Detecting landmarks on manga images will be useful for providing new services such as easily editing character faces, estimating character emotions, or automatically generating animations such as lip or eye movements. This paper contains two main contributions: 1) a new landmark annotation model for manga faces, and 2) a deep learning approach to detect these landmarks. We use the “Deep Alignment Network”, a multi-stage architecture in which the first stage makes an initial estimate that is refined in further stages. The first results show that the proposed method succeeds in accurately finding the landmarks in more than 80% of the cases. |
Tasks | Facial Landmark Detection |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03214v1 |
PDF | http://arxiv.org/pdf/1811.03214v1.pdf |
PWC | https://paperswithcode.com/paper/facial-landmark-detection-for-manga-images |
Repo | https://github.com/oaugereau/FacialLandmarkManga |
Framework | none |
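The abstract describes the Deep Alignment Network idea: a first stage produces an initial landmark estimate that later stages refine. Below is a toy two-stage sketch of that refine-the-previous-estimate pattern; the backbone, the landmark count, and feeding the current estimate directly into each stage's head are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

N_LM = 30  # illustrative landmark count; the paper's annotation model differs

class Stage(nn.Module):
    """One refinement stage: looks at the image and the current landmark
    estimate and predicts a correction (DAN-style, simplified)."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32 + 2 * N_LM, 2 * N_LM)

    def forward(self, img, current):
        feats = self.cnn(img)
        return current + self.head(torch.cat([feats, current], dim=1))

stages = nn.ModuleList([Stage(), Stage()])
img = torch.randn(4, 1, 128, 128)      # grayscale manga faces
est = torch.zeros(4, 2 * N_LM)         # initial guess (e.g. a mean shape)
for stage in stages:                   # each stage refines the previous estimate
    est = stage(img, est)
print(est.shape)  # torch.Size([4, 60])
```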
Rotation Equivariance and Invariance in Convolutional Neural Networks
Title | Rotation Equivariance and Invariance in Convolutional Neural Networks |
Authors | Benjamin Chidester, Minh N. Do, Jian Ma |
Abstract | Performance of neural networks can be significantly improved by encoding known invariance for particular tasks. Many image classification tasks, such as those related to cellular imaging, exhibit invariance to rotation. We present a novel scheme using the magnitude response of the 2D-discrete-Fourier transform (2D-DFT) to encode rotational invariance in neural networks, along with a new, efficient convolutional scheme for encoding rotational equivariance throughout convolutional layers. We implemented this scheme for several image classification tasks and demonstrated improved performance, in terms of classification accuracy, time required to train the model, and robustness to hyperparameter selection, over a standard CNN and another state-of-the-art method. |
Tasks | Image Classification |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12301v1 |
PDF | http://arxiv.org/pdf/1805.12301v1.pdf |
PWC | https://paperswithcode.com/paper/rotation-equivariance-and-invariance-in |
Repo | https://github.com/bchidest/RiCNN |
Framework | tf |
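One way to read the abstract's 2D-DFT-magnitude idea: if the earlier layers are rotation-equivariant, rotating the input shows up as a cyclic shift of the feature representation, and the DFT magnitude is invariant to cyclic shifts. The toy check below demonstrates only that shift-invariance property; it is not the paper's transition layer, and the feature shapes are made up.

```python
import torch

# toy feature map; for a rotation-equivariant representation, rotating the
# input appears (approximately) as a cyclic shift of this map
feats = torch.randn(2, 8, 16, 16)
shifted = torch.roll(feats, shifts=(3, 5), dims=(-2, -1))

def dft_magnitude(x):
    # |2D-DFT| is invariant to cyclic shifts, so it removes that nuisance
    return torch.fft.fft2(x).abs()

diff = (dft_magnitude(feats) - dft_magnitude(shifted)).abs().max()
print(diff)  # ~1e-5, i.e. equal up to float error
```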
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
Title | Revisiting RCNN: On Awakening the Classification Power of Faster RCNN |
Authors | Bowen Cheng, Yunchao Wei, Honghui Shi, Rogerio Feris, Jinjun Xiong, Thomas Huang |
Abstract | Recent region-based object detectors are usually built with separate classification and localization branches on top of shared feature extraction networks. In this paper, we analyze failure cases of state-of-the-art detectors and observe that most hard false positives result from classification instead of localization. We conjecture that: (1) shared feature representation is not optimal due to the mismatched goals of feature learning for classification and localization; (2) multi-task learning helps, yet optimization of the multi-task loss may result in sub-optimal solutions for individual tasks; (3) a large receptive field for different scales leads to redundant context information for small objects. We demonstrate the potential of detector classification power with a simple, effective, and widely applicable Decoupled Classification Refinement (DCR) network. DCR samples hard false positives from the base classifier in Faster RCNN and trains an RCNN-style strong classifier. Experiments show new state-of-the-art results on PASCAL VOC and COCO without any bells and whistles. |
Tasks | Multi-Task Learning |
Published | 2018-03-19 |
URL | http://arxiv.org/abs/1803.06799v3 |
PDF | http://arxiv.org/pdf/1803.06799v3.pdf |
PWC | https://paperswithcode.com/paper/revisiting-rcnn-on-awakening-the |
Repo | https://github.com/bowenc0221/Decoupled-Classification-Refinement |
Framework | tf |
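The DCR recipe in the abstract starts by collecting detections that the base detector scores confidently but that overlap no ground-truth box (hard false positives), which then train a separate classifier. A schematic NumPy sketch of that mining step follows; the score and IoU thresholds and the array layout are assumptions.

```python
import numpy as np

def iou(box, gt_boxes):
    """IoU of one box (x1, y1, x2, y2) against an array of GT boxes."""
    x1 = np.maximum(box[0], gt_boxes[:, 0]); y1 = np.maximum(box[1], gt_boxes[:, 1])
    x2 = np.minimum(box[2], gt_boxes[:, 2]); y2 = np.minimum(box[3], gt_boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    gt_area = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area + gt_area - inter)

def mine_hard_false_positives(dets, scores, gt_boxes, score_thr=0.5, iou_thr=0.3):
    """Detections the base detector is confident about but that miss every GT box."""
    keep = [det for det, s in zip(dets, scores)
            if s >= score_thr and iou(det, gt_boxes).max() < iou_thr]
    return np.array(keep)

# usage: one confident detection far from the single GT box gets mined
dets = np.array([[0, 0, 10, 10], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8])
gt = np.array([[0, 0, 9, 9]], float)
print(mine_hard_false_positives(dets, scores, gt))  # [[50. 50. 60. 60.]]
```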
Maintaining Natural Image Statistics with the Contextual Loss
Title | Maintaining Natural Image Statistics with the Contextual Loss |
Authors | Roey Mechrez, Itamar Talmi, Firas Shama, Lihi Zelnik-Manor |
Abstract | Maintaining natural image statistics is a crucial factor in the restoration and generation of realistic-looking images. When training CNNs, photorealism is usually attempted through adversarial training (GAN), which pushes the output images to lie on the manifold of natural images. GANs are very powerful, but not perfect. They are hard to train and the results still often suffer from artifacts. In this paper we propose a complementary approach, which can be applied with or without a GAN, whose goal is to train a feed-forward CNN to maintain natural internal statistics. We look explicitly at the distribution of features in an image and train the network to generate images with natural feature distributions. Our approach reduces by orders of magnitude the number of images required for training and achieves state-of-the-art results on both single-image super-resolution and high-resolution surface normal estimation. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.04626v3 |
PDF | http://arxiv.org/pdf/1803.04626v3.pdf |
PWC | https://paperswithcode.com/paper/maintaining-natural-image-statistics-with-the |
Repo | https://github.com/idearibosome/tf-perceptual-eusr |
Framework | tf |
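The contextual loss named in the title compares the distributions of deep features in the generated and target images by matching each feature to its most contextually similar counterpart. The sketch below follows the published contextual-loss recipe as it is commonly implemented (cosine distances normalized per row, turned into similarities with a bandwidth h, then the best matches averaged); the bandwidth value, the feature source, and the exact normalization details are assumptions.

```python
import torch
import torch.nn.functional as F

def contextual_loss(gen_feats, tgt_feats, h=0.5, eps=1e-5):
    """Contextual-style loss between two sets of feature vectors,
    gen_feats: (N, C), tgt_feats: (M, C); h is an assumed bandwidth."""
    # center by the target mean, then compare with cosine similarity
    mu = tgt_feats.mean(dim=0, keepdim=True)
    g = F.normalize(gen_feats - mu, dim=1)
    t = F.normalize(tgt_feats - mu, dim=1)
    d = 1.0 - g @ t.t()                                 # cosine distances, (N, M)
    d_norm = d / (d.min(dim=1, keepdim=True).values + eps)
    w = torch.exp((1.0 - d_norm) / h)                   # distances -> similarities
    cx = w / w.sum(dim=1, keepdim=True)                 # row-normalized affinities
    score = cx.max(dim=0).values.mean()                 # keep each target's best match
    return -torch.log(score + eps)

loss = contextual_loss(torch.randn(64, 128), torch.randn(64, 128))
print(loss)
```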
Combating Adversarial Attacks Using Sparse Representations
Title | Combating Adversarial Attacks Using Sparse Representations |
Authors | Soorya Gopalakrishnan, Zhinus Marzi, Upamanyu Madhow, Ramtin Pedarsani |
Abstract | It is by now well-known that small adversarial perturbations can induce classification errors in deep neural networks (DNNs). In this paper, we make the case that sparse representations of the input data are a crucial tool for combating such attacks. For linear classifiers, we show that a sparsifying front end is provably effective against $\ell_{\infty}$-bounded attacks, reducing output distortion due to the attack by a factor of roughly $K / N$ where $N$ is the data dimension and $K$ is the sparsity level. We then extend this concept to DNNs, showing that a “locally linear” model can be used to develop a theoretical foundation for crafting attacks and defenses. Experimental results for the MNIST dataset show the efficacy of the proposed sparsifying front end. |
Tasks | |
Published | 2018-03-11 |
URL | http://arxiv.org/abs/1803.03880v3 |
PDF | http://arxiv.org/pdf/1803.03880v3.pdf |
PWC | https://paperswithcode.com/paper/combating-adversarial-attacks-using-sparse |
Repo | https://github.com/ZhinusMarzi/Adversarial-attack |
Framework | tf |
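The defense described in the abstract is a sparsifying front end: project the input onto a basis, keep only the K largest-magnitude coefficients, and reconstruct before classification, which attenuates an ℓ∞ perturbation roughly in proportion to K/N for a linear classifier. The NumPy sketch below shows the projection step; the basis here is a random orthonormal one purely for illustration, and K, the signal model, and the attack size are all made up.

```python
import numpy as np

def sparsifying_front_end(x, basis, k):
    """Project x onto an orthonormal basis, keep the k largest-magnitude
    coefficients, and reconstruct (the 'sparsify then classify' front end)."""
    coeffs = basis.T @ x
    coeffs[np.argsort(np.abs(coeffs))[:-k]] = 0.0   # zero all but the top-k
    return basis @ coeffs

rng = np.random.default_rng(0)
n, k = 784, 40                                      # e.g. MNIST dimension, sparsity level
basis, _ = np.linalg.qr(rng.normal(size=(n, n)))    # random orthonormal basis (illustrative)
x = basis[:, :k] @ rng.normal(size=k)               # a signal that is k-sparse in this basis
attack = 0.1 * rng.choice([-1.0, 1.0], size=n)      # an l_inf-bounded perturbation
out = sparsifying_front_end(x + attack, basis, k)
# most of the attack energy is removed by the front end
print(np.linalg.norm(out - x), "<", np.linalg.norm(attack))
```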
Leveraging Virtual and Real Person for Unsupervised Person Re-identification
Title | Leveraging Virtual and Real Person for Unsupervised Person Re-identification |
Authors | Fengxiang Yang, Zhun Zhong, Zhiming Luo, Sheng Lian, Shaozi Li |
Abstract | Person re-identification (re-ID) is a challenging problem, especially when no labels are available for training. Although recent deep re-ID methods have achieved great improvement, it is still difficult to optimize a deep re-ID model without annotations in the training data. To address this problem, this study introduces a novel approach for unsupervised person re-ID by leveraging virtual and real data. Our approach includes two components: virtual person generation and training of the deep re-ID model. For virtual person generation, we learn a person generation model and a camera style transfer model using unlabeled real data to generate virtual persons with different poses and camera styles. The virtual data serve as labeled training data, enabling subsequent supervised training of the deep re-ID model. We divide the training of the deep re-ID model into three steps: 1) pre-training a coarse re-ID model using virtual data; 2) collaborative-filtering-based positive pair mining from the real data; and 3) fine-tuning of the coarse re-ID model by leveraging the mined positive pairs and virtual data. The final re-ID model is obtained by iterating between steps 2 and 3 until convergence. Experimental results on two large-scale datasets, Market-1501 and DukeMTMC-reID, demonstrate the effectiveness of our approach and show that it achieves the state of the art in unsupervised person re-ID. |
Tasks | Person Re-Identification, Style Transfer, Unsupervised Person Re-Identification |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.02074v1 |
PDF | http://arxiv.org/pdf/1811.02074v1.pdf |
PWC | https://paperswithcode.com/paper/leveraging-virtual-and-real-person-for |
Repo | https://github.com/FlyingRoastDuck/PGPPM |
Framework | pytorch |
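Step 2 of the pipeline, mining pseudo-positive pairs from unlabeled real data, is the most algorithmic part of the abstract. The toy NumPy sketch below uses a mutual-nearest-neighbour rule as a stand-in for the collaborative-filtering mining the authors describe; the embeddings and the rule itself are purely illustrative.

```python
import numpy as np

def mine_positive_pairs(feats):
    """Toy stand-in for positive-pair mining: treat mutual nearest neighbours
    in embedding space as pseudo-positive pairs (the paper's collaborative
    filtering scheme is more involved)."""
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)
    nn_idx = sim.argmax(axis=1)                         # each sample's nearest neighbour
    return [(i, j) for i, j in enumerate(nn_idx) if nn_idx[j] == i and i < j]

# usage: embeddings produced by the coarse (virtual-data pre-trained) model
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 32))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
print(mine_positive_pairs(feats))   # mined pseudo-positive index pairs
```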
Answerer in Questioner’s Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog
Title | Answerer in Questioner’s Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog |
Authors | Sang-Woo Lee, Yu-Jung Heo, Byoung-Tak Zhang |
Abstract | Goal-oriented dialog has been given attention due to its numerous applications in artificial intelligence. Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct action to take. Deep learning and reinforcement learning have recently been applied to the problem of asking adequate questions. However, these approaches struggle to find a competent recurrent neural questioner, owing to the complexity of learning a series of sentences. Motivated by theory of mind, we propose “Answerer in Questioner’s Mind” (AQM), a novel information theoretic algorithm for goal-oriented dialog. With AQM, a questioner asks and infers based on an approximated probabilistic model of the answerer. The questioner figures out the answerer’s intention via selecting a plausible question by explicitly calculating the information gain of the candidate intentions and possible answers to each question. We test our framework on two goal-oriented visual dialog tasks: “MNIST Counting Dialog” and “GuessWhat?!”. In our experiments, AQM outperforms comparative algorithms by a large margin. |
Tasks | Goal-Oriented Dialog, Visual Dialog |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.03881v3 |
PDF | http://arxiv.org/pdf/1802.03881v3.pdf |
PWC | https://paperswithcode.com/paper/answerer-in-questioners-mind-information |
Repo | https://github.com/naver/aqm-plus |
Framework | pytorch |
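The AQM questioner picks the question with the highest information gain about the answerer's intention, computed from an approximate model of the answerer. A small NumPy sketch of that mutual-information computation over discrete intentions, questions, and answers is below; the toy distributions and the sizes of the intention/question/answer spaces are made up for illustration.

```python
import numpy as np

def information_gain(prior, answer_model):
    """I(intention; answer | question) for each candidate question.
    prior: (C,) belief over intentions; answer_model: (Q, C, A) with p(a | c, q)."""
    p_a_given_q = np.einsum('c,qca->qa', prior, answer_model)   # marginal answer dist per question
    ratio = answer_model / p_a_given_q[:, None, :]              # p(a|c,q) / p(a|q)
    return np.einsum('c,qca->q', prior, answer_model * np.log(ratio))

# toy setup: 3 candidate intentions, 2 candidate questions, binary answers
prior = np.array([0.5, 0.3, 0.2])
answer_model = np.array([
    [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],   # question 0 separates intentions well
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],   # question 1 is uninformative
])
gains = information_gain(prior, answer_model)
print(gains, "-> ask question", gains.argmax())   # question 0 has the higher gain
```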
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7
Title | Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7 |
Authors | Huda Alamri, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Jue Wang, Irfan Essa, Dhruv Batra, Devi Parikh, Anoop Cherian, Tim K. Marks, Chiori Hori |
Abstract | Scene-aware dialog systems will be able to have conversations with users about the objects and events around them. Progress on such systems can be made by integrating state-of-the-art technologies from multiple research areas, including end-to-end dialog systems, visual dialog, and video description. We introduce the Audio Visual Scene-Aware Dialog (AVSD) challenge and dataset. In this challenge, which is one track of the 7th Dialog System Technology Challenges (DSTC7) workshop, the task is to build a system that generates responses in a dialog about an input video. |
Tasks | Video Description, Visual Dialog |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00525v1 |
PDF | http://arxiv.org/pdf/1806.00525v1.pdf |
PWC | https://paperswithcode.com/paper/audio-visual-scene-aware-dialog-avsd |
Repo | https://github.com/hudaAlamri/DSTC7-Audio-Visual-Scene-Aware-Dialog-AVSD-Challenge |
Framework | pytorch |
Artificial Color Constancy via GoogLeNet with Angular Loss Function
Title | Artificial Color Constancy via GoogLeNet with Angular Loss Function |
Authors | Oleksii Sidorov |
Abstract | Color constancy is the ability of the human visual system to perceive colors as unchanged independently of the illumination. Giving a machine this capability will be beneficial in many fields where chromatic information is used. In particular, it significantly improves scene understanding and object recognition. In this paper, we propose a transfer learning-based algorithm with two main features: accuracy higher than that of many state-of-the-art algorithms and simplicity of implementation. Although GoogLeNet was used in the experiments, the approach may be applied to any CNN. Additionally, we discuss the design of a new loss function oriented specifically to this problem and propose a few of the most suitable options. |
Tasks | Color Constancy, Object Recognition, Scene Understanding, Transfer Learning |
Published | 2018-11-20 |
URL | https://arxiv.org/abs/1811.08456v2 |
PDF | https://arxiv.org/pdf/1811.08456v2.pdf |
PWC | https://paperswithcode.com/paper/artificial-color-constancy-via-googlenet-with |
Repo | https://github.com/acecreamu/color-constancy-googlenet |
Framework | none |
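The "angular loss" in the title presumably refers to the standard angular error between the predicted and ground-truth illuminant vectors, the usual evaluation metric in color constancy. A minimal PyTorch sketch is below; treating the network output as a 3-vector RGB illuminant estimate is an assumption.

```python
import torch
import torch.nn.functional as F

def angular_loss(pred, target, eps=1e-7):
    """Mean angle (in radians) between predicted and ground-truth illuminants.
    pred, target: (B, 3) RGB illuminant vectors."""
    cos = F.cosine_similarity(pred, target, dim=1)
    return torch.acos(cos.clamp(-1 + eps, 1 - eps)).mean()

pred = torch.tensor([[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]])
gt = torch.tensor([[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]])
print(angular_loss(pred, gt))  # ~ (0 + pi/2) / 2
```

Unlike a plain L2 loss on the illuminant, this objective is invariant to the overall brightness of the estimate, which matters because only the chromaticity of the light is recoverable.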
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Title | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis |
Authors | Ye Jia, Yu Zhang, Ron J. Weiss, Quan Wang, Jonathan Shen, Fei Ren, Zhifeng Chen, Patrick Nguyen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu |
Abstract | Clone a voice in 5 seconds to generate arbitrary speech in real-time |
Tasks | Speaker Verification, Speech Synthesis, Text-To-Speech Synthesis, Transfer Learning |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04558v4 |
PDF | http://arxiv.org/pdf/1806.04558v4.pdf |
PWC | https://paperswithcode.com/paper/transfer-learning-from-speaker-verification |
Repo | https://github.com/CorentinJ/Real-Time-Voice-Cloning |
Framework | tf |
Constrained Exploration and Recovery from Experience Shaping
Title | Constrained Exploration and Recovery from Experience Shaping |
Authors | Tu-Hoa Pham, Giovanni De Magistris, Don Joven Agravante, Subhajit Chaudhury, Asim Munawar, Ryuki Tachibana |
Abstract | We consider the problem of reinforcement learning under safety requirements, in which an agent is trained to complete a given task, typically formalized as the maximization of a reward signal over time, while concurrently avoiding undesirable actions or states associated with lower rewards or penalties. The construction and balancing of different reward components can be difficult in the presence of multiple objectives, yet is crucial for producing a satisfying policy. For example, in reaching a target while avoiding obstacles, low collision penalties can lead to reckless movements while high penalties can discourage exploration. To circumvent this limitation, we examine the effect of past actions in terms of safety to estimate which are acceptable or should be avoided in the future. We then actively reshape the action space of the agent during reinforcement learning, so that reward-driven exploration is constrained within safety limits. We propose an algorithm enabling the learning of such safety constraints in parallel with reinforcement learning and demonstrate its effectiveness in terms of both task completion and training time. |
Tasks | |
Published | 2018-09-21 |
URL | http://arxiv.org/abs/1809.08925v1 |
PDF | http://arxiv.org/pdf/1809.08925v1.pdf |
PWC | https://paperswithcode.com/paper/constrained-exploration-and-recovery-from |
Repo | https://github.com/IBM/constrained-rl |
Framework | tf |
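The mechanism described in the abstract is to learn from past experience which actions were unsafe in which states, and then constrain exploration to actions predicted safe. The NumPy sketch below shows that filter-the-action-proposal idea, with a nearest-neighbour memory of past violations standing in for the learned constraint model; the 2-D action space, rejection radius, and fallback action are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
unsafe_memory = []   # (state, action) pairs that previously led to a safety violation

def is_predicted_unsafe(state, action, radius=0.3):
    """Stand-in for a learned constraint model: flag actions close to a
    remembered unsafe (state, action) pair."""
    return any(np.linalg.norm(np.r_[state, action] - np.r_[s, a]) < radius
               for s, a in unsafe_memory)

def constrained_exploration(state, n_proposals=20):
    """Sample exploratory actions but keep only those predicted safe."""
    for _ in range(n_proposals):
        action = rng.uniform(-1.0, 1.0, size=2)
        if not is_predicted_unsafe(state, action):
            return action
    return np.zeros(2)   # fall back to a known-safe (null) action

# record one violation, then exploration avoids that region of action space
state = np.array([0.1, -0.2])
unsafe_memory.append((state, np.array([0.8, 0.8])))
print(constrained_exploration(state))
```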
Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories
Title | Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories |
Authors | Daniil Sorokin, Iryna Gurevych |
Abstract | The first stage of every knowledge base question answering approach is to link entities in the input question. We investigate entity linking in the context of a question answering task and present a jointly optimized neural architecture for entity mention detection and entity disambiguation that models the surrounding context on different levels of granularity. We use the Wikidata knowledge base and available question answering datasets to create benchmarks for entity linking on question answering data. Our approach outperforms the previous state-of-the-art system on this data, resulting in an average 8% improvement of the final score. We further demonstrate that our model delivers a strong performance across different entity categories. |
Tasks | Entity Disambiguation, Entity Linking, Knowledge Base Question Answering, Question Answering |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08460v1 |
PDF | http://arxiv.org/pdf/1804.08460v1.pdf |
PWC | https://paperswithcode.com/paper/mixing-context-granularities-for-improved |
Repo | https://github.com/UKPLab/starsem2018-entity-linking |
Framework | pytorch |
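The architecture in the abstract scores candidate Wikidata entities against context encoded at several granularities (the mention itself, its local token window, the full question). Below is a compact PyTorch sketch of that score-by-mixing-granularities idea; the encoders, dimensions, the use of pre-pooled context vectors, and the dot-product scoring are assumptions, not the authors' exact model.

```python
import torch
import torch.nn as nn

class GranularityMixer(nn.Module):
    """Encode the mention, its local token window, and the whole question
    separately, mix them, and score candidate entity embeddings."""
    def __init__(self, emb_dim=64):
        super().__init__()
        self.mention_enc = nn.Linear(emb_dim, emb_dim)
        self.window_enc = nn.Linear(emb_dim, emb_dim)
        self.question_enc = nn.Linear(emb_dim, emb_dim)
        self.mix = nn.Linear(3 * emb_dim, emb_dim)

    def forward(self, mention, window, question, candidates):
        # each context vector is assumed to be a pooled token embedding
        ctx = torch.cat([self.mention_enc(mention),
                         self.window_enc(window),
                         self.question_enc(question)], dim=-1)
        query = torch.tanh(self.mix(ctx))                      # (B, D)
        return torch.einsum('bd,bkd->bk', query, candidates)   # scores over K candidates

model = GranularityMixer()
scores = model(torch.randn(2, 64), torch.randn(2, 64), torch.randn(2, 64),
               torch.randn(2, 5, 64))   # 5 candidate entities per mention
print(scores.shape)  # torch.Size([2, 5])
```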