October 20, 2019

Paper Group ANR 41

Unbounded Output Networks for Classification


Title	Unbounded Output Networks for Classification
Authors	Stefan Elfwing, Eiji Uchibe, Kenji Doya
Abstract	We proposed the expected energy-based restricted Boltzmann machine (EE-RBM) as a discriminative RBM method for classification. Two characteristics of the EE-RBM are that the output is unbounded and that the target value of correct classification is set to a value much greater than one. In this study, by adopting features of the EE-RBM approach to feed-forward neural networks, we propose the UnBounded output network (UBnet) which is characterized by three features: (1) unbounded output units; (2) the target value of correct classification is set to a value much greater than one; and (3) the models are trained by a modified mean-squared error objective. We evaluate our approach using the MNIST, CIFAR-10, and CIFAR-100 benchmark datasets. We first demonstrate, for shallow UBnets on MNIST, that a setting of the target value equal to the number of hidden units significantly outperforms a setting of the target value equal to one, and it also outperforms standard neural networks by about 25%. We then validate our approach by achieving high-level classification performance on the three datasets using unbounded output residual networks. We finally use MNIST to analyze the learned features and weights, and we demonstrate that UBnets are much more robust against adversarial examples than the standard approach of using a softmax output layer and training the networks by a cross-entropy objective.
Tasks
Published	2018-07-25
URL	http://arxiv.org/abs/1807.09443v1
PDF	http://arxiv.org/pdf/1807.09443v1.pdf
PWC	https://paperswithcode.com/paper/unbounded-output-networks-for-classification
Repo
Framework
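
The target-value idea in the abstract above can be illustrated with a minimal sketch. The abstract does not spell out the modification to the mean-squared error, so plain MSE stands in for it here. The target of 0 for incorrect classes and the function names are assumptions made for illustration, not the authors' code.

```python
def ubnet_style_targets(label, num_classes, target_value):
    """Regression targets: a large value for the correct class, 0 elsewhere.

    The 0 target for incorrect classes is an assumption of this sketch.
    """
    return [target_value if k == label else 0.0 for k in range(num_classes)]


def mse(outputs, targets):
    """Plain mean-squared error (the paper's modified objective is not shown)."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def predict(outputs):
    """Outputs are unbounded, so the prediction is simply the largest one."""
    return max(range(len(outputs)), key=lambda k: outputs[k])


# Example: 3 classes, target value set to a hypothetical 100 hidden units.
targets = ubnet_style_targets(label=1, num_classes=3, target_value=100.0)
outputs = [3.0, 80.0, -5.0]
loss = mse(outputs, targets)
predicted_class = predict(outputs)
```

Because the outputs are not squashed by a softmax or sigmoid, they can reach a target far above one, which is the property the abstract emphasizes.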

Fast Global Convergence via Landscape of Empirical Loss


Title	Fast Global Convergence via Landscape of Empirical Loss
Authors	Chao Qu, Yan Li, Huan Xu
Abstract	While optimizing convex objective (loss) functions has been a powerhouse for machine learning for at least two decades, non-convex loss functions have recently attracted fast-growing interest due to many desirable properties, such as superior robustness and classification accuracy, compared with their convex counterparts. The main obstacle for non-convex estimators is that it is in general intractable to find the optimal solution. In this paper, we study the computational issues for some non-convex M-estimators. In particular, we show that stochastic variance reduction methods converge to the global optimum at a linear rate by exploiting the statistical properties of the population loss. En route, we improve the convergence analysis for the batch gradient method in \cite{mei2016landscape}.
Tasks
Published	2018-02-13
URL	http://arxiv.org/abs/1802.04617v1
PDF	http://arxiv.org/pdf/1802.04617v1.pdf
PWC	https://paperswithcode.com/paper/fast-global-convergence-via-landscape-of
Repo
Framework
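
For readers unfamiliar with stochastic variance reduction, the following is a minimal SVRG sketch. It runs on a toy one-dimensional least-squares problem, which is convex, so it shows only the update rule and not the non-convex M-estimators the paper analyzes. The data, step size, and loop counts are illustrative assumptions.

```python
import random


def svrg(grad_i, n, w0, step, epochs, inner_steps, seed=0):
    """Basic SVRG: each epoch computes a full gradient at a snapshot, then
    takes stochastic steps corrected by that snapshot gradient."""
    rng = random.Random(seed)
    w_snapshot = w0
    for _ in range(epochs):
        # Full gradient at the snapshot point.
        full_grad = sum(grad_i(w_snapshot, i) for i in range(n)) / n
        w = w_snapshot
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced gradient estimate.
            g = grad_i(w, i) - grad_i(w_snapshot, i) + full_grad
            w -= step * g
        w_snapshot = w  # use the last inner iterate as the next snapshot
    return w_snapshot


# Toy data generated by y = 2 * x, with per-sample loss 0.5 * (w * x - y) ** 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]


def grad_i(w, i):
    return (w * xs[i] - ys[i]) * xs[i]


w_hat = svrg(grad_i, n=len(xs), w0=0.0, step=0.02, epochs=30, inner_steps=50)
```

The correction term keeps the stochastic gradient unbiased while its variance shrinks as the iterate approaches the snapshot, which is what allows a constant step size.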

Deep Global-Relative Networks for End-to-End 6-DoF Visual Localization and Odometry


Title	Deep Global-Relative Networks for End-to-End 6-DoF Visual Localization and Odometry
Authors	Yimin Lin, Zhaoxiang Liu, Jianfeng Huang, Chaopeng Wang, Guoguang Du, Jinqiang Bai, Shiguo Lian, Bill Huang
Abstract	Although a wide variety of deep neural networks for robust Visual Odometry (VO) can be found in the literature, they are still unable to solve the drift problem in long-term robot navigation. Thus, this paper proposes novel deep end-to-end networks for the long-term 6-DoF VO task. The approach fuses relative and global networks based on Recurrent Convolutional Neural Networks (RCNNs) to improve monocular localization accuracy. The relative sub-networks are implemented to smooth the VO trajectory, while the global sub-networks are designed to avoid the drift problem. All the parameters are jointly optimized using Cross Transformation Constraints (CTC), which represent the temporal geometric consistency of consecutive frames, and the Mean Square Error (MSE) between the predicted pose and the ground truth. The experimental results on both indoor and outdoor datasets show that our method outperforms other state-of-the-art learning-based VO methods in terms of pose accuracy.
Tasks	Autonomous Navigation, Pose Estimation, Robot Navigation, Visual Localization, Visual Odometry
Published	2018-12-19
URL	https://arxiv.org/abs/1812.07869v2
PDF	https://arxiv.org/pdf/1812.07869v2.pdf
PWC	https://paperswithcode.com/paper/deep-global-relative-networks-for-end-to-end
Repo
Framework

Pre and Post-hoc Diagnosis and Interpretation of Malignancy from Breast DCE-MRI


Title	Pre and Post-hoc Diagnosis and Interpretation of Malignancy from Breast DCE-MRI
Authors	Gabriel Maicas, Andrew P. Bradley, Jacinto C. Nascimento, Ian Reid, Gustavo Carneiro
Abstract	We propose a new method for breast cancer screening from DCE-MRI based on a post-hoc approach that is trained using weakly annotated data (i.e., labels are available only at the image level without any lesion delineation). Our proposed post-hoc method automatically diagnoses the whole volume and, for positive cases, localizes the malignant lesions that led to such a diagnosis. Conversely, traditional approaches follow a pre-hoc approach that initially localises suspicious areas that are subsequently classified to establish the breast malignancy – this approach is trained using strongly annotated data (i.e., it needs a delineation and classification of all lesions in an image). Another goal of this paper is to establish the advantages and disadvantages of both approaches when applied to breast screening from DCE-MRI. Relying on experiments on a breast DCE-MRI dataset that contains scans of 117 patients, our results show that the post-hoc method is more accurate for diagnosing the whole volume per patient, achieving an AUC of 0.91, while the pre-hoc method achieves an AUC of 0.81. However, the performance for localising the malignant lesions remains challenging for the post-hoc method due to the weakly labelled dataset employed during training.
Tasks
Published	2018-09-25
URL	http://arxiv.org/abs/1809.09404v2
PDF	http://arxiv.org/pdf/1809.09404v2.pdf
PWC	https://paperswithcode.com/paper/pre-and-post-hoc-diagnosis-and-interpretation
Repo
Framework

PronouncUR: An Urdu Pronunciation Lexicon Generator


Title	PronouncUR: An Urdu Pronunciation Lexicon Generator
Authors	Haris Bin Zia, Agha Ali Raza, Awais Athar
Abstract	State-of-the-art speech recognition systems rely heavily on three basic components: an acoustic model, a pronunciation lexicon and a language model. To build these components, a researcher needs linguistic as well as technical expertise, which is a barrier in low-resource domains. Techniques to construct these three components without having expert domain knowledge are in great demand. Urdu, despite having millions of speakers all over the world, is a low-resource language in terms of standard publicly available linguistic resources. In this paper, we present a grapheme-to-phoneme conversion tool for Urdu that generates a pronunciation lexicon in a form suitable for use with speech recognition systems from a list of Urdu words. The tool predicts the pronunciation of words using an LSTM-based model trained on a handcrafted expert lexicon of around 39,000 words and shows an accuracy of 64% upon internal evaluation. For external evaluation on a speech recognition task, we obtain a word error rate comparable to one achieved using a fully handcrafted expert lexicon.
Tasks	Language Modelling, Speech Recognition
Published	2018-01-01
URL	http://arxiv.org/abs/1801.00409v2
PDF	http://arxiv.org/pdf/1801.00409v2.pdf
PWC	https://paperswithcode.com/paper/pronouncur-an-urdu-pronunciation-lexicon
Repo
Framework

Safe Exploration in Markov Decision Processes with Time-Variant Safety using Spatio-Temporal Gaussian Process


Title	Safe Exploration in Markov Decision Processes with Time-Variant Safety using Spatio-Temporal Gaussian Process
Authors	Akifumi Wachi, Hiroshi Kajino, Asim Munawar
Abstract	In many real-world applications (e.g., planetary exploration, robot navigation), an autonomous agent must be able to explore a space with guaranteed safety. Most safe exploration algorithms in the field of reinforcement learning and robotics have been based on the assumption that the safety features are a priori known and time-invariant. This paper presents a learning algorithm called ST-SafeMDP for exploring Markov decision processes (MDPs) that is based on the assumption that the safety features are a priori unknown and time-variant. In this setting, the agent explores MDPs while keeping the probability of entering unsafe states, as defined by a safety function, below a threshold. The unknown and time-variant safety values are modeled using a spatio-temporal Gaussian process. However, there remains an issue that an agent may have no viable action in a shrinking true safe space. To address this issue, we formulate a problem that maximizes the cumulative number of safe states in the worst-case scenario with respect to future observations. The effectiveness of this approach was demonstrated in two simulation settings, including one using real lunar terrain data.
Tasks	Robot Navigation, Safe Exploration
Published	2018-09-12
URL	http://arxiv.org/abs/1809.04232v1
PDF	http://arxiv.org/pdf/1809.04232v1.pdf
PWC	https://paperswithcode.com/paper/safe-exploration-in-markov-decision-processes
Repo
Framework

Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems


Title	Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems
Authors	Miguel de Prado, Nuria Pazos, Luca Benini
Abstract	Deep Learning is increasingly being adopted by industry for computer vision applications running on embedded devices. While Convolutional Neural Networks’ accuracy has achieved a mature and remarkable state, inference latency and throughput are a major concern, especially when targeting low-cost and low-power embedded platforms. CNNs’ inference latency may become a bottleneck for Deep Learning adoption by industry, as it is a crucial specification for many real-time processes. Furthermore, deployment of CNNs across heterogeneous platforms presents major compatibility issues due to vendor-specific technology and acceleration libraries. In this work, we present QS-DNN, a fully automatic search based on Reinforcement Learning which, combined with an inference engine optimizer, efficiently explores the design space and empirically finds the optimal combinations of libraries and primitives to speed up the inference of CNNs on heterogeneous embedded devices. We show that an optimized combination can achieve a 45x speedup in inference latency on CPU compared to a dependency-free baseline and 2x on average on GPGPU compared to the best vendor library. Further, we demonstrate that the quality of results and time “to-solution” is much better than with Random Search, achieving up to 15x better results for a short-time search.
Tasks
Published	2018-11-18
URL	http://arxiv.org/abs/1811.07315v1
PDF	http://arxiv.org/pdf/1811.07315v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-infer-rl-based-search-for-dnn
Repo
Framework

Semantic Instance Meets Salient Object: Study on Video Semantic Salient Instance Segmentation


Title	Semantic Instance Meets Salient Object: Study on Video Semantic Salient Instance Segmentation
Authors	Trung-Nghia Le, Akihiro Sugimoto
Abstract	Focusing only on the semantic instances that are salient in a scene is more beneficial for robot navigation and self-driving cars than looking at all objects in the whole scene. This paper pushes the envelope on salient regions in a video to decompose them into semantically meaningful components, namely, semantic salient instances. We provide a baseline for the new task of video semantic salient instance segmentation (VSSIS), namely the Semantic Instance - Salient Object (SISO) framework. The SISO framework is simple yet efficient, leveraging the advantages of two different segmentation tasks, i.e., semantic instance segmentation and salient object segmentation, and eventually fusing them for the final result. In SISO, we introduce a sequential fusion that looks at overlapping pixels between semantic instances and salient regions to obtain non-overlapping instances one by one. We also introduce a recurrent instance propagation to refine the shapes and semantic meanings of instances, and an identity tracking to maintain both the identity and the semantic meaning of instances over the entire video. Experimental results demonstrate the effectiveness of our SISO baseline, which can handle occlusions in videos. In addition, to tackle the task of VSSIS, we augment the DAVIS-2017 benchmark dataset by assigning semantic ground-truth for salient instance labels, obtaining the SEmantic Salient Instance Video (SESIV) dataset. Our SESIV dataset consists of 84 high-quality video sequences with pixel-wise per-frame ground-truth labels.
Tasks	Instance Segmentation, Robot Navigation, Self-Driving Cars, Semantic Segmentation
Published	2018-07-04
URL	http://arxiv.org/abs/1807.01452v3
PDF	http://arxiv.org/pdf/1807.01452v3.pdf
PWC	https://paperswithcode.com/paper/semantic-instance-meets-salient-object-study
Repo
Framework

Monitoring Targeted Hate in Online Environments


Title	Monitoring Targeted Hate in Online Environments
Authors	Tim Isbister, Magnus Sahlgren, Lisa Kaati, Milan Obaidi, Nazar Akrami
Abstract	Hateful comments, swearwords and sometimes even death threats are becoming a reality for many people today in online environments. This is especially true for journalists, politicians, artists, and other public figures. This paper describes how hate directed towards individuals can be measured in online environments using a simple dictionary-based approach. We present a case study on Swedish politicians, and use examples from this study to discuss shortcomings of the proposed dictionary-based approach. We also outline possibilities for potential refinements of the proposed approach.
Tasks
Published	2018-03-13
URL	http://arxiv.org/abs/1803.04757v1
PDF	http://arxiv.org/pdf/1803.04757v1.pdf
PWC	https://paperswithcode.com/paper/monitoring-targeted-hate-in-online
Repo
Framework
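
A dictionary-based measurement of the kind described above could look roughly like the sketch below. The abstract does not give the dictionary, the tokenization, or the exact metric, so the lexicon, the target names, and the "share of mentioning comments that contain a dictionary term" measure are all hypothetical placeholders.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def targeted_hate_rate(comments, target_names, lexicon):
    """Share of comments mentioning a target that also contain a lexicon term.

    Returns 0.0 when no comment mentions the target.
    """
    mentioning = 0
    flagged = 0
    for comment in comments:
        tokens = set(tokenize(comment))
        if tokens & target_names:
            mentioning += 1
            if tokens & lexicon:
                flagged += 1
    return flagged / mentioning if mentioning else 0.0


# Hypothetical example data; not taken from the paper's case study.
comments = ["Smith is a traitor", "I met Smith today", "What a traitor"]
rate = targeted_hate_rate(comments, target_names={"smith"}, lexicon={"traitor"})
```

The sketch also shows one shortcoming of this kind of approach: a lexicon term and a name appearing in the same comment are counted even if the term is not aimed at that person.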

Directional Statistics-based Deep Metric Learning for Image Classification and Retrieval


Title	Directional Statistics-based Deep Metric Learning for Image Classification and Retrieval
Authors	Xuefei Zhe, Shifeng Chen, Hong Yan
Abstract	Deep distance metric learning (DDML), which is proposed to learn image similarity metrics in an end-to-end manner based on the convolutional neural network, has achieved encouraging results in many computer vision tasks. $L2$-normalization in the embedding space has been used to improve the performance of several DDML methods. However, the commonly used Euclidean distance is no longer an accurate metric for the $L2$-normalized embedding space, i.e., a hyper-sphere. Another challenge of current DDML methods is that their loss functions are usually based on rigid data formats, such as the triplet tuple. Thus, an extra process is needed to prepare data in specific formats. In addition, their losses are obtained from a limited number of samples, which leads to a lack of a global view of the embedding space. In this paper, we replace the Euclidean distance with the cosine similarity to better utilize the $L2$-normalization, which is able to attenuate the curse of dimensionality. More specifically, a novel loss function based on the von Mises-Fisher distribution is proposed to learn a compact hyper-spherical embedding space. Moreover, a new efficient learning algorithm is developed to better capture the global structure of the embedding space. Experiments for both classification and retrieval tasks on several standard datasets show that our method achieves state-of-the-art performance with a simpler training procedure. Furthermore, we demonstrate that, even with a small number of convolutional layers, our model can still obtain significantly better classification performance than the widely used softmax loss.
Tasks	Image Classification, Metric Learning
Published	2018-02-27
URL	http://arxiv.org/abs/1802.09662v2
PDF	http://arxiv.org/pdf/1802.09662v2.pdf
PWC	https://paperswithcode.com/paper/directional-statistics-based-deep-metric
Repo
Framework
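
The building blocks named in the abstract above can be sketched in a few lines: L2-normalization, cosine similarity, and the unnormalized log-density of a von Mises-Fisher distribution, which for unit vectors is the concentration times the cosine similarity to the mean direction. This is only an illustration of those standard definitions, not the paper's loss function; the function names are assumptions.

```python
import math


def l2_normalize(v):
    """Project a non-zero vector onto the unit hyper-sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Dot product of the L2-normalized vectors."""
    a_hat, b_hat = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_hat, b_hat))


def vmf_unnormalized_log_density(x, mean_direction, kappa):
    """log of exp(kappa * mu^T x), omitting the normalizing constant C(kappa)."""
    return kappa * cosine_similarity(x, mean_direction)


same_direction = cosine_similarity([3.0, 4.0], [6.0, 8.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
score = vmf_unnormalized_log_density([1.0, 0.0], [1.0, 0.0], kappa=5.0)
```

A larger `kappa` concentrates the distribution more tightly around the mean direction, which is how a vMF-based objective can encourage compact clusters on the sphere.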

HEVC Inter Coding Using Deep Recurrent Neural Networks and Artificial Reference Pictures


Title	HEVC Inter Coding Using Deep Recurrent Neural Networks and Artificial Reference Pictures
Authors	Felix Haub, Thorsten Laude, Jörn Ostermann
Abstract	The efficiency of motion compensated prediction in modern video codecs highly depends on the available reference pictures. Occlusions and non-linear motion pose challenges for the motion compensation and often result in high bit rates for the prediction error. We propose the generation of artificial reference pictures using deep recurrent neural networks. Conceptually, a reference picture at the time instance of the currently coded picture is generated from previously reconstructed conventional reference pictures. Based on these artificial reference pictures, we propose a complete coding pipeline based on HEVC. By using the artificial reference pictures for motion compensated prediction, average BD-rate gains of 1.5% over HEVC are achieved.
Tasks	Motion Compensation
Published	2018-12-05
URL	http://arxiv.org/abs/1812.02137v1
PDF	http://arxiv.org/pdf/1812.02137v1.pdf
PWC	https://paperswithcode.com/paper/hevc-inter-coding-using-deep-recurrent-neural
Repo
Framework

IterGANs: Iterative GANs to Learn and Control 3D Object Transformation


Title	IterGANs: Iterative GANs to Learn and Control 3D Object Transformation
Authors	Ysbrand Galama, Thomas Mensink
Abstract	We are interested in learning visual representations which allow for 3D manipulations of visual objects based on a single 2D image. We cast this into an image-to-image transformation task, and propose Iterative Generative Adversarial Networks (IterGANs) which iteratively transform an input image into an output image. Our models learn a visual representation that can be used for objects seen in training, but also for never seen objects. Since object manipulation requires a full understanding of the geometry and appearance of the object, our IterGANs learn an implicit 3D model and a full appearance model of the object, which are both inferred from a single (test) image. Two advantages of IterGANs are that the intermediate generated images can be used for an additional supervision signal, even in an unsupervised fashion, and that the number of iterations can be used as a control signal to steer the transformation. Experiments on rotated objects and scenes show how IterGANs help with the generation process.
Tasks
Published	2018-04-16
URL	https://arxiv.org/abs/1804.05651v2
PDF	https://arxiv.org/pdf/1804.05651v2.pdf
PWC	https://paperswithcode.com/paper/itergans-iterative-gans-to-learn-and-control
Repo
Framework

MoSculp: Interactive Visualization of Shape and Time


Title	MoSculp: Interactive Visualization of Shape and Time
Authors	Xiuming Zhang, Tali Dekel, Tianfan Xue, Andrew Owens, Qiurui He, Jiajun Wu, Stefanie Mueller, William T. Freeman
Abstract	We present a system that allows users to visualize complex human motion via 3D motion sculptures, a representation that conveys the 3D structure swept by a human body as it moves through space. Given an input video, our system computes the motion sculpture and provides a user interface for rendering it in different styles, including the options to insert the sculpture back into the original video, render it in a synthetic scene or physically print it. To provide this end-to-end workflow, we introduce an algorithm that estimates the human’s 3D geometry over time from a set of 2D images and develop a 3D-aware image-based rendering approach that embeds the sculpture back into the scene. By automating the process, our system takes motion sculpture creation out of the realm of professional artists, and makes it applicable to a wide range of existing video material. By providing viewers with 3D information, motion sculptures reveal space-time motion information that is difficult to perceive with the naked eye, and allow viewers to interpret how different parts of the object interact over time. We validate the effectiveness of this approach with user studies, finding that our motion sculpture visualizations are significantly more informative about motion than existing stroboscopic and space-time visualization methods.
Tasks
Published	2018-09-14
URL	http://arxiv.org/abs/1809.05491v2
PDF	http://arxiv.org/pdf/1809.05491v2.pdf
PWC	https://paperswithcode.com/paper/mosculp-interactive-visualization-of-shape
Repo
Framework

Tied Multitask Learning for Neural Speech Translation


Title	Tied Multitask Learning for Neural Speech Translation
Authors	Antonios Anastasopoulos, David Chiang
Abstract	We explore multitask models for neural translation of speech, augmenting them in order to reflect two intuitive notions. First, we introduce a model where the second task decoder receives information from the decoder of the first task, since higher-level intermediate representations should provide useful information. Second, we apply regularization that encourages transitivity and invertibility. We show that the application of these notions on jointly trained models improves performance on the tasks of low-resource speech transcription and translation. It also leads to better performance when using attention information for word discovery over unsegmented input.
Tasks
Published	2018-02-19
URL	http://arxiv.org/abs/1802.06655v2
PDF	http://arxiv.org/pdf/1802.06655v2.pdf
PWC	https://paperswithcode.com/paper/tied-multitask-learning-for-neural-speech
Repo
Framework

Recurrent Multi-Graph Neural Networks for Travel Cost Prediction


Title	Recurrent Multi-Graph Neural Networks for Travel Cost Prediction
Authors	Jilin Hu, Chenjuan Guo, Bin Yang, Christian S. Jensen, Lu Chen
Abstract	Origin-destination (OD) matrices are often used in urban planning, where a city is partitioned into regions and an element (i, j) in an OD matrix records the cost (e.g., travel time, fuel consumption, or travel speed) from region i to region j. In this paper, we partition a day into multiple intervals, e.g., 96 15-min intervals, and each interval is associated with an OD matrix that represents the costs in the interval. We consider sparse and stochastic OD matrices, where the elements represent stochastic rather than deterministic costs and some elements are missing due to a lack of data between two regions. We solve the sparse, stochastic OD matrix forecasting problem: given a sequence of historical OD matrices that are sparse, we aim to predict future OD matrices with no empty elements. We propose a generic learning framework to solve the problem by dealing with sparse matrices via matrix factorization and two graph convolutional neural networks, and capturing temporal dynamics via a recurrent neural network. Empirical studies using two taxi datasets from different countries verify the effectiveness of the proposed framework.
Tasks
Published	2018-11-13
URL	http://arxiv.org/abs/1811.05157v1
PDF	http://arxiv.org/pdf/1811.05157v1.pdf
PWC	https://paperswithcode.com/paper/recurrent-multi-graph-neural-networks-for
Repo
Framework
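
The data layout described in the abstract above, one sparse OD matrix per 15-minute interval, can be sketched as follows. The trip tuple format and the function names are assumptions. As a simplification, each observed element holds a mean cost, whereas the paper models stochastic costs, and no forecasting model is shown.

```python
from collections import defaultdict

INTERVALS_PER_DAY = 96  # 24 hours * 4 quarter-hours


def interval_index(hour, minute):
    """Map a time of day to its 15-minute interval, 0 through 95."""
    return hour * 4 + minute // 15


def build_od_matrices(trips):
    """Aggregate trips into sparse OD matrices keyed by (interval, origin, dest).

    trips: iterable of (hour, minute, origin_region, dest_region, cost).
    Pairs with no trips in an interval are simply absent, which is the
    sparsity the forecasting problem has to deal with.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, minute, origin, dest, cost in trips:
        key = (interval_index(hour, minute), origin, dest)
        sums[key] += cost
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical trips: two from region 0 to 1 around 08:20, one from 1 to 0 at 23:59.
trips = [(8, 20, 0, 1, 10.0), (8, 25, 0, 1, 14.0), (23, 59, 1, 0, 7.0)]
od = build_od_matrices(trips)
```

Storing only observed elements in a dictionary keeps the missing entries explicit, rather than filling them with zeros that could be mistaken for zero cost.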