Paper Group ANR 41
Unbounded Output Networks for Classification. Fast Global Convergence via Landscape of Empirical Loss. Deep Global-Relative Networks for End-to-End 6-DoF Visual Localization and Odometry. Pre and Post-hoc Diagnosis and Interpretation of Malignancy from Breast DCE-MRI. PronouncUR: An Urdu Pronunciation Lexicon Generator. Safe Exploration in Markov D …
Unbounded Output Networks for Classification
Title | Unbounded Output Networks for Classification |
Authors | Stefan Elfwing, Eiji Uchibe, Kenji Doya |
Abstract | We proposed the expected energy-based restricted Boltzmann machine (EE-RBM) as a discriminative RBM method for classification. Two characteristics of the EE-RBM are that the output is unbounded and that the target value of correct classification is set to a value much greater than one. In this study, by adopting features of the EE-RBM approach to feed-forward neural networks, we propose the UnBounded output network (UBnet) which is characterized by three features: (1) unbounded output units; (2) the target value of correct classification is set to a value much greater than one; and (3) the models are trained by a modified mean-squared error objective. We evaluate our approach using the MNIST, CIFAR-10, and CIFAR-100 benchmark datasets. We first demonstrate, for shallow UBnets on MNIST, that a setting of the target value equal to the number of hidden units significantly outperforms a setting of the target value equal to one, and it also outperforms standard neural networks by about 25%. We then validate our approach by achieving high-level classification performance on the three datasets using unbounded output residual networks. We finally use MNIST to analyze the learned features and weights, and we demonstrate that UBnets are much more robust against adversarial examples than the standard approach of using a softmax output layer and training the networks by a cross-entropy objective. |
Tasks | |
Published | 2018-07-25 |
URL | http://arxiv.org/abs/1807.09443v1 |
http://arxiv.org/pdf/1807.09443v1.pdf | |
PWC | https://paperswithcode.com/paper/unbounded-output-networks-for-classification |
Repo | |
Framework | |
Fast Global Convergence via Landscape of Empirical Loss
Title | Fast Global Convergence via Landscape of Empirical Loss |
Authors | Chao Qu, Yan Li, Huan Xu |
Abstract | While optimizing convex objective (loss) functions has been a powerhouse for machine learning for at least two decades, non-convex loss functions have recently attracted fast-growing interest, due to many desirable properties such as superior robustness and classification accuracy compared with their convex counterparts. The main obstacle for non-convex estimators is that it is in general intractable to find the optimal solution. In this paper, we study the computational issues for some non-convex M-estimators. In particular, we show that stochastic variance reduction methods converge to the global optimum at a linear rate, by exploiting the statistical property of the population loss. En route, we improve the convergence analysis for the batch gradient method in \cite{mei2016landscape}. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04617v1 |
http://arxiv.org/pdf/1802.04617v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-global-convergence-via-landscape-of |
Repo | |
Framework | |
Deep Global-Relative Networks for End-to-End 6-DoF Visual Localization and Odometry
Title | Deep Global-Relative Networks for End-to-End 6-DoF Visual Localization and Odometry |
Authors | Yimin Lin, Zhaoxiang Liu, Jianfeng Huang, Chaopeng Wang, Guoguang Du, Jinqiang Bai, Shiguo Lian, Bill Huang |
Abstract | Although a wide variety of deep neural networks for robust Visual Odometry (VO) can be found in the literature, they are still unable to solve the drift problem in long-term robot navigation. Thus, this paper proposes novel deep end-to-end networks for the long-term 6-DoF VO task. It fuses relative and global networks based on Recurrent Convolutional Neural Networks (RCNNs) to improve the monocular localization accuracy. The relative sub-networks are implemented to smooth the VO trajectory, while the global sub-networks are designed to avoid the drift problem. All the parameters are jointly optimized using Cross Transformation Constraints (CTC), which represent the temporal geometric consistency of consecutive frames, and the Mean Square Error (MSE) between the predicted pose and ground truth. The experimental results on both indoor and outdoor datasets show that our method outperforms other state-of-the-art learning-based VO methods in terms of pose accuracy. |
Tasks | Autonomous Navigation, Pose Estimation, Robot Navigation, Visual Localization, Visual Odometry |
Published | 2018-12-19 |
URL | https://arxiv.org/abs/1812.07869v2 |
https://arxiv.org/pdf/1812.07869v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-global-relative-networks-for-end-to-end |
Repo | |
Framework | |
Pre and Post-hoc Diagnosis and Interpretation of Malignancy from Breast DCE-MRI
Title | Pre and Post-hoc Diagnosis and Interpretation of Malignancy from Breast DCE-MRI |
Authors | Gabriel Maicas, Andrew P. Bradley, Jacinto C. Nascimento, Ian Reid, Gustavo Carneiro |
Abstract | We propose a new method for breast cancer screening from DCE-MRI based on a post-hoc approach that is trained using weakly annotated data (i.e., labels are available only at the image level without any lesion delineation). Our proposed post-hoc method automatically diagnoses the whole volume and, for positive cases, localizes the malignant lesions that led to such a diagnosis. Conversely, traditional approaches follow a pre-hoc approach that initially localizes suspicious areas that are subsequently classified to establish the breast malignancy; this approach is trained using strongly annotated data (i.e., it needs a delineation and classification of all lesions in an image). Another goal of this paper is to establish the advantages and disadvantages of both approaches when applied to breast screening from DCE-MRI. Relying on experiments on a breast DCE-MRI dataset that contains scans of 117 patients, our results show that the post-hoc method is more accurate for diagnosing the whole volume per patient, achieving an AUC of 0.91, while the pre-hoc method achieves an AUC of 0.81. However, localizing the malignant lesions remains challenging for the post-hoc method due to the weakly labelled dataset employed during training. |
Tasks | |
Published | 2018-09-25 |
URL | http://arxiv.org/abs/1809.09404v2 |
http://arxiv.org/pdf/1809.09404v2.pdf | |
PWC | https://paperswithcode.com/paper/pre-and-post-hoc-diagnosis-and-interpretation |
Repo | |
Framework | |
PronouncUR: An Urdu Pronunciation Lexicon Generator
Title | PronouncUR: An Urdu Pronunciation Lexicon Generator |
Authors | Haris Bin Zia, Agha Ali Raza, Awais Athar |
Abstract | State-of-the-art speech recognition systems rely heavily on three basic components: an acoustic model, a pronunciation lexicon and a language model. To build these components, a researcher needs linguistic as well as technical expertise, which is a barrier in low-resource domains. Techniques to construct these three components without expert domain knowledge are in great demand. Urdu, despite having millions of speakers all over the world, is a low-resource language in terms of standard publicly available linguistic resources. In this paper, we present a grapheme-to-phoneme conversion tool for Urdu that generates a pronunciation lexicon in a form suitable for use with speech recognition systems from a list of Urdu words. The tool predicts the pronunciation of words using an LSTM-based model trained on a handcrafted expert lexicon of around 39,000 words and shows an accuracy of 64% upon internal evaluation. For external evaluation on a speech recognition task, we obtain a word error rate comparable to one achieved using a fully handcrafted expert lexicon. |
Tasks | Language Modelling, Speech Recognition |
Published | 2018-01-01 |
URL | http://arxiv.org/abs/1801.00409v2 |
http://arxiv.org/pdf/1801.00409v2.pdf | |
PWC | https://paperswithcode.com/paper/pronouncur-an-urdu-pronunciation-lexicon |
Repo | |
Framework | |
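The PronouncUR abstract reports its external evaluation as a word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name and the whitespace tokenization are assumptions made for illustration; this is not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.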
Safe Exploration in Markov Decision Processes with Time-Variant Safety using Spatio-Temporal Gaussian Process
Title | Safe Exploration in Markov Decision Processes with Time-Variant Safety using Spatio-Temporal Gaussian Process |
Authors | Akifumi Wachi, Hiroshi Kajino, Asim Munawar |
Abstract | In many real-world applications (e.g., planetary exploration, robot navigation), an autonomous agent must be able to explore a space with guaranteed safety. Most safe exploration algorithms in the field of reinforcement learning and robotics have been based on the assumption that the safety features are a priori known and time-invariant. This paper presents a learning algorithm called ST-SafeMDP for exploring Markov decision processes (MDPs) that is based on the assumption that the safety features are a priori unknown and time-variant. In this setting, the agent explores MDPs while keeping the probability of entering unsafe states, as defined by a safety function, below a threshold. The unknown and time-variant safety values are modeled using a spatio-temporal Gaussian process. However, there remains an issue that an agent may have no viable action in a shrinking true safe space. To address this issue, we formulate a problem maximizing the cumulative number of safe states in the worst-case scenario with respect to future observations. The effectiveness of this approach was demonstrated in two simulation settings, including one using real lunar terrain data. |
Tasks | Robot Navigation, Safe Exploration |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04232v1 |
http://arxiv.org/pdf/1809.04232v1.pdf | |
PWC | https://paperswithcode.com/paper/safe-exploration-in-markov-decision-processes |
Repo | |
Framework | |
Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems
Title | Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems |
Authors | Miguel de Prado, Nuria Pazos, Luca Benini |
Abstract | Deep Learning is increasingly being adopted by industry for computer vision applications running on embedded devices. While Convolutional Neural Networks’ accuracy has achieved a mature and remarkable state, inference latency and throughput are a major concern, especially when targeting low-cost and low-power embedded platforms. CNNs’ inference latency may become a bottleneck for Deep Learning adoption by industry, as it is a crucial specification for many real-time processes. Furthermore, deployment of CNNs across heterogeneous platforms presents major compatibility issues due to vendor-specific technology and acceleration libraries. In this work, we present QS-DNN, a fully automatic search based on Reinforcement Learning which, combined with an inference engine optimizer, efficiently explores the design space and empirically finds the optimal combinations of libraries and primitives to speed up the inference of CNNs on heterogeneous embedded devices. We show that an optimized combination can achieve a 45x speedup in inference latency on CPU compared to a dependency-free baseline and 2x on average on GPGPU compared to the best vendor library. Further, we demonstrate that the quality of results and time “to-solution” is much better than with Random Search, achieving up to 15x better results for a short-time search. |
Tasks | |
Published | 2018-11-18 |
URL | http://arxiv.org/abs/1811.07315v1 |
http://arxiv.org/pdf/1811.07315v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-infer-rl-based-search-for-dnn |
Repo | |
Framework | |
Semantic Instance Meets Salient Object: Study on Video Semantic Salient Instance Segmentation
Title | Semantic Instance Meets Salient Object: Study on Video Semantic Salient Instance Segmentation |
Authors | Trung-Nghia Le, Akihiro Sugimoto |
Abstract | Focusing only on semantic instances that are salient in a scene brings more benefit for robot navigation and self-driving cars than looking at all objects in the whole scene. This paper pushes the envelope on salient regions in a video to decompose them into semantically meaningful components, namely, semantic salient instances. We provide the baseline for the new task of video semantic salient instance segmentation (VSSIS), that is, the Semantic Instance - Salient Object (SISO) framework. The SISO framework is simple yet efficient, leveraging the advantages of two different segmentation tasks, i.e., semantic instance segmentation and salient object segmentation, and fusing them for the final result. In SISO, we introduce a sequential fusion that looks at overlapping pixels between semantic instances and salient regions to obtain non-overlapping instances one by one. We also introduce a recurrent instance propagation to refine the shapes and semantic meanings of instances, and an identity tracking to maintain both the identity and the semantic meaning of instances over the entire video. Experimental results demonstrated the effectiveness of our SISO baseline, which can handle occlusions in videos. In addition, to tackle the task of VSSIS, we augment the DAVIS-2017 benchmark dataset by assigning semantic ground-truth for salient instance labels, obtaining the SEmantic Salient Instance Video (SESIV) dataset. Our SESIV dataset consists of 84 high-quality video sequences with pixel-wise per-frame ground-truth labels. |
Tasks | Instance Segmentation, Robot Navigation, Self-Driving Cars, Semantic Segmentation |
Published | 2018-07-04 |
URL | http://arxiv.org/abs/1807.01452v3 |
http://arxiv.org/pdf/1807.01452v3.pdf | |
PWC | https://paperswithcode.com/paper/semantic-instance-meets-salient-object-study |
Repo | |
Framework | |
Monitoring Targeted Hate in Online Environments
Title | Monitoring Targeted Hate in Online Environments |
Authors | Tim Isbister, Magnus Sahlgren, Lisa Kaati, Milan Obaidi, Nazar Akrami |
Abstract | Hateful comments, swearwords and sometimes even death threats are becoming a reality for many people today in online environments. This is especially true for journalists, politicians, artists, and other public figures. This paper describes how hate directed towards individuals can be measured in online environments using a simple dictionary-based approach. We present a case study on Swedish politicians, and use examples from this study to discuss shortcomings of the proposed dictionary-based approach. We also outline possibilities for potential refinements of the proposed approach. |
Tasks | |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.04757v1 |
http://arxiv.org/pdf/1803.04757v1.pdf | |
PWC | https://paperswithcode.com/paper/monitoring-targeted-hate-in-online |
Repo | |
Framework | |
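The abstract of "Monitoring Targeted Hate in Online Environments" describes a simple dictionary-based approach to measuring hate directed at individuals. A minimal sketch of that general idea follows, assuming a comment counts toward a target when it contains both the target's name and at least one lexicon term. The lexicon, the target names, and the matching rule are all hypothetical placeholders, not the study's actual dictionary or procedure.

```python
import re
from collections import Counter

# Hypothetical lexicon and target names, for illustration only.
HATE_TERMS = {"idiot", "traitor"}
TARGETS = {"alice andersson", "bob berg"}


def count_targeted_hate(comments, targets=TARGETS, lexicon=HATE_TERMS):
    """Count, per target, the comments that mention the target and a lexicon term."""
    counts = Counter()
    for comment in comments:
        text = comment.lower()
        tokens = re.findall(r"\w+", text)
        # Skip comments that contain no lexicon term at all.
        if not any(token in lexicon for token in tokens):
            continue
        for target in targets:
            if target in text:  # plain substring match on the full name
                counts[target] += 1
    return counts


comments = [
    "Alice Andersson is a traitor",
    "Bob Berg gave a speech today",
    "What an idiot, Bob Berg",
]
hate_counts = count_targeted_hate(comments)
```

A sketch this simple ignores negation, quotation, and context, and it cannot tell whether the lexicon term is actually aimed at the named person.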
Directional Statistics-based Deep Metric Learning for Image Classification and Retrieval
Title | Directional Statistics-based Deep Metric Learning for Image Classification and Retrieval |
Authors | Xuefei Zhe, Shifeng Chen, Hong Yan |
Abstract | Deep distance metric learning (DDML), which is proposed to learn image similarity metrics in an end-to-end manner based on the convolutional neural network, has achieved encouraging results in many computer vision tasks. $L2$-normalization in the embedding space has been used to improve the performance of several DDML methods. However, the commonly used Euclidean distance is no longer an accurate metric for the $L2$-normalized embedding space, i.e., a hyper-sphere. Another challenge of current DDML methods is that their loss functions are usually based on rigid data formats, such as the triplet tuple. Thus, an extra process is needed to prepare data in specific formats. In addition, their losses are obtained from a limited number of samples, which leads to a lack of a global view of the embedding space. In this paper, we replace the Euclidean distance with the cosine similarity to better utilize the $L2$-normalization, which is able to attenuate the curse of dimensionality. More specifically, a novel loss function based on the von Mises-Fisher distribution is proposed to learn a compact hyper-spherical embedding space. Moreover, a new efficient learning algorithm is developed to better capture the global structure of the embedding space. Experiments for both classification and retrieval tasks on several standard datasets show that our method achieves state-of-the-art performance with a simpler training procedure. Furthermore, we demonstrate that, even with a small number of convolutional layers, our model can still obtain significantly better classification performance than the widely used softmax loss. |
Tasks | Image Classification, Metric Learning |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.09662v2 |
http://arxiv.org/pdf/1802.09662v2.pdf | |
PWC | https://paperswithcode.com/paper/directional-statistics-based-deep-metric |
Repo | |
Framework | |
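For readers unfamiliar with the von Mises-Fisher distribution named in the abstract above, its textbook density on the unit hyper-sphere is reproduced here. This is the standard definition only; the exact loss built on it is not given in this listing. The symbols $p$ (embedding dimension), $\mu$ (mean direction), and $\kappa$ (concentration) are notation chosen for this note.

```latex
% von Mises-Fisher density for a unit vector x in R^p,
% with mean direction \mu (\|\mu\| = 1) and concentration \kappa \ge 0.
f_p(x;\, \mu, \kappa) = C_p(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} x\right),
\qquad
C_p(\kappa) = \frac{\kappa^{p/2 - 1}}{(2\pi)^{p/2}\, I_{p/2 - 1}(\kappa)}
% I_v denotes the modified Bessel function of the first kind of order v.
```

Since both $x$ and $\mu$ have unit norm, $\mu^{\top} x$ is exactly their cosine similarity, which is why the distribution pairs naturally with $L2$-normalized embeddings.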
HEVC Inter Coding Using Deep Recurrent Neural Networks and Artificial Reference Pictures
Title | HEVC Inter Coding Using Deep Recurrent Neural Networks and Artificial Reference Pictures |
Authors | Felix Haub, Thorsten Laude, Jörn Ostermann |
Abstract | The efficiency of motion compensated prediction in modern video codecs highly depends on the available reference pictures. Occlusions and non-linear motion pose challenges for the motion compensation and often result in high bit rates for the prediction error. We propose the generation of artificial reference pictures using deep recurrent neural networks. Conceptually, a reference picture at the time instance of the currently coded picture is generated from previously reconstructed conventional reference pictures. Based on these artificial reference pictures, we propose a complete coding pipeline based on HEVC. By using the artificial reference pictures for motion compensated prediction, average BD-rate gains of 1.5% over HEVC are achieved. |
Tasks | Motion Compensation |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.02137v1 |
http://arxiv.org/pdf/1812.02137v1.pdf | |
PWC | https://paperswithcode.com/paper/hevc-inter-coding-using-deep-recurrent-neural |
Repo | |
Framework | |
IterGANs: Iterative GANs to Learn and Control 3D Object Transformation
Title | IterGANs: Iterative GANs to Learn and Control 3D Object Transformation |
Authors | Ysbrand Galama, Thomas Mensink |
Abstract | We are interested in learning visual representations which allow for 3D manipulations of visual objects based on a single 2D image. We cast this into an image-to-image transformation task, and propose Iterative Generative Adversarial Networks (IterGANs) which iteratively transform an input image into an output image. Our models learn a visual representation that can be used for objects seen in training, but also for never seen objects. Since object manipulation requires a full understanding of the geometry and appearance of the object, our IterGANs learn an implicit 3D model and a full appearance model of the object, which are both inferred from a single (test) image. Two advantages of IterGANs are that the intermediate generated images can be used for an additional supervision signal, even in an unsupervised fashion, and that the number of iterations can be used as a control signal to steer the transformation. Experiments on rotated objects and scenes show how IterGANs help with the generation process. |
Tasks | |
Published | 2018-04-16 |
URL | https://arxiv.org/abs/1804.05651v2 |
https://arxiv.org/pdf/1804.05651v2.pdf | |
PWC | https://paperswithcode.com/paper/itergans-iterative-gans-to-learn-and-control |
Repo | |
Framework | |
MoSculp: Interactive Visualization of Shape and Time
Title | MoSculp: Interactive Visualization of Shape and Time |
Authors | Xiuming Zhang, Tali Dekel, Tianfan Xue, Andrew Owens, Qiurui He, Jiajun Wu, Stefanie Mueller, William T. Freeman |
Abstract | We present a system that allows users to visualize complex human motion via 3D motion sculptures, a representation that conveys the 3D structure swept by a human body as it moves through space. Given an input video, our system computes the motion sculpture and provides a user interface for rendering it in different styles, including the options to insert the sculpture back into the original video, render it in a synthetic scene or physically print it. To provide this end-to-end workflow, we introduce an algorithm that estimates the human’s 3D geometry over time from a set of 2D images and develop a 3D-aware image-based rendering approach that embeds the sculpture back into the scene. By automating the process, our system takes motion sculpture creation out of the realm of professional artists, and makes it applicable to a wide range of existing video material. By providing viewers with 3D information, motion sculptures reveal space-time motion information that is difficult to perceive with the naked eye, and allow viewers to interpret how different parts of the object interact over time. We validate the effectiveness of this approach with user studies, finding that our motion sculpture visualizations are significantly more informative about motion than existing stroboscopic and space-time visualization methods. |
Tasks | |
Published | 2018-09-14 |
URL | http://arxiv.org/abs/1809.05491v2 |
http://arxiv.org/pdf/1809.05491v2.pdf | |
PWC | https://paperswithcode.com/paper/mosculp-interactive-visualization-of-shape |
Repo | |
Framework | |
Tied Multitask Learning for Neural Speech Translation
Title | Tied Multitask Learning for Neural Speech Translation |
Authors | Antonios Anastasopoulos, David Chiang |
Abstract | We explore multitask models for neural translation of speech, augmenting them in order to reflect two intuitive notions. First, we introduce a model where the second task decoder receives information from the decoder of the first task, since higher-level intermediate representations should provide useful information. Second, we apply regularization that encourages transitivity and invertibility. We show that the application of these notions on jointly trained models improves performance on the tasks of low-resource speech transcription and translation. It also leads to better performance when using attention information for word discovery over unsegmented input. |
Tasks | |
Published | 2018-02-19 |
URL | http://arxiv.org/abs/1802.06655v2 |
http://arxiv.org/pdf/1802.06655v2.pdf | |
PWC | https://paperswithcode.com/paper/tied-multitask-learning-for-neural-speech |
Repo | |
Framework | |
Recurrent Multi-Graph Neural Networks for Travel Cost Prediction
Title | Recurrent Multi-Graph Neural Networks for Travel Cost Prediction |
Authors | Jilin Hu, Chenjuan Guo, Bin Yang, Christian S. Jensen, Lu Chen |
Abstract | Origin-destination (OD) matrices are often used in urban planning, where a city is partitioned into regions and an element (i, j) in an OD matrix records the cost (e.g., travel time, fuel consumption, or travel speed) from region i to region j. In this paper, we partition a day into multiple intervals, e.g., 96 15-min intervals, and each interval is associated with an OD matrix that represents the costs in the interval. We consider sparse and stochastic OD matrices, where the elements represent stochastic rather than deterministic costs and some elements are missing due to lack of data between two regions. We solve the sparse, stochastic OD matrix forecasting problem: given a sequence of historical OD matrices that are sparse, we aim at predicting future OD matrices with no empty elements. We propose a generic learning framework that solves the problem by dealing with sparse matrices via matrix factorization and two graph convolutional neural networks, and by capturing temporal dynamics via a recurrent neural network. Empirical studies using two taxi datasets from different countries verify the effectiveness of the proposed framework. |
Tasks | |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05157v1 |
http://arxiv.org/pdf/1811.05157v1.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-multi-graph-neural-networks-for |
Repo | |
Framework | |
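The data layout described in the travel cost abstract (one OD matrix per 15-minute interval, with missing entries where no trips were observed) can be sketched as follows. This is an illustrative data structure only, under simplifying assumptions: the region count is hypothetical, missing entries are stored as NaN, and each cell holds a single number rather than the stochastic cost the paper models.

```python
import math

INTERVALS_PER_DAY = 96  # 24 h * 60 min / 15 min
NUM_REGIONS = 3         # hypothetical number of city regions


def empty_day(num_regions=NUM_REGIONS, intervals=INTERVALS_PER_DAY):
    """One OD matrix per interval; NaN marks an unobserved region pair."""
    return [
        [[math.nan] * num_regions for _ in range(num_regions)]
        for _ in range(intervals)
    ]


def interval_index(hour, minute, width_min=15):
    """Map a time of day to the index of its interval."""
    return (hour * 60 + minute) // width_min


def sparsity(matrix):
    """Fraction of missing (NaN) elements in one OD matrix."""
    cells = [cost for row in matrix for cost in row]
    return sum(math.isnan(cost) for cost in cells) / len(cells)


day = empty_day()
# Hypothetical observation: travel time of 12.5 minutes from region 0 to
# region 2 during the interval that contains 08:30.
day[interval_index(8, 30)][0][2] = 12.5
```

The forecasting task in the abstract then amounts to taking a sequence of such partly filled matrices and predicting future matrices with every element present.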