January 26, 2020

Paper Group ANR 1396

Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers

Title Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers
Authors Ya Zhao, Rui Xu, Xinchao Wang, Peng Hou, Haihong Tang, Mingli Song
Abstract Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of large-scale datasets. Despite the encouraging results achieved, the performance of lip reading unfortunately remains inferior to that of its counterpart, speech recognition, because the ambiguous nature of lip actuations makes it challenging to extract discriminant features from lip movement videos. In this paper, we propose a new method, termed Lip by Speech (LIBS), whose goal is to strengthen lip reading by learning from speech recognizers. The rationale behind our approach is that the features extracted from speech recognizers may provide complementary and discriminant clues, which are difficult to obtain from the subtle movements of the lips, and consequently facilitate the training of lip readers. This is achieved, specifically, by distilling multi-granularity knowledge from speech recognizers to lip readers. To conduct this cross-modal knowledge distillation, we utilize an efficacious alignment scheme to handle the inconsistent lengths of the audios and videos, as well as an innovative filtering strategy to refine the speech recognizer’s prediction. The proposed method achieves new state-of-the-art performance on the CMLR and LRS2 datasets, outperforming the baseline by margins of 7.66% and 2.75% in character error rate, respectively.
Tasks Speech Recognition
Published 2019-11-26
URL https://arxiv.org/abs/1911.11502v1
PDF https://arxiv.org/pdf/1911.11502v1.pdf
PWC https://paperswithcode.com/paper/hearing-lips-improving-lip-reading-by
Repo
Framework

Incremental Classifier Learning Based on PEDCC-Loss and Cosine Distance

Title Incremental Classifier Learning Based on PEDCC-Loss and Cosine Distance
Authors Qiuyu Zhu, Zikuang He, Xin Ye
Abstract The main purpose of incremental learning is to learn new knowledge without forgetting the knowledge that has been learned before. At present, the main challenge in this area is catastrophic forgetting: the network loses its performance on old tasks after being trained on new tasks. In this paper, we introduce an ensemble method of incremental classifiers to alleviate this problem. It is based on the cosine distance between the output feature and the pre-defined center, and allows each task to be preserved in a different network. During training, we use PEDCC-Loss to train the CNN. At test time, the prediction is determined by the cosine distance between the network's latent features and the pre-defined centers. The experimental results on EMNIST and CIFAR100 show that our method outperforms the recent LwF method, which uses knowledge distillation, and the iCaRL method, which keeps some old samples while training on a new task. The method achieves the goal of not forgetting old knowledge while training on new classes, and handles the problem of catastrophic forgetting better.
Tasks
Published 2019-06-11
URL https://arxiv.org/abs/1906.04734v1
PDF https://arxiv.org/pdf/1906.04734v1.pdf
PWC https://paperswithcode.com/paper/incremental-classifier-learning-based-on
Repo
Framework
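
The test-time rule described in the abstract above (pick the class whose pre-defined center is nearest in cosine distance to the latent feature) can be sketched in a few lines. This is only an illustration: the three-dimensional centers, the class names, and the function names `cosine_similarity` and `predict` are made up for the example and are not taken from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict(feature, centers):
    """Return the label whose center has the highest cosine similarity
    (equivalently, the smallest cosine distance) to the feature."""
    return max(centers, key=lambda label: cosine_similarity(feature, centers[label]))

# Hypothetical class centers and a hypothetical latent feature from a network.
centers = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "bird": [0.0, 0.0, 1.0],
}
feature = [0.2, 0.9, 0.1]
print(predict(feature, centers))
```

Because only the direction of the feature matters, rescaling `feature` by any positive constant leaves the prediction unchanged.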

A Bandit Framework for Optimal Selection of Reinforcement Learning Agents

Title A Bandit Framework for Optimal Selection of Reinforcement Learning Agents
Authors Andreas Merentitis, Kashif Rasul, Roland Vollgraf, Abdul-Saboor Sheikh, Urs Bergmann
Abstract Deep Reinforcement Learning has been shown to be very successful in complex games, e.g. Atari or Go. These games have clearly defined rules, and hence allow simulation. In many practical applications, however, interactions with the environment are costly and a good simulator of the environment is not available. Further, as environments differ by application, the optimal inductive bias (architecture, hyperparameters, etc.) of a reinforcement agent depends on the application. In this work, we propose a multi-arm bandit framework that selects from a set of different reinforcement learning agents to choose the one with the best inductive bias. To alleviate the problem of sparse rewards, the reinforcement learning agents are augmented with surrogate rewards. This helps the bandit framework to select the best agents early, since these rewards are smoother and less sparse than the environment reward. The bandit has the double objective of maximizing the reward while the agents are learning and selecting the best agent after a finite number of learning steps. Our experimental results on standard environments show that the proposed framework is able to consistently select the optimal agent after a finite number of steps, while collecting more cumulative reward compared to selecting a sub-optimal architecture or uniformly alternating between different agents.
Tasks
Published 2019-02-10
URL http://arxiv.org/abs/1902.03657v1
PDF http://arxiv.org/pdf/1902.03657v1.pdf
PWC https://paperswithcode.com/paper/a-bandit-framework-for-optimal-selection-of
Repo
Framework
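
To make the idea of a bandit choosing among learning agents concrete, here is a minimal sketch that uses the standard UCB1 rule as the selection strategy. UCB1 is an assumption made for illustration; the abstract does not say which bandit algorithm the authors use. The "agents" are stand-in functions that return a noisy reward, and `make_agent`, `run_bandit`, and the reward means are hypothetical.

```python
import math
import random

def ucb1_select(counts, totals, t):
    """Pick the arm with the highest UCB1 index; untried arms go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )

def run_bandit(agent_steps, rounds):
    """Let the bandit decide which agent acts in each round."""
    counts = [0] * len(agent_steps)
    totals = [0.0] * len(agent_steps)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, totals, t)
        reward = agent_steps[arm]()  # one (surrogate) reward from the chosen agent
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_agent(mean_reward):
    """Stand-in for an RL agent: returns a noisy reward clipped to [0, 1]."""
    return lambda: min(1.0, max(0.0, rng.gauss(mean_reward, 0.1)))

agents = [make_agent(0.2), make_agent(0.5), make_agent(0.8)]
counts, totals = run_bandit(agents, rounds=2000)
best_agent = counts.index(max(counts))
print(counts, best_agent)
```

In this toy setting the agent with the highest mean reward ends up being selected most often, while the weaker agents still receive a small number of exploratory rounds.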

Evaluation of a Recommender System for Assisting Novice Game Designers

Title Evaluation of a Recommender System for Assisting Novice Game Designers
Authors Tiago Machado, Daniel Gopstein, Oded Nov, Angela Wang, Andy Nealen, Julian Togelius
Abstract Game development is a complex task involving multiple disciplines and technologies. Developers and researchers alike have suggested that AI-driven game design assistants may improve developer workflow. We present a recommender system for assisting humans in game design as well as a rigorous human subjects study to validate it. The AI-driven game design assistance system suggests game mechanics to designers based on characteristics of the game being developed. We believe this method can bring creative insights and increase users’ productivity. We conducted quantitative studies that showed the recommender system increases users’ levels of accuracy and computational affect, and decreases their levels of workload.
Tasks Recommendation Systems
Published 2019-08-13
URL https://arxiv.org/abs/1908.04629v1
PDF https://arxiv.org/pdf/1908.04629v1.pdf
PWC https://paperswithcode.com/paper/evaluation-of-a-recommender-system-for
Repo
Framework

Semi-supervised Image Attribute Editing using Generative Adversarial Networks

Title Semi-supervised Image Attribute Editing using Generative Adversarial Networks
Authors Yahya Dogan, Hacer Yalim Keles
Abstract Image attribute editing is a challenging problem that has been recently studied by many researchers using generative networks. The challenge is in the manipulation of selected attributes of images while preserving the other details. The method to achieve this goal is to find an accurate latent vector representation of an image and a direction corresponding to the attribute. Almost all the works in the literature use labeled datasets in a supervised setting for this purpose. In this study, we introduce an architecture called Cyclic Reverse Generator (CRG), which allows learning the inverse function of the generator accurately via an encoder in an unsupervised setting by utilizing cyclic cost minimization. Attribute editing is then performed using the CRG models for finding desired attribute representations in the latent space. In this work, we use two arbitrary reference images, with and without desired attributes, to compute an attribute direction for editing. We show that the proposed approach performs better in terms of image reconstruction compared to the existing end-to-end generative models both quantitatively and qualitatively. We demonstrate state-of-the-art results on both real images and generated images in MNIST and CelebA datasets.
Tasks Image Reconstruction
Published 2019-07-03
URL https://arxiv.org/abs/1907.01841v1
PDF https://arxiv.org/pdf/1907.01841v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-image-attribute-editing-using
Repo
Framework
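
The editing step in the abstract above, computing an attribute direction from two reference images and moving a latent vector along it, amounts to simple vector arithmetic once latent codes are available. The sketch below works on plain lists of floats and assumes the latent codes have already been produced by some encoder; the function names and the `strength` parameter are illustrative, not from the paper.

```python
def attribute_direction(z_with, z_without):
    """Direction in latent space from 'without attribute' to 'with attribute'."""
    return [a - b for a, b in zip(z_with, z_without)]

def edit_latent(z, direction, strength=1.0):
    """Move a latent vector along the attribute direction."""
    return [zi + strength * di for zi, di in zip(z, direction)]

# Hypothetical latent codes of two reference images and of the image to edit.
z_with = [0.5, 1.0, -0.25]
z_without = [0.5, 0.0, -0.25]
z_image = [1.0, 2.0, 3.0]

direction = attribute_direction(z_with, z_without)
z_edited = edit_latent(z_image, direction, strength=0.5)
print(direction, z_edited)
```

Feeding `z_edited` to a generator would then yield the edited image; that step is omitted here because it depends on the trained model.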

A Review on Deep Learning in Medical Image Reconstruction

Title A Review on Deep Learning in Medical Image Reconstruction
Authors Haimiao Zhang, Bin Dong
Abstract Medical imaging is crucial in modern clinics to guide the diagnosis and treatment of diseases. Medical image reconstruction is one of the most fundamental and important components of medical imaging, whose major objective is to acquire high-quality medical images for clinical usage at the minimal cost and risk to the patients. Mathematical models in medical image reconstruction or, more generally, image restoration in computer vision, have been playing a prominent role. Earlier mathematical models are mostly designed by human knowledge or hypothesis on the image to be reconstructed, and we shall call these models handcrafted models. Later, handcrafted plus data-driven modeling started to emerge which still mostly relies on human designs, while part of the model is learned from the observed data. More recently, as more data and computation resources are made available, deep learning based models (or deep models) pushed the data-driven modeling to the extreme where the models are mostly based on learning with minimal human designs. Both handcrafted and data-driven modeling have their own advantages and disadvantages. One of the major research trends in medical imaging is to combine handcrafted modeling with deep modeling so that we can enjoy benefits from both approaches. The major part of this article is to provide a conceptual review of some recent works on deep modeling from the unrolling dynamics viewpoint. This viewpoint stimulates new designs of neural network architectures with inspirations from optimization algorithms and numerical differential equations. Given the popularity of deep modeling, there are still vast remaining challenges in the field, as well as opportunities which we shall discuss at the end of this article.
Tasks Image Reconstruction, Image Restoration
Published 2019-06-23
URL https://arxiv.org/abs/1906.10643v1
PDF https://arxiv.org/pdf/1906.10643v1.pdf
PWC https://paperswithcode.com/paper/a-review-on-deep-learning-in-medical-image
Repo
Framework

HishabNet: Detection, Localization and Calculation of Handwritten Bengali Mathematical Expressions

Title HishabNet: Detection, Localization and Calculation of Handwritten Bengali Mathematical Expressions
Authors Md Nafee Al Islam, Siamul Karim Khan
Abstract Recently, recognition of handwritten Bengali letters and digits has captured a lot of attention among researchers in the AI community. In this work, we propose a Convolutional Neural Network (CNN) based object detection model which can recognize and evaluate handwritten Bengali mathematical expressions. This method is able to detect multiple Bengali digits and operators and locate their positions in the image. With that information, it is able to construct numbers from series of digits and perform mathematical operations on them. For the object detection task, the state-of-the-art YOLOv3 algorithm was utilized. For training and evaluating the model, we have engineered a new dataset, ‘Hishab’, which is the first Bengali handwritten digits dataset intended for object detection. The model achieved an overall validation mean average precision (mAP) of 98.6%. Also, the classification accuracy of the feature extractor backbone CNN used in our model was tested on two publicly available Bengali handwritten digits datasets: NumtaDB and CMATERdb. The backbone CNN achieved a test set accuracy of 99.6252% on NumtaDB and 99.0833% on CMATERdb.
Tasks Object Detection
Published 2019-09-02
URL https://arxiv.org/abs/1909.00823v1
PDF https://arxiv.org/pdf/1909.00823v1.pdf
PWC https://paperswithcode.com/paper/hishabnet-detection-localization-and
Repo
Framework
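
The post-detection step described above (building numbers from runs of detected digits and then applying the operators) can be sketched as follows. The detection format, a list of `(x_center, label)` pairs with labels already mapped to ASCII digits and the operators `+ - * /`, is an assumption for the example; the abstract does not specify how the detector output is represented.

```python
def tokens_from_detections(detections):
    """Sort detections left to right and merge adjacent digits into integers."""
    tokens = []
    number = ""
    for _, label in sorted(detections, key=lambda d: d[0]):
        if label.isdigit():
            number += label
        else:
            if number:
                tokens.append(int(number))
                number = ""
            tokens.append(label)
    if number:
        tokens.append(int(number))
    return tokens

def evaluate(tokens):
    """Evaluate alternating number/operator tokens with the usual precedence.
    Assumes a well-formed expression using only + - * /."""
    terms = [tokens[0]]
    additive_ops = []
    # First pass: fold * and / into the current term.
    for i in range(1, len(tokens), 2):
        op, value = tokens[i], tokens[i + 1]
        if op == "*":
            terms[-1] *= value
        elif op == "/":
            terms[-1] /= value
        else:
            additive_ops.append(op)
            terms.append(value)
    # Second pass: apply + and - from left to right.
    result = terms[0]
    for op, value in zip(additive_ops, terms[1:]):
        result = result + value if op == "+" else result - value
    return result

# Hypothetical detections, deliberately out of order: they spell "12 + 3 * 4".
detections = [(120, "+"), (10, "1"), (45, "2"), (200, "3"), (260, "*"), (320, "4")]
tokens = tokens_from_detections(detections)
print(tokens, evaluate(tokens))
```

Sorting by the horizontal position only works for single-line expressions; multi-line input would need an extra grouping step.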

Model-based Deep Medical Imaging: the roadmap of generalizing iterative reconstruction model using deep learning

Title Model-based Deep Medical Imaging: the roadmap of generalizing iterative reconstruction model using deep learning
Authors Jing Cheng, Haifeng Wang, Yanjie Zhu, Qiegen Liu, Qiyang Zhang, Ting Su, Jianwei Chen, Yongshuai Ge, Zhanli Hu, Xin Liu, Hairong Zheng, Leslie Ying, Dong Liang
Abstract Medical imaging is playing an increasingly important role in clinics. However, different imaging modalities have their own issues, such as slow imaging speed in MRI and radiation injury in CT and PET. Therefore, accelerating MRI and reducing radiation dose in CT and PET have been ongoing research topics since their invention. Usually, acquiring less data is a direct but important strategy to address these issues. However, acquiring less data usually results in aliasing artifacts in reconstructions. Recently, deep learning (DL) has been introduced in medical image reconstruction and has shown potential for significantly speeding up MR reconstruction and reducing radiation dose. In this paper, we propose a general framework for combining the reconstruction model with deep learning to maximize the potential of deep learning and model-based reconstruction, and give examples to demonstrate the performance and requirements of unrolling different algorithms using deep learning.
Tasks Image Reconstruction
Published 2019-06-19
URL https://arxiv.org/abs/1906.08143v4
PDF https://arxiv.org/pdf/1906.08143v4.pdf
PWC https://paperswithcode.com/paper/model-based-deep-mr-imaging-the-roadmap-of
Repo
Framework
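
"Unrolling" an iterative reconstruction algorithm means fixing the number of iterations and treating each iteration as a network layer with its own parameters. The sketch below unrolls plain gradient descent on a least-squares data term, with one step size per layer. It is a toy under stated assumptions: the 2x2 system, the hand-picked step sizes, and the function names are invented for illustration, and a real unrolled network would learn the step sizes (and usually a regularizer) from data.

```python
def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def unrolled_gradient_descent(A, y, step_sizes):
    """Run len(step_sizes) gradient steps on 0.5 * ||A x - y||^2.
    Each step plays the role of one layer with its own step size."""
    At = transpose(A)
    x = [0.0] * len(A[0])
    for alpha in step_sizes:
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        gradient = matvec(At, residual)
        x = [xi - alpha * gi for xi, gi in zip(x, gradient)]
    return x

# Toy forward model and measurement; the exact solution is x = [1, 3].
A = [[2.0, 0.0], [0.0, 1.0]]
y = [2.0, 3.0]

# Two "layers" with per-layer step sizes (chosen by hand here, learned in practice).
x_hat = unrolled_gradient_descent(A, y, step_sizes=[0.25, 1.0])
print(x_hat)
```

With a single shared step size, this ill-conditioned toy problem would need more iterations to reach the same answer, which hints at why per-layer parameters are attractive.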

Deep Variational Networks with Exponential Weighting for Learning Computed Tomography

Title Deep Variational Networks with Exponential Weighting for Learning Computed Tomography
Authors Valery Vishnevskiy, Richard Rau, Orcun Goksel
Abstract Tomographic image reconstruction is relevant for many medical imaging modalities including X-ray, ultrasound (US) computed tomography (CT) and photoacoustics, for which access to full angular range tomographic projections might not be available in clinical practice due to physical or time constraints. Reconstruction from incomplete data in the low signal-to-noise ratio regime is a challenging and ill-posed inverse problem that usually leads to unsatisfactory image quality. While informative image priors may be learned using generic deep neural network architectures, the artefacts caused by an ill-conditioned design matrix often have global spatial support and cannot be efficiently filtered out by means of convolutions. In this paper we propose to learn an inverse mapping in an end-to-end fashion via unrolling optimization iterations of a prototypical reconstruction algorithm. We herein introduce a network architecture that performs filtering jointly in both sinogram and spatial domains. To efficiently train such a deep network we propose a novel regularization approach based on deep exponential weighting. Experiments on US and X-ray CT data show that our proposed method is qualitatively and quantitatively superior to conventional non-linear reconstruction methods as well as state-of-the-art deep networks for image reconstruction. The fast inference time of the proposed algorithm allows for sophisticated reconstructions in real-time critical settings, demonstrated with US SoS imaging of an ex vivo bovine phantom.
Tasks Computed Tomography (CT), Image Reconstruction
Published 2019-06-13
URL https://arxiv.org/abs/1906.05528v1
PDF https://arxiv.org/pdf/1906.05528v1.pdf
PWC https://paperswithcode.com/paper/deep-variational-networks-with-exponential
Repo
Framework

Mask-Predict: Parallel Decoding of Conditional Masked Language Models

Title Mask-Predict: Parallel Decoding of Conditional Masked Language Models
Authors Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer
Abstract Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about. By applying this strategy for a constant number of iterations, our model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average. It is also able to reach within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster.
Tasks Language Modelling, Machine Translation
Published 2019-04-19
URL https://arxiv.org/abs/1904.09324v2
PDF https://arxiv.org/pdf/1904.09324v2.pdf
PWC https://paperswithcode.com/paper/190409324
Repo
Framework
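
The iterative decoding loop from the abstract above (predict everything in parallel, then repeatedly re-mask and regenerate the least confident words) can be sketched with a toy predictor standing in for the translation model. Everything model-specific here is an assumption: `toy_predict`, its confidences, and the linearly decaying number of masked tokens are chosen for illustration, and the abstract itself only states that a constant number of iterations is used.

```python
MASK = "<mask>"

def mask_predict(predict, length, iterations):
    """Iteratively decode `length` tokens.
    `predict(tokens)` must return one (token, confidence) pair per position."""
    tokens = [MASK] * length
    probs = [0.0] * length
    for t in range(iterations):
        # Assumed schedule: mask everything first, then linearly fewer tokens.
        n_mask = length * (iterations - t) // iterations
        # Re-mask the positions the model is least confident about.
        worst = sorted(range(length), key=lambda i: probs[i])[:n_mask]
        for i in worst:
            tokens[i] = MASK
        predictions = predict(tokens)
        for i in worst:
            tokens[i], probs[i] = predictions[i]
    return tokens

TARGET = ["a", "b", "c", "d"]  # hypothetical reference output

def toy_predict(tokens):
    """Toy model: confident and correct only when a neighbour is visible."""
    out = []
    for i in range(len(tokens)):
        neighbours = tokens[max(0, i - 1):i] + tokens[i + 1:i + 2]
        has_context = any(n != MASK for n in neighbours)
        if has_context:
            out.append((TARGET[i], 0.9))
        elif i % 2 == 0:
            out.append((TARGET[i], 0.6))
        else:
            out.append(("<unk>", 0.3))
    return out

print(mask_predict(toy_predict, length=4, iterations=1))
print(mask_predict(toy_predict, length=4, iterations=3))
```

With a single iteration the toy model leaves two low-confidence placeholders; with three iterations those positions are re-masked, regenerated with visible context, and the output matches the reference.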

Multi-reference Tacotron by Intercross Training for Style Disentangling, Transfer and Control in Speech Synthesis

Title Multi-reference Tacotron by Intercross Training for Style Disentangling, Transfer and Control in Speech Synthesis
Authors Yanyao Bian, Changbin Chen, Yongguo Kang, Zhenglin Pan
Abstract Speech style control and transfer techniques aim to enrich the diversity and expressiveness of synthesized speech. Existing approaches model all speech styles in one representation and lack the ability to control a specific speech feature independently. To address this issue, we introduce a novel multi-reference structure to Tacotron and propose an intercross training approach, which together ensure that each sub-encoder of the multi-reference encoder independently disentangles and controls a specific style. Experimental results show that our model is able to control and transfer desired speech styles individually.
Tasks Speech Synthesis
Published 2019-04-04
URL http://arxiv.org/abs/1904.02373v1
PDF http://arxiv.org/pdf/1904.02373v1.pdf
PWC https://paperswithcode.com/paper/multi-reference-tacotron-by-intercross
Repo
Framework

Spatiotemporal Information Processing with a Reservoir Decision-making Network

Title Spatiotemporal Information Processing with a Reservoir Decision-making Network
Authors Yuanyuan Mi, Xiaohan Lin, Xiaolong Zou, Zilong Ji, Tiejun Huang, Si Wu
Abstract Spatiotemporal information processing is fundamental to brain functions. The present study investigates a canonical neural network model for spatiotemporal pattern recognition. Specifically, the model consists of two modules, a reservoir subnetwork and a decision-making subnetwork. The former projects complex spatiotemporal patterns into spatially separated neural representations, and the latter reads out these neural representations by integrating information over time; the two modules are combined via supervised learning using known examples. We elucidate the working mechanism of the model and demonstrate its feasibility for discriminating complex spatiotemporal patterns. Our model reproduces the phenomenon of recognizing looming patterns in the neural system, and can learn to discriminate gait with very few training examples. We hope this study gives us insight into how spatiotemporal information is processed in the brain and helps us to develop brain-inspired application algorithms.
Tasks Decision Making
Published 2019-07-28
URL https://arxiv.org/abs/1907.12071v1
PDF https://arxiv.org/pdf/1907.12071v1.pdf
PWC https://paperswithcode.com/paper/spatiotemporal-information-processing-with-a
Repo
Framework

Hardware-aware Pruning of DNNs using LFSR-Generated Pseudo-Random Indices

Title Hardware-aware Pruning of DNNs using LFSR-Generated Pseudo-Random Indices
Authors Foroozan Karimzadeh, Ningyuan Cao, Brian Crafton, Justin Romberg, Arijit Raychowdhury
Abstract Deep neural networks (DNNs) have emerged as the state-of-the-art algorithms in a broad range of applications. To reduce the memory footprint of DNNs, in particular for embedded applications, sparsification techniques have been proposed. Unfortunately, these techniques come with a large hardware overhead. In this paper, we present a hardware-aware pruning method where the locations of non-zero weights are derived in real time from Linear Feedback Shift Registers (LFSRs). Using the proposed method, we demonstrate a total saving of energy and area of up to 63.96% and 64.23% for the VGG-16 network on down-sampled ImageNet, respectively for iso-compression-rate and iso-accuracy.
Tasks
Published 2019-11-09
URL https://arxiv.org/abs/1911.04468v1
PDF https://arxiv.org/pdf/1911.04468v1.pdf
PWC https://paperswithcode.com/paper/hardware-aware-pruning-of-dnns-using-lfsr
Repo
Framework
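
The appeal of deriving the positions of non-zero weights from an LFSR is that only the seed has to be stored, since the same index sequence can be regenerated on demand. The sketch below uses a textbook 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) to pick which weights of a layer to keep. The register width, the tap set, the modulo mapping to weight positions, and the duplicate-skipping rule are all assumptions for illustration; the abstract does not give the authors' configuration.

```python
def lfsr16(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def kept_indices(seed, num_weights, num_kept):
    """Generate `num_kept` distinct pseudo-random weight positions."""
    if seed == 0:
        raise ValueError("an all-zero LFSR state never changes")
    if num_kept > num_weights:
        raise ValueError("cannot keep more weights than exist")
    kept = []
    seen = set()
    state = seed
    while len(kept) < num_kept:
        state = lfsr16(state)
        index = state % num_weights  # assumed mapping from state to position
        if index not in seen:
            seen.add(index)
            kept.append(index)
    return kept

# Hypothetical layer with 64 weights, of which 16 are kept (75% sparsity).
indices = kept_indices(seed=0xACE1, num_weights=64, num_kept=16)
regenerated = kept_indices(seed=0xACE1, num_weights=64, num_kept=16)
print(indices == regenerated, sorted(indices))
```

Because the sequence is fully determined by the seed, no per-weight index table is needed at inference time, which is the storage overhead that conventional sparse formats incur.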

Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack

Title Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack
Authors Emily Dinan, Samuel Humeau, Bharath Chintagunta, Jason Weston
Abstract The detection of offensive language in the context of a dialogue has become an increasingly important application of natural language processing. The detection of trolls in public forums (Galán-García et al., 2016), and the deployment of chatbots in the public domain (Wolf et al., 2017) are two examples that show the necessity of guarding against adversarially offensive behavior on the part of humans. In this work, we develop a training scheme for a model to become robust to such human attacks by an iterative build it, break it, fix it strategy with humans and models in the loop. In detailed experiments we show this approach is considerably more robust than previous systems. Further, we show that offensive language used within a conversation critically depends on the dialogue context, and cannot be viewed as a single-sentence offensive detection task as in most previous work. Our newly collected tasks and methods will be made open source and publicly available.
Tasks
Published 2019-08-17
URL https://arxiv.org/abs/1908.06083v1
PDF https://arxiv.org/pdf/1908.06083v1.pdf
PWC https://paperswithcode.com/paper/build-it-break-it-fix-it-for-dialogue-safety
Repo
Framework

Tango: A Deep Neural Network Benchmark Suite for Various Accelerators

Title Tango: A Deep Neural Network Benchmark Suite for Various Accelerators
Authors Aajna Karki, Chethan Palangotu Keshava, Spoorthi Mysore Shivakumar, Joshua Skow, Goutam Madhukeshwar Hegde, Hyeran Jeon
Abstract Deep neural networks (DNNs) have proven effective in various computing fields. To provide more efficient computing platforms for DNN applications, it is essential to have evaluation environments that include assorted benchmark workloads. Though a few DNN benchmark suites have been released recently, most of them require installing proprietary DNN libraries or resource-intensive DNN frameworks, which are hard to run on resource-limited mobile platforms or architecture simulators. To provide a more scalable evaluation environment, we propose a new DNN benchmark suite that can run on any platform that supports CUDA and OpenCL. The proposed benchmark suite includes the five most widely used convolutional neural networks and two recurrent neural networks. We provide in-depth architectural statistics of these networks while running them on an architecture simulator, a server GPU, a mobile GPU, and a mobile FPGA.
Tasks
Published 2019-01-14
URL http://arxiv.org/abs/1901.04987v1
PDF http://arxiv.org/pdf/1901.04987v1.pdf
PWC https://paperswithcode.com/paper/tango-a-deep-neural-network-benchmark-suite
Repo
Framework