Paper Group ANR 1696
Leveraging End-to-End Speech Recognition with Neural Architecture Search. RWNE: A Scalable Random-Walk based Network Embedding Framework with Personalized Higher-order Proximity Preserved. A multi-agent system approach in evaluating human spatio-temporal vulnerability to seismic risk using social attachment. Clustering by the way of atomic fission. …
Leveraging End-to-End Speech Recognition with Neural Architecture Search
Title | Leveraging End-to-End Speech Recognition with Neural Architecture Search |
Authors | Ahmed Baruwa, Mojeed Abisiga, Ibrahim Gbadegesin, Afeez Fakunle |
Abstract | Deep neural networks (DNNs) have been demonstrated to outperform many traditional machine learning algorithms in Automatic Speech Recognition (ASR). In this paper, we show that a large improvement in the accuracy of deep speech models can be achieved with effective Neural Architecture Optimization at a very low computational cost. Phone recognition tests on the popular LibriSpeech and TIMIT benchmarks support this claim: novel candidate models can be discovered and trained within a few hours (less than a day), many times faster than attention-based seq2seq models. Our method achieves a test Word Error Rate (WER) of 7% on the LibriSpeech corpus and a Phone Error Rate (PER) of 13% on the TIMIT corpus, on par with state-of-the-art results. |
Tasks | End-To-End Speech Recognition, Neural Architecture Search, Speech Recognition |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05946v1 |
https://arxiv.org/pdf/1912.05946v1.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-end-to-end-speech-recognition-with |
Repo | |
Framework | |
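The WER and PER figures quoted above are edit-distance-based metrics: the number of word (or phone) substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. A minimal sketch of the standard computation (illustrative only, not code from the paper):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with a standard Levenshtein edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 2/6 ≈ 0.33
```

Running the same routine over phone sequences instead of words gives the PER.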
RWNE: A Scalable Random-Walk based Network Embedding Framework with Personalized Higher-order Proximity Preserved
Title | RWNE: A Scalable Random-Walk based Network Embedding Framework with Personalized Higher-order Proximity Preserved |
Authors | Yu He, Jianxin Li, Yangqiu Song, Xinmiao Zhang, Fanzhang Peng, Hao Peng |
Abstract | Higher-order proximity preserving network embedding has attracted increasing attention recently. In particular, owing to its superior scalability, random-walk based network embedding has been well developed; it can efficiently explore higher-order neighborhoods via multi-hop random walks. However, despite the success of current random-walk based methods, most of them are not expressive enough to preserve personalized higher-order proximity and lack a straightforward objective that theoretically articulates what network proximity is preserved and how. In this paper, to address these issues, we present a general, scalable random-walk based network embedding framework in which the random walk is explicitly incorporated into a sound objective designed to preserve arbitrary higher-order proximity. Further, we introduce the random walk with restart process into the framework to naturally and effectively achieve personalized-weighted preservation of proximities of different orders. We conduct extensive experiments on several real-world networks and demonstrate that our proposed method consistently and substantially outperforms state-of-the-art network embedding methods. |
Tasks | Network Embedding |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07874v1 |
https://arxiv.org/pdf/1911.07874v1.pdf | |
PWC | https://paperswithcode.com/paper/rwne-a-scalable-random-walk-based-network |
Repo | |
Framework | |
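The key mechanism described above is the random walk with restart (RWR): the restart probability controls how strongly lower-order (closer) neighborhoods are weighted relative to higher-order ones, which is what personalizes the preserved proximity. A minimal sampling sketch on an adjacency-list graph (illustrative only, not the authors' implementation; parameter values are hypothetical):

```python
import random

def random_walk_with_restart(adj, start, walk_length, restart_prob=0.15):
    """Sample one walk; with probability `restart_prob` the walker jumps back
    to the start node, so larger values bias visits toward closer neighbors."""
    walk, current = [start], start
    for _ in range(walk_length - 1):
        if random.random() < restart_prob or not adj[current]:
            current = start                        # restart at the source node
        else:
            current = random.choice(adj[current])  # step to a uniform random neighbor
        walk.append(current)
    return walk

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walk_with_restart(adj, start=0, walk_length=10))
```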
A multi-agent system approach in evaluating human spatio-temporal vulnerability to seismic risk using social attachment
Title | A multi-agent system approach in evaluating human spatio-temporal vulnerability to seismic risk using social attachment |
Authors | Julius Bañgate, Julie Dugdale, Elise Beck, Carole Adam |
Abstract | Social attachment theory states that individuals seek the proximity of attachment figures (e.g. family members, friends, colleagues, familiar places or objects) when faced with threat. During disasters, this means that family members may seek each other before evacuating, gather personal property before heading to familiar exits and places, or follow groups/crowds, etc. This hard-wired human tendency should be considered in the assessment of risk and the creation of disaster management plans. Doing so may result in more realistic evacuation procedures and may minimise the number of casualties and injuries. In this context, a dynamic spatio-temporal analysis of seismic risk is presented using SOLACE, a multi-agent model of pedestrian behaviour based on social attachment theory implemented using the Belief-Desire-Intention approach. The model focuses on the influence of human, social, physical and temporal factors on successful evacuation. Human factors considered include perception and mobility defined by age. Social factors are defined by attachment bonds, social groups, population distribution, and cultural norms. Physical factors refer to the location of the epicentre of the earthquake, spatial distribution/layout and attributes of environmental objects such as buildings, roads, barriers (cars), placement of safe areas, evacuation routes, and the resulting debris/damage from the earthquake. Experiments tested the influence of time of the day, presence of disabled persons and earthquake intensity. Initial results show that factors that influence arrivals in safe areas include (a) human factors (age, disability, speed), (b) pre-evacuation behaviours, (c) perception distance (social attachment, time of day), (d) social interaction during evacuation, and (e) physical and spatial aspects, such as limitations imposed by debris (damage), and the distance to safe areas. To validate the results, scenarios will be designed with stakeholders, who will also take part in the definition of a serious game. The recommendation of this research is that both social and physical aspects should be considered when defining vulnerability in the analysis of risk. |
Tasks | |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.01365v1 |
https://arxiv.org/pdf/1905.01365v1.pdf | |
PWC | https://paperswithcode.com/paper/a-multi-agent-system-approach-in-evaluating |
Repo | |
Framework | |
Clustering by the way of atomic fission
Title | Clustering by the way of atomic fission |
Authors | Shizhan Lu |
Abstract | Cluster analysis, which focuses on the grouping and categorization of similar elements, is widely used in various fields of research. Inspired by the phenomenon of atomic fission, a novel density-based clustering algorithm called fission clustering (FC) is proposed in this paper. It focuses on mining the dense families of a dataset and uses the information in the distance matrix to split the dataset into subsets. When a dataset has a few points scattered around the dense families of clusters, a K-nearest-neighbors local density indicator is applied to identify and remove the points in sparse areas, so as to obtain a dense subset constituted by the dense families of clusters. A number of frequently used datasets were used to test the performance of this clustering approach and to compare its results with those of other algorithms. The proposed algorithm is found to outperform the other algorithms in speed and accuracy. |
Tasks | |
Published | 2019-06-27 |
URL | https://arxiv.org/abs/1906.11416v1 |
https://arxiv.org/pdf/1906.11416v1.pdf | |
PWC | https://paperswithcode.com/paper/clustering-by-the-way-of-atomic-fission |
Repo | |
Framework | |
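One step described above, removing sparse-area points with a K-nearest-neighbors local density indicator before splitting the distance matrix, can be sketched as follows (an illustrative density filter; the `keep_fraction` threshold is a hypothetical parameter, not the paper's rule):

```python
import numpy as np

def knn_density_filter(X, k=5, keep_fraction=0.9):
    """Score each point by the inverse of its mean distance to its k nearest
    neighbors and keep the densest `keep_fraction` of points."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))                    # pairwise distance matrix
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1].mean(1)   # skip column 0 (self-distance)
    density = 1.0 / (knn_dist + 1e-12)
    keep = density >= np.quantile(density, 1.0 - keep_fraction)
    return X[keep], keep

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),        # dense family 1
               rng.normal(6, 1, (50, 2)),        # dense family 2
               rng.uniform(-10, 16, (10, 2))])   # sparse background points
dense_X, mask = knn_density_filter(X)
print(dense_X.shape)  # roughly 99 of 110 points kept
```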
Nested Network with Two-Stream Pyramid for Salient Object Detection in Optical Remote Sensing Images
Title | Nested Network with Two-Stream Pyramid for Salient Object Detection in Optical Remote Sensing Images |
Authors | Chongyi Li, Runmin Cong, Junhui Hou, Sanyi Zhang, Yue Qian, Sam Kwong |
Abstract | Arising from the various object types and scales, diverse imaging orientations, and cluttered backgrounds in optical remote sensing images (RSIs), it is difficult to directly extend the success of salient object detection on natural scene images to optical RSIs. In this paper, we propose an end-to-end deep network called LV-Net, named after the shape of its architecture, which detects salient objects from optical RSIs in a purely data-driven fashion. The proposed LV-Net consists of two key modules, i.e., a two-stream pyramid module (L-shaped module) and an encoder-decoder module with nested connections (V-shaped module). Specifically, the L-shaped module extracts a set of complementary information hierarchically by using a two-stream pyramid structure, which is beneficial to perceiving the diverse scales and local details of salient objects. The V-shaped module gradually integrates encoder detail features with decoder semantic features through nested connections, which aims at suppressing the cluttered backgrounds and highlighting the salient objects. In addition, we construct the first publicly available optical RSI dataset for salient object detection, including 800 images with varying spatial resolutions, diverse saliency types, and pixel-wise ground truth. Experiments on this benchmark dataset demonstrate that the proposed method outperforms the state-of-the-art salient object detection methods both qualitatively and quantitatively. |
Tasks | Object Detection, Salient Object Detection |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08462v1 |
https://arxiv.org/pdf/1906.08462v1.pdf | |
PWC | https://paperswithcode.com/paper/nested-network-with-two-stream-pyramid-for |
Repo | |
Framework | |
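The L-shaped module above gathers complementary multi-scale context. As a rough, generic illustration of pyramid-style multi-scale feature extraction, not the paper's two-stream design, a PyTorch sketch (class name and scales are hypothetical):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidContext(nn.Module):
    """Pool the feature map at several scales, upsample back to full
    resolution, and fuse with the original features via a 1x1 convolution."""
    def __init__(self, channels, scales=(2, 4)):
        super().__init__()
        self.scales = scales
        self.fuse = nn.Conv2d(channels * (len(scales) + 1), channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [x]
        for s in self.scales:
            pooled = F.avg_pool2d(x, kernel_size=s)           # coarser scale
            feats.append(F.interpolate(pooled, size=(h, w),
                                       mode="bilinear", align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))

x = torch.randn(1, 32, 64, 64)
print(PyramidContext(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```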
Adversarial Self-Paced Learning for Mixture Models of Hawkes Processes
Title | Adversarial Self-Paced Learning for Mixture Models of Hawkes Processes |
Authors | Dixin Luo, Hongteng Xu, Lawrence Carin |
Abstract | We propose a novel adversarial learning strategy for mixture models of Hawkes processes, leveraging data augmentation techniques of Hawkes process in the framework of self-paced learning. Instead of learning a mixture model directly from a set of event sequences drawn from different Hawkes processes, the proposed method learns the target model iteratively, which generates “easy” sequences and uses them in an adversarial and self-paced manner. In each iteration, we first generate a set of augmented sequences from original observed sequences. Based on the fact that an easy sample of the target model can be an adversarial sample of a misspecified model, we apply a maximum likelihood estimation with an adversarial self-paced mechanism. In this manner the target model is updated, and the augmented sequences that obey it are employed for the next learning iteration. Experimental results show that the proposed method outperforms traditional methods consistently. |
Tasks | Data Augmentation |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08397v1 |
https://arxiv.org/pdf/1906.08397v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-self-paced-learning-for-mixture |
Repo | |
Framework | |
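The mixture components above are Hawkes processes, point processes whose intensity is temporarily excited by each past event. For background, a univariate Hawkes intensity with an exponential kernel is $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$; a minimal sketch (context only, not the paper's mixture or adversarial machinery; the parameter values are arbitrary examples):

```python
import math

def hawkes_intensity(t, history, mu=0.5, alpha=0.8, beta=1.0):
    """lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)

events = [0.3, 0.9, 1.4]
print(hawkes_intensity(2.0, events))  # elevated intensity shortly after a burst of events
```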
Segmenting Hyperspectral Images Using Spectral-Spatial Convolutional Neural Networks With Training-Time Data Augmentation
Title | Segmenting Hyperspectral Images Using Spectral-Spatial Convolutional Neural Networks With Training-Time Data Augmentation |
Authors | Jakub Nalepa, Lukasz Tulczyjew, Michal Myller, Michal Kawulok |
Abstract | Hyperspectral imaging provides detailed information about the scanned objects, as it captures their spectral characteristics within a large number of wavelength bands. Classification of such data has become an active research topic due to its wide applicability in a variety of fields. Deep learning has established the state of the art in the area, and it constitutes the current research mainstream. In this letter, we introduce a new spectral-spatial convolutional neural network, benefitting from a battery of data augmentation techniques which help deal with a real-life problem of lacking ground-truth training data. Our rigorous experiments showed that the proposed method outperforms other spectral-spatial techniques from the literature, and delivers precise hyperspectral classification in real time. |
Tasks | Data Augmentation |
Published | 2019-07-27 |
URL | https://arxiv.org/abs/1907.11935v1 |
https://arxiv.org/pdf/1907.11935v1.pdf | |
PWC | https://paperswithcode.com/paper/segmenting-hyperspectral-images-using |
Repo | |
Framework | |
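The letter relies on a battery of training-time augmentation techniques; as one generic possibility (not the authors' exact recipe), small per-band noise and random flips can be applied to spectral-spatial patches:

```python
import numpy as np

def augment_spectral_patch(patch, noise_std=0.01, rng=None):
    """Augment a (bands, height, width) hyperspectral patch with per-band
    Gaussian noise and a random horizontal flip."""
    rng = rng or np.random.default_rng()
    out = patch + rng.normal(0.0, noise_std, size=patch.shape)
    if rng.random() < 0.5:
        out = out[:, :, ::-1]   # flip along the width axis
    return out

patch = np.random.rand(103, 7, 7)           # e.g. a 7x7 neighborhood with 103 bands
print(augment_spectral_patch(patch).shape)  # (103, 7, 7)
```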
Connections Between Mirror Descent, Thompson Sampling and the Information Ratio
Title | Connections Between Mirror Descent, Thompson Sampling and the Information Ratio |
Authors | Julian Zimmert, Tor Lattimore |
Abstract | The information-theoretic analysis by Russo and Van Roy (2014) in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make a formal connection, showing that the information-theoretic bounds in most applications can be derived from existing techniques for online convex optimisation. Besides this, for $k$-armed adversarial bandits we provide an efficient algorithm with regret that matches the best information-theoretic upper bound and improve best known regret guarantees for online linear optimisation on $\ell_p$-balls and bandits with graph feedback. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11817v1 |
https://arxiv.org/pdf/1905.11817v1.pdf | |
PWC | https://paperswithcode.com/paper/connections-between-mirror-descent-thompson |
Repo | |
Framework | |
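For context, the information ratio of Russo and Van Roy (2014) cited above relates instantaneous regret to the information gained about the optimal action; a standard statement (notation may differ from the paper's) is

$$
\Gamma_t \;=\; \frac{\bigl(\mathbb{E}_t[\Delta_t]\bigr)^2}{I_t\bigl(A^\ast;\,(A_t, Y_t)\bigr)},
\qquad
\mathbb{E}\left[\sum_{t=1}^{n} \Delta_t\right] \;\le\; \sqrt{\bar{\Gamma}\, H(A^\ast)\, n},
$$

where $\Delta_t$ is the instantaneous regret, $I_t$ the conditional mutual information between the optimal action $A^\ast$ and the round-$t$ observation, $H$ the entropy, and $\bar{\Gamma}$ a uniform bound on $\Gamma_t$. The paper shows that bounds of this shape can also be recovered from mirror-descent-style analyses for online convex optimisation.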
DS-VIC: Unsupervised Discovery of Decision States for Transfer in RL
Title | DS-VIC: Unsupervised Discovery of Decision States for Transfer in RL |
Authors | Nirbhay Modhe, Prithvijit Chattopadhyay, Mohit Sharma, Abhishek Das, Devi Parikh, Dhruv Batra, Ramakrishna Vedantam |
Abstract | We learn to identify decision states, namely the parsimonious set of states where decisions meaningfully affect the future states an agent can reach in an environment. We utilize the VIC framework, which maximizes an agent's 'empowerment', i.e. the ability to reliably reach a diverse set of states, and formulate a sandwich bound on the empowerment objective that allows identification of decision states. Unlike previous work, our decision states are discovered without extrinsic rewards, simply by interacting with the world. Our results show that our decision states (1) are often interpretable and (2) lead to better exploration on downstream goal-driven tasks in partially observable environments. |
Tasks | Hierarchical Reinforcement Learning |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10580v3 |
https://arxiv.org/pdf/1907.10580v3.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-discovery-of-decision-states-for |
Repo | |
Framework | |
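The 'empowerment' objective of the VIC framework referenced above is usually written as a mutual information between a latent option $\Omega$ and the final state the agent reaches; a standard form (the paper's sandwich bound refines this, which the abstract only sketches) is

$$
\max_{\pi}\; I(\Omega;\, s_f \mid s_0) \;=\; H(\Omega \mid s_0) \;-\; H(\Omega \mid s_f, s_0),
$$

i.e. the agent should be able to commit to many distinct intentions (high $H(\Omega \mid s_0)$) and realize them reliably in the final state (low $H(\Omega \mid s_f, s_0)$).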
DepthwiseGANs: Fast Training Generative Adversarial Networks for Realistic Image Synthesis
Title | DepthwiseGANs: Fast Training Generative Adversarial Networks for Realistic Image Synthesis |
Authors | Mkhuseli Ngxande, Jules-Raymond Tapamo, Michael Burke |
Abstract | Recent work has shown significant progress in the direction of synthetic data generation using Generative Adversarial Networks (GANs). GANs have been applied in many fields of computer vision including text-to-image conversion, domain transfer, super-resolution, and image-to-video applications. In computer vision, traditional GANs are based on deep convolutional neural networks. However, deep convolutional neural networks can require extensive computational resources because they are based on multiple operations performed by convolutional layers, which can consist of millions of trainable parameters. Training a GAN model can be difficult and it takes a significant amount of time to reach an equilibrium point. In this paper, we investigate the use of depthwise separable convolutions to reduce training time while maintaining data generation performance. Our results show that a DepthwiseGAN architecture can generate realistic images in shorter training periods when compared to a StarGan architecture, but that model capacity still plays a significant role in generative modelling. In addition, we show that depthwise separable convolutions perform best when only applied to the generator. For quality evaluation of generated images, we use the Fréchet Inception Distance (FID), which compares the similarity between the generated image distribution and that of the training dataset. |
Tasks | Image Generation, Super-Resolution, Synthetic Data Generation |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.02225v1 |
http://arxiv.org/pdf/1903.02225v1.pdf | |
PWC | https://paperswithcode.com/paper/depthwisegans-fast-training-generative |
Repo | |
Framework | |
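A depthwise separable convolution, the building block discussed above, factorizes a standard convolution into a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution, cutting the parameter count roughly from $C_{in} C_{out} k^2$ to $C_{in} k^2 + C_{in} C_{out}$. A minimal PyTorch sketch (illustrative, not the paper's generator):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """k x k depthwise conv (one filter per input channel, groups=in_ch)
    followed by a 1x1 pointwise conv that mixes channels."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 32, 32)
print(DepthwiseSeparableConv(64, 128)(x).shape)  # torch.Size([1, 128, 32, 32])
```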
Small-GAN: Speeding Up GAN Training Using Core-sets
Title | Small-GAN: Speeding Up GAN Training Using Core-sets |
Authors | Samarth Sinha, Han Zhang, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Augustus Odena |
Abstract | Recent work by Brock et al. (2018) suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large mini-batch sizes. Unfortunately, using large batches is slow and expensive on conventional hardware. Thus, it would be nice if we could generate batches that were effectively large though actually small. In this work, we propose a method to do this, inspired by the use of Coreset-selection in active learning. When training a GAN, we draw a large batch of samples from the prior and then compress that batch using Coreset-selection. To create effectively large batches of ‘real’ images, we create a cached dataset of Inception activations of each training image, randomly project them down to a smaller dimension, and then use Coreset-selection on those projected activations at training time. We conduct experiments showing that this technique substantially reduces training time and memory usage for modern GAN variants, that it reduces the fraction of dropped modes in a synthetic dataset, and that it allows GANs to reach a new state of the art in anomaly detection. |
Tasks | Active Learning, Anomaly Detection, Image Generation |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13540v1 |
https://arxiv.org/pdf/1910.13540v1.pdf | |
PWC | https://paperswithcode.com/paper/small-gan-speeding-up-gan-training-using-core |
Repo | |
Framework | |
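The batch-compression step described above can be approximated with a greedy k-center (farthest-point) core-set selection on randomly projected features; a minimal numpy sketch (illustrative, not the authors' implementation; the projection dimension is a hypothetical choice):

```python
import numpy as np

def greedy_coreset(features, m, proj_dim=32, seed=0):
    """Randomly project `features` to `proj_dim` dims, then greedily pick m
    points, each maximizing its distance to the points chosen so far."""
    rng = np.random.default_rng(seed)
    proj = features @ rng.normal(size=(features.shape[1], proj_dim))
    chosen = [int(rng.integers(len(proj)))]
    min_dist = np.linalg.norm(proj - proj[chosen[0]], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(min_dist))            # farthest remaining point
        chosen.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(proj - proj[nxt], axis=1))
    return np.array(chosen)

acts = np.random.randn(2048, 2048)       # e.g. cached Inception activations for a large batch
print(greedy_coreset(acts, m=64).shape)  # (64,)
```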
Low-Power Computer Vision: Status, Challenges, Opportunities
Title | Low-Power Computer Vision: Status, Challenges, Opportunities |
Authors | Sergei Alyamkin, Matthew Ardi, Alexander C. Berg, Achille Brighton, Bo Chen, Yiran Chen, Hsin-Pai Cheng, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Abhinav Goel, Alexander Goncharenko, Xuyang Guo, Soonhoi Ha, Andrew Howard, Xiao Hu, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Jong Gook Ko, Alexander Kondratyev, Junhyeok Lee, Seungjae Lee, Suwoong Lee, Zichao Li, Zhiyu Liang, Juzheng Liu, Xin Liu, Yang Lu, Yung-Hsiang Lu, Deeptanshu Malik, Hong Hanh Nguyen, Eunbyung Park, Denis Repin, Liang Shen, Tao Sheng, Fei Sun, David Svitov, George K. Thiruvathukal, Baiwu Zhang, Jingchi Zhang, Xiaopeng Zhang, Shaojie Zhuo |
Abstract | Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions and some of these systems have limited energy (such as unmanned aerial vehicles also called drones and mobile robots). These systems rely on batteries and energy efficiency is critical. This article serves two main purposes: (1) Examine the state-of-the-art for low-power solutions to detect objects in images. Since 2015, the IEEE Annual International Low-Power Image Recognition Challenge (LPIRC) has been held to identify the most energy-efficient computer vision solutions. This article summarizes 2018 winners’ solutions. (2) Suggest directions for research as well as opportunities for low-power computer vision. |
Tasks | |
Published | 2019-04-15 |
URL | http://arxiv.org/abs/1904.07714v1 |
http://arxiv.org/pdf/1904.07714v1.pdf | |
PWC | https://paperswithcode.com/paper/low-power-computer-vision-status-challenges |
Repo | |
Framework | |
L0 Regularization Based Neural Network Design and Compression
Title | L0 Regularization Based Neural Network Design and Compression |
Authors | S. Asim Ahmed |
Abstract | We consider the complexity of Deep Neural Networks (DNNs) and their associated massive over-parameterization. Such over-parameterization may entail susceptibility to adversarial attacks, loss of interpretability, and adverse Size, Weight and Power - Cost (SWaP-C) considerations. We ask whether there are methodical ways (regularization) to reduce complexity and how to interpret the trade-off between the desired metric and the complexity of a DNN. Reducing complexity is directly applicable to scaling AI applications to real-world problems (especially for off-the-cloud applications). We show the presence of a knee in the trade-off curve and how it can be evaluated. We apply a form of L0 regularization to MNIST data and signal modulation classification. We show that such regularization also captures saliency in the input space. |
Tasks | |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1905.13652v1 |
https://arxiv.org/pdf/1905.13652v1.pdf | |
PWC | https://paperswithcode.com/paper/l0-regularization-based-neural-network-design |
Repo | |
Framework | |
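The generic L0-penalized training objective behind the abstract's "form of L0 regularization" (not necessarily the exact formulation used in the paper) is

$$
\min_{\theta}\; \mathcal{L}(\theta) \;+\; \lambda \,\lVert \theta \rVert_0,
\qquad
\lVert \theta \rVert_0 \;=\; \sum_{j} \mathbf{1}\!\left[\theta_j \neq 0\right],
$$

where $\lambda$ trades accuracy against the number of non-zero weights; sweeping $\lambda$ traces out the accuracy-complexity curve whose knee the paper discusses. Because the penalty is non-differentiable, practical methods optimize a smoothed or stochastic surrogate.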
Say What I Want: Towards the Dark Side of Neural Dialogue Models
Title | Say What I Want: Towards the Dark Side of Neural Dialogue Models |
Authors | Haochen Liu, Tyler Derr, Zitao Liu, Jiliang Tang |
Abstract | Neural dialogue models have been widely adopted in various chatbot applications because of their good performance in simulating and generalizing human conversations. However, these models have a dark side: due to the vulnerability of neural networks, a neural dialogue model can be manipulated by users to say what they want, which raises concerns about the security of practical chatbot services. In this work, we investigate whether we can craft inputs that lead a well-trained black-box neural dialogue model to generate targeted outputs. We formulate this as a reinforcement learning (RL) problem and train a Reverse Dialogue Generator that efficiently finds such inputs for targeted outputs. Experiments conducted on a representative neural dialogue model show that our proposed model is able to discover such desired inputs in a considerable portion of cases. Overall, our work reveals this weakness of neural dialogue models and may prompt further research on developing corresponding solutions to avoid it. |
Tasks | Chatbot |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06044v3 |
https://arxiv.org/pdf/1909.06044v3.pdf | |
PWC | https://paperswithcode.com/paper/say-what-i-want-towards-the-dark-side-of |
Repo | |
Framework | |
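A toy REINFORCE loop illustrating the abstract's framing, searching for an input that drives a fixed black-box mapping to a targeted output (everything here, including `black_box`, is a hypothetical stand-in, not the paper's Reverse Dialogue Generator):

```python
import numpy as np

def black_box(token):               # stand-in for the victim dialogue model
    return (3 * token + 1) % 10     # maps an input token to an output token

TARGET, VOCAB, LR = 7, 10, 0.5
logits = np.zeros(VOCAB)            # softmax policy over single-token "inputs"
rng = np.random.default_rng(0)

for _ in range(500):
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    a = int(rng.choice(VOCAB, p=probs))
    reward = 1.0 if black_box(a) == TARGET else 0.0   # 1 iff the targeted output is produced
    grad = -probs.copy(); grad[a] += 1.0              # d log pi(a) / d logits for a softmax
    logits += LR * reward * grad                      # REINFORCE update

best = int(np.argmax(logits))
print(best, black_box(best))        # the discovered input and its (targeted) output
```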
Mitigating Annotation Artifacts in Natural Language Inference Datasets to Improve Cross-dataset Generalization Ability
Title | Mitigating Annotation Artifacts in Natural Language Inference Datasets to Improve Cross-dataset Generalization Ability |
Authors | Guanhua Zhang, Bing Bai, Junqi Zhang, Kun Bai, Conghui Zhu, Tiejun Zhao |
Abstract | Natural language inference (NLI) aims at predicting the relationship between a given pair of premise and hypothesis. However, several works have found that a bias pattern called annotation artifacts is widespread in NLI datasets, making it possible to identify the label by looking at the hypothesis alone. This irregularity inflates evaluation results and harms models' generalization ability. In this paper, we consider a more trustworthy setting, i.e., cross-dataset evaluation. We explore the impacts of annotation artifacts in cross-dataset testing. Furthermore, we propose a training framework to mitigate the impacts of the bias pattern. Experimental results demonstrate that our methods can alleviate the negative effect of the artifacts and improve the generalization ability of models. |
Tasks | Natural Language Inference |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04242v2 |
https://arxiv.org/pdf/1909.04242v2.pdf | |
PWC | https://paperswithcode.com/paper/mitigating-annotation-artifacts-in-natural |
Repo | |
Framework | |
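The artifact phenomenon described above is commonly quantified with a hypothesis-only baseline: if a classifier that never sees the premise beats the majority class, the dataset leaks label information. A minimal scikit-learn sketch (illustrative; the toy examples below are made up, and a real test would use held-out NLI data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (hypothesis, label) pairs only -- the premise is deliberately ignored
hypotheses = ["a man is sleeping", "a man is outdoors", "nobody is eating",
              "a woman is not running", "people are playing a sport", "a dog is resting"]
labels = ["contradiction", "neutral", "contradiction",
          "contradiction", "entailment", "neutral"]

clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(hypotheses, labels)

# Held-out accuracy far above the majority-class rate signals annotation artifacts
# (e.g. negation words correlating with the "contradiction" label).
print(clf.predict(["a man is not sleeping"]))
```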