January 26, 2020


Paper Group ANR 1418

Procedural Synthesis of Remote Sensing Images for Robust Change Detection with Neural Networks. Humor Detection: A Transformer Gets the Last Laugh. A Drug Recommendation System (Dr.S) for cancer cell lines. Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships. DNN-based Speaker Embedding Using Subjective Inter-speaker Simil …

Procedural Synthesis of Remote Sensing Images for Robust Change Detection with Neural Networks


Title	Procedural Synthesis of Remote Sensing Images for Robust Change Detection with Neural Networks
Authors	Maria Kolos, Anton Marin, Alexey Artemov, Evgeny Burnaev
Abstract	Data-driven methods such as convolutional neural networks (CNNs) are known to deliver state-of-the-art performance on image recognition tasks when the training data are abundant. However, in some instances, such as change detection in remote sensing images, annotated data cannot be obtained in sufficient quantities. In this work, we propose a simple and efficient method for creating realistic targeted synthetic datasets in the remote sensing domain, leveraging the opportunities offered by game development engines. We provide a description of the pipeline for procedural geometry generation and rendering as well as an evaluation of the efficiency of produced datasets in a change detection scenario. Our evaluations demonstrate that our pipeline helps to improve the performance and convergence of deep learning models when the amount of real-world data is severely limited.
Tasks
Published	2019-05-20
URL	https://arxiv.org/abs/1905.07877v1
PDF	https://arxiv.org/pdf/1905.07877v1.pdf
PWC	https://paperswithcode.com/paper/procedural-synthesis-of-remote-sensing-images
Repo
Framework

Humor Detection: A Transformer Gets the Last Laugh


Title	Humor Detection: A Transformer Gets the Last Laugh
Authors	Orion Weller, Kevin Seppi
Abstract	Much previous work has been done in attempting to identify humor in text. In this paper we extend that capability by proposing a new task: assessing whether or not a joke is humorous. We present a novel way of approaching this problem by building a model that learns to identify humorous jokes based on ratings gleaned from Reddit pages, consisting of almost 16,000 labeled instances. Using these ratings to determine the level of humor, we then employ a Transformer architecture for its advantages in learning from sentence context. We demonstrate the effectiveness of this approach and show results that are comparable to human performance. We further demonstrate our model’s increased capabilities on humor identification problems, such as the previously created datasets for short jokes and puns. These experiments show that this method outperforms all previous work done on these tasks, with an F-measure of 93.1% for the Puns dataset and 98.6% on the Short Jokes dataset.
Tasks	Humor Detection
Published	2019-08-31
URL	https://arxiv.org/abs/1909.00252v1
PDF	https://arxiv.org/pdf/1909.00252v1.pdf
PWC	https://paperswithcode.com/paper/humor-detection-a-transformer-gets-the-last
Repo
Framework
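
The abstract above reports its results as an F-measure. As a minimal sketch, assuming the balanced F1 variant on binary labels (the abstract does not say which variant is used), the metric can be computed as below. The function name `f_measure` and the label encoding are illustrative choices, not taken from the paper.

```python
def f_measure(y_true, y_pred, positive=1):
    """Balanced F-measure (F1) for one positive class.

    Assumption: F1 (harmonic mean of precision and recall); the abstract
    only says "F-measure".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 1 = humorous, 0 = not humorous (hypothetical data).
score = f_measure([1, 1, 0, 0], [1, 0, 1, 0])
```

With one true positive, one false positive, and one false negative, precision and recall are both 0.5, so the toy score is 0.5.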

A Drug Recommendation System (Dr.S) for cancer cell lines


Title	A Drug Recommendation System (Dr.S) for cancer cell lines
Authors	Marleen Balvert, Georgios Patoulidis, Andrew Patti, Timo M. Deist, Christine Eyler, Bas E. Dutilh, Alexander Schönhuth, David Craft
Abstract	Personalizing drug prescriptions in cancer care based on genomic information requires associating genomic markers with treatment effects. This is an unsolved challenge requiring genomic patient data in volumes not yet available, as well as appropriate quantitative methods. We attempt to solve this challenge for an experimental proxy for which sufficient data are available: 42 drugs tested on 1018 cancer cell lines. Our goal is to develop a method to identify the drug that is most promising based on a cell line’s genomic information. For this, we need to identify for each drug the machine learning method, choice of hyperparameters and genomic features for optimal predictive performance. We extensively compare combinations of gene sets (both curated and random), genetic features, and machine learning algorithms for all 42 drugs. For each drug, the best performing combination (considering only the curated gene sets) is selected. We use these top model parameters for each drug to build and demonstrate a Drug Recommendation System (Dr.S). Insights resulting from this analysis are formulated as best practices for developing drug recommendation systems. The complete software system, called the Cell Line Analyzer, is written in Python and available on GitHub.
Tasks	Recommendation Systems
Published	2019-12-24
URL	https://arxiv.org/abs/1912.11548v1
PDF	https://arxiv.org/pdf/1912.11548v1.pdf
PWC	https://paperswithcode.com/paper/a-drug-recommendation-system-drs-for-cancer
Repo
Framework

Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships


Title	Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships
Authors	Gal Sadeh Kenigsfield, Ran El-Yaniv
Abstract	One of the most difficult tasks in scene understanding is recognizing interactions between objects in an image. This task is often called visual relationship detection (VRD). We consider the question of whether, given auxiliary textual data in addition to the standard visual data used for training VRD models, VRD performance can be improved. We present a new deep model that can leverage additional textual data. Our model relies on a shared text–image representation of subject-verb-object relationships appearing in the text, and object interactions in images. Our method is the first to enable recognition of visual relationships missing in the visual training data and appearing only in the auxiliary text. We test our approach on two different text sources: text originating in images and text originating in books. We test and validate our approach using two large-scale recognition tasks: VRD and Scene Graph Generation. We show a surprising result: Our approach works better with text originating in books, and outperforms the text originating in images on the task of unseen relationship recognition. It is comparable to the model which utilizes text originating in images on the task of seen relationship recognition.
Tasks	Graph Generation, Scene Graph Generation, Scene Understanding
Published	2019-10-27
URL	https://arxiv.org/abs/1910.12324v1
PDF	https://arxiv.org/pdf/1910.12324v1.pdf
PWC	https://paperswithcode.com/paper/leveraging-auxiliary-text-for-deep-1
Repo
Framework

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis


Title	DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis
Authors	Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
Abstract	This paper proposes novel algorithms for speaker embedding using subjective inter-speaker similarity based on deep neural networks (DNNs). Although conventional DNN-based speaker embedding such as a $d$-vector can be applied to multi-speaker modeling in speech synthesis, it does not correlate with the subjective inter-speaker similarity and is not necessarily an appropriate speaker representation for open speakers whose speech utterances are not included in the training data. We propose two training algorithms for a DNN-based speaker embedding model using an inter-speaker similarity matrix obtained by large-scale subjective scoring. One is based on similarity vector embedding and trains the model to predict a vector of the similarity matrix as the speaker representation. The other is based on similarity matrix embedding and trains the model to minimize the squared Frobenius norm between the similarity matrix and the Gram matrix of $d$-vectors, i.e., the inter-speaker similarity derived from the $d$-vectors. We crowdsourced the inter-speaker similarity scores of 153 Japanese female speakers, and the experimental results demonstrate that our algorithms learn speaker embeddings that are highly correlated with the subjective similarity. We also apply the proposed speaker embedding to multi-speaker modeling in DNN-based speech synthesis and reveal that the proposed similarity vector embedding improves synthetic speech quality for open speakers whose speech utterances are unseen during training.
Tasks	Speech Synthesis
Published	2019-07-19
URL	https://arxiv.org/abs/1907.08294v1
PDF	https://arxiv.org/pdf/1907.08294v1.pdf
PWC	https://paperswithcode.com/paper/dnn-based-speaker-embedding-using-subjective
Repo
Framework
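
The similarity matrix embedding objective in the abstract above compares a subjective similarity matrix with the Gram matrix of the $d$-vectors. A minimal sketch of that loss follows, assuming the plain squared Frobenius norm of the difference over all entries. The paper may normalize vectors or treat the diagonal differently, and the names `gram_matrix` and `similarity_matrix_loss` are hypothetical.

```python
def gram_matrix(vectors):
    """Gram matrix K with K[i][j] = <v_i, v_j> for a list of equal-length vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def similarity_matrix_loss(d_vectors, similarity):
    """Squared Frobenius norm ||K - S||_F^2 between the Gram matrix K of the
    d-vectors and the subjective similarity matrix S.

    Assumption: every entry (including the diagonal) contributes equally.
    """
    gram = gram_matrix(d_vectors)
    n = len(d_vectors)
    return sum(
        (gram[i][j] - similarity[i][j]) ** 2 for i in range(n) for j in range(n)
    )


# Two orthonormal toy d-vectors and a made-up 2x2 similarity matrix.
loss = similarity_matrix_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.5], [0.5, 1.0]])
```

In training, this quantity would be minimized with respect to the network that produces the $d$-vectors; here it is only evaluated for fixed toy vectors.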

Neural Network based Explicit Mixture Models and Expectation-maximization based Learning


Title	Neural Network based Explicit Mixture Models and Expectation-maximization based Learning
Authors	Dong Liu, Minh Thành Vu, Saikat Chatterjee, Lars K. Rasmussen
Abstract	We propose two neural network based mixture models in this article. The proposed mixture models are explicit in nature. The explicit models have analytical forms with the advantages of computing likelihood and efficiency of generating samples. Computation of likelihood is an important aspect of our models. Expectation-maximization based algorithms are developed for learning parameters of the proposed models. We provide sufficient conditions to realize the expectation-maximization based learning. The main requirements are invertibility of neural networks that are used as generators and Jacobian computation of functional form of the neural networks. The requirements are practically realized using a flow-based neural network. In our first mixture model, we use multiple flow-based neural networks as generators. Naturally the model is complex. A single latent variable is used as the common input to all the neural networks. The second mixture model uses a single flow-based neural network as a generator to reduce complexity. The single generator has a latent variable input that follows a Gaussian mixture distribution. We demonstrate efficiency of proposed mixture models through extensive experiments for generating samples and maximum likelihood based classification.
Tasks
Published	2019-07-31
URL	https://arxiv.org/abs/1907.13432v1
PDF	https://arxiv.org/pdf/1907.13432v1.pdf
PWC	https://paperswithcode.com/paper/neural-network-based-explicit-mixture-models
Repo
Framework
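
The abstract above states that the learning procedure needs invertible generators and their Jacobians, so that each mixture component has an explicit likelihood. The sketch below illustrates that idea with the simplest invertible map, a one-dimensional affine generator x = a*z + b with z drawn from a standard normal. This stands in for the flow-based networks of the paper and is an assumption for illustration only; all function names are hypothetical.

```python
import math


def log_standard_normal(z):
    """Log-density of N(0, 1) at z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def component_log_density(x, scale, shift):
    """Change of variables for the invertible generator x = scale * z + shift:
    log p(x) = log N(g^{-1}(x)) - log|dg/dz|."""
    z = (x - shift) / scale  # inverse of the generator
    return log_standard_normal(z) - math.log(abs(scale))  # Jacobian term


def e_step(x, weights, scales, shifts):
    """Posterior responsibility of each component for x, plus log p(x)."""
    log_joint = [
        math.log(w) + component_log_density(x, a, b)
        for w, a, b in zip(weights, scales, shifts)
    ]
    top = max(log_joint)  # log-sum-exp for numerical stability
    log_px = top + math.log(sum(math.exp(v - top) for v in log_joint))
    return [math.exp(v - log_px) for v in log_joint], log_px


# Two equally weighted components centred at -1 and +1, evaluated at x = 0.
resp, log_px = e_step(0.0, [0.5, 0.5], [1.0, 1.0], [-1.0, 1.0])
```

By symmetry, both components receive responsibility 0.5 at x = 0. An M-step would then update the mixture weights and generator parameters using these responsibilities.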

Massive Autonomous UAV Path Planning: A Neural Network Based Mean-Field Game Theoretic Approach


Title	Massive Autonomous UAV Path Planning: A Neural Network Based Mean-Field Game Theoretic Approach
Authors	Hamid Shiri, Jihong Park, Mehdi Bennis
Abstract	This paper investigates the autonomous control of massive unmanned aerial vehicles (UAVs) for mission-critical applications (e.g., dispatching many UAVs from a source to a destination for firefighting). Achieving their fast travel and low motion energy without inter-UAV collision under wind perturbation is a daunting control task, which incurs huge communication energy for exchanging UAV states in real time. We tackle this problem by exploiting a mean-field game (MFG) theoretic control method that requires the UAV state exchanges only once at the initial source. Afterwards, each UAV can control its acceleration by locally solving two partial differential equations (PDEs), known as the Hamilton-Jacobi-Bellman (HJB) and Fokker-Planck-Kolmogorov (FPK) equations. This approach, however, brings about huge computation energy for solving the PDEs, particularly under multi-dimensional UAV states. We address this issue by utilizing a machine learning (ML) method where two separate ML models approximate the solutions of the HJB and FPK equations. These ML models are trained and exploited using an online gradient descent method with low computational complexity. Numerical evaluations validate that the proposed ML aided MFG theoretic algorithm, referred to as MFG learning control, is effective in collision avoidance with low communication energy and acceptable computation energy.
Tasks
Published	2019-05-10
URL	https://arxiv.org/abs/1905.04152v1
PDF	https://arxiv.org/pdf/1905.04152v1.pdf
PWC	https://paperswithcode.com/paper/massive-autonomous-uav-path-planning-a-neural
Repo
Framework

Band-to-Band Tunneling based Ultra-Energy Efficient Silicon Neuron


Title	Band-to-Band Tunneling based Ultra-Energy Efficient Silicon Neuron
Authors	Tanmay Chavan, Sangya Dutta, Nihar R. Mohapatra, Udayan Ganguly
Abstract	The human brain comprises about a hundred billion neurons connected through a quadrillion synapses. Spiking Neural Networks (SNNs) take inspiration from the brain to model complex cognitive and learning tasks. Neuromorphic engineering implements SNNs in hardware, aspiring to mimic the brain at scale (i.e., 100 billion neurons) with biological area and energy efficiency. The design of ultra-energy efficient and compact neurons is essential for the large-scale implementation of SNNs in hardware. In this work, we have experimentally demonstrated a Partially Depleted (PD) Silicon-On-Insulator (SOI) MOSFET based Leaky-Integrate & Fire (LIF) neuron where energy- and area-efficiency is enabled by two design elements: first, tunneling-based operation, and second, a compact sub-threshold SOI control circuit design. Band-to-Band Tunneling (BTBT) induced hole storage in the body is used for the “Integrate” function of the neuron. A compact control circuit “Fires” a spike when the body potential exceeds the firing threshold. The neuron then “Resets” by removing the stored holes from the body contact of the device. Additionally, the control circuit provides “Leakiness” in the neuron, which is an essential property of biological neurons. The proposed neuron provides 10x higher area efficiency compared to a CMOS design with equivalent energy/spike. Alternatively, it has 10^4x higher energy efficiency than area-equivalent neuron technologies. Biologically comparable energy- and area-efficiency along with CMOS compatibility make the proposed device attractive for large-scale hardware implementation of SNNs.
Tasks
Published	2019-02-26
URL	http://arxiv.org/abs/1902.09726v1
PDF	http://arxiv.org/pdf/1902.09726v1.pdf
PWC	https://paperswithcode.com/paper/band-to-band-tunneling-based-ultra-energy
Repo
Framework
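
The abstract above describes the four behaviours of the neuron: integrate, leak, fire, and reset. A generic discrete-time leaky integrate-and-fire model makes those steps concrete. This is a textbook software abstraction, not a model of the BTBT device or its control circuit, and the parameter names and values are hypothetical.

```python
def simulate_lif(input_current, leak=0.1, threshold=1.0, dt=1.0):
    """Discrete-time leaky integrate-and-fire neuron.

    Returns a list with 1 where the neuron spikes and 0 elsewhere.
    Assumption: the state resets to zero after each spike.
    """
    potential = 0.0
    spikes = []
    for current in input_current:
        # Integrate the input while leaking a fraction of the stored state.
        potential += dt * (current - leak * potential)
        if potential >= threshold:
            spikes.append(1)   # fire
            potential = 0.0    # reset
        else:
            spikes.append(0)
    return spikes


# Constant drive of 0.6 per step without leak: the second step crosses the threshold.
spike_train = simulate_lif([0.6, 0.6, 0.6], leak=0.0)
```

In the device described above, the stored holes in the transistor body play the role of `potential`; the sketch only mirrors the functional behaviour.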

Hour-Ahead Load Forecasting Using AMI Data


Title	Hour-Ahead Load Forecasting Using AMI Data
Authors	Sarwan Ali, Haris Mansoor, Imdadullah Khan, Naveed Arshad, Muhammad Asad Khan, Safiullah Faizullah
Abstract	Accurate short-term load forecasting is essential for efficient operation of the power sector. Predicting load at a fine granularity such as individual households or buildings is challenging due to higher volatility and uncertainty in the load. For aggregate loads, such as at the grid level, the inherent stochasticity and fluctuations are averaged out, so the problem becomes substantially easier. We propose an approach for short-term load forecasting at the level of individual consumers (households), called Forecasting using Matrix Factorization (FMF). FMF does not use any demographic or activity-pattern information about consumers. Therefore, it can be applied to any locality with readily available smart meter and weather data. We perform extensive experiments on three benchmark datasets and demonstrate that FMF significantly outperforms the computationally expensive state-of-the-art methods for this problem. We achieve up to 26.5% and 24.4% improvement in RMSE over Regression Tree and Support Vector Machine, respectively, and up to 36% and 73.2% improvement in MAPE over Random Forest and Long Short-Term Memory neural network, respectively.
Tasks	Load Forecasting
Published	2019-12-28
URL	https://arxiv.org/abs/1912.12479v2
PDF	https://arxiv.org/pdf/1912.12479v2.pdf
PWC	https://paperswithcode.com/paper/hour-ahead-load-forecasting-using-ami-data
Repo
Framework
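
The results in the abstract above are stated as improvements in RMSE and MAPE. A minimal sketch of both error measures follows, together with one plausible reading of "improvement" as the relative reduction in error against a baseline. That reading is an assumption, since the abstract does not define it, and the function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, which can happen for household load.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def relative_improvement(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error (assumed definition)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy hourly loads in kWh (made-up numbers).
actual = [2.0, 4.0]
forecast = [1.0, 5.0]
errors = (rmse(actual, forecast), mape(actual, forecast))
```

For the toy data, each forecast is off by 1 kWh, giving an RMSE of 1.0 and a MAPE of 37.5%.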

On Neural Phone Recognition of Mixed-Source ECoG Signals


Title	On Neural Phone Recognition of Mixed-Source ECoG Signals
Authors	Ahmed Hussen Abdelaziz, Shuo-Yiin Chang, Nelson Morgan, Erik Edwards, Dorothea Kolossa, Dan Ellis, David A. Moses, Edward F. Chang
Abstract	The emerging field of neural speech recognition (NSR) using electrocorticography has recently attracted remarkable research interest for studying how human brains recognize speech in quiet and noisy surroundings. In this study, we demonstrate the utility of NSR systems to objectively prove the ability of human beings to attend to a single speech source while suppressing the interfering signals in a simulated cocktail party scenario. The experimental results show that the relative degradation of the NSR system performance when tested in a mixed-source scenario is significantly lower than that of automatic speech recognition (ASR). In this paper, we have significantly enhanced the performance of our recently published framework by using manual alignments for initialization instead of the flat start technique. We have also improved the NSR system performance by accounting for the possible transcription mismatch between the acoustic and neural signals.
Tasks	Speech Recognition
Published	2019-12-12
URL	https://arxiv.org/abs/1912.05869v1
PDF	https://arxiv.org/pdf/1912.05869v1.pdf
PWC	https://paperswithcode.com/paper/on-neural-phone-recognition-of-mixed-source
Repo
Framework

Nearly Tight Bounds for Robust Proper Learning of Halfspaces with a Margin


Title	Nearly Tight Bounds for Robust Proper Learning of Halfspaces with a Margin
Authors	Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi
Abstract	We study the problem of properly learning large margin halfspaces in the agnostic PAC model. In more detail, we study the complexity of properly learning $d$-dimensional halfspaces on the unit ball within misclassification error $\alpha \cdot \mathrm{OPT}_{\gamma} + \epsilon$, where $\mathrm{OPT}_{\gamma}$ is the optimal $\gamma$-margin error rate and $\alpha \geq 1$ is the approximation ratio. We give learning algorithms and computational hardness results for this problem, for all values of the approximation ratio $\alpha \geq 1$, that are nearly-matching for a range of parameters. Specifically, for the natural setting that $\alpha$ is any constant bigger than one, we provide an essentially tight complexity characterization. On the positive side, we give an $\alpha = 1.01$-approximate proper learner that uses $O(1/(\epsilon^2\gamma^2))$ samples (which is optimal) and runs in time $\mathrm{poly}(d/\epsilon) \cdot 2^{\tilde{O}(1/\gamma^2)}$. On the negative side, we show that any constant factor approximate proper learner has runtime $\mathrm{poly}(d/\epsilon) \cdot 2^{(1/\gamma)^{2-o(1)}}$, assuming the Exponential Time Hypothesis.
Tasks
Published	2019-08-29
URL	https://arxiv.org/abs/1908.11335v1
PDF	https://arxiv.org/pdf/1908.11335v1.pdf
PWC	https://paperswithcode.com/paper/nearly-tight-bounds-for-robust-proper
Repo
Framework

Practical Speech Recognition with HTK


Title	Practical Speech Recognition with HTK
Authors	Zulkarnaen Hatala
Abstract	The practical aspects of developing an Automatic Speech Recognition (ASR) system with HTK are reviewed. Steps are explained concerning the hardware, software, libraries, applications and computer programs used. The common procedure for rapidly deploying a speech recognition system is summarized. The procedure is illustrated by implementing a speech-based electrical switch for home automation in the Indonesian language. The key to the procedure is to match the training and testing environments by using training data recorded from the testing program, HVite. Often the silence detector of HTK is wrongly triggered by noise because the microphone is too sensitive. This problem is mitigated by simply scaling down the volume. In this sub-word, phone-based speech recognition, noise is included in the training database and labelled separately. In the home automation application, electrical switches are controlled by an Indonesian speech recognizer. The results show a 100% command completion rate.
Tasks	Speech Recognition
Published	2019-08-06
URL	https://arxiv.org/abs/1908.02119v1
PDF	https://arxiv.org/pdf/1908.02119v1.pdf
PWC	https://paperswithcode.com/paper/practical-speech-recognition-with-htk
Repo
Framework

A new Edge Detector Based on Parametric Surface Model: Regression Surface Descriptor


Title	A new Edge Detector Based on Parametric Surface Model: Regression Surface Descriptor
Authors	Rémi Cogranne, Rémi Slysz, Laurence Moreau, Houman Borouchaki
Abstract	In this paper we present a new methodology for edge detection in digital images. The first originality of the proposed method is to consider image content as a parametric surface. Then, an original parametric local model of this surface representing image content is proposed. The few parameters involved in the proposed model are shown to be very sensitive to discontinuities in the surface, which correspond to edges in image content. This naturally leads to the design of an efficient edge detector. Moreover, a thorough analysis of the proposed model also allows us to explain how these parameters can be used to obtain edge descriptors such as orientations and curvatures. In practice, the proposed methodology offers two main advantages. First, it is highly customizable and can be adjusted to a wide range of problems, from coarse to fine scale edge detection. Second, it is very robust to blurring and additive noise. Numerical results are presented to emphasize these properties and to confirm the efficiency of the proposed method through a comparative study with other edge detectors.
Tasks	Edge Detection
Published	2019-04-23
URL	http://arxiv.org/abs/1904.10235v1
PDF	http://arxiv.org/pdf/1904.10235v1.pdf
PWC	https://paperswithcode.com/paper/a-new-edge-detector-based-on-parametric
Repo
Framework

A Step Towards Exposing Bias in Trained Deep Convolutional Neural Network Models


Title	A Step Towards Exposing Bias in Trained Deep Convolutional Neural Network Models
Authors	Daniel Omeiza
Abstract	We present Smooth Grad-CAM++, a technique which combines two recent techniques: SMOOTHGRAD and Grad-CAM++. Smooth Grad-CAM++ can visualize a layer, a subset of feature maps, or a subset of neurons within a feature map at each instance. We experimented with a few images and found that Smooth Grad-CAM++ produced visually sharper maps, with a larger number of salient pixels highlighted in the given input images, than other methods. Smooth Grad-CAM++ will give insight into what our deep CNN models (including models trained on medical scans or imagery) learn, thereby informing decisions on creating a representative training set.
Tasks
Published	2019-12-03
URL	https://arxiv.org/abs/1912.02094v1
PDF	https://arxiv.org/pdf/1912.02094v1.pdf
PWC	https://paperswithcode.com/paper/a-step-towards-exposing-bias-in-trained-deep
Repo
Framework
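
One of the two ingredients named in the abstract above, SmoothGrad, averages gradients over several noisy copies of the input. The sketch below shows only that averaging step, with a user-supplied gradient function standing in for backpropagation through a CNN. It does not implement Grad-CAM++ or the combined method, and the names and default values are hypothetical.

```python
import random


def smooth_grad(grad_fn, x, noise_std=0.1, n_samples=50, seed=0):
    """Average grad_fn over n_samples Gaussian-perturbed copies of x.

    grad_fn: maps a list of floats to a list of floats of the same length
             (assumed to return the gradient of some scalar score).
    """
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, noise_std) for xi in x]
        grad = grad_fn(noisy)
        total = [t + g for t, g in zip(total, grad)]
    return [t / n_samples for t in total]


# Toy score f(x) = x0^2 + 3*x1, whose gradient is [2*x0, 3].
def toy_gradient(x):
    return [2.0 * x[0], 3.0]


smoothed = smooth_grad(toy_gradient, [1.0, -2.0])
```

For the toy score, the second gradient component is constant, so its smoothed value stays at 3 up to rounding, while the first fluctuates around 2 with the injected noise.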

End-to-End 3D-PointCloud Semantic Segmentation for Autonomous Driving


Title	End-to-End 3D-PointCloud Semantic Segmentation for Autonomous Driving
Authors	Mohammed Abdou, Mahmoud Elkhateeb, Ibrahim Sobh, Ahmad Elsallab
Abstract	3D semantic scene labeling is a fundamental task for Autonomous Driving. Recent work shows the capability of Deep Neural Networks in labeling 3D point sets provided by sensors like LiDAR and Radar. Imbalanced distribution of classes in the dataset is one of the challenges facing the 3D semantic scene labeling task. This leads to misclassification of the non-dominant classes, which suffer from two main problems: a) rare appearance in the dataset, and b) few sensor points reflected from one object of these classes. This paper proposes Weighted Self-Incremental Transfer Learning as a generalized methodology that solves the imbalanced training dataset problems. It re-weights the components of the loss function computed from individual classes based on their frequencies in the training dataset, and applies Self-Incremental Transfer Learning by running the Neural Network model on non-dominant classes first, then adding dominant classes one by one. The experimental results introduce a new 3D point cloud semantic segmentation benchmark for the KITTI dataset.
Tasks	Autonomous Driving, Scene Labeling, Semantic Segmentation, Transfer Learning
Published	2019-06-26
URL	https://arxiv.org/abs/1906.10964v1
PDF	https://arxiv.org/pdf/1906.10964v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-3d-pointcloud-semantic
Repo
Framework
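
The abstract above re-weights per-class loss terms by class frequency in the training set. A minimal sketch of one common scheme follows, assuming inverse-frequency weights and a weighted cross-entropy. The paper's exact weighting formula is not given in the abstract, and the function names and toy classes are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * class_count), so rare classes
    get weights above 1 and frequent classes below 1 (assumed scheme)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


def weighted_cross_entropy(predicted_probs, labels, weights):
    """Mean of -w[y] * log p(y) over samples.

    predicted_probs: one dict per sample mapping class name to probability.
    """
    losses = [
        -weights[y] * math.log(probs[y]) for probs, y in zip(predicted_probs, labels)
    ]
    return sum(losses) / len(losses)


# Toy point labels: three "road" points and one "cyclist" point.
labels = ["road", "road", "road", "cyclist"]
weights = inverse_frequency_weights(labels)
uniform = [{"road": 0.5, "cyclist": 0.5}] * 4
loss = weighted_cross_entropy(uniform, labels, weights)
```

Here the single "cyclist" point receives weight 2.0 against roughly 0.67 for each "road" point, so both classes contribute equally to the total loss.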