Paper Group ANR 347
SANTLR: Speech Annotation Toolkit for Low Resource Languages
Title | SANTLR: Speech Annotation Toolkit for Low Resource Languages |
Authors | Xinjian Li, Zhong Zhou, Siddharth Dalmia, Alan W. Black, Florian Metze |
Abstract | While low resource speech recognition has attracted a lot of attention from the speech community, there are few tools available to facilitate low resource speech collection. In this work, we present SANTLR: Speech Annotation Toolkit for Low Resource Languages. It is a web-based toolkit which allows researchers to easily collect and annotate a corpus of speech in a low resource language. Annotators may use this toolkit for two purposes: transcription or recording. In transcription, annotators transcribe audio files provided by the researchers; in recording, annotators record their voice by reading provided texts. We highlight two properties of this toolkit. First, SANTLR has a very user-friendly User Interface (UI). Both researchers and annotators may use this simple web interface to interact. There is no requirement for the annotators to have any expertise in audio or text processing; the toolkit handles all preprocessing and postprocessing steps. Second, we employ a multi-step ranking mechanism to facilitate the annotation process. In particular, the toolkit gives higher priority to utterances which are easier to annotate and are more beneficial to achieving the goal of the annotation, e.g. quickly training an acoustic model. |
Tasks | Speech Recognition |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.01067v1 |
https://arxiv.org/pdf/1908.01067v1.pdf | |
PWC | https://paperswithcode.com/paper/santlr-speech-annotation-toolkit-for-low |
Repo | |
Framework | |
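The multi-step ranking mechanism described in the abstract can be sketched as a priority ordering over the annotation queue; the `ease`/`benefit` fields and the weights below are illustrative assumptions, not SANTLR's actual interface:

```python
def rank_utterances(utterances, ease_weight=0.5, benefit_weight=0.5):
    """Return utterances sorted so the highest-priority ones come first.

    Each utterance is a dict carrying hypothetical `ease` and `benefit`
    scores in [0, 1]: how easy it is to annotate, and how useful it is
    toward the annotation goal (e.g. quickly training an acoustic model).
    """
    def priority(u):
        return ease_weight * u["ease"] + benefit_weight * u["benefit"]
    return sorted(utterances, key=priority, reverse=True)

# Toy queue: utt2 is both easy and beneficial, so it is served first.
queue = rank_utterances([
    {"id": "utt1", "ease": 0.2, "benefit": 0.9},
    {"id": "utt2", "ease": 0.9, "benefit": 0.8},
    {"id": "utt3", "ease": 0.1, "benefit": 0.1},
])
```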
Global Guarantees for Blind Demodulation with Generative Priors
Title | Global Guarantees for Blind Demodulation with Generative Priors |
Authors | Paul Hand, Babhru Joshi |
Abstract | We study a deep learning inspired formulation for the blind demodulation problem, which is the task of recovering two unknown vectors from their entrywise multiplication. We consider the case where the unknown vectors are in the range of known deep generative models, $\mathcal{G}^{(1)}:\mathbb{R}^n\rightarrow\mathbb{R}^\ell$ and $\mathcal{G}^{(2)}:\mathbb{R}^p\rightarrow\mathbb{R}^\ell$. In the case when the networks corresponding to the generative models are expansive, the weight matrices are random and the dimension $\ell$ of the unknown vectors satisfies $\ell = \Omega(n^2+p^2)$, up to log factors, we show that the empirical risk objective has a favorable landscape for optimization. That is, the objective function has a descent direction at every point outside of a small neighborhood around four hyperbolic curves. We also characterize the local maximizers of the empirical risk objective and, hence, show that there do not exist any other stationary points outside of these neighborhoods around the four hyperbolic curves and the set of local maximizers. We also implement a gradient descent scheme inspired by the geometry of the landscape of the objective function. In order to converge to a global minimizer, this gradient descent scheme exploits the fact that exactly one of the hyperbolic curves corresponds to the global minimizer, and thus points near this hyperbolic curve have a lower objective value than points close to the other, spurious hyperbolic curves. We show that this gradient descent scheme can effectively remove distortions synthetically introduced to the MNIST dataset. |
Tasks | |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12576v1 |
https://arxiv.org/pdf/1905.12576v1.pdf | |
PWC | https://paperswithcode.com/paper/global-guarantees-for-blind-demodulation-with |
Repo | |
Framework | |
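The forward model and empirical risk can be sketched with toy random ReLU generators (dimensions far below the paper's $\ell = \Omega(n^2+p^2)$ regime). Because ReLU networks are positively homogeneous, scaling $h$ by $c>0$ and $x$ by $1/c$ leaves the entrywise product unchanged, which is precisely the hyperbolic curve of global minimizers the paper analyzes:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_generator(n, ell, depth=2):
    """Random expansive ReLU network R^n -> R^ell, a toy stand-in for the
    generative models in the paper's setting."""
    dims = np.linspace(n, ell, depth + 1).astype(int)
    weights = [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
               for i in range(depth)]
    def G(z):
        for W in weights[:-1]:
            z = np.maximum(W @ z, 0.0)   # ReLU hidden layers
        return weights[-1] @ z           # linear output layer
    return G

n, p, ell = 5, 5, 60
G1, G2 = make_generator(n, ell), make_generator(p, ell)
h_true, x_true = rng.standard_normal(n), rng.standard_normal(p)
y = G1(h_true) * G2(x_true)              # entrywise (Hadamard) product

def empirical_risk(h, x):
    """Least-squares misfit between the modeled product and the data."""
    return 0.5 * np.sum((G1(h) * G2(x) - y) ** 2)
```

Points $(c\,h_{\text{true}},\, x_{\text{true}}/c)$ all achieve zero risk, illustrating why the minimizers form a curve rather than a single point.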
Image Resizing by Reconstruction from Deep Features
Title | Image Resizing by Reconstruction from Deep Features |
Authors | Moab Arar, Dov Danon, Daniel Cohen-Or, Ariel Shamir |
Abstract | Traditional image resizing methods usually work in pixel space and use various saliency measures. The challenge is to adjust the image shape while trying to preserve important content. In this paper we perform image resizing in feature space, where the deep layers of a neural network contain rich and important semantic information. We directly adjust the image feature maps, extracted from a pre-trained classification network, and reconstruct the resized image using a neural-network based optimization. This novel approach leverages the hierarchical encoding of the network and, in particular, the high-level discriminative power of its deeper layers, which recognize semantic objects and regions and allow maintaining their aspect ratio. Our use of reconstruction from deep features diminishes the artifacts introduced by image-space resizing operators. We evaluate our method on benchmarks, compare to alternative approaches, and demonstrate its strength on challenging images. |
Tasks | |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.08475v1 |
http://arxiv.org/pdf/1904.08475v1.pdf | |
PWC | https://paperswithcode.com/paper/image-resizing-by-reconstruction-from-deep |
Repo | |
Framework | |
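The reconstruction step, optimizing an image so that its features match an edited feature map, can be illustrated with a fixed random linear map standing in for the pre-trained network (a simplifying assumption; the actual method matches deep CNN feature maps via a neural-network based optimization):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "feature extractor": a fixed random linear map. The real method
# uses the deep feature maps of a pre-trained classification CNN.
W = rng.standard_normal((32, 64)) / np.sqrt(64)

img = rng.standard_normal(64)
target = W @ img
target[:16] = 0.0            # pretend the feature map was edited/resized

# Reconstruct an image whose features match the target, by gradient
# descent on the feature-matching loss 0.5 * ||W x - target||^2.
x = np.zeros(64)
losses = []
for _ in range(200):
    r = W @ x - target
    losses.append(float(r @ r))
    x -= 0.1 * (W.T @ r)     # gradient step
```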
Towards Explainable AI Planning as a Service
Title | Towards Explainable AI Planning as a Service |
Authors | Michael Cashmore, Anna Collins, Benjamin Krarup, Senka Krivic, Daniele Magazzeni, David Smith |
Abstract | Explainable AI is an important area of research within which Explainable Planning is an emerging topic. In this paper, we argue that Explainable Planning can be designed as a service – that is, as a wrapper around an existing planning system that utilises the existing planner to assist in answering contrastive questions. We introduce a prototype framework to facilitate this, along with some examples of how a planner can be used to address certain types of contrastive questions. We discuss the main advantages and limitations of such an approach, and we identify open questions for Explainable Planning as a service that point to several possible research directions. |
Tasks | |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.05059v1 |
https://arxiv.org/pdf/1908.05059v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-explainable-ai-planning-as-a-service |
Repo | |
Framework | |
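The service idea, a wrapper that re-uses an existing planner to answer contrastive questions, can be sketched as below; the `planner` callable and its constraint interface are hypothetical stand-ins for a real planning system:

```python
class XAIPService:
    """Sketch of Explainable Planning as a service: wrap an existing
    planner and answer a contrastive question ("why this plan rather
    than one using the alternative?") by re-planning with the
    alternative enforced and comparing plan costs."""

    def __init__(self, planner):
        self.planner = planner          # (constraints) -> (plan, cost)

    def why_not(self, alternative):
        plan, cost = self.planner(constraints=[])
        alt_plan, alt_cost = self.planner(constraints=[alternative])
        return {
            "original_cost": cost,
            "alternative_cost": alt_cost,
            "explanation": f"Enforcing {alternative!r} changes plan cost "
                           f"from {cost} to {alt_cost}.",
        }

def toy_planner(constraints):
    """Hypothetical planner stub: each added constraint raises plan cost."""
    return (["a", "b"], 5 + 2 * len(constraints))

answer = XAIPService(toy_planner).why_not("use action B")
```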
FKIMNet: A Finger Dorsal Image Matching Network Comparing Component (Major, Minor and Nail) Matching with Holistic (Finger Dorsal) Matching
Title | FKIMNet: A Finger Dorsal Image Matching Network Comparing Component (Major, Minor and Nail) Matching with Holistic (Finger Dorsal) Matching |
Authors | Daksh Thapar, Gaurav Jaswal, Aditya Nigam |
Abstract | Current finger knuckle image recognition systems often require users to place the major or minor joints of their fingers flat against the capturing sensor. To extend these systems to user non-intrusive application scenarios, such as consumer electronics, forensics, defence, etc., we suggest matching the full dorsal finger, rather than the major/minor region of interest (ROI) alone. In particular, this paper makes a comprehensive study comparing full-finger matching with the fusion of finger ROIs for finger knuckle image recognition. These experiments suggest that using the full finger provides a more elegant solution. Addressing the finger matching problem, we propose a CNN (convolutional neural network) which creates a $128$-D feature embedding of an image. It is trained via a triplet loss function, which enforces the L2 distance between embeddings of the same subject to approach zero, whereas the distance between any two embeddings of different subjects must be at least a margin. For precise training of the network, we use a dynamic adaptive margin, data augmentation, and hard negative mining. In separate experiments, the individual performance of the finger, as well as a weighted-sum score-level fusion of the major knuckle, minor knuckle, and nail modalities, has been computed, justifying our assumption to consider the full finger as a biometric instead of its components. The proposed method is evaluated using two publicly available finger knuckle image datasets, i.e., the PolyU FKP dataset and the PolyU Contactless FKI dataset. |
Tasks | Data Augmentation |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01289v1 |
http://arxiv.org/pdf/1904.01289v1.pdf | |
PWC | https://paperswithcode.com/paper/fkimnet-a-finger-dorsal-image-matching |
Repo | |
Framework | |
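The triplet objective described in the abstract can be sketched directly; the fixed margin below is an illustrative choice, since the paper uses a dynamic adaptive margin:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on embeddings: pull same-subject pairs together and
    push different-subject pairs at least `margin` apart, as the abstract
    describes for the 128-D embeddings."""
    d_pos = np.sum((anchor - positive) ** 2)   # squared L2, same subject
    d_neg = np.sum((anchor - negative) ** 2)   # squared L2, different subject
    return max(0.0, d_pos - d_neg + margin)
```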
IMEXnet: A Forward Stable Deep Neural Network
Title | IMEXnet: A Forward Stable Deep Neural Network |
Authors | Eldad Haber, Keegan Lensink, Eran Treister, Lars Ruthotto |
Abstract | Deep convolutional neural networks have revolutionized many machine learning and computer vision tasks, however, some remaining key challenges limit their wider use. These challenges include improving the network's robustness to perturbations of the input image and the limited "field of view" of convolution operators. We introduce IMEXnet, which addresses these challenges by adapting semi-implicit methods for partial differential equations. Compared to similar explicit networks, such as residual networks, our network is more stable, which has recently been shown to reduce the sensitivity to small changes in the input features and improve generalization. The addition of an implicit step connects all pixels in each channel of the image and therefore addresses the field of view problem, while still being comparable to standard convolutions in terms of the number of parameters and computational complexity. We also present a new dataset for semantic segmentation and demonstrate the effectiveness of our architecture using the NYU Depth dataset. |
Tasks | Semantic Segmentation |
Published | 2019-03-06 |
URL | https://arxiv.org/abs/1903.02639v2 |
https://arxiv.org/pdf/1903.02639v2.pdf | |
PWC | https://paperswithcode.com/paper/imexnet-a-forward-stable-deep-neural-network |
Repo | |
Framework | |
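The interplay of an explicit residual step and an implicit smoothing step can be sketched on a 1-D signal; the layer below mirrors the semi-implicit splitting the abstract describes, though IMEXnet's actual layer differs in detail:

```python
import numpy as np

def imex_step(x, K, h=0.1, alpha=1.0):
    """One IMEX layer on a 1-D signal: an explicit (ResNet-like) nonlinear
    step followed by an implicit linear smoothing step. Solving the
    implicit system couples every entry, widening the receptive field in
    a single layer. K is a small convolution kernel."""
    n = x.size
    # Explicit part: x + h * relu(conv(x))
    z = x + h * np.maximum(np.convolve(x, K, mode="same"), 0.0)
    # Implicit part: solve (I + h*alpha*L) y = z, with L the 1-D Laplacian.
    L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return np.linalg.solve(np.eye(n) + h * alpha * L, z)

# A single impulse input: after one step every entry is influenced,
# unlike a plain convolution whose reach is limited by the kernel size.
y = imex_step(np.eye(16)[0], K=np.array([0.5, -1.0, 0.5]))
```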
ESFNet: Efficient Network for Building Extraction from High-Resolution Aerial Images
Title | ESFNet: Efficient Network for Building Extraction from High-Resolution Aerial Images |
Authors | Jingbo Lin, Weipeng Jing, Houbing Song, Guangsheng Chen |
Abstract | Building footprint extraction from high-resolution aerial images is always an essential part of urban dynamic monitoring, planning and management. It has also been a challenging task in remote sensing research. In recent years, deep neural networks have made great achievement in improving accuracy of building extraction from remote sensing imagery. However, most of existing approaches usually require large amount of parameters and floating point operations for high accuracy, it leads to high memory consumption and low inference speed which are harmful to research. In this paper, we proposed a novel efficient network named ESFNet which employs separable factorized residual block and utilizes the dilated convolutions, aiming to preserve slight accuracy loss with low computational cost and memory consumption. Our ESFNet obtains a better trade-off between accuracy and efficiency, it can run at over 100 FPS on single Tesla V100, requires 6x fewer FLOPs and has 18x fewer parameters than state-of-the-art real-time architecture ERFNet while preserving similar accuracy without any additional context module, post-processing and pre-trained scheme. We evaluated our networks on WHU Building Dataset and compared it with other state-of-the-art architectures. The result and comprehensive analysis show that our networks are benefit for efficient remote sensing researches, and the idea can be further extended to other areas. The code is public available at: https://github.com/mrluin/ESFNet-Pytorch |
Tasks | |
Published | 2019-03-29 |
URL | http://arxiv.org/abs/1903.12337v2 |
http://arxiv.org/pdf/1903.12337v2.pdf | |
PWC | https://paperswithcode.com/paper/esfnet-efficient-network-for-building |
Repo | |
Framework | |
A Fast and Accurate One-Stage Approach to Visual Grounding
Title | A Fast and Accurate One-Stage Approach to Visual Grounding |
Authors | Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, Jiebo Luo |
Abstract | We propose a simple, fast, and accurate one-stage approach to visual grounding, inspired by the following insight. The performance of existing propose-and-rank two-stage methods is capped by the quality of the region candidates they propose in the first stage — if none of the candidates could cover the ground truth region, there is no hope in the second stage to rank the right region to the top. To avoid this bottleneck, we propose a one-stage model that enables end-to-end joint optimization. The main idea is as straightforward as fusing a text query's embedding into the YOLOv3 object detector, augmented by spatial features so as to account for spatial mentions in the query. Despite being simple, this one-stage approach shows great potential in terms of both accuracy and speed for both phrase localization and referring expression comprehension, according to our experiments. Given these results along with careful investigations into some popular region proposals, we advocate a paradigm shift for visual grounding from the conventional two-stage methods to the one-stage framework. |
Tasks | |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06354v1 |
https://arxiv.org/pdf/1908.06354v1.pdf | |
PWC | https://paperswithcode.com/paper/a-fast-and-accurate-one-stage-approach-to |
Repo | |
Framework | |
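The fusion step (broadcasting the query embedding over the detector's feature map and appending spatial features) can be sketched as follows; the shapes and the plain concatenation are illustrative assumptions rather than the paper's exact design:

```python
import numpy as np

def fuse_text_into_features(visual, text_emb):
    """Fuse a text query embedding into a visual feature map, in the
    spirit of fusing the query into YOLOv3's feature maps augmented by
    spatial features.

    visual:   (C, H, W) feature map
    text_emb: (D,) sentence embedding
    returns:  (C + D + 2, H, W) fused map
    """
    C, H, W = visual.shape
    # Broadcast the query embedding to every spatial location.
    text = np.broadcast_to(text_emb[:, None, None], (text_emb.size, H, W))
    # Normalized (row, col) coordinates, so spatial mentions in the query
    # ("left", "top") have something to attach to.
    ys, xs = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W),
                         indexing="ij")
    coords = np.stack([ys, xs])          # (2, H, W) spatial features
    return np.concatenate([visual, text, coords], axis=0)

fused = fuse_text_into_features(np.ones((8, 4, 5)), np.ones(6))
```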
Restricted Connection Orthogonal Matching Pursuit For Sparse Subspace Clustering
Title | Restricted Connection Orthogonal Matching Pursuit For Sparse Subspace Clustering |
Authors | Wenqi Zhu, Yuesheng Zhu, Li Zhong, Shuai Yang |
Abstract | Sparse Subspace Clustering (SSC) is one of the most popular methods for clustering data points into their underlying subspaces. However, SSC may suffer from a heavy computational burden. Orthogonal Matching Pursuit (OMP) applied to SSC accelerates the computation, but the trade-off is a loss of clustering accuracy. In this paper, we propose a noise-robust algorithm, Restricted Connection Orthogonal Matching Pursuit for Sparse Subspace Clustering (RCOMP-SSC), to improve the clustering accuracy and maintain the low computational time by restricting the number of connections of each data point during the iterations of OMP. We also develop a control-matrix framework to realize RCOMP-SSC; the framework is extensible to other data point selection strategies. Our analysis and experiments on synthetic data and two real-world databases (EYaleB & Usps) demonstrate the superiority of our algorithm compared with other clustering methods in terms of accuracy and computational time. |
Tasks | |
Published | 2019-05-01 |
URL | http://arxiv.org/abs/1905.00420v1 |
http://arxiv.org/pdf/1905.00420v1.pdf | |
PWC | https://paperswithcode.com/paper/restricted-connection-orthogonal-matching |
Repo | |
Framework | |
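One simplified reading of the restricted-connection idea: run OMP to self-represent each data point while capping how often any single point may be selected across all representations. This sketch is an interpretation of the control-matrix mechanism, not the authors' exact algorithm:

```python
import numpy as np

def rcomp_ssc(X, k=3, max_conn=2):
    """OMP-based self-representation with restricted connections.

    X: (d, n) data matrix, columns assumed unit-norm.
    k: sparsity per representation; max_conn: selection cap per point.
    Returns an (n, n) coefficient matrix with zero diagonal.
    """
    d, n = X.shape
    C = np.zeros((n, n))
    conn = np.zeros(n, dtype=int)        # how often each point was selected
    for j in range(n):
        r, support = X[:, j].copy(), []
        coef = np.zeros(0)
        for _ in range(k):
            scores = np.abs(X.T @ r)
            scores[[j] + support] = -np.inf        # no self / repeats
            scores[conn >= max_conn] = -np.inf     # restricted connections
            i = int(np.argmax(scores))
            if scores[i] == -np.inf:
                break                              # no admissible point left
            support.append(i)
            conn[i] += 1
            # Least-squares fit on the current support, as in standard OMP.
            coef, *_ = np.linalg.lstsq(X[:, support], X[:, j], rcond=None)
            r = X[:, j] - X[:, support] @ coef
        if support:
            C[support, j] = coef
    return C

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 6))
X /= np.linalg.norm(X, axis=0)
C = rcomp_ssc(X, k=2, max_conn=2)
```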
Multi-vision Attention Networks for On-line Red Jujube Grading
Title | Multi-vision Attention Networks for On-line Red Jujube Grading |
Authors | Xiaoye Sun, Liyan Ma, Gongyan Li |
Abstract | To solve the red jujube classification problem, this paper designs a convolutional neural network model with low computational cost and high classification accuracy. The architecture of the model is inspired by the multi-visual mechanism of organisms and by DenseNet. To further improve our model, we add the attention mechanism of SE-Net. We also construct a dataset which contains 23,735 red jujube images captured by a jujube grading system. According to the appearance of the jujube and the characteristics of the grading system, the dataset is divided into four classes: invalid, rotten, wizened and normal. The numerical experiments show that the classification accuracy of our model reaches 91.89%, which is comparable to DenseNet-121, InceptionV3, InceptionV4, and Inception-ResNet v2. Moreover, our model achieves real-time performance. |
Tasks | |
Published | 2019-03-31 |
URL | http://arxiv.org/abs/1904.00388v1 |
http://arxiv.org/pdf/1904.00388v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-vision-attention-networks-for-on-line |
Repo | |
Framework | |
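The SE-Net attention mechanism the abstract adds to the model follows the standard squeeze-and-excitation pattern, sketched here with plain NumPy (the reduction-ratio shapes of `W1`/`W2` are illustrative):

```python
import numpy as np

def se_attention(fmap, W1, W2):
    """Squeeze-and-Excitation channel attention.

    fmap: (C, H, W) feature map; W1: (C//r, C); W2: (C, C//r).
    Returns the channel-recalibrated feature map.
    """
    s = fmap.mean(axis=(1, 2))                 # squeeze: global average pool
    e = np.maximum(W1 @ s, 0.0)                # excitation: FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(W2 @ e)))     # FC + sigmoid, in (0, 1)
    return fmap * gate[:, None, None]          # channel-wise rescaling

fmap = np.arange(24, dtype=float).reshape(4, 2, 3)
rng = np.random.default_rng(4)
out = se_attention(fmap, rng.standard_normal((2, 4)), rng.standard_normal((4, 2)))
```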
Comparing domain wall synapse with other Non Volatile Memory devices for on-chip learning in Analog Hardware Neural Network
Title | Comparing domain wall synapse with other Non Volatile Memory devices for on-chip learning in Analog Hardware Neural Network |
Authors | Divya Kaushik, Utkarsh Singh, Upasana Sahu, Indu Sreedevi, Debanjan Bhowmik |
Abstract | Resistive Random Access Memory (RRAM) and Phase Change Memory (PCM) devices have been popularly used as synapses in crossbar-array-based analog Neural Network (NN) circuits to achieve more energy- and time-efficient data classification compared to conventional computers. Here we demonstrate the advantages of the recently proposed spin orbit torque driven Domain Wall (DW) device as a synapse, compared to RRAM and PCM devices, with respect to on-chip learning (training in hardware) in such NNs. The synaptic characteristic of the DW synapse, obtained from our micromagnetic modeling, turns out to be much more linear and symmetric (between positive and negative updates) than that of RRAM and PCM synapses. This makes the design of peripheral analog circuits for on-chip learning much easier in a DW synapse based NN than for RRAM and PCM synapses. We next incorporate the DW synapse as a Verilog-A model in the crossbar-array-based NN circuit we design in the SPICE circuit simulator. Successful on-chip learning is demonstrated through SPICE simulations on the popular Fisher's Iris dataset. The time and energy required for learning turn out to be orders of magnitude lower for the DW synapse based NN circuit compared to the RRAM and PCM synapse based NN circuits. |
Tasks | |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12919v1 |
https://arxiv.org/pdf/1910.12919v1.pdf | |
PWC | https://paperswithcode.com/paper/comparing-domain-wall-synapse-with-other-non |
Repo | |
Framework | |
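The linearity/symmetry contrast the abstract draws between DW and RRAM/PCM synapses can be illustrated with toy conductance-update models; the functional forms below are illustrative assumptions, not fitted device models:

```python
def dw_update(g, n_pulses, dg=0.01):
    """Domain-wall-like synapse: near-linear, symmetric conductance update
    (same step size for positive and negative pulses)."""
    return g + dg * n_pulses

def rram_update(g, n_pulses, g_max=1.0, nl=5.0):
    """Toy nonlinear RRAM-like update: potentiation saturates toward g_max
    and depression toward 0, so +/- updates are asymmetric."""
    for _ in range(abs(n_pulses)):
        if n_pulses > 0:
            g = g + (g_max - g) / nl       # saturating potentiation
        else:
            g = g - g / nl                 # saturating depression
    return g

# DW: equal-and-opposite pulses cancel exactly; RRAM: they do not.
up = dw_update(0.5, 3) - 0.5
down = dw_update(0.5, -3) - 0.5
g_cycle = rram_update(rram_update(0.5, 1), -1)
```

The symmetric, linear update is what simplifies the peripheral analog circuitry for on-chip learning, since weight changes map directly to pulse counts.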
Ranking and synchronization from pairwise measurements via SVD
Title | Ranking and synchronization from pairwise measurements via SVD |
Authors | Alexandre d’Aspremont, Mihai Cucuringu, Hemant Tyagi |
Abstract | Given a measurement graph $G= (V,E)$ and an unknown signal $r \in \mathbb{R}^n$, we investigate algorithms for recovering $r$ from pairwise measurements of the form $r_i - r_j$; $\{i,j\} \in E$. This problem arises in a variety of applications, such as ranking teams in sports data and time synchronization of distributed networks. Framed in the context of ranking, the task is to recover the ranking of $n$ teams (induced by $r$) given a small subset of noisy pairwise rank offsets. We propose a simple SVD-based algorithmic pipeline for both the problem of time synchronization and ranking. We provide a detailed theoretical analysis in terms of robustness against both sampling sparsity and noise perturbations with outliers, using results from matrix perturbation and random matrix theory. Our theoretical findings are complemented by a detailed set of numerical experiments on both synthetic and real data, showcasing the competitiveness of our proposed algorithms with other state-of-the-art methods. |
Tasks | |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02746v2 |
https://arxiv.org/pdf/1906.02746v2.pdf | |
PWC | https://paperswithcode.com/paper/ranking-and-synchronization-from-pairwise |
Repo | |
Framework | |
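In the noiseless, fully observed case the measurement matrix $C_{ij} = r_i - r_j$ equals $r\mathbf{1}^T - \mathbf{1}r^T$, so it has rank 2 and its column space is spanned by $r$ and the all-ones vector. A minimal sketch of the SVD pipeline (without the sparsity and noise handling analyzed in the paper) then recovers $r$ up to shift, scale, and sign:

```python
import numpy as np

def svd_rank(C):
    """Recover the signal direction from a skew-symmetric matrix of
    pairwise offsets C[i, j] = r_i - r_j. The in-span direction
    orthogonal to the all-ones vector is proportional to r - mean(r)."""
    n = C.shape[0]
    U = np.linalg.svd(C)[0][:, :2]       # top-2 left singular vectors
    a = U.T @ np.ones(n)                 # coords of the 1-vector in that span
    return U @ np.array([-a[1], a[0]])   # in-span direction orthogonal to 1

rng = np.random.default_rng(2)
r = rng.standard_normal(8)
C = np.subtract.outer(r, r)              # full noiseless measurements
r_hat = svd_rank(C)

# Alignment with the (centered) true signal, up to a global sign.
r_c = r - r.mean()
cosine = abs(r_hat @ r_c) / (np.linalg.norm(r_hat) * np.linalg.norm(r_c))
```

Sorting `r_hat` then yields the ranking (up to a global flip, which the paper's pipeline resolves separately).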
Understanding over-parameterized deep networks by geometrization
Title | Understanding over-parameterized deep networks by geometrization |
Authors | Xiao Dong, Ling Zhou |
Abstract | A complete understanding of the widely used over-parameterized deep networks is a key step for AI. In this work we try to give a geometric picture of over-parameterized deep networks using our geometrization scheme. We show that the Riemannian geometry of network complexity plays a key role in understanding the basic properties of over-parameterized deep networks, including generalization, convergence and parameter sensitivity. We also point out that deep networks share many similarities with quantum computation systems. This can be regarded as strong support for our proposal that geometrization is not only the bible for physics, it is also the key idea to understand deep learning systems. |
Tasks | |
Published | 2019-02-11 |
URL | http://arxiv.org/abs/1902.03793v1 |
http://arxiv.org/pdf/1902.03793v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-over-parameterized-deep |
Repo | |
Framework | |
A Trainable Multiplication Layer for Auto-correlation and Co-occurrence Extraction
Title | A Trainable Multiplication Layer for Auto-correlation and Co-occurrence Extraction |
Authors | Hideaki Hayashi, Seiichi Uchida |
Abstract | In this paper, we propose a trainable multiplication layer (TML) for a neural network that can be used to calculate the multiplication between the input features. Taking an image as an input, the TML raises each pixel value to the power of a weight and then multiplies them, thereby extracting the higher-order local auto-correlation from the input image. The TML can also be used to extract co-occurrence from the feature map of a convolutional network. The training of the TML is formulated based on backpropagation with constraints on the weights, enabling us to learn discriminative multiplication patterns in an end-to-end manner. In the experiments, the characteristics of the TML are investigated by visualizing learned kernels and the corresponding output features. The applicability of the TML for classification and neural network interpretation is also evaluated using public datasets. |
Tasks | |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.12871v1 |
https://arxiv.org/pdf/1905.12871v1.pdf | |
PWC | https://paperswithcode.com/paper/a-trainable-multiplication-layer-for-auto |
Repo | |
Framework | |
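The core TML operation, raising each input to the power of its weight and multiplying, is a log-linear computation that can be sketched in a few lines (the flattened-patch view and the small epsilon guarding the log are simplifying assumptions; the full layer applies this over sliding windows with weight constraints):

```python
import numpy as np

def tml(x, w, eps=1e-6):
    """Trainable multiplication layer on a flattened patch:
    prod_i x_i ** w_i, computed as exp(sum_i w_i * log(x_i)).
    Inputs are assumed positive; eps guards the logarithm."""
    return np.exp(np.sum(w * np.log(x + eps)))
```

A one-hot weight vector selects a single pixel, while two unit weights multiply a pair of pixels, which is how the layer captures higher-order local auto-correlation.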
Inducing Sparse Coding and And-Or Grammar from Generator Network
Title | Inducing Sparse Coding and And-Or Grammar from Generator Network |
Authors | Xianglei Xing, Song-Chun Zhu, Ying Nian Wu |
Abstract | We introduce an explainable generative model by applying a sparse operation to the feature maps of the generator network. Meaningful hierarchical representations are obtained using the proposed generative model with sparse activations. The convolutional kernels from the bottom layer to the top layer of the generator network can learn primitives such as edges and colors, object parts, and whole objects layer by layer. From the perspective of the generator network, we propose a method for inducing both sparse coding and the AND-OR grammar for images. Experiments show that our method is capable of learning meaningful and explainable hierarchical representations. |
Tasks | |
Published | 2019-01-20 |
URL | http://arxiv.org/abs/1901.11494v1 |
http://arxiv.org/pdf/1901.11494v1.pdf | |
PWC | https://paperswithcode.com/paper/inducing-sparse-coding-and-and-or-grammar |
Repo | |
Framework | |
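A simple top-k sparsification of a feature map illustrates the kind of sparse operation the abstract applies to the generator's feature maps; the paper's exact sparsification scheme may differ:

```python
import numpy as np

def sparsify(fmap, k):
    """Keep only the k largest activations in a feature map, zeroing the
    rest (ties at the threshold are all kept)."""
    flat = fmap.ravel().copy()
    if k < flat.size:
        thresh = np.partition(flat, -k)[-k]   # k-th largest value
        flat[flat < thresh] = 0.0
    return flat.reshape(fmap.shape)

fmap = np.array([[1.0, 5.0, 3.0],
                 [2.0, 4.0, 0.0]])
sparse = sparsify(fmap, k=2)
```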