Paper Group ANR 347
SANTLR: Speech Annotation Toolkit for Low Resource Languages
Title | SANTLR: Speech Annotation Toolkit for Low Resource Languages |
Authors | Xinjian Li, Zhong Zhou, Siddharth Dalmia, Alan W. Black, Florian Metze |
Abstract | While low resource speech recognition has attracted a lot of attention from the speech community, there are few tools available to facilitate low resource speech collection. In this work, we present SANTLR: Speech Annotation Toolkit for Low Resource Languages. It is a web-based toolkit which allows researchers to easily collect and annotate a corpus of speech in a low resource language. Annotators may use this toolkit for two purposes: transcription or recording. In transcription, annotators transcribe audio files provided by the researchers; in recording, annotators record their voice by reading provided texts. We highlight two properties of this toolkit. First, SANTLR has a very user-friendly User Interface (UI). Both researchers and annotators may use this simple web interface to interact. There is no requirement for the annotators to have any expertise in audio or text processing; the toolkit handles all preprocessing and postprocessing steps. Second, we employ a multi-step ranking mechanism to facilitate the annotation process. In particular, the toolkit gives higher priority to utterances which are easier to annotate and are more beneficial to achieving the goal of the annotation, e.g. quickly training an acoustic model. |
Tasks | Speech Recognition |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.01067v1 |
https://arxiv.org/pdf/1908.01067v1.pdf | |
PWC | https://paperswithcode.com/paper/santlr-speech-annotation-toolkit-for-low |
Repo | |
Framework | |
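The multi-step ranking mechanism described in the abstract can be sketched as a priority ordering over the annotation queue; the `ease`/`benefit` fields and the weights below are illustrative assumptions, not SANTLR's actual interface:

```python
def rank_utterances(utterances, ease_weight=0.5, benefit_weight=0.5):
    """Return utterances sorted so the highest-priority ones come first.

    Each utterance is a dict carrying hypothetical `ease` and `benefit`
    scores in [0, 1]: how easy it is to annotate, and how useful it is
    toward the annotation goal (e.g. quickly training an acoustic model).
    """
    def priority(u):
        return ease_weight * u["ease"] + benefit_weight * u["benefit"]
    return sorted(utterances, key=priority, reverse=True)

# Toy queue: utt2 is both easy and beneficial, so it is served first.
queue = rank_utterances([
    {"id": "utt1", "ease": 0.2, "benefit": 0.9},
    {"id": "utt2", "ease": 0.9, "benefit": 0.8},
    {"id": "utt3", "ease": 0.1, "benefit": 0.1},
])
```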
Global Guarantees for Blind Demodulation with Generative Priors
Title | Global Guarantees for Blind Demodulation with Generative Priors |
Authors | Paul Hand, Babhru Joshi |
Abstract | We study a deep learning inspired formulation for the blind demodulation problem, which is the task of recovering two unknown vectors from their entrywise multiplication. We consider the case where the unknown vectors are in the range of known deep generative models, $\mathcal{G}^{(1)}:\mathbb{R}^n\rightarrow\mathbb{R}^\ell$ and $\mathcal{G}^{(2)}:\mathbb{R}^p\rightarrow\mathbb{R}^\ell$. In the case when the networks corresponding to the generative models are expansive, the weight matrices are random and the dimension $\ell$ of the unknown vectors satisfies $\ell = \Omega(n^2+p^2)$, up to log factors, we show that the empirical risk objective has a favorable landscape for optimization. That is, the objective function has a descent direction at every point outside of a small neighborhood around four hyperbolic curves. We also characterize the local maximizers of the empirical risk objective and, hence, show that there do not exist any other stationary points outside of these neighborhoods around the four hyperbolic curves and the set of local maximizers. We also implement a gradient descent scheme inspired by the geometry of the landscape of the objective function. In order to converge to a global minimizer, this gradient descent scheme exploits the fact that exactly one of the hyperbolic curves corresponds to the global minimizer, and thus points near this hyperbolic curve have a lower objective value than points close to the other, spurious hyperbolic curves. We show that this gradient descent scheme can effectively remove distortions synthetically introduced to the MNIST dataset. |
Tasks | |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12576v1 |
https://arxiv.org/pdf/1905.12576v1.pdf | |
PWC | https://paperswithcode.com/paper/global-guarantees-for-blind-demodulation-with |
Repo | |
Framework | |
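The forward model and empirical risk can be sketched with toy random ReLU generators (dimensions far below the paper's $\ell = \Omega(n^2+p^2)$ regime). Because ReLU networks are positively homogeneous, scaling $h$ by $c>0$ and $x$ by $1/c$ leaves the entrywise product unchanged, which is precisely the hyperbolic curve of global minimizers the paper analyzes:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_generator(n, ell, depth=2):
    """Random expansive ReLU network R^n -> R^ell, a toy stand-in for the
    generative models in the paper's setting."""
    dims = np.linspace(n, ell, depth + 1).astype(int)
    weights = [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
               for i in range(depth)]
    def G(z):
        for W in weights[:-1]:
            z = np.maximum(W @ z, 0.0)   # ReLU hidden layers
        return weights[-1] @ z           # linear output layer
    return G

n, p, ell = 5, 5, 60
G1, G2 = make_generator(n, ell), make_generator(p, ell)
h_true, x_true = rng.standard_normal(n), rng.standard_normal(p)
y = G1(h_true) * G2(x_true)              # entrywise (Hadamard) product

def empirical_risk(h, x):
    """Least-squares misfit between the modeled product and the data."""
    return 0.5 * np.sum((G1(h) * G2(x) - y) ** 2)
```

Points $(c\,h_{\text{true}},\, x_{\text{true}}/c)$ all achieve zero risk, illustrating why the minimizers form a curve rather than a single point.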
Image Resizing by Reconstruction from Deep Features
Title | Image Resizing by Reconstruction from Deep Features |
Authors | Moab Arar, Dov Danon, Daniel Cohen-Or, Ariel Shamir |
Abstract | Traditional image resizing methods usually work in pixel space and use various saliency measures. The challenge is to adjust the image shape while trying to preserve important content. In this paper we perform image resizing in feature space, where the deep layers of a neural network contain rich and important semantic information. We directly adjust the image feature maps, extracted from a pre-trained classification network, and reconstruct the resized image using a neural-network based optimization. This novel approach leverages the hierarchical encoding of the network and, in particular, the high-level discriminative power of its deeper layers, which recognize semantic objects and regions and allow maintaining their aspect ratio. Our use of reconstruction from deep features diminishes the artifacts introduced by image-space resizing operators. We evaluate our method on benchmarks, compare to alternative approaches, and demonstrate its strength on challenging images. |
Tasks | |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.08475v1 |
http://arxiv.org/pdf/1904.08475v1.pdf | |
PWC | https://paperswithcode.com/paper/image-resizing-by-reconstruction-from-deep |
Repo | |
Framework | |
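The reconstruction step, optimizing an image so that its features match an edited feature map, can be illustrated with a fixed random linear map standing in for the pre-trained network (a simplifying assumption; the actual method matches deep CNN feature maps via a neural-network based optimization):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "feature extractor": a fixed random linear map. The real method
# uses the deep feature maps of a pre-trained classification CNN.
W = rng.standard_normal((32, 64)) / np.sqrt(64)

img = rng.standard_normal(64)
target = W @ img
target[:16] = 0.0            # pretend the feature map was edited/resized

# Reconstruct an image whose features match the target, by gradient
# descent on the feature-matching loss 0.5 * ||W x - target||^2.
x = np.zeros(64)
losses = []
for _ in range(200):
    r = W @ x - target
    losses.append(float(r @ r))
    x -= 0.1 * (W.T @ r)     # gradient step
```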
Towards Explainable AI Planning as a Service
Title | Towards Explainable AI Planning as a Service |
Authors | Michael Cashmore, Anna Collins, Benjamin Krarup, Senka Krivic, Daniele Magazzeni, David Smith |
Abstract | Explainable AI is an important area of research within which Explainable Planning is an emerging topic. In this paper, we argue that Explainable Planning can be designed as a service – that is, as a wrapper around an existing planning system that utilises the existing planner to assist in answering contrastive questions. We introduce a prototype framework to facilitate this, along with some examples of how a planner can be used to address certain types of contrastive questions. We discuss the main advantages and limitations of such an approach, and we identify open questions for Explainable Planning as a service that point to several possible research directions. |
Tasks | |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.05059v1 |
https://arxiv.org/pdf/1908.05059v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-explainable-ai-planning-as-a-service |
Repo | |
Framework | |
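The service idea, a wrapper that re-uses an existing planner to answer contrastive questions, can be sketched as below; the `planner` callable and its constraint interface are hypothetical stand-ins for a real planning system:

```python
class XAIPService:
    """Sketch of Explainable Planning as a service: wrap an existing
    planner and answer a contrastive question ("why this plan rather
    than one using the alternative?") by re-planning with the
    alternative enforced and comparing plan costs."""

    def __init__(self, planner):
        self.planner = planner          # (constraints) -> (plan, cost)

    def why_not(self, alternative):
        plan, cost = self.planner(constraints=[])
        alt_plan, alt_cost = self.planner(constraints=[alternative])
        return {
            "original_cost": cost,
            "alternative_cost": alt_cost,
            "explanation": f"Enforcing {alternative!r} changes plan cost "
                           f"from {cost} to {alt_cost}.",
        }

def toy_planner(constraints):
    """Hypothetical planner stub: each added constraint raises plan cost."""
    return (["a", "b"], 5 + 2 * len(constraints))

answer = XAIPService(toy_planner).why_not("use action B")
```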
FKIMNet: A Finger Dorsal Image Matching Network Comparing Component (Major, Minor and Nail) Matching with Holistic (Finger Dorsal) Matching
Title | FKIMNet: A Finger Dorsal Image Matching Network Comparing Component (Major, Minor and Nail) Matching with Holistic (Finger Dorsal) Matching |
Authors | Daksh Thapar, Gaurav Jaswal, Aditya Nigam |
Abstract | Current finger knuckle image recognition systems often require users to place the major or minor joints of their fingers flat against the capturing sensor. To extend these systems to user non-intrusive application scenarios, such as consumer electronics, forensics, defence, etc., we suggest matching the full dorsal finger, rather than the major/minor region of interest (ROI) alone. In particular, this paper makes a comprehensive study comparing full-finger matching with the fusion of finger ROIs for finger knuckle image recognition. These experiments suggest that using the full finger provides a more elegant solution. Addressing the finger matching problem, we propose a CNN (convolutional neural network) which creates a $128$-D feature embedding of an image. It is trained via a triplet loss function, which enforces the L2 distance between embeddings of the same subject to approach zero, whereas the distance between any two embeddings of different subjects must be at least a margin. For precise training of the network, we use a dynamic adaptive margin, data augmentation, and hard negative mining. In separate experiments, the individual performance of the finger, as well as a weighted-sum score-level fusion of the major knuckle, minor knuckle, and nail modalities, has been computed, justifying our assumption to consider the full finger as a biometric instead of its components. The proposed method is evaluated using two publicly available finger knuckle image datasets, i.e., the PolyU FKP dataset and the PolyU Contactless FKI dataset. |
Tasks | Data Augmentation |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01289v1 |
http://arxiv.org/pdf/1904.01289v1.pdf | |
PWC | https://paperswithcode.com/paper/fkimnet-a-finger-dorsal-image-matching |
Repo | |
Framework | |
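The triplet objective described in the abstract can be sketched directly; the fixed margin below is an illustrative choice, since the paper uses a dynamic adaptive margin:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on embeddings: pull same-subject pairs together and
    push different-subject pairs at least `margin` apart, as the abstract
    describes for the 128-D embeddings."""
    d_pos = np.sum((anchor - positive) ** 2)   # squared L2, same subject
    d_neg = np.sum((anchor - negative) ** 2)   # squared L2, different subject
    return max(0.0, d_pos - d_neg + margin)
```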
IMEXnet: A Forward Stable Deep Neural Network
Title | IMEXnet: A Forward Stable Deep Neural Network |
Authors | Eldad Haber, Keegan Lensink, Eran Treister, Lars Ruthotto |
Abstract | Deep convolutional neural networks have revolutionized many machine learning and computer vision tasks, however, some remaining key challenges limit their wider use. These challenges include improving the network's robustness to perturbations of the input image and the limited "field of view" of convolution operators. We introduce IMEXnet, which addresses these challenges by adapting semi-implicit methods for partial differential equations. Compared to similar explicit networks, such as residual networks, our network is more stable, which has recently been shown to reduce the sensitivity to small changes in the input features and improve generalization. The addition of an implicit step connects all pixels in each channel of the image and therefore addresses the field of view problem, while still being comparable to standard convolutions in terms of the number of parameters and computational complexity. We also present a new dataset for semantic segmentation and demonstrate the effectiveness of our architecture using the NYU Depth dataset. |
Tasks | Semantic Segmentation |
Published | 2019-03-06 |
URL | https://arxiv.org/abs/1903.02639v2 |
https://arxiv.org/pdf/1903.02639v2.pdf | |
PWC | https://paperswithcode.com/paper/imexnet-a-forward-stable-deep-neural-network |
Repo | |
Framework | |
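The interplay of an explicit residual step and an implicit smoothing step can be sketched on a 1-D signal; the layer below mirrors the semi-implicit splitting the abstract describes, though IMEXnet's actual layer differs in detail:

```python
import numpy as np

def imex_step(x, K, h=0.1, alpha=1.0):
    """One IMEX layer on a 1-D signal: an explicit (ResNet-like) nonlinear
    step followed by an implicit linear smoothing step. Solving the
    implicit system couples every entry, widening the receptive field in
    a single layer. K is a small convolution kernel."""
    n = x.size
    # Explicit part: x + h * relu(conv(x))
    z = x + h * np.maximum(np.convolve(x, K, mode="same"), 0.0)
    # Implicit part: solve (I + h*alpha*L) y = z, with L the 1-D Laplacian.
    L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return np.linalg.solve(np.eye(n) + h * alpha * L, z)

# A single impulse input: after one step every entry is influenced,
# unlike a plain convolution whose reach is limited by the kernel size.
y = imex_step(np.eye(16)[0], K=np.array([0.5, -1.0, 0.5]))
```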
ESFNet: Efficient Network for Building Extraction from High-Resolution Aerial Images
Title | ESFNet: Efficient Network for Building Extraction from High-Resolution Aerial Images |
Authors | Jingbo Lin, Weipeng Jing, Houbing Song, Guangsheng Chen |
Abstract | Building footprint extraction from high-resolution aerial images is always an essential part of urban dynamic monitoring, planning and management. It has also been a challenging task in remote sensing research. In recent years, deep neural networks have made great achievement in improving accuracy of building extraction from remote sensing imagery. However, most of existing approaches usually require large amount of parameters and floating point operations for high accuracy, it leads to high memory consumption and low inference speed which are harmful to research. In this paper, we proposed a novel efficient network named ESFNet which employs separable factorized residual block and utilizes the dilated convolutions, aiming to preserve slight accuracy loss with low computational cost and memory consumption. Our ESFNet obtains a better trade-off between accuracy and efficiency, it can run at over 100 FPS on single Tesla V100, requires 6x fewer FLOPs and has 18x fewer parameters than state-of-the-art real-time architecture ERFNet while preserving similar accuracy without any additional context module, post-processing and pre-trained scheme. We evaluated our networks on WHU Building Dataset and compared it with other state-of-the-art architectures. The result and comprehensive analysis show that our networks are benefit for efficient remote sensing researches, and the idea can be further extended to other areas. The code is public available at: https://github.com/mrluin/ESFNet-Pytorch |
Tasks | |
Published | 2019-03-29 |
URL | http://arxiv.org/abs/1903.12337v2 |
http://arxiv.org/pdf/1903.12337v2.pdf | |
PWC | https://paperswithcode.com/paper/esfnet-efficient-network-for-building |
Repo | |
Framework | |
A Fast and Accurate One-Stage Approach to Visual Grounding
Title | A Fast and Accurate One-Stage Approach to Visual Grounding |
Authors | Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, Jiebo Luo |
Abstract | We propose a simple, fast, and accurate one-stage approach to visual grounding, inspired by the following insight. The performance of existing propose-and-rank two-stage methods is capped by the quality of the region candidates they propose in the first stage — if none of the candidates could cover the ground truth region, there is no hope in the second stage to rank the right region to the top. To avoid this bottleneck, we propose a one-stage model that enables end-to-end joint optimization. The main idea is as straightforward as fusing a text query's embedding into the YOLOv3 object detector, augmented by spatial features so as to account for spatial mentions in the query. Despite being simple, this one-stage approach shows great potential in terms of both accuracy and speed for both phrase localization and referring expression comprehension, according to our experiments. Given these results along with careful investigations into some popular region proposals, we advocate a paradigm shift for visual grounding from the conventional two-stage methods to the one-stage framework. |
Tasks | |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06354v1 |
https://arxiv.org/pdf/1908.06354v1.pdf | |
PWC | https://paperswithcode.com/paper/a-fast-and-accurate-one-stage-approach-to |
Repo | |
Framework | |
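The fusion step (broadcasting the query embedding over the detector's feature map and appending spatial features) can be sketched as follows; the shapes and the plain concatenation are illustrative assumptions rather than the paper's exact design:

```python
import numpy as np

def fuse_text_into_features(visual, text_emb):
    """Fuse a text query embedding into a visual feature map, in the
    spirit of fusing the query into YOLOv3's feature maps augmented by
    spatial features.

    visual:   (C, H, W) feature map
    text_emb: (D,) sentence embedding
    returns:  (C + D + 2, H, W) fused map
    """
    C, H, W = visual.shape
    # Broadcast the query embedding to every spatial location.
    text = np.broadcast_to(text_emb[:, None, None], (text_emb.size, H, W))
    # Normalized (row, col) coordinates, so spatial mentions in the query
    # ("left", "top") have something to attach to.
    ys, xs = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W),
                         indexing="ij")
    coords = np.stack([ys, xs])          # (2, H, W) spatial features
    return np.concatenate([visual, text, coords], axis=0)

fused = fuse_text_into_features(np.ones((8, 4, 5)), np.ones(6))
```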
Restricted Connection Orthogonal Matching Pursuit For Sparse Subspace Clustering
Title | Restricted Connection Orthogonal Matching Pursuit For Sparse Subspace Clustering |
Authors | Wenqi Zhu, Yuesheng Zhu, Li Zhong, Shuai Yang |
Abstract | Sparse Subspace Clustering (SSC) is one of the most popular methods for clustering data points into their underlying subspaces. However, SSC may suffer from a heavy computational burden. Orthogonal Matching Pursuit (OMP) applied to SSC accelerates the computation, but the trade-off is a loss of clustering accuracy. In this paper, we propose a noise-robust algorithm, Restricted Connection Orthogonal Matching Pursuit for Sparse Subspace Clustering (RCOMP-SSC), to improve the clustering accuracy and maintain the low computational time by restricting the number of connections of each data point during the iterations of OMP. We also develop a control-matrix framework to realize RCOMP-SSC; the framework is extensible to other data point selection strategies. Our analysis and experiments on synthetic data and two real-world databases (EYaleB & Usps) demonstrate the superiority of our algorithm compared with other clustering methods in terms of accuracy and computational time. |
Tasks | |
Published | 2019-05-01 |
URL | http://arxiv.org/abs/1905.00420v1 |
http://arxiv.org/pdf/1905.00420v1.pdf | |
PWC | https://paperswithcode.com/paper/restricted-connection-orthogonal-matching |
Repo | |
Framework | |
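One simplified reading of the restricted-connection idea: run OMP to self-represent each data point while capping how often any single point may be selected across all representations. This sketch is an interpretation of the control-matrix mechanism, not the authors' exact algorithm:

```python
import numpy as np

def rcomp_ssc(X, k=3, max_conn=2):
    """OMP-based self-representation with restricted connections.

    X: (d, n) data matrix, columns assumed unit-norm.
    k: sparsity per representation; max_conn: selection cap per point.
    Returns an (n, n) coefficient matrix with zero diagonal.
    """
    d, n = X.shape
    C = np.zeros((n, n))
    conn = np.zeros(n, dtype=int)        # how often each point was selected
    for j in range(n):
        r, support = X[:, j].copy(), []
        coef = np.zeros(0)
        for _ in range(k):
            scores = np.abs(X.T @ r)
            scores[[j] + support] = -np.inf        # no self / repeats
            scores[conn >= max_conn] = -np.inf     # restricted connections
            i = int(np.argmax(scores))
            if scores[i] == -np.inf:
                break                              # no admissible point left
            support.append(i)
            conn[i] += 1
            # Least-squares fit on the current support, as in standard OMP.
            coef, *_ = np.linalg.lstsq(X[:, support], X[:, j], rcond=None)
            r = X[:, j] - X[:, support] @ coef
        if support:
            C[support, j] = coef
    return C

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 6))
X /= np.linalg.norm(X, axis=0)
C = rcomp_ssc(X, k=2, max_conn=2)
```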
Multi-vision Attention Networks for On-line Red Jujube Grading
Title | Multi-vision Attention Networks for On-line Red Jujube Grading |
Authors | Xiaoye Sun, Liyan Ma, Gongyan Li |
Abstract | To solve the red jujube classification problem, this paper designs a convolutional neural network model with low computational cost and high classification accuracy. The architecture of the model is inspired by the multi-visual mechanism of organisms and by DenseNet. To further improve our model, we add the attention mechanism of SE-Net. We also construct a dataset which contains 23,735 red jujube images captured by a jujube grading system. According to the appearance of the jujube and the characteristics of the grading system, the dataset is divided into four classes: invalid, rotten, wizened and normal. The numerical experiments show that the classification accuracy of our model reaches 91.89%, which is comparable to DenseNet-121, InceptionV3, InceptionV4, and Inception-ResNet v2. Moreover, our model achieves real-time performance. |
Tasks | |
Published | 2019-03-31 |
URL | http://arxiv.org/abs/1904.00388v1 |
http://arxiv.org/pdf/1904.00388v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-vision-attention-networks-for-on-line |
Repo | |
Framework | |
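The SE-Net attention mechanism the abstract adds to the model follows the standard squeeze-and-excitation pattern, sketched here with plain NumPy (the reduction-ratio shapes of `W1`/`W2` are illustrative):

```python
import numpy as np

def se_attention(fmap, W1, W2):
    """Squeeze-and-Excitation channel attention.

    fmap: (C, H, W) feature map; W1: (C//r, C); W2: (C, C//r).
    Returns the channel-recalibrated feature map.
    """
    s = fmap.mean(axis=(1, 2))                 # squeeze: global average pool
    e = np.maximum(W1 @ s, 0.0)                # excitation: FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(W2 @ e)))     # FC + sigmoid, in (0, 1)
    return fmap * gate[:, None, None]          # channel-wise rescaling

fmap = np.arange(24, dtype=float).reshape(4, 2, 3)
rng = np.random.default_rng(4)
out = se_attention(fmap, rng.standard_normal((2, 4)), rng.standard_normal((4, 2)))
```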
Comparing domain wall synapse with other Non Volatile Memory devices for on-chip learning in Analog Hardware Neural Network
Title | Comparing domain wall synapse with other Non Volatile Memory devices for on-chip learning in Analog Hardware Neural Network |
Authors | Divya Kaushik, Utkarsh Singh, Upasana Sahu, Indu Sreedevi, Debanjan Bhowmik |
Abstract | Resistive Random Access Memory (RRAM) and Phase Change Memory (PCM) devices have been popularly used as synapses in crossbar-array-based analog Neural Network (NN) circuits to achieve more energy- and time-efficient data classification compared to conventional computers. Here we demonstrate the advantages of the recently proposed spin orbit torque driven Domain Wall (DW) device as a synapse, compared to RRAM and PCM devices, with respect to on-chip learning (training in hardware) in such NNs. The synaptic characteristic of the DW synapse, obtained from our micromagnetic modeling, turns out to be much more linear and symmetric (between positive and negative updates) than that of RRAM and PCM synapses. This makes the design of peripheral analog circuits for on-chip learning much easier in a DW synapse based NN than for RRAM and PCM synapses. We next incorporate the DW synapse as a Verilog-A model in the crossbar-array-based NN circuit we design in the SPICE circuit simulator. Successful on-chip learning is demonstrated through SPICE simulations on the popular Fisher's Iris dataset. The time and energy required for learning turn out to be orders of magnitude lower for the DW synapse based NN circuit compared to the RRAM and PCM synapse based NN circuits. |
Tasks | |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12919v1 |
https://arxiv.org/pdf/1910.12919v1.pdf | |
PWC | https://paperswithcode.com/paper/comparing-domain-wall-synapse-with-other-non |
Repo | |
Framework | |
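The linearity/symmetry contrast the abstract draws between DW and RRAM/PCM synapses can be illustrated with toy conductance-update models; the functional forms below are illustrative assumptions, not fitted device models:

```python
def dw_update(g, n_pulses, dg=0.01):
    """Domain-wall-like synapse: near-linear, symmetric conductance update
    (same step size for positive and negative pulses)."""
    return g + dg * n_pulses

def rram_update(g, n_pulses, g_max=1.0, nl=5.0):
    """Toy nonlinear RRAM-like update: potentiation saturates toward g_max
    and depression toward 0, so +/- updates are asymmetric."""
    for _ in range(abs(n_pulses)):
        if n_pulses > 0:
            g = g + (g_max - g) / nl       # saturating potentiation
        else:
            g = g - g / nl                 # saturating depression
    return g

# DW: equal-and-opposite pulses cancel exactly; RRAM: they do not.
up = dw_update(0.5, 3) - 0.5
down = dw_update(0.5, -3) - 0.5
g_cycle = rram_update(rram_update(0.5, 1), -1)
```

The symmetric, linear update is what simplifies the peripheral analog circuitry for on-chip learning, since weight changes map directly to pulse counts.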
Ranking and synchronization from pairwise measurements via SVD
Title | Ranking and synchronization from pairwise measurements via SVD |
Authors | Alexandre d’Aspremont, Mihai Cucuringu, Hemant Tyagi |
Abstract | Given a measurement graph $G= (V,E)$ and an unknown signal $r \in \mathbb{R}^n$, we investigate algorithms for recovering $r$ from pairwise measurements of the form $r_i - r_j$; $\{i,j\} \in E$. This problem arises in a variety of applications, such as ranking teams in sports data and time synchronization of distributed networks. Framed in the context of ranking, the task is to recover the ranking of $n$ teams (induced by $r$) given a small subset of noisy pairwise rank offsets. We propose a simple SVD-based algorithmic pipeline for both the problem of time synchronization and ranking. We provide a detailed theoretical analysis in terms of robustness against both sampling sparsity and noise perturbations with outliers, using results from matrix perturbation and random matrix theory. Our theoretical findings are complemented by a detailed set of numerical experiments on both synthetic and real data, showcasing the competitiveness of our proposed algorithms with other state-of-the-art methods. |
Tasks | |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02746v2 |
https://arxiv.org/pdf/1906.02746v2.pdf | |
PWC | https://paperswithcode.com/paper/ranking-and-synchronization-from-pairwise |
Repo | |
Framework | |
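In the noiseless, fully observed case the measurement matrix $C_{ij} = r_i - r_j$ equals $r\mathbf{1}^T - \mathbf{1}r^T$, so it has rank 2 and its column space is spanned by $r$ and the all-ones vector. A minimal sketch of the SVD pipeline (without the sparsity and noise handling analyzed in the paper) then recovers $r$ up to shift, scale, and sign:

```python
import numpy as np

def svd_rank(C):
    """Recover the signal direction from a skew-symmetric matrix of
    pairwise offsets C[i, j] = r_i - r_j. The in-span direction
    orthogonal to the all-ones vector is proportional to r - mean(r)."""
    n = C.shape[0]
    U = np.linalg.svd(C)[0][:, :2]       # top-2 left singular vectors
    a = U.T @ np.ones(n)                 # coords of the 1-vector in that span
    return U @ np.array([-a[1], a[0]])   # in-span direction orthogonal to 1

rng = np.random.default_rng(2)
r = rng.standard_normal(8)
C = np.subtract.outer(r, r)              # full noiseless measurements
r_hat = svd_rank(C)

# Alignment with the (centered) true signal, up to a global sign.
r_c = r - r.mean()
cosine = abs(r_hat @ r_c) / (np.linalg.norm(r_hat) * np.linalg.norm(r_c))
```

Sorting `r_hat` then yields the ranking (up to a global flip, which the paper's pipeline resolves separately).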
Understanding over-parameterized deep networks by geometrization
Title | Understanding over-parameterized deep networks by geometrization |
Authors | Xiao Dong, Ling Zhou |
Abstract | A complete understanding of the widely used over-parameterized deep networks is a key step for AI. In this work we try to give a geometric picture of over-parameterized deep networks using our geometrization scheme. We show that the Riemannian geometry of network complexity plays a key role in understanding the basic properties of over-parameterized deep networks, including generalization, convergence and parameter sensitivity. We also point out that deep networks share many similarities with quantum computation systems. This can be regarded as strong support for our proposal that geometrization is not only the bible for physics, it is also the key idea to understand deep learning systems. |
Tasks | |
Published | 2019-02-11 |
URL | http://arxiv.org/abs/1902.03793v1 |
http://arxiv.org/pdf/1902.03793v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-over-parameterized-deep |
Repo | |
Framework | |
A Trainable Multiplication Layer for Auto-correlation and Co-occurrence Extraction
Title | A Trainable Multiplication Layer for Auto-correlation and Co-occurrence Extraction |
Authors | Hideaki Hayashi, Seiichi Uchida |
Abstract | In this paper, we propose a trainable multiplication layer (TML) for a neural network that can be used to calculate the multiplication between the input features. Taking an image as an input, the TML raises each pixel value to the power of a weight and then multiplies them, thereby extracting the higher-order local auto-correlation from the input image. The TML can also be used to extract co-occurrence from the feature map of a convolutional network. The training of the TML is formulated based on backpropagation with constraints on the weights, enabling us to learn discriminative multiplication patterns in an end-to-end manner. In the experiments, the characteristics of the TML are investigated by visualizing learned kernels and the corresponding output features. The applicability of the TML for classification and neural network interpretation is also evaluated using public datasets. |
Tasks | |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.12871v1 |
https://arxiv.org/pdf/1905.12871v1.pdf | |
PWC | https://paperswithcode.com/paper/a-trainable-multiplication-layer-for-auto |
Repo | |
Framework | |
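The core TML operation, raising each input to the power of its weight and multiplying, is a log-linear computation that can be sketched in a few lines (the flattened-patch view and the small epsilon guarding the log are simplifying assumptions; the full layer applies this over sliding windows with weight constraints):

```python
import numpy as np

def tml(x, w, eps=1e-6):
    """Trainable multiplication layer on a flattened patch:
    prod_i x_i ** w_i, computed as exp(sum_i w_i * log(x_i)).
    Inputs are assumed positive; eps guards the logarithm."""
    return np.exp(np.sum(w * np.log(x + eps)))
```

A one-hot weight vector selects a single pixel, while two unit weights multiply a pair of pixels, which is how the layer captures higher-order local auto-correlation.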
Inducing Sparse Coding and And-Or Grammar from Generator Network
Title | Inducing Sparse Coding and And-Or Grammar from Generator Network |
Authors | Xianglei Xing, Song-Chun Zhu, Ying Nian Wu |
Abstract | We introduce an explainable generative model by applying a sparse operation to the feature maps of the generator network. Meaningful hierarchical representations are obtained using the proposed generative model with sparse activations. The convolutional kernels from the bottom layer to the top layer of the generator network can learn primitives such as edges and colors, object parts, and whole objects layer by layer. From the perspective of the generator network, we propose a method for inducing both sparse coding and the AND-OR grammar for images. Experiments show that our method is capable of learning meaningful and explainable hierarchical representations. |
Tasks | |
Published | 2019-01-20 |
URL | http://arxiv.org/abs/1901.11494v1 |
http://arxiv.org/pdf/1901.11494v1.pdf | |
PWC | https://paperswithcode.com/paper/inducing-sparse-coding-and-and-or-grammar |
Repo | |
Framework | |
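A simple top-k sparsification of a feature map illustrates the kind of sparse operation the abstract applies to the generator's feature maps; the paper's exact sparsification scheme may differ:

```python
import numpy as np

def sparsify(fmap, k):
    """Keep only the k largest activations in a feature map, zeroing the
    rest (ties at the threshold are all kept)."""
    flat = fmap.ravel().copy()
    if k < flat.size:
        thresh = np.partition(flat, -k)[-k]   # k-th largest value
        flat[flat < thresh] = 0.0
    return flat.reshape(fmap.shape)

fmap = np.array([[1.0, 5.0, 3.0],
                 [2.0, 4.0, 0.0]])
sparse = sparsify(fmap, k=2)
```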