January 27, 2020

Paper Group ANR 1099

Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations

Title Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
Authors Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, Christopher Ré
Abstract Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural priors they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the $O(N \log N)$ Cooley-Tukey FFT algorithm to machine precision, for dimensions $N$ up to $1024$. Furthermore, our method can be incorporated as a lightweight replacement of generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 points—the first time a structured approach has done so—with 4X faster inference speed and 40X fewer parameters.
Tasks
Published 2019-03-14
URL http://arxiv.org/abs/1903.05895v1
PDF http://arxiv.org/pdf/1903.05895v1.pdf
PWC https://paperswithcode.com/paper/learning-fast-algorithms-for-linear
Repo
Framework
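
The structural prior behind this result can be made concrete: the DFT matrix factors exactly into $O(\log N)$ sparse butterfly matrices, which is the Cooley-Tukey factorization the learned parameterization recovers. A minimal NumPy sketch (hard-coding the factors that the paper instead learns from data) verifies this:

```python
import numpy as np

def butterfly_factor(n, N):
    # One recursion level: N/n diagonal blocks of the butterfly
    # [[I, D], [I, -D]], where D holds the twiddle factors.
    k = np.arange(n // 2)
    D = np.diag(np.exp(-2j * np.pi * k / n))
    I = np.eye(n // 2)
    return np.kron(np.eye(N // n), np.block([[I, D], [I, -D]]))

def bit_reversal(N):
    bits = int(np.log2(N))
    idx = [int(format(i, f"0{bits}b")[::-1], 2) for i in range(N)]
    P = np.zeros((N, N))
    P[np.arange(N), idx] = 1.0
    return P

N = 16
M = bit_reversal(N).astype(complex)
n = 2
while n <= N:                      # product of log2(N) sparse factors
    M = butterfly_factor(n, N) @ M
    n *= 2
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
assert np.allclose(M, F)           # dense DFT recovered to machine precision
```

Each factor has only $O(N)$ nonzeros, so applying all $\log_2 N$ of them costs $O(N \log N)$, exactly the budget the learned algorithm matches.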

Variance Reduction for Matrix Games

Title Variance Reduction for Matrix Games
Authors Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian
Abstract We present a randomized primal-dual algorithm that solves the problem $\min_{x} \max_{y} y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)n}/\epsilon$, for a matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries. This improves on the best known exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/n}$ and is faster than fully stochastic gradient methods in the accurate and/or sparse regime $\epsilon \le \sqrt{n/\mathrm{nnz}(A)}$. Our results hold for $x,y$ in the simplex (matrix games, linear programming) and for $x$ in an $\ell_2$ ball and $y$ in the simplex (perceptron / SVM, minimum enclosing ball). Our algorithm combines Nemirovski’s “conceptual prox-method” and a novel reduced-variance gradient estimator based on “sampling from the difference” between the current iterate and a reference point.
Tasks
Published 2019-07-03
URL https://arxiv.org/abs/1907.02056v2
PDF https://arxiv.org/pdf/1907.02056v2.pdf
PWC https://paperswithcode.com/paper/variance-reduction-for-matrix-games
Repo
Framework
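
A hedged sketch of the "sampling from the difference" estimator for the $x$-player gradient $A^\top y$: compute the exact gradient once at a reference point $y_0$, then correct it with a single row of $A$ sampled in proportion to $|y - y_0|$. This illustrates the estimator alone, under assumed notation, not the full prox-method:

```python
import numpy as np

def vr_grad(A, y, y0, ATy0, rng):
    # Unbiased estimate of A^T y: exact gradient at the reference point y0
    # plus a one-row correction sampled from the difference |y - y0|.
    diff = y - y0
    l1 = np.abs(diff).sum()
    if l1 == 0.0:
        return ATy0.copy()              # iterate equals the reference point
    p = np.abs(diff) / l1
    i = rng.choice(len(y), p=p)
    return ATy0 + A[i] * (diff[i] / p[i])

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 100))
y0 = np.full(200, 1 / 200)              # reference point on the simplex
y = np.abs(y0 + 0.01 * rng.standard_normal(200)); y /= y.sum()
ATy0 = A.T @ y0                         # computed once per reference point
est = np.mean([vr_grad(A, y, y0, ATy0, rng) for _ in range(20000)], axis=0)
print(np.abs(est - A.T @ y).max())      # small: the estimator is unbiased
```

The variance of the one-row correction shrinks as the iterate approaches the reference point, which is what makes the estimator reduced-variance.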

A Long-Short Demands-Aware Model for Next-Item Recommendation

Title A Long-Short Demands-Aware Model for Next-Item Recommendation
Authors Ting Bai, Pan Du, Wayne Xin Zhao, Ji-Rong Wen, Jian-Yun Nie
Abstract Recommending the right products is the central problem in recommender systems, but the right products should also be recommended at the right time to meet users’ demands, so as to maximize their value. Users’ demands, which imply strong purchase intent, can be highly effective for promoting product sales if well utilized. Previous recommendation models mainly focused on users’ general interests to find the right products, whereas meeting users’ demands at the right time has been much less explored. To address this problem, we propose a novel Long-Short Demands-aware Model (LSDM), which incorporates both users’ interests in items and users’ demands over time. We distinguish two kinds of demand: long-time demands (e.g., repeatedly purchasing the same product, showing a long-term persistent interest) and short-time demands (e.g., co-purchases such as buying paintbrushes after pigments). To exploit these long-short demands, we group successive product purchases into clusters according to different time spans and use recurrent neural networks to model each sequence of clusters at a given time scale. The long-short purchase demands over multiple time scales are finally aggregated by joint learning strategies. Experimental results on three real-world e-commerce datasets demonstrate the effectiveness of our model for next-item recommendation and show the usefulness of modeling users’ long-short purchase demands over multiple time scales.
Tasks Recommendation Systems
Published 2019-02-12
URL http://arxiv.org/abs/1903.00066v1
PDF http://arxiv.org/pdf/1903.00066v1.pdf
PWC https://paperswithcode.com/paper/a-long-short-demands-aware-model-for-next
Repo
Framework
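
As a rough illustration of the multi-scale design (the clustering scheme, pooling, and layer sizes below are assumptions for the sketch, not the published architecture), one recurrent network per time scale can consume cluster embeddings built by pooling the items purchased within each time span:

```python
import torch
import torch.nn as nn

class LSDMSketch(nn.Module):
    # Hypothetical layout: one GRU per time scale; each cluster embedding is
    # the mean of the embeddings of items purchased within that time span.
    def __init__(self, n_items, dim, n_scales=2):
        super().__init__()
        self.emb = nn.Embedding(n_items, dim)
        self.grus = nn.ModuleList([nn.GRU(dim, dim, batch_first=True)
                                   for _ in range(n_scales)])
        self.out = nn.Linear(dim * n_scales, n_items)

    def forward(self, clustered):  # clustered[s]: (batch, n_clusters, max_items)
        finals = []
        for gru, clusters in zip(self.grus, clustered):
            seq = self.emb(clusters).mean(dim=2)    # one vector per time cluster
            _, h = gru(seq)
            finals.append(h[-1])
        return self.out(torch.cat(finals, dim=-1))  # scores for the next item

model = LSDMSketch(n_items=1000, dim=32)
short = torch.randint(0, 1000, (4, 6, 3))   # e.g. daily purchase clusters
long_ = torch.randint(0, 1000, (4, 3, 8))   # e.g. monthly purchase clusters
print(model([short, long_]).shape)           # torch.Size([4, 1000])
```

The paper's joint learning strategies would replace the plain concatenation used here to aggregate the scales.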

A Linear-complexity Multi-biometric Forensic Document Analysis System, by Fusing the Stylome and Signature Modalities

Title A Linear-complexity Multi-biometric Forensic Document Analysis System, by Fusing the Stylome and Signature Modalities
Authors Sayyed-Ali Hossayni, Yousef Alizadeh-Q, Vahid Tavana, Seyed M. Hosseini Nejad, Mohammad-R Akbarzadeh-T, Esteve Del Acebo, Josep Lluis De la Rosa i Esteva, Enrico Grosso, Massimo Tistarelli, Przemyslaw Kudlacik
Abstract Forensic Document Analysis (FDA) addresses the problem of determining the authorship of a given document. Identifying the writer of a document via a number of its modalities (e.g., handwriting, signature, or linguistic writing style, i.e., stylome) has been studied in the FDA literature, but no research has been conducted on fusing the stylome and signature modalities. In this paper, we propose such a bimodal FDA system (with broad applications in judicial, police-related, and historical document analysis) with a focus on time complexity. The proposed bimodal system can be trained and tested with linear time complexity. To this end, we first revisit Multinomial Naïve Bayes (MNB) as the best linear-complexity authorship-attribution system in the state of the art, and show that its accuracy is superior to that of the well-known linear-complexity classifiers. We then propose a fuzzy version of MNB to be fused with a well-known state-of-the-art linear-complexity fuzzy signature-recognition system. For evaluation, we construct a chimeric dataset composed of signatures and the textual contents of different letters. Despite its linear complexity, the proposed multi-biometric system is shown to meaningfully improve on its state-of-the-art unimodal counterparts in terms of accuracy, F-score, Detection Error Trade-off (DET), Cumulative Match Characteristics (CMC), and Match Score Histograms (MSH).
Tasks
Published 2019-01-26
URL http://arxiv.org/abs/1902.02176v1
PDF http://arxiv.org/pdf/1902.02176v1.pdf
PWC https://paperswithcode.com/paper/a-linear-complexity-multi-biometric-forensic
Repo
Framework
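
A minimal sketch of the two building blocks, assuming scikit-learn: an MNB stylome classifier over character n-grams (training and prediction are linear in the input size) and a score-level fusion with a signature matcher, which is stubbed here because the paper's fuzzy signature system is not reproduced:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Stylome branch: Multinomial Naive Bayes over character n-gram counts.
docs = ["the quick brown fox ...", "colorless green ideas ...",
        "the quick red fox ...", "green ideas sleep furiously ..."]
authors = [0, 1, 0, 1]
vec = CountVectorizer(analyzer="char", ngram_range=(2, 3))
X = vec.fit_transform(docs)
mnb = MultinomialNB().fit(X, authors)
stylome_scores = mnb.predict_proba(vec.transform(["the quick brown fox ..."]))[0]

# Score-level fusion with a signature matcher (stubbed: any system returning
# per-author match scores in [0, 1] would slot in here).
signature_scores = np.array([0.8, 0.1])
fused = 0.5 * stylome_scores + 0.5 * signature_scores   # simple weighted sum
print(fused.argmax())  # predicted writer
```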

Enhancing Learnability of classification algorithms using simple data preprocessing in fMRI scans of Alzheimer’s disease

Title Enhancing Learnability of classification algorithms using simple data preprocessing in fMRI scans of Alzheimer’s disease
Authors Rishu Garg, Rekh Ram Janghel, Yogesh Rathore
Abstract Alzheimer’s Disease (AD) is the most common type of dementia and, in all leading countries, one of the primary causes of death among senior citizens. It is currently diagnosed by computing the MMSE score and by manual study of MRI scans. Various machine learning methods have also been used for automatic diagnosis, but existing approaches have limited accuracy. In this paper, we propose some simple, novel preprocessing techniques that significantly increase accuracy while decreasing the training time of various classification algorithms. First, we convert the ADNI dataset from its 4D format into 2D form. We also mitigate computation costs by reducing the number of input parameters while preserving important and relevant data, using preprocessing steps such as grayscale image conversion, histogram equalization, and selective clipping of the dataset. We observe a best accuracy of 97.52% and a sensitivity of 97.6% on our test set.
Tasks
Published 2019-12-10
URL https://arxiv.org/abs/1912.04453v1
PDF https://arxiv.org/pdf/1912.04453v1.pdf
PWC https://paperswithcode.com/paper/enhancing-learnability-of-classification
Repo
Framework
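
The preprocessing pipeline lends itself to a NumPy-only sketch; the array sizes and clipping margins below are placeholders rather than the paper's settings:

```python
import numpy as np

def hist_equalize(img):
    # Classic histogram equalization on an 8-bit image.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())
    return (cdf[img] * 255).astype(np.uint8)

# 4D fMRI volume (x, y, z, t) -> stack of 2D slices, one per (z, t) pair.
vol = np.random.rand(32, 32, 16, 10)
slices = vol.transpose(2, 3, 0, 1).reshape(-1, 32, 32)

gray = (255 * (slices - slices.min()) / np.ptp(slices)).astype(np.uint8)
eq = np.stack([hist_equalize(s) for s in gray])   # boost contrast per slice
clipped = eq[:, 4:-4, 4:-4]                       # selective clipping of borders
print(clipped.shape)                              # fewer input parameters per image
```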

AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes

Title AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes
Authors Henrietta Baker, Matthew R. Hallowell, Antoine J. -P. Tixier
Abstract This paper significantly improves on, and completes the validation of, an approach proposed in previous research in which safety outcomes were predicted from attributes with machine learning. As in the original study, we use Natural Language Processing (NLP) to extract fundamental attributes from raw incident reports, and machine learning models are trained to predict safety outcomes. The outcomes predicted here are injury severity, injury type, body part impacted, and incident type. Unlike in the original study, however, the safety outcomes were not extracted via NLP but were provided by independent human annotations, eliminating any potential source of artificial correlation between predictors and predictands. Results show that attributes are still highly predictive, confirming the validity of the original approach. Other improvements brought by the current study include the use of (1) a much larger dataset featuring more than 90,000 reports, (2) two new models, XGBoost and linear SVM (Support Vector Machine), (3) model stacking, (4) a more straightforward experimental setup with more appropriate performance metrics, and (5) an analysis of per-category attribute importance scores. Finally, the injury severity outcome is well predicted, which was not the case in the original study. This is a significant advancement.
Tasks Injury Prediction
Published 2019-08-16
URL https://arxiv.org/abs/1908.05972v2
PDF https://arxiv.org/pdf/1908.05972v2.pdf
PWC https://paperswithcode.com/paper/ai-predicts-independent-construction-safety
Repo
Framework
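
A compact stand-in for the modeling setup, assuming scikit-learn; GradientBoostingClassifier substitutes for XGBoost to keep the sketch dependency-free, and the attribute matrix and outcome are synthetic:

```python
import numpy as np
from sklearn.ensemble import StackingClassifier, GradientBoostingClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

# Binary attributes extracted from reports (columns) predict a safety outcome.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 30))             # e.g. "ladder", "wind", ...
y = ((X[:, 0] & X[:, 3]) | X[:, 7]).astype(int)    # synthetic outcome for the demo

stack = StackingClassifier(
    estimators=[("gbt", GradientBoostingClassifier()),  # stand-in for XGBoost
                ("svm", LinearSVC(dual=False))],
    final_estimator=LogisticRegression(),               # meta-learner
    cv=5)
stack.fit(X, y)
print(stack.score(X, y))
```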

A CNN-Based Super-Resolution Technique for Active Fire Detection on Sentinel-2 Data

Title A CNN-Based Super-Resolution Technique for Active Fire Detection on Sentinel-2 Data
Authors Massimiliano Gargiulo, Domenico Antonio Giuseppe Dell’Aglio, Antonio Iodice, Daniele Riccio, Giuseppe Ruello
Abstract Remote sensing applications can benefit from the relatively fine spatial resolution of multispectral (MS) images and the high revisit frequency ensured by the twin Sentinel-2 satellites. Unfortunately, only four of the thirteen bands are provided at the highest resolution of 10 meters; the others come at 20 or 60 meters. The Short-Wave Infrared (SWIR) bands, provided at 20 meters, are particularly useful for detecting active fires. Aiming at more detailed Active Fire Detection (AFD) maps, we propose a super-resolution data-fusion method based on a Convolutional Neural Network (CNN) to bring the SWIR bands to the 10-m spatial resolution. The proposed CNN-based solution achieves better results than alternative methods on several accuracy metrics. Moreover, we test the super-resolved bands from an application point of view by monitoring active fires through classic indices. The advantages and limits of the proposed approach are validated on a specific geographical area (Mount Vesuvius, near Naples) that was damaged by widespread fires during the summer of 2017.
Tasks Accuracy Metrics, Super-Resolution
Published 2019-06-25
URL https://arxiv.org/abs/1906.10413v1
PDF https://arxiv.org/pdf/1906.10413v1.pdf
PWC https://paperswithcode.com/paper/a-cnn-based-super-resolution-technique-for
Repo
Framework
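
One plausible design for such a fusion network (an assumption, not the paper's architecture), assuming PyTorch: the 20-m SWIR bands are upsampled, concatenated with the native 10-m bands, and the CNN predicts the residual high-frequency detail:

```python
import torch
import torch.nn as nn

class SWIRSuperRes(nn.Module):
    # Toy residual fusion CNN: upsampled SWIR plus learned detail from the
    # concatenation of high-resolution bands and upsampled SWIR.
    def __init__(self, n_hr=4, n_swir=2, feats=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_hr + n_swir, feats, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feats, n_swir, 3, padding=1))

    def forward(self, hr_bands, swir20):
        swir_up = nn.functional.interpolate(swir20, scale_factor=2,
                                            mode="bicubic", align_corners=False)
        return swir_up + self.body(torch.cat([hr_bands, swir_up], dim=1))

net = SWIRSuperRes()
hr = torch.rand(1, 4, 128, 128)    # 10 m bands (e.g. B2, B3, B4, B8)
swir = torch.rand(1, 2, 64, 64)    # 20 m SWIR bands (B11, B12)
print(net(hr, swir).shape)         # torch.Size([1, 2, 128, 128])
```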

Compression of Acoustic Event Detection Models With Quantized Distillation

Title Compression of Acoustic Event Detection Models With Quantized Distillation
Authors Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang
Abstract Acoustic Event Detection (AED), which aims to detect categories of events from audio signals, has found application in many intelligent systems. Recently, deep neural networks have significantly advanced this field and greatly reduced detection errors. However, how to efficiently execute deep models for AED has received much less attention, while state-of-the-art AED models are based on large deep networks that are computationally demanding and challenging to deploy on devices with constrained computational resources. In this paper, we present a simple yet effective compression approach that jointly leverages knowledge distillation and quantization to compress a larger network (the teacher model) into a compact network (the student model). Experimental results show that the proposed technique not only lowers the error rate of the original compact network by 15% through distillation but also further reduces its model size substantially (to 2% of the teacher and 12% of the full-precision student) through quantization.
Tasks Quantization
Published 2019-07-01
URL https://arxiv.org/abs/1907.00873v1
PDF https://arxiv.org/pdf/1907.00873v1.pdf
PWC https://paperswithcode.com/paper/compression-of-acoustic-event-detection-1
Repo
Framework
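
The two ingredients, knowledge distillation and weight quantization, can be sketched as follows, assuming PyTorch; the temperature, loss weighting, and uniform quantizer are illustrative defaults rather than the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets from the teacher plus the usual hard-label loss.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

def quantize_weights(w, bits=8):
    # Uniform quantization of a weight tensor to `bits` bits.
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    q = torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale

logits_s = torch.randn(16, 10, requires_grad=True)   # student outputs
logits_t = torch.randn(16, 10)                       # teacher outputs
labels = torch.randint(0, 10, (16,))
print(distillation_loss(logits_s, logits_t, labels))
```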

BCD-Net for Low-dose CT Reconstruction: Acceleration, Convergence, and Generalization

Title BCD-Net for Low-dose CT Reconstruction: Acceleration, Convergence, and Generalization
Authors Il Yong Chun, Xuehang Zheng, Yong Long, Jeffrey A. Fessler
Abstract Obtaining accurate and reliable images from low-dose computed tomography (CT) is challenging. Regression convolutional neural network (CNN) models learned from training data are gaining increasing attention in low-dose CT reconstruction. This paper modifies the architecture of an iterative regression CNN, BCD-Net, for fast, stable, and accurate low-dose CT reconstruction, and presents the convergence properties of the modified BCD-Net. Numerical results with phantom data show that applying faster numerical solvers to the model-based image reconstruction (MBIR) modules of BCD-Net leads to a faster and more accurate BCD-Net; that BCD-Net significantly improves reconstruction accuracy compared to the state-of-the-art MBIR method using learned transforms; and that BCD-Net achieves better image quality than a state-of-the-art iterative NN architecture, ADMM-Net. Numerical results with clinical data show that BCD-Net generalizes significantly better than a state-of-the-art deep (non-iterative) regression NN, FBPConvNet, that lacks MBIR modules.
Tasks Computed Tomography (CT), Image Reconstruction
Published 2019-08-04
URL https://arxiv.org/abs/1908.01287v1
PDF https://arxiv.org/pdf/1908.01287v1.pdf
PWC https://paperswithcode.com/paper/bcd-net-for-low-dose-ct-reconstruction
Repo
Framework
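
One BCD-Net-style outer iteration alternates a learned denoising module with an MBIR data-fit step. The toy forward model and the smoothing "denoiser" below are stand-ins, assuming NumPy; the paper's point is that faster inner solvers for the MBIR step accelerate the whole pipeline:

```python
import numpy as np

def bcd_net_iteration(x, y, A, denoiser, n_inner=10, rho=1.0):
    # (1) Learned module refines the current estimate; (2) MBIR module fits the
    # measurements via a few gradient steps on ||Ax - y||^2 + rho * ||x - z||^2.
    z = denoiser(x)
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + rho)   # 1 / Lipschitz constant
    for _ in range(n_inner):
        x = x - step * (A.T @ (A @ x - y) + rho * (x - z))
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 64))          # toy forward model standing in for CT
x_true = rng.random(64)
y = A @ x_true + 0.01 * rng.standard_normal(80)
smooth = lambda v: np.convolve(v, np.ones(3) / 3, mode="same")  # denoiser stub
x = np.zeros(64)
for _ in range(20):                        # outer BCD iterations
    x = bcd_net_iteration(x, y, A, smooth)
print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```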

Robustness-Driven Exploration with Probabilistic Metric Temporal Logic

Title Robustness-Driven Exploration with Probabilistic Metric Temporal Logic
Authors Xiaotian Liu, Pengyi Shi, Sarra Alqahtani, Victor Paúl Pauca, Miles Silman
Abstract The ability to perform autonomous exploration is essential for unmanned aerial vehicles (UAVs) operating in unstructured or unknown environments where it is hard or even impossible to describe the environment beforehand. However, algorithms for autonomous exploration often optimize time and coverage in a greedy fashion, which can collect irrelevant data and waste time navigating areas with no important information. In this paper, we propose a method for exploiting the knowledge discovered about the environment while exploring it, relying on a theory of robustness based on Probabilistic Metric Temporal Logic (P-MTL) as applied to offline verification and online control of hybrid systems. By maximizing satisfaction of the predefined P-MTL specifications of the exploration problem, the robustness values guide the UAV towards areas with more interesting information to gain. We use Markov chain Monte Carlo to solve the P-MTL constraints. We demonstrate the effectiveness of the proposed approach by simulating autonomous exploration over the Amazonian rainforest, where our approach is used to detect areas occupied by illegal Artisanal Small-scale Gold Mining (ASGM) activities. The results show that our approach outperforms a greedy exploration approach (Autonomous Exploration Planner) by 38% in terms of ASGM coverage.
Tasks
Published 2019-12-03
URL https://arxiv.org/abs/1912.01704v1
PDF https://arxiv.org/pdf/1912.01704v1.pdf
PWC https://paperswithcode.com/paper/robustness-driven-exploration-with
Repo
Framework
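
A toy sketch of robustness-guided exploration: a simple surrogate (expected information minus a travel penalty) stands in for the P-MTL robustness degree, and Metropolis-Hastings sampling, one member of the MCMC family the paper draws on, proposes the next waypoint:

```python
import numpy as np

rng = np.random.default_rng(0)
info = rng.random((20, 20))                 # belief about interesting areas
pos = np.array([10, 10])                    # current UAV position on the grid

def robustness(cell):
    # Surrogate score: information gain minus a travel penalty; the real
    # system would evaluate the P-MTL robustness degree here.
    return info[tuple(cell)] - 0.02 * np.abs(cell - pos).sum()

cell = pos.copy()
for _ in range(500):                        # Metropolis-Hastings over the grid
    prop = np.clip(cell + rng.integers(-2, 3, size=2), 0, 19)
    if np.log(rng.random()) < (robustness(prop) - robustness(cell)) / 0.05:
        cell = prop                         # accept moves toward higher robustness
print("next waypoint:", cell, "robustness:", round(robustness(cell), 3))
```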

Resampling-based Confidence Intervals for Model-free Robust Inference on Optimal Treatment Regimes

Title Resampling-based Confidence Intervals for Model-free Robust Inference on Optimal Treatment Regimes
Authors Yunan Wu, Lan Wang
Abstract Recently, there has been growing interest in estimating optimal treatment regimes, which are individualized decision rules that can achieve maximal average outcomes. This paper considers the problem of inference for optimal treatment regimes in the model-free setting, where the specification of an outcome regression model is not needed. Existing model-free estimators are usually not suitable for the purpose of inference because they either have nonstandard asymptotic distributions or are designed to achieve Fisher-consistent classification performance. This paper first studies a smoothed robust estimator that directly targets the parameters corresponding to the Bayes decision rule for the optimal treatment regime. This estimator is shown to have an asymptotic normal distribution. Furthermore, it is proved that a resampling procedure provides asymptotically accurate inference for both the parameters indexing the optimal treatment regime and the optimal value function. A new algorithm is developed to calculate the proposed estimator with substantially improved speed and stability. Numerical results demonstrate the satisfactory performance of the new methods.
Tasks
Published 2019-11-25
URL https://arxiv.org/abs/1911.11043v1
PDF https://arxiv.org/pdf/1911.11043v1.pdf
PWC https://paperswithcode.com/paper/resampling-based-confidence-intervals-for
Repo
Framework
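
The resampling procedure can be illustrated with a percentile bootstrap around a stand-in regime estimator; the data-generating process and the fitting step below are assumptions, and the paper's smoothed robust estimator is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
X = rng.standard_normal((n, 2))                         # patient covariates
A = rng.integers(0, 2, n)                               # randomized treatment
Y = 1 + X[:, 0] * (2 * A - 1) + rng.standard_normal(n)  # benefit iff X0 > 0

def fit_regime(X, A, Y):
    # Stand-in estimator for the rule "treat iff x' beta > 0":
    # least squares on the signed outcome, normalized for identifiability.
    W = Y * (2 * A - 1)
    beta, *_ = np.linalg.lstsq(X, W, rcond=None)
    return beta / np.linalg.norm(beta)

boots = []
for _ in range(1000):                                   # resample and refit
    idx = rng.integers(0, n, n)
    boots.append(fit_regime(X[idx], A[idx], Y[idx]))
lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)      # percentile intervals
print("95% CI per coefficient:", list(zip(lo, hi)))
```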

DISCo: Deep learning, Instance Segmentation, and Correlations for cell segmentation in calcium imaging videos

Title DISCo: Deep learning, Instance Segmentation, and Correlations for cell segmentation in calcium imaging videos
Authors Elke Kirschbaum, Alberto Bailoni, Fred A. Hamprecht
Abstract Calcium imaging is one of the most important tools in neurophysiology, as it enables the observation of neuronal activity for hundreds of cells in parallel and at single-cell resolution. In order to use the data gained with calcium imaging, it is necessary to extract individual cells and their activity from the recordings. We present DISCo, a novel approach for cell segmentation in calcium imaging videos. We use temporal information from the recordings in a computationally efficient way by computing correlations between pixels, and combine it with shape-based information to identify active as well as non-active cells. We first learn to predict whether two pixels belong to the same cell; this information is summarized in an undirected, edge-weighted grid graph, which we then partition. In so doing, we approximately solve the NP-hard correlation clustering problem with a recently proposed greedy algorithm. Evaluating our method on the Neurofinder public benchmark shows that DISCo outperforms all existing models trained on these datasets.
Tasks Cell Segmentation, Instance Segmentation, Semantic Segmentation
Published 2019-08-21
URL https://arxiv.org/abs/1908.07957v3
PDF https://arxiv.org/pdf/1908.07957v3.pdf
PWC https://paperswithcode.com/paper/disco-for-the-cia-deep-learning-instance
Repo
Framework
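
The temporal signal DISCo exploits is easy to expose: pixel traces within the same cell are strongly correlated. The greedy merge over grid-neighbor edges below is a toy stand-in for the paper's correlation-clustering partitioner:

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, W = 200, 8, 8
video = rng.standard_normal((T, H, W)) * 0.3
video[:, 2:5, 2:5] += np.sin(np.linspace(0, 20, T))[:, None, None]  # one "cell"

flat = video.reshape(T, -1)
corr = np.corrcoef(flat.T)               # pixel-pixel temporal correlations

def neighbors(i):                        # right and down grid neighbors
    if (i + 1) % W != 0:
        yield i + 1
    if i + W < H * W:
        yield i + W

labels = np.arange(H * W)                # start: every pixel its own segment
for i in range(H * W):
    for j in neighbors(i):
        if corr[i, j] > 0.5:             # attractive edge -> merge segments
            labels[labels == labels[j]] = labels[i]
print(len(np.unique(labels)), "segments")   # the 3x3 "cell" collapses to one
```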

G$^{3}$AN: Disentangling Appearance and Motion for Video Generation

Title G$^{3}$AN: Disentangling Appearance and Motion for Video Generation
Authors Yaohui Wang, Piotr Bilinski, Francois Bremond, Antitza Dantcheva
Abstract Creating realistic human videos entails the challenge of simultaneously generating both appearance and motion. To tackle this challenge, we introduce G$^{3}$AN, a novel spatio-temporal generative model that seeks to capture the distribution of high-dimensional video data and to model appearance and motion in a disentangled manner. The latter is achieved by decomposing appearance and motion in a three-stream generator, where the main stream models spatio-temporal consistency while the two auxiliary streams augment it with multi-scale appearance and motion features, respectively. An extensive quantitative and qualitative analysis shows that our model systematically and significantly outperforms state-of-the-art methods on the facial expression datasets MUG and UvA-NEMO, as well as on the human action datasets Weizmann and UCF101. Additional analysis of the learned latent representations confirms the successful decomposition of appearance and motion. Source code and pre-trained models are publicly available.
Tasks Video Generation
Published 2019-12-11
URL https://arxiv.org/abs/1912.05523v2
PDF https://arxiv.org/pdf/1912.05523v2.pdf
PWC https://paperswithcode.com/paper/mathbfg3an-this-video-does-not-exist
Repo
Framework
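
A skeleton of the three-stream idea (dimensions, layers, and output size are placeholders, not the released model): a static appearance code is broadcast over time, a motion code runs through a recurrent stream, and the main stream fuses the two per frame:

```python
import torch
import torch.nn as nn

class ThreeStreamG(nn.Module):
    # Sketch of a disentangled generator: z_a controls appearance only,
    # z_m controls motion only, and the main stream renders frames.
    def __init__(self, dim=64, frames=8):
        super().__init__()
        self.frames = frames
        self.app = nn.Linear(dim, dim)                  # appearance stream
        self.mot = nn.GRU(dim, dim, batch_first=True)   # motion stream
        self.main = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                  nn.Linear(dim, 3 * 16 * 16))

    def forward(self, z_a, z_m):                        # z_m: (B, frames, dim)
        a = self.app(z_a).unsqueeze(1).expand(-1, self.frames, -1)
        m, _ = self.mot(z_m)
        video = self.main(torch.cat([a, m], dim=-1))    # fuse per frame
        return video.view(-1, self.frames, 3, 16, 16)

g = ThreeStreamG()
print(g(torch.randn(2, 64), torch.randn(2, 8, 64)).shape)  # (2, 8, 3, 16, 16)
```

Holding z_m fixed while varying z_a changes who appears but not how they move, which is the disentanglement the latent-space analysis verifies.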

An Analysis of Speech Enhancement and Recognition Losses in Limited Resources Multi-talker Single Channel Audio-Visual ASR

Title An Analysis of Speech Enhancement and Recognition Losses in Limited Resources Multi-talker Single Channel Audio-Visual ASR
Authors Luca Pasa, Giovanni Morrone, Leonardo Badino
Abstract In this paper, we analyzed how audio-visual speech enhancement can help with the ASR task in a cocktail-party scenario. To this end, we considered two simple end-to-end LSTM-based models that perform single-channel audio-visual speech enhancement and phone recognition, respectively. We then studied how the two models interact and how training them jointly affects the final result. We analyzed different training strategies, which revealed some interesting and unexpected behaviors. The experiments show that during optimization of the ASR task, the speech enhancement capability of the model significantly decreases, and vice versa. Nevertheless, joint optimization of the two tasks shows a remarkable drop in Phone Error Rate (PER) compared to audio-visual baseline models trained only to perform phone recognition. We analyzed the behavior of the proposed models using two limited-size datasets, namely the mixed-speech versions of GRID and TCD-TIMIT.
Tasks Speech Enhancement
Published 2019-04-16
URL https://arxiv.org/abs/1904.08248v2
PDF https://arxiv.org/pdf/1904.08248v2.pdf
PWC https://paperswithcode.com/paper/joined-audio-visual-speech-enhancement-and
Repo
Framework
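
The joint objective can be sketched as a weighted sum of the two task losses; sweeping the weight is one way to probe the trade-off described above (PyTorch assumed; the loss form is an illustration, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def joint_loss(enhanced, clean, phone_logits, phone_targets, alpha=0.5):
    # Weighted sum of enhancement and recognition losses; alpha -> 1 trains
    # enhancement only, alpha -> 0 trains ASR only.
    se = F.mse_loss(enhanced, clean)                         # enhancement loss
    asr = F.cross_entropy(phone_logits.transpose(1, 2),      # per-frame phones
                          phone_targets)
    return alpha * se + (1 - alpha) * asr

enhanced = torch.randn(4, 100, 80, requires_grad=True)      # (batch, frames, mels)
clean = torch.randn(4, 100, 80)
logits = torch.randn(4, 100, 40, requires_grad=True)        # 40 phone classes
targets = torch.randint(0, 40, (4, 100))
print(joint_loss(enhanced, clean, logits, targets))
```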

Reinforcement Learning for Nested Polar Code Construction

Title Reinforcement Learning for Nested Polar Code Construction
Authors Lingchen Huang, Huazi Zhang, Rong Li, Yiqun Ge, Jun Wang
Abstract In this paper, we model nested polar code construction as a Markov decision process (MDP) and tackle it with advanced reinforcement learning (RL) techniques. First, an MDP environment with states, actions, and rewards is defined in the context of polar coding. Specifically, a state represents the construction of an $(N,K)$ polar code, an action specifies its reduction to an $(N,K-1)$ subcode, and the reward is the decoding performance. A neural network architecture consisting of both policy and value networks is proposed to generate actions based on the observed states, aiming at maximizing the overall reward. A loss function is defined to trade off between exploitation and exploration. To further improve learning efficiency and quality, an “integrated learning” paradigm is proposed. It first employs a genetic algorithm to generate a population of (sub-)optimal polar codes for each $(N,K)$, and then uses them as prior knowledge to refine the policy in RL. Such a paradigm is shown to accelerate the training process and to converge to better performance. Simulation results show that the proposed learning-based polar constructions achieve comparable, or even better, performance than the state of the art under successive cancellation list (SCL) decoders. Last but not least, this is achieved without exploiting any expert knowledge from polar coding theory in the learning algorithms.
Tasks
Published 2019-04-16
URL https://arxiv.org/abs/1904.07511v2
PDF https://arxiv.org/pdf/1904.07511v2.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-for-nested-polar-code
Repo
Framework
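
The MDP interface described above is simple to mock up. In the sketch below, a crude reliability heuristic stands in for the decoding-performance reward (which the paper obtains by simulation), and the greedy rollout is only for illustration:

```python
import numpy as np

class NestedPolarEnv:
    # Toy environment mirroring the MDP: a state is the information set of an
    # (N, K) polar code; an action freezes one more bit, giving (N, K-1).
    def __init__(self, N):
        self.N = N
        self.info = set(range(N))            # start from the (N, N) code

    def step(self, bit):                     # action: freeze channel `bit`
        assert bit in self.info
        self.info.remove(bit)
        return frozenset(self.info), self.reward()

    def reward(self):
        # Stand-in for decoding performance: index Hamming weight as a crude
        # reliability proxy, instead of simulated SCL decoding.
        rel = np.array([bin(i).count("1") for i in range(self.N)])
        return rel[list(self.info)].sum() / max(len(self.info), 1)

env = NestedPolarEnv(8)
for bit in (0, 1, 2):                        # one greedy nested construction
    state, r = env.step(bit)
    print(sorted(state), round(r, 3))
```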