Paper Group ANR 1099
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations. Variance Reduction for Matrix Games. A Long-Short Demands-Aware Model for Next-Item Recommendation. A Linear-complexity Multi-biometric Forensic Document Analysis System, by Fusing the Stylome and Signature Modalities. Enhancing Learnability of classification algorithms …
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
Title | Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations |
Authors | Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, Christopher Ré |
Abstract | Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural priors they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the $O(N \log N)$ Cooley-Tukey FFT algorithm to machine precision, for dimensions $N$ up to $1024$. Furthermore, our method can be incorporated as a lightweight replacement of generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 points—the first time a structured approach has done so—with 4X faster inference speed and 40X fewer parameters. |
Tasks | |
Published | 2019-03-14 |
URL | http://arxiv.org/abs/1903.05895v1 |
http://arxiv.org/pdf/1903.05895v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-fast-algorithms-for-linear |
Repo | |
Framework | |
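The factorization viewpoint in this abstract is easy to make concrete: the Cooley-Tukey FFT is exactly a recursive product of $O(\log N)$ sparse butterfly factors. A minimal NumPy sketch of that fixed factorization (the paper instead *learns* the factor entries, recovering this algorithm as a special case):

```python
import numpy as np

def butterfly_fft(x):
    """Evaluate the DFT of x via the recursive Cooley-Tukey butterfly
    factorization F_N = B_N (I_2 (+) F_{N/2}) P_N, where P_N is the
    even-odd permutation and B_N a 2x2-block butterfly factor."""
    n = len(x)
    if n == 1:
        return x.astype(complex)
    even = butterfly_fft(x[0::2])   # F_{N/2} on even-indexed entries
    odd = butterfly_fft(x[1::2])    # F_{N/2} on odd-indexed entries
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

x = np.random.randn(1024)
assert np.allclose(butterfly_fft(x), np.fft.fft(x))  # machine precision at N = 1024
```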
Variance Reduction for Matrix Games
Title | Variance Reduction for Matrix Games |
Authors | Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian |
Abstract | We present a randomized primal-dual algorithm that solves the problem $\min_{x} \max_{y} y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)n}/\epsilon$, for matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries. This improves the best known exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/n}$ and is faster than fully stochastic gradient methods in the accurate and/or sparse regime $\epsilon \le \sqrt{n/\mathrm{nnz}(A)}$. Our results hold for $x,y$ in the simplex (matrix games, linear programming) and for $x$ in an $\ell_2$ ball and $y$ in the simplex (perceptron / SVM, minimum enclosing ball). Our algorithm combines Nemirovski’s “conceptual prox-method” and a novel reduced-variance gradient estimator based on “sampling from the difference” between the current iterate and a reference point. |
Tasks | |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.02056v2 |
https://arxiv.org/pdf/1907.02056v2.pdf | |
PWC | https://paperswithcode.com/paper/variance-reduction-for-matrix-games |
Repo | |
Framework | |
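The "sampling from the difference" idea can be sketched directly: estimate $Ax$ as the exact product with a reference point $x_0$ plus a single sampled column correcting for $x - x_0$. A hedged NumPy sketch of that basic shape only (the paper's estimator also handles the dual side and uses problem-specific sampling distributions):

```python
import numpy as np

def sampled_from_difference(A, x, x0, Ax0, rng):
    """Unbiased estimate of A @ x: exact (precomputed) A @ x0 plus a
    single sampled column correcting for d = x - x0. Unbiasedness:
    E[A[:, i] * d[i] / p[i]] = sum_i A[:, i] * d[i] = A @ d."""
    d = x - x0
    if not d.any():                       # at the reference point: exact
        return Ax0
    p = np.abs(d) / np.abs(d).sum()       # sample where the difference is large
    i = rng.choice(len(d), p=p)
    return Ax0 + A[:, i] * (d[i] / p[i])

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))
x0 = np.full(40, 1 / 40)                  # reference point in the simplex
x = x0 + 0.01 * rng.standard_normal(40)
est = np.mean([sampled_from_difference(A, x, x0, A @ x0, rng)
               for _ in range(20000)], axis=0)
print(np.abs(est - A @ x).max())          # small: the estimator is unbiased
```

Because the correction is a single column, each stochastic gradient costs $O(n)$ after the one-time $\mathrm{nnz}(A)$ cost of $Ax_0$, and its variance shrinks as the iterate approaches the reference point.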
A Long-Short Demands-Aware Model for Next-Item Recommendation
Title | A Long-Short Demands-Aware Model for Next-Item Recommendation |
Authors | Ting Bai, Pan Du, Wayne Xin Zhao, Ji-Rong Wen, Jian-Yun Nie |
Abstract | Recommending the right products is the central problem in recommender systems, but the right products should also be recommended at the right time to meet users’ demands, so as to maximize their value. Users’ demands, which imply strong purchase intent, can be the most useful signal for promoting product sales if well utilized. Previous recommendation models mainly focused on users’ general interests to find the right products. However, the aspect of meeting users’ demands at the right time has been much less explored. To address this problem, we propose a novel Long-Short Demands-aware Model (LSDM), in which both users’ interests in items and users’ demands over time are incorporated. We distinguish two aspects of demands: long-time demands (e.g., purchasing the same product repeatedly, showing a long-term persistent interest) and short-time demands (e.g., co-purchases, like buying paintbrushes after pigments). To capture such long-short demands, we create different clusters that group successive product purchases together according to different time spans, and use recurrent neural networks to model each sequence of clusters at a given time scale. The long-short purchase demands at multiple time scales are finally aggregated by joint learning strategies. Experimental results on three real-world commerce datasets demonstrate the effectiveness of our model for next-item recommendation, showing the usefulness of modeling users’ long-short purchase demands at multiple time scales. |
Tasks | Recommendation Systems |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1903.00066v1 |
http://arxiv.org/pdf/1903.00066v1.pdf | |
PWC | https://paperswithcode.com/paper/a-long-short-demands-aware-model-for-next |
Repo | |
Framework | |
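The multi-scale clustering step is the mechanically interesting part: successive purchases are grouped by time span, and each cluster sequence feeds an RNN at that scale. A hypothetical sketch of one such grouping pass (the abstract does not specify the exact clustering rule; a maximum-gap rule is assumed here):

```python
from datetime import datetime, timedelta

def cluster_purchases(purchases, max_gap):
    """Group a user's time-ordered (timestamp, item) purchases into
    clusters: a new cluster starts when the gap between consecutive
    purchases exceeds max_gap. Running this with several max_gap values
    yields the multi-time-scale cluster sequences fed to the RNNs."""
    clusters, current = [], [purchases[0]]
    for prev, cur in zip(purchases, purchases[1:]):
        if cur[0] - prev[0] > max_gap:
            clusters.append(current)
            current = []
        current.append(cur)
    clusters.append(current)
    return clusters

history = [(datetime(2019, 1, 1), "pigments"),
           (datetime(2019, 1, 2), "paintbrushes"),   # short-time co-purchase
           (datetime(2019, 2, 1), "pigments")]       # long-time repeat demand
print(cluster_purchases(history, timedelta(days=7)))
```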
A Linear-complexity Multi-biometric Forensic Document Analysis System, by Fusing the Stylome and Signature Modalities
Title | A Linear-complexity Multi-biometric Forensic Document Analysis System, by Fusing the Stylome and Signature Modalities |
Authors | Sayyed-Ali Hossayni, Yousef Alizadeh-Q, Vahid Tavana, Seyed M. Hosseini Nejad, Mohammad-R Akbarzadeh-T, Esteve Del Acebo, Josep Lluis De la Rosa i Esteva, Enrico Grosso, Massimo Tistarelli, Przemyslaw Kudlacik |
Abstract | Forensic Document Analysis (FDA) addresses the problem of finding the authorship of a given document. Identification of the document writer via a number of its modalities (e.g., handwriting, signature, or linguistic writing style, i.e., stylome) has been studied in the FDA state of the art. However, no research has been conducted on the fusion of the stylome and signature modalities. In this paper, we propose such a bimodal FDA system (which has broad applications in judicial, police-related, and historical document analysis) with a focus on time complexity. The proposed bimodal system can be trained and tested with linear time complexity. For this purpose, we first revisit Multinomial Naïve Bayes (MNB) as the best state-of-the-art linear-complexity authorship attribution system, and prove its superior accuracy over well-known linear-complexity classifiers. We then propose a fuzzy version of MNB to be fused with a well-known state-of-the-art linear-complexity fuzzy signature recognition system. For evaluation purposes, we construct a chimeric dataset composed of signatures and textual contents of different letters. Despite its linear complexity, the proposed multi-biometric system is shown to meaningfully improve on its state-of-the-art unimodal counterparts with respect to accuracy, F-score, Detection Error Trade-off (DET), Cumulative Match Characteristics (CMC), and Match Score Histogram (MSH) evaluation metrics. |
Tasks | |
Published | 2019-01-26 |
URL | http://arxiv.org/abs/1902.02176v1 |
http://arxiv.org/pdf/1902.02176v1.pdf | |
PWC | https://paperswithcode.com/paper/a-linear-complexity-multi-biometric-forensic |
Repo | |
Framework | |
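The stylome side of the system starts from plain Multinomial Naive Bayes over token counts, which trains and predicts in time linear in the corpus size; its scores are then fused with a signature matcher. A minimal sklearn sketch of that MNB baseline with a simple weighted score-level fusion (the paper's fuzzy MNB and its fusion rule are more elaborate; the weight and toy data below are illustrative assumptions):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["the quick brown fox jumps over the lazy dog",
        "to be or not to be that is the question"]
authors = ["author_a", "author_b"]

# Linear-complexity stylome classifier: token counts + Multinomial NB.
stylome = make_pipeline(CountVectorizer(), MultinomialNB())
stylome.fit(docs, authors)

def fuse(stylome_scores, signature_scores, w=0.5):
    """Toy score-level fusion of the two modalities (weight w assumed)."""
    return w * stylome_scores + (1 - w) * signature_scores

stylome_scores = stylome.predict_proba(["not to be quick"])[0]
signature_scores = np.array([0.3, 0.7])   # stand-in signature-matcher output
print(fuse(stylome_scores, signature_scores))
```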
Enhancing Learnability of classification algorithms using simple data preprocessing in fMRI scans of Alzheimer’s disease
Title | Enhancing Learnability of classification algorithms using simple data preprocessing in fMRI scans of Alzheimer’s disease |
Authors | Rishu Garg, Rekh Ram Janghel, Yogesh Rathore |
Abstract | Alzheimer’s Disease (AD) is the most common type of dementia. In all leading countries, it is one of the primary causes of death among senior citizens. Currently, it is diagnosed by calculating the MMSE score and by manual study of MRI scans. Different machine learning methods have also been utilized for automatic diagnosis, but existing methods have limitations in terms of accuracy. In this paper, we propose some novel preprocessing techniques that significantly increase the accuracy and, at the same time, decrease the training time of various classification algorithms. First, we converted the ADNI dataset from its 4D format into 2D form. We also mitigated the computation cost by reducing the parameters of the input dataset while preserving important and relevant data. We achieved this using preprocessing steps such as grayscale image conversion, histogram equalization, and selective clipping of the dataset. We observed a best accuracy of 97.52% and a sensitivity of 97.6% on our testing dataset. |
Tasks | |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.04453v1 |
https://arxiv.org/pdf/1912.04453v1.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-learnability-of-classification |
Repo | |
Framework | |
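The preprocessing pipeline the abstract lists — 2D slicing, grayscale conversion, histogram equalization, clipping — can be sketched per slice with OpenCV. The percentile thresholds below are an assumption; the abstract does not say how the "selective clipping" is parameterized:

```python
import numpy as np
import cv2

def preprocess_slice(slice_2d):
    """One 2D slice taken from the 4D fMRI volume: min-max normalize to
    8-bit grayscale, equalize the histogram, then clip intensity extremes
    (2nd/98th percentiles here are assumed, not from the paper)."""
    img = cv2.normalize(slice_2d, None, 0, 255, cv2.NORM_MINMAX)
    img = cv2.equalizeHist(img.astype(np.uint8))
    lo, hi = np.percentile(img, (2, 98))
    return np.clip(img, lo, hi).astype(np.uint8)

volume_4d = np.random.rand(64, 64, 32, 10)        # stand-in ADNI-like volume
slices_2d = [preprocess_slice(volume_4d[:, :, z, t])
             for z in range(32) for t in range(10)]
```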
AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes
Title | AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes |
Authors | Henrietta Baker, Matthew R. Hallowell, Antoine J. -P. Tixier |
Abstract | This paper significantly improves on, and completes the validation of, an approach proposed in previous research in which safety outcomes were predicted from attributes with machine learning. As in the original study, we use Natural Language Processing (NLP) to extract fundamental attributes from raw incident reports, and machine learning models are trained to predict safety outcomes. The outcomes predicted here are injury severity, injury type, body part impacted, and incident type. However, unlike in the original study, safety outcomes were not extracted via NLP but were provided by independent human annotations, eliminating any potential source of artificial correlation between predictors and predictands. Results show that attributes are still highly predictive, confirming the validity of the original approach. Other improvements brought by the current study include the use of (1) a much larger dataset featuring more than 90,000 reports, (2) two new models, XGBoost and linear SVM (Support Vector Machine), (3) model stacking, (4) a more straightforward experimental setup with more appropriate performance metrics, and (5) an analysis of per-category attribute importance scores. Finally, the injury severity outcome is well predicted, which was not the case in the original study. This is a significant advancement. |
Tasks | Injury Prediction |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05972v2 |
https://arxiv.org/pdf/1908.05972v2.pdf | |
PWC | https://paperswithcode.com/paper/ai-predicts-independent-construction-safety |
Repo | |
Framework | |
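Improvements (2) and (3) — the two new models plus model stacking — map directly onto a standard stacked ensemble. A hedged sklearn/XGBoost sketch (the meta-learner, hyperparameters, and toy data are assumptions; the abstract names only XGBoost and linear SVM as base models):

```python
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

# Stacks the paper's two new base models; the logistic-regression
# meta-learner is an assumption, not stated in the abstract.
stack = StackingClassifier(
    estimators=[("xgb", XGBClassifier(n_estimators=200)),
                ("svm", LinearSVC())],
    final_estimator=LogisticRegression(max_iter=1000),
)

# Toy stand-ins: X = binary attribute matrix extracted from reports via
# NLP; y = one human-annotated outcome, e.g. injury severity class.
X = np.random.randint(0, 2, size=(200, 10))
y = np.random.randint(0, 3, size=200)
stack.fit(X, y)
print(stack.predict(X[:5]))
```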
A CNN-Based Super-Resolution Technique for Active Fire Detection on Sentinel-2 Data
Title | A CNN-Based Super-Resolution Technique for Active Fire Detection on Sentinel-2 Data |
Authors | Massimiliano Gargiulo, Domenico Antonio Giuseppe Dell’Aglio, Antonio Iodice, Daniele Riccio, Giuseppe Ruello |
Abstract | Remote sensing applications can benefit from the relatively fine spatial resolution of multispectral (MS) images and the high revisit frequency ensured by the twin Sentinel-2 satellites. Unfortunately, only four of the thirteen bands are provided at the highest resolution of 10 meters; the others come at 20 or 60 meters. For instance, the Short-Wave Infrared (SWIR) bands, provided at 20 meters, are very useful for detecting active fires. Aiming at more detailed Active Fire Detection (AFD) maps, we propose a super-resolution data fusion method based on a Convolutional Neural Network (CNN) to bring the SWIR bands to the 10-m spatial resolution. The proposed CNN-based solution achieves better results than alternative methods in terms of several accuracy metrics. Moreover, we test the super-resolved bands from an application point of view by monitoring active fires through classic indices. The advantages and limits of our approach are validated on a specific geographical area (Mount Vesuvius, near Naples) that was damaged by widespread fires during the summer of 2017. |
Tasks | Accuracy Metrics, Super-Resolution |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10413v1 |
https://arxiv.org/pdf/1906.10413v1.pdf | |
PWC | https://paperswithcode.com/paper/a-cnn-based-super-resolution-technique-for |
Repo | |
Framework | |
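A hedged PyTorch sketch of the kind of fusion network described: upsampled 20 m SWIR bands concatenated with the four native 10 m bands, with a residual connection so the network only learns the missing high-frequency detail. Layer sizes and depth are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class SwirFusionSR(nn.Module):
    """Fuses the four native 10 m bands with bicubically upsampled 20 m
    SWIR bands to predict 10 m SWIR (channel counts and depth assumed)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(4 + 2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),   # two super-resolved SWIR bands
        )

    def forward(self, bands_10m, swir_upsampled):
        x = torch.cat([bands_10m, swir_upsampled], dim=1)
        return swir_upsampled + self.body(x)  # residual: learn missing detail

net = SwirFusionSR()
out = net(torch.randn(1, 4, 128, 128), torch.randn(1, 2, 128, 128))
print(out.shape)   # torch.Size([1, 2, 128, 128])
```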
Compression of Acoustic Event Detection Models With Quantized Distillation
Title | Compression of Acoustic Event Detection Models With Quantized Distillation |
Authors | Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang |
Abstract | Acoustic Event Detection (AED), which aims at detecting categories of events based on audio signals, has found application in many intelligent systems. Recently, deep neural networks have significantly advanced this field and greatly reduced detection errors. However, how to efficiently execute deep models in AED has received much less attention. Meanwhile, state-of-the-art AED models are large and deep, making them computationally demanding and challenging to deploy on devices with constrained computational resources. In this paper, we present a simple yet effective compression approach that jointly leverages knowledge distillation and quantization to compress a larger network (the teacher model) into a compact network (the student model). Experimental results show that the proposed technique not only lowers the error rate of the original compact network by 15% through distillation but also drastically reduces its model size (to 2% of the teacher and 12% of the full-precision student) through quantization. |
Tasks | Quantization |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00873v1 |
https://arxiv.org/pdf/1907.00873v1.pdf | |
PWC | https://paperswithcode.com/paper/compression-of-acoustic-event-detection-1 |
Repo | |
Framework | |
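The distillation half of the recipe starts from the standard knowledge-distillation objective; a minimal PyTorch sketch (temperature and mixing weight are assumed values, since the abstract gives neither; quantization of the student happens separately):

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard knowledge-distillation objective: hard-label cross-entropy
    mixed with temperature-softened KL divergence to the teacher's
    outputs. The T*T factor keeps soft-target gradients on scale."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```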
BCD-Net for Low-dose CT Reconstruction: Acceleration, Convergence, and Generalization
Title | BCD-Net for Low-dose CT Reconstruction: Acceleration, Convergence, and Generalization |
Authors | Il Yong Chun, Xuehang Zheng, Yong Long, Jeffrey A. Fessler |
Abstract | Obtaining accurate and reliable images from low-dose computed tomography (CT) is challenging. Regression convolutional neural network (CNN) models that are learned from training data are increasingly gaining attention in low-dose CT reconstruction. This paper modifies the architecture of an iterative regression CNN, BCD-Net, for fast, stable, and accurate low-dose CT reconstruction, and presents the convergence property of the modified BCD-Net. Numerical results with phantom data show that applying faster numerical solvers to the model-based image reconstruction (MBIR) modules of BCD-Net leads to a faster and more accurate BCD-Net; that BCD-Net significantly improves reconstruction accuracy compared to the state-of-the-art MBIR method using learned transforms; and that BCD-Net achieves better image quality than a state-of-the-art iterative NN architecture, ADMM-Net. Numerical results with clinical data show that BCD-Net generalizes significantly better than a state-of-the-art deep (non-iterative) regression NN, FBPConvNet, which lacks MBIR modules. |
Tasks | Computed Tomography (CT), Image Reconstruction |
Published | 2019-08-04 |
URL | https://arxiv.org/abs/1908.01287v1 |
https://arxiv.org/pdf/1908.01287v1.pdf | |
PWC | https://paperswithcode.com/paper/bcd-net-for-low-dose-ct-reconstruction |
Repo | |
Framework | |
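BCD-Net alternates a learned image-refining module with an MBIR data-fit module; the speed claim comes from solving that inner MBIR problem with faster numerical solvers. A hedged NumPy/SciPy skeleton of the outer loop (the quadratic data-fit form, penalty weight, and CG inner solver are simplifying assumptions):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def bcd_net(y, A, denoisers, mu=1.0):
    """One BCD-Net pass: for each layer, z = learned refinement of x,
    then an MBIR update x = argmin ||Ax - y||^2 + mu ||x - z||^2,
    solved here with a few conjugate-gradient steps (faster inner
    solvers are exactly what the paper investigates)."""
    n = A.shape[1]
    x = A.T @ y                                    # back-projection init
    H = LinearOperator((n, n), matvec=lambda v: A.T @ (A @ v) + mu * v)
    for denoise in denoisers:
        z = denoise(x)                             # learned refining module
        x, _ = cg(H, A.T @ y + mu * z, x0=x, maxiter=20)
    return x

A = np.random.randn(80, 40)                        # stand-in CT system matrix
y = A @ np.ones(40)
x_hat = bcd_net(y, A, denoisers=[lambda v: np.clip(v, 0, None)] * 3)
```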
Robustness-Driven Exploration with Probabilistic Metric Temporal Logic
Title | Robustness-Driven Exploration with Probabilistic Metric Temporal Logic |
Authors | Xiaotian Liu, Pengyi Shi, Sarra Alqahtani, Victor Paúl Pauca, Miles Silman |
Abstract | The ability to perform autonomous exploration is essential for unmanned aerial vehicles (UAVs) operating in unstructured or unknown environments where it is hard or even impossible to describe the environment beforehand. However, algorithms for autonomous exploration often focus on optimizing time and coverage in a greedy fashion. That type of exploration can collect irrelevant data and waste time navigating areas with no important information. In this paper, we propose a method for exploiting the discovered knowledge about the environment while exploring it, relying on a theory of robustness based on Probabilistic Metric Temporal Logic (P-MTL) as applied to offline verification and online control of hybrid systems. By maximizing the satisfaction of the predefined P-MTL specifications of the exploration problem, the robustness values guide the UAV towards areas with more interesting information to gain. We use Markov Chain Monte Carlo to solve the P-MTL constraints. We demonstrate the effectiveness of the proposed approach by simulating autonomous exploration over the Amazonian rainforest, where our approach is used to detect areas occupied by illegal Artisanal Small-scale Gold Mining (ASGM) activities. The results show that our approach outperforms a greedy exploration approach (Autonomous Exploration Planner) by 38% in terms of ASGM coverage. |
Tasks | |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01704v1 |
https://arxiv.org/pdf/1912.01704v1.pdf | |
PWC | https://paperswithcode.com/paper/robustness-driven-exploration-with |
Repo | |
Framework | |
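At its core, the exploration policy scores candidate next states by the expected robustness of the P-MTL specification and moves toward the maximizer. A heavily simplified stand-in sketch (the paper solves the P-MTL constraints with Markov Chain Monte Carlo; plain Monte Carlo averaging over a user-supplied robustness function is assumed here):

```python
import numpy as np

def choose_waypoint(candidates, robustness, n_samples=100, seed=0):
    """Pick the candidate maximizing the estimated expected robustness of
    the P-MTL specification; `robustness` is a user-supplied stochastic
    function of (candidate, rng) standing in for the MCMC solver."""
    rng = np.random.default_rng(seed)
    scores = [np.mean([robustness(c, rng) for _ in range(n_samples)])
              for c in candidates]
    return candidates[int(np.argmax(scores))]

# Toy use: robustness peaks near a suspected mining site at (5, 5).
site = np.array([5.0, 5.0])
rob = lambda c, rng: -np.linalg.norm(c - site) + rng.normal(0, 0.1)
grid = [np.array([i, j], float) for i in range(10) for j in range(10)]
print(choose_waypoint(grid, rob))   # -> [5. 5.]
```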
Resampling-based Confidence Intervals for Model-free Robust Inference on Optimal Treatment Regimes
Title | Resampling-based Confidence Intervals for Model-free Robust Inference on Optimal Treatment Regimes |
Authors | Yunan Wu, Lan Wang |
Abstract | Recently, there has been growing interest in estimating optimal treatment regimes, which are individualized decision rules that can achieve maximal average outcomes. This paper considers the problem of inference for optimal treatment regimes in the model-free setting, where the specification of an outcome regression model is not needed. Existing model-free estimators are usually not suitable for the purpose of inference because they either have nonstandard asymptotic distributions or are designed to achieve Fisher-consistent classification performance. This paper first studies a smoothed robust estimator that directly targets the parameters corresponding to the Bayes decision rule for estimating the optimal treatment regime. This estimator is shown to have an asymptotic normal distribution. Furthermore, it is proved that a resampling procedure provides asymptotically accurate inference for both the parameters indexing the optimal treatment regime and the optimal value function. A new algorithm is developed to calculate the proposed estimator with substantially improved speed and stability. Numerical results demonstrate the satisfactory performance of the new methods. |
Tasks | |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.11043v1 |
https://arxiv.org/pdf/1911.11043v1.pdf | |
PWC | https://paperswithcode.com/paper/resampling-based-confidence-intervals-for |
Repo | |
Framework | |
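The inferential recipe — point estimate from a smoothed robust estimator, interval from resampling — has the familiar bootstrap shape. A generic sketch of that shape only (the paper's resampling scheme is specialized; plain percentile bootstrap is assumed here for illustration):

```python
import numpy as np

def resampling_ci(data, estimator, n_boot=2000, level=0.95, seed=0):
    """Percentile-type resampling interval for a scalar estimator: refit
    the estimator on n_boot resamples drawn with replacement, then read
    off the (alpha, 1 - alpha) quantiles of the resampled statistics."""
    rng = np.random.default_rng(seed)
    n = len(data)
    stats = np.array([estimator(data[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])
    alpha = (1.0 - level) / 2.0
    return np.quantile(stats, [alpha, 1.0 - alpha])

data = np.random.default_rng(1).normal(loc=2.0, size=500)
print(resampling_ci(data, np.mean))   # 95% CI for the mean, near [1.9, 2.1]
```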
DISCo: Deep learning, Instance Segmentation, and Correlations for cell segmentation in calcium imaging videos
Title | DISCo: Deep learning, Instance Segmentation, and Correlations for cell segmentation in calcium imaging videos |
Authors | Elke Kirschbaum, Alberto Bailoni, Fred A. Hamprecht |
Abstract | Calcium imaging is one of the most important tools in neurophysiology, as it enables the observation of neuronal activity for hundreds of cells in parallel and at single-cell resolution. In order to use the data gained with calcium imaging, it is necessary to extract individual cells and their activity from the recordings. We present DISCo, a novel approach for cell segmentation in calcium imaging videos. We use temporal information from the recordings in a computationally efficient way by computing correlations between pixels, and combine it with shape-based information to identify active as well as non-active cells. We first learn to predict whether two pixels belong to the same cell; this information is summarized in an undirected, edge-weighted grid graph which we then partition. In so doing, we approximately solve the NP-hard correlation clustering problem with a recently proposed greedy algorithm. Evaluating our method on the Neurofinder public benchmark shows that DISCo outperforms all existing models trained on these datasets. |
Tasks | Cell Segmentation, Instance Segmentation, Semantic Segmentation |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.07957v3 |
https://arxiv.org/pdf/1908.07957v3.pdf | |
PWC | https://paperswithcode.com/paper/disco-for-the-cia-deep-learning-instance |
Repo | |
Framework | |
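The first stage of DISCo — turning pixel correlations into signed edge weights on a grid graph before partitioning — can be sketched compactly. The fixed attraction threshold used to sign the weights below is an assumption; the paper learns edge weights with a network rather than thresholding raw correlations:

```python
import numpy as np

def grid_edge_weights(video, threshold=0.5):
    """Signed weights for the 4-connected pixel grid graph of a (T, H, W)
    calcium video: temporal correlation of neighbouring standardized pixel
    traces, shifted so positive weights attract (same cell) and negative
    repel. Partitioning this graph is the correlation-clustering step."""
    traces = (video - video.mean(0)) / (video.std(0) + 1e-8)
    corr_right = (traces[:, :, :-1] * traces[:, :, 1:]).mean(0)  # (H, W-1)
    corr_down = (traces[:, :-1, :] * traces[:, 1:, :]).mean(0)   # (H-1, W)
    return corr_right - threshold, corr_down - threshold

video = np.random.rand(100, 32, 32)       # stand-in recording
w_right, w_down = grid_edge_weights(video)
```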
G$^{3}$AN: Disentangling Appearance and Motion for Video Generation
Title | G$^{3}$AN: Disentangling Appearance and Motion for Video Generation |
Authors | Yaohui Wang, Piotr Bilinski, Francois Bremond, Antitza Dantcheva |
Abstract | Creating realistic human videos entails the challenge of simultaneously generating both appearance and motion. To tackle this challenge, we introduce G$^{3}$AN, a novel spatio-temporal generative model, which seeks to capture the distribution of high-dimensional video data and to model appearance and motion in a disentangled manner. The latter is achieved by decomposing appearance and motion in a three-stream Generator, where the main stream aims to model spatio-temporal consistency, whereas the two auxiliary streams augment the main stream with multi-scale appearance and motion features, respectively. An extensive quantitative and qualitative analysis shows that our model systematically and significantly outperforms state-of-the-art methods on the facial expression datasets MUG and UvA-NEMO, as well as the Weizmann and UCF101 human action datasets. Additional analysis of the learned latent representations confirms the successful decomposition of appearance and motion. Source code and pre-trained models are publicly available. |
Tasks | Video Generation |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05523v2 |
https://arxiv.org/pdf/1912.05523v2.pdf | |
PWC | https://paperswithcode.com/paper/mathbfg3an-this-video-does-not-exist |
Repo | |
Framework | |
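The three-stream decomposition can be sketched as factorized 3D convolutions: a full spatio-temporal main stream plus spatial-only (appearance) and temporal-only (motion) auxiliary streams feeding back into it. Kernel shapes and the additive fusion below are illustrative assumptions, not the paper's exact modules:

```python
import torch
import torch.nn as nn

class ThreeStreamBlock(nn.Module):
    """One generator block on (N, C, T, H, W) video features: main
    spatio-temporal stream augmented by an appearance stream (1xkxk,
    space-only) and a motion stream (kx1x1, time-only)."""
    def __init__(self, channels):
        super().__init__()
        self.main = nn.Conv3d(channels, channels, 3, padding=1)
        self.appearance = nn.Conv3d(channels, channels, (1, 3, 3),
                                    padding=(0, 1, 1))
        self.motion = nn.Conv3d(channels, channels, (3, 1, 1),
                                padding=(1, 0, 0))

    def forward(self, x):
        return torch.relu(self.main(x) + self.appearance(x) + self.motion(x))

block = ThreeStreamBlock(16)
print(block(torch.randn(2, 16, 8, 32, 32)).shape)  # (2, 16, 8, 32, 32)
```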
An Analysis of Speech Enhancement and Recognition Losses in Limited Resources Multi-talker Single Channel Audio-Visual ASR
Title | An Analysis of Speech Enhancement and Recognition Losses in Limited Resources Multi-talker Single Channel Audio-Visual ASR |
Authors | Luca Pasa, Giovanni Morrone, Leonardo Badino |
Abstract | In this paper, we analyzed how audio-visual speech enhancement can help to perform the ASR task in a cocktail-party scenario. To this end, we considered two simple end-to-end LSTM-based models that perform single-channel audio-visual speech enhancement and phone recognition, respectively. We then studied how the two models interact and how training them jointly affects the final result. We analyzed different training strategies that reveal some interesting and unexpected behaviors. The experiments show that during optimization of the ASR task the speech enhancement capability of the model significantly decreases, and vice versa. Nevertheless, the joint optimization of the two tasks shows a remarkable drop in the Phone Error Rate (PER) compared to audio-visual baseline models trained only to perform phone recognition. We analyzed the behaviors of the proposed models using two limited-size datasets, namely the mixed-speech versions of GRID and TCD-TIMIT. |
Tasks | Speech Enhancement |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.08248v2 |
https://arxiv.org/pdf/1904.08248v2.pdf | |
PWC | https://paperswithcode.com/paper/joined-audio-visual-speech-enhancement-and |
Repo | |
Framework | |
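The training strategies under study all reduce to how the two losses are combined. A minimal PyTorch sketch of the joint objective (the mixing weight is an assumption; the abstract says only that the two tasks are optimized jointly):

```python
import torch.nn.functional as F

def joint_av_loss(enhanced, clean_target, phone_logits, phone_labels,
                  lam=0.5):
    """Joint objective for the enhancement LSTM and the phone-recognition
    LSTM: spectrogram MSE plus framewise phone cross-entropy, mixed by an
    assumed weight lam. Setting lam to 0 or 1 recovers the single-task
    training regimes the paper compares against."""
    enh_loss = F.mse_loss(enhanced, clean_target)
    asr_loss = F.cross_entropy(phone_logits, phone_labels)
    return lam * enh_loss + (1.0 - lam) * asr_loss
```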
Reinforcement Learning for Nested Polar Code Construction
Title | Reinforcement Learning for Nested Polar Code Construction |
Authors | Lingchen Huang, Huazi Zhang, Rong Li, Yiqun Ge, Jun Wang |
Abstract | In this paper, we model nested polar code construction as a Markov decision process (MDP) and tackle it with advanced reinforcement learning (RL) techniques. First, an MDP environment with states, actions, and rewards is defined in the context of polar coding. Specifically, a state represents the construction of an $(N,K)$ polar code, an action specifies its reduction to an $(N,K-1)$ subcode, and the reward is the decoding performance. A neural network architecture consisting of both policy and value networks is proposed to generate actions based on the observed states, aiming at maximizing the overall rewards. A loss function is defined to trade off between exploitation and exploration. To further improve learning efficiency and quality, an ‘integrated learning’ paradigm is proposed. It first employs a genetic algorithm to generate a population of (sub-)optimal polar codes for each $(N,K)$, and then uses them as prior knowledge to refine the policy in RL. Such a paradigm is shown to accelerate the training process and converge to better performance. Simulation results show that the proposed learning-based polar constructions achieve comparable, or even better, performance than the state of the art under successive cancellation list (SCL) decoders. Last but not least, this is achieved without exploiting any expert knowledge from polar coding theory in the learning algorithms. |
Tasks | |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07511v2 |
https://arxiv.org/pdf/1904.07511v2.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-for-nested-polar-code |
Repo | |
Framework | |
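The MDP itself is simple enough to spell out: states are information sets, an action freezes one more index, and the nested structure comes from always reducing an $(N,K)$ code to an $(N,K-1)$ subcode. A toy environment sketch (the `evaluate` hook is a hypothetical stand-in for the simulated SCL decoding performance used as reward):

```python
class NestedPolarEnv:
    """MDP for nested polar construction: state = information set of an
    (N, K) code; action = index to freeze, giving the (N, K-1) subcode."""
    def __init__(self, N, K_target, evaluate):
        self.N, self.K_target, self.evaluate = N, K_target, evaluate
        self.info_set = set(range(N))            # start from the rate-1 code

    def step(self, action):
        assert action in self.info_set, "can only freeze an information bit"
        self.info_set.remove(action)             # (N, K) -> (N, K-1)
        reward = self.evaluate(self.info_set)    # e.g. negative simulated BLER
        done = len(self.info_set) == self.K_target
        return frozenset(self.info_set), reward, done

env = NestedPolarEnv(N=8, K_target=4, evaluate=lambda s: -0.01 * len(s))
state, reward, done = env.step(0)
```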