Paper Group ANR 235
Neural Attentive Multiview Machines
Title | Neural Attentive Multiview Machines |
Authors | Oren Barkan, Ori Katz, Noam Koenigstein |
Abstract | An important problem in multiview representation learning is finding the optimal combination of views with respect to the specific task at hand. To this end, we introduce NAM: a Neural Attentive Multiview machine that learns multiview item representations and similarity by employing a novel attention mechanism. NAM harnesses multiple information sources and automatically quantifies their relevancy with respect to a supervised task. Finally, a very practical advantage of NAM is its robustness to datasets with missing views. We demonstrate the effectiveness of NAM for the task of movie and app recommendations. Our evaluations indicate that NAM outperforms single view models as well as alternative multiview methods on item recommendation tasks, including cold-start scenarios. |
Tasks | Representation Learning |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07696v1 |
PDF | https://arxiv.org/pdf/2002.07696v1.pdf |
PWC | https://paperswithcode.com/paper/neural-attentive-multiview-machines |
Repo | |
Framework | |
V4D:4D Convolutional Neural Networks for Video-level Representation Learning
Title | V4D:4D Convolutional Neural Networks for Video-level Representation Learning |
Authors | Shiwen Zhang, Sheng Guo, Weilin Huang, Matthew R. Scott, Limin Wang |
Abstract | Most existing 3D CNNs for video representation learning are clip-based methods, and thus do not consider video-level temporal evolution of spatio-temporal features. In this paper, we propose Video-level 4D Convolutional Neural Networks, referred to as V4D, to model the evolution of long-range spatio-temporal representation with 4D convolutions, and at the same time, to preserve strong 3D spatio-temporal representation with residual connections. Specifically, we design a new 4D residual block able to capture inter-clip interactions, which could enhance the representation power of the original clip-level 3D CNNs. The 4D residual blocks can be easily integrated into the existing 3D CNNs to perform long-range modeling hierarchically. We further introduce the training and inference methods for the proposed V4D. Extensive experiments are conducted on three video recognition benchmarks, where V4D achieves excellent results, surpassing recent 3D CNNs by a large margin. |
Tasks | Representation Learning, Video Recognition |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07442v1 |
PDF | https://arxiv.org/pdf/2002.07442v1.pdf |
PWC | https://paperswithcode.com/paper/v4d4d-convolutional-neural-networks-for-video |
Repo | |
Framework | |
MRI Banding Removal via Adversarial Training
Title | MRI Banding Removal via Adversarial Training |
Authors | Aaron Defazio, Tullie Murrell, Michael P. Recht |
Abstract | MRI images reconstructed from sub-sampled Cartesian data using deep learning techniques often show a characteristic banding (sometimes described as streaking), which is particularly strong in low signal-to-noise regions of the reconstructed image. In this work, we propose the use of an adversarial loss that penalizes banding structures without requiring any human annotation. Our technique greatly reduces the appearance of banding, without requiring any additional computation or post-processing at reconstruction time. We report the results of a blind comparison against a strong baseline by a group of expert evaluators (board-certified radiologists), where our approach is ranked superior at banding removal with no statistically significant loss of detail. |
Tasks | |
Published | 2020-01-23 |
URL | https://arxiv.org/abs/2001.08699v2 |
PDF | https://arxiv.org/pdf/2001.08699v2.pdf |
PWC | https://paperswithcode.com/paper/mri-banding-removal-via-adversarial-training |
Repo | |
Framework | |
A comprehensive review on convolutional neural network in machine fault diagnosis
Title | A comprehensive review on convolutional neural network in machine fault diagnosis |
Authors | Jinyang Jiao, Ming Zhao, Jing Lin, Kaixuan Liang |
Abstract | With the rapid development of the manufacturing industry, machine fault diagnosis has become increasingly significant to ensure safe equipment operation and production. Consequently, multifarious approaches have been explored and developed in the past years, of which intelligent algorithms have developed particularly rapidly. The convolutional neural network, as a typical representative of intelligent diagnostic models, has been extensively studied and applied in the past five years, and a large amount of literature has been published in academic journals and conference proceedings. However, there has not been a systematic review that covers these studies and outlines prospects for further research. To fill this gap, this work attempts to review and summarize the development of the Convolutional Network based Fault Diagnosis (CNFD) approaches comprehensively. Generally, a typical CNFD framework is composed of the following steps, namely, data collection, model construction, and feature learning and decision making, and this paper is organized accordingly. Firstly, the data collection process is described, in which several popular datasets are introduced. Then, the fundamental theory from the basic convolutional neural network to its variants is elaborated. After that, the applications of CNFD are reviewed in terms of three mainstream directions, i.e. classification, prediction and transfer diagnosis. Finally, conclusions and prospects are presented to point out the characteristics of current development, the challenges faced, and future trends. Last but not least, it is expected that this work will be a convenient reference and inspire further exploration for researchers in this field. |
Tasks | Decision Making |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.07605v1 |
PDF | https://arxiv.org/pdf/2002.07605v1.pdf |
PWC | https://paperswithcode.com/paper/a-comprehensive-review-on-convolutional |
Repo | |
Framework | |
Towards High Performance, Portability, and Productivity: Lightweight Augmented Neural Networks for Performance Prediction
Title | Towards High Performance, Portability, and Productivity: Lightweight Augmented Neural Networks for Performance Prediction |
Authors | Ajitesh Srivastava, Naifeng Zhang, Rajgopal Kannan, Viktor K. Prasanna |
Abstract | Writing high-performance code requires significant expertise in the programming language, compiler optimizations, and hardware. This often leads to poor productivity and portability and is inconvenient for a non-programmer domain-specialist such as a physicist. More desirable is a high-level language where the domain-specialist simply specifies the workload in terms of high-level operations (e.g., matrix-multiply(A, B)) and the compiler identifies the best implementation fully utilizing the heterogeneous platform. For creating a compiler that supports productivity, portability, and performance simultaneously, it is crucial to predict performance of various available implementations (variants) of the dominant operations (kernels) contained in the workload on various hardware to decide (a) which variant should be chosen for each kernel in the workload, and (b) on which hardware resource the variant should run. To enable the performance prediction, we propose lightweight augmented neural networks for arbitrary combinations of kernel-variant-hardware. A key innovation is utilizing mathematical complexity of the kernels as a feature to achieve higher accuracy. These models are compact, which reduces training time and enables fast inference at compile time and run time. Using models with less than 75 parameters, and only 250 training data instances, we are able to obtain a low MAPE of ~13%, significantly outperforming traditional feed-forward neural networks on 40 kernel-variant-hardware combinations. We further demonstrate that our variant selection approach can be used in Halide implementations to obtain up to 1.5x speedup over Halide’s autoscheduler. |
Tasks | |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07497v1 |
PDF | https://arxiv.org/pdf/2003.07497v1.pdf |
PWC | https://paperswithcode.com/paper/towards-high-performance-portability-and |
Repo | |
Framework | |
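The abstract above reports accuracy as MAPE (mean absolute percentage error). As a reminder of what that metric measures, here is a minimal Python sketch. The function name `mape` and the runtime values are hypothetical illustrations, not code or data from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Average the relative error of each prediction, then scale to percent.
    relative_errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical measured vs. predicted kernel runtimes (milliseconds).
measured_ms = [10.0, 20.0, 40.0]
predicted_ms = [11.0, 18.0, 44.0]

# Each prediction is off by 10% of the measured value, so MAPE is about 10.
print(f"MAPE: {mape(measured_ms, predicted_ms):.2f}%")
```

Because the error is relative to each measured value, a kernel that runs for microseconds and one that runs for seconds contribute on the same scale, which is presumably why the metric suits runtimes spanning several orders of magnitude.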
Lane-Merging Using Policy-based Reinforcement Learning and Post-Optimization
Title | Lane-Merging Using Policy-based Reinforcement Learning and Post-Optimization |
Authors | Patrick Hart, Leonard Rychly, Alois Knoll |
Abstract | Many current behavior generation methods struggle to handle real-world traffic situations as they do not scale well with complexity. However, behaviors can be learned off-line using data-driven approaches. Reinforcement learning is especially promising as it implicitly learns how to behave utilizing collected experiences. In this work, we combine policy-based reinforcement learning with local optimization to foster and synthesize the best of the two methodologies. The policy-based reinforcement learning algorithm provides an initial solution and guiding reference for the post-optimization. Therefore, the optimizer only has to compute a single homotopy class, e.g., drive behind or in front of the other vehicle. By storing the state-history during reinforcement learning, it can be used for constraint checking and the optimizer can account for interactions. The post-optimization additionally acts as a safety-layer and the novel method, thus, can be applied in safety-critical applications. We evaluate the proposed method using lane-change scenarios with a varying number of vehicles. |
Tasks | |
Published | 2020-03-06 |
URL | https://arxiv.org/abs/2003.03168v1 |
PDF | https://arxiv.org/pdf/2003.03168v1.pdf |
PWC | https://paperswithcode.com/paper/lane-merging-using-policy-based-reinforcement |
Repo | |
Framework | |
Search for Better Students to Learn Distilled Knowledge
Title | Search for Better Students to Learn Distilled Knowledge |
Authors | Jindong Gu, Volker Tresp |
Abstract | Knowledge Distillation, as a model compression technique, has received great attention. The knowledge of a well-performing teacher is distilled to a student with a small architecture. The architecture of the small student is often chosen to be similar to its teacher’s, with fewer layers or fewer channels, or both. However, even with the same number of FLOPs or parameters, students with different architectures can achieve different generalization ability. The configuration of a student architecture requires intensive network architecture engineering. In this work, instead of designing a good student architecture manually, we propose to search for the optimal student automatically. Based on L1-norm optimization, a subgraph from the teacher network topology graph is selected as a student, the goal of which is to minimize the KL-divergence between student’s and teacher’s outputs. We verify the proposal on CIFAR10 and CIFAR100 datasets. The empirical experiments show that the learned student architecture achieves better performance than ones specified manually. We also visualize and understand the architecture of the found student. |
Tasks | Model Compression |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11612v1 |
PDF | https://arxiv.org/pdf/2001.11612v1.pdf |
PWC | https://paperswithcode.com/paper/search-for-better-students-to-learn-distilled |
Repo | |
Framework | |
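The objective named in the abstract above, the KL-divergence between student and teacher outputs, can be illustrated with a small Python sketch. It is not the paper's search procedure. The direction KL(teacher || student), the temperature parameter, and all logit values are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; assumes q is nonzero wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for one input over three classes.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.5, 1.5, 0.5]

teacher_probs = softmax(teacher_logits, temperature=2.0)
student_probs = softmax(student_logits, temperature=2.0)

# The distillation signal: zero only when the student matches the teacher exactly.
print(kl_divergence(teacher_probs, student_probs))
```

The divergence is asymmetric, so swapping the arguments yields a different value. Which direction a given distillation setup uses should be checked against its own definition.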
Benchmark Performance of Machine And Deep Learning Based Methodologies for Urdu Text Document Classification
Title | Benchmark Performance of Machine And Deep Learning Based Methodologies for Urdu Text Document Classification |
Authors | Muhammad Nabeel Asim, Muhammad Usman Ghani, Muhammad Ali Ibrahim, Sheraz Ahmad, Waqar Mahmood, Andreas Dengel |
Abstract | In order to provide benchmark performance for Urdu text document classification, the contribution of this paper is manifold. First, it provides a publicly available benchmark dataset manually tagged against 6 classes. Second, it investigates the performance impact of traditional machine learning based Urdu text document classification methodologies by embedding 10 filter-based feature selection algorithms which have been widely used for other languages. Third, for the very first time, it assesses the performance of various deep learning based methodologies for Urdu text document classification. In this regard, for experimentation, we adapt 10 deep learning classification methodologies which have produced best performance figures for English text classification. Fourth, it also investigates the performance impact of transfer learning by utilizing Bidirectional Encoder Representations from Transformers approach for Urdu language. Fifth, it evaluates the integrity of a hybrid approach which combines traditional machine learning based feature engineering and deep learning based automated feature engineering. Experimental results show that feature selection approach named as Normalised Difference Measure along with Support Vector Machine outshines state-of-the-art performance on two closed source benchmark datasets CLE Urdu Digest 1000k, and CLE Urdu Digest 1Million with a significant margin of 32%, and 13% respectively. Across all three datasets, Normalised Difference Measure outperforms other filter based feature selection algorithms as it significantly uplifts the performance of all adopted machine learning, deep learning, and hybrid approaches. The source code and presented dataset are available at Github repository. |
Tasks | Automated Feature Engineering, Document Classification, Feature Engineering, Feature Selection, Text Classification, Transfer Learning |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01345v1 |
PDF | https://arxiv.org/pdf/2003.01345v1.pdf |
PWC | https://paperswithcode.com/paper/benchmark-performance-of-machine-and-deep |
Repo | |
Framework | |
Hierarchical models vs. transfer learning for document-level sentiment classification
Title | Hierarchical models vs. transfer learning for document-level sentiment classification |
Authors | Jeremy Barnes, Vinit Ravishankar, Lilja Øvrelid, Erik Velldal |
Abstract | Documents are composed of smaller pieces - paragraphs, sentences, and tokens - that have complex relationships between one another. Sentiment classification models that take into account the structure inherent in these documents have a theoretical advantage over those that do not. At the same time, transfer learning models based on language model pretraining have shown promise for document classification. However, these two paradigms have not been systematically compared and it is not clear under which circumstances one approach is better than the other. In this work we empirically compare hierarchical models and transfer learning for document-level sentiment classification. We show that non-trivial hierarchical models outperform previous baselines and transfer learning on document-level sentiment classification in five languages. |
Tasks | Document Classification, Language Modelling, Sentiment Analysis, Transfer Learning |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.08131v1 |
PDF | https://arxiv.org/pdf/2002.08131v1.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-models-vs-transfer-learning-for |
Repo | |
Framework | |
Machine Learning Enabled Discovery of Application Dependent Design Principles for Two-dimensional Materials
Title | Machine Learning Enabled Discovery of Application Dependent Design Principles for Two-dimensional Materials |
Authors | Victor Venturi, Holden Parks, Zeeshan Ahmad, Venkatasubramanian Viswanathan |
Abstract | The large-scale search for high-performing candidate 2D materials is limited to calculating a few simple descriptors, usually with first-principles density functional theory calculations. In this work, we alleviate this issue by extending and generalizing crystal graph convolutional neural networks to systems with planar periodicity, and train an ensemble of models to predict thermodynamic, mechanical, and electronic properties. To demonstrate the utility of this approach, we carry out a screening of nearly 45,000 structures for two largely disjoint applications: namely, mechanically robust composites and photovoltaics. An analysis of the uncertainty associated with our methods indicates the ensemble of neural networks is well-calibrated and has errors comparable with those from accurate first-principles density functional theory calculations. The ensemble of models allows us to gauge the confidence of our predictions, and to find the candidates most likely to exhibit effective performance in their applications. Since the datasets used in our screening were combinatorically generated, we are also able to investigate, using an innovative method, structural and compositional design principles that impact the properties of the structures surveyed and which can act as a generative model basis for future material discovery through reverse engineering. Our approach allowed us to recover some well-accepted design principles: for instance, we find that hybrid organic-inorganic perovskites with lead and tin tend to be good candidates for solar cell applications. |
Tasks | |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.13418v1 |
PDF | https://arxiv.org/pdf/2003.13418v1.pdf |
PWC | https://paperswithcode.com/paper/machine-learning-enabled-discovery-of |
Repo | |
Framework | |
On the Replicability of Combining Word Embeddings and Retrieval Models
Title | On the Replicability of Combining Word Embeddings and Retrieval Models |
Authors | Luca Papariello, Alexandros Bampoulidis, Mihai Lupu |
Abstract | We replicate recent experiments attempting to demonstrate an attractive hypothesis about the use of the Fisher kernel framework and mixture models for aggregating word embeddings towards document representations and the use of these representations in document classification, clustering, and retrieval. Specifically, the hypothesis was that the use of a mixture model of von Mises-Fisher (VMF) distributions instead of Gaussian distributions would be beneficial because of the focus on cosine distances of both VMF and the vector space model traditionally used in information retrieval. Previous experiments had validated this hypothesis. Our replication was not able to validate it, despite a large parameter scan space. |
Tasks | Document Classification, Information Retrieval, Word Embeddings |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04484v1 |
PDF | https://arxiv.org/pdf/2001.04484v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-replicability-of-combining-word |
Repo | |
Framework | |
Archimedean Choice Functions: an Axiomatic Foundation for Imprecise Decision Making
Title | Archimedean Choice Functions: an Axiomatic Foundation for Imprecise Decision Making |
Authors | Jasper De Bock |
Abstract | If uncertainty is modelled by a probability measure, decisions are typically made by choosing the option with the highest expected utility. If an imprecise probability model is used instead, this decision rule can be generalised in several ways. We here focus on two such generalisations that apply to sets of probability measures: E-admissibility and maximality. Both of them can be regarded as special instances of so-called choice functions, a very general mathematical framework for decision making. For each of these two decision rules, we provide a set of necessary and sufficient conditions on choice functions that uniquely characterises this rule, thereby providing an axiomatic foundation for imprecise decision making with sets of probabilities. A representation theorem for Archimedean choice functions in terms of coherent lower previsions lies at the basis of both results. |
Tasks | Decision Making |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.05196v3 |
PDF | https://arxiv.org/pdf/2002.05196v3.pdf |
PWC | https://paperswithcode.com/paper/archimedean-choice-functions-an-axiomatic |
Repo | |
Framework | |
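The two decision rules named in the abstract above, E-admissibility and maximality, can be compared on a toy problem. The Python sketch below assumes a finite set of probability measures over two states and uses the textbook definitions of the rules: an option is E-admissible if it maximizes expected utility under at least one measure, and maximal if no other option has strictly higher expected utility under every measure. The options, utilities, and measures are hypothetical, and the sketch does not reproduce the paper's axiomatic treatment.

```python
from fractions import Fraction as F


def expected_utility(utilities, measure):
    """Expected utility of an option (one utility per state) under one measure."""
    return sum(p * u for p, u in zip(measure, utilities))


def e_admissible(options, measures):
    """Options that maximize expected utility under at least one measure."""
    chosen = set()
    for measure in measures:
        values = {name: expected_utility(u, measure) for name, u in options.items()}
        best = max(values.values())
        chosen.update(name for name, value in values.items() if value == best)
    return chosen


def maximal(options, measures):
    """Options not strictly beaten by a single rival under every measure."""
    def dominated(name):
        return any(
            all(expected_utility(rival, m) > expected_utility(options[name], m)
                for m in measures)
            for other, rival in options.items() if other != name
        )
    return {name for name in options if not dominated(name)}


# Hypothetical utilities per state for three options (exact arithmetic via Fraction).
options = {
    "a": (F(1), F(0)),
    "b": (F(0), F(1)),
    "c": (F(2, 5), F(2, 5)),
}
# Hypothetical set of two probability measures over the two states.
measures = [(F(1, 5), F(4, 5)), (F(4, 5), F(1, 5))]

print(sorted(e_admissible(options, measures)))  # ['a', 'b']
print(sorted(maximal(options, measures)))       # ['a', 'b', 'c']
```

Option `c` is never the best choice under either measure, yet no single rival beats it under both, so it is maximal without being E-admissible. This illustrates that maximality is the more permissive rule.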
Continual Learning for Domain Adaptation in Chest X-ray Classification
Title | Continual Learning for Domain Adaptation in Chest X-ray Classification |
Authors | Matthias Lenga, Heinrich Schulz, Axel Saalbach |
Abstract | Over the last years, Deep Learning has been successfully applied to a broad range of medical applications. Especially in the context of chest X-ray classification, results have been reported which are on par with, or even superior to, experienced radiologists. Despite this success in controlled experimental environments, it has been noted that the ability of Deep Learning models to generalize to data from a new domain (with potentially different tasks) is often limited. In order to address this challenge, we investigate techniques from the field of Continual Learning (CL) including Joint Training (JT), Elastic Weight Consolidation (EWC) and Learning Without Forgetting (LWF). Using the ChestX-ray14 and the MIMIC-CXR datasets, we demonstrate empirically that these methods provide promising options to improve the performance of Deep Learning models on a target domain and to effectively mitigate catastrophic forgetting for the source domain. To this end, the best overall performance was obtained using JT, while for LWF competitive results could be achieved - even without accessing data from the source domain. |
Tasks | Continual Learning, Domain Adaptation |
Published | 2020-01-16 |
URL | https://arxiv.org/abs/2001.05922v1 |
PDF | https://arxiv.org/pdf/2001.05922v1.pdf |
PWC | https://paperswithcode.com/paper/continual-learning-for-domain-adaptation-in |
Repo | |
Framework | |
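Elastic Weight Consolidation, one of the techniques named in the abstract above, is commonly described as adding a quadratic penalty that anchors each parameter to its source-domain value, weighted by a diagonal Fisher information estimate. The Python sketch below shows only that penalty term on plain lists. The function name, the `strength` coefficient, and all numbers are illustrative assumptions, not the paper's configuration.

```python
def ewc_penalty(params, anchor_params, fisher_diag, strength):
    """Quadratic EWC penalty: (strength / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    if not (len(params) == len(anchor_params) == len(fisher_diag)):
        raise ValueError("all inputs must have the same length")
    weighted_drift = sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )
    return 0.5 * strength * weighted_drift


# Hypothetical values for a two-parameter model.
source_params = [0.0, 2.0]   # parameters after training on the source domain
current_params = [1.0, 2.0]  # parameters while adapting to the target domain
fisher_diag = [4.0, 1.0]     # larger value = more important for the source task
target_task_loss = 0.3       # placeholder loss on the target domain

penalty = ewc_penalty(current_params, source_params, fisher_diag, strength=0.5)
total_loss = target_task_loss + penalty
print(penalty, total_loss)
```

Only the first parameter has moved, and it carries a high Fisher weight, so the drift is penalized heavily. Moving the second parameter by the same amount would cost a quarter as much.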
Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis
Title | Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis |
Authors | Devinder Kumar, Parthipan Siva, Paul Marchwica, Alexander Wong |
Abstract | An ongoing major challenge in computer vision is the task of person re-identification, where the goal is to match individuals across different, non-overlapping camera views. While recent success has been achieved via supervised learning using deep neural networks, such methods have limited widespread adoption due to the need for large-scale, customized data annotation. As such, there has been a recent focus on unsupervised learning approaches to mitigate the data annotation issue; however, current approaches in literature have limited performance compared to supervised learning approaches as well as limited applicability for adoption in new environments. In this paper, we address the aforementioned challenges faced in person re-identification for real-world, practical scenarios by introducing a novel, unsupervised domain adaptation approach for person re-identification. This is accomplished through the introduction of: i) k-reciprocal tracklet Clustering for Unsupervised Domain Adaptation (ktCUDA) (for pseudo-label generation on target domain), and ii) Synthesized Heterogeneous RE-id Domain (SHRED) composed of large-scale heterogeneous independent source environments (for improving robustness and adaptability to a wide diversity of target environments). Experimental results across four different image and video benchmark datasets show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance when compared to existing state-of-the-art methods, as well as demonstrate better adaptability to different types of environments. |
Tasks | Domain Adaptation, Person Re-Identification, Unsupervised Domain Adaptation |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.04928v1 |
PDF | https://arxiv.org/pdf/2001.04928v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-domain-adaptation-in-person-re |
Repo | |
Framework | |
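The "k-reciprocal" relation in the title above is usually defined as follows: two samples are k-reciprocal neighbours when each appears in the other's k-nearest-neighbour list. The Python sketch below illustrates only that relation on a toy distance matrix. It is not the ktCUDA clustering pipeline, and the function names and 1-D feature values are hypothetical.

```python
def k_nearest(dist, i, k):
    """Indices of the k samples closest to sample i (excluding i itself)."""
    others = [j for j in range(len(dist)) if j != i]
    others.sort(key=lambda j: dist[i][j])
    return set(others[:k])


def k_reciprocal_neighbors(dist, i, k):
    """Samples j such that j is in i's k-NN list and i is in j's k-NN list."""
    return {j for j in k_nearest(dist, i, k) if i in k_nearest(dist, j, k)}


# Hypothetical 1-D features; distances are absolute differences.
features = [0.0, 1.0, 3.0, 10.0]
dist = [[abs(a - b) for b in features] for a in features]

print(k_reciprocal_neighbors(dist, 0, k=1))  # {1}: samples 0 and 1 pick each other
print(k_reciprocal_neighbors(dist, 3, k=1))  # set(): sample 3 picks 2, but 2 picks 1
```

The mutual requirement filters out one-sided matches such as the outlier at index 3. That makes the relation a stricter, and typically cleaner, signal for pseudo-labelling than plain nearest neighbours.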
Bayesian Nonparametric Cost-Effectiveness Analyses: Causal Estimation and Adaptive Subgroup Discovery
Title | Bayesian Nonparametric Cost-Effectiveness Analyses: Causal Estimation and Adaptive Subgroup Discovery |
Authors | Arman Oganisian, Nandita Mitra, Jason Roy |
Abstract | Cost-effectiveness analyses (CEAs) are at the center of health economic decision making. While these analyses help policy analysts and economists determine coverage, inform policy, and guide resource allocation, they are statistically challenging for several reasons. Cost and effectiveness are correlated and follow complex joint distributions which cannot be captured parametrically. Effectiveness (often measured as increased survival time) and cost both tend to be right-censored. Moreover, CEAs are often conducted using observational data with non-random treatment assignment. Policy-relevant causal estimation therefore requires robust confounding control. Finally, current CEA methods do not address cost-effectiveness heterogeneity in a principled way - opting to either present marginal results or cost-effectiveness results for pre-specified subgroups. Motivated by these challenges, we develop a nonparametric Bayesian model for joint cost-survival distributions in the presence of censoring. Our approach utilizes an Enriched Dirichlet Process prior on the covariate effects of cost and survival time, while using a separate Gamma Process prior on the baseline survival time hazard. Causal CEA estimands are identified and estimated via a Bayesian nonparametric g-computation procedure. Finally, we propose leveraging the induced clustering of the Enriched Dirichlet Process to adaptively discover subgroups of patients with different cost-effectiveness profiles. We outline an MCMC procedure for full posterior inference, evaluate frequentist properties via simulations, and apply our model to an observational study of endometrial cancer therapies using medical insurance claims data. |
Tasks | Decision Making |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04706v1 |
PDF | https://arxiv.org/pdf/2002.04706v1.pdf |
PWC | https://paperswithcode.com/paper/bayesian-nonparametric-cost-effectiveness |
Repo | |
Framework | |