January 29, 2020

3319 words 16 mins read

Paper Group ANR 704


Online Forecasting of Total-Variation-bounded Sequences

Title Online Forecasting of Total-Variation-bounded Sequences
Authors Dheeraj Baby, Yu-Xiang Wang
Abstract We consider the problem of online forecasting of sequences of length $n$ with total variation at most $C_n$, using observations contaminated by independent $\sigma$-subgaussian noise. We design an $O(n\log n)$-time algorithm that achieves a cumulative square error of $\tilde{O}(n^{1/3}C_n^{2/3}\sigma^{4/3} + C_n^2)$ with high probability. We also prove a lower bound that matches the upper bound in all parameters (up to a $\log(n)$ factor). To the best of our knowledge, this is the first \emph{polynomial-time} algorithm that achieves the optimal $O(n^{1/3})$ rate in forecasting total-variation-bounded sequences, and the first algorithm that \emph{adapts to unknown} $C_n$. Our proof techniques leverage the special localized structure of the Haar wavelet basis and the adaptivity to unknown smoothness parameters in classical wavelet smoothing [Donoho et al., 1998]. We also compare our model to the rich literature on dynamic regret minimization and nonstationary stochastic optimization, where our problem can be treated as a special case. We show that the workhorses in those settings — online gradient descent and its variants with a fixed restarting schedule — are instances of a class of \emph{linear forecasters} that incur a suboptimal regret of $\tilde{\Omega}(\sqrt{n})$. This implies that more adaptive algorithms are necessary to obtain the optimal rate.
Tasks Stochastic Optimization
Published 2019-06-08
URL https://arxiv.org/abs/1906.03364v2
PDF https://arxiv.org/pdf/1906.03364v2.pdf
PWC https://paperswithcode.com/paper/online-forecasting-of-total-variation-bounded
Repo
Framework
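
A minimal offline sketch of the wavelet-smoothing ingredient the paper's analysis builds on: classical soft-thresholded Haar denoising via the PyWavelets library. This illustrates the Donoho-style estimator referenced in the abstract, not the authors' online algorithm; the signal, noise level, and universal threshold are all assumptions.

```python
import numpy as np
import pywt

n, sigma = 1024, 0.5
theta = np.cumsum(np.random.randn(n)) / np.sqrt(n)  # stand-in bounded-variation path
y = theta + sigma * np.random.randn(n)              # subgaussian-noise observations

# Haar transform, soft-threshold the detail coefficients, invert.
coeffs = pywt.wavedec(y, 'haar')
thresh = sigma * np.sqrt(2 * np.log(n))             # classical universal threshold
denoised = pywt.waverec(
    [coeffs[0]] + [pywt.threshold(c, thresh, mode='soft') for c in coeffs[1:]],
    'haar')
```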

BERT-CNN: a Hierarchical Patent Classifier Based on a Pre-Trained Language Model

Title BERT-CNN: a Hierarchical Patent Classifier Based on a Pre-Trained Language Model
Authors Xiaolei Lu, Bin Ni
Abstract Automatic text classification is the process of assigning documents to predefined categories. An accurate automatic patent classifier is crucial to patent inventors and patent examiners in terms of intellectual property protection, patent management, and patent information retrieval. We present BERT-CNN, a hierarchical patent classifier based on a pre-trained language model, trained on national patent application documents collected from the State Information Center, China. The experimental results show that BERT-CNN achieves 84.3% accuracy, far better than the two compared baseline methods, Convolutional Neural Networks and Recurrent Neural Networks. We did not apply our model to the third and fourth hierarchical levels of the International Patent Classification, “subclass” and “group”. The visualization of the attention mechanism shows that BERT-CNN obtains new state-of-the-art results in representing vocabularies and semantics. This article demonstrates the practicality and effectiveness of BERT-CNN in the field of automatic patent classification.
Tasks Information Retrieval, Language Modelling
Published 2019-11-03
URL https://arxiv.org/abs/1911.06241v1
PDF https://arxiv.org/pdf/1911.06241v1.pdf
PWC https://paperswithcode.com/paper/bert-cnn-a-hierarchical-patent-classifier
Repo
Framework
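
A hedged PyTorch sketch of the BERT-CNN idea as described in the abstract (my own reconstruction, not the authors' released code; the Chinese BERT checkpoint, filter count, and kernel size are assumptions): BERT token representations feed a 1-D convolution whose max-pooled features drive the classifier for one hierarchical level.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertCNN(nn.Module):
    def __init__(self, n_classes, kernel=3, filters=128):
        super().__init__()
        self.bert = BertModel.from_pretrained('bert-base-chinese')
        self.conv = nn.Conv1d(self.bert.config.hidden_size, filters, kernel)
        self.fc = nn.Linear(filters, n_classes)

    def forward(self, input_ids, attention_mask):
        # (B, L, H) token representations from the pre-trained encoder.
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        feats = torch.relu(self.conv(hidden.transpose(1, 2)))  # (B, filters, L')
        return self.fc(feats.max(dim=2).values)                # max-over-time pooling
```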

Rating for Parents: Predicting Children Suitability Rating for Movies Based on Language of the Movies

Title Rating for Parents: Predicting Children Suitability Rating for Movies Based on Language of the Movies
Authors Mahsa Shafaei, Niloofar Safi Samghabadi, Sudipta Kar, Thamar Solorio
Abstract Film culture has grown tremendously in recent years. The large number of streaming services makes films one of the most convenient forms of entertainment in today’s world. Films can help us learn and can inspire societal change. But they can also negatively affect viewers. In this paper, our goal is to predict the suitability of movie content for children and young adults based on scripts. The criterion we use to measure suitability is the MPAA rating, which is specifically designed for this purpose. We propose an RNN-based architecture with attention that jointly models the genre and the emotions in the script to predict the MPAA rating. We achieve a 78% weighted F1-score for the classification model, outperforming the traditional machine learning method by 6%.
Tasks
Published 2019-08-21
URL https://arxiv.org/abs/1908.07819v2
PDF https://arxiv.org/pdf/1908.07819v2.pdf
PWC https://paperswithcode.com/paper/190807819
Repo
Framework
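
A hedged PyTorch sketch of the joint architecture described above (a simplification from the abstract, not the authors' code; layer sizes and label counts are assumptions): a bidirectional GRU over script tokens with additive attention pooling, plus genre and emotion heads trained jointly with the MPAA-rating head.

```python
import torch
import torch.nn as nn

class ScriptRater(nn.Module):
    def __init__(self, vocab, emb=128, hid=256, n_ratings=4, n_genres=10, n_emotions=8):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid, 1)
        self.rating = nn.Linear(2 * hid, n_ratings)
        self.genre = nn.Linear(2 * hid, n_genres)      # auxiliary joint task
        self.emotion = nn.Linear(2 * hid, n_emotions)  # auxiliary joint task

    def forward(self, tokens):
        h, _ = self.rnn(self.emb(tokens))                   # (B, T, 2*hid)
        a = torch.softmax(self.attn(h).squeeze(-1), dim=1)  # attention weights
        ctx = (a.unsqueeze(-1) * h).sum(dim=1)              # pooled script vector
        return self.rating(ctx), self.genre(ctx), self.emotion(ctx)
```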

Automated Multiclass Cardiac Volume Segmentation and Model Generation

Title Automated Multiclass Cardiac Volume Segmentation and Model Generation
Authors Erik Gaasedelen, Alex Deakyne, Paul Iaizzo
Abstract Many strides have been made in the semantic segmentation of multiple classes within an image, largely due to advancements in deep learning and convolutional neural networks (CNNs). Features within a CNN are automatically learned during training, which allows for the abstraction of semantic information within the images. These deep learning models are powerful enough to handle the segmentation of multiple classes without the need for multiple networks. Despite these advancements, few attempts have been made to automatically segment multiple anatomical features within medical imaging datasets obtained from CT or MRI scans. This poses a unique challenge because of the three-dimensional nature of medical imaging data. To alleviate the 3D modality problem, we propose a multi-axis ensemble method, applied to a dataset of CT scans with four segmented cardiac chambers. Inspired by the typical three-axis view used by humans, this technique aims to maximize the 3D spatial information afforded to the model while remaining efficient on consumer-grade inference hardware. In our experiments, multi-axis ensembling along with pragmatic voxel preprocessing greatly increased the mean intersection over union of our predictions over the complete DICOM dataset.
Tasks Semantic Segmentation
Published 2019-09-14
URL https://arxiv.org/abs/1909.06685v1
PDF https://arxiv.org/pdf/1909.06685v1.pdf
PWC https://paperswithcode.com/paper/automated-multiclass-cardiac-volume
Repo
Framework
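
A hedged NumPy sketch of the multi-axis ensembling described above (`segment_slice` is a hypothetical 2-D segmentation network returning per-class probability maps): segment the volume slice-wise along each of the three axes and average the resulting probability volumes.

```python
import numpy as np

def multi_axis_ensemble(volume, segment_slice, n_classes):
    # volume: (D, H, W); probabilities accumulated as (D, H, W, n_classes).
    probs = np.zeros(volume.shape + (n_classes,))
    for axis in range(3):
        moved = np.moveaxis(volume, axis, 0)
        # Slice-wise 2-D inference along this axis; each slice -> (h, w, C).
        pred = np.stack([segment_slice(s) for s in moved])
        probs += np.moveaxis(pred, 0, axis)
    return probs.argmax(axis=-1)  # consensus of the three axis views
```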

Causal Simulations for Uplift Modeling

Title Causal Simulations for Uplift Modeling
Authors Jeroen Berrevoets, Wouter Verbeke
Abstract Uplift modeling requires experimental data, preferably collected in a randomized fashion. This places a logistical and financial burden upon any organisation aspiring to build such models. Once deployed, uplift models are also subject to the effects of concept drift. Hence, methods are being developed that are able to learn from newly gained experience and handle drifting environments. As these new methods attempt to eliminate the need for experimental data, another approach to testing them must be formulated. We therefore propose a method to simulate environments that offer causal relationships in their parameters.
Tasks
Published 2019-02-01
URL http://arxiv.org/abs/1902.00287v1
PDF http://arxiv.org/pdf/1902.00287v1.pdf
PWC https://paperswithcode.com/paper/causal-simulations-for-uplift-modeling
Repo
Framework
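
A hedged NumPy sketch of the simulation idea described above (a minimal generator of my own, not the authors' framework): draw covariates, impose a known causal uplift, and generate outcomes under randomized assignment, so that uplift estimators can be scored against a ground-truth effect that observational data never reveals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 3))
true_uplift = 0.15 * (X[:, 0] > 0)        # known, covariate-dependent effect
base = 1 / (1 + np.exp(-X[:, 1]))         # baseline response probability
treated = rng.integers(0, 2, size=n)      # randomized treatment assignment
p = np.clip(base + treated * true_uplift, 0, 1)
y = rng.binomial(1, p)
# Any uplift model fit on (X, treated, y) can now be evaluated against
# true_uplift directly.
```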

EGNet: Edge Guidance Network for Salient Object Detection

Title EGNet: Edge Guidance Network for Salient Object Detection
Authors Jia-Xing Zhao, Jiangjiang Liu, Deng-Ping Fan, Yang Cao, Jufeng Yang, Ming-Ming Cheng
Abstract Fully convolutional neural networks (FCNs) have shown their advantages in the salient object detection task. However, most existing FCN-based methods still suffer from coarse object boundaries. In this paper, to solve this problem, we focus on the complementarity between salient edge information and salient object information. Accordingly, we present an edge guidance network (EGNet) for salient object detection, with three steps to simultaneously model these two kinds of complementary information in a single network. In the first step, we extract the salient object features in a progressive fusion manner. In the second step, we integrate local edge information and global location information to obtain the salient edge features. Finally, to sufficiently leverage these complementary features, we couple the salient edge features with salient object features at various resolutions. Benefiting from the rich edge and location information in the salient edge features, the fused features help locate salient objects, and especially their boundaries, more accurately. Experimental results demonstrate that the proposed method performs favorably against state-of-the-art methods on six widely used datasets without any pre-processing or post-processing. The source code is available at http://mmcheng.net/egnet/.
Tasks Object Detection, Salient Object Detection
Published 2019-08-22
URL https://arxiv.org/abs/1908.08297v1
PDF https://arxiv.org/pdf/1908.08297v1.pdf
PWC https://paperswithcode.com/paper/egnetedge-guidance-network-for-salient-object
Repo
Framework
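
A hedged PyTorch sketch of the edge-guidance coupling described above (a one-resolution simplification of my own, not the released EGNet code; channel sizes are assumptions): salient-object features are upsampled, concatenated with salient-edge features, and fused by convolution into a saliency prediction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeGuidedFusion(nn.Module):
    def __init__(self, edge_ch=32, obj_ch=64, out_ch=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(edge_ch + obj_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, 1, 1),  # per-pixel saliency logit
        )

    def forward(self, edge_feat, obj_feat):
        # Bring object features to the edge-feature resolution before fusing.
        obj_feat = F.interpolate(obj_feat, size=edge_feat.shape[2:],
                                 mode='bilinear', align_corners=False)
        return self.fuse(torch.cat([edge_feat, obj_feat], dim=1))
```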

On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference

Title On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference
Authors Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan
Abstract Our goal is for agents to optimize the right reward function, despite how difficult it is for us to specify what that is. Inverse Reinforcement Learning (IRL) enables us to infer reward functions from demonstrations, but it usually assumes that the expert is noisily optimal. Real people, on the other hand, often have systematic biases: risk-aversion, myopia, etc. One option is to try to characterize these biases and account for them explicitly during learning. But in the era of deep learning, a natural suggestion researchers make is to avoid mathematical models of human behavior that are fraught with specific assumptions, and instead use a purely data-driven approach. We decided to put this to the test – rather than relying on assumptions about which specific bias the demonstrator has when planning, we instead learn the planning algorithm the demonstrator uses to generate demonstrations, as a differentiable planner. Our exploration yielded mixed findings: on the one hand, learning the planner can lead to better reward inference than relying on the wrong assumption; on the other hand, this benefit is dwarfed by the loss we incur by going from an exact to a differentiable planner. This suggests that, at least for the foreseeable future, agents need a middle ground between the flexibility of data-driven methods and the useful bias of known human biases. Code is available at https://tinyurl.com/learningbiases.
Tasks
Published 2019-06-23
URL https://arxiv.org/abs/1906.09624v1
PDF https://arxiv.org/pdf/1906.09624v1.pdf
PWC https://paperswithcode.com/paper/on-the-feasibility-of-learning-rather-than
Repo
Framework
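
A minimal sketch of the differentiable-planner ingredient (an illustration on a tabular MDP, not the authors' implementation): soft value iteration composed entirely of differentiable operations, so a demonstrator model can be fit by backpropagating through planning.

```python
import torch

def soft_value_iteration(reward, P, gamma=0.95, iters=50, temp=1.0):
    # reward: (S,) tensor; P: (A, S, S) transition tensor.
    V = torch.zeros_like(reward)
    for _ in range(iters):
        Q = reward + gamma * torch.einsum('ast,t->as', P, V)  # (A, S)
        V = temp * torch.logsumexp(Q / temp, dim=0)           # soft max over actions
    policy = torch.softmax(Q / temp, dim=0)                   # (A, S) action distribution
    return V, policy

# Because every step is differentiable, calling reward.requires_grad_() lets
# us fit the reward (or a bias model) to observed demonstrations by gradient
# descent through the planner.
```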

CS Sparse K-means: An Algorithm for Cluster-Specific Feature Selection in High-Dimensional Clustering

Title CS Sparse K-means: An Algorithm for Cluster-Specific Feature Selection in High-Dimensional Clustering
Authors Xiangrui Zeng, Hongyu Zheng
Abstract Feature selection is an important and challenging task in high-dimensional clustering. For example, in genomics, there may be only a small number of genes that are differentially expressed, and these are the ones informative to the overall clustering structure. Existing feature selection methods, such as Sparse K-means, rarely tackle the problem of accounting for features that can only separate a subset of clusters. In genomics, it is highly likely that a gene can only define one subtype against all the other subtypes, or distinguish one pair of subtypes but not others. In this paper, we propose a K-means-based clustering algorithm that discovers informative features as well as which cluster pairs are separable by each selected feature. The method is essentially an EM algorithm in which we introduce lasso-type constraints on each cluster pair in the M step, and make the E step possible by maximizing the raw cross-cluster distance instead of minimizing the intra-cluster distance. We demonstrate the results on simulated data and a leukemia gene expression dataset.
Tasks Feature Selection
Published 2019-09-26
URL https://arxiv.org/abs/1909.12384v2
PDF https://arxiv.org/pdf/1909.12384v2.pdf
PWC https://paperswithcode.com/paper/cs-sparse-k-means-an-algorithm-for-cluster
Repo
Framework
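
The paper contrasts its cluster-pair selection with classical Sparse K-means. For context, a minimal NumPy sketch of that baseline's feature-weighting step (the Witten-Tibshirani form, not the proposed cluster-pair EM): each feature is weighted by its soft-thresholded between-cluster sum of squares.

```python
import numpy as np

def sparse_kmeans_weights(X, labels, delta):
    # X: (n, p) data; labels: (n,) cluster assignments; delta: lasso threshold.
    total_ss = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within_ss = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum(axis=0)
                    for k in np.unique(labels))
    bcss = total_ss - within_ss           # per-feature cross-cluster separation
    w = np.maximum(bcss - delta, 0.0)     # lasso-style soft threshold
    # Assumes delta < bcss.max() so at least one weight is nonzero.
    return w / np.linalg.norm(w)          # normalize to unit L2 norm
```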

A hybrid model for predicting human physical activity status from lifelogging data

Title A hybrid model for predicting human physical activity status from lifelogging data
Authors Ji Ni, Bowei Chen, Nigel M. Allinson, Xujiong Ye
Abstract One trend in recent healthcare transformations is that people are encouraged to monitor and manage their health based on their daily diets and physical activity habits. However, much of the attention on the use of operational research and analytical models in healthcare has been paid at the systemic level, such as country or regional policy making or organisational issues. This paper proposes a model concerned with healthcare analytics at the individual level, which can predict human physical activity status from sequential lifelogging data collected from wearable sensors. The model has a two-stage hybrid structure (in short, MOGP-HMM): a multi-objective genetic programming (MOGP) algorithm in the first stage to reduce the dimensions of the lifelogging data, and a hidden Markov model (HMM) in the second stage for activity status prediction over time. It can be used as a decision support tool to provide real-time monitoring, statistical analysis, and personalized advice to individuals, encouraging positive attitudes towards healthy lifestyles. We validate the model with real data collected from a group of participants in the UK and compare it with other popular two-stage hybrid models. Our experimental results show that the MOGP-HMM achieves comparable performance. To the best of our knowledge, this is the first study to use MOGP in a hybrid two-stage structure for predicting individuals’ activity status. It fits seamlessly with the current trend of patient empowerment in the UK healthcare transformation, as well as contributing to a strategic development for more efficient and cost-effective provision of healthcare.
Tasks
Published 2019-05-26
URL https://arxiv.org/abs/1905.10891v2
PDF https://arxiv.org/pdf/1905.10891v2.pdf
PWC https://paperswithcode.com/paper/a-hybrid-model-for-predicting-human-physical
Repo
Framework
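
A hedged sketch of the two-stage pipeline described above, with PCA standing in for the MOGP dimensionality-reduction stage (the hmmlearn library provides stage two; the data and hyperparameters are placeholders, not the authors' settings):

```python
import numpy as np
from sklearn.decomposition import PCA
from hmmlearn.hmm import GaussianHMM

X = np.random.randn(500, 30)               # stand-in sequential lifelogging features
Z = PCA(n_components=5).fit_transform(X)   # stage 1: dimensionality reduction
hmm = GaussianHMM(n_components=3, covariance_type='diag', n_iter=50)
hmm.fit(Z)                                 # stage 2: temporal model of activity status
states = hmm.predict(Z)                    # inferred activity status over time
```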

A Radiomics Approach to Computer-Aided Diagnosis with Cardiac Cine-MRI

Title A Radiomics Approach to Computer-Aided Diagnosis with Cardiac Cine-MRI
Authors Irem Cetin, Gerard Sanroma, Steffen E. Petersen, Sandy Napel, Oscar Camara, Miguel-Angel Gonzalez Ballester, Karim Lekadir
Abstract The use of expert visualization or conventional clinical indices can lack accuracy for borderline classifications. Advanced statistical approaches based on eigen-decomposition have been mostly concerned with shape and motion indices. In this paper, we present a new approach to identify CVDs from cine-MRI by estimating large pools of radiomic features (statistical, shape and textural features) encoding relevant changes in anatomical and image characteristics due to CVDs. The calculated cine-MRI radiomic features are assessed using sequential forward feature selection to identify the most relevant ones for given CVD classes (e.g. myocardial infarction, cardiomyopathy, abnormal right ventricle). Finally, advanced machine learning is applied to suitably integrate the selected radiomics for final multi-feature classification based on Support Vector Machines (SVMs). The proposed technique was trained and cross-validated using 100 cine-MRI cases corresponding to five different cardiac classes from the ACDC MICCAI 2017 challenge (https://www.creatis.insa-lyon.fr/Challenge/acdc/index.html). All cases were correctly classified in this preliminary study, indicating the potential of using large-scale radiomics for MRI-based diagnosis of CVDs.
Tasks Feature Selection
Published 2019-09-25
URL https://arxiv.org/abs/1909.11854v1
PDF https://arxiv.org/pdf/1909.11854v1.pdf
PWC https://paperswithcode.com/paper/a-radiomics-approach-to-computer-aided
Repo
Framework
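
A hedged scikit-learn sketch of the selection-plus-SVM pipeline described above (synthetic stand-ins for the radiomic feature pool; the authors' exact settings are unknown): sequential forward selection feeding a cross-validated SVM.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X = np.random.randn(100, 200)          # stand-in radiomic feature pool
y = np.random.randint(0, 5, size=100)  # five cardiac classes, as in ACDC

svm = SVC(kernel='rbf')
sfs = SequentialFeatureSelector(svm, n_features_to_select=10, direction='forward')
X_sel = sfs.fit_transform(X, y)        # sequential forward feature selection
print(cross_val_score(svm, X_sel, y, cv=5).mean())
```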

Weakly-Supervised Temporal Localization via Occurrence Count Learning

Title Weakly-Supervised Temporal Localization via Occurrence Count Learning
Authors Julien Schroeter, Kirill Sidorov, David Marshall
Abstract We propose a novel model for temporal detection and localization which allows the training of deep neural networks using only counts of event occurrences as training labels. This powerful weakly-supervised framework alleviates the burden of the imprecise and time-consuming process of annotating event locations in temporal data. Unlike existing methods, in which localization is explicitly achieved by design, our model learns localization implicitly as a byproduct of learning to count instances. This unique feature is a direct consequence of the model’s theoretical properties. We validate the effectiveness of our approach in a number of experiments (drum hit and piano onset detection in audio, digit detection in images) and demonstrate performance comparable to that of fully-supervised state-of-the-art methods, despite much weaker training requirements.
Tasks Temporal Localization
Published 2019-05-17
URL https://arxiv.org/abs/1905.07293v1
PDF https://arxiv.org/pdf/1905.07293v1.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-temporal-localization-via
Repo
Framework
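
A hedged PyTorch sketch of the count-as-label idea as I read the abstract (not the authors' exact model or loss): a recurrent network emits per-frame occurrence probabilities, and training matches only their sum, the expected count, to the label, so localization emerges implicitly from the per-frame probabilities.

```python
import torch
import torch.nn as nn

class CountDetector(nn.Module):
    def __init__(self, feat_dim=40, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):  # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)
        return torch.sigmoid(self.head(h)).squeeze(-1)  # (batch, time) probabilities

def count_loss(frame_probs, counts):
    # Expected number of events is the sum of per-frame probabilities;
    # only this aggregate is supervised.
    return ((frame_probs.sum(dim=1) - counts) ** 2).mean()
```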

Essence Knowledge Distillation for Speech Recognition

Title Essence Knowledge Distillation for Speech Recognition
Authors Zhenchuan Yang, Chun Zhang, Weibin Zhang, Jianxiu Jin, Dongpeng Chen
Abstract It is well known that a speech recognition system that combines multiple acoustic models trained on the same data significantly outperforms a single-model system. Unfortunately, real-time speech recognition using a whole ensemble of models is too computationally expensive. In this paper, we propose to distill the essence knowledge in an ensemble of models (i.e. the teacher model) into a single model (i.e. the student model) that needs much less computation to deploy. Previously, all the softened outputs of the teacher model were used to optimize the student model. We argue that not all the outputs of the ensemble need to be distilled. Some of the outputs may even contain noisy information that is useless or even harmful to the training of the student model. In addition, we propose to train the student model with a multitask learning approach by utilizing both the softened outputs of the teacher model and the correct hard labels. The proposed method achieves some surprising results on the Switchboard dataset. When the student model is trained together with the correct labels and the essence knowledge from the teacher model, it not only significantly outperforms another single model with the same architecture that is trained only with the correct labels, but also consistently outperforms the teacher model used to generate the soft labels.
Tasks Speech Recognition
Published 2019-06-26
URL https://arxiv.org/abs/1906.10834v1
PDF https://arxiv.org/pdf/1906.10834v1.pdf
PWC https://paperswithcode.com/paper/essence-knowledge-distillation-for-speech
Repo
Framework
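
A minimal sketch of the multitask objective described above, in its standard knowledge-distillation form (the temperature and mixing weight are assumptions, and the paper's additional pruning of uninformative teacher outputs is omitted): the student matches both the correct hard labels and the teacher's softened outputs.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hard-label term: ordinary cross-entropy against the correct labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL to the temperature-softened teacher distribution,
    # rescaled by T^2 to keep gradient magnitudes comparable.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction='batchmean') * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```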

The Dynamical Gaussian Process Latent Variable Model in the Longitudinal Scenario

Title The Dynamical Gaussian Process Latent Variable Model in the Longitudinal Scenario
Authors Thanh Le, Vasant Honavar
Abstract Dynamical Gaussian Process Latent Variable Models provide an elegant non-parametric framework for learning low-dimensional representations of high-dimensional time series. Real-world observational studies, however, are often ill-conditioned: the observations can be noisy, without the luxury of being relatively complete and equally spaced like those in time series. Such conditions make it difficult to learn reasonable representations of high-dimensional longitudinal data sets by way of the Gaussian Process Latent Variable Model, as well as by other dimensionality reduction procedures. In this study, we approach the inference of Gaussian Process Dynamical Systems in the longitudinal scenario by augmenting the bound in the variational approximation to include systematic samples of the unseen observations. We demonstrate the usefulness of this approach on synthetic data as well as a human motion capture dataset.
Tasks Dimensionality Reduction, Latent Variable Models, Motion Capture, Time Series
Published 2019-09-25
URL https://arxiv.org/abs/1909.11630v1
PDF https://arxiv.org/pdf/1909.11630v1.pdf
PWC https://paperswithcode.com/paper/the-dynamical-gaussian-process-latent
Repo
Framework
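
A hedged sketch under strong assumptions: the GPy library's Bayesian GP-LVM (not the authors' augmented variational bound) fit to a small synthetic high-dimensional dataset, to show the basic latent-representation machinery the paper extends.

```python
import numpy as np
import GPy

Y = np.random.randn(50, 12)  # stand-in for high-dimensional longitudinal data
model = GPy.models.BayesianGPLVM(Y, input_dim=2, num_inducing=10)
model.optimize(messages=False)
latent = model.X.mean        # posterior mean of the 2-D latent positions
```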

Continual Learning with Self-Organizing Maps

Title Continual Learning with Self-Organizing Maps
Authors Pouya Bashivan, Martin Schrimpf, Robert Ajemian, Irina Rish, Matthew Riemer, Yuhai Tu
Abstract Despite the remarkable successes achieved by modern neural networks in a wide range of applications, these networks perform best in domain-specific stationary environments where they are trained only once on large-scale controlled data repositories. When exposed to non-stationary learning environments, current neural networks tend to forget what they had previously learned, a phenomenon known as catastrophic forgetting. Most previous approaches to this problem rely on memory replay buffers which store samples from previously learned tasks and use them to regularize the learning on new ones. This approach suffers from the important disadvantage of not scaling well to real-life problems in which the memory requirements become enormous. We propose a memoryless method that combines standard supervised neural networks with self-organizing maps to solve the continual learning problem. The role of the self-organizing map is to adaptively cluster the inputs into appropriate task contexts - without explicit labels - and allocate network resources accordingly. Thus, it selectively routes the inputs in accord with previous experience, ensuring that past learning is maintained and does not interfere with current learning. Our method is intuitive, memoryless, and performs on par with current state-of-the-art approaches on standard benchmarks.
Tasks Continual Learning
Published 2019-04-19
URL http://arxiv.org/abs/1904.09330v1
PDF http://arxiv.org/pdf/1904.09330v1.pdf
PWC https://paperswithcode.com/paper/190409330
Repo
Framework
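
A minimal NumPy sketch of the self-organizing-map ingredient (the online clustering update only, not the full continual-learning method): each input pulls its best-matching unit, and that unit's grid neighbours, toward itself.

```python
import numpy as np

def som_step(weights, grid, x, lr=0.1, sigma=1.0):
    # weights: (n_units, d) prototypes; grid: (n_units, 2) unit coordinates.
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best matching unit
    dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)       # grid distance to BMU
    influence = np.exp(-dist2 / (2.0 * sigma ** 2))       # neighbourhood kernel
    weights += lr * influence[:, None] * (x - weights)    # pull toward the input
    return bmu  # the BMU index can serve as the inferred task context
```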

Exploring the Ideal Depth of Neural Network when Predicting Question Deletion on Community Question Answering

Title Exploring the Ideal Depth of Neural Network when Predicting Question Deletion on Community Question Answering
Authors Souvick Ghosh, Satanu Ghosh
Abstract In recent years, Community Question Answering (CQA) has emerged as a popular platform for knowledge curation and archival. An interesting aspect of question answering is that it combines aspects of natural language processing, information retrieval, and machine learning. In this paper, we explore how the depth of a neural network influences the accuracy of predicting deleted questions in question-answering forums. We use different shallow and deep models for prediction and analyze the relationships between the number of hidden layers, accuracy, and computational time. The results suggest that while deep networks perform better than shallow networks in modeling complex non-linear functions, increasing the depth may not always produce the desired results. We observe that the performance of the deep neural network suffers significantly due to vanishing gradients when a large number of hidden layers is present. Constantly increasing the depth of the model increases accuracy initially, after which the accuracy plateaus and finally drops. Adding each layer is also expensive in terms of the time required to train the model. This research is situated in the domain of neural information retrieval and contributes towards building a theory of how deep neural networks can be used efficiently and accurately for predicting question deletion. We predict deleted questions with more than 90% accuracy using two to ten hidden layers, with less accurate results for shallower and deeper architectures.
Tasks Community Question Answering, Information Retrieval, Question Answering
Published 2019-12-08
URL https://arxiv.org/abs/1912.03585v1
PDF https://arxiv.org/pdf/1912.03585v1.pdf
PWC https://paperswithcode.com/paper/exploring-the-ideal-depth-of-neural-network
Repo
Framework
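
A hedged sketch of the depth sweep described above, with scikit-learn's MLPClassifier and synthetic data standing in for the authors' networks and CQA features: vary the number of hidden layers and record accuracy and training time.

```python
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=40, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

for depth in [1, 2, 5, 10, 20]:
    clf = MLPClassifier(hidden_layer_sizes=(64,) * depth, max_iter=300,
                        random_state=0)
    t0 = time.time()
    clf.fit(Xtr, ytr)  # deeper models cost more time and may not help accuracy
    print(f"{depth:2d} layers: acc={clf.score(Xte, yte):.3f}, "
          f"train time={time.time() - t0:.1f}s")
```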