October 16, 2019

Paper Group ANR 1147

Marginal Replay vs Conditional Replay for Continual Learning. Deep Learning Based Gait Recognition Using Smartphones in the Wild. A Neural Network Framework for Fair Classifier. When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?. Extracting Lungs from CT Images using Fully Convolutional Networks. Eyeriss v2: A Flexible …

Marginal Replay vs Conditional Replay for Continual Learning


Title	Marginal Replay vs Conditional Replay for Continual Learning
Authors	Timothée Lesort, Alexander Gepperth, Andrei Stoian, David Filliat
Abstract	We present a new replay-based method of continual classification learning that we term “conditional replay” which generates samples and labels together by sampling from a distribution conditioned on the class. We compare conditional replay to another replay-based continual learning paradigm (which we term “marginal replay”) that generates samples independently of their class and assigns labels in a separate step. The main improvement in conditional replay is that labels for generated samples need not be inferred, which reduces the margin for error in complex continual classification learning tasks. We demonstrate the effectiveness of this approach using novel and standard benchmarks constructed from MNIST and FashionMNIST data, and compare to the regularization-based elastic weight consolidation (EWC) method.
Tasks	Continual Learning
Published	2018-10-29
URL	https://arxiv.org/abs/1810.12069v6
PDF	https://arxiv.org/pdf/1810.12069v6.pdf
PWC	https://paperswithcode.com/paper/marginal-replay-vs-conditional-replay-for
Repo
Framework
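
The contrast between the two replay schemes described in this abstract can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the authors' implementation: the 1-D Gaussian "generators", the threshold classifier, and every function name below are hypothetical stand-ins for trained generative models and networks.

```python
import random

# Hypothetical toy data: one 1-D Gaussian per class.
CLASS_MEANS = {0: -1.0, 1: 1.0}


def conditional_generator(label, rng):
    """Stand-in for a class-conditional generative model."""
    return rng.gauss(CLASS_MEANS[label], 0.1)


def marginal_generator(rng):
    """Stand-in for an unconditional generator: the class is not exposed."""
    hidden_label = rng.choice(sorted(CLASS_MEANS))
    return rng.gauss(CLASS_MEANS[hidden_label], 0.1)


def old_classifier(x):
    """Stand-in for the previously trained classifier that infers labels."""
    return 1 if x > 0 else 0


def marginal_replay(n, rng):
    """Generate samples first, then assign labels in a separate step."""
    samples = [marginal_generator(rng) for _ in range(n)]
    return [(x, old_classifier(x)) for x in samples]


def conditional_replay(n, rng):
    """Choose the label first, then sample conditioned on that label."""
    replay = []
    for _ in range(n):
        label = rng.choice(sorted(CLASS_MEANS))
        replay.append((conditional_generator(label, rng), label))
    return replay


rng = random.Random(0)
marginal_batch = marginal_replay(8, rng)
conditional_batch = conditional_replay(8, rng)
```

In the marginal variant every replayed label is only as good as `old_classifier`, whereas in the conditional variant the label is known by construction, which is the point the abstract makes about labels not needing to be inferred.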

Deep Learning Based Gait Recognition Using Smartphones in the Wild


Title	Deep Learning Based Gait Recognition Using Smartphones in the Wild
Authors	Qin Zou, Yanling Wang, Qian Wang, Yi Zhao, Qingquan Li
Abstract	Compared with other biometrics, gait has the advantages of being unobtrusive and difficult to conceal. Inertial sensors such as accelerometers and gyroscopes are often used to capture gait dynamics. Nowadays, these inertial sensors are commonly integrated in smartphones and widely used by the average person, which makes it very convenient and inexpensive to collect gait data. In this paper, we study gait recognition using smartphones in the wild. Unlike traditional methods that often require the person to walk along a specified road and/or at a normal walking speed, the proposed method collects inertial gait data under unconstrained conditions, without knowing when, where, and how the user walks. To obtain a high performance of person identification and authentication, deep-learning techniques are presented to learn and model the gait biometrics from the walking data. Specifically, a hybrid deep neural network is proposed for robust gait feature representation, where features in the space domain and in the time domain are successively abstracted by a convolutional neural network and a recurrent neural network. In the experiments, two datasets collected by smartphones on a total of 118 subjects are used for evaluations. Experiments show that the proposed method achieves over 93.5% and 93.7% accuracy in person identification and authentication, respectively.
Tasks	Gait Recognition, Person Identification
Published	2018-11-01
URL	https://arxiv.org/abs/1811.00338v2
PDF	https://arxiv.org/pdf/1811.00338v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-based-gait-recognition-using
Repo
Framework

A Neural Network Framework for Fair Classifier


Title	A Neural Network Framework for Fair Classifier
Authors	P Manisha, Sujit Gujar
Abstract	Machine learning models are extensively being used in decision making, especially for prediction tasks. These models could be biased or unfair towards a specific sensitive group either of a specific race, gender or age. Researchers have put efforts into characterizing a particular definition of fairness and enforcing them into the models. In this work, mainly we are concerned with the following three definitions, Disparate Impact, Demographic Parity and Equalized Odds. Researchers have shown that Equalized Odds cannot be satisfied in calibrated classifiers unless the classifier is perfect. Hence the primary challenge is to ensure a degree of fairness while guaranteeing as much accuracy as possible. Fairness constraints are complex and need not be convex. Incorporating them into a machine learning algorithm is a significant challenge. Hence, many researchers have tried to come up with a surrogate loss which is convex in order to build fair classifiers. Besides, certain papers try to build fair representations by preprocessing the data, irrespective of the classifier used. Such methods, not only require a lot of unrealistic assumptions but also require human engineered analytical solutions to build a machine learning model. We instead propose an automated solution which is generalizable over any fairness constraint. We use a neural network which is trained on batches and directly enforces the fairness constraint as the loss function without modifying it further. We have also experimented with other complex performance measures such as H-mean loss, Q-mean-loss, F-measure; without the need for any surrogate loss functions. Our experiments prove that the network achieves similar performance as state of the art. Thus, one can just plug-in appropriate loss function as per required fairness constraint and performance measure of the classifier and train a neural network to achieve that.
Tasks	Decision Making
Published	2018-11-01
URL	http://arxiv.org/abs/1811.00247v1
PDF	http://arxiv.org/pdf/1811.00247v1.pdf
PWC	https://paperswithcode.com/paper/a-neural-network-framework-for-fair
Repo
Framework
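
As a rough illustration of plugging a fairness term directly into a batch loss, the sketch below adds a demographic parity gap to a standard cross-entropy term. The abstract does not give the exact loss, so the combination rule, the weight `lam`, the use of mean predicted scores as a batch estimate of the positive rate, and all function names are assumptions made for this example.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over a batch of predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def demographic_parity_gap(probs, groups):
    """Absolute difference in mean predicted score between groups 0 and 1.

    Assumes both groups are present in the batch.
    """
    g0 = [p for p, g in zip(probs, groups) if g == 0]
    g1 = [p for p, g in zip(probs, groups) if g == 1]
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))


def fair_batch_loss(probs, labels, groups, lam=1.0):
    """Hypothetical combined objective: accuracy term plus weighted fairness term."""
    return binary_cross_entropy(probs, labels) + lam * demographic_parity_gap(probs, groups)


probs = [0.9, 0.8, 0.2, 0.1]   # predicted P(y = 1)
labels = [1, 1, 0, 0]
groups = [0, 0, 1, 1]          # sensitive attribute
gap = demographic_parity_gap(probs, groups)
loss = fair_batch_loss(probs, labels, groups, lam=0.5)
```

Because the gap is computed per batch, the batch size has to be large enough for both groups to appear; a training loop would differentiate this quantity with an autodiff framework rather than plain Python.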

When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?


Title	When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?
Authors	Tengyu Xu, Yi Zhou, Kaiyi Ji, Yingbin Liang
Abstract	We study the implicit bias of gradient descent methods in solving a binary classification problem over a linearly separable dataset. The classifier is described by a nonlinear ReLU model and the objective function adopts the exponential loss function. We first characterize the landscape of the loss function and show that there can exist spurious asymptotic local minima besides asymptotic global minima. We then show that gradient descent (GD) can converge to either a global or a local max-margin direction, or may diverge from the desired max-margin direction in a general context. For stochastic gradient descent (SGD), we show that it converges in expectation to either the global or the local max-margin direction if SGD converges. We further explore the implicit bias of these algorithms in learning a multi-neuron network under certain stationary conditions, and show that the learned classifier maximizes the margins of each sample pattern partition under the ReLU activation.
Tasks
Published	2018-06-12
URL	http://arxiv.org/abs/1806.04339v2
PDF	http://arxiv.org/pdf/1806.04339v2.pdf
PWC	https://paperswithcode.com/paper/when-will-gradient-methods-converge-to-max
Repo
Framework

Extracting Lungs from CT Images using Fully Convolutional Networks


Title	Extracting Lungs from CT Images using Fully Convolutional Networks
Authors	Jeovane Honório Alves, Pedro Martins Moreira Neto, Lucas Ferrari de Oliveira
Abstract	Analysis of cancer and other pathological diseases, like the interstitial lung diseases (ILDs), is usually possible through Computed Tomography (CT) scans. To aid this, a preprocessing step of segmentation is performed to reduce the area to be analyzed, segmenting the lungs and removing unimportant regions. Generally, complex methods are developed to extract the lung region, also using hand-made feature extractors to enhance segmentation. With the popularity of deep learning techniques and its automated feature learning, we propose a lung segmentation approach using fully convolutional networks (FCNs) combined with fully connected conditional random fields (CRF), employed in many state-of-the-art segmentation works. Aiming to develop a generalized approach, the publicly available datasets from University Hospitals of Geneva (HUG) and VESSEL12 challenge were studied, including many healthy and pathological CT scans for evaluation. Experiments using the dataset individually, its trained model on the other dataset and a combination of both datasets were employed. Dice scores of $98.67\% \pm 0.94\%$ for the HUG-ILD dataset and $99.19\% \pm 0.37\%$ for the VESSEL12 dataset were achieved, outperforming works in the former and obtaining similar state-of-the-art results in the latter dataset, showing the capability in using deep learning approaches.
Tasks	Computed Tomography (CT)
Published	2018-04-27
URL	http://arxiv.org/abs/1804.10704v1
PDF	http://arxiv.org/pdf/1804.10704v1.pdf
PWC	https://paperswithcode.com/paper/extracting-lungs-from-ct-images-using-fully
Repo
Framework
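
The Dice score reported in this abstract is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a reference mask B. A minimal sketch over flattened binary masks follows; the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are choices made here, not details taken from the paper.

```python
def dice_score(pred, truth):
    """Dice coefficient between two equally sized binary masks (flat sequences)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * intersection / total


pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 0, 0]
score = dice_score(pred_mask, true_mask)  # 2 * 1 / (2 + 1)
```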

Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices


Title	Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
Authors	Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, Vivienne Sze
Abstract	A recent trend in DNN development is to extend the reach of deep learning applications to platforms that are more resource and energy constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse models are different from the traditional large ones in that there is much more variation in their layer shapes and sizes, and often require specialized hardware to exploit sparsity for performance improvement. Thus, many DNN accelerators designed for large DNNs do not perform well on these models. In this work, we present Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs. To deal with the widely varying layer shapes and sizes, it introduces a highly flexible on-chip network, called hierarchical mesh, that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, which improves the utilization of the computation resources. Furthermore, Eyeriss v2 can process sparse data directly in the compressed domain for both weights and activations, and therefore is able to improve both processing speed and energy efficiency with sparse models. Overall, with sparse MobileNet, Eyeriss v2 in a 65nm CMOS process achieves a throughput of 1470.6 inferences/sec and 2560.3 inferences/J at a batch size of 1, which is 12.6x faster and 2.5x more energy efficient than the original Eyeriss running MobileNet. We also present an analysis methodology called Eyexam that provides a systematic way of understanding the performance limits for DNN processors as a function of specific characteristics of the DNN model and accelerator design; it applies these characteristics as sequential steps to increasingly tighten the bound on the performance limits.
Tasks
Published	2018-07-10
URL	https://arxiv.org/abs/1807.07928v2
PDF	https://arxiv.org/pdf/1807.07928v2.pdf
PWC	https://paperswithcode.com/paper/eyeriss-v2-a-flexible-accelerator-for
Repo
Framework

Multi-view X-ray R-CNN


Title	Multi-view X-ray R-CNN
Authors	Jan-Martin O. Steitz, Faraz Saeedan, Stefan Roth
Abstract	Motivated by the detection of prohibited objects in carry-on luggage as a part of avionic security screening, we develop a CNN-based object detection approach for multi-view X-ray image data. Our contributions are two-fold. First, we introduce a novel multi-view pooling layer to perform a 3D aggregation of 2D CNN-features extracted from each view. To that end, our pooling layer exploits the known geometry of the imaging system to ensure geometric consistency of the feature aggregation. Second, we introduce an end-to-end trainable multi-view detection pipeline based on Faster R-CNN, which derives the region proposals and performs the final classification in 3D using these aggregated multi-view features. Our approach shows significant accuracy gains compared to single-view detection while even being more efficient than performing single-view detection in each view.
Tasks	Object Detection
Published	2018-10-04
URL	http://arxiv.org/abs/1810.02344v1
PDF	http://arxiv.org/pdf/1810.02344v1.pdf
PWC	https://paperswithcode.com/paper/multi-view-x-ray-r-cnn
Repo
Framework

A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition


Title	A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition
Authors	Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N Sainath, Karen Livescu
Abstract	Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and requires only a parallel corpus of speech and text for training. However, unlike in conventional approaches that combine separate acoustic and language models, it is not clear how to use additional (unpaired) text. While there has been previous work on methods addressing this problem, a thorough comparison among methods is still lacking. In this paper, we compare a suite of past methods and some of our own proposed methods for using unpaired text data to improve encoder-decoder models. For evaluation, we use the medium-sized Switchboard data set and the large-scale Google voice search and dictation data sets. Our results confirm the benefits of using unpaired text across a range of methods and data sets. Surprisingly, for first-pass decoding, the rather simple approach of shallow fusion performs best across data sets. However, for Google data sets we find that cold fusion has a lower oracle error rate and outperforms other approaches after second-pass rescoring on the Google voice search data set.
Tasks	Language Modelling, Speech Recognition
Published	2018-07-27
URL	http://arxiv.org/abs/1807.10857v2
PDF	http://arxiv.org/pdf/1807.10857v2.pdf
PWC	https://paperswithcode.com/paper/a-comparison-of-techniques-for-language-model
Repo
Framework
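
Shallow fusion, which this abstract finds best for first-pass decoding, is commonly implemented as a log-linear interpolation at inference time: each candidate token is scored as log p_ASR(y | x) + λ · log p_LM(y). The sketch below shows a single decoding step under that usual definition; the token probabilities, the weight values, and the function name are made up for illustration.

```python
import math


def shallow_fusion_step(asr_log_probs, lm_log_probs, lm_weight):
    """Combine per-token log-probabilities and return the best token and all scores.

    Assumes lm_log_probs contains every token present in asr_log_probs.
    """
    fused = {
        token: asr_log_probs[token] + lm_weight * lm_log_probs[token]
        for token in asr_log_probs
    }
    best_token = max(fused, key=fused.get)
    return best_token, fused


# Hypothetical next-token distributions from the two models.
asr = {"a": math.log(0.6), "b": math.log(0.4)}
lm = {"a": math.log(0.1), "b": math.log(0.9)}

best_without_lm, _ = shallow_fusion_step(asr, lm, lm_weight=0.0)
best_with_lm, _ = shallow_fusion_step(asr, lm, lm_weight=0.5)
```

With the language model switched off the acoustic favourite wins; with a weight of 0.5 the language model's strong preference changes the decision. In a real decoder this scoring would be applied inside beam search at every step.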

Online Machine Learning in Big Data Streams


Title	Online Machine Learning in Big Data Streams
Authors	András A. Benczúr, Levente Kocsis, Róbert Pálovics
Abstract	The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software architectures and efficient algorithms. The second one also imposes nontrivial theoretical restrictions on the modeling methods: In the data stream model, older data is no longer available to revise earlier suboptimal modeling decisions as the fresh data arrives. In this article, we provide an overview of distributed software architectures and libraries as well as machine learning models for online learning. We highlight the most important ideas for classification, regression, recommendation, and unsupervised modeling from streaming data, and we show how they are implemented in various distributed data stream processing systems. This article is a reference material and not a survey. We do not attempt to be comprehensive in describing all existing methods and solutions; rather, we give pointers to the most important resources in the field. All related sub-fields, online algorithms, online learning, and distributed data processing are hugely dominant in current research and development with conceptually new research results and software components emerging at the time of writing. In this article, we refer to several survey results, both for distributed data processing and for online machine learning. Compared to past surveys, our article is different because we discuss recommender systems in extended detail.
Tasks	Recommendation Systems
Published	2018-02-16
URL	http://arxiv.org/abs/1802.05872v1
PDF	http://arxiv.org/pdf/1802.05872v1.pdf
PWC	https://paperswithcode.com/paper/online-machine-learning-in-big-data-streams
Repo
Framework
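
The data stream constraint described in this abstract, where each example is seen once and cannot be revisited, can be illustrated with a one-pass logistic regression updated by stochastic gradient descent. This is a generic sketch and is not taken from the article; the class name, learning rate, and synthetic stream are assumptions.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression that learns from one example at a time."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        """Single SGD step on the log loss; the example is then discarded."""
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


model = OnlineLogisticRegression(n_features=1)
for i in range(200):
    x = [1.0] if i % 2 else [-1.0]   # synthetic stream, alternating classes
    y = 1 if x[0] > 0 else 0
    model.learn_one(x, y)            # no past data is stored
```

The model's memory footprint stays constant regardless of stream length, which is the property that makes this style of update usable when past data cannot be kept.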

A Robust and Effective Approach Towards Accurate Metastasis Detection and pN-stage Classification in Breast Cancer


Title	A Robust and Effective Approach Towards Accurate Metastasis Detection and pN-stage Classification in Breast Cancer
Authors	Byungjae Lee, Kyunghyun Paeng
Abstract	Predicting TNM stage is the major determinant of breast cancer prognosis and treatment. The essential part of TNM stage classification is whether the cancer has metastasized to the regional lymph nodes (N-stage). Pathologic N-stage (pN-stage) is commonly performed by pathologists detecting metastasis in histological slides. However, this diagnostic procedure is prone to misinterpretation and would normally require extensive time by pathologists because of the sheer volume of data that needs a thorough review. Automated detection of lymph node metastasis and pN-stage prediction has a great potential to reduce their workload and help the pathologist. Recent advances in convolutional neural networks (CNN) have shown significant improvements in histological slide analysis, but accuracy is not optimized because of the difficulty in the handling of gigapixel images. In this paper, we propose a robust method for metastasis detection and pN-stage classification in breast cancer from multiple gigapixel pathology images in an effective way. pN-stage is predicted by combining patch-level CNN based metastasis detector and slide-level lymph node classifier. The proposed framework achieves a state-of-the-art quadratic weighted kappa score of 0.9203 on the Camelyon17 dataset, outperforming the previous winning method of the Camelyon17 challenge.
Tasks
Published	2018-05-30
URL	http://arxiv.org/abs/1805.12067v1
PDF	http://arxiv.org/pdf/1805.12067v1.pdf
PWC	https://paperswithcode.com/paper/a-robust-and-effective-approach-towards
Repo
Framework
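
The quadratic weighted kappa used as the evaluation metric here penalises disagreements by the squared distance between ordinal classes: κ = 1 − Σ w_ij O_ij / Σ w_ij E_ij, with w_ij = (i − j)² / (N − 1)², O the observed confusion matrix and E the matrix expected from the two marginal histograms. A plain-Python sketch of that standard definition follows; the function name and the toy labels are illustrative.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """Quadratic weighted kappa for integer labels in range(num_classes).

    Requires num_classes >= 2.
    """
    n = len(rater_a)
    observed = [[0] * num_classes for _ in range(num_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]

    numerator = 0.0
    denominator = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            numerator += weight * observed[i][j]
            denominator += weight * hist_a[i] * hist_b[j] / n  # expected count

    if denominator == 0:
        return 1.0  # both raters used a single identical class
    return 1.0 - numerator / denominator


perfect = quadratic_weighted_kappa([0, 1, 2], [0, 1, 2], num_classes=3)
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], num_classes=2)
```

A score of 1 means perfect agreement and 0 means agreement no better than chance, so the 0.9203 quoted above indicates predictions that rarely stray far from the reference stage.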

A Capsule Network for Traffic Speed Prediction in Complex Road Networks


Title	A Capsule Network for Traffic Speed Prediction in Complex Road Networks
Authors	Youngjoo Kim, Peng Wang, Yifei Zhu, Lyudmila Mihaylova
Abstract	This paper proposes a deep learning approach for traffic flow prediction in complex road networks. Traffic flow data from induction loop sensors are essentially a time series, which is also spatially related to traffic in different road segments. The spatio-temporal traffic data can be converted into an image where the traffic data are expressed in a 3D space with respect to space and time axes. Although convolutional neural networks (CNNs) have been showing surprising performance in understanding images, they have a major drawback. In the max pooling operation, CNNs are losing important information by locally taking the highest activation values. The inter-relationship in traffic data measured by sparsely located sensors in different time intervals should not be neglected in order to obtain accurate predictions. Thus, we propose a neural network with capsules that replaces max pooling by dynamic routing. This is the first approach that employs the capsule network on a time series forecasting problem, to our best knowledge. Moreover, an experiment on real traffic speed data measured in the Santander city of Spain demonstrates the proposed method outperforms the state-of-the-art method based on a CNN by 13.1% in terms of root mean squared error.
Tasks	Time Series, Time Series Forecasting
Published	2018-07-23
URL	http://arxiv.org/abs/1807.10603v2
PDF	http://arxiv.org/pdf/1807.10603v2.pdf
PWC	https://paperswithcode.com/paper/a-capsule-network-for-traffic-speed
Repo
Framework
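
The 13.1% figure in this abstract is presumably a relative reduction in root mean squared error against the CNN baseline. The sketch below shows how such a comparison is typically computed; the speed values and RMSE numbers are invented for illustration and are not results from the paper.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_rmse_reduction(baseline_rmse, model_rmse):
    """Fraction by which the model lowers RMSE relative to the baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse


example_rmse = rmse([3.0, 4.0], [0.0, 0.0])           # sqrt((9 + 16) / 2)
example_gain = relative_rmse_reduction(10.0, 8.69)    # hypothetical RMSE values
```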

Domain-Adversarial Multi-Task Framework for Novel Therapeutic Property Prediction of Compounds


Title	Domain-Adversarial Multi-Task Framework for Novel Therapeutic Property Prediction of Compounds
Authors	Lingwei Xie, Song He, Shu Yang, Boyuan Feng, Kun Wan, Zhongnan Zhang, Xiaochen Bo, Yufei Ding
Abstract	With the rapid development of high-throughput technologies, parallel acquisition of large-scale drug-informatics data provides huge opportunities to improve pharmaceutical research and development. One significant application is the purpose prediction of small molecule compounds, aiming to specify therapeutic properties of extensive purpose-unknown compounds and to repurpose novel therapeutic properties of FDA-approved drugs. Such problem is very challenging since compound attributes contain heterogeneous data with various feature patterns such as drug fingerprint, drug physicochemical property, drug perturbation gene expression. Moreover, there is complex nonlinear dependency among heterogeneous data. In this paper, we propose a novel domain-adversarial multi-task framework for integrating shared knowledge from multiple domains. The framework utilizes the adversarial strategy to effectively learn target representations and models their nonlinear dependency. Experiments on two real-world datasets illustrate that the performance of our approach obtains an obvious improvement over competitive baselines. The novel therapeutic properties of purpose-unknown compounds we predicted are mostly reported or brought to the clinics. Furthermore, our framework can integrate various attributes beyond the three domains examined here and can be applied in the industry for screening the purpose of huge amounts of as yet unidentified compounds. Source codes of this paper are available on Github.
Tasks
Published	2018-09-28
URL	http://arxiv.org/abs/1810.00867v1
PDF	http://arxiv.org/pdf/1810.00867v1.pdf
PWC	https://paperswithcode.com/paper/domain-adversarial-multi-task-framework-for
Repo
Framework

Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays


Title	Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays
Authors	Lise Aubin, Mehdi Khamassi, Benoît Girard
Abstract	During sleep and awake rest, the hippocampus replays sequences of place cells that have been activated during prior experiences. These have been interpreted as a memory consolidation process, but recent results suggest a possible interpretation in terms of reinforcement learning. The Dyna reinforcement learning algorithms use off-line replays to improve learning. Under limited replay budget, a prioritized sweeping approach, which requires a model of the transitions to the predecessors, can be used to improve performance. We investigate whether such algorithms can explain the experimentally observed replays. We propose a neural network version of prioritized sweeping Q-learning, for which we developed a growing multiple expert algorithm, able to cope with multiple predecessors. The resulting architecture is able to improve the learning of simulated agents confronted to a navigation task. We predict that, in animals, learning the world model should occur during rest periods, and that the corresponding replays should be shuffled.
Tasks	Q-Learning
Published	2018-02-15
URL	http://arxiv.org/abs/1802.05594v2
PDF	http://arxiv.org/pdf/1802.05594v2.pdf
PWC	https://paperswithcode.com/paper/prioritized-sweeping-neural-dynaq-with
Repo
Framework
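
For readers unfamiliar with prioritized sweeping, the sketch below shows the classic tabular form with a deterministic model, in which updates are ordered by the size of their temporal-difference error and propagated backwards to predecessor states. The paper proposes a neural network version, so this is only a simplified stand-in; the three-step chain environment, the parameter values, and all names are assumptions.

```python
import heapq


def best_value(q, state, actions):
    """Greedy state value; unseen (state, action) pairs default to 0."""
    return max(q.get((state, a), 0.0) for a in actions)


def td_error(q, model, s, a, actions, gamma):
    reward, s_next = model[(s, a)]
    return reward + gamma * best_value(q, s_next, actions) - q.get((s, a), 0.0)


def prioritized_sweeping(q, model, predecessors, seed, actions,
                         gamma=0.9, alpha=1.0, theta=1e-3, budget=50):
    """Replay model transitions in order of priority; returns the update count."""
    queue = []
    s, a = seed
    priority = abs(td_error(q, model, s, a, actions, gamma))
    if priority > theta:
        heapq.heappush(queue, (-priority, s, a))  # negate: heapq is a min-heap

    updates = 0
    while queue and updates < budget:
        _, s, a = heapq.heappop(queue)
        q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error(q, model, s, a, actions, gamma)
        updates += 1
        # Queue every predecessor whose value estimate is now stale.
        for s_prev, a_prev in predecessors.get(s, ()):
            priority = abs(td_error(q, model, s_prev, a_prev, actions, gamma))
            if priority > theta:
                heapq.heappush(queue, (-priority, s_prev, a_prev))
    return updates


# Hypothetical chain 0 -> 1 -> 2 -> 3, reward 1 on reaching terminal state 3.
actions = ["right"]
model = {(0, "right"): (0.0, 1), (1, "right"): (0.0, 2), (2, "right"): (1.0, 3)}
predecessors = {1: [(0, "right")], 2: [(1, "right")], 3: [(2, "right")]}
q = {}
n_updates = prioritized_sweeping(q, model, predecessors, seed=(2, "right"), actions=actions)
```

Starting from the rewarded transition, three replayed updates are enough to propagate the discounted value back to the start state. This simple version may push duplicate queue entries for the same pair, which a fuller implementation would merge.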

Annotation-cost Minimization for Medical Image Segmentation using Suggestive Mixed Supervision Fully Convolutional Networks


Title	Annotation-cost Minimization for Medical Image Segmentation using Suggestive Mixed Supervision Fully Convolutional Networks
Authors	Yash Bhalgat, Meet Shah, Suyash Awate
Abstract	For medical image segmentation, most fully convolutional networks (FCNs) need strong supervision through a large sample of high-quality dense segmentations, which is taxing in terms of costs, time and logistics involved. This burden of annotation can be alleviated by exploiting weak inexpensive annotations such as bounding boxes and anatomical landmarks. However, it is very difficult to a priori estimate the optimal balance between the number of annotations needed for each supervision type that leads to maximum performance with the least annotation cost. To optimize this cost-performance trade off, we present a budget-based cost-minimization framework in a mixed-supervision setting via dense segmentations, bounding boxes, and landmarks. We propose a linear programming (LP) formulation combined with uncertainty and similarity based ranking strategy to judiciously select samples to be annotated next for optimal performance. In the results section, we show that our proposed method achieves comparable performance to state-of-the-art approaches with significantly reduced cost of annotations.
Tasks	Medical Image Segmentation, Semantic Segmentation
Published	2018-12-29
URL	http://arxiv.org/abs/1812.11302v1
PDF	http://arxiv.org/pdf/1812.11302v1.pdf
PWC	https://paperswithcode.com/paper/annotation-cost-minimization-for-medical
Repo
Framework

Improved Part Segmentation Performance by Optimising Realism of Synthetic Images using Cycle Generative Adversarial Networks


Title	Improved Part Segmentation Performance by Optimising Realism of Synthetic Images using Cycle Generative Adversarial Networks
Authors	Ruud Barth, Jochen Hemming, Eldert J. van Henten
Abstract	In this paper we report on improved part segmentation performance using convolutional neural networks to reduce the dependency on the large amount of manually annotated empirical images. This was achieved by optimising the visual realism of synthetic agricultural images. In Part I, a cycle consistent generative adversarial network was applied to synthetic and empirical images with the objective to generate more realistic synthetic images by translating them to the empirical domain. We first hypothesise and confirm that plant part image features such as color and texture become more similar to the empirical domain after translation of the synthetic images. Results confirm this with an improved mean color distribution correlation with the empirical data prior of 0.62 and post translation of 0.90. Furthermore, the mean image features of contrast, homogeneity, energy and entropy moved closer to the empirical mean, post translation. In Part II, 7 experiments were performed using convolutional neural networks with different combinations of synthetic, synthetic translated to empirical and empirical images. We hypothesised that the translated images can be used for (i) improved learning of empirical images, and (ii) that learning without any fine-tuning with empirical images is improved by bootstrapping with translated images over bootstrapping with synthetic images. Results confirm our second and third hypotheses. First a maximum intersection-over-union performance was achieved of 0.52 when bootstrapping with translated images and fine-tuning with empirical images; an 8% increase compared to only using synthetic images. Second, training without any empirical fine-tuning resulted in an average IOU of 0.31; a 55% performance increase over previous methods that only used synthetic images.
Tasks
Published	2018-03-16
URL	http://arxiv.org/abs/1803.06301v1
PDF	http://arxiv.org/pdf/1803.06301v1.pdf
PWC	https://paperswithcode.com/paper/improved-part-segmentation-performance-by
Repo
Framework