Paper Group AWR 188
Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks
Title | Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks |
Authors | Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Pablo Arbeláez, Luc Van Gool |
Abstract | We present Convolutional Oriented Boundaries (COB), which produces multiscale oriented contours and region hierarchies starting from generic image classification Convolutional Neural Networks (CNNs). COB is computationally efficient, because it requires a single CNN forward pass for multi-scale contour detection and it uses a novel sparse boundary representation for hierarchical segmentation; it gives a significant leap in performance over the state-of-the-art, and it generalizes very well to unseen categories and datasets. Particularly, we show that learning to estimate not only contour strength but also orientation provides more accurate results. We perform extensive experiments for low-level applications on BSDS, PASCAL Context, PASCAL Segmentation, and NYUD to evaluate boundary detection performance, showing that COB provides state-of-the-art contours and region hierarchies in all datasets. We also evaluate COB on high-level tasks when coupled with multiple pipelines for object proposals, semantic contours, semantic segmentation, and object detection on MS-COCO, SBD, and PASCAL; showing that COB also improves the results for all tasks. |
Tasks | Boundary Detection, Contour Detection, Image Classification, Object Detection, Semantic Segmentation |
Published | 2017-01-17 |
URL | http://arxiv.org/abs/1701.04658v2 |
PDF | http://arxiv.org/pdf/1701.04658v2.pdf |
PWC | https://paperswithcode.com/paper/convolutional-oriented-boundaries-from-image |
Repo | https://github.com/kmaninis/COB |
Framework | none |
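
A small sketch of the oriented-contour idea: given a stack of per-orientation boundary responses (hypothetical random inputs here — in COB these come from the CNN's multi-scale side outputs, not random data), contour strength is the maximum over orientation bins and the per-pixel orientation is the angle of the winning bin.

```python
import numpy as np

# Hypothetical stack of oriented boundary responses: K orientation bins,
# each an H x W map (stand-ins for COB's CNN side outputs).
K, H, W = 8, 64, 64
rng = np.random.default_rng(0)
responses = rng.random((K, H, W))

# Contour strength = strongest response across orientations;
# orientation = angle of the bin that fired strongest.
strength = responses.max(axis=0)        # H x W contour strength map
best_bin = responses.argmax(axis=0)     # H x W, values in [0, K)
orientation = best_bin * (np.pi / K)    # quantized angle in [0, pi)

print(strength.shape, orientation.min(), orientation.max())
```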
A Unified Approach to Interpreting Model Predictions
Title | A Unified Approach to Interpreting Model Predictions |
Authors | Scott Lundberg, Su-In Lee |
Abstract | Understanding why a model makes a certain prediction can be as crucial as the prediction’s accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches. |
Tasks | Feature Importance, Interpretable Machine Learning |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07874v2 |
PDF | http://arxiv.org/pdf/1705.07874v2.pdf |
PWC | https://paperswithcode.com/paper/a-unified-approach-to-interpreting-model |
Repo | https://github.com/GISH123/Cathay-Holdings-CIP-Projects-for-Interpretable-Machine-Learning |
Framework | tf |
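
SHAP's importance value for a feature is its Shapley value. The brute-force definition enumerates all coalitions of the other features and averages the weighted marginal contributions; the sketch below implements that definition exactly (exponential in the number of features, so a toy only — the paper's methods approximate this efficiently). Filling absent features from a fixed baseline is a simplifying assumption.

```python
import itertools, math

def shapley_values(f, x, baseline):
    """Exact Shapley values for model f at point x against a baseline.
    Features absent from a coalition are filled from the baseline.
    Exponential in the number of features -- a toy illustration only."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                # Shapley weight |S|! (n - |S| - 1)! / n!
                w = (math.factorial(len(S)) * math.factorial(n - len(S) - 1)
                     / math.factorial(n))
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += w * (f(with_i) - f(without_i))
    return phi

# A toy linear model: the Shapley values recover w_j * (x_j - baseline_j).
f = lambda z: 2.0 * z[0] + 3.0 * z[1] - 1.0 * z[2]
print(shapley_values(f, x=[1, 1, 1], baseline=[0, 0, 0]))  # ~[2.0, 3.0, -1.0]
```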
KATE: K-Competitive Autoencoder for Text
Title | KATE: K-Competitive Autoencoder for Text |
Authors | Yu Chen, Mohammed J. Zaki |
Abstract | Autoencoders have been successful in learning meaningful representations from image datasets. However, their performance on text datasets has not been widely studied. Traditional autoencoders tend to learn possibly trivial representations of text documents due to their confounding properties such as high-dimensionality, sparsity and power-law word distributions. In this paper, we propose a novel k-competitive autoencoder, called KATE, for text documents. Due to the competition between the neurons in the hidden layer, each neuron becomes specialized in recognizing specific data patterns, and overall the model can learn meaningful representations of textual data. A comprehensive set of experiments show that KATE can learn better representations than traditional autoencoders including denoising, contractive, variational, and k-sparse autoencoders. Our model also outperforms deep generative models, probabilistic topic models, and even word representation models (e.g., Word2Vec) in terms of several downstream tasks such as document classification, regression, and retrieval. |
Tasks | Document Classification, Topic Models |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.02033v2 |
PDF | http://arxiv.org/pdf/1705.02033v2.pdf |
PWC | https://paperswithcode.com/paper/kate-k-competitive-autoencoder-for-text |
Repo | https://github.com/hugochan/KATE |
Framework | tf |
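
A minimal sketch of the competition idea, assuming a simplified rule: only the k largest-magnitude activations survive, and they absorb the "energy" of the silenced neurons scaled by an amplification hyperparameter. The paper's actual layer runs separate competitions for positive and negative neurons; this collapses that into one.

```python
import numpy as np

def k_competitive(z, k, alpha=1.0):
    """Simplified k-competitive step in the spirit of KATE: the k
    largest-magnitude activations win and absorb the energy of the losers,
    amplified by alpha. A sketch, not the paper's exact pos/neg split."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    winners = np.argsort(np.abs(z))[-k:]                   # k winning neurons
    losers_energy = np.abs(z).sum() - np.abs(z[winners]).sum()
    out[winners] = z[winners] + alpha * losers_energy * np.sign(z[winners]) / k
    return out

print(k_competitive([0.1, -0.9, 0.4, 0.05, 0.7], k=2))
```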
Backtracking Regression Forests for Accurate Camera Relocalization
Title | Backtracking Regression Forests for Accurate Camera Relocalization |
Authors | Lili Meng, Jianhui Chen, Frederick Tung, James J. Little, Julien Valentin, Clarence W. de Silva |
Abstract | Camera relocalization plays a vital role in many robotics and computer vision tasks, such as global localization, recovery from tracking failure, and loop closure detection. Recent random forests based methods directly predict 3D world locations for 2D image locations to guide the camera pose optimization. During training, each tree greedily splits the samples to minimize the spatial variance. However, these greedy splits often produce uneven sub-trees in training or incorrect 2D-3D correspondences in testing. To address these problems, we propose a sample-balanced objective to encourage equal numbers of samples in the left and right sub-trees, and a novel backtracking scheme to remedy the incorrect 2D-3D correspondence predictions. Furthermore, we extend regression-forest-based methods to use local features in both training and testing stages for outdoor RGB-only applications. Experimental results on publicly available indoor and outdoor datasets demonstrate the efficacy of our approach, which achieves accuracy superior to or on par with several state-of-the-art methods. |
Tasks | Camera Relocalization, Loop Closure Detection, Simultaneous Localization and Mapping |
Published | 2017-10-22 |
URL | http://arxiv.org/abs/1710.07965v1 |
PDF | http://arxiv.org/pdf/1710.07965v1.pdf |
PWC | https://paperswithcode.com/paper/backtracking-regression-forests-for-accurate |
Repo | https://github.com/LiliMeng/btrf |
Framework | none |
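
A hedged sketch of the sample-balanced objective: score a candidate split by the usual spatial-variance term plus a penalty on uneven left/right sample counts. The additive weighting `beta` is a hypothetical knob, not necessarily the paper's exact formulation.

```python
import numpy as np

def split_score(points_3d, go_left, beta=1.0):
    """Lower is better: weighted spatial variance of the two sub-trees plus
    a balance penalty that encourages equal sample counts left and right."""
    left = points_3d[go_left]
    right = points_3d[~go_left]
    if len(left) == 0 or len(right) == 0:
        return np.inf                      # degenerate split, never chosen
    var = (left.var(axis=0).sum() * len(left) +
           right.var(axis=0).sum() * len(right)) / len(points_3d)
    balance = abs(len(left) - len(right)) / len(points_3d)
    return var + beta * balance

pts = np.random.default_rng(1).normal(size=(100, 3))  # toy 3D world points
mask = pts[:, 0] > 0.0                                # one candidate split
print(split_score(pts, mask))
```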
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
Title | ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases |
Authors | Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, Ronald M. Summers |
Abstract | The chest X-ray is one of the most commonly accessible radiological examinations for screening and diagnosis of many lung diseases. A tremendous number of X-ray imaging studies accompanied by radiological reports are accumulated and stored in many modern hospitals’ Picture Archiving and Communication Systems (PACS). On the other hand, it is still an open question how this type of hospital-size knowledge database containing invaluable imaging informatics (i.e., loosely labeled) can be used to facilitate the data-hungry deep learning paradigms in building truly large-scale high precision computer-aided diagnosis (CAD) systems. In this paper, we present a new chest X-ray database, namely “ChestX-ray8”, which comprises 108,948 frontal-view X-ray images of 32,717 unique patients with eight disease image labels (where each image can have multiple labels) text-mined from the associated radiological reports using natural language processing. Importantly, we demonstrate that these commonly occurring thoracic diseases can be detected and even spatially-located via a unified weakly-supervised multi-label image classification and disease localization framework, which is validated using our proposed dataset. Although the initial quantitative results are promising as reported, deep convolutional neural network based “reading chest X-rays” (i.e., recognizing and locating the common disease patterns trained with only image-level labels) remains a strenuous task for fully-automated high precision CAD systems. Data download link: https://nihcc.app.box.com/v/ChestXray-NIHCC |
Tasks | Image Classification, Lung Disease Classification |
Published | 2017-05-05 |
URL | http://arxiv.org/abs/1705.02315v5 |
PDF | http://arxiv.org/pdf/1705.02315v5.pdf |
PWC | https://paperswithcode.com/paper/chestx-ray8-hospital-scale-chest-x-ray |
Repo | https://github.com/Azure/AzureChestXRay |
Framework | pytorch |
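
Because each image can carry several of the eight findings, the classification head is multi-label: an independent sigmoid per disease with binary cross-entropy, not a softmax. A minimal sketch (the label set follows the paper; the unweighted loss below is the standard form, not the paper's exact weighting).

```python
import numpy as np

DISEASES = ["Atelectasis", "Cardiomegaly", "Effusion", "Infiltration",
            "Mass", "Nodule", "Pneumonia", "Pneumothorax"]

def multilabel_bce(logits, targets):
    """Per-label sigmoid + binary cross-entropy, averaged over the labels."""
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12                                  # numerical safety for log
    return -np.mean(targets * np.log(p + eps) +
                    (1 - targets) * np.log(1 - p + eps))

logits = np.array([2.0, -1.5, 0.3, -3.0, 0.0, 1.2, -0.7, -2.2])  # one image
targets = np.array([1, 0, 1, 0, 0, 1, 0, 0], dtype=float)        # multi-label
print(multilabel_bce(logits, targets))
```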
Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier
Title | Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier |
Authors | Joseph Futoma, Sanjay Hariharan, Katherine Heller |
Abstract | We present a scalable end-to-end classifier that uses streaming physiological and medication data to accurately predict the onset of sepsis, a life-threatening complication from infections that has high mortality and morbidity. Our proposed framework models the multivariate trajectories of continuous-valued physiological time series using multitask Gaussian processes, seamlessly accounting for the high uncertainty, frequent missingness, and irregular sampling rates typically associated with real clinical data. The Gaussian process is directly connected to a black-box classifier that predicts whether a patient will become septic, chosen in our case to be a recurrent neural network to account for the extreme variability in the length of patient encounters. We show how to scale the computations associated with the Gaussian process in a manner so that the entire system can be discriminatively trained end-to-end using backpropagation. In a large cohort of heterogeneous inpatient encounters at our university health system we find that it outperforms several baselines at predicting sepsis, and yields 19.4% and 55.5% improved areas under the Receiver Operating Characteristic and Precision Recall curves as compared to the NEWS score currently used by our hospital. |
Tasks | Gaussian Processes, Time Series |
Published | 2017-06-13 |
URL | http://arxiv.org/abs/1706.04152v1 |
PDF | http://arxiv.org/pdf/1706.04152v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-detect-sepsis-with-a-multitask |
Repo | https://github.com/BorgwardtLab/mgp-tcn |
Framework | tf |
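
The key mechanism is that a GP turns irregularly sampled, partially missing vitals into values on a regular grid the RNN can consume. A single-task GP posterior-mean sketch with an RBF kernel (the paper uses a multitask GP over all vitals with kernel parameters trained end-to-end through the classifier).

```python
import numpy as np

def gp_posterior_mean(t_obs, y_obs, t_grid, lengthscale=2.0, noise=0.1):
    """Posterior mean of a GP with RBF kernel at regular grid times,
    regressing residuals around the empirical mean of the observations."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)
    K = k(t_obs, t_obs) + noise * np.eye(len(t_obs))
    mu = y_obs.mean()
    return mu + k(t_grid, t_obs) @ np.linalg.solve(K, y_obs - mu)

t_obs = np.array([0.3, 1.1, 1.2, 4.7, 6.0])      # irregular measurement times
y_obs = np.array([97.0, 98.1, 98.3, 99.5, 98.8]) # e.g. temperature readings
t_grid = np.arange(0.0, 7.0, 1.0)                # hourly grid fed to the RNN
print(gp_posterior_mean(t_obs, y_obs, t_grid))
```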
Real-Time Panoramic Tracking for Event Cameras
Title | Real-Time Panoramic Tracking for Event Cameras |
Authors | Christian Reinbacher, Gottfried Munda, Thomas Pock |
Abstract | Event cameras are a paradigm shift in camera technology. Instead of full frames, the sensor captures a sparse set of events caused by intensity changes. Since only the changes are transferred, those cameras are able to capture quick movements of objects in the scene or of the camera itself. In this work we propose a novel method to perform camera tracking of event cameras in a panoramic setting with three degrees of freedom. We propose a direct camera tracking formulation, similar to state-of-the-art in visual odometry. We show that the minimal information needed for simultaneous tracking and mapping is the spatial position of events, without using the appearance of the imaged scene point. We verify the robustness to fast camera movements and dynamic objects in the scene on a recently proposed dataset and self-recorded sequences. |
Tasks | Visual Odometry |
Published | 2017-03-15 |
URL | http://arxiv.org/abs/1703.05161v2 |
PDF | http://arxiv.org/pdf/1703.05161v2.pdf |
PWC | https://paperswithcode.com/paper/real-time-panoramic-tracking-for-event |
Repo | https://github.com/VLOGroup/dvs-panotracking |
Framework | none |
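
With only three rotational degrees of freedom, each event's pixel ray is rotated by the current camera orientation and projected into an equirectangular panorama; tracking then scores how consistently new events land on previously mapped positions. A geometry-only sketch (the intrinsics and identity rotation below are placeholder values, not the paper's calibration).

```python
import numpy as np

def event_to_pano(u, v, R, fx=200.0, fy=200.0, cx=64.0, cy=64.0,
                  pano_w=1024, pano_h=512):
    """Back-project an event pixel to a ray, rotate it by the camera
    orientation R, and map the direction to equirectangular coordinates."""
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    d = R @ ray
    d /= np.linalg.norm(d)
    lon = np.arctan2(d[0], d[2])                 # longitude in [-pi, pi]
    lat = np.arcsin(d[1])                        # latitude in [-pi/2, pi/2]
    x = (lon / (2 * np.pi) + 0.5) * pano_w
    y = (lat / np.pi + 0.5) * pano_h
    return x, y

R = np.eye(3)  # identity orientation; a tracker would update this per event
print(event_to_pano(10.0, 100.0, R))
```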
Single-Shot Refinement Neural Network for Object Detection
Title | Single-Shot Refinement Neural Network for Object Detection |
Authors | Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, Stan Z. Li |
Abstract | For object detection, the two-stage approach (e.g., Faster R-CNN) has been achieving the highest accuracy, whereas the one-stage approach (e.g., SSD) has the advantage of high efficiency. To inherit the merits of both while overcoming their disadvantages, in this paper, we propose a novel single-shot based detector, called RefineDet, that achieves better accuracy than two-stage methods and maintains efficiency comparable to that of one-stage methods. RefineDet consists of two inter-connected modules, namely, the anchor refinement module and the object detection module. Specifically, the former aims to (1) filter out negative anchors to reduce search space for the classifier, and (2) coarsely adjust the locations and sizes of anchors to provide better initialization for the subsequent regressor. The latter module takes the refined anchors as the input from the former to further improve the regression and predict multi-class labels. Meanwhile, we design a transfer connection block to transfer the features in the anchor refinement module to predict locations, sizes and class labels of objects in the object detection module. The multi-task loss function enables us to train the whole network in an end-to-end way. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO demonstrate that RefineDet achieves state-of-the-art detection accuracy with high efficiency. Code is available at https://github.com/sfzhang15/RefineDet |
Tasks | Object Detection |
Published | 2017-11-18 |
URL | http://arxiv.org/abs/1711.06897v3 |
PDF | http://arxiv.org/pdf/1711.06897v3.pdf |
PWC | https://paperswithcode.com/paper/single-shot-refinement-neural-network-for |
Repo | https://github.com/laycoding/FaceDetection |
Framework | none |
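
The two modules imply two rounds of box decoding: the anchor refinement module's offsets are applied to the prior anchors, and the object detection module's offsets are then applied to those refined boxes. A sketch using the standard center-size parameterization (variance scaling omitted for brevity).

```python
import numpy as np

def decode(anchors, deltas):
    """Apply (dx, dy, dw, dh) offsets to (cx, cy, w, h) anchor boxes."""
    cx = anchors[:, 0] + deltas[:, 0] * anchors[:, 2]
    cy = anchors[:, 1] + deltas[:, 1] * anchors[:, 3]
    w = anchors[:, 2] * np.exp(deltas[:, 2])
    h = anchors[:, 3] * np.exp(deltas[:, 3])
    return np.stack([cx, cy, w, h], axis=1)

anchors = np.array([[50.0, 50.0, 32.0, 32.0]])
arm_deltas = np.array([[0.2, -0.1, 0.3, 0.0]])    # anchor refinement module
odm_deltas = np.array([[0.05, 0.02, -0.1, 0.1]])  # object detection module
refined = decode(anchors, arm_deltas)             # coarse adjustment
final = decode(refined, odm_deltas)               # final regression
print(final)
```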
Decision support from financial disclosures with deep neural networks and transfer learning
Title | Decision support from financial disclosures with deep neural networks and transfer learning |
Authors | Mathias Kraus, Stefan Feuerriegel |
Abstract | Company disclosures greatly aid in the process of financial decision-making; therefore, they are consulted by financial investors and automated traders before exercising ownership in stocks. While humans are usually able to correctly interpret the content, the same is rarely true of computerized decision support systems, which struggle with the complexity and ambiguity of natural language. A possible remedy is represented by deep learning, which overcomes several shortcomings of traditional methods of text mining. For instance, recurrent neural networks, such as long short-term memories, employ hierarchical structures, together with a large number of hidden layers, to automatically extract features from ordered sequences of words and capture highly non-linear relationships such as context-dependent meanings. However, deep learning has only recently started to gain traction, possibly because its performance is largely untested. Hence, this paper studies the use of deep neural networks for financial decision support. We additionally experiment with transfer learning, in which we pre-train the network on a different corpus with a length of 139.1 million words. Our results reveal a higher directional accuracy as compared to traditional machine learning when predicting stock price movements in response to financial disclosures. Our work thereby helps to highlight the business value of deep learning and provides recommendations to practitioners and executives. |
Tasks | Decision Making, Transfer Learning |
Published | 2017-10-11 |
URL | http://arxiv.org/abs/1710.03954v1 |
PDF | http://arxiv.org/pdf/1710.03954v1.pdf |
PWC | https://paperswithcode.com/paper/decision-support-from-financial-disclosures |
Repo | https://github.com/MathiasKraus/FinancialDeepLearning |
Framework | none |
Attention Is All You Need
Title | Attention Is All You Need |
Authors | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin |
Abstract | The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data. |
Tasks | Constituency Parsing, Machine Translation |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03762v5 |
PDF | http://arxiv.org/pdf/1706.03762v5.pdf |
PWC | https://paperswithcode.com/paper/attention-is-all-you-need |
Repo | https://github.com/kolloldas/torchnlp |
Framework | pytorch |
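
The Transformer's core primitive is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, straight from the paper. A single-head numpy sketch:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, one head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # 4 positions, d_k = 8
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)
```

Multi-head attention simply runs several such heads on learned projections of Q, K, V and concatenates the results.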
Multivariate Time Series Classification with WEASEL+MUSE
Title | Multivariate Time Series Classification with WEASEL+MUSE |
Authors | Patrick Schäfer, Ulf Leser |
Abstract | Multivariate time series (MTS) arise when multiple interconnected sensors record data over time. Dealing with this high-dimensional data is challenging for every classifier in at least two respects: First, an MTS is characterized not only by individual feature values, but also by the interplay of features in different dimensions. Second, the high dimensionality typically adds large amounts of irrelevant data and noise. We present our novel MTS classifier WEASEL+MUSE which addresses both challenges. WEASEL+MUSE builds a multivariate feature vector by first applying a sliding-window approach to each dimension of the MTS and then extracting discrete features per window and dimension. The feature vector is subsequently fed through feature selection, removing non-discriminative features, and analysed by a machine learning classifier. The novelty of WEASEL+MUSE lies in its specific way of extracting and filtering multivariate features from MTS by encoding context information into each feature. Still, the resulting feature set is small, yet very discriminative and useful for MTS classification. Based on a popular benchmark of 20 MTS datasets, we found that WEASEL+MUSE is among the most accurate classifiers, when compared to the state of the art. The outstanding robustness of WEASEL+MUSE is further confirmed on motion gesture recognition data, where it achieved, out of the box, accuracies similar to those of domain-specific methods. |
Tasks | Feature Selection, Gesture Recognition, Time Series, Time Series Classification |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11343v4 |
PDF | http://arxiv.org/pdf/1711.11343v4.pdf |
PWC | https://paperswithcode.com/paper/multivariate-time-series-classification-with |
Repo | https://github.com/patrickzib/SFA |
Framework | tf |
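
A toy sketch of the central idea: per-dimension sliding windows whose discretized contents are counted with the dimension encoded into each feature name, so dimension context is preserved in the feature vector. A mean-split binning stands in here for the real method's Symbolic Fourier Approximation, and the supervised feature-selection stage is omitted.

```python
from collections import Counter

def mts_features(series_by_dim, window=3):
    """Bag of 'dimension + symbolic word' counts over sliding windows.
    Toy discretizer: each value becomes 'a' or 'b' around the window mean."""
    bag = Counter()
    for dim, series in enumerate(series_by_dim):
        for start in range(len(series) - window + 1):
            w = series[start:start + window]
            mean = sum(w) / window
            word = "".join("b" if v > mean else "a" for v in w)
            bag[f"d{dim}:{word}"] += 1   # the dimension is part of the feature
    return bag

mts = [[1, 2, 9, 2, 1], [5, 5, 5, 6, 5]]  # a 2-dimensional time series
print(mts_features(mts))
```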
Weightless: Lossy Weight Encoding For Deep Neural Network Compression
Title | Weightless: Lossy Weight Encoding For Deep Neural Network Compression |
Authors | Brandon Reagen, Udit Gupta, Robert Adolf, Michael M. Mitzenmacher, Alexander M. Rush, Gu-Yeon Wei, David Brooks |
Abstract | The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce the memory requirements of these models, usually through applying transformations such as weight pruning or quantization. In this paper, we present a novel scheme for lossy weight encoding which complements conventional compression techniques. The encoding is based on the Bloomier filter, a probabilistic data structure that can save space at the cost of introducing random errors. Leveraging the ability of neural networks to tolerate these imperfections and by re-training around the errors, the proposed technique, Weightless, can compress DNN weights by up to 496x with the same model accuracy. This results in up to a 1.51x improvement over the state-of-the-art. |
Tasks | Model Compression, Neural Network Compression, Quantization |
Published | 2017-11-13 |
URL | http://arxiv.org/abs/1711.04686v1 |
PDF | http://arxiv.org/pdf/1711.04686v1.pdf |
PWC | https://paperswithcode.com/paper/weightless-lossy-weight-encoding-for-deep |
Repo | https://github.com/cambridge-mlg/miracle |
Framework | tf |
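
The Bloomier-filter construction itself is involved, so the sketch below uses plain scalar quantization as a stand-in to show the general shape of lossy weight encoding: map weights onto a small codebook, accept the introduced error, and rely on retraining to absorb it. This is explicitly not the paper's probabilistic data structure.

```python
import numpy as np

def lossy_encode(weights, bits=2):
    """Quantize weights to a uniform codebook of 2**bits values.
    Returns small integer codes plus the codebook; decoding is a lookup."""
    levels = 2 ** bits
    lo, hi = weights.min(), weights.max()
    codebook = np.linspace(lo, hi, levels)
    codes = np.abs(weights[:, None] - codebook[None, :]).argmin(axis=1)
    return codes.astype(np.uint8), codebook

w = np.random.default_rng(0).normal(size=10).astype(np.float32)
codes, book = lossy_encode(w, bits=2)
w_hat = book[codes]                  # lossy reconstruction of the weights
print(np.abs(w - w_hat).max())       # the error that retraining must absorb
```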
Fast and Accurate Time Series Classification with WEASEL
Title | Fast and Accurate Time Series Classification with WEASEL |
Authors | Patrick Schäfer, Ulf Leser |
Abstract | Time series (TS) occur in many scientific and commercial applications, ranging from earth surveillance to industry automation to smart grids. An important type of TS analysis is classification, which can, for instance, improve energy load forecasting in smart grids by detecting the types of electronic devices based on their energy consumption profiles recorded by automatic sensors. Such sensor-driven applications are very often characterized by (a) very long TS and (b) very large TS datasets needing classification. However, current methods for time series classification (TSC) cannot cope with such data volumes at acceptable accuracy; they are either scalable but offer only inferior classification quality, or they achieve state-of-the-art classification quality but cannot scale to large data volumes. In this paper, we present WEASEL (Word ExtrAction for time SEries cLassification), a novel TSC method which is both scalable and accurate. Like other state-of-the-art TSC methods, WEASEL uses a sliding-window approach to transform time series into feature vectors, which are then analyzed through a machine learning classifier. The novelty of WEASEL lies in its specific method for deriving features, resulting in a much smaller yet much more discriminative feature set. On the popular UCR benchmark of 85 TS datasets, WEASEL is more accurate than the best current non-ensemble algorithms at orders-of-magnitude lower classification and training times, and it is almost as accurate as ensemble classifiers, whose computational complexity makes them inapplicable even for mid-size datasets. The outstanding robustness of WEASEL is also confirmed by experiments on two real smart grid datasets, where it achieves, out of the box, almost the same accuracy as highly tuned, domain-specific methods. |
Tasks | Load Forecasting, Time Series, Time Series Classification |
Published | 2017-01-26 |
URL | http://arxiv.org/abs/1701.07681v1 |
PDF | http://arxiv.org/pdf/1701.07681v1.pdf |
PWC | https://paperswithcode.com/paper/fast-and-accurate-time-series-classification |
Repo | https://github.com/patrickzib/SFA |
Framework | tf |
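
A toy sketch of WEASEL-style feature extraction: symbolic words are collected at several window sizes, with the window size folded into each feature name, and the resulting counts form the sparse vector handed to the classifier. A mean-split discretizer stands in for the actual Symbolic Fourier Approximation, and the statistical feature selection is omitted.

```python
from collections import Counter

def weasel_like_features(series, window_sizes=(4, 8)):
    """Counts of symbolic words extracted at several window sizes; the
    window size is encoded into each feature name so scales stay distinct."""
    bag = Counter()
    for w in window_sizes:
        for start in range(len(series) - w + 1):
            win = series[start:start + w]
            mean = sum(win) / w
            word = "".join("b" if v > mean else "a" for v in win)
            bag[f"w{w}:{word}"] += 1
    return bag

ts = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(weasel_like_features(ts))
```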
Character-level Deep Conflation for Business Data Analytics
Title | Character-level Deep Conflation for Business Data Analytics |
Authors | Zhe Gan, P. D. Singh, Ameet Joshi, Xiaodong He, Jianshu Chen, Jianfeng Gao, Li Deng |
Abstract | Connecting different text attributes associated with the same entity (conflation) is important in business data analytics since it could help merge two different tables in a database to provide a more comprehensive profile of an entity. However, the conflation task is challenging because two text strings that describe the same entity could be quite different from each other for reasons such as misspelling. It is therefore critical to develop a conflation model that is able to truly understand the semantic meaning of the strings and match them at the semantic level. To this end, we develop a character-level deep conflation model that encodes input text strings at the character level into finite-dimensional feature vectors, which are then used to compute the cosine similarity between the text strings. The model is trained in an end-to-end manner using backpropagation and stochastic gradient descent to maximize the likelihood of the correct association. Specifically, we propose two variants of the deep conflation model, based on long short-term memory (LSTM) recurrent neural network (RNN) and convolutional neural network (CNN), respectively. Both models perform well on a real-world business analytics dataset and significantly outperform the baseline bag-of-character (BoC) model. |
Tasks | |
Published | 2017-02-08 |
URL | http://arxiv.org/abs/1702.02640v1 |
PDF | http://arxiv.org/pdf/1702.02640v1.pdf |
PWC | https://paperswithcode.com/paper/character-level-deep-conflation-for-business |
Repo | https://github.com/zhegan27/Deep_Conflation_Model |
Framework | none |
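
A hedged sketch of the matching setup: encode each string into a vector and rank candidates by cosine similarity. Character n-gram counts stand in here for the paper's learned LSTM/CNN encoders — the point is only that character-level features survive misspellings.

```python
from collections import Counter
import math

def char_ngram_vector(text, n=3):
    """Bag of character n-grams -- a stand-in for the paper's learned
    encoders, which map strings to dense feature vectors."""
    text = f"#{text.lower()}#"          # boundary markers
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# A misspelled variant still scores high: most character n-grams survive.
q = char_ngram_vector("Microsoft Corporation")
print(cosine(q, char_ngram_vector("Microsfot Corp")))
print(cosine(q, char_ngram_vector("Apple Inc")))
```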
DeepTFP: Mobile Time Series Data Analytics based Traffic Flow Prediction
Title | DeepTFP: Mobile Time Series Data Analytics based Traffic Flow Prediction |
Authors | Yuanfang Chen, Falin Chen, Yizhi Ren, Ting Wu, Ye Yao |
Abstract | Traffic flow prediction is an important research issue for avoiding traffic congestion in transportation systems. Congestion can be avoided by predicting traffic flow and then planning transportation accordingly. Traffic flow prediction is challenging, as it is affected by many complex factors such as inter-region traffic, relations between vehicles, and sudden events. However, as the mobile data of vehicles is widely collected by sensor-embedded devices in transportation systems, it is possible to predict traffic flow by analysing mobile data. This study proposes a deep learning based prediction algorithm, DeepTFP, to collectively predict the traffic flow on every road of a city. The algorithm uses three deep residual neural networks to model the temporal closeness, period, and trend properties of traffic flow. Each residual neural network consists of a branch of residual convolutional units. DeepTFP aggregates the outputs of the three residual neural networks to optimize the parameters of a time series prediction model. Comparative experiments on mobile time series data from the transportation system of England demonstrate that the proposed DeepTFP outperforms the Long Short-Term Memory (LSTM) based method in prediction accuracy. |
Tasks | Time Series, Time Series Prediction |
Published | 2017-10-01 |
URL | http://arxiv.org/abs/1710.01695v1 |
PDF | http://arxiv.org/pdf/1710.01695v1.pdf |
PWC | https://paperswithcode.com/paper/deeptfp-mobile-time-series-data-analytics |
Repo | https://github.com/tbinetruy/CIL4SYS |
Framework | none |
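
A sketch of the aggregation step, assuming the ST-ResNet-style parametric fusion commonly used with closeness/period/trend streams: element-wise learned weights per stream, summed and squashed with tanh. The weight shapes and the tanh output range are assumptions for illustration, not confirmed details of DeepTFP.

```python
import numpy as np

def fuse(x_closeness, x_period, x_trend, w_c, w_p, w_t):
    """Element-wise weighted fusion of the three residual-network outputs,
    squashed to [-1, 1] to match normalized traffic flow targets."""
    return np.tanh(w_c * x_closeness + w_p * x_period + w_t * x_trend)

H, W = 4, 4                                            # toy city grid
rng = np.random.default_rng(0)
outs = [rng.normal(size=(H, W)) for _ in range(3)]     # stream outputs
weights = [rng.normal(size=(H, W)) for _ in range(3)]  # learned in training
print(fuse(*outs, *weights).shape)                     # (4, 4) flow map
```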