Paper Group ANR 269
Q($\lambda$) with Off-Policy Corrections. Time and Activity Sequence Prediction of Business Process Instances. Singularity structures and impacts on parameter estimation in finite mixtures of distributions. Content Selection in Data-to-Text Systems: A Survey. Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale. A Bayesian Approach to Estimation of Speaker Normalization Parameters. Gaussian Processes for Music Audio Modelling and Content Analysis. Deep Survival Analysis. 3-D/2-D Registration of Cardiac Structures by 3-D Contrast Agent Distribution Estimation. Saturating Splines and Feature Selection. Authorship Verification - An Approach based on Random Forest. A Reconfigurable Low Power High Throughput Architecture for Deep Network Training. Implementation of a FPGA-Based Feature Detection and Networking System for Real-time Traffic Monitoring. Fast Multiplier Methods to Optimize Non-exhaustive, Overlapping Clustering. HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection.
Q($\lambda$) with Off-Policy Corrections
Title | Q($\lambda$) with Off-Policy Corrections |
Authors | Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton, Remi Munos |
Abstract | We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities. We prove that such approximate corrections are sufficient for off-policy convergence both in policy evaluation and control, provided certain conditions hold. These conditions relate the distance between the target and behavior policies, the eligibility trace parameter and the discount factor, and formalize an underlying tradeoff in off-policy TD($\lambda$). We illustrate this theoretical relationship empirically on a continuous-state control task. |
Tasks | |
Published | 2016-02-16 |
URL | http://arxiv.org/abs/1602.04951v2 |
http://arxiv.org/pdf/1602.04951v2.pdf | |
PWC | https://paperswithcode.com/paper/q-with-off-policy-corrections |
Repo | |
Framework | |
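The correction the abstract describes replaces importance-sampling ratios with an expectation of the current Q-function under the target policy. A minimal tabular sketch of that idea for policy evaluation with accumulating traces; the environment interface, fixed start state, and hyperparameters are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def q_lambda_off_policy(env_step, n_states, n_actions, target_pi, behavior_pi,
                        gamma=0.9, lam=0.6, alpha=0.1, episodes=500):
    """Tabular Q(lambda) policy evaluation with Q-based off-policy corrections.

    env_step(s, a) -> (s_next, reward, done) is an assumed interface;
    target_pi and behavior_pi are (n_states, n_actions) stochastic matrices.
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        e = np.zeros_like(Q)                    # eligibility traces
        s, done = 0, False                      # assume a fixed start state
        while not done:
            a = np.random.choice(n_actions, p=behavior_pi[s])
            s_next, r, done = env_step(s, a)
            # the correction: expected Q under the *target* policy replaces
            # importance-sampling ratios on transition probabilities
            exp_q = 0.0 if done else target_pi[s_next] @ Q[s_next]
            delta = r + gamma * exp_q - Q[s, a]
            e[s, a] += 1.0                      # accumulating traces
            Q = Q + alpha * delta * e
            e *= gamma * lam                    # plain gamma*lambda decay
            s = s_next
    return Q
```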
Time and Activity Sequence Prediction of Business Process Instances
Title | Time and Activity Sequence Prediction of Business Process Instances |
Authors | Mirko Polato, Alessandro Sperduti, Andrea Burattin, Massimiliano de Leoni |
Abstract | The ability to know in advance the trend of running process instances with respect to different features, such as the expected completion time, would allow business managers to counteract undesired situations in a timely manner and prevent losses. The ability to accurately predict future features of running business process instances would therefore be a very helpful aid when managing processes, especially under service-level-agreement constraints. Making such accurate forecasts is not easy, however: many factors may influence the predicted features. Many approaches have been proposed to cope with this problem, but all of them assume that the underlying process is stationary, which in real cases is not always true. In this work we present new methods for predicting the remaining time of running cases. In particular, we propose one method that assumes process stationarity and outperforms the state of the art, and two further methods that can make predictions even for non-stationary processes. We also describe an approach that predicts the full sequence of activities a running case is going to take. All these methods are extensively evaluated on two real case studies. |
Tasks | |
Published | 2016-02-24 |
URL | http://arxiv.org/abs/1602.07566v1 |
http://arxiv.org/pdf/1602.07566v1.pdf | |
PWC | https://paperswithcode.com/paper/time-and-activity-sequence-prediction-of |
Repo | |
Framework | |
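For a concrete sense of the stationarity assumption the abstract discusses, here is a minimal sketch of a prefix-based remaining-time predictor in the spirit of this line of work; it is not the paper's method, and the log format (one list of timestamped activities per case) is an assumed simplification:

```python
from collections import defaultdict

def fit_remaining_time(traces):
    """traces: list of [(activity, timestamp_seconds), ...], one list per case."""
    sums, counts = defaultdict(float), defaultdict(int)
    for trace in traces:
        end = trace[-1][1]                      # completion time of the case
        prefix = ()
        for activity, ts in trace:
            prefix += (activity,)
            sums[prefix] += end - ts            # remaining time after this prefix
            counts[prefix] += 1
    return {p: sums[p] / counts[p] for p in sums}

def predict_remaining(model, running_prefix):
    """Back off to shorter prefixes when the exact prefix was never observed."""
    p = tuple(running_prefix)
    while p and p not in model:
        p = p[:-1]
    return model.get(p, 0.0)
```

Averaging over historical cases like this is exactly where stationarity enters: the model assumes old cases remain representative of running ones.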
Singularity structures and impacts on parameter estimation in finite mixtures of distributions
Title | Singularity structures and impacts on parameter estimation in finite mixtures of distributions |
Authors | Nhat Ho, XuanLong Nguyen |
Abstract | Singularities of a statistical model are the elements of the model’s parameter space which make the corresponding Fisher information matrix degenerate. These are the points for which estimation techniques such as the maximum likelihood estimator and standard Bayesian procedures do not admit the root-$n$ parametric rate of convergence. We propose a general framework for identifying the singularity structures of the parameter space of finite mixtures, and study the impacts of these structures on minimax lower bounds and on rates of convergence for the maximum likelihood estimator over a compact parameter space. Our study makes explicit the deep links between model singularities, parameter estimation convergence rates and minimax lower bounds, and the algebraic geometry of the parameter space for mixtures of continuous distributions. The theory is applied to establish concrete convergence rates of parameter estimation for finite mixtures of skew-normal distributions. This rich and increasingly popular mixture model is shown to exhibit a remarkably complex range of asymptotic behaviors which have not hitherto been reported in the literature. |
Tasks | |
Published | 2016-09-09 |
URL | https://arxiv.org/abs/1609.02655v4 |
https://arxiv.org/pdf/1609.02655v4.pdf | |
PWC | https://paperswithcode.com/paper/singularity-structures-and-impacts-on |
Repo | |
Framework | |
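To make the notion of singularity concrete, here is the canonical textbook example (ours, not the paper's): in a two-component location mixture, the score functions become linearly dependent where the components collapse, so the Fisher information degenerates.

```latex
% Canonical example (illustrative, not from the paper): a two-component
% location mixture with known weight $\pi \in (0,1)$.
\[
  p_{\theta}(x) = \pi f(x - \theta_1) + (1 - \pi) f(x - \theta_2)
\]
% On the set $\{\theta_1 = \theta_2 = \theta_0\}$ the scores collapse:
\[
  \partial_{\theta_1} p_\theta = -\pi f'(x - \theta_0),
  \qquad
  \partial_{\theta_2} p_\theta = -(1 - \pi) f'(x - \theta_0),
\]
% giving the exact linear dependence
\[
  (1 - \pi)\,\partial_{\theta_1} p_\theta - \pi\,\partial_{\theta_2} p_\theta = 0,
\]
% so the Fisher information matrix is degenerate there and the root-$n$
% parametric rate fails.
```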
Content Selection in Data-to-Text Systems: A Survey
Title | Content Selection in Data-to-Text Systems: A Survey |
Authors | Dimitra Gkatzia |
Abstract | Data-to-text systems are powerful in generating reports from data automatically and thus simplify the presentation of complex data. Rather than presenting data using visualisation techniques, data-to-text systems use natural (human) language, the most common medium of human-human communication. In addition, data-to-text systems can adapt their output content to users’ preferences, background or interests, and can therefore be pleasant for users to interact with. Content selection is an important part of every data-to-text system, because it is the module that determines which of the available information should be conveyed to the user. This survey first introduces the field of data-to-text generation, describes the general data-to-text system architecture, and then reviews the state-of-the-art content selection methods. Finally, it provides recommendations for choosing an approach and discusses opportunities for future research. |
Tasks | Data-to-Text Generation, Text Generation |
Published | 2016-10-26 |
URL | http://arxiv.org/abs/1610.08375v1 |
http://arxiv.org/pdf/1610.08375v1.pdf | |
PWC | https://paperswithcode.com/paper/content-selection-in-data-to-text-systems-a |
Repo | |
Framework | |
Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale
Title | Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale |
Authors | Forrest Iandola |
Abstract | In recent years, the research community has discovered that deep neural networks (DNNs) and convolutional neural networks (CNNs) can yield higher accuracy than all previous solutions to a broad array of machine learning problems. To our knowledge, there is no single CNN/DNN architecture that solves all problems optimally; instead, the “right” architecture varies depending on the application at hand. CNNs/DNNs span an enormous design space: quantitatively, we find that even a small region of the CNN design space contains 30 billion different CNN architectures. In this dissertation, we develop a methodology that enables systematic exploration of the design space of CNNs. Our methodology comprises four themes: 1. judiciously choosing benchmarks and metrics; 2. rapidly training CNN models; 3. defining and describing the CNN design space; and 4. exploring the design space of CNN architectures. Taken together, these four themes form an effective methodology for discovering the “right” CNN architectures to meet the needs of practical applications. |
Tasks | |
Published | 2016-12-20 |
URL | http://arxiv.org/abs/1612.06519v1 |
http://arxiv.org/pdf/1612.06519v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-the-design-space-of-deep |
Repo | |
Framework | |
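The 30-billion figure is the dissertation's; as a back-of-the-envelope illustration of why design spaces blow up this way (our toy choice counts, not the dissertation's accounting), per-layer options multiply:

```python
from math import prod

# Per-layer choices multiply; the numbers below are toy values.
layers = 10
choices_per_layer = {
    "filter_size": 3,   # e.g. 1x1 / 3x3 / 5x5
    "num_filters": 8,   # e.g. 16..2048 in powers of two
    "stride": 2,        # downsample or not
}
per_layer = prod(choices_per_layer.values())    # 48 options per layer
total = per_layer ** layers                     # ~6.5e16 for 10 layers
print(f"{per_layer} options/layer -> {total:.2e} architectures at {layers} layers")
```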
A Bayesian Approach to Estimation of Speaker Normalization Parameters
Title | A Bayesian Approach to Estimation of Speaker Normalization Parameters |
Authors | Dhananjay Ram, Debasis Kundu, Rajesh M. Hegde |
Abstract | In this work, a Bayesian approach to speaker normalization is proposed to compensate for the degradation in performance of a speaker-independent speech recognition system. The speaker normalization method proposed herein uses the technique of vocal tract length normalization (VTLN). The VTLN parameters are estimated using a novel Bayesian approach which employs the Gibbs sampler, a special type of Markov chain Monte Carlo method, while the hyperparameters are estimated using a maximum likelihood approach. The model assumes that the human vocal tract can be modeled as a tube of uniform cross-section; it captures the variation in vocal tract length across speakers more effectively than the linear model used in the literature. This work also investigates methods such as minimization of the mean square error (MSE) and the mean absolute error (MAE) for estimating the VTLN parameters. Both single-pass and two-pass approaches are then used to build a VTLN-based speech recognizer. Experimental results on recognition of vowels and Hindi phrases from a medium vocabulary indicate that the Bayesian method improves performance by a considerable margin. |
Tasks | Speech Recognition |
Published | 2016-10-19 |
URL | http://arxiv.org/abs/1610.05948v1 |
http://arxiv.org/pdf/1610.05948v1.pdf | |
PWC | https://paperswithcode.com/paper/a-bayesian-approach-to-estimation-of-speaker |
Repo | |
Framework | |
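The paper estimates the VTLN parameters with a Gibbs sampler; the sampler itself is beyond a short sketch, but the warp those parameters drive is the standard piecewise-linear VTLN frequency warp, sketched below. The break-point ratio is a common convention assumed here, not a value taken from the paper:

```python
import numpy as np

def vtln_warp(freqs, alpha, f_nyquist, f_break_ratio=0.875):
    """Piecewise-linear VTLN warp: scale by alpha below the break frequency,
    then interpolate so that f_nyquist maps to itself."""
    f_break = f_break_ratio * f_nyquist
    freqs = np.asarray(freqs, dtype=float)
    upper_slope = (f_nyquist - alpha * f_break) / (f_nyquist - f_break)
    return np.where(freqs <= f_break,
                    alpha * freqs,
                    alpha * f_break + upper_slope * (freqs - f_break))

# e.g. warp a mel filterbank's centre frequencies for one speaker
centres = np.linspace(0, 8000, 27)
warped = vtln_warp(centres, alpha=0.92, f_nyquist=8000)
```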
Gaussian Processes for Music Audio Modelling and Content Analysis
Title | Gaussian Processes for Music Audio Modelling and Content Analysis |
Authors | Pablo A. Alvarado, Dan Stowell |
Abstract | Real music signals are highly variable, yet they have strong statistical structure. Prior information about the underlying physical mechanisms by which sounds are generated, and the rules by which complex sound structure is constructed (notes, chords, a complete musical score), can be naturally unified using Bayesian modelling techniques. Typically, algorithms for Automatic Music Transcription carry out individual tasks such as multiple-F0 detection and beat tracking independently; the challenge remains to perform joint estimation of all parameters. We present a Bayesian approach to modelling music audio and analysing its content. The proposed methodology, based on Gaussian processes, seeks joint estimation of multiple music concepts by incorporating into the kernel prior information about the non-stationary behaviour, dynamics, and rich spectral content present in the modelled music signal. We illustrate the benefits of this approach on two tasks: pitch estimation, and inferring missing segments in a polyphonic audio recording. |
Tasks | Gaussian Processes |
Published | 2016-06-03 |
URL | http://arxiv.org/abs/1606.01039v2 |
http://arxiv.org/pdf/1606.01039v2.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-processes-for-music-audio-modelling |
Repo | |
Framework | |
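A rough sketch of the kind of kernel the abstract alludes to: quasi-periodic components, one cosine carrier per partial of a fundamental, under a squared-exponential envelope. Everything here, from the partial weights to the lengthscale, is an illustrative assumption rather than the paper's kernel:

```python
import numpy as np

def music_kernel(t1, t2, f0=440.0, n_partials=3, lengthscale=0.05, sigma=1.0):
    """Sum of quasi-periodic components: cosine carriers at harmonics of f0
    under a squared-exponential envelope (a product of PSD kernels is PSD)."""
    tau = t1[:, None] - t2[None, :]                     # pairwise time lags
    envelope = np.exp(-0.5 * (tau / lengthscale) ** 2)
    carrier = sum(np.cos(2 * np.pi * f0 * k * tau) / k  # 1/k partial weights
                  for k in range(1, n_partials + 1))
    return sigma**2 * envelope * carrier

# draw a short "note" from the induced GP prior
t = np.linspace(0.0, 0.02, 200)
K = music_kernel(t, t) + 1e-8 * np.eye(t.size)          # jitter for stability
sample = np.random.multivariate_normal(np.zeros(t.size), K)
```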
Deep Survival Analysis
Title | Deep Survival Analysis |
Authors | Rajesh Ranganath, Adler Perotte, Noémie Elhadad, David Blei |
Abstract | The electronic health record (EHR) provides an unprecedented opportunity to build actionable tools that support physicians at the point of care. In this paper, we investigate survival analysis in the context of EHR data. We introduce deep survival analysis, a hierarchical generative approach to survival analysis. It departs from previous approaches in two primary ways: (1) all observations, including covariates, are modeled jointly, conditioned on a rich latent structure; and (2) observations are aligned by their failure time rather than by an arbitrary time zero, as in traditional survival analysis. Further, it (3) scalably handles the heterogeneous (continuous and discrete) data types that occur in the EHR. We validate the deep survival analysis model by stratifying patients according to their risk of developing coronary heart disease (CHD) on a dataset of 313,000 patients, corresponding to 5.5 million months of observations. Compared to the clinically validated Framingham CHD risk score, deep survival analysis is significantly superior in stratifying patients according to their risk. |
Tasks | Survival Analysis |
Published | 2016-08-06 |
URL | http://arxiv.org/abs/1608.02158v2 |
http://arxiv.org/pdf/1608.02158v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-survival-analysis |
Repo | |
Framework | |
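The censoring-aware likelihood is the piece such models share with classical survival analysis. A minimal sketch with a Weibull time-to-event model, where a single linear predictor stands in for the paper's rich latent structure (an illustrative simplification, not the paper's model):

```python
import numpy as np

def weibull_log_lik(t, observed, x, w, k=1.5):
    """t: event/censoring times; observed: 1 if the failure was seen, 0 if
    censored; x: covariate matrix; w: weights of the stand-in predictor."""
    scale = np.exp(x @ w)                   # per-patient Weibull scale > 0
    z = t / scale
    log_pdf = np.log(k / scale) + (k - 1) * np.log(z) - z**k
    log_surv = -z**k                        # log S(t): no event by time t
    return np.sum(observed * log_pdf + (1 - observed) * log_surv)
```

Censored rows contribute only the survival term, which is what lets every patient record enter the objective whether or not the event was observed.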
3-D/2-D Registration of Cardiac Structures by 3-D Contrast Agent Distribution Estimation
Title | 3-D/2-D Registration of Cardiac Structures by 3-D Contrast Agent Distribution Estimation |
Authors | Matthias Hoffmann, Christopher Kowalewski, Andreas Maier, Klaus Kurzidim, Norbert Strobel, Joachim Hornegger |
Abstract | For augmented fluoroscopy during cardiac catheter ablation procedures, a preoperatively acquired 3-D model of the patient’s left atrium can be registered to X-ray images by matching the 3-D model to the contrast-agent-based appearance of the left atrium. Commonly, only small amounts of contrast agent (CA) are used to locate the left atrium, which is why we focus on robust registration methods that also work when the structure of interest is only partially contrasted. In particular, we propose two similarity measures for CA-based registration. The first, explicit apparent edges, focuses on edges of the patient anatomy made visible by contrast agent and can be computed quickly on the GPU. The second computes a contrast agent distribution estimate (CADE) inside the 3-D model and rates its consistency with the CA seen in biplane fluoroscopic images; as computing the CADE involves reconstructing the CA in 3-D from the fluoroscopic images, it is slower. Using a combination of both methods, our evaluation on 11 well-contrasted clinical datasets yielded an error of 7.9+/-6.3 mm over all frames; for 10 datasets with little CA, we obtained an error of 8.8+/-6.7 mm. Our new methods significantly outperform a registration based on the projected shadow (p<0.05). |
Tasks | |
Published | 2016-01-22 |
URL | http://arxiv.org/abs/1601.06062v1 |
http://arxiv.org/pdf/1601.06062v1.pdf | |
PWC | https://paperswithcode.com/paper/3-d2-d-registration-of-cardiac-structures-by |
Repo | |
Framework | |
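A toy rendering of the flavour of the first similarity measure: project the 3-D model into the image and score how strongly the projected points land on contrast-agent edges. The pinhole projection and edge map below are stand-ins for the clinical pipeline, not the paper's implementation:

```python
import numpy as np

def edge_similarity(model_pts, P, edge_map):
    """model_pts: (N, 3) model surface points; P: (3, 4) projection matrix;
    edge_map: 2-D array of edge strengths from the fluoroscopic image."""
    homog = np.hstack([model_pts, np.ones((len(model_pts), 1))])
    proj = homog @ P.T
    uv = proj[:, :2] / proj[:, 2:3]              # perspective divide
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, edge_map.shape[1] - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, edge_map.shape[0] - 1)
    return edge_map[v, u].mean()                 # higher = better alignment
```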
Saturating Splines and Feature Selection
Title | Saturating Splines and Feature Selection |
Authors | Nicholas Boyd, Trevor Hastie, Stephen Boyd, Benjamin Recht, Michael Jordan |
Abstract | We extend the adaptive regression spline model by incorporating saturation, the natural requirement that a function extend as a constant outside a certain range. We fit saturating splines to data using a convex optimization problem over a space of measures, which we solve using an efficient algorithm based on the conditional gradient method. Unlike many existing approaches, our algorithm solves the original infinite-dimensional (for splines of degree at least two) optimization problem without pre-specified knot locations. We then adapt our algorithm to fit generalized additive models with saturating splines as coordinate functions and show that the saturation requirement allows our model to simultaneously perform feature selection and nonlinear function fitting. Finally, we briefly sketch how the method can be extended to higher order splines and to different requirements on the extension outside the data range. |
Tasks | Feature Selection |
Published | 2016-09-21 |
URL | http://arxiv.org/abs/1609.06764v3 |
http://arxiv.org/pdf/1609.06764v3.pdf | |
PWC | https://paperswithcode.com/paper/saturating-splines-and-feature-selection |
Repo | |
Framework | |
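The saturation requirement is easy to see in a basis function: a hinge clipped to the data range is linear inside it and exactly constant outside. A small sketch; note the fixed knot grid is purely illustrative, since the paper's conditional gradient method over measures avoids pre-specifying knots:

```python
import numpy as np

def saturating_hinge(x, knot, lo, hi):
    """max(0, x - knot) with x clipped to [lo, hi]: linear between knot and hi,
    constant outside the data range [lo, hi]."""
    return np.maximum(0.0, np.clip(x, lo, hi) - knot)

# a small design matrix of saturating basis functions over a fixed knot grid
x = np.linspace(-3.0, 3.0, 101)
knots = np.linspace(-2.0, 2.0, 5)
basis = np.column_stack([saturating_hinge(x, k, -2.0, 2.0) for k in knots])
```

Sparsifying the weights on such a basis is what couples function fitting to feature selection: a coordinate whose basis weights all vanish drops out entirely.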
Authorship Verification - An Approach based on Random Forest
Title | Authorship Verification - An Approach based on Random Forest |
Authors | Promita Maitra, Souvick Ghosh, Dipankar Das |
Abstract | Authorship attribution, an important problem in many areas including information retrieval, computational linguistics, law and journalism, has attracted increasing research interest in recent years. In the Author Identification task of PAN at CLEF 2015, the main focus was on cross-genre and cross-topic author verification. We use several word-based and style-based features to identify the differences between the known and unknown documents of a given problem set, and label the unknown ones accordingly using a Random Forest based classifier. |
Tasks | Information Retrieval |
Published | 2016-07-29 |
URL | http://arxiv.org/abs/1607.08885v1 |
http://arxiv.org/pdf/1607.08885v1.pdf | |
PWC | https://paperswithcode.com/paper/authorship-verification-an-approach-based-on |
Repo | |
Framework | |
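A minimal sketch of the pipeline shape the abstract describes, with three illustrative style features standing in for the paper's larger feature set (scikit-learn is used for concreteness; this is not the authors' code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def style_features(text):
    words = text.split()
    n = max(len(words), 1)
    return np.array([
        sum(len(w) for w in words) / n,      # mean word length
        len(set(words)) / n,                 # type-token ratio
        text.count(",") / n,                 # comma rate
    ])

def pair_features(known, unknown):
    """One row per verification problem: |feature difference| of the pair."""
    return np.abs(style_features(known) - style_features(unknown))

# X: stacked pair_features rows; y: 1 = same author, 0 = different author
clf = RandomForestClassifier(n_estimators=100, random_state=0)
```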
A Reconfigurable Low Power High Throughput Architecture for Deep Network Training
Title | A Reconfigurable Low Power High Throughput Architecture for Deep Network Training |
Authors | Raqibul Hasan, Tarek Taha |
Abstract | General-purpose computing systems are used for a large variety of applications, and the extensive support for flexibility in these systems limits their energy efficiency. Neural networks, including deep networks, are widely used for signal processing and pattern recognition applications. In this paper we propose a multicore architecture for deep-neural-network-based processing. Memristor crossbars are utilized to provide low-power, high-throughput execution of neural networks. The system has both training and recognition (evaluation of new inputs) capabilities. The proposed system could be used for classification, dimensionality reduction, feature extraction, and anomaly detection applications. The system-level area and power benefits of the specialized architecture are compared with the NVIDIA Tesla K20 GPGPU. Our experimental evaluations show that the proposed architecture can provide up to five orders of magnitude more energy efficiency than GPGPUs for deep neural network processing. |
Tasks | Anomaly Detection, Dimensionality Reduction |
Published | 2016-03-24 |
URL | http://arxiv.org/abs/1603.07400v2 |
http://arxiv.org/pdf/1603.07400v2.pdf | |
PWC | https://paperswithcode.com/paper/a-reconfigurable-low-power-high-throughput |
Repo | |
Framework | |
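The energy argument rests on crossbars computing matrix-vector products in one analog step via Ohm's and Kirchhoff's laws. A toy numerical model of a differential crossbar mapping, where the conductance range and scaling are illustrative assumptions rather than the paper's device parameters:

```python
import numpy as np

def crossbar_mvm(weights, x, g_min=1e-6, g_max=1e-4):
    """Map signed weights onto a differential pair of conductance matrices;
    column currents then realize y = weights @ x in a single analog step."""
    w_scale = max(np.abs(weights).max(), 1e-12)
    g_pos = g_min + (g_max - g_min) * np.clip(weights, 0, None) / w_scale
    g_neg = g_min + (g_max - g_min) * np.clip(-weights, 0, None) / w_scale
    currents = x @ (g_pos - g_neg).T            # Ohm + Kirchhoff, idealized
    return currents * w_scale / (g_max - g_min)
```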
Implementation of a FPGA-Based Feature Detection and Networking System for Real-time Traffic Monitoring
Title | Implementation of a FPGA-Based Feature Detection and Networking System for Real-time Traffic Monitoring |
Authors | Jieshi Chen, Benjamin Carrion Schafer, Ivan Wang-Hei Ho |
Abstract | With the growing demand for real-time traffic monitoring, software-based image processing can hardly meet the real-time data processing requirement due to its serial processing nature. In this paper, we present the implementation of a hardware-based feature detection and networking system prototype for real-time traffic monitoring and data transmission. The hardware architecture of the proposed system is composed of three main parts: data collection, feature detection, and data transmission. Overall, the presented prototype can sustain a data rate of about 60 frames per second. By integrating the feature detection and data transmission functions, the presented system can be further developed for various VANET application scenarios to improve road safety and traffic efficiency, for example, detecting vehicles that violate traffic rules or supporting parking enforcement. |
Tasks | |
Published | 2016-03-22 |
URL | http://arxiv.org/abs/1603.06669v1 |
http://arxiv.org/pdf/1603.06669v1.pdf | |
PWC | https://paperswithcode.com/paper/implementation-of-a-fpga-based-feature |
Repo | |
Framework | |
Fast Multiplier Methods to Optimize Non-exhaustive, Overlapping Clustering
Title | Fast Multiplier Methods to Optimize Non-exhaustive, Overlapping Clustering |
Authors | Yangyang Hou, Joyce Jiyoung Whang, David F. Gleich, Inderjit S. Dhillon |
Abstract | Clustering is one of the most fundamental and important tasks in data mining. Traditional clustering algorithms, such as K-means, assign every data point to exactly one cluster. However, in real-world datasets, the clusters may overlap with each other. Furthermore, often, there are outliers that should not belong to any cluster. We recently proposed the NEO-K-Means (Non-Exhaustive, Overlapping K-Means) objective as a way to address both issues in an integrated fashion. Optimizing this discrete objective is NP-hard, and even though there is a convex relaxation of the objective, straightforward convex optimization approaches are too expensive for large datasets. A practical alternative is to use a low-rank factorization of the solution matrix in the convex formulation. The resulting optimization problem is non-convex, and we can locally optimize the objective function using an augmented Lagrangian method. In this paper, we consider two fast multiplier methods to accelerate the convergence of an augmented Lagrangian scheme: a proximal method of multipliers and an alternating direction method of multipliers (ADMM). For the proximal augmented Lagrangian or proximal method of multipliers, we show a convergence result for the non-convex case with bound-constrained subproblems. These methods are up to 13 times faster—with no change in quality—compared with a standard augmented Lagrangian method on problems with over 10,000 variables and bring runtimes down from over an hour to around 5 minutes. |
Tasks | |
Published | 2016-02-05 |
URL | http://arxiv.org/abs/1602.01910v1 |
http://arxiv.org/pdf/1602.01910v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-multiplier-methods-to-optimize-non |
Repo | |
Framework | |
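The multiplier methods in question operate on a splitting of the relaxed objective. As a self-contained stand-in for the NEO-K-Means low-rank subproblem, here is the generic ADMM skeleton on a bound-constrained least-squares toy problem; the x-update / projection / multiplier-update structure is what the paper's schemes accelerate, while the problem instance is ours:

```python
import numpy as np

def admm_box_lsq(A, b, lo=0.0, hi=1.0, rho=1.0, iters=200):
    """min 0.5*||Ax - b||^2  s.t.  lo <= x <= hi, split as x = z."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    lhs = A.T @ A + rho * np.eye(n)             # formed once, reused
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(lhs, Atb + rho * (z - u))   # quadratic step
        z = np.clip(x + u, lo, hi)                      # projection step
        u += x - z                                      # multiplier update
    return z
```

The bound-constrained subproblem mirrors the convergence setting the paper analyzes for its proximal augmented Lagrangian variant.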
HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection
Title | HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection |
Authors | Tao Kong, Anbang Yao, Yurong Chen, Fuchun Sun |
Abstract | Almost all of the current top-performing object detection networks employ region proposals to guide the search for object instances. State-of-the-art region proposal methods usually need several thousand proposals to achieve high recall, which hurts detection efficiency. Although the latest Region Proposal Network method achieves promising detection accuracy with several hundred proposals, it still struggles with small objects and precise localization (e.g., at large IoU thresholds), mainly due to the coarseness of its feature maps. In this paper, we present a deep hierarchical network, HyperNet, that handles region proposal generation and object detection jointly. HyperNet is primarily based on an elaborately designed Hyper Feature, which aggregates hierarchical feature maps and then compresses them into a uniform space. The Hyper Features incorporate deep but highly semantic, intermediate but complementary, and shallow but naturally high-resolution features of the image, enabling us to construct HyperNet by sharing them between proposal generation and object detection via an end-to-end joint training strategy. With the deep VGG16 model, our method achieves leading recall and state-of-the-art object detection accuracy on PASCAL VOC 2007 and 2012 using only 100 proposals per image. It runs at 5 fps (including all steps) on a GPU, and thus has the potential for real-time processing. |
Tasks | Object Detection |
Published | 2016-04-03 |
URL | http://arxiv.org/abs/1604.00600v1 |
http://arxiv.org/pdf/1604.00600v1.pdf | |
PWC | https://paperswithcode.com/paper/hypernet-towards-accurate-region-proposal |
Repo | |
Framework | |
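A sketch of the Hyper Feature aggregation as the abstract describes it: pool the shallow map down, upsample the deep map, concatenate with the intermediate map, and compress into a uniform space. PyTorch-style; channel counts and the pooling/upsampling choices are illustrative, not the paper's exact VGG16 configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperFeature(nn.Module):
    """Aggregate shallow/intermediate/deep maps at one resolution, then
    compress the concatenation into a uniform space (1x1 convolution)."""
    def __init__(self, c_shallow=64, c_mid=256, c_deep=512, c_out=126):
        super().__init__()
        self.compress = nn.Conv2d(c_shallow + c_mid + c_deep, c_out, 1)

    def forward(self, f_shallow, f_mid, f_deep):
        size = f_mid.shape[-2:]                 # align to the intermediate map
        down = F.adaptive_max_pool2d(f_shallow, size)
        up = F.interpolate(f_deep, size=size, mode="bilinear",
                           align_corners=False)
        hyper = torch.cat([down, f_mid, up], dim=1)
        return F.relu(self.compress(hyper))     # shared by proposals + detection
```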