Paper Group ANR 771
Propensity score prediction for electronic healthcare databases using Super Learner and High-dimensional Propensity Score Methods
Title | Propensity score prediction for electronic healthcare databases using Super Learner and High-dimensional Propensity Score Methods |
Authors | Cheng Ju, Mary Combs, Samuel D Lendle, Jessica M Franklin, Richard Wyss, Sebastian Schneeweiss, Mark J. van der Laan |
Abstract | The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a “library” of candidate prediction models. The SL is not restricted to a single prediction model, but uses the strengths of a variety of learning algorithms to adapt to different databases. While the SL has been shown to perform well in a number of settings, it has not been thoroughly evaluated in large electronic healthcare databases that are common in pharmacoepidemiology and comparative effectiveness research. In this study, we applied and evaluated the performance of the SL in its ability to predict treatment assignment using three electronic healthcare databases. We considered a library of algorithms that consisted of both nonparametric and parametric models. We also considered a novel strategy for prediction modeling that combines the SL with the high-dimensional propensity score (hdPS) variable selection algorithm. Predictive performance was assessed using three metrics: the negative log-likelihood, area under the curve (AUC), and time complexity. Results showed that the best individual algorithm, in terms of predictive performance, varied across datasets. The SL was able to adapt to the given dataset and optimize predictive performance relative to any individual learner. Combining the SL with the hdPS was the most consistent prediction method and may be promising for PS estimation and prediction modeling in electronic healthcare databases. |
Tasks | |
Published | 2017-03-07 |
URL | http://arxiv.org/abs/1703.02236v2 |
http://arxiv.org/pdf/1703.02236v2.pdf | |
PWC | https://paperswithcode.com/paper/propensity-score-prediction-for-electronic |
Repo | |
Framework | |
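The cross-validated selection step described in the abstract above can be sketched as a minimal "discrete" Super Learner: each candidate is scored by its cross-validated negative log-likelihood and the one with the lowest risk is chosen. The two candidate learners, the toy data, and every function name here are hypothetical illustrations. The paper's actual library and the hdPS variable-selection step are not reproduced.

```python
import math

def fit_marginal(X, y):
    """Candidate 1 (hypothetical): ignore covariates, predict the overall treatment rate."""
    p = (sum(y) + 1) / (len(y) + 2)  # Laplace smoothing keeps p strictly inside (0, 1)
    return lambda x: p

def fit_stratified(X, y):
    """Candidate 2 (hypothetical): treatment rate within each level of the first covariate."""
    counts = {}
    for x, t in zip(X, y):
        n, s = counts.get(x[0], (0, 0))
        counts[x[0]] = (n + 1, s + t)
    overall = (sum(y) + 1) / (len(y) + 2)

    def predict(x):
        if x[0] not in counts:
            return overall  # fall back to the marginal rate for unseen levels
        n, s = counts[x[0]]
        return (s + 1) / (n + 2)

    return predict

def neg_log_lik(y, p):
    """Average Bernoulli negative log-likelihood of predictions p for labels y."""
    return -sum(t * math.log(q) + (1 - t) * math.log(1 - q) for t, q in zip(y, p)) / len(y)

def cv_risk(fit, X, y, k=5):
    """k-fold cross-validated risk; observation i is assigned to fold i % k."""
    total = 0.0
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        valid = [i for i in range(len(y)) if i % k == fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [model(X[i]) for i in valid]
        total += neg_log_lik([y[i] for i in valid], preds)
    return total / k

def discrete_super_learner(library, X, y, k=5):
    """Return the name of the candidate with the lowest cross-validated risk, plus all risks."""
    risks = {name: cv_risk(fit, X, y, k) for name, fit in library.items()}
    best = min(risks, key=risks.get)
    return best, risks

# Toy data: treatment depends strongly on one binary covariate, with a few exceptions.
X = [(i % 2,) for i in range(40)]
y = [1 if (i % 2 == 1 or i % 8 == 0) else 0 for i in range(40)]
library = {"marginal": fit_marginal, "stratified": fit_stratified}
best, risks = discrete_super_learner(library, X, y)
print(best, risks)
```

A full Super Learner would typically go one step further and fit a weighted combination of the candidates' cross-validated predictions. The selection-only variant is used here because it is the part the abstract states explicitly.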
On the Consistency of Graph-based Bayesian Learning and the Scalability of Sampling Algorithms
Title | On the Consistency of Graph-based Bayesian Learning and the Scalability of Sampling Algorithms |
Authors | Nicolas Garcia Trillos, Zachary Kaplan, Thabo Samakhoana, Daniel Sanz-Alonso |
Abstract | A popular approach to semi-supervised learning proceeds by endowing the input data with a graph structure in order to extract geometric information and incorporate it into a Bayesian framework. We introduce new theory that gives appropriate scalings of graph parameters that provably lead to a well-defined limiting posterior as the size of the unlabeled data set grows. Furthermore, we show that these consistency results have profound algorithmic implications. When consistency holds, carefully designed graph-based Markov chain Monte Carlo algorithms are proved to have a uniform spectral gap, independent of the number of unlabeled inputs. Several numerical experiments corroborate both the statistical consistency and the algorithmic scalability established by the theory. |
Tasks | |
Published | 2017-10-20 |
URL | https://arxiv.org/abs/1710.07702v2 |
https://arxiv.org/pdf/1710.07702v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-consistency-of-graph-based-bayesian |
Repo | |
Framework | |
Actively Learning what makes a Discrete Sequence Valid
Title | Actively Learning what makes a Discrete Sequence Valid |
Authors | David Janz, Jos van der Westhuizen, José Miguel Hernández-Lobato |
Abstract | Deep learning techniques have been hugely successful for traditional supervised and unsupervised machine learning problems. In large part, these techniques solve continuous optimization problems. Recently however, discrete generative deep learning models have been successfully used to efficiently search high-dimensional discrete spaces. These methods work by representing discrete objects as sequences, for which powerful sequence-based deep models can be employed. Unfortunately, these techniques are significantly hindered by the fact that these generative models often produce invalid sequences. As a step towards solving this problem, we propose to learn a deep recurrent validator model. Given a partial sequence, our model learns the probability of that sequence occurring as the beginning of a full valid sequence. Thus this identifies valid versus invalid sequences and crucially it also provides insight about how individual sequence elements influence the validity of discrete objects. To learn this model we propose an approach inspired by seminal work in Bayesian active learning. On a synthetic dataset, we demonstrate the ability of our model to distinguish valid and invalid sequences. We believe this is a key step toward learning generative models that faithfully produce valid discrete objects. |
Tasks | Active Learning |
Published | 2017-08-15 |
URL | http://arxiv.org/abs/1708.04465v1 |
http://arxiv.org/pdf/1708.04465v1.pdf | |
PWC | https://paperswithcode.com/paper/actively-learning-what-makes-a-discrete |
Repo | |
Framework | |
Vision-based Detection of Acoustic Timed Events: a Case Study on Clarinet Note Onsets
Title | Vision-based Detection of Acoustic Timed Events: a Case Study on Clarinet Note Onsets |
Authors | A. Bazzica, J. C. van Gemert, C. C. S. Liem, A. Hanjalic |
Abstract | Acoustic events often have a visual counterpart. Knowledge of visual information can aid the understanding of complex auditory scenes, even when only a stereo mixdown is available in the audio domain, e.g., identifying which musicians are playing in large musical ensembles. In this paper, we consider a vision-based approach to note onset detection. As a case study we focus on challenging, real-world clarinetist videos and carry out preliminary experiments on a 3D convolutional neural network based on multiple streams and purposely avoiding temporal pooling. We release an audiovisual dataset with 4.5 hours of clarinetist videos together with cleaned annotations which include about 36,000 onsets and the coordinates for a number of salient points and regions of interest. By performing several training trials on our dataset, we learned that the problem is challenging. We found that the CNN model is highly sensitive to the optimization algorithm and hyper-parameters, and that treating the problem as binary classification may prevent the joint optimization of precision and recall. To encourage further research, we publicly share our dataset, annotations and all models and detail which issues we came across during our preliminary experiments. |
Tasks | |
Published | 2017-06-29 |
URL | http://arxiv.org/abs/1706.09556v1 |
http://arxiv.org/pdf/1706.09556v1.pdf | |
PWC | https://paperswithcode.com/paper/vision-based-detection-of-acoustic-timed |
Repo | |
Framework | |
Let Features Decide for Themselves: Feature Mask Network for Person Re-identification
Title | Let Features Decide for Themselves: Feature Mask Network for Person Re-identification |
Authors | Guodong Ding, Salman Khan, Zhenmin Tang, Fatih Porikli |
Abstract | Person re-identification aims at establishing the identity of a pedestrian from a gallery that contains images of multiple people obtained from a multi-camera system. Many challenges such as occlusions, drastic lighting and pose variations across the camera views, indiscriminate visual appearances, cluttered backgrounds, imperfect detections, motion blur, and noise make this task highly challenging. While most approaches focus on learning features and metrics to derive better representations, we hypothesize that both local and global contextual cues are crucial for an accurate identity matching. To this end, we propose a Feature Mask Network (FMN) that takes advantage of ResNet high-level features to predict a feature map mask and then imposes it on the low-level features to dynamically reweight different object parts for a locally aware feature representation. This serves as an effective attention mechanism by allowing the network to focus on local details selectively. Given the resemblance of person re-identification with classification and retrieval tasks, we frame the network training as a multi-task objective optimization, which further improves the learned feature descriptions. We conduct experiments on Market-1501, DukeMTMC-reID and CUHK03 datasets, where the proposed approach respectively achieves significant improvements of 5.3%, 9.1% and 10.7% in mAP measure relative to the state-of-the-art. |
Tasks | Person Re-Identification |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07155v1 |
http://arxiv.org/pdf/1711.07155v1.pdf | |
PWC | https://paperswithcode.com/paper/let-features-decide-for-themselves-feature |
Repo | |
Framework | |
Anticipating Daily Intention using On-Wrist Motion Triggered Sensing
Title | Anticipating Daily Intention using On-Wrist Motion Triggered Sensing |
Authors | Tz-Ying Wu, Ting-An Chien, Cheng-Sheng Chan, Chan-Wei Hu, Min Sun |
Abstract | Anticipating human intention by observing one’s actions has many applications. For instance, picking up a cellphone, then a charger (actions) implies that one wants to charge the cellphone (intention). By anticipating the intention, an intelligent system can guide the user to the closest power outlet. We propose an on-wrist motion triggered sensing system for anticipating daily intentions, where the on-wrist sensors help us to persistently observe one’s actions. The core of the system is a novel Recurrent Neural Network (RNN) and Policy Network (PN), where the RNN encodes visual and motion observation to anticipate intention, and the PN parsimoniously triggers the process of visual observation to reduce computation requirement. We jointly trained the whole network using policy gradient and cross-entropy loss. To evaluate, we collect the first daily “intention” dataset consisting of 2379 videos with 34 intentions and 164 unique action sequences. Our method achieves 92.68%, 90.85%, 97.56% accuracy on three users while processing only 29% of the visual observation on average. |
Tasks | |
Published | 2017-10-20 |
URL | http://arxiv.org/abs/1710.07477v1 |
http://arxiv.org/pdf/1710.07477v1.pdf | |
PWC | https://paperswithcode.com/paper/anticipating-daily-intention-using-on-wrist |
Repo | |
Framework | |
Director Field Analysis (DFA): Exploring Local White Matter Geometric Structure in diffusion MRI
Title | Director Field Analysis (DFA): Exploring Local White Matter Geometric Structure in diffusion MRI |
Authors | Jian Cheng, Peter J. Basser |
Abstract | In Diffusion Tensor Imaging (DTI) or High Angular Resolution Diffusion Imaging (HARDI), a tensor field or a spherical function field (e.g., an orientation distribution function field), can be estimated from measured diffusion weighted images. In this paper, inspired by the microscopic theoretical treatment of phases in liquid crystals, we introduce a novel mathematical framework, called Director Field Analysis (DFA), to study local geometric structural information of white matter based on the reconstructed tensor field or spherical function field: 1) We propose a set of mathematical tools to process general director data, which consists of dyadic tensors that have orientations but no direction. 2) We propose Orientational Order (OO) and Orientational Dispersion (OD) indices to describe the degree of alignment and dispersion of a spherical function in a single voxel or in a region, respectively; 3) We also show how to construct a local orthogonal coordinate frame in each voxel exhibiting anisotropic diffusion; 4) Finally, we define three indices to describe three types of orientational distortion (splay, bend, and twist) in a local spatial neighborhood, and a total distortion index to describe distortions of all three types. To our knowledge, this is the first work to quantitatively describe orientational distortion (splay, bend, and twist) in general spherical function fields from DTI or HARDI data. The proposed DFA and its related mathematical tools can be used to process not only diffusion MRI data but also general director field data, and the proposed scalar indices are useful for detecting local geometric changes of white matter for voxel-based or tract-based analysis in both DTI and HARDI acquisitions. The related codes and a tutorial for DFA will be released in DMRITool. |
Tasks | |
Published | 2017-06-06 |
URL | http://arxiv.org/abs/1706.01862v2 |
http://arxiv.org/pdf/1706.01862v2.pdf | |
PWC | https://paperswithcode.com/paper/director-field-analysis-dfa-exploring-local |
Repo | |
Framework | |
Minimax Lower Bounds for Ridge Combinations Including Neural Nets
Title | Minimax Lower Bounds for Ridge Combinations Including Neural Nets |
Authors | Jason M. Klusowski, Andrew R. Barron |
Abstract | Estimation of functions of $ d $ variables is considered using ridge combinations of the form $ \textstyle\sum_{k=1}^m c_{1,k} \phi(\textstyle\sum_{j=1}^d c_{0,j,k}x_j-b_k) $ where the activation function $ \phi $ is a function with bounded value and derivative. These include single-hidden layer neural networks, polynomials, and sinusoidal models. From a sample of size $ n $ of possibly noisy values at random sites $ X \in B = [-1,1]^d $, the minimax mean square error is examined for functions in the closure of the $ \ell_1 $ hull of ridge functions with activation $ \phi $. It is shown to be of order $ d/n $ to a fractional power (when $ d $ is of smaller order than $ n $), and to be of order $ (\log d)/n $ to a fractional power (when $ d $ is of larger order than $ n $). Dependence on constraints $ v_0 $ and $ v_1 $ on the $ \ell_1 $ norms of inner parameter $ c_0 $ and outer parameter $ c_1 $, respectively, is also examined. Also, lower and upper bounds on the fractional power are given. The heart of the analysis is development of information-theoretic packing numbers for these classes of functions. |
Tasks | |
Published | 2017-02-09 |
URL | http://arxiv.org/abs/1702.02828v1 |
http://arxiv.org/pdf/1702.02828v1.pdf | |
PWC | https://paperswithcode.com/paper/minimax-lower-bounds-for-ridge-combinations |
Repo | |
Framework | |
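To make the ridge-combination form in the abstract above concrete, the following minimal sketch evaluates $ \sum_{k=1}^m c_{1,k} \phi(\sum_{j=1}^d c_{0,j,k} x_j - b_k) $ for given parameters. The function names, the choice of `tanh` as a bounded activation, the storage layout `c0[k][j]`, and the example values are assumptions made for illustration, not taken from the paper.

```python
import math

def ridge_combination(x, c1, c0, b, phi=math.tanh):
    """Evaluate sum_k c1[k] * phi(sum_j c0[k][j] * x[j] - b[k]).

    c0[k] holds the inner weights of the k-th ridge function (assumed layout).
    """
    total = 0.0
    for outer, inner, shift in zip(c1, c0, b):
        projection = sum(w * xj for w, xj in zip(inner, x))  # inner product with x
        total += outer * phi(projection - shift)
    return total

def l1_norm(values):
    """l1 norm, the quantity constrained by v_0 (inner) and v_1 (outer) in the abstract."""
    return sum(abs(v) for v in values)

# Example with d = 2 inputs and m = 2 ridge functions (hypothetical values).
x = [1.0, 0.0]
c1 = [1.0, -1.0]
c0 = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
value = ridge_combination(x, c1, c0, b)  # tanh(1) - tanh(0)
print(value, l1_norm(c1))
```

With a sigmoidal `phi` this is a single-hidden-layer network with `m` units. Swapping in a sinusoid or a polynomial for `phi` gives the other model families the abstract mentions.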
From Characters to Words to in Between: Do We Capture Morphology?
Title | From Characters to Words to in Between: Do We Capture Morphology? |
Authors | Clara Vania, Adam Lopez |
Abstract | Words can be represented by composing the representations of subword units such as word segments, characters, and/or character n-grams. While such representations are effective and may capture the morphological regularities of words, they have not been systematically compared, and it is not understood how they interact with different morphological typologies. On a language modeling task, we present experiments that systematically vary (1) the basic unit of representation, (2) the composition of these representations, and (3) the morphological typology of the language modeled. Our results extend previous findings that character representations are effective across typologies, and we find that a previously unstudied combination of character trigram representations composed with bi-LSTMs outperforms most others. But we also find room for improvement: none of the character-level models match the predictive accuracy of a model with access to true morphological analyses, even when learned from an order of magnitude more data. |
Tasks | Language Modelling |
Published | 2017-04-26 |
URL | http://arxiv.org/abs/1704.08352v1 |
http://arxiv.org/pdf/1704.08352v1.pdf | |
PWC | https://paperswithcode.com/paper/from-characters-to-words-to-in-between-do-we |
Repo | |
Framework | |
Automated flow for compressing convolution neural networks for efficient edge-computation with FPGA
Title | Automated flow for compressing convolution neural networks for efficient edge-computation with FPGA |
Authors | Farhan Shafiq, Takato Yamada, Antonio T. Vilchez, Sakyasingha Dasgupta |
Abstract | Deep convolutional neural networks (CNN) based solutions are the current state-of-the-art for computer vision tasks. Due to the large size of these models, they are typically run on clusters of CPUs or GPUs. However, power requirements and cost budgets can be a major hindrance in adoption of CNN for IoT applications. Recent research highlights that CNN contain significant redundancy in their structure and can be quantized to lower bit-width parameters and activations, while maintaining acceptable accuracy. Low bit-width and especially single bit-width (binary) CNN are particularly suitable for mobile applications based on FPGA implementation, due to the bitwise logic operations involved in binarized CNN. Moreover, the transition to lower bit-widths opens new avenues for performance optimizations and model improvement. In this paper, we present an automatic flow from trained TensorFlow models to FPGA system on chip implementation of binarized CNN. This flow involves quantization of model parameters and activations, generation of network and model in embedded-C, followed by automatic generation of the FPGA accelerator for binary convolutions. The automated flow is demonstrated through implementation of binarized “YOLOV2” on the low cost, low power Cyclone-V FPGA device. Experiments on object detection using binarized YOLOV2 demonstrate significant performance benefit in terms of model size and inference speed on FPGA as compared to CPU and mobile CPU platforms. Furthermore, the entire automated flow from trained models to FPGA synthesis can be completed within one hour. |
Tasks | Object Detection, Quantization |
Published | 2017-12-18 |
URL | http://arxiv.org/abs/1712.06272v1 |
http://arxiv.org/pdf/1712.06272v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-flow-for-compressing-convolution |
Repo | |
Framework | |
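The abstract above attributes the FPGA suitability of binarized CNNs to bitwise logic operations. A common way to see this is the XNOR-popcount identity for dot products of +1/-1 vectors, sketched below. This is a generic illustration, not the paper's generated accelerator code, and the helper names are hypothetical.

```python
def pack_signs(values):
    """Encode a +1/-1 vector as an integer bit mask (bit i is set when values[i] == +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two +1/-1 vectors of length n using XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: 1 wherever the two signs match
    matches = bin(agree).count("1")    # popcount
    # Each match contributes +1 and each mismatch -1, so dot = matches - (n - matches).
    return 2 * matches - n

a = [1, -1, 1, 1, -1]
b = [1, 1, -1, 1, -1]
fast = binary_dot(pack_signs(a), pack_signs(b), len(a))
slow = sum(x * y for x, y in zip(a, b))  # reference computation with multiplications
print(fast, slow)
```

Because the multiply-accumulate collapses to an XOR, a bit inversion, and a bit count, the operation maps naturally onto FPGA lookup tables instead of multiplier blocks.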
Low Resourced Machine Translation via Morpho-syntactic Modeling: The Case of Dialectal Arabic
Title | Low Resourced Machine Translation via Morpho-syntactic Modeling: The Case of Dialectal Arabic |
Authors | Alexander Erdmann, Nizar Habash, Dima Taji, Houda Bouamor |
Abstract | We present the second ever evaluated Arabic dialect-to-dialect machine translation effort, and the first to leverage external resources beyond a small parallel corpus. The subject has not previously received serious attention due to lack of naturally occurring parallel data; yet its importance is evidenced by dialectal Arabic’s wide usage and breadth of inter-dialect variation, comparable to that of Romance languages. Our results suggest that modeling morphology and syntax significantly improves dialect-to-dialect translation, though optimizing such data-sparse models requires consideration of the linguistic differences between dialects and the nature of available data and resources. On a single-reference blind test set where untranslated input scores 6.5 BLEU and a model trained only on parallel data reaches 14.6, pivot techniques and morphosyntactic modeling significantly improve performance to 17.5. |
Tasks | Machine Translation |
Published | 2017-12-18 |
URL | http://arxiv.org/abs/1712.06273v1 |
http://arxiv.org/pdf/1712.06273v1.pdf | |
PWC | https://paperswithcode.com/paper/low-resourced-machine-translation-via-morpho |
Repo | |
Framework | |
Toward Inverse Control of Physics-Based Sound Synthesis
Title | Toward Inverse Control of Physics-Based Sound Synthesis |
Authors | A. Pfalz, E. Berdahl |
Abstract | Long Short-Term Memory networks (LSTMs) can be trained to realize inverse control of physics-based sound synthesizers. Physics-based sound synthesizers simulate the laws of physics to produce output sound according to input gesture signals. When a user’s gestures are measured in real time, she or he can use them to control physics-based sound synthesizers, thereby creating simulated virtual instruments. An intriguing question is how to program a computer to learn to play such physics-based models. This work demonstrates that LSTMs can be trained to accomplish this inverse control task with four physics-based sound synthesizers. |
Tasks | |
Published | 2017-06-29 |
URL | http://arxiv.org/abs/1706.09551v1 |
http://arxiv.org/pdf/1706.09551v1.pdf | |
PWC | https://paperswithcode.com/paper/toward-inverse-control-of-physics-based-sound |
Repo | |
Framework | |
Curvature-aided Incremental Aggregated Gradient Method
Title | Curvature-aided Incremental Aggregated Gradient Method |
Authors | Hoi-To Wai, Wei Shi, Angelia Nedic, Anna Scaglione |
Abstract | We propose a new algorithm for finite sum optimization which we call the curvature-aided incremental aggregated gradient (CIAG) method. Motivated by the problem of training a classifier for a $d$-dimensional problem, where the number of training data is $m$ and $m \gg d \gg 1$, the CIAG method seeks to accelerate incremental aggregated gradient (IAG) methods using aids from the curvature (or Hessian) information, while avoiding the evaluation of matrix inverses required by the incremental Newton (IN) method. Specifically, our idea is to exploit the incrementally aggregated Hessian matrix to trace the full gradient vector at every incremental step, therefore achieving an improved linear convergence rate over the state-of-the-art IAG methods. For strongly convex problems, the fast linear convergence rate requires the objective function to be close to quadratic, or the initial point to be close to the optimal solution. Importantly, we show that running one iteration of the CIAG method yields the same improvement to the optimality gap as running one iteration of the full gradient method, while the complexity is $O(d^2)$ for CIAG and $O(md)$ for the full gradient. Overall, the CIAG method strikes a balance between the high computational complexity of incremental Newton-type methods and the slow convergence of the IAG method. Our numerical results support the theoretical findings and show that the CIAG method often converges with far fewer iterations than IAG, and requires much shorter running time than IN when the problem dimension is high. |
Tasks | |
Published | 2017-10-24 |
URL | http://arxiv.org/abs/1710.08936v1 |
http://arxiv.org/pdf/1710.08936v1.pdf | |
PWC | https://paperswithcode.com/paper/curvature-aided-incremental-aggregated |
Repo | |
Framework | |
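One reading of the idea in the abstract above is that each stale component gradient is corrected with a first-order Taylor term built from its stored Hessian, so that the aggregated quantity tracks the full gradient at the current iterate. The sketch below implements that reading in one dimension (so each Hessian is a scalar). The function names, the cyclic component order, and the quadratic test problem are assumptions, and the details may differ from the authors' algorithm.

```python
def ciag_1d(grads, hessians, theta0, step, iters):
    """Curvature-aided incremental aggregated gradient, 1-D sketch.

    Gradient estimate at theta: sum_i [g_i(s_i) + h_i(s_i) * (theta - s_i)] = b + H * theta,
    where s_i is the iterate at which component i was last refreshed.
    """
    m = len(grads)
    stale = [theta0] * m
    b = sum(g(theta0) - h(theta0) * theta0 for g, h in zip(grads, hessians))
    H = sum(h(theta0) for h in hessians)
    theta = theta0
    path = [theta]
    for k in range(iters):
        i = k % m  # refresh one component per iteration (cyclic order is an assumption)
        old = stale[i]
        b += (grads[i](theta) - hessians[i](theta) * theta) - (grads[i](old) - hessians[i](old) * old)
        H += hessians[i](theta) - hessians[i](old)
        stale[i] = theta
        theta = theta - step * (b + H * theta)
        path.append(theta)
    return path

def full_gradient_descent(grads, theta0, step, iters):
    """Reference method: evaluates all m component gradients at every iteration."""
    theta = theta0
    path = [theta]
    for _ in range(iters):
        theta = theta - step * sum(g(theta) for g in grads)
        path.append(theta)
    return path

# Hypothetical test problem: f_i(theta) = 0.5 * a_i * (theta - c_i)^2.
a = [1.0, 2.0, 3.0]
c = [0.0, 1.0, 2.0]
grads = [lambda t, ai=ai, ci=ci: ai * (t - ci) for ai, ci in zip(a, c)]
hessians = [lambda t, ai=ai: ai for ai in a]
ciag_path = ciag_1d(grads, hessians, theta0=0.0, step=0.1, iters=50)
gd_path = full_gradient_descent(grads, theta0=0.0, step=0.1, iters=50)
print(ciag_path[-1], gd_path[-1])
```

On quadratic components the Taylor correction is exact, so the sketch follows the full-gradient trajectory while touching only one component per iteration. For non-quadratic objectives the correction is only approximate, which is consistent with the abstract's condition that the objective be close to quadratic or the starting point close to the optimum.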
Atari games and Intel processors
Title | Atari games and Intel processors |
Authors | Robert Adamski, Tomasz Grel, Maciej Klimek, Henryk Michalewski |
Abstract | The asynchronous nature of state-of-the-art reinforcement learning algorithms, such as the Asynchronous Advantage Actor-Critic algorithm, makes them exceptionally suitable for CPU computations. However, because deep reinforcement learning often deals with interpreting visual information, a large part of the training and inference time is spent performing convolutions. In this work we present our results on learning strategies in Atari games using a Convolutional Neural Network, the Math Kernel Library and the TensorFlow 0.11rc0 machine learning framework. We also analyze the effects of asynchronous computations on the convergence of reinforcement learning algorithms. |
Tasks | Atari Games |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.06936v1 |
http://arxiv.org/pdf/1705.06936v1.pdf | |
PWC | https://paperswithcode.com/paper/atari-games-and-intel-processors |
Repo | |
Framework | |
Deep Neural Network Approximation using Tensor Sketching
Title | Deep Neural Network Approximation using Tensor Sketching |
Authors | Shiva Prasad Kasiviswanathan, Nina Narodytska, Hongxia Jin |
Abstract | Deep neural networks are powerful learning models that achieve state-of-the-art performance on many computer vision, speech, and language processing tasks. In this paper, we study a fundamental question that arises when designing deep network architectures: Given a target network architecture can we design a smaller network architecture that approximates the operation of the target network? The question is, in part, motivated by the challenge of parameter reduction (compression) in modern deep neural networks, as the ever increasing storage and memory requirements of these networks pose a problem in resource constrained environments. In this work, we focus on deep convolutional neural network architectures, and propose a novel randomized tensor sketching technique that we utilize to develop a unified framework for approximating the operation of both the convolutional and fully connected layers. By applying the sketching technique along different tensor dimensions, we design changes to the convolutional and fully connected layers that substantially reduce the number of effective parameters in a network. We show that the resulting smaller network can be trained directly, and has a classification accuracy that is comparable to the original network. |
Tasks | |
Published | 2017-10-21 |
URL | http://arxiv.org/abs/1710.07850v1 |
http://arxiv.org/pdf/1710.07850v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-network-approximation-using |
Repo | |
Framework | |