Paper Group ANR 418
A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA. GPU Activity Prediction using Representation Learning. Orthogonal Machine Learning for Demand Estimation: High Dimensional Causal Inference in Dynamic Panels. A Novel Approach to Forecasting Financial Volatility with Gaussian Process Envelopes. Limitations of …
A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA
Title | A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA |
Authors | Kamel Abdelouahab, Cedric Bourrasset, Maxime Pelcat, François Berry, Jean-Charles Quinton, Jocelyn Serot |
Abstract | Deep Neural Networks are becoming the de-facto standard models for image understanding, and more generally for computer vision tasks. As they involve highly parallelizable computations, CNNs are well suited to current fine-grained programmable logic devices. Thus, multiple CNN accelerators have been successfully implemented on FPGAs. Unfortunately, FPGA resources such as logic elements or DSP units remain limited. This work presents a holistic method relying on approximate computing and design space exploration to optimize the DSP block utilization of a CNN implementation on an FPGA. This method was tested when implementing a reconfigurable OCR convolutional neural network on an Altera Stratix V device, varying both data representation and CNN topology in order to find the best combination in terms of DSP block utilization and classification accuracy. This exploration generated dataflow architectures of 76 CNN topologies with 5 different fixed-point representations. The most efficient implementation performs 883 classifications/sec at 256 x 256 resolution using 8% of the available DSP blocks. |
Tasks | Optical Character Recognition |
Published | 2017-03-21 |
URL | http://arxiv.org/abs/1703.09779v1 |
http://arxiv.org/pdf/1703.09779v1.pdf | |
PWC | https://paperswithcode.com/paper/a-holistic-approach-for-optimizing-dsp-block |
Repo | |
Framework | |
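The exploration described above trades fixed-point word length against classification accuracy to save DSP blocks. Below is a minimal sketch of such a precision sweep, assuming a generic dictionary of pretrained layer weights and a placeholder `evaluate` function; it is not the authors' FPGA toolflow.

```python
import numpy as np

def to_fixed_point(w, total_bits=8, frac_bits=4):
    """Quantize an array to signed fixed-point with the given word length."""
    scale = 2 ** frac_bits
    lo = -(2 ** (total_bits - 1)) / scale
    hi = (2 ** (total_bits - 1) - 1) / scale
    return np.clip(np.round(w * scale) / scale, lo, hi)

def explore(weights, evaluate, word_lengths=(16, 12, 10, 8, 6)):
    """Hypothetical exploration loop: sweep word lengths, record accuracy of the quantized model."""
    results = {}
    for bits in word_lengths:
        frac = bits // 2  # naive split between integer and fractional bits
        q = {name: to_fixed_point(w, bits, frac) for name, w in weights.items()}
        results[bits] = evaluate(q)  # evaluate() is a placeholder for accuracy measurement
    return results
```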
GPU Activity Prediction using Representation Learning
Title | GPU Activity Prediction using Representation Learning |
Authors | Aswin Raghavan, Mohamed Amer, Timothy Shields, David Zhang, Sek Chai |
Abstract | GPU activity prediction is an important and complex problem. This is due to the high level of contention among thousands of parallel threads. This problem was mostly addressed using heuristics. We propose a representation learning approach to address this problem. We model any performance metric as a temporal function of the executed instructions with the intuition that the flow of instructions can be identified as distinct activities of the code. Our experiments show high accuracy and non-trivial predictive power of representation learning on a benchmark. |
Tasks | Activity Prediction, Representation Learning |
Published | 2017-03-27 |
URL | http://arxiv.org/abs/1703.09146v1 |
http://arxiv.org/pdf/1703.09146v1.pdf | |
PWC | https://paperswithcode.com/paper/gpu-activity-prediction-using-representation |
Repo | |
Framework | |
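The abstract above models a performance metric as a temporal function of the executed instructions. A minimal sketch of that framing, assuming windowed instruction-mix counts as input and a small 1-D convolutional regressor; the paper's actual features and architecture are not specified here.

```python
import torch
import torch.nn as nn

# Assumed input: windows of per-cycle instruction-mix counts,
# shape (batch, n_instruction_types, window_length).
class ActivityRegressor(nn.Module):
    def __init__(self, n_instruction_types=8, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(            # learned representation of the instruction flow
            nn.Conv1d(n_instruction_types, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(hidden, 1)         # predicted performance metric for the window

    def forward(self, x):
        z = self.encoder(x).squeeze(-1)
        return self.head(z).squeeze(-1)

model = ActivityRegressor()
dummy = torch.randn(4, 8, 64)                    # 4 windows, 8 instruction types, 64 time steps
print(model(dummy).shape)                        # torch.Size([4])
```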
Orthogonal Machine Learning for Demand Estimation: High Dimensional Causal Inference in Dynamic Panels
Title | Orthogonal Machine Learning for Demand Estimation: High Dimensional Causal Inference in Dynamic Panels |
Authors | Victor Chernozhukov, Matt Goldman, Vira Semenova, Matt Taddy |
Abstract | There has been growing interest in how economists can import machine learning tools designed for prediction to accelerate and automate the model selection process, while still retaining desirable inference properties for causal parameters. Focusing on partially linear models, we extend the Double ML framework to allow for (1) a number of treatments that may grow with the sample size and (2) the analysis of panel data under sequentially exogenous errors. Our low-dimensional treatment (LD) regime directly extends the work in [Chernozhukov et al., 2016] by showing that the coefficients from a second-stage ordinary least squares estimator attain root-n convergence and desired coverage even if the dimensionality of treatment is allowed to grow. In a high-dimensional sparse (HDS) regime, we show that second-stage LASSO and debiased LASSO have asymptotic properties equivalent to oracle estimators with no upstream error. We argue that these advances make Double ML methods a desirable alternative for practitioners estimating short-term demand elasticities in non-contractual settings. |
Tasks | Causal Inference, Model Selection |
Published | 2017-12-28 |
URL | http://arxiv.org/abs/1712.09988v2 |
http://arxiv.org/pdf/1712.09988v2.pdf | |
PWC | https://paperswithcode.com/paper/orthogonal-machine-learning-for-demand |
Repo | |
Framework | |
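The partially linear Double ML recipe extended above reduces to residual-on-residual regression: predict the outcome and the treatment from the controls with flexible learners, then run OLS of the outcome residuals on the treatment residuals. A minimal cross-fitted sketch with scikit-learn on synthetic data; the dynamic-panel and high-dimensional extensions of the paper are not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

def double_ml_plm(y, d, x, n_splits=5):
    """Cross-fitted partially linear model: y = d*theta + g(x) + e, d = m(x) + v."""
    y_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), x, y, cv=n_splits)
    d_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), x, d, cv=n_splits)
    y_res, d_res = y - y_hat, d - d_hat
    # Second-stage OLS of outcome residuals on treatment residuals.
    theta = LinearRegression(fit_intercept=False).fit(d_res.reshape(-1, 1), y_res)
    return theta.coef_[0]

# Toy check: the true treatment effect is 0.5.
rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 5))
d = x[:, 0] + rng.normal(size=2000)
y = 0.5 * d + np.sin(x[:, 1]) + rng.normal(size=2000)
print(double_ml_plm(y, d, x))
```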
A Novel Approach to Forecasting Financial Volatility with Gaussian Process Envelopes
Title | A Novel Approach to Forecasting Financial Volatility with Gaussian Process Envelopes |
Authors | Syed Ali Asad Rizvi, Stephen J. Roberts, Michael A. Osborne, Favour Nyikosa |
Abstract | In this paper we use Gaussian Process (GP) regression to propose a novel approach for predicting volatility of financial returns by forecasting the envelopes of the time series. We provide a direct comparison of their performance to traditional approaches such as GARCH. We compare the forecasting power of three approaches: GP regression on the absolute and squared returns; regression on the envelope of the returns and the absolute returns; and regression on the envelope of the negative and positive returns separately. We use a maximum a posteriori estimate with a Gaussian prior to determine our hyperparameters. We also test the effect of hyperparameter updating at each forecasting step. We use our approaches to forecast out-of-sample volatility of four currency pairs over a 2-year period, at half-hourly intervals. From three kernels, we select the kernel giving the best performance for our data. We use two published accuracy measures and four statistical loss functions to evaluate the forecasting ability of GARCH vs. GPs. In terms of mean squared error, the GPs perform 20% better than a random walk model, and 50% better than GARCH for the same data. |
Tasks | Time Series |
Published | 2017-05-02 |
URL | http://arxiv.org/abs/1705.00891v1 |
http://arxiv.org/pdf/1705.00891v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-approach-to-forecasting-financial |
Repo | |
Framework | |
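A minimal sketch of the envelope idea above: build the upper envelope of the absolute returns by interpolating through local peaks, then fit a GP to it and forecast the held-out tail. The toy return series, kernel choice, and hyperparameter handling (MAP with a Gaussian prior in the paper) are all simplifications.

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
returns = rng.standard_normal(500) * (1 + 0.5 * np.sin(np.arange(500) / 50))  # toy returns

# Upper envelope of the absolute returns: interpolate through local peaks.
peaks, _ = find_peaks(np.abs(returns))
t = np.arange(len(returns))
envelope = np.interp(t, peaks, np.abs(returns)[peaks])

# GP regression on the envelope; forecast the last 50 points out-of-sample.
gp = GaussianProcessRegressor(RBF(length_scale=20.0) + WhiteKernel(), normalize_y=True)
gp.fit(t[:-50, None], envelope[:-50])
vol_forecast, vol_std = gp.predict(t[-50:, None], return_std=True)
```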
Limitations of Cross-Lingual Learning from Image Search
Title | Limitations of Cross-Lingual Learning from Image Search |
Authors | Mareike Hartmann, Anders Soegaard |
Abstract | Cross-lingual representation learning is an important step in making NLP scale to all the world’s languages. Recent work on bilingual lexicon induction suggests that it is possible to learn cross-lingual representations of words based on similarities between images associated with these words. However, that work focused on the translation of selected nouns only. In our work, we investigate whether the meaning of other parts-of-speech, in particular adjectives and verbs, can be learned in the same way. We also experiment with combining the representations learned from visual data with embeddings learned from textual data. Our experiments across five language pairs indicate that previous work does not scale to the problem of learning cross-lingual representations beyond simple nouns. |
Tasks | Image Retrieval, Representation Learning |
Published | 2017-09-18 |
URL | http://arxiv.org/abs/1709.05914v1 |
http://arxiv.org/pdf/1709.05914v1.pdf | |
PWC | https://paperswithcode.com/paper/limitations-of-cross-lingual-learning-from |
Repo | |
Framework | |
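Bilingual lexicon induction from image search, as studied above, typically represents each word by an aggregate of CNN features of its retrieved images and matches words across languages by visual similarity. A minimal matching sketch, assuming precomputed per-word image feature arrays; feature extraction and image retrieval are out of scope.

```python
import numpy as np

def word_vector(image_features):
    """Represent a word by the mean of its (precomputed) image CNN features."""
    v = np.mean(image_features, axis=0)
    return v / np.linalg.norm(v)

def induce_lexicon(src_words, tgt_words, src_feats, tgt_feats):
    """Match each source word to the visually most similar target word."""
    src = np.stack([word_vector(src_feats[w]) for w in src_words])
    tgt = np.stack([word_vector(tgt_feats[w]) for w in tgt_words])
    sims = src @ tgt.T                      # cosine similarity (vectors are unit norm)
    return {w: tgt_words[i] for w, i in zip(src_words, sims.argmax(axis=1))}
```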
Bayesian Nonparametric Causal Inference: Information Rates and Learning Algorithms
Title | Bayesian Nonparametric Causal Inference: Information Rates and Learning Algorithms |
Authors | Ahmed M. Alaa, Mihaela van der Schaar |
Abstract | We investigate the problem of estimating the causal effect of a treatment on individual subjects from observational data; this is a central problem in various application domains, including healthcare, social sciences, and online advertising. Within the Neyman-Rubin potential outcomes model, we use the Kullback-Leibler (KL) divergence between the estimated and true distributions as a measure of accuracy of the estimate, and we define the information rate of the Bayesian causal inference procedure as the (asymptotic equivalence class of the) expected value of the KL divergence between the estimated and true distributions as a function of the number of samples. Using Fano's method, we establish a fundamental limit on the information rate that can be achieved by any Bayesian estimator, and show that this fundamental limit is independent of the selection bias in the observational data. We characterize the Bayesian priors on the potential (factual and counterfactual) outcomes that achieve the optimal information rate. As a consequence, we show that a particular class of priors that have been widely used in the causal inference literature cannot achieve the optimal information rate, whereas a broader class of priors can. We go on to propose a prior adaptation procedure (which we call the information-based empirical Bayes procedure) that optimizes the Bayesian prior by maximizing an information-theoretic criterion on the recovered causal effects rather than maximizing the marginal likelihood of the observed (factual) data. Building on our analysis, we construct an information-optimal Bayesian causal inference algorithm. |
Tasks | Causal Inference |
Published | 2017-12-24 |
URL | http://arxiv.org/abs/1712.08914v2 |
http://arxiv.org/pdf/1712.08914v2.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-nonparametric-causal-inference |
Repo | |
Framework | |
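In notation suggested by the abstract (the symbols are assumptions, not the paper's exact definitions), the information rate is the asymptotic order of the expected KL divergence between the true and estimated potential-outcome distributions after n samples:

```latex
% Illustrative rendering; notation assumed
\mathcal{I}(n) \;=\; \mathbb{E}\!\left[ D_{\mathrm{KL}}\!\left( P_{Y(0),\,Y(1)} \,\middle\|\, \hat{P}^{(n)}_{Y(0),\,Y(1)} \right) \right],
\qquad \text{information rate} \;=\; \text{asymptotic equivalence class of } \mathcal{I}(n) \text{ as } n \to \infty .
```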
What does a convolutional neural network recognize in the moon?
Title | What does a convolutional neural network recognize in the moon? |
Authors | Daigo Shoji |
Abstract | Many people see a human face or animals in the pattern of the maria on the moon. Although the pattern corresponds to the actual variation in composition of the lunar surface, the culture and environment of each society influence the recognition of these objects (i.e., symbols) as specific entities. In contrast, a convolutional neural network (CNN) recognizes objects from characteristic shapes in a training data set. Using a CNN, this study evaluates the probabilities that the pattern of lunar maria is categorized as the shape of a crab, a lion, or a hare. If Mare Frigoris (a dark band on the moon) is included in the lunar image, the lion is recognized. However, in an image without Mare Frigoris, the hare has the highest probability of recognition. Thus, the recognition of objects similar to the lunar pattern depends on which part of the lunar maria is taken into account. In human recognition, before we find similarities between the lunar maria and objects such as animals, we may be primed by our culture and environment to see a particular image, and then fit the lunar pattern to the shape of the imagined object. |
Tasks | |
Published | 2017-08-18 |
URL | http://arxiv.org/abs/1708.05636v2 |
http://arxiv.org/pdf/1708.05636v2.pdf | |
PWC | https://paperswithcode.com/paper/what-does-a-convolutional-neural-network |
Repo | |
Framework | |
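A minimal sketch of the classification setup implied above: a small CNN over grayscale lunar images producing probabilities for the three shapes mentioned in the abstract. The architecture, input size, and training data are assumptions, not the author's exact setup.

```python
import torch
import torch.nn as nn

# Three target classes suggested by the abstract: crab, lion, hare.
classes = ["crab", "lion", "hare"]

model = nn.Sequential(                       # small CNN over grayscale lunar images
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, len(classes)),   # assumes 64x64 inputs
)

x = torch.randn(1, 1, 64, 64)                # placeholder lunar image
probs = torch.softmax(model(x), dim=1)       # class probabilities, as reported in the paper
print(dict(zip(classes, probs[0].tolist())))
```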
Adaptive Learning to Speed-Up Control of Prosthetic Hands: a Few Things Everybody Should Know
Title | Adaptive Learning to Speed-Up Control of Prosthetic Hands: a Few Things Everybody Should Know |
Authors | Valentina Gregori, Arjan Gijsberts, Barbara Caputo |
Abstract | A number of studies have proposed to use domain adaptation to reduce the training effort needed to control an upper-limb prosthesis by exploiting pre-trained models from prior subjects. These studies generally reported impressive reductions in the required number of training samples to achieve a certain level of accuracy for intact subjects. We further investigate two popular methods in this field to verify whether this result equally applies to amputees. Our findings show instead that this improvement can largely be attributed to a suboptimal hyperparameter configuration. When hyperparameters are appropriately tuned, the standard approach that does not exploit prior information performs on par with the more complicated transfer learning algorithms. Additionally, earlier studies erroneously assumed that the number of training samples relates proportionally to the effort required from the subject. However, a repetition of a movement is the atomic unit for subjects, and the total number of repetitions should therefore be used as a reliable measure of training effort. Even when correcting for this mistake, we do not find any performance increase due to the use of prior models. |
Tasks | Domain Adaptation, Transfer Learning |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08283v1 |
http://arxiv.org/pdf/1702.08283v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-learning-to-speed-up-control-of |
Repo | |
Framework | |
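The main finding above is that a carefully tuned baseline without prior-subject data matches the transfer-learning methods. A minimal illustration of "tune the baseline first" with a scikit-learn grid search over an RBF-kernel classifier; the sEMG features, the specific adaptation algorithms, and the repetition-based protocol are stand-ins here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))                  # stand-in for sEMG features of one subject
y = rng.integers(0, 6, size=300)                # stand-in for 6 hand movements

# Baseline without any prior-subject information, but with tuned hyperparameters.
grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1, 1.0]}
baseline = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X, y)
print(baseline.best_params_, baseline.best_score_)
```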
Joint Sentiment/Topic Modeling on Text Data Using Boosted Restricted Boltzmann Machine
Title | Joint Sentiment/Topic Modeling on Text Data Using Boosted Restricted Boltzmann Machine |
Authors | Masoud Fatemi, Mehran Safayani |
Abstract | With the development of the Internet and the Web, different types of social media such as weblogs have become an immense source of text data. By processing these data, it is possible to discover practical information about different topics, individuals' opinions, and the society as a whole. Therefore, models which can automatically extract subjective information from documents are efficient and helpful. Topic modeling and sentiment analysis are among the most active topics in natural language processing and text mining. In this paper, a new structure for joint sentiment-topic modeling based on the Restricted Boltzmann Machine (RBM), a type of neural network, is proposed. By modifying the structure of the RBM and appending a layer analogous to the sentiment of the text data, we propose a generative structure for joint sentiment-topic modeling based on neural networks. The proposed method is supervised and trained with the Contrastive Divergence algorithm. The newly attached layer is a layer with a multinomial probability distribution, which can be used for sentiment classification of text data or any other supervised application. The proposed model is compared with existing models in experiments including evaluation as a generative model, sentiment classification, and information retrieval, and the corresponding results demonstrate the efficiency of the method. |
Tasks | Information Retrieval, Sentiment Analysis |
Published | 2017-11-10 |
URL | http://arxiv.org/abs/1711.03736v1 |
http://arxiv.org/pdf/1711.03736v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-sentimenttopic-modeling-on-text-data |
Repo | |
Framework | |
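The model above attaches a multinomial (sentiment) label layer to an RBM. A compact numpy sketch of such a classification RBM trained with CD-1, in the spirit of Larochelle and Bengio's classRBM; the boosting and the exact joint sentiment-topic structure of the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

class ClassRBM:
    """RBM over binary word features x with an extra multinomial label layer y."""
    def __init__(self, n_visible, n_hidden, n_labels, lr=0.05):
        self.W = rng.normal(0, 0.01, (n_hidden, n_visible))   # hidden-visible weights
        self.U = rng.normal(0, 0.01, (n_hidden, n_labels))    # hidden-label weights
        self.b = np.zeros(n_visible)
        self.c = np.zeros(n_hidden)
        self.d = np.zeros(n_labels)
        self.lr = lr

    def cd1(self, x, y):
        """One contrastive-divergence step for a single (x, one-hot y) pair."""
        ph0 = sigmoid(self.W @ x + self.U @ y + self.c)
        h = (rng.random(ph0.shape) < ph0).astype(float)
        x1 = sigmoid(self.W.T @ h + self.b)                    # mean-field reconstruction
        ey = self.U.T @ h + self.d
        y1 = np.exp(ey - ey.max()); y1 /= y1.sum()             # softmax over labels
        ph1 = sigmoid(self.W @ x1 + self.U @ y1 + self.c)
        self.W += self.lr * (np.outer(ph0, x) - np.outer(ph1, x1))
        self.U += self.lr * (np.outer(ph0, y) - np.outer(ph1, y1))
        self.b += self.lr * (x - x1)
        self.c += self.lr * (ph0 - ph1)
        self.d += self.lr * (y - y1)
```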
Maximum Entropy Flow Networks
Title | Maximum Entropy Flow Networks |
Authors | Gabriel Loaiza-Ganem, Yuanjun Gao, John P. Cunningham |
Abstract | Maximum entropy modeling is a flexible and popular framework for formulating statistical models given partial knowledge. In this paper, rather than the traditional method of optimizing over the continuous density directly, we learn a smooth and invertible transformation that maps a simple distribution to the desired maximum entropy distribution. Doing so is nontrivial in that the objective being maximized (entropy) is a function of the density itself. By exploiting recent developments in normalizing flow networks, we cast the maximum entropy problem into a finite-dimensional constrained optimization, and solve the problem by combining stochastic optimization with the augmented Lagrangian method. Simulation results demonstrate the effectiveness of our method, and applications to finance and computer vision show the flexibility and accuracy of using maximum entropy flow networks. |
Tasks | Stochastic Optimization |
Published | 2017-01-12 |
URL | http://arxiv.org/abs/1701.03504v2 |
http://arxiv.org/pdf/1701.03504v2.pdf | |
PWC | https://paperswithcode.com/paper/maximum-entropy-flow-networks |
Repo | |
Framework | |
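A minimal sketch of the optimization pattern described above: an invertible (here, purely affine) transformation of a Gaussian base distribution whose entropy follows from the change of variables, trained with an augmented Lagrangian to satisfy moment constraints. A real MEFN would use a deeper normalizing flow; the target moments below are illustrative.

```python
import math
import torch

# Max-entropy distribution matching given first and second moments, via an
# invertible affine map y = exp(log_s) * z + m of a standard normal z.
d = 2
target = torch.tensor([1.0, -1.0, 2.0, 2.0])          # desired E[y] and E[y**2] per dimension
log_s = torch.zeros(d, requires_grad=True)
m = torch.zeros(d, requires_grad=True)
lam = torch.zeros(4)                                  # Lagrange multipliers
rho = 1.0
opt = torch.optim.Adam([log_s, m], lr=0.05)

for outer in range(20):
    for _ in range(200):
        z = torch.randn(512, d)
        y = torch.exp(log_s) * z + m
        entropy = 0.5 * d * (1 + math.log(2 * math.pi)) + log_s.sum()   # change of variables
        c = torch.cat([y.mean(0), (y ** 2).mean(0)]) - target           # constraint violation
        loss = -entropy + lam @ c + 0.5 * rho * (c ** 2).sum()          # augmented Lagrangian
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        z = torch.randn(4096, d)
        y = torch.exp(log_s) * z + m
        c = torch.cat([y.mean(0), (y ** 2).mean(0)]) - target
        lam += rho * c                                                  # dual update
```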
A trans-disciplinary review of deep learning research for water resources scientists
Title | A trans-disciplinary review of deep learning research for water resources scientists |
Authors | Chaopeng Shen |
Abstract | Deep learning (DL), a new generation of artificial neural network research, has transformed industries, daily lives, and various scientific disciplines in recent years. DL represents significant progress in the ability of neural networks to automatically engineer problem-relevant features and capture highly complex data distributions. I argue that DL can help address several major new and old challenges facing research in water sciences such as inter-disciplinarity, data discoverability, hydrologic scaling, equifinality, and needs for parameter regionalization. This review paper is intended to provide water resources scientists and hydrologists in particular with a simple technical overview, trans-disciplinary progress update, and a source of inspiration about the relevance of DL to water. The review reveals that various physical and geoscientific disciplines have utilized DL to address data challenges, improve efficiency, and gain scientific insights. DL is especially suited for information extraction from image-like data and sequential data. Techniques and experiences presented in other disciplines are of high relevance to water research. Meanwhile, less noticed is that DL may also serve as a scientific exploratory tool. A new area termed ‘AI neuroscience,’ where scientists interpret the decision process of deep networks and derive insights, has been born. This budding sub-discipline has demonstrated methods including correlation-based analysis, inversion of network-extracted features, reduced-order approximations by interpretable models, and attribution of network decisions to inputs. Moreover, DL can also use data to condition neurons that mimic problem-specific fundamental organizing units, thus revealing emergent behaviors of these units. Vast opportunities exist for DL to propel advances in water sciences. |
Tasks | |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02162v3 |
http://arxiv.org/pdf/1712.02162v3.pdf | |
PWC | https://paperswithcode.com/paper/a-trans-disciplinary-review-of-deep-learning |
Repo | |
Framework | |
Nearly Maximally Predictive Features and Their Dimensions
Title | Nearly Maximally Predictive Features and Their Dimensions |
Authors | Sarah E. Marzen, James P. Crutchfield |
Abstract | Scientific explanation often requires inferring maximally predictive features from a given data set. Unfortunately, the collection of minimal maximally predictive features for most stochastic processes is uncountably infinite. In such cases, one compromises and instead seeks nearly maximally predictive features. Here, we derive upper-bounds on the rates at which the number and the coding cost of nearly maximally predictive features scales with desired predictive power. The rates are determined by the fractal dimensions of a process’ mixed-state distribution. These results, in turn, show how widely-used finite-order Markov models can fail as predictors and that mixed-state predictive features offer a substantial improvement. |
Tasks | |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08565v1 |
http://arxiv.org/pdf/1702.08565v1.pdf | |
PWC | https://paperswithcode.com/paper/nearly-maximally-predictive-features-and |
Repo | |
Framework | |
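The mixed-state distribution referenced above is the set of belief states reached by Bayesian updating over the hidden states as symbols are observed. A minimal sketch for a toy two-state, two-symbol nonunifilar hidden Markov process with assumed labeled transition matrices; the fractal-dimension estimates of the paper are not computed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Labeled transition matrices T[x][i, j] = P(symbol x, next state j | state i)
# for a simple nonunifilar two-state process (values are illustrative).
T = {0: np.array([[0.5, 0.5], [0.0, 0.5]]),
     1: np.array([[0.0, 0.0], [0.5, 0.0]])}

def sample_mixed_states(n_steps, eta=None):
    """Iterate the belief-state (mixed-state) update along a sampled symbol sequence."""
    eta = np.array([0.5, 0.5]) if eta is None else eta
    states = []
    for _ in range(n_steps):
        probs = {x: (eta @ T[x]).sum() for x in T}          # P(symbol | current belief)
        x = rng.choice(list(T), p=[probs[0], probs[1]])
        eta = eta @ T[x] / probs[x]                         # Bayesian belief update
        states.append(eta.copy())
    return np.array(states)

mixed = sample_mixed_states(10000)   # samples from the process' mixed-state distribution
```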
Obtaining Accurate Probabilistic Causal Inference by Post-Processing Calibration
Title | Obtaining Accurate Probabilistic Causal Inference by Post-Processing Calibration |
Authors | Fattaneh Jabbari, Mahdi Pakdaman Naeini, Gregory F. Cooper |
Abstract | Discovery of an accurate causal Bayesian network structure from observational data can be useful in many areas of science. Often the discoveries are made under uncertainty, which can be expressed as probabilities. To guide the use of such discoveries, including directing further investigation, it is important that those probabilities be well-calibrated. In this paper, we introduce a novel framework to derive calibrated probabilities of causal relationships from observational data. The framework consists of three components: (1) an approximate method for generating initial probability estimates of the edge types for each pair of variables, (2) the availability of a relatively small number of the causal relationships in the network for which the truth status is known, which we call a calibration training set, and (3) a calibration method for using the approximate probability estimates and the calibration training set to generate calibrated probabilities for the many remaining pairs of variables. We also introduce a new calibration method based on a shallow neural network. Our experiments on simulated data support that the proposed approach improves the calibration of causal edge predictions. The results also support that the approach often improves the precision and recall of predictions. |
Tasks | Calibration, Causal Inference |
Published | 2017-12-22 |
URL | http://arxiv.org/abs/1712.08626v1 |
http://arxiv.org/pdf/1712.08626v1.pdf | |
PWC | https://paperswithcode.com/paper/obtaining-accurate-probabilistic-causal |
Repo | |
Framework | |
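Component (3) above maps raw edge-probability estimates through a calibration model fitted on the small set of relationships with known truth status. A minimal sketch using a shallow scikit-learn network on synthetic scores; the paper's simulation setup and edge-type representation are not reproduced.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Assumed inputs: raw (uncalibrated) probabilities that an edge is causal,
# plus a small calibration training set with known truth status.
raw_calib = rng.uniform(size=(200, 1))
truth_calib = (rng.uniform(size=200) < raw_calib[:, 0] ** 2).astype(int)   # miscalibrated toy data

# Shallow network mapping raw scores to calibrated probabilities.
calibrator = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000).fit(raw_calib, truth_calib)

raw_rest = rng.uniform(size=(1000, 1))                       # remaining variable pairs
calibrated = calibrator.predict_proba(raw_rest)[:, 1]        # calibrated causal-edge probabilities
```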
Efficient Privacy Preserving Viola-Jones Type Object Detection via Random Base Image Representation
Title | Efficient Privacy Preserving Viola-Jones Type Object Detection via Random Base Image Representation |
Authors | Xin Jin, Peng Yuan, Xiaodong Li, Chenggen Song, Shiming Ge, Geng Zhao, Yingya Chen |
Abstract | A cloud server spends a lot of time, energy, and money to train a Viola-Jones type object detector with high accuracy. Clients can upload their photos to the cloud server to find objects, but they do not want the content of their photos to be leaked. Meanwhile, the cloud server is also reluctant to leak any parameters of the trained object detectors. Ten years ago, Avidan & Butman introduced Blind Vision, a method for securely evaluating a Viola-Jones type object detector. Blind Vision uses standard cryptographic tools and is painfully slow to compute, taking a couple of hours to scan a single image. The purpose of this work is to explore an efficient method that can speed up the process. We propose the Random Base Image (RBI) representation. The original image is divided into random base images, and only the base images are submitted to the cloud server; thus, the content of the image cannot be leaked. Meanwhile, a random vector and the secure Millionaire protocol are leveraged to protect the parameters of the trained object detector. The RBI representation re-enables the use of integral images, which greatly accelerates the computation. The experimental results reveal that our method retains the detection accuracy of the plain vision algorithm and is significantly faster than traditional blind vision, with only a very low theoretical probability of information leakage. |
Tasks | Object Detection |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08318v2 |
http://arxiv.org/pdf/1702.08318v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-privacy-preserving-viola-jones-type |
Repo | |
Framework | |
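The speed-up above hinges on the fact that integral images (and hence Haar-like features) are linear, so they can be evaluated on random base images that sum to the original and then recombined. A minimal numpy sketch of the split and recombination; the Millionaire-protocol comparison step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_base_images(img, k=4):
    """Split an image into k random base images that sum back to the original."""
    bases = [rng.normal(size=img.shape) for _ in range(k - 1)]
    bases.append(img.astype(float) - sum(bases))
    return bases

def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

img = rng.integers(0, 256, size=(64, 64))
bases = random_base_images(img)

# Integral images are linear, so the server can compute them on each base
# image separately and the results sum to the integral image of the original.
recon = sum(integral_image(b) for b in bases)
assert np.allclose(recon, integral_image(img.astype(float)))
```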
Comparing Human and Machine Errors in Conversational Speech Transcription
Title | Comparing Human and Machine Errors in Conversational Speech Transcription |
Authors | Andreas Stolcke, Jasha Droppo |
Abstract | Recent work in automatic recognition of conversational telephone speech (CTS) has achieved accuracy levels comparable to human transcribers, although there is some debate about how to precisely quantify human performance on this task, using the NIST 2000 CTS evaluation set. This raises the question of what systematic differences, if any, distinguish human from machine transcription errors. In this paper we approach this question by comparing the output of our most accurate CTS recognition system to that of a standard speech transcription vendor pipeline. We find that the most frequent substitution, deletion, and insertion error types of both outputs show a high degree of overlap. The only notable exception is that the automatic recognizer tends to confuse filled pauses (“uh”) and backchannel acknowledgments (“uhhuh”), an error humans tend not to make, presumably due to the distinctive and opposing pragmatic functions attached to these words. Furthermore, we quantify the correlation between human and machine errors at the speaker level, and investigate the effect of speaker overlap between training and test data. Finally, we report on an informal “Turing test” asking humans to discriminate between automatic and human transcription error cases. |
Tasks | |
Published | 2017-08-29 |
URL | http://arxiv.org/abs/1708.08615v1 |
http://arxiv.org/pdf/1708.08615v1.pdf | |
PWC | https://paperswithcode.com/paper/comparing-human-and-machine-errors-in |
Repo | |
Framework | |
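Comparing error types between systems, as done above, starts from a word-level alignment of hypothesis against reference. A minimal edit-distance alignment that tallies substitutions, deletions, and insertions; the NIST scoring pipeline used in the paper is considerably more involved.

```python
def align_errors(ref, hyp):
    """Word-level Levenshtein alignment; returns substitution/deletion/insertion lists."""
    ref, hyp = ref.split(), hyp.split()
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                d[i][j] = i + j
            else:
                d[i][j] = min(d[i-1][j-1] + (ref[i-1] != hyp[j-1]),  # match / substitution
                              d[i-1][j] + 1,                         # deletion
                              d[i][j-1] + 1)                         # insertion
    subs, dels, ins, i, j = [], [], [], n, m
    while i > 0 or j > 0:                                            # backtrace the alignment
        if i > 0 and j > 0 and d[i][j] == d[i-1][j-1] + (ref[i-1] != hyp[j-1]):
            if ref[i-1] != hyp[j-1]:
                subs.append((ref[i-1], hyp[j-1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i-1][j] + 1:
            dels.append(ref[i-1]); i -= 1
        else:
            ins.append(hyp[j-1]); j -= 1
    return subs, dels, ins

print(align_errors("uh huh i see", "uhhuh uhhuh i see"))
```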