Paper Group ANR 424
DLVM: A modern compiler infrastructure for deep learning systems
Title | DLVM: A modern compiler infrastructure for deep learning systems |
Authors | Richard Wei, Lane Schwartz, Vikram Adve |
Abstract | Deep learning software demands reliability and performance. However, many of the existing deep learning frameworks are software libraries that act as an unsafe DSL in Python and a computation graph interpreter. We present DLVM, a design and implementation of a compiler infrastructure with a linear algebra intermediate representation, algorithmic differentiation by adjoint code generation, domain-specific optimizations and a code generator targeting GPU via LLVM. Designed as a modern compiler infrastructure inspired by LLVM, DLVM is more modular and more generic than existing deep learning compiler frameworks, and supports tensor DSLs with high expressivity. With our prototypical staged DSL embedded in Swift, we argue that the DLVM system enables a form of modular, safe and performant frameworks for deep learning. |
Tasks | Code Generation |
Published | 2017-11-08 |
URL | http://arxiv.org/abs/1711.03016v5 |
http://arxiv.org/pdf/1711.03016v5.pdf | |
PWC | https://paperswithcode.com/paper/dlvm-a-modern-compiler-infrastructure-for |
Repo | |
Framework | |
Bayesian Unification of Gradient and Bandit-based Learning for Accelerated Global Optimisation
Title | Bayesian Unification of Gradient and Bandit-based Learning for Accelerated Global Optimisation |
Authors | Ole-Christoffer Granmo |
Abstract | Bandit based optimisation has a remarkable advantage over gradient based approaches due to its global perspective, which eliminates the danger of getting stuck at local optima. However, for continuous optimisation problems or problems with a large number of actions, bandit based approaches can be hindered by slow learning. Gradient based approaches, on the other hand, navigate quickly in high-dimensional continuous spaces through local optimisation, following the gradient in fine grained steps. Yet, apart from being susceptible to local optima, these schemes are less suited for online learning due to their reliance on extensive trial-and-error before the optimum can be identified. In this paper, we propose a Bayesian approach that unifies the above two paradigms in one single framework, with the aim of combining their advantages. At the heart of our approach we find a stochastic linear approximation of the function to be optimised, where both the gradient and values of the function are explicitly captured. This allows us to learn from both noisy function and gradient observations, and predict these properties across the action space to support optimisation. We further propose an accompanying bandit driven exploration scheme that uses Bayesian credible bounds to trade off exploration against exploitation. Our empirical results demonstrate that by unifying bandit and gradient based learning, one obtains consistently improved performance across a wide spectrum of problem environments. Furthermore, even when gradient feedback is unavailable, the flexibility of our model, including gradient prediction, still allows us to outperform competing approaches, although with a smaller margin. Due to the pervasiveness of bandit based optimisation, our scheme opens the way to improved performance both in meta-optimisation and in applications where gradient related information is readily available. |
Tasks | |
Published | 2017-05-28 |
URL | http://arxiv.org/abs/1705.09922v1 |
http://arxiv.org/pdf/1705.09922v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-unification-of-gradient-and-bandit |
Repo | |
Framework | |
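The abstract above mentions an exploration scheme that uses Bayesian credible bounds to trade off exploration against exploitation. The sketch below illustrates only that general idea for a discrete set of arms; it is not the paper's stochastic linear model. The Gaussian prior, the noise variance, the multiplier `z`, and the function names `posterior` and `select_arm` are all assumptions made for illustration.

```python
import math


def posterior(observations, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Conjugate Gaussian update for one arm's mean reward.

    The prior and noise variance are illustrative assumptions.
    """
    n = len(observations)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


def select_arm(history, z=1.96):
    """Pick the arm with the largest upper credible bound (mean + z * std)."""
    best_arm, best_bound = None, -math.inf
    for arm, observations in history.items():
        mean, var = posterior(observations)
        bound = mean + z * math.sqrt(var)
        if bound > best_bound:
            best_arm, best_bound = arm, bound
    return best_arm


# An arm with no data keeps a wide bound, so it is explored first.
explore_choice = select_arm({"a": [1.0] * 8, "b": []})
# Once "b" has a few poor observations, the well-performing arm wins.
exploit_choice = select_arm({"a": [1.0] * 8, "b": [0.0] * 3})
```

With these made-up numbers the unexplored arm is chosen in the first call, and the arm with the better observed rewards is chosen in the second.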
A Fusion-based Gender Recognition Method Using Facial Images
Title | A Fusion-based Gender Recognition Method Using Facial Images |
Authors | Benyamin Ghojogh, Saeed Bagheri Shouraki, Hoda Mohammadzade, Ensieh Iranmehr |
Abstract | This paper proposes a fusion-based gender recognition method which uses facial images as input. Firstly, this paper utilizes pre-processing and a landmark detection method in order to find the important landmarks of faces. Thereafter, four different frameworks are proposed which are inspired by state-of-the-art gender recognition systems. The first framework extracts features using Local Binary Pattern (LBP) and Principal Component Analysis (PCA) and uses a back-propagation neural network. The second framework uses Gabor filters, PCA, and kernel Support Vector Machine (SVM). The third framework uses the lower part of faces as input and classifies them using kernel SVM. The fourth framework uses Linear Discriminant Analysis (LDA) in order to classify the side outline landmarks of faces. Finally, the four decisions of the frameworks are fused using weighted voting. This paper takes advantage of both texture and geometrical information, the two dominant types of information in facial gender recognition. Experimental results show the power and effectiveness of the proposed method. This method obtains a recognition rate of 94% for neutral faces of the FEI face dataset, which is equal to the state-of-the-art rate for this dataset. |
Tasks | |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06451v1 |
http://arxiv.org/pdf/1711.06451v1.pdf | |
PWC | https://paperswithcode.com/paper/a-fusion-based-gender-recognition-method |
Repo | |
Framework | |
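The abstract above says the four framework decisions are fused using weighted voting. A minimal sketch of weighted voting in general is given below; the weights, the labels, and the function name `weighted_vote` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(decisions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    if len(decisions) != len(weights):
        raise ValueError("need exactly one weight per decision")
    totals = defaultdict(float)
    for label, weight in zip(decisions, weights):
        totals[label] += weight
    # On an exact tie, max() keeps the label that was seen first.
    return max(totals, key=totals.get)


# Hypothetical outputs of four classifiers with made-up weights.
decisions = ["female", "male", "female", "male"]
weights = [0.35, 0.25, 0.30, 0.10]
fused = weighted_vote(decisions, weights)
```

Here "female" collects a total weight of 0.65 against 0.35 for "male", so the fused decision is "female" even though the raw votes are split two against two.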
The Forgettable-Watcher Model for Video Question Answering
Title | The Forgettable-Watcher Model for Video Question Answering |
Authors | Hongyang Xue, Zhou Zhao, Deng Cai |
Abstract | A number of visual question answering approaches have been proposed recently, aiming at understanding the visual scenes by answering the natural language questions. While the image question answering has drawn significant attention, video question answering is largely unexplored. Video-QA is different from Image-QA since the information and the events are scattered among multiple frames. In order to better utilize the temporal structure of the videos and the phrasal structures of the answers, we propose two mechanisms: the re-watching and the re-reading mechanisms and combine them into the forgettable-watcher model. Then we propose a TGIF-QA dataset for video question answering with the help of automatic question generation. Finally, we evaluate the models on our dataset. The experimental results show the effectiveness of our proposed models. |
Tasks | Question Answering, Question Generation, Video Question Answering, Visual Question Answering |
Published | 2017-05-03 |
URL | http://arxiv.org/abs/1705.01253v1 |
http://arxiv.org/pdf/1705.01253v1.pdf | |
PWC | https://paperswithcode.com/paper/the-forgettable-watcher-model-for-video |
Repo | |
Framework | |
Semi-Automatic Segmentation and Ultrasonic Characterization of Solid Breast Lesions
Title | Semi-Automatic Segmentation and Ultrasonic Characterization of Solid Breast Lesions |
Authors | Mohammad Saad Billah, Tahmida Binte Mahmud |
Abstract | Characterization of breast lesions is an essential prerequisite to detect breast cancer in an early stage. Automatic segmentation makes this categorization method robust by freeing it from subjectivity and human error. Both spectral and morphometric features are successfully used for differentiating between benign and malignant breast lesions. In this thesis, we used the empirical mode decomposition method for semi-automatic segmentation. Sonographic features like echogenicity, heterogeneity, FNPA, margin definition, Hurst coefficient, compactness, roundness, aspect ratio, convexity, solidity, and form factor were calculated to be used as our characterization parameters. Not all of these parameters gave the desired comparative results, but some of them, namely echogenicity, heterogeneity, margin definition, aspect ratio and convexity, gave good results and were used for characterization. |
Tasks | |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.08238v1 |
http://arxiv.org/pdf/1703.08238v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-automatic-segmentation-and-ultrasonic |
Repo | |
Framework | |
Ordered Pooling of Optical Flow Sequences for Action Recognition
Title | Ordered Pooling of Optical Flow Sequences for Action Recognition |
Authors | Jue Wang, Anoop Cherian, Fatih Porikli |
Abstract | Training of Convolutional Neural Networks (CNNs) on long video sequences is computationally expensive due to the substantial memory requirements and the massive number of parameters that deep architectures demand. Early fusion of video frames is thus a standard technique, in which several consecutive frames are first agglomerated into a compact representation, and then fed into the CNN as an input sample. For this purpose, a summarization approach that represents a set of consecutive RGB frames by a single dynamic image to capture pixel dynamics was recently proposed. In this paper, we introduce a novel ordered representation of consecutive optical flow frames as an alternative and argue that this representation captures the action dynamics more effectively than RGB frames. We provide intuitions on why such a representation is better for action recognition. We validate our claims on standard benchmark datasets and demonstrate that using summaries of flow images leads to significant improvements over RGB frames while achieving accuracy comparable to the state-of-the-art on UCF101 and HMDB datasets. |
Tasks | Optical Flow Estimation, Temporal Action Localization |
Published | 2017-01-12 |
URL | http://arxiv.org/abs/1701.03246v2 |
http://arxiv.org/pdf/1701.03246v2.pdf | |
PWC | https://paperswithcode.com/paper/ordered-pooling-of-optical-flow-sequences-for |
Repo | |
Framework | |
Learning to Decode for Future Success
Title | Learning to Decode for Future Success |
Authors | Jiwei Li, Will Monroe, Dan Jurafsky |
Abstract | We introduce a simple, general strategy to manipulate the behavior of a neural decoder that enables it to generate outputs that have specific properties of interest (e.g., sequences of a pre-specified length). The model can be thought of as a simple version of the actor-critic model that uses an interpolation of the actor (the MLE-based token generation policy) and the critic (a value function that estimates the future values of the desired property) for decision making. We demonstrate that the approach is able to incorporate a variety of properties that cannot be handled by standard neural sequence decoders, such as sequence length and backward probability (probability of sources given targets), in addition to yielding consistent improvements in abstractive summarization and machine translation when the property to be optimized is BLEU or ROUGE scores. |
Tasks | Abstractive Text Summarization, Decision Making, Machine Translation |
Published | 2017-01-23 |
URL | http://arxiv.org/abs/1701.06549v2 |
http://arxiv.org/pdf/1701.06549v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-decode-for-future-success |
Repo | |
Framework | |
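The abstract above describes decoding with an interpolation of the actor (the token generation policy) and the critic (a value function estimating the future value of a desired property). The sketch below shows one plausible reading of that idea for a single greedy step. The scoring rule `log p + lam * value`, the weight `lam`, the toy vocabulary, and the function name `pick_token` are assumptions for illustration, not the paper's exact formulation.

```python
import math


def pick_token(log_probs, future_values, lam):
    """Greedy step under the assumed score log p(token) + lam * value(token)."""
    best_token, best_score = None, -math.inf
    for token, log_p in log_probs.items():
        score = log_p + lam * future_values.get(token, 0.0)
        if score > best_score:
            best_token, best_score = token, score
    return best_token


# Toy actor distribution and made-up critic estimates for three tokens.
log_probs = {"the": math.log(0.6), "a": math.log(0.3), "</s>": math.log(0.1)}
future_values = {"the": -2.0, "a": -0.5, "</s>": 0.0}

actor_only = pick_token(log_probs, future_values, lam=0.0)
with_critic = pick_token(log_probs, future_values, lam=1.0)
```

With `lam=0.0` the choice follows the actor alone and picks "the"; with `lam=1.0` the critic's penalty on "the" shifts the choice to "a".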
Generative Convolutional Networks for Latent Fingerprint Reconstruction
Title | Generative Convolutional Networks for Latent Fingerprint Reconstruction |
Authors | Jan Svoboda, Federico Monti, Michael M. Bronstein |
Abstract | Performance of fingerprint recognition depends heavily on the extraction of minutiae points. Enhancement of the fingerprint ridge pattern is thus an essential pre-processing step that noticeably reduces false positive and negative detection rates. A particularly challenging setting is when the fingerprint images are corrupted or partially missing. In this work, we apply generative convolutional networks to denoise visible minutiae and predict the missing parts of the ridge pattern. The proposed enhancement approach is tested as a pre-processing step in combination with several standard feature extraction methods such as MINDTCT, followed by biometric comparison using MCC and BOZORTH3. We evaluate our method on several publicly available latent fingerprint datasets captured using different sensors. |
Tasks | |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.01707v1 |
http://arxiv.org/pdf/1705.01707v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-convolutional-networks-for-latent |
Repo | |
Framework | |
The Game Imitation: Deep Supervised Convolutional Networks for Quick Video Game AI
Title | The Game Imitation: Deep Supervised Convolutional Networks for Quick Video Game AI |
Authors | Zhao Chen, Darvin Yi |
Abstract | We present a vision-only model for gaming AI which uses a late integration deep convolutional network architecture trained in a purely supervised imitation learning context. Although state-of-the-art deep learning models for video game tasks generally rely on more complex methods such as deep-Q learning, we show that a supervised model which requires substantially fewer resources and training time can already perform well at human reaction speeds on the N64 classic game Super Smash Bros. We frame our learning task as a 30-class classification problem, and our CNN model achieves 80% top-1 and 95% top-3 validation accuracy. With slight test-time fine-tuning, our model is also competitive during live simulation with the highest-level AI built into the game. We will further show evidence through network visualizations that the network is successfully leveraging temporal information during inference to aid in decision making. Our work demonstrates that supervised CNN models can provide good performance in challenging policy prediction tasks while being significantly simpler and more lightweight than alternatives. |
Tasks | Decision Making, Imitation Learning, Q-Learning |
Published | 2017-02-18 |
URL | http://arxiv.org/abs/1702.05663v1 |
http://arxiv.org/pdf/1702.05663v1.pdf | |
PWC | https://paperswithcode.com/paper/the-game-imitation-deep-supervised |
Repo | |
Framework | |
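The abstract above reports top-1 and top-3 validation accuracy for a 30-class problem. For readers unfamiliar with the metric, the sketch below computes top-k accuracy on a tiny made-up example with three classes; the scores, labels, and the function name `top_k_accuracy` are illustrative only and unrelated to the paper's data.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Four made-up samples over three classes.
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 1, 0, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
top3 = top_k_accuracy(scores, labels, k=3)
```

Top-k accuracy can only grow with k, which is why a top-3 figure is always at least as high as the top-1 figure for the same model.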
Computational ghost imaging using deep learning
Title | Computational ghost imaging using deep learning |
Authors | Tomoyoshi Shimobaba, Yutaka Endo, Takashi Nishitsuji, Takayuki Takahashi, Yuki Nagahama, Satoki Hasegawa, Marie Sano, Ryuji Hirayama, Takashi Kakue, Atsushi Shiraki, Tomoyoshi Ito |
Abstract | Computational ghost imaging (CGI) is a single-pixel imaging technique that exploits the correlation between known random patterns and the measured intensity of light transmitted (or reflected) by an object. Although CGI can obtain two- or three-dimensional images with a single or a few bucket detectors, the quality of the reconstructed images is reduced by noise due to the reconstruction of images from random patterns. In this study, we improve the quality of CGI images using deep learning. A deep neural network is used to automatically learn the features of noise-contaminated CGI images. After training, the network is able to predict low-noise images from new noise-contaminated CGI images. |
Tasks | |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.08343v1 |
http://arxiv.org/pdf/1710.08343v1.pdf | |
PWC | https://paperswithcode.com/paper/computational-ghost-imaging-using-deep |
Repo | |
Framework | |
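The abstract above describes CGI as exploiting the correlation between known random patterns and the measured bucket intensity. The sketch below shows a standard correlation-based reconstruction on a four-pixel, one-dimensional toy object; it does not include the deep-learning denoising step the paper proposes. The object, the binary patterns, the number of measurements, and the function name `ghost_reconstruct` are assumptions for illustration.

```python
import random


def ghost_reconstruct(patterns, bucket):
    """Per-pixel covariance between bucket signal and pattern value: <B P(x)> - <B><P(x)>."""
    n = len(bucket)
    mean_b = sum(bucket) / n
    image = []
    for x in range(len(patterns[0])):
        mean_p = sum(p[x] for p in patterns) / n
        corr = sum(b * p[x] for b, p in zip(bucket, patterns)) / n
        image.append(corr - mean_b * mean_p)
    return image


rng = random.Random(0)            # fixed seed so the run is repeatable
obj = [0.0, 1.0, 1.0, 0.0]        # toy transmission profile (assumed)
patterns = [[rng.randint(0, 1) for _ in obj] for _ in range(2000)]
# Simulated single-pixel (bucket) measurement for each pattern.
bucket = [sum(t * p for t, p in zip(obj, pat)) for pat in patterns]
image = ghost_reconstruct(patterns, bucket)
```

The transmitting pixels come out with clearly larger values than the opaque ones, but the estimate fluctuates with the finite number of patterns, which is the kind of reconstruction noise the abstract refers to.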
Learning to Segment Human by Watching YouTube
Title | Learning to Segment Human by Watching YouTube |
Authors | Xiaodan Liang, Yunchao Wei, Liang Lin, Yunpeng Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan |
Abstract | An intuition on human segmentation is that when a human is moving in a video, the video-context (e.g., appearance and motion clues) may potentially infer reasonable mask information for the whole human body. Inspired by this, based on popular deep convolutional neural networks (CNN), we explore a very-weakly supervised learning framework for the human segmentation task, where only an imperfect human detector is available along with massive weakly-labeled YouTube videos. In our solution, the video-context guided human mask inference and CNN based segmentation network learning iterate to mutually enhance each other until no further improvement is gained. In the first step, each video is decomposed into supervoxels by the unsupervised video segmentation. The superpixels within the supervoxels are then classified as human or non-human by graph optimization with unary energies from the imperfect human detection results and the predicted confidence maps by the CNN trained in the previous iteration. In the second step, the video-context derived human masks are used as direct labels to train the CNN. Extensive experiments on the challenging PASCAL VOC 2012 semantic segmentation benchmark demonstrate that the proposed framework has already achieved results superior to all previous weakly-supervised methods with object class or bounding box annotations. In addition, by augmenting with the annotated masks from PASCAL VOC 2012, our method reaches a new state-of-the-art performance on the human segmentation task. |
Tasks | Human Detection, Semantic Segmentation, Video Semantic Segmentation |
Published | 2017-10-04 |
URL | http://arxiv.org/abs/1710.01457v2 |
http://arxiv.org/pdf/1710.01457v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-segment-human-by-watching-youtube |
Repo | |
Framework | |
Local Shrunk Discriminant Analysis (LSDA)
Title | Local Shrunk Discriminant Analysis (LSDA) |
Authors | Zan Gao, Guotai Zhang, Feiping Nie, Hua Zhang |
Abstract | Dimensionality reduction is a crucial step for pattern recognition and data mining tasks to overcome the curse of dimensionality. Principal component analysis (PCA) is a traditional technique for unsupervised dimensionality reduction, which is often employed to seek a projection to best represent the data in a least-squares sense, but if the original data has nonlinear structure, the performance of PCA will quickly drop. A supervised dimensionality reduction algorithm called Linear discriminant analysis (LDA) seeks an embedding transformation, which can work well with Gaussian distribution data or single-modal data, but for non-Gaussian distribution data or multimodal data, it gives undesired results. What is worse, the dimension of LDA cannot be more than the number of classes. In order to solve these issues, Local shrunk discriminant analysis (LSDA) is proposed in this work to process the non-Gaussian distribution data or multimodal data, which not only incorporates both the linear and nonlinear structures of the original data, but also learns the pattern shrinking to make the data more flexible to fit the manifold structure. Further, LSDA has stronger generalization performance, and its objective function becomes local LDA and traditional LDA when different extreme parameters are utilized respectively. What is more, a new efficient optimization algorithm is introduced to solve the non-convex objective function with low computational cost. Compared with other related approaches, such as PCA, LDA and local LDA, the proposed method can derive a subspace which is more suitable for non-Gaussian distribution and real data. Promising experimental results on different kinds of data sets demonstrate the effectiveness of the proposed approach. |
Tasks | Dimensionality Reduction |
Published | 2017-05-03 |
URL | http://arxiv.org/abs/1705.01206v1 |
http://arxiv.org/pdf/1705.01206v1.pdf | |
PWC | https://paperswithcode.com/paper/local-shrunk-discriminant-analysis-lsda |
Repo | |
Framework | |
Multi-Layer Convolutional Sparse Modeling: Pursuit and Dictionary Learning
Title | Multi-Layer Convolutional Sparse Modeling: Pursuit and Dictionary Learning |
Authors | Jeremias Sulam, Vardan Papyan, Yaniv Romano, Michael Elad |
Abstract | The recently proposed Multi-Layer Convolutional Sparse Coding (ML-CSC) model, consisting of a cascade of convolutional sparse layers, provides a new interpretation of Convolutional Neural Networks (CNNs). Under this framework, the computation of the forward pass in a CNN is equivalent to a pursuit algorithm aiming to estimate the nested sparse representation vectors – or feature maps – from a given input signal. Despite having served as a pivotal connection between CNNs and sparse modeling, a deeper understanding of the ML-CSC is still lacking: there are no pursuit algorithms that can serve this model exactly, nor are there conditions to guarantee a non-empty model. While one can easily obtain signals that approximately satisfy the ML-CSC constraints, it remains unclear how to simply sample from the model and, more importantly, how one can train the convolutional filters from real data. In this work, we propose a sound pursuit algorithm for the ML-CSC model by adopting a projection approach. We provide new and improved bounds on the stability of the solution of such pursuit and we analyze different practical alternatives to implement this in practice. We show that the training of the filters is essential to allow for non-trivial signals in the model, and we derive an online algorithm to learn the dictionaries from real data, effectively resulting in cascaded sparse convolutional layers. Last, but not least, we demonstrate the applicability of the ML-CSC model for several applications in an unsupervised setting, providing competitive results. Our work represents a bridge between matrix factorization, sparse dictionary learning and sparse auto-encoders, and we analyze these connections in detail. |
Tasks | Dictionary Learning |
Published | 2017-08-29 |
URL | http://arxiv.org/abs/1708.08705v2 |
http://arxiv.org/pdf/1708.08705v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-layer-convolutional-sparse-modeling |
Repo | |
Framework | |
Cost-Based Intuitionist Probabilities on Spaces of Graphs, Hypergraphs and Theorems
Title | Cost-Based Intuitionist Probabilities on Spaces of Graphs, Hypergraphs and Theorems |
Authors | Ben Goertzel |
Abstract | A novel partial order is defined on the space of digraphs or hypergraphs, based on assessing the cost of producing a graph via a sequence of elementary transformations. Leveraging work by Knuth and Skilling on the foundations of inference, and the structure of Heyting algebras on graph space, this partial order is used to construct an intuitionistic probability measure that applies to either digraphs or hypergraphs. As logical inference steps can be represented as transformations on hypergraphs representing logical statements, this also yields an intuitionistic probability measure on spaces of theorems. The central result is also extended to yield intuitionistic probabilities based on more general weighted rule systems defined over bicartesian closed categories. |
Tasks | |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04382v1 |
http://arxiv.org/pdf/1703.04382v1.pdf | |
PWC | https://paperswithcode.com/paper/cost-based-intuitionist-probabilities-on |
Repo | |
Framework | |
Don’t Fear the Reaper: Refuting Bostrom’s Superintelligence Argument
Title | Don’t Fear the Reaper: Refuting Bostrom’s Superintelligence Argument |
Authors | Sebastian Benthall |
Abstract | In recent years prominent intellectuals have raised ethical concerns about the consequences of artificial intelligence. One concern is that an autonomous agent might modify itself to become “superintelligent” and, in supremely effective pursuit of poorly specified goals, destroy all of humanity. This paper considers and rejects the possibility of this outcome. We argue that this scenario depends on an agent’s ability to rapidly improve its ability to predict its environment through self-modification. Using a Bayesian model of a reasoning agent, we show that there are important limitations to how an agent may improve its predictive ability through self-modification alone. We conclude that concern about this artificial intelligence outcome is misplaced and better directed at policy questions around data access and storage. |
Tasks | |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08495v2 |
http://arxiv.org/pdf/1702.08495v2.pdf | |
PWC | https://paperswithcode.com/paper/dont-fear-the-reaper-refuting-bostroms |
Repo | |
Framework | |