July 28, 2019


Paper Group ANR 261


Learning Traffic as Images: A Deep Convolutional Neural Network for Large-Scale Transportation Network Speed Prediction


Title	Learning Traffic as Images: A Deep Convolutional Neural Network for Large-Scale Transportation Network Speed Prediction
Authors	Xiaolei Ma, Zhuang Dai, Zhengbing He, Jihui Na, Yong Wang, Yunpeng Wang
Abstract	This paper proposes a convolutional neural network (CNN)-based method that learns traffic as images and predicts large-scale, network-wide traffic speed with high accuracy. Spatiotemporal traffic dynamics are converted to images describing the time and space relations of traffic flow via a two-dimensional time-space matrix. A CNN is applied to the image following two consecutive steps: abstract traffic feature extraction and network-wide traffic speed prediction. The effectiveness of the proposed method is evaluated by taking two real-world transportation networks, the second ring road and north-east transportation network in Beijing, as examples, and comparing the method with four prevailing algorithms, namely, ordinary least squares, k-nearest neighbors, artificial neural network, and random forest, and three deep learning architectures, namely, stacked autoencoder, recurrent neural network, and long short-term memory network. The results show that the proposed method outperforms other algorithms by an average accuracy improvement of 42.91% within an acceptable execution time. The CNN can train the model in a reasonable time and, thus, is suitable for large-scale transportation networks.
Tasks
Published	2017-01-16
URL	http://arxiv.org/abs/1701.04245v4
PDF	http://arxiv.org/pdf/1701.04245v4.pdf
PWC	https://paperswithcode.com/paper/learning-traffic-as-images-a-deep
Repo
Framework
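
The time-space matrix described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the paper: the `(segment, interval, speed)` record layout, the choice of rows for road segments and columns for time intervals, and filling empty cells with 0.0.

```python
def build_time_space_matrix(records, num_segments, num_intervals):
    """Average speed records into a 2-D grid: rows = road segments, columns = time intervals.

    `records` is an iterable of (segment_index, interval_index, speed) tuples.
    Cells with no observation are set to 0.0 (an illustrative choice).
    """
    sums = [[0.0] * num_intervals for _ in range(num_segments)]
    counts = [[0] * num_intervals for _ in range(num_segments)]
    for segment, interval, speed in records:
        sums[segment][interval] += speed
        counts[segment][interval] += 1
    return [
        [sums[s][t] / counts[s][t] if counts[s][t] else 0.0
         for t in range(num_intervals)]
        for s in range(num_segments)
    ]


def normalize(matrix, max_speed):
    """Scale speeds to [0, 1] so the grid can be treated like a grayscale image."""
    return [[value / max_speed for value in row] for row in matrix]


records = [(0, 0, 60.0), (0, 0, 40.0), (1, 1, 30.0)]
matrix = build_time_space_matrix(records, num_segments=2, num_intervals=2)
image = normalize(matrix, max_speed=100.0)
```

The resulting grid is the kind of single-channel "image" that a CNN could then consume; the network itself is out of scope for this sketch.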

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos


Title	Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
Authors	De-An Huang, Joseph J. Lim, Li Fei-Fei, Juan Carlos Niebles
Abstract	We propose an unsupervised method for reference resolution in instructional videos, where the goal is to temporally link an entity (e.g., “dressing”) to the action (e.g., “mix yogurt”) that produced it. The key challenge is the inevitable visual-linguistic ambiguities arising from the changes in both visual appearance and referring expression of an entity in the video. This challenge is amplified by the fact that we aim to resolve references with no supervision. We address these challenges by learning a joint visual-linguistic model, where linguistic cues can help resolve visual ambiguities and vice versa. We verify our approach by training our model without supervision on more than two thousand unstructured cooking videos from YouTube, and show that our visual-linguistic model can substantially improve upon the state-of-the-art linguistic-only model on reference resolution in instructional videos.
Tasks
Published	2017-03-07
URL	http://arxiv.org/abs/1703.02521v2
PDF	http://arxiv.org/pdf/1703.02521v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-visual-linguistic-reference
Repo
Framework

Konzept für Bildanalysen in Hochdurchsatz-Systemen am Beispiel des Zebrabärblings


Title	Konzept für Bildanalysen in Hochdurchsatz-Systemen am Beispiel des Zebrabärblings
Authors	Rüdiger Alshut
Abstract	With image-based high-throughput experiments, new challenges arise in both the design of experiments and the automated analysis. To be able to handle the massive number of single experiments and the corresponding amount of data, a comprehensive concept for the design of experiments and a new evaluation method are needed. This work proposes a new method for an optimized experiment layout that enables the determination of parameters, adapted for the needs of automated image analysis. Furthermore, a catalogue of new image analysis modules, especially developed for zebrafish analysis, is presented. The combination of both parts offers the user, usually a biologist, an approach for high-throughput zebrafish image analysis, which enables the extraction of new signals and optimizes the design of experiments. The result is a reduction in data volume, redundant information, and workload, as well as in classification errors.
Tasks
Published	2017-04-26
URL	http://arxiv.org/abs/1705.02962v1
PDF	http://arxiv.org/pdf/1705.02962v1.pdf
PWC	https://paperswithcode.com/paper/konzept-fur-bildanalysen-in-hochdurchsatz
Repo
Framework

Scaling Binarized Neural Networks on Reconfigurable Logic


Title	Scaling Binarized Neural Networks on Reconfigurable Logic
Authors	Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers
Abstract	Binarized neural networks (BNNs) are gaining interest in the deep learning community due to their significantly lower computational and memory cost. They are particularly well suited to reconfigurable logic devices, which contain an abundance of fine-grained compute resources and can result in smaller, lower power implementations, or conversely in higher classification rates. Towards this end, the FINN framework was recently proposed for building fast and flexible field programmable gate array (FPGA) accelerators for BNNs. FINN utilized a novel set of optimizations that enable efficient mapping of BNNs to hardware and implemented fully connected, non-padded convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements. However, FINN was not evaluated on larger topologies due to the size of the chosen FPGA, and exhibited decreased accuracy due to lack of padding. In this paper, we improve upon FINN to show how padding can be employed on BNNs while still maintaining a 1-bit datapath and high accuracy. Based on this technique, we demonstrate numerous experiments to illustrate flexibility and scalability of the approach. In particular, we show that a large BNN requiring 1.2 billion operations per frame running on an ADM-PCIE-8K5 platform can classify images at 12 kFPS with 671 µs latency while drawing less than 41 W board power and classifying CIFAR-10 images at 88.7% accuracy. Our implementation of this network achieves 14.8 trillion operations per second. We believe this is the fastest classification rate reported to date on this benchmark at this level of accuracy.
Tasks
Published	2017-01-12
URL	http://arxiv.org/abs/1701.03400v2
PDF	http://arxiv.org/pdf/1701.03400v2.pdf
PWC	https://paperswithcode.com/paper/scaling-binarized-neural-networks-on
Repo
Framework
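
To make the "1-bit datapath" concrete, here is a minimal sketch of the XNOR-popcount trick that is commonly used for dot products between vectors of +1/-1 values. The abstract does not spell out this detail, so treat it as background illustration rather than the paper's implementation; the helper names `pack_bits` and `binary_dot` are hypothetical.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, value in enumerate(values):
        if value == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors stored as packed bits.

    XNOR marks positions where the signs agree (product +1); the remaining
    positions contribute -1, so the result is 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR, restricted to the n valid bits
    agreements = bin(agree).count("1")  # popcount
    return 2 * agreements - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
```

Here `result` and `reference` agree, which shows why multiplications can be replaced by bitwise logic and a bit count in hardware.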

Recurrent Residual Learning for Action Recognition


Title	Recurrent Residual Learning for Action Recognition
Authors	Ahsan Iqbal, Alexander Richard, Hilde Kuehne, Juergen Gall
Abstract	Action recognition is a fundamental problem in computer vision with a lot of potential applications such as video surveillance, human computer interaction, and robot learning. Given pre-segmented videos, the task is to recognize actions happening within videos. Historically, hand crafted video features were used to address the task of action recognition. With the success of Deep ConvNets as an image analysis method, a lot of extensions of standard ConvNets were proposed to process variable length video data. In this work, we propose a novel recurrent ConvNet architecture called recurrent residual networks to address the task of action recognition. The approach extends ResNet, a state of the art model for image classification. While the original formulation of ResNet aims at learning spatial residuals in its layers, we extend the approach by introducing recurrent connections that allow it to learn a spatio-temporal residual. In contrast to fully recurrent networks, our temporal connections only allow a limited range of preceding frames to contribute to the output for the current frame, enabling efficient training and inference as well as limiting the temporal context to a reasonable local range around each frame. On a large-scale action recognition dataset, we show that our model improves over both the standard ResNet architecture and a ResNet extended by a fully recurrent layer.
Tasks	Image Classification, Temporal Action Localization
Published	2017-06-27
URL	http://arxiv.org/abs/1706.08807v1
PDF	http://arxiv.org/pdf/1706.08807v1.pdf
PWC	https://paperswithcode.com/paper/recurrent-residual-learning-for-action
Repo
Framework

Rare Disease Physician Targeting: A Factor Graph Approach


Title	Rare Disease Physician Targeting: A Factor Graph Approach
Authors	Yong Cai, Yunlong Wang, Dong Dai
Abstract	In rare disease physician targeting, a major challenge is how to identify physicians who are treating diagnosed or underdiagnosed rare disease patients. Rare diseases have extremely low incidence rates. For a specified rare disease, only a small number of patients are affected and a fraction of physicians are involved. The existing targeting methodologies, such as segmentation and profiling, are developed under a mass-market assumption. They are not suitable for the rare disease market, where the target classes are extremely imbalanced. The authors propose a graphical model approach to predict targets by jointly modeling physician and patient features from different data spaces and utilizing the extra relational information. Through an empirical example with medical claim and prescription data, the proposed approach demonstrates better accuracy in finding target physicians. The graph representation also provides visual interpretability of relationships among physicians and patients. The model can be extended to incorporate more complex dependency structures. This article contributes to the literature exploring the benefit of utilizing relational dependencies among entities in the healthcare industry.
Tasks
Published	2017-01-19
URL	http://arxiv.org/abs/1701.05644v1
PDF	http://arxiv.org/pdf/1701.05644v1.pdf
PWC	https://paperswithcode.com/paper/rare-disease-physician-targeting-a-factor
Repo
Framework

Knowledge Transfer Between Artificial Intelligence Systems


Title	Knowledge Transfer Between Artificial Intelligence Systems
Authors	Ivan Y. Tyukin, Alexander N. Gorban, Konstantin Sofeikov, Ilya Romanenko
Abstract	We consider the fundamental question: how a legacy “student” Artificial Intelligence (AI) system could learn from a legacy “teacher” AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources. Here “learning” is understood as an ability of one system to mimic responses of the other and vice-versa. We call such learning an Artificial Intelligence knowledge transfer. We show that if internal variables of the “student” Artificial Intelligence system have the structure of an $n$-dimensional topological vector space and $n$ is sufficiently high then, with probability close to one, the required knowledge transfer can be implemented by simple cascades of linear functionals. In particular, for $n$ sufficiently large, with probability close to one, the “student” system can successfully and non-iteratively learn $k\ll n$ new examples from the “teacher” (or correct the same number of mistakes) at the cost of two additional inner products. The concept is illustrated with an example of knowledge transfer from a pre-trained convolutional neural network to a simple linear classifier with HOG features.
Tasks	Transfer Learning
Published	2017-09-05
URL	http://arxiv.org/abs/1709.01547v2
PDF	http://arxiv.org/pdf/1709.01547v2.pdf
PWC	https://paperswithcode.com/paper/knowledge-transfer-between-artificial
Repo
Framework

Minimizing Maximum Regret in Commitment Constrained Sequential Decision Making


Title	Minimizing Maximum Regret in Commitment Constrained Sequential Decision Making
Authors	Qi Zhang, Satinder Singh, Edmund Durfee
Abstract	In cooperative multiagent planning, it can often be beneficial for an agent to make commitments about aspects of its behavior to others, allowing them in turn to plan their own behaviors without taking the agent’s detailed behavior into account. Extending previous work in the Bayesian setting, we consider instead a worst-case setting in which the agent has a set of possible environments (MDPs) it could be in, and develop a commitment semantics that allows for probabilistic guarantees on the agent’s behavior in any of the environments it could end up facing. Crucially, an agent receives observations (of reward and state transitions) that allow it to potentially eliminate possible environments and thus obtain higher utility by adapting its policy to the history of observations. We develop algorithms and provide theory and some preliminary empirical results showing that they ensure an agent meets its commitments with history-dependent policies while minimizing maximum regret over the possible environments.
Tasks	Decision Making
Published	2017-03-14
URL	http://arxiv.org/abs/1703.04587v1
PDF	http://arxiv.org/pdf/1703.04587v1.pdf
PWC	https://paperswithcode.com/paper/minimizing-maximum-regret-in-commitment
Repo
Framework
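
The minimax-regret criterion named in the title can be sketched on a toy table of utilities. This is a simplified illustration only: it ignores commitments, observations, and history-dependent policies, and the policy and environment names and numbers are made up.

```python
def minimax_regret(values):
    """Pick the policy whose worst-case regret over environments is smallest.

    `values[policy][env]` is the utility of running `policy` in `env`.
    Regret in an environment is the gap to the best policy for that environment.
    Returns (best_policy, its_maximum_regret).
    """
    policies = list(values)
    envs = list(next(iter(values.values())))
    best_in_env = {e: max(values[p][e] for p in policies) for e in envs}
    max_regret = {
        p: max(best_in_env[e] - values[p][e] for e in envs) for p in policies
    }
    best_policy = min(policies, key=lambda p: max_regret[p])
    return best_policy, max_regret[best_policy]


# Hypothetical utilities: two candidate policies, two possible environments.
values = {
    "a": {"e1": 10, "e2": 0},
    "b": {"e1": 6, "e2": 5},
}
choice = minimax_regret(values)
```

Policy "a" is optimal in "e1" but loses 5 in "e2", while "b" loses at most 4, so the criterion selects "b".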

Exact Camera Location Recovery by Least Unsquared Deviations


Title	Exact Camera Location Recovery by Least Unsquared Deviations
Authors	Gilad Lerman, Yunpeng Shi, Teng Zhang
Abstract	We establish exact recovery for the Least Unsquared Deviations (LUD) algorithm of Ozyesil and Singer. More precisely, we show that for sufficiently many cameras with given corrupted pairwise directions, where both camera locations and pairwise directions are generated by a special probabilistic model, the LUD algorithm exactly recovers the camera locations with high probability. A similar exact recovery guarantee was established for the ShapeFit algorithm by Hand, Lee and Voroninski, but with typically less corruption.
Tasks
Published	2017-09-27
URL	http://arxiv.org/abs/1709.09683v4
PDF	http://arxiv.org/pdf/1709.09683v4.pdf
PWC	https://paperswithcode.com/paper/exact-camera-location-recovery-by-least
Repo
Framework

Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion


Title	Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion
Authors	Weiyao Lin, Yang Mi, Jianxin Wu, Ke Lu, Hongkai Xiong
Abstract	Action recognition is an important yet challenging task in computer vision. In this paper, we propose a novel deep learning-based framework for action recognition, which improves the recognition accuracy by: 1) deriving more precise features for representing actions, and 2) reducing the asynchrony between different information streams. We first introduce a coarse-to-fine network which extracts shared deep features at different action class granularities and progressively integrates them to obtain a more accurate feature representation for input actions. We further introduce an asynchronous fusion network. It fuses information from different streams by asynchronously integrating stream-wise features at different time points, hence better leveraging the complementary information in different streams. Experimental results on action recognition benchmarks demonstrate that our approach achieves state-of-the-art performance.
Tasks	Temporal Action Localization
Published	2017-11-20
URL	http://arxiv.org/abs/1711.07430v1
PDF	http://arxiv.org/pdf/1711.07430v1.pdf
PWC	https://paperswithcode.com/paper/action-recognition-with-coarse-to-fine-deep
Repo
Framework

Negative Results in Computer Vision: A Perspective


Title	Negative Results in Computer Vision: A Perspective
Authors	Ali Borji
Abstract	A negative result is when the outcome of an experiment or a model is not what is expected or when a hypothesis does not hold. Despite being often overlooked in the scientific community, negative results are results and they carry value. While this topic has been extensively discussed in other fields such as social sciences and biosciences, less attention has been paid to it in the computer vision community. The unique characteristics of computer vision, particularly its experimental aspect, call for a special treatment of this matter. In this paper, I will address what makes negative results important, how they should be disseminated and incentivized, and what lessons can be learned from cognitive vision research in this regard. Further, I will discuss issues such as computer vision and human vision interaction, experimental design and statistical hypothesis testing, explanatory versus predictive modeling, performance evaluation, model comparison, as well as computer vision research culture.
Tasks
Published	2017-05-11
URL	http://arxiv.org/abs/1705.04402v3
PDF	http://arxiv.org/pdf/1705.04402v3.pdf
PWC	https://paperswithcode.com/paper/negative-results-in-computer-vision-a
Repo
Framework

Modified Alpha-Rooting Color Image Enhancement Method On The Two-Side 2-D Quaternion Discrete Fourier Transform And The 2-D Discrete Fourier Transform


Title	Modified Alpha-Rooting Color Image Enhancement Method On The Two-Side 2-D Quaternion Discrete Fourier Transform And The 2-D Discrete Fourier Transform
Authors	Artyom M. Grigoryan, Aparna John, Sos S. Agaian
Abstract	Color in an image is resolved into 3 or 4 color components, and 2-D images of these components are stored in separate channels. Most color image enhancement algorithms are applied channel-by-channel on each image. But such a system of color image processing is not processing the original color. When a color image is represented as a quaternion image, processing is done in original colors. This paper proposes an implementation of a quaternion-based enhancement algorithm for color images, referred to as modified alpha-rooting by the two-dimensional quaternion discrete Fourier transform (2-D QDFT). Enhancement results of this proposed method are compared with the channel-by-channel image enhancement by the 2-D DFT. Enhancements in color images are quantitatively measured by the color enhancement measure estimation (CEME), which allows for selecting optimum parameters for processing by the genetic algorithm. Enhancement of color images by the quaternion-based method allows for obtaining images which are closer to the genuine representation of the real original color.
Tasks	Image Enhancement
Published	2017-07-15
URL	http://arxiv.org/abs/1707.04781v1
PDF	http://arxiv.org/pdf/1707.04781v1.pdf
PWC	https://paperswithcode.com/paper/modified-alpha-rooting-color-image
Repo
Framework
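
For orientation, the channel-by-channel 2-D DFT baseline mentioned in the abstract can be sketched with classical alpha-rooting, which raises each Fourier magnitude to the power alpha while keeping its phase. This is not the paper's modified method and does not use the quaternion DFT; the naive `dft2` helper is a hypothetical stand-in for a real FFT and is only practical for tiny images.

```python
import cmath


def dft2(img, inverse=False):
    """Naive 2-D DFT of a small grid (O(N^4)); the inverse is scaled by 1/(rows*cols)."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = sign * 2 * cmath.pi * (u * x / rows + v * y / cols)
                    total += img[x][y] * cmath.exp(1j * angle)
            out[u][v] = total / (rows * cols) if inverse else total
    return out


def alpha_root(channel, alpha):
    """Classical alpha-rooting of one channel: |F|^alpha with the original phase."""
    spectrum = dft2(channel)
    scaled = [
        [c * abs(c) ** (alpha - 1) if abs(c) > 0 else 0j for c in row]
        for row in spectrum
    ]
    return [[value.real for value in row] for row in dft2(scaled, inverse=True)]


channel = [[1.0, 2.0], [3.0, 4.0]]
unchanged = alpha_root(channel, alpha=1.0)  # alpha = 1 leaves the image as is
enhanced = alpha_root(channel, alpha=0.9)   # alpha < 1 compresses large magnitudes
```

A color image would be processed by calling `alpha_root` once per channel, which is exactly the per-channel treatment the abstract contrasts with the quaternion approach.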

Improving Visually Grounded Sentence Representations with Self-Attention


Title	Improving Visually Grounded Sentence Representations with Self-Attention
Authors	Kang Min Yoo, Youhyun Shin, Sang-goo Lee
Abstract	Sentence representation models trained only on language could potentially suffer from the grounding problem. Recent work has shown promising results in improving the qualities of sentence representations by jointly training them with associated image features. However, the grounding capability is limited due to the distant connection between input sentences and image features by the design of the architecture. In order to further close the gap, we propose applying a self-attention mechanism to the sentence encoder to deepen the grounding effect. Our results on transfer tasks show that self-attentive encoders are better for visual grounding, as they exploit specific words with strong visual associations.
Tasks
Published	2017-12-02
URL	http://arxiv.org/abs/1712.00609v1
PDF	http://arxiv.org/pdf/1712.00609v1.pdf
PWC	https://paperswithcode.com/paper/improving-visually-grounded-sentence
Repo
Framework
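
As a rough illustration of how a self-attentive encoder can emphasize specific words, the sketch below pools per-word hidden states with softmax attention weights. The scoring scheme (a dot product with a single `query` vector) and all names are assumptions for illustration; the abstract does not describe the exact architecture.

```python
import math


def attention_pool(hidden_states, query):
    """Weight each word's hidden state by softmax(score) and sum them.

    `hidden_states` is a list of equal-length vectors, one per word.
    `query` is a hypothetical learned scoring vector of the same length.
    Returns (sentence_vector, attention_weights).
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return pooled, weights


hidden = [[1.0, 0.0], [0.0, 1.0]]
uniform_vec, uniform_w = attention_pool(hidden, query=[0.0, 0.0])
focused_vec, focused_w = attention_pool(hidden, query=[5.0, 0.0])
```

With a zero query every word gets the same weight; with a query aligned to the first word, most of the weight shifts to that word, which is the effect the abstract attributes to words with strong visual associations.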

A Deterministic and Generalized Framework for Unsupervised Learning with Restricted Boltzmann Machines


Title	A Deterministic and Generalized Framework for Unsupervised Learning with Restricted Boltzmann Machines
Authors	Eric W. Tramel, Marylou Gabrié, Andre Manoel, Francesco Caltagirone, Florent Krzakala
Abstract	Restricted Boltzmann machines (RBMs) are energy-based neural networks which are commonly used as the building blocks for deep neural architectures. In this work, we derive a deterministic framework for the training, evaluation, and use of RBMs based upon the Thouless-Anderson-Palmer (TAP) mean-field approximation of widely-connected systems with weak interactions coming from spin-glass theory. While the TAP approach has been extensively studied for fully-visible binary spin systems, our construction is generalized to latent-variable models, as well as to arbitrarily distributed real-valued spin systems with bounded support. In our numerical experiments, we demonstrate the effective deterministic training of our proposed models and are able to show interesting features of unsupervised learning which could not be directly observed with sampling. Additionally, we demonstrate how to utilize our TAP-based framework for leveraging trained RBMs as joint priors in denoising problems.
Tasks	Denoising, Latent Variable Models
Published	2017-02-10
URL	http://arxiv.org/abs/1702.03260v3
PDF	http://arxiv.org/pdf/1702.03260v3.pdf
PWC	https://paperswithcode.com/paper/a-deterministic-and-generalized-framework-for
Repo
Framework

On Quantum Decision Trees


Title	On Quantum Decision Trees
Authors	Subhash Kak
Abstract	Quantum decision systems are being increasingly considered for use in artificial intelligence applications. Classical and quantum nodes can be distinguished based on certain correlations in their states. This paper investigates some properties of the states obtained in a decision tree structure. How these correlations may be mapped to the decision tree is considered. Classical tree representations and approximations to quantum states are provided.
Tasks
Published	2017-03-08
URL	http://arxiv.org/abs/1703.03693v1
PDF	http://arxiv.org/pdf/1703.03693v1.pdf
PWC	https://paperswithcode.com/paper/on-quantum-decision-trees
Repo
Framework