Paper Group ANR 644
Impression Network for Video Object Detection
Title | Impression Network for Video Object Detection |
Authors | Congrui Hetang, Hongwei Qin, Shaohui Liu, Junjie Yan |
Abstract | Video object detection is more challenging than image object detection. Previous works showed that applying an object detector frame by frame is not only slow but also inaccurate: visual cues are weakened by defocus and motion blur, causing failures on the affected frames. Multi-frame feature fusion methods have proved effective at improving accuracy, but they dramatically sacrifice speed; feature propagation based methods improve speed, but sacrifice accuracy. So is it possible to improve speed and accuracy simultaneously? Inspired by how humans use impressions to recognize objects in blurry frames, we propose Impression Network, which embodies a natural and efficient feature aggregation mechanism. In our framework, an impression feature is established by iteratively absorbing sparsely extracted frame features. The impression feature is propagated all the way down the video, helping to enhance the features of low-quality frames. This impression mechanism makes it possible to perform long-range multi-frame feature fusion among sparse keyframes with minimal overhead. It significantly improves the per-frame detection baseline on ImageNet VID while being 3 times faster (20 fps). We hope Impression Network can provide a new perspective on video feature enhancement. Code will be made available. |
Tasks | Object Detection, Video Object Detection |
Published | 2017-12-16 |
URL | http://arxiv.org/abs/1712.05896v1 |
http://arxiv.org/pdf/1712.05896v1.pdf | |
PWC | https://paperswithcode.com/paper/impression-network-for-video-object-detection |
Repo | |
Framework | |
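The "iteratively absorbing sparsely extracted frame features" step described in the abstract can be sketched as a running aggregation over keyframe features. This is a minimal stand-in, assuming a simple exponential moving average with a fixed scalar weight `w`; the paper's actual mechanism computes adaptive weights and operates on CNN feature maps.

```python
def update_impression(impression, frame_feature, w=0.5):
    """Absorb a new keyframe feature into the running impression.

    A hypothetical sketch of the aggregation idea: an exponential moving
    average over sparsely extracted keyframe features. The scalar weight
    w is a placeholder for the paper's learned, adaptive weighting.
    """
    if impression is None:
        # First keyframe: the impression is just its feature.
        return list(frame_feature)
    return [(1 - w) * i + w * f for i, f in zip(impression, frame_feature)]

# Propagate the impression down a toy "video" of 1-D keyframe features.
impression = None
for feat in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    impression = update_impression(impression, feat)
```

Because each update only touches the current keyframe, the fusion has constant per-frame overhead regardless of how far back the impression reaches, which is the efficiency argument made in the abstract.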
Optimal Errors and Phase Transitions in High-Dimensional Generalized Linear Models
Title | Optimal Errors and Phase Transitions in High-Dimensional Generalized Linear Models |
Authors | Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, Lenka Zdeborová |
Abstract | Generalized linear models (GLMs) arise in high-dimensional machine learning, statistics, communications and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Non-rigorous predictions for the optimal errors existed for special cases of GLMs, e.g. for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance, and locate the associated sharp phase transitions separating learnable and non-learnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multi-purpose algorithms. This paper is divided into two parts that can be read independently: The first part (main part) presents the model and main results, discusses some applications and sketches the main ideas of the proof. The second part (supplementary information) is much more detailed and provides more examples as well as all the proofs. |
Tasks | |
Published | 2017-08-10 |
URL | http://arxiv.org/abs/1708.03395v3 |
http://arxiv.org/pdf/1708.03395v3.pdf | |
PWC | https://paperswithcode.com/paper/optimal-errors-and-phase-transitions-in-high |
Repo | |
Framework | |
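The random-design GLM setting analyzed in the paper can be made concrete with a tiny teacher-student generator: an i.i.d. Gaussian data matrix, a Gaussian teacher vector, and labels passed through a channel. This is a sketch under common conventions (the 1/sqrt(n) scaling and the sign channel giving the perceptron special case mentioned in the abstract), not code from the paper.

```python
import math
import random

def sample_glm_instance(n=100, alpha=2.0, phi=None, seed=0):
    """Draw one random-design GLM instance in the high-dimensional scaling:
    n features, m = alpha * n samples (fixed ratio alpha), i.i.d. Gaussian
    data matrix X, and labels y = phi(X w* / sqrt(n)) from a Gaussian
    teacher w*. phi is the output channel; the default sign channel gives
    the perceptron special case. Scalings are assumed conventions.
    """
    if phi is None:
        phi = lambda z: 1.0 if z > 0 else -1.0  # sign channel -> perceptron
    rng = random.Random(seed)
    m = int(alpha * n)
    w_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [phi(sum(x * w for x, w in zip(row, w_star)) / math.sqrt(n))
         for row in X]
    return X, y, w_star
```

Varying `alpha` (samples per dimension) is what sweeps across the learnable and non-learnable regions whose boundaries the paper locates.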
Descriptions of Objectives and Processes of Mechanical Learning
Title | Descriptions of Objectives and Processes of Mechanical Learning |
Authors | Chuyu Xiong |
Abstract | In [1], we introduced mechanical learning and proposed two approaches to it. Here, we follow one of those approaches to describe the objectives and processes of learning. We discuss two kinds of patterns: objective and subjective. Subjective patterns are crucial for a learning machine. We prove that for any objective pattern we can find a proper subjective pattern, based upon least base patterns, that expresses the objective pattern well. An X-form is an algebraic expression for a subjective pattern. The collection of X-forms forms the internal representation space, which is the center of a learning machine. We discuss learning by teaching and learning without teaching, and define data sufficiency in terms of X-forms. We then discuss some learning strategies and show that, in each strategy, with sufficient data and certain capabilities, a learning machine can indeed learn any pattern (a universal learning machine). In the appendix, using this view of learning machines, we look at deep learning from a different angle, namely its internal representation space and its learning dynamics. |
Tasks | |
Published | 2017-05-31 |
URL | http://arxiv.org/abs/1706.00066v1 |
http://arxiv.org/pdf/1706.00066v1.pdf | |
PWC | https://paperswithcode.com/paper/descriptions-of-objectives-and-processes-of |
Repo | |
Framework | |
Formal approaches to a definition of agents
Title | Formal approaches to a definition of agents |
Authors | Martin Biehl |
Abstract | This thesis contributes to the formalisation of the notion of an agent within the class of finite multivariate Markov chains. Agents are seen as entities that act, perceive, and are goal-directed. We present a new measure that can be used to identify entities (called $\iota$-entities), some general requirements for entities in multivariate Markov chains, as well as formal definitions of actions and perceptions suitable for such entities. The intuition behind $\iota$-entities is that entities are spatiotemporal patterns for which every part makes every other part more probable. The measure, complete local integration (CLI), is formally investigated in general Bayesian networks. It is based on the specific local integration (SLI), which is measured with respect to a partition. CLI is the minimum value of SLI over all partitions. We prove that $\iota$-entities are blocks in specific partitions of the global trajectory. These partitions are the finest partitions that achieve a given SLI value. We also establish the transformation behaviour of SLI under permutations of nodes in the network. We go on to present three conditions on general definitions of entities. These are not fulfilled by sets of random variables, i.e., the perception-action loop, which is often used to model agents, is too restrictive. We propose that any general entity definition should in effect specify a subset (called an entity-set) of the set of all spatiotemporal patterns of a given multivariate Markov chain. The set of $\iota$-entities is such a set. Importantly, the perception-action loop also induces an entity-set. We then propose formal definitions of actions and perceptions for arbitrary entity-sets. These specialise to standard notions in case of the perception-action loop entity-set. Finally, we look at some very simple examples. |
Tasks | |
Published | 2017-04-10 |
URL | http://arxiv.org/abs/1704.02716v1 |
http://arxiv.org/pdf/1704.02716v1.pdf | |
PWC | https://paperswithcode.com/paper/formal-approaches-to-a-definition-of-agents |
Repo | |
Framework | |
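The specific local integration described in the abstract can be computed for a toy joint distribution. The sketch below assumes the natural reading of the abstract, SLI of a pattern x with respect to a partition pi as log p(x) minus the sum of log marginal probabilities of the pattern's blocks; it does not reproduce the thesis's exact notation or the full CLI minimisation over partitions.

```python
import math

def sli(p, pattern, partition):
    """Specific local integration of `pattern` w.r.t. `partition`:
    SLI_pi(x) = log p(x) - sum over blocks b of log p(x_b),
    where x_b is the pattern restricted to the variables in block b.
    `p` is a dict mapping full assignments (tuples) to probabilities;
    this formulation is an assumption based on the abstract.
    """
    def marginal(block):
        # Probability of matching the pattern on the block's variables.
        return sum(q for x, q in p.items()
                   if all(x[i] == pattern[i] for i in block))
    return math.log(p[pattern]) - sum(math.log(marginal(b)) for b in partition)
```

For independent variables SLI is zero under the singleton partition, while perfectly correlated variables yield positive SLI, matching the intuition that every part of an entity makes every other part more probable.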
Robustness from structure: Inference with hierarchical spiking networks on analog neuromorphic hardware
Title | Robustness from structure: Inference with hierarchical spiking networks on analog neuromorphic hardware |
Authors | Mihai A. Petrovici, Anna Schroeder, Oliver Breitwieser, Andreas Grübl, Johannes Schemmel, Karlheinz Meier |
Abstract | How spiking networks are able to perform probabilistic inference is an intriguing question, not only for understanding information processing in the brain, but also for transferring these computational principles to neuromorphic silicon circuits. A number of computationally powerful spiking network models have been proposed, but most of them have only been tested, under ideal conditions, in software simulations. Any implementation in an analog, physical system, be it in vivo or in silico, will generally lead to distorted dynamics due to the physical properties of the underlying substrate. In this paper, we discuss several such distortive effects that are difficult or impossible to remove by classical calibration routines or parameter training. We then argue that hierarchical networks of leaky integrate-and-fire neurons can offer the required robustness for physical implementation and demonstrate this with both software simulations and emulation on an accelerated analog neuromorphic device. |
Tasks | Calibration |
Published | 2017-03-12 |
URL | http://arxiv.org/abs/1703.04145v1 |
http://arxiv.org/pdf/1703.04145v1.pdf | |
PWC | https://paperswithcode.com/paper/robustness-from-structure-inference-with |
Repo | |
Framework | |
CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction
Title | CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction |
Authors | Keisuke Tateno, Federico Tombari, Iro Laina, Nassir Navab |
Abstract | Given the recent advances in depth prediction from Convolutional Neural Networks (CNNs), this paper investigates how predicted depth maps from a deep neural network can be deployed for accurate and dense monocular reconstruction. We propose a method where CNN-predicted dense depth maps are naturally fused together with depth measurements obtained from direct monocular SLAM. Our fusion scheme privileges depth prediction in image locations where monocular SLAM approaches tend to fail, e.g. along low-textured regions, and vice-versa. We demonstrate the use of depth prediction for estimating the absolute scale of the reconstruction, hence overcoming one of the major limitations of monocular SLAM. Finally, we propose a framework to efficiently fuse semantic labels, obtained from a single frame, with dense SLAM, yielding semantically coherent scene reconstruction from a single view. Evaluation results on two benchmark datasets show the robustness and accuracy of our approach. |
Tasks | Depth Estimation |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03489v1 |
http://arxiv.org/pdf/1704.03489v1.pdf | |
PWC | https://paperswithcode.com/paper/cnn-slam-real-time-dense-monocular-slam-with |
Repo | |
Framework | |
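The fusion scheme described in the CNN-SLAM abstract, privileging SLAM depth where monocular methods work and CNN depth where they fail (low-textured regions), can be sketched as a per-pixel selection rule. This is a simplified stand-in, assuming image gradient magnitude as the texture proxy and a hypothetical threshold `tau`; the paper's actual scheme refines CNN depth with uncertainty-weighted small-baseline measurements.

```python
def fuse_depth(cnn_depth, slam_depth, gradient_mag, tau=0.1):
    """Per-pixel depth fusion sketch: where the image gradient is strong,
    direct monocular SLAM is reliable, so its depth is kept; in
    low-textured regions (or where SLAM has no estimate, marked None)
    the CNN prediction fills in. Inputs are flat lists of equal length;
    tau is an assumed texture threshold, not a value from the paper.
    """
    fused = []
    for d_cnn, d_slam, g in zip(cnn_depth, slam_depth, gradient_mag):
        fused.append(d_slam if (d_slam is not None and g > tau) else d_cnn)
    return fused
```

Because the CNN predicts metrically scaled depth, filling the untextured pixels this way is also what anchors the absolute scale of the reconstruction, the limitation of monocular SLAM that the abstract highlights.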
Drone Squadron Optimization: a Self-adaptive Algorithm for Global Numerical Optimization
Title | Drone Squadron Optimization: a Self-adaptive Algorithm for Global Numerical Optimization |
Authors | Vinícius Veloso de Melo, Wolfgang Banzhaf |
Abstract | This paper proposes Drone Squadron Optimization (DSO), a new self-adaptive metaheuristic for global numerical optimization which is updated online by a hyper-heuristic. DSO is an artifact-inspired technique, as opposed to many algorithms used nowadays, which are nature-inspired. DSO is very flexible because it is not tied to behaviors or natural phenomena. DSO has two core parts: the semi-autonomous drones that fly over a landscape to explore, and the Command Center that processes the retrieved data and updates the drones’ firmware whenever necessary. The self-adaptive aspect of DSO in this work is the perturbation/movement scheme, which is the procedure used to generate target coordinates. This procedure is evolved by the Command Center during the global optimization process in order to adapt DSO to the search landscape. DSO was evaluated on a set of widely employed benchmark functions. The statistical analysis of the results shows that the proposed method is competitive with the other methods in the comparison; its performance is promising, but several future improvements are planned. |
Tasks | |
Published | 2017-03-14 |
URL | http://arxiv.org/abs/1703.04561v1 |
http://arxiv.org/pdf/1703.04561v1.pdf | |
PWC | https://paperswithcode.com/paper/drone-squadron-optimization-a-self-adaptive |
Repo | |
Framework | |
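The two-part structure described in the abstract, drone teams generating candidate coordinates and a Command Center updating their "firmware", can be caricatured in a few lines. This is a heavily simplified, hypothetical rendition: here each team's firmware is just a Gaussian step size, and the Command Center copies the best team's scheme over the worst's; the real algorithm evolves the perturbation procedures themselves with a hyper-heuristic.

```python
import random

def dso_sketch(f, dim=2, teams=4, iters=500, seed=1):
    """Toy minimizer in the spirit of DSO (not the published algorithm).

    Each team proposes a candidate near the current best using its own
    perturbation scheme (a Gaussian step size, standing in for firmware).
    After every round, the Command Center replaces the worst-performing
    team's scheme with the best-performing team's.
    """
    rng = random.Random(seed)
    schemes = [0.1 * (t + 1) for t in range(teams)]  # per-team step sizes
    best = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    best_val = f(best)
    for _ in range(iters):
        scores = []
        for t in range(teams):
            cand = [x + rng.gauss(0.0, schemes[t]) for x in best]
            val = f(cand)
            scores.append(val)
            if val < best_val:
                best, best_val = cand, val
        # Command Center: propagate the most successful scheme.
        worst = max(range(teams), key=scores.__getitem__)
        leader = min(range(teams), key=scores.__getitem__)
        schemes[worst] = schemes[leader]
    return best, best_val
```

Running it on the sphere function `f(x) = sum(v*v for v in x)` drives the value toward zero, which is all this sketch is meant to demonstrate.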
Unsupervised Multi-Domain Image Translation with Domain-Specific Encoders/Decoders
Title | Unsupervised Multi-Domain Image Translation with Domain-Specific Encoders/Decoders |
Authors | Le Hui, Xiang Li, Jiaxin Chen, Hongliang He, Chen gong, Jian Yang |
Abstract | Unsupervised image-to-image translation has seen spectacular advances recently. However, recent approaches mainly focus on one model with two domains, which incurs a heavy burden of $O(n^2)$ cost in training time and model parameters when $n$ domains must be freely translated to each other in a general setting. To address this problem, we propose a novel and unified framework named Domain-Bank, which consists of a global shared auto-encoder and $n$ domain-specific encoders/decoders, under the assumption that all domains can be projected into a universal shared-latent space. This yields $O(n)$ complexity in model parameters along with a huge reduction in the time budget. Besides the high efficiency, we show comparable (or even better) image translation results over the state of the art on various challenging unsupervised image translation tasks, including face image translation, fashion-clothes translation and painting style translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on digit benchmark datasets. Further, thanks to the explicit representation of the domain-specific decoders as well as the universal shared-latent space, the framework also enables incremental learning to add a new domain encoder/decoder. Linear combinations of different domains’ representations can also be obtained by fusing the corresponding decoders. |
Tasks | Domain Adaptation, Image-to-Image Translation, Unsupervised Image-To-Image Translation |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02050v1 |
http://arxiv.org/pdf/1712.02050v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-multi-domain-image-translation |
Repo | |
Framework | |
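The structural point of Domain-Bank, two small modules per domain routed through one shared latent space, so n domains give all n^2 translation directions at O(n) cost, can be sketched with toy encoders and decoders. The affine maps below are placeholders (the real modules are CNNs); only the routing is the point.

```python
def build_domain_bank(domains):
    """Sketch of the shared-latent translation path: a per-domain encoder
    maps into a universal latent space and a per-domain decoder maps back,
    so any source/target pair is served by composing two of the 2n modules.
    The toy encoders/decoders here are offsets, purely for illustration.
    """
    encoders = {d: (lambda x, k=i: [v + float(k) for v in x])
                for i, d in enumerate(domains)}
    decoders = {d: (lambda z, k=i: [v - float(k) for v in z])
                for i, d in enumerate(domains)}

    def translate(x, src, dst):
        z = encoders[src](x)      # project into the shared latent space
        return decoders[dst](z)   # decode in the target domain
    return translate

translate = build_domain_bank(["photo", "sketch", "paint"])
```

Adding a new domain means adding one encoder/decoder pair, which is the incremental-learning property the abstract points out; a pairwise design would instead require 2n new translation models.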
Truncated Variational EM for Semi-Supervised Neural Simpletrons
Title | Truncated Variational EM for Semi-Supervised Neural Simpletrons |
Authors | Dennis Forster, Jörg Lücke |
Abstract | Inference and learning for probabilistic generative networks are often very challenging, which typically prevents scaling to networks as large as those used in deep discriminative approaches. To obtain efficiently trainable, large-scale and well performing generative networks for semi-supervised learning, we here combine two recent developments: a neural network reformulation of hierarchical Poisson mixtures (Neural Simpletrons), and a novel truncated variational EM approach (TV-EM). TV-EM provides theoretical guarantees for learning in generative networks, and its application to Neural Simpletrons results in particularly compact, yet approximately optimal, modifications of the learning equations. Applied to standard benchmarks, we empirically find that learning converges in fewer EM iterations, that the complexity per EM iteration is reduced, and that final likelihood values are higher on average. For the task of classification on data sets with few labels, these learning improvements result in consistently lower error rates compared to applications without truncation. Experiments on the MNIST data set herein allow for comparison to standard and state-of-the-art models in the semi-supervised setting. Further experiments on the NIST SD19 data set show the scalability of the approach when a wealth of additional unlabeled data is available. |
Tasks | |
Published | 2017-02-07 |
URL | http://arxiv.org/abs/1702.01997v1 |
http://arxiv.org/pdf/1702.01997v1.pdf | |
PWC | https://paperswithcode.com/paper/truncated-variational-em-for-semi-supervised |
Repo | |
Framework | |
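The core truncation idea behind TV-EM, restricting the variational posterior to a small set of high-probability hidden states, can be illustrated with a truncated E-step. This is a generic sketch of truncated posteriors, assuming log joint values are given per hidden state; the paper's specific selection and update rules for Neural Simpletrons are not reproduced here.

```python
import math

def truncated_posterior(log_joints, K):
    """Truncated variational E-step sketch: instead of the full posterior
    over hidden states, keep only the K states with the largest joint
    log-probability and renormalize over that set. `log_joints` maps each
    hidden state to log p(y, s); ties are broken by sort order.
    """
    top = sorted(log_joints, key=log_joints.get, reverse=True)[:K]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(log_joints[s] for s in top)
    weights = {s: math.exp(log_joints[s] - m) for s in top}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}
```

Reducing the sum over hidden states to K terms is what cuts the complexity per EM iteration that the abstract reports.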
Anticipating many futures: Online human motion prediction and synthesis for human-robot collaboration
Title | Anticipating many futures: Online human motion prediction and synthesis for human-robot collaboration |
Authors | Judith Bütepage, Hedvig Kjellström, Danica Kragic |
Abstract | Fluent and safe interactions of humans and robots require both partners to anticipate each other’s actions. A common approach to human intention inference is to model specific trajectories towards known goals with supervised classifiers. However, these approaches do not take possible future movements into account, nor do they make use of kinematic cues, such as legible and predictable motion. The bottleneck of these methods is the lack of an accurate model of general human motion. In this work, we present a conditional variational autoencoder that is trained to predict a window of future human motion given a window of past frames. Using skeletal data obtained from RGB-D images, we show how this unsupervised approach can be used for online motion prediction for up to 1660 ms. Additionally, we demonstrate online target prediction within the first 300-500 ms after motion onset without the use of target-specific training data. The advantage of our probabilistic approach is the possibility to draw samples of possible future motions. Finally, we investigate how movements and kinematic cues are represented on the learned low-dimensional manifold. |
Tasks | motion prediction |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08212v1 |
http://arxiv.org/pdf/1702.08212v1.pdf | |
PWC | https://paperswithcode.com/paper/anticipating-many-futures-online-human-motion |
Repo | |
Framework | |
Measurement-Adaptive Sparse Image Sampling and Recovery
Title | Measurement-Adaptive Sparse Image Sampling and Recovery |
Authors | Ali Taimori, Farokh Marvasti |
Abstract | This paper presents an adaptive and intelligent sparse model for digital image sampling and recovery. In the proposed sampler, we adaptively determine the number of samples required to retrieve an image based on the space-frequency-gradient information content of its patches. By leveraging texture in space, sparsity locations in the DCT domain, and directional decomposition of gradients, the sampler combines uniform, random, and nonuniform sampling strategies. For reconstruction, we model the recovery problem as a two-state cellular automaton that iteratively restores the image with scalable windows from generation to generation. We demonstrate that the recovery algorithm converges quickly, after a few generations, for images with an arbitrary degree of texture. For a given number of measurements, extensive experiments on standard image sets and on infra-red and mega-pixel range imaging devices show that the proposed measurement matrix considerably increases the overall recovery performance, or equivalently decreases the number of sampled pixels for a specific recovery quality, compared to the random sampling matrices and Gaussian linear combinations employed by state-of-the-art compressive sensing methods. In practice, the proposed measurement-adaptive sampling/recovery framework has various applications, from intelligent compressive-imaging acquisition devices to computer vision, graphics, and image processing. Simulation codes are available online for reproducibility. |
Tasks | Compressive Sensing |
Published | 2017-06-09 |
URL | http://arxiv.org/abs/1706.03129v2 |
http://arxiv.org/pdf/1706.03129v2.pdf | |
PWC | https://paperswithcode.com/paper/measurement-adaptive-sparse-image-sampling |
Repo | |
Framework | |
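The adaptive allocation step, assigning more samples to patches with more information content, can be sketched with a crude texture measure. This is a hypothetical stand-in: mean absolute horizontal gradient replaces the paper's full space-frequency-gradient analysis, and the rate bounds are assumed values, not the paper's.

```python
def samples_per_patch(patch, min_rate=0.1, max_rate=0.9):
    """Allocate a sampling budget to one grayscale patch (list of rows of
    0-255 values) from its mean absolute horizontal gradient: flat patches
    get few samples, textured patches get many. The gradient proxy and the
    min/max rates are illustrative assumptions only.
    """
    h, w = len(patch), len(patch[0])
    grad = sum(abs(patch[r][c + 1] - patch[r][c])
               for r in range(h) for c in range(w - 1))
    grad /= h * (w - 1)
    rate = min(max_rate, max(min_rate, grad / 255.0))
    return int(round(rate * h * w))
```

Summed over all patches, such a rule concentrates the fixed measurement budget where recovery is hardest, which is the mechanism behind the reported gains over non-adaptive random sampling.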
On Multi-Relational Link Prediction with Bilinear Models
Title | On Multi-Relational Link Prediction with Bilinear Models |
Authors | Yanjie Wang, Rainer Gemulla, Hui Li |
Abstract | We study bilinear embedding models for the task of multi-relational link prediction and knowledge graph completion. Bilinear models are among the most basic models for this task: they are comparatively efficient to train and use, and they can provide good prediction performance. The main goal of this paper is to explore the expressiveness of and the connections between various bilinear models proposed in the literature. In particular, a substantial number of models can be represented as bilinear models with certain additional constraints enforced on the embeddings. We explore whether or not these constraints lead to universal models, which can in principle represent every set of relations, and whether or not there are subsumption relationships between various models. We report results of an independent experimental study that evaluates recent bilinear models in a common experimental setup. Finally, we provide evidence that relation-level ensembles of multiple bilinear models can achieve state-of-the-art prediction performance. |
Tasks | Knowledge Graph Completion, Link Prediction |
Published | 2017-09-14 |
URL | http://arxiv.org/abs/1709.04808v1 |
http://arxiv.org/pdf/1709.04808v1.pdf | |
PWC | https://paperswithcode.com/paper/on-multi-relational-link-prediction-with |
Repo | |
Framework | |
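The model family the paper studies scores a triple (subject, relation, object) bilinearly: f(s, r, o) = e_s^T R_r e_o, where e_s and e_o are entity embeddings and R_r is a relation-specific matrix. Constrained special cases (e.g. a diagonal R_r, as in DistMult) are the "additional constraints" the abstract refers to. A minimal scoring function:

```python
def bilinear_score(e_s, R, e_o):
    """Generic bilinear score f(s, r, o) = e_s^T R_r e_o.

    e_s, e_o: entity embedding vectors; R: the relation's mixing matrix.
    With R unconstrained this is the RESCAL-style model; constraining R
    to be diagonal recovers DistMult as a special case.
    """
    return sum(e_s[i] * R[i][j] * e_o[j]
               for i in range(len(e_s)) for j in range(len(e_o)))
```

For link prediction, candidate objects are ranked by this score for a given subject and relation; the constraint pattern imposed on `R` is what distinguishes the models compared in the paper.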
Clickbait Identification using Neural Networks
Title | Clickbait Identification using Neural Networks |
Authors | Philippe Thomas |
Abstract | This paper presents the results of our participation in the Clickbait Detection Challenge 2017. The system relies on a fusion of neural networks, incorporating different types of available information. It does not require any linguistic preprocessing, and hence generalizes more easily to new domains and languages. The final combined model achieves a mean squared error of 0.0428, an accuracy of 0.826, and an F1 score of 0.564. According to the official evaluation metric, the system ranked 6th out of the 13 participating teams. |
Tasks | Clickbait Detection |
Published | 2017-10-24 |
URL | http://arxiv.org/abs/1710.08721v1 |
http://arxiv.org/pdf/1710.08721v1.pdf | |
PWC | https://paperswithcode.com/paper/clickbait-identification-using-neural |
Repo | |
Framework | |
Link the head to the “beak”: Zero Shot Learning from Noisy Text Description at Part Precision
Title | Link the head to the “beak”: Zero Shot Learning from Noisy Text Description at Part Precision |
Authors | Mohamed Elhoseiny, Yizhe Zhu, Han Zhang, Ahmed Elgammal |
Abstract | In this paper, we study learning visual classifiers from unstructured text descriptions at part precision with no training images. We propose a learning framework that is able to connect text terms to their relevant parts and suppress connections to non-visual text terms without any part-text annotations. For instance, this learning process enables terms like “beak” to be sparsely linked to the visual representation of parts like the head, while reducing the effect of non-visual terms like “migrate” on classifier prediction. Images are encoded by a part-based CNN that detects bird parts and learns part-specific representations. Part-based visual classifiers are predicted from text descriptions of unseen visual classes to facilitate classification without training images (also known as zero-shot recognition). We performed our experiments on the CUBirds 2011 dataset and improve the state-of-the-art text-based zero-shot recognition results from 34.7% to 43.6%. We also created large-scale benchmarks on North American bird images augmented with text descriptions, where we also show that our approach outperforms existing methods. Our code, data, and models are publicly available. |
Tasks | Zero-Shot Learning |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.01148v1 |
http://arxiv.org/pdf/1709.01148v1.pdf | |
PWC | https://paperswithcode.com/paper/link-the-head-to-the-beak-zero-shot-learning |
Repo | |
Framework | |
Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low
Title | Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low |
Authors | Rahul Mazumder, Peter Radchenko, Antoine Dedieu |
Abstract | We study the behavior of a fundamental tool in sparse statistical modeling – the best-subset selection procedure (aka “best-subsets”). Assuming that the underlying linear model is sparse, it is well known, both in theory and in practice, that the best-subsets procedure works extremely well in terms of several statistical metrics (prediction, estimation and variable selection) when the signal-to-noise ratio (SNR) is high. However, its performance degrades substantially when the SNR is low – it is outperformed in predictive accuracy by continuous shrinkage methods, such as ridge regression and the Lasso. We explain why this behavior should not come as a surprise, and contend that the original version of the classical best-subsets procedure was, perhaps, not designed to be used in the low SNR regimes. We propose a close cousin of best-subsets, namely, its $\ell_{q}$-regularized version, for $q \in \{1, 2\}$, which (a) mitigates, to a large extent, the poor predictive performance of best-subsets in the low SNR regimes; and (b) performs favorably and generally delivers a substantially sparser model when compared to the best predictive models available via ridge regression and the Lasso. Our estimator can be expressed as a solution to a mixed integer second order conic optimization problem and, hence, is amenable to modern computational tools from mathematical optimization. We explore the theoretical properties of the predictive capabilities of the proposed estimator and complement our findings via several numerical experiments. |
Tasks | |
Published | 2017-08-10 |
URL | http://arxiv.org/abs/1708.03288v1 |
http://arxiv.org/pdf/1708.03288v1.pdf | |
PWC | https://paperswithcode.com/paper/subset-selection-with-shrinkage-sparse-linear |
Repo | |
Framework | |
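For tiny problems, the $q=2$ variant of the regularized best-subsets estimator can be written down by brute force: over every support of size at most k, solve the ridge problem restricted to that support and keep the best regularized loss. This is an illustrative sketch under an assumed objective (residual sum of squares plus an l2 penalty, subject to the cardinality constraint); the paper solves the problem at scale as a mixed-integer second-order cone program, not by enumeration.

```python
import itertools

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense
    system A x = b (sufficient for the tiny supports enumerated below)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def best_subset_ridge(X, y, k, lam):
    """Exhaustive sketch of l2-regularized best-subsets: for each support S
    with |S| <= k, solve ridge restricted to S via the normal equations
    (X_S^T X_S + lam I) beta = X_S^T y, and return the support and
    coefficients with the lowest regularized loss. Viable only for tiny p.
    """
    n, p = len(X), len(X[0])
    best = (float("inf"), None, None)
    for size in range(1, k + 1):
        for S in itertools.combinations(range(p), size):
            XS = [[row[j] for j in S] for row in X]
            G = [[sum(XS[i][a] * XS[i][b] for i in range(n))
                  + (lam if a == b else 0.0)
                  for b in range(size)] for a in range(size)]
            r = [sum(XS[i][a] * y[i] for i in range(n)) for a in range(size)]
            beta = solve(G, r)
            resid = sum((y[i] - sum(XS[i][a] * beta[a] for a in range(size))) ** 2
                        for i in range(n))
            loss = resid + lam * sum(b * b for b in beta)
            if loss < best[0]:
                best = (loss, S, beta)
    return best[1], best[2]
```

The ridge term is what shrinks coefficients on the selected support, which is exactly the low-SNR remedy the abstract describes relative to plain (unshrunk) best-subsets.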