Paper Group ANR 644
Impression Network for Video Object Detection
Title | Impression Network for Video Object Detection |
Authors | Congrui Hetang, Hongwei Qin, Shaohui Liu, Junjie Yan |
Abstract | Video object detection is more challenging than image object detection. Previous works showed that applying an object detector frame by frame is not only slow but also inaccurate: visual cues are weakened by defocus and motion blur, causing failures on the affected frames. Multi-frame feature fusion methods have proved effective at improving accuracy, but they dramatically sacrifice speed; feature propagation based methods improve speed, but sacrifice accuracy. So is it possible to improve speed and accuracy simultaneously? Inspired by how humans use impressions to recognize objects in blurry frames, we propose Impression Network, which embodies a natural and efficient feature aggregation mechanism. In our framework, an impression feature is established by iteratively absorbing sparsely extracted frame features. The impression feature is propagated all the way down the video, helping to enhance the features of low-quality frames. This impression mechanism makes it possible to perform long-range multi-frame feature fusion among sparse keyframes with minimal overhead. It significantly improves the per-frame detection baseline on ImageNet VID while being 3 times faster (20 fps). We hope Impression Network can provide a new perspective on video feature enhancement. Code will be made available. |
Tasks | Object Detection, Video Object Detection |
Published | 2017-12-16 |
URL | http://arxiv.org/abs/1712.05896v1 |
http://arxiv.org/pdf/1712.05896v1.pdf | |
PWC | https://paperswithcode.com/paper/impression-network-for-video-object-detection |
Repo | |
Framework | |
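The "iteratively absorbing sparsely extracted frame features" step described in the abstract can be sketched as a running aggregation over keyframe features. This is a minimal stand-in, assuming a simple exponential moving average with a fixed scalar weight `w`; the paper's actual mechanism computes adaptive weights and operates on CNN feature maps.

```python
def update_impression(impression, frame_feature, w=0.5):
    """Absorb a new keyframe feature into the running impression.

    A hypothetical sketch of the aggregation idea: an exponential moving
    average over sparsely extracted keyframe features. The scalar weight
    w is a placeholder for the paper's learned, adaptive weighting.
    """
    if impression is None:
        # First keyframe: the impression is just its feature.
        return list(frame_feature)
    return [(1 - w) * i + w * f for i, f in zip(impression, frame_feature)]

# Propagate the impression down a toy "video" of 1-D keyframe features.
impression = None
for feat in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    impression = update_impression(impression, feat)
```

Because each update only touches the current keyframe, the fusion has constant per-frame overhead regardless of how far back the impression reaches, which is the efficiency argument made in the abstract.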
Optimal Errors and Phase Transitions in High-Dimensional Generalized Linear Models
Title | Optimal Errors and Phase Transitions in High-Dimensional Generalized Linear Models |
Authors | Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, Lenka Zdeborová |
Abstract | Generalized linear models (GLMs) arise in high-dimensional machine learning, statistics, communications and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Non-rigorous predictions for the optimal errors existed for special cases of GLMs, e.g. for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance, and locate the associated sharp phase transitions separating learnable and non-learnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multi-purpose algorithms. This paper is divided into two parts that can be read independently: The first part (main part) presents the model and main results, discusses some applications and sketches the main ideas of the proof. The second part (supplementary information) is much more detailed and provides more examples as well as all the proofs. |
Tasks | |
Published | 2017-08-10 |
URL | http://arxiv.org/abs/1708.03395v3 |
http://arxiv.org/pdf/1708.03395v3.pdf | |
PWC | https://paperswithcode.com/paper/optimal-errors-and-phase-transitions-in-high |
Repo | |
Framework | |
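The random-design GLM setting analyzed in the paper can be made concrete with a tiny teacher-student generator: an i.i.d. Gaussian data matrix, a Gaussian teacher vector, and labels passed through a channel. This is a sketch under common conventions (the 1/sqrt(n) scaling and the sign channel giving the perceptron special case mentioned in the abstract), not code from the paper.

```python
import math
import random

def sample_glm_instance(n=100, alpha=2.0, phi=None, seed=0):
    """Draw one random-design GLM instance in the high-dimensional scaling:
    n features, m = alpha * n samples (fixed ratio alpha), i.i.d. Gaussian
    data matrix X, and labels y = phi(X w* / sqrt(n)) from a Gaussian
    teacher w*. phi is the output channel; the default sign channel gives
    the perceptron special case. Scalings are assumed conventions.
    """
    if phi is None:
        phi = lambda z: 1.0 if z > 0 else -1.0  # sign channel -> perceptron
    rng = random.Random(seed)
    m = int(alpha * n)
    w_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [phi(sum(x * w for x, w in zip(row, w_star)) / math.sqrt(n))
         for row in X]
    return X, y, w_star
```

Varying `alpha` (samples per dimension) is what sweeps across the learnable and non-learnable regions whose boundaries the paper locates.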
Descriptions of Objectives and Processes of Mechanical Learning
Title | Descriptions of Objectives and Processes of Mechanical Learning |
Authors | Chuyu Xiong |
Abstract | In [1], we introduced mechanical learning and proposed two approaches to it. Here, we follow one of those approaches to describe the objectives and processes of learning. We discuss two kinds of patterns: objective and subjective. Subjective patterns are crucial for a learning machine. We prove that for any objective pattern we can find a proper subjective pattern, based upon least base patterns, that expresses the objective pattern well. An X-form is an algebraic expression for a subjective pattern. The collection of X-forms forms the internal representation space, which is the center of a learning machine. We discuss learning by teaching and learning without teaching, and define data sufficiency in terms of X-forms. We then discuss some learning strategies and show that, in each strategy, with sufficient data and certain capabilities, a learning machine can indeed learn any pattern (a universal learning machine). In the appendix, using this view of learning machines, we look at deep learning from a different angle, namely its internal representation space and its learning dynamics. |
Tasks | |
Published | 2017-05-31 |
URL | http://arxiv.org/abs/1706.00066v1 |
http://arxiv.org/pdf/1706.00066v1.pdf | |
PWC | https://paperswithcode.com/paper/descriptions-of-objectives-and-processes-of |
Repo | |
Framework | |
Formal approaches to a definition of agents
Title | Formal approaches to a definition of agents |
Authors | Martin Biehl |
Abstract | This thesis contributes to the formalisation of the notion of an agent within the class of finite multivariate Markov chains. Agents are seen as entities that act, perceive, and are goal-directed. We present a new measure that can be used to identify entities (called $\iota$-entities), some general requirements for entities in multivariate Markov chains, as well as formal definitions of actions and perceptions suitable for such entities. The intuition behind $\iota$-entities is that entities are spatiotemporal patterns for which every part makes every other part more probable. The measure, complete local integration (CLI), is formally investigated in general Bayesian networks. It is based on the specific local integration (SLI), which is measured with respect to a partition. CLI is the minimum value of SLI over all partitions. We prove that $\iota$-entities are blocks in specific partitions of the global trajectory. These partitions are the finest partitions that achieve a given SLI value. We also establish the transformation behaviour of SLI under permutations of nodes in the network. We go on to present three conditions on general definitions of entities. These are not fulfilled by sets of random variables, i.e., the perception-action loop, which is often used to model agents, is too restrictive. We propose that any general entity definition should in effect specify a subset (called an entity-set) of the set of all spatiotemporal patterns of a given multivariate Markov chain. The set of $\iota$-entities is such a set. Importantly, the perception-action loop also induces an entity-set. We then propose formal definitions of actions and perceptions for arbitrary entity-sets. These specialise to standard notions in case of the perception-action loop entity-set. Finally, we look at some very simple examples. |
Tasks | |
Published | 2017-04-10 |
URL | http://arxiv.org/abs/1704.02716v1 |
http://arxiv.org/pdf/1704.02716v1.pdf | |
PWC | https://paperswithcode.com/paper/formal-approaches-to-a-definition-of-agents |
Repo | |
Framework | |
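The specific local integration described in the abstract can be computed for a toy joint distribution. The sketch below assumes the natural reading of the abstract, SLI of a pattern x with respect to a partition pi as log p(x) minus the sum of log marginal probabilities of the pattern's blocks; it does not reproduce the thesis's exact notation or the full CLI minimisation over partitions.

```python
import math

def sli(p, pattern, partition):
    """Specific local integration of `pattern` w.r.t. `partition`:
    SLI_pi(x) = log p(x) - sum over blocks b of log p(x_b),
    where x_b is the pattern restricted to the variables in block b.
    `p` is a dict mapping full assignments (tuples) to probabilities;
    this formulation is an assumption based on the abstract.
    """
    def marginal(block):
        # Probability of matching the pattern on the block's variables.
        return sum(q for x, q in p.items()
                   if all(x[i] == pattern[i] for i in block))
    return math.log(p[pattern]) - sum(math.log(marginal(b)) for b in partition)
```

For independent variables SLI is zero under the singleton partition, while perfectly correlated variables yield positive SLI, matching the intuition that every part of an entity makes every other part more probable.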
Robustness from structure: Inference with hierarchical spiking networks on analog neuromorphic hardware
Title | Robustness from structure: Inference with hierarchical spiking networks on analog neuromorphic hardware |
Authors | Mihai A. Petrovici, Anna Schroeder, Oliver Breitwieser, Andreas Grübl, Johannes Schemmel, Karlheinz Meier |
Abstract | How spiking networks are able to perform probabilistic inference is an intriguing question, not only for understanding information processing in the brain, but also for transferring these computational principles to neuromorphic silicon circuits. A number of computationally powerful spiking network models have been proposed, but most of them have only been tested, under ideal conditions, in software simulations. Any implementation in an analog, physical system, be it in vivo or in silico, will generally lead to distorted dynamics due to the physical properties of the underlying substrate. In this paper, we discuss several such distortive effects that are difficult or impossible to remove by classical calibration routines or parameter training. We then argue that hierarchical networks of leaky integrate-and-fire neurons can offer the required robustness for physical implementation and demonstrate this with both software simulations and emulation on an accelerated analog neuromorphic device. |
Tasks | Calibration |
Published | 2017-03-12 |
URL | http://arxiv.org/abs/1703.04145v1 |
http://arxiv.org/pdf/1703.04145v1.pdf | |
PWC | https://paperswithcode.com/paper/robustness-from-structure-inference-with |
Repo | |
Framework | |
CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction
Title | CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction |
Authors | Keisuke Tateno, Federico Tombari, Iro Laina, Nassir Navab |
Abstract | Given the recent advances in depth prediction from Convolutional Neural Networks (CNNs), this paper investigates how predicted depth maps from a deep neural network can be deployed for accurate and dense monocular reconstruction. We propose a method where CNN-predicted dense depth maps are naturally fused together with depth measurements obtained from direct monocular SLAM. Our fusion scheme privileges depth prediction in image locations where monocular SLAM approaches tend to fail, e.g. along low-textured regions, and vice-versa. We demonstrate the use of depth prediction for estimating the absolute scale of the reconstruction, hence overcoming one of the major limitations of monocular SLAM. Finally, we propose a framework to efficiently fuse semantic labels, obtained from a single frame, with dense SLAM, yielding semantically coherent scene reconstruction from a single view. Evaluation results on two benchmark datasets show the robustness and accuracy of our approach. |
Tasks | Depth Estimation |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03489v1 |
http://arxiv.org/pdf/1704.03489v1.pdf | |
PWC | https://paperswithcode.com/paper/cnn-slam-real-time-dense-monocular-slam-with |
Repo | |
Framework | |
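The fusion scheme described in the CNN-SLAM abstract, privileging SLAM depth where monocular methods work and CNN depth where they fail (low-textured regions), can be sketched as a per-pixel selection rule. This is a simplified stand-in, assuming image gradient magnitude as the texture proxy and a hypothetical threshold `tau`; the paper's actual scheme refines CNN depth with uncertainty-weighted small-baseline measurements.

```python
def fuse_depth(cnn_depth, slam_depth, gradient_mag, tau=0.1):
    """Per-pixel depth fusion sketch: where the image gradient is strong,
    direct monocular SLAM is reliable, so its depth is kept; in
    low-textured regions (or where SLAM has no estimate, marked None)
    the CNN prediction fills in. Inputs are flat lists of equal length;
    tau is an assumed texture threshold, not a value from the paper.
    """
    fused = []
    for d_cnn, d_slam, g in zip(cnn_depth, slam_depth, gradient_mag):
        fused.append(d_slam if (d_slam is not None and g > tau) else d_cnn)
    return fused
```

Because the CNN predicts metrically scaled depth, filling the untextured pixels this way is also what anchors the absolute scale of the reconstruction, the limitation of monocular SLAM that the abstract highlights.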
Drone Squadron Optimization: a Self-adaptive Algorithm for Global Numerical Optimization
Title | Drone Squadron Optimization: a Self-adaptive Algorithm for Global Numerical Optimization |
Authors | Vinícius Veloso de Melo, Wolfgang Banzhaf |
Abstract | This paper proposes Drone Squadron Optimization (DSO), a new self-adaptive metaheuristic for global numerical optimization which is updated online by a hyper-heuristic. DSO is an artifact-inspired technique, as opposed to many algorithms used nowadays, which are nature-inspired. DSO is very flexible because it is not tied to behaviors or natural phenomena. DSO has two core parts: the semi-autonomous drones that fly over a landscape to explore, and the Command Center that processes the retrieved data and updates the drones’ firmware whenever necessary. The self-adaptive aspect of DSO in this work is the perturbation/movement scheme, which is the procedure used to generate target coordinates. This procedure is evolved by the Command Center during the global optimization process in order to adapt DSO to the search landscape. DSO was evaluated on a set of widely employed benchmark functions. The statistical analysis of the results shows that the proposed method is competitive with the other methods in the comparison; its performance is promising, but several future improvements are planned. |
Tasks | |
Published | 2017-03-14 |
URL | http://arxiv.org/abs/1703.04561v1 |
http://arxiv.org/pdf/1703.04561v1.pdf | |
PWC | https://paperswithcode.com/paper/drone-squadron-optimization-a-self-adaptive |
Repo | |
Framework | |
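The two-part structure described in the abstract, drone teams generating candidate coordinates and a Command Center updating their "firmware", can be caricatured in a few lines. This is a heavily simplified, hypothetical rendition: here each team's firmware is just a Gaussian step size, and the Command Center copies the best team's scheme over the worst's; the real algorithm evolves the perturbation procedures themselves with a hyper-heuristic.

```python
import random

def dso_sketch(f, dim=2, teams=4, iters=500, seed=1):
    """Toy minimizer in the spirit of DSO (not the published algorithm).

    Each team proposes a candidate near the current best using its own
    perturbation scheme (a Gaussian step size, standing in for firmware).
    After every round, the Command Center replaces the worst-performing
    team's scheme with the best-performing team's.
    """
    rng = random.Random(seed)
    schemes = [0.1 * (t + 1) for t in range(teams)]  # per-team step sizes
    best = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    best_val = f(best)
    for _ in range(iters):
        scores = []
        for t in range(teams):
            cand = [x + rng.gauss(0.0, schemes[t]) for x in best]
            val = f(cand)
            scores.append(val)
            if val < best_val:
                best, best_val = cand, val
        # Command Center: propagate the most successful scheme.
        worst = max(range(teams), key=scores.__getitem__)
        leader = min(range(teams), key=scores.__getitem__)
        schemes[worst] = schemes[leader]
    return best, best_val
```

Running it on the sphere function `f(x) = sum(v*v for v in x)` drives the value toward zero, which is all this sketch is meant to demonstrate.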
Unsupervised Multi-Domain Image Translation with Domain-Specific Encoders/Decoders
Title | Unsupervised Multi-Domain Image Translation with Domain-Specific Encoders/Decoders |
Authors | Le Hui, Xiang Li, Jiaxin Chen, Hongliang He, Chen gong, Jian Yang |
Abstract | Unsupervised image-to-image translation has seen spectacular advances recently. However, recent approaches mainly focus on one model with two domains, which incurs a heavy burden of $O(n^2)$ cost in training time and model parameters when $n$ domains must be freely translated to each other in a general setting. To address this problem, we propose a novel and unified framework named Domain-Bank, which consists of a global shared auto-encoder and $n$ domain-specific encoders/decoders, under the assumption that all domains can be projected into a universal shared-latent space. This yields $O(n)$ complexity in model parameters along with a huge reduction in the time budget. Besides the high efficiency, we show comparable (or even better) image translation results over the state of the art on various challenging unsupervised image translation tasks, including face image translation, fashion-clothes translation and painting style translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on digit benchmark datasets. Further, thanks to the explicit representation of the domain-specific decoders as well as the universal shared-latent space, the framework also enables incremental learning to add a new domain encoder/decoder. Linear combinations of different domains’ representations can also be obtained by fusing the corresponding decoders. |
Tasks | Domain Adaptation, Image-to-Image Translation, Unsupervised Image-To-Image Translation |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02050v1 |
http://arxiv.org/pdf/1712.02050v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-multi-domain-image-translation |
Repo | |
Framework | |
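The structural point of Domain-Bank, two small modules per domain routed through one shared latent space, so n domains give all n^2 translation directions at O(n) cost, can be sketched with toy encoders and decoders. The affine maps below are placeholders (the real modules are CNNs); only the routing is the point.

```python
def build_domain_bank(domains):
    """Sketch of the shared-latent translation path: a per-domain encoder
    maps into a universal latent space and a per-domain decoder maps back,
    so any source/target pair is served by composing two of the 2n modules.
    The toy encoders/decoders here are offsets, purely for illustration.
    """
    encoders = {d: (lambda x, k=i: [v + float(k) for v in x])
                for i, d in enumerate(domains)}
    decoders = {d: (lambda z, k=i: [v - float(k) for v in z])
                for i, d in enumerate(domains)}

    def translate(x, src, dst):
        z = encoders[src](x)      # project into the shared latent space
        return decoders[dst](z)   # decode in the target domain
    return translate

translate = build_domain_bank(["photo", "sketch", "paint"])
```

Adding a new domain means adding one encoder/decoder pair, which is the incremental-learning property the abstract points out; a pairwise design would instead require 2n new translation models.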
Truncated Variational EM for Semi-Supervised Neural Simpletrons
Title | Truncated Variational EM for Semi-Supervised Neural Simpletrons |
Authors | Dennis Forster, Jörg Lücke |
Abstract | Inference and learning for probabilistic generative networks are often very challenging, which typically prevents scaling to networks as large as those used in deep discriminative approaches. To obtain efficiently trainable, large-scale and well performing generative networks for semi-supervised learning, we here combine two recent developments: a neural network reformulation of hierarchical Poisson mixtures (Neural Simpletrons), and a novel truncated variational EM approach (TV-EM). TV-EM provides theoretical guarantees for learning in generative networks, and its application to Neural Simpletrons results in particularly compact, yet approximately optimal, modifications of the learning equations. Applied to standard benchmarks, we empirically find that learning converges in fewer EM iterations, that the complexity per EM iteration is reduced, and that final likelihood values are higher on average. For the task of classification on data sets with few labels, these learning improvements result in consistently lower error rates compared to applications without truncation. Experiments on the MNIST data set herein allow for comparison to standard and state-of-the-art models in the semi-supervised setting. Further experiments on the NIST SD19 data set show the scalability of the approach when a wealth of additional unlabeled data is available. |
Tasks | |
Published | 2017-02-07 |
URL | http://arxiv.org/abs/1702.01997v1 |
http://arxiv.org/pdf/1702.01997v1.pdf | |
PWC | https://paperswithcode.com/paper/truncated-variational-em-for-semi-supervised |
Repo | |
Framework | |
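The core truncation idea behind TV-EM, restricting the variational posterior to a small set of high-probability hidden states, can be illustrated with a truncated E-step. This is a generic sketch of truncated posteriors, assuming log joint values are given per hidden state; the paper's specific selection and update rules for Neural Simpletrons are not reproduced here.

```python
import math

def truncated_posterior(log_joints, K):
    """Truncated variational E-step sketch: instead of the full posterior
    over hidden states, keep only the K states with the largest joint
    log-probability and renormalize over that set. `log_joints` maps each
    hidden state to log p(y, s); ties are broken by sort order.
    """
    top = sorted(log_joints, key=log_joints.get, reverse=True)[:K]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(log_joints[s] for s in top)
    weights = {s: math.exp(log_joints[s] - m) for s in top}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}
```

Reducing the sum over hidden states to K terms is what cuts the complexity per EM iteration that the abstract reports.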
Anticipating many futures: Online human motion prediction and synthesis for human-robot collaboration
Title | Anticipating many futures: Online human motion prediction and synthesis for human-robot collaboration |
Authors | Judith Bütepage, Hedvig Kjellström, Danica Kragic |
Abstract | Fluent and safe interactions of humans and robots require both partners to anticipate each other’s actions. A common approach to human intention inference is to model specific trajectories towards known goals with supervised classifiers. However, these approaches do not take possible future movements into account, nor do they make use of kinematic cues, such as legible and predictable motion. The bottleneck of these methods is the lack of an accurate model of general human motion. In this work, we present a conditional variational autoencoder that is trained to predict a window of future human motion given a window of past frames. Using skeletal data obtained from RGB-D images, we show how this unsupervised approach can be used for online motion prediction for up to 1660 ms. Additionally, we demonstrate online target prediction within the first 300-500 ms after motion onset without the use of target-specific training data. The advantage of our probabilistic approach is the possibility to draw samples of possible future motions. Finally, we investigate how movements and kinematic cues are represented on the learned low-dimensional manifold. |
Tasks | motion prediction |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08212v1 |
http://arxiv.org/pdf/1702.08212v1.pdf | |
PWC | https://paperswithcode.com/paper/anticipating-many-futures-online-human-motion |
Repo | |
Framework | |
Measurement-Adaptive Sparse Image Sampling and Recovery
Title | Measurement-Adaptive Sparse Image Sampling and Recovery |
Authors | Ali Taimori, Farokh Marvasti |
Abstract | This paper presents an adaptive and intelligent sparse model for digital image sampling and recovery. In the proposed sampler, we adaptively determine the number of samples required to retrieve an image based on the space-frequency-gradient information content of its patches. By leveraging texture in space, sparsity locations in the DCT domain, and directional decomposition of gradients, the sampler combines uniform, random, and nonuniform sampling strategies. For reconstruction, we model the recovery problem as a two-state cellular automaton that iteratively restores the image with scalable windows from generation to generation. We demonstrate that the recovery algorithm converges quickly, after a few generations, for images with an arbitrary degree of texture. For a given number of measurements, extensive experiments on standard image sets and on infra-red and mega-pixel range imaging devices show that the proposed measurement matrix considerably increases the overall recovery performance, or equivalently decreases the number of sampled pixels for a specific recovery quality, compared to the random sampling matrices and Gaussian linear combinations employed by state-of-the-art compressive sensing methods. In practice, the proposed measurement-adaptive sampling/recovery framework has various applications, from intelligent compressive-imaging acquisition devices to computer vision, graphics, and image processing. Simulation codes are available online for reproducibility. |
Tasks | Compressive Sensing |
Published | 2017-06-09 |
URL | http://arxiv.org/abs/1706.03129v2 |
http://arxiv.org/pdf/1706.03129v2.pdf | |
PWC | https://paperswithcode.com/paper/measurement-adaptive-sparse-image-sampling |
Repo | |
Framework | |
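The adaptive allocation step, assigning more samples to patches with more information content, can be sketched with a crude texture measure. This is a hypothetical stand-in: mean absolute horizontal gradient replaces the paper's full space-frequency-gradient analysis, and the rate bounds are assumed values, not the paper's.

```python
def samples_per_patch(patch, min_rate=0.1, max_rate=0.9):
    """Allocate a sampling budget to one grayscale patch (list of rows of
    0-255 values) from its mean absolute horizontal gradient: flat patches
    get few samples, textured patches get many. The gradient proxy and the
    min/max rates are illustrative assumptions only.
    """
    h, w = len(patch), len(patch[0])
    grad = sum(abs(patch[r][c + 1] - patch[r][c])
               for r in range(h) for c in range(w - 1))
    grad /= h * (w - 1)
    rate = min(max_rate, max(min_rate, grad / 255.0))
    return int(round(rate * h * w))
```

Summed over all patches, such a rule concentrates the fixed measurement budget where recovery is hardest, which is the mechanism behind the reported gains over non-adaptive random sampling.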
On Multi-Relational Link Prediction with Bilinear Models
Title | On Multi-Relational Link Prediction with Bilinear Models |
Authors | Yanjie Wang, Rainer Gemulla, Hui Li |
Abstract | We study bilinear embedding models for the task of multi-relational link prediction and knowledge graph completion. Bilinear models are among the most basic models for this task: they are comparatively efficient to train and use, and they can provide good prediction performance. The main goal of this paper is to explore the expressiveness of and the connections between various bilinear models proposed in the literature. In particular, a substantial number of models can be represented as bilinear models with certain additional constraints enforced on the embeddings. We explore whether or not these constraints lead to universal models, which can in principle represent every set of relations, and whether or not there are subsumption relationships between various models. We report results of an independent experimental study that evaluates recent bilinear models in a common experimental setup. Finally, we provide evidence that relation-level ensembles of multiple bilinear models can achieve state-of-the-art prediction performance. |
Tasks | Knowledge Graph Completion, Link Prediction |
Published | 2017-09-14 |
URL | http://arxiv.org/abs/1709.04808v1 |
http://arxiv.org/pdf/1709.04808v1.pdf | |
PWC | https://paperswithcode.com/paper/on-multi-relational-link-prediction-with |
Repo | |
Framework | |
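The model family the paper studies scores a triple (subject, relation, object) bilinearly: f(s, r, o) = e_s^T R_r e_o, where e_s and e_o are entity embeddings and R_r is a relation-specific matrix. Constrained special cases (e.g. a diagonal R_r, as in DistMult) are the "additional constraints" the abstract refers to. A minimal scoring function:

```python
def bilinear_score(e_s, R, e_o):
    """Generic bilinear score f(s, r, o) = e_s^T R_r e_o.

    e_s, e_o: entity embedding vectors; R: the relation's mixing matrix.
    With R unconstrained this is the RESCAL-style model; constraining R
    to be diagonal recovers DistMult as a special case.
    """
    return sum(e_s[i] * R[i][j] * e_o[j]
               for i in range(len(e_s)) for j in range(len(e_o)))
```

For link prediction, candidate objects are ranked by this score for a given subject and relation; the constraint pattern imposed on `R` is what distinguishes the models compared in the paper.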
Clickbait Identification using Neural Networks
Title | Clickbait Identification using Neural Networks |
Authors | Philippe Thomas |
Abstract | This paper presents the results of our participation in the Clickbait Detection Challenge 2017. The system relies on a fusion of neural networks, incorporating different types of available information. It does not require any linguistic preprocessing, and hence generalizes more easily to new domains and languages. The final combined model achieves a mean squared error of 0.0428, an accuracy of 0.826, and an F1 score of 0.564. According to the official evaluation metric, the system ranked 6th out of the 13 participating teams. |
Tasks | Clickbait Detection |
Published | 2017-10-24 |
URL | http://arxiv.org/abs/1710.08721v1 |
http://arxiv.org/pdf/1710.08721v1.pdf | |
PWC | https://paperswithcode.com/paper/clickbait-identification-using-neural |
Repo | |
Framework | |
Link the head to the “beak”: Zero Shot Learning from Noisy Text Description at Part Precision
Title | Link the head to the “beak”: Zero Shot Learning from Noisy Text Description at Part Precision |
Authors | Mohamed Elhoseiny, Yizhe Zhu, Han Zhang, Ahmed Elgammal |
Abstract | In this paper, we study learning visual classifiers from unstructured text descriptions at part precision with no training images. We propose a learning framework that is able to connect text terms to their relevant parts and suppress connections to non-visual text terms without any part-text annotations. For instance, this learning process enables terms like “beak” to be sparsely linked to the visual representation of parts like the head, while reducing the effect of non-visual terms like “migrate” on classifier prediction. Images are encoded by a part-based CNN that detects bird parts and learns part-specific representations. Part-based visual classifiers are predicted from text descriptions of unseen visual classes to facilitate classification without training images (also known as zero-shot recognition). We performed our experiments on the CUBirds 2011 dataset and improve the state-of-the-art text-based zero-shot recognition results from 34.7% to 43.6%. We also created large-scale benchmarks on North American bird images augmented with text descriptions, where we also show that our approach outperforms existing methods. Our code, data, and models are publicly available. |
Tasks | Zero-Shot Learning |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.01148v1 |
http://arxiv.org/pdf/1709.01148v1.pdf | |
PWC | https://paperswithcode.com/paper/link-the-head-to-the-beak-zero-shot-learning |
Repo | |
Framework | |
Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low
Title | Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low |
Authors | Rahul Mazumder, Peter Radchenko, Antoine Dedieu |
Abstract | We study the behavior of a fundamental tool in sparse statistical modeling – the best-subset selection procedure (aka “best-subsets”). Assuming that the underlying linear model is sparse, it is well known, both in theory and in practice, that the best-subsets procedure works extremely well in terms of several statistical metrics (prediction, estimation and variable selection) when the signal-to-noise ratio (SNR) is high. However, its performance degrades substantially when the SNR is low – it is outperformed in predictive accuracy by continuous shrinkage methods, such as ridge regression and the Lasso. We explain why this behavior should not come as a surprise, and contend that the original version of the classical best-subsets procedure was, perhaps, not designed to be used in the low SNR regimes. We propose a close cousin of best-subsets, namely, its $\ell_{q}$-regularized version, for $q \in \{1, 2\}$, which (a) mitigates, to a large extent, the poor predictive performance of best-subsets in the low SNR regimes; and (b) performs favorably and generally delivers a substantially sparser model when compared to the best predictive models available via ridge regression and the Lasso. Our estimator can be expressed as a solution to a mixed integer second order conic optimization problem and, hence, is amenable to modern computational tools from mathematical optimization. We explore the theoretical properties of the predictive capabilities of the proposed estimator and complement our findings via several numerical experiments. |
Tasks | |
Published | 2017-08-10 |
URL | http://arxiv.org/abs/1708.03288v1 |
http://arxiv.org/pdf/1708.03288v1.pdf | |
PWC | https://paperswithcode.com/paper/subset-selection-with-shrinkage-sparse-linear |
Repo | |
Framework | |
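For tiny problems, the $q=2$ variant of the regularized best-subsets estimator can be written down by brute force: over every support of size at most k, solve the ridge problem restricted to that support and keep the best regularized loss. This is an illustrative sketch under an assumed objective (residual sum of squares plus an l2 penalty, subject to the cardinality constraint); the paper solves the problem at scale as a mixed-integer second-order cone program, not by enumeration.

```python
import itertools

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense
    system A x = b (sufficient for the tiny supports enumerated below)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def best_subset_ridge(X, y, k, lam):
    """Exhaustive sketch of l2-regularized best-subsets: for each support S
    with |S| <= k, solve ridge restricted to S via the normal equations
    (X_S^T X_S + lam I) beta = X_S^T y, and return the support and
    coefficients with the lowest regularized loss. Viable only for tiny p.
    """
    n, p = len(X), len(X[0])
    best = (float("inf"), None, None)
    for size in range(1, k + 1):
        for S in itertools.combinations(range(p), size):
            XS = [[row[j] for j in S] for row in X]
            G = [[sum(XS[i][a] * XS[i][b] for i in range(n))
                  + (lam if a == b else 0.0)
                  for b in range(size)] for a in range(size)]
            r = [sum(XS[i][a] * y[i] for i in range(n)) for a in range(size)]
            beta = solve(G, r)
            resid = sum((y[i] - sum(XS[i][a] * beta[a] for a in range(size))) ** 2
                        for i in range(n))
            loss = resid + lam * sum(b * b for b in beta)
            if loss < best[0]:
                best = (loss, S, beta)
    return best[1], best[2]
```

The ridge term is what shrinks coefficients on the selected support, which is exactly the low-SNR remedy the abstract describes relative to plain (unshrunk) best-subsets.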