October 17, 2019

Paper Group ANR 744

SAM-RCNN: Scale-Aware Multi-Resolution Multi-Channel Pedestrian Detection

Title SAM-RCNN: Scale-Aware Multi-Resolution Multi-Channel Pedestrian Detection
Authors Tianrui Liu, Mohamed Elmikaty, Tania Stathaki
Abstract Convolutional neural networks (CNN) have enabled significant improvements in pedestrian detection owing to the strong representation ability of the CNN features. Recently, aggregating features from multiple layers of a CNN has been considered an effective approach; however, the same approach to feature representation is used for detecting pedestrians of varying scales. Consequently, it is not guaranteed that the feature representation for pedestrians of a particular scale is optimised. In this paper, we propose a Scale-Aware Multi-resolution (SAM) method for pedestrian detection which can adaptively select multi-resolution convolutional features according to pedestrian sizes. The proposed SAM method extracts the appropriate CNN features that have strong representation ability as well as sufficient feature resolution, given the size of the pedestrian candidate output from a region proposal network. Moreover, we propose an enhanced SAM method, termed SAM+, which incorporates complementary feature channels and achieves further performance improvement. Evaluations on the challenging Caltech and KITTI pedestrian benchmarks demonstrate the superiority of our proposed method.
Tasks Pedestrian Detection
Published 2018-08-07
URL http://arxiv.org/abs/1808.02246v1
PDF http://arxiv.org/pdf/1808.02246v1.pdf
PWC https://paperswithcode.com/paper/sam-rcnn-scale-aware-multi-resolution-multi
Repo
Framework

Data Fine-tuning

Title Data Fine-tuning
Authors Saheb Chhabra, Puspita Majumdar, Mayank Vatsa, Richa Singh
Abstract In real-world applications, commercial off-the-shelf systems are utilized for performing automated facial analysis including face recognition, emotion recognition, and attribute prediction. However, a majority of these commercial systems act as black boxes due to the inaccessibility of the model parameters, which makes it challenging to fine-tune the models for specific applications. Stimulated by the advances in adversarial perturbations, this research proposes the concept of Data Fine-tuning to improve the classification accuracy of a given model without changing the parameters of the model. This is accomplished by modeling it as a data (image) perturbation problem. A small amount of “noise” is added to the input with the objective of minimizing the classification loss without affecting the (visual) appearance. Experiments performed on three publicly available datasets, LFW, CelebA, and MUCT, demonstrate the effectiveness of the proposed concept.
Tasks Emotion Recognition, Face Recognition
Published 2018-12-10
URL http://arxiv.org/abs/1812.03944v1
PDF http://arxiv.org/pdf/1812.03944v1.pdf
PWC https://paperswithcode.com/paper/data-fine-tuning
Repo
Framework
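
The idea in the abstract above (keep the model frozen and learn a small input perturbation that lowers the classification loss) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the frozen model is a toy logistic regression with known weights `w` and bias `b`, a single perturbation `delta` is shared by all inputs, and the bound `eps` stands in for the "does not affect appearance" constraint.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(X, y, w, b):
    """Mean binary cross-entropy of the frozen model (w, b) on inputs X."""
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def learn_perturbation(X, y, w, b, steps=200, lr=0.5, eps=0.1):
    """Learn one additive perturbation for all inputs; w and b are never updated."""
    delta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid((X + delta) @ w + b)
        # Gradient of the mean cross-entropy with respect to delta.
        grad = np.mean(p - y) * w
        delta -= lr * grad
        # Keep the perturbation small (hypothetical stand-in for imperceptibility).
        delta = np.clip(delta, -eps, eps)
    return delta
```

Calling `learn_perturbation(X, y, w, b)` and then evaluating `bce(X + delta, y, w, b)` compares the loss on perturbed inputs with the loss on the original ones. A real black-box system would not expose gradients, so this only shows the optimization objective, not how to obtain gradients in that setting.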

Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep learning

Title Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep learning
Authors Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang
Abstract Although stochastic gradient descent (SGD) is a driving force behind the recent success of deep learning, our understanding of its dynamics in a high-dimensional parameter space is limited. In recent years, some researchers have used the stochasticity of minibatch gradients, or the signal-to-noise ratio, to better characterize the learning dynamics of SGD. Inspired by this work, we analyze SGD from a geometrical perspective by inspecting the stochasticity of the norms and directions of minibatch gradients. We propose a model of the directional concentration for minibatch gradients through the von Mises-Fisher (VMF) distribution, and show that the directional uniformity of minibatch gradients increases over the course of SGD. We empirically verify our result using deep convolutional networks and observe a higher correlation between the gradient stochasticity and the proposed directional uniformity than that against the gradient norm stochasticity, suggesting that the directional statistics of minibatch gradients is a major factor behind SGD.
Tasks
Published 2018-09-29
URL http://arxiv.org/abs/1810.00150v2
PDF http://arxiv.org/pdf/1810.00150v2.pdf
PWC https://paperswithcode.com/paper/directional-analysis-of-stochastic-gradient
Repo
Framework
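
To make "directional concentration" of minibatch gradients concrete, here is a small sketch that normalizes a set of gradient vectors and summarizes how tightly their directions cluster. It uses the mean resultant length and the closed-form concentration approximation of Banerjee et al. (2005) for the von Mises-Fisher distribution; this is an assumed, commonly used estimator and may differ from the one used in the paper. The function names are hypothetical.

```python
import numpy as np

def mean_resultant_length(grads):
    """Norm of the average unit direction: near 1 if aligned, near 0 if uniform."""
    units = grads / np.linalg.norm(grads, axis=1, keepdims=True)
    return float(np.linalg.norm(units.mean(axis=0)))

def vmf_concentration(grads):
    """Approximate vMF concentration: r * (d - r^2) / (1 - r^2)."""
    d = grads.shape[1]
    r = mean_resultant_length(grads)
    return r * (d - r ** 2) / (1.0 - r ** 2)
```

Each row of `grads` is one minibatch gradient flattened to a vector. A lower concentration value corresponds to directions that are closer to uniform on the sphere, which is the quantity the abstract says grows over the course of SGD.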

Spatio-Temporal Channel Correlation Networks for Action Classification

Title Spatio-Temporal Channel Correlation Networks for Action Classification
Authors Ali Diba, Mohsen Fayyaz, Vivek Sharma, M. Mahdi Arzani, Rahman Yousefzadeh, Juergen Gall, Luc Van Gool
Abstract The work in this paper is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). Most of the traditional 3D networks use local spatio-temporal features. We introduce a new block that models correlations between channels of a 3D CNN with respect to temporal and spatial features. This new block can be added as a residual unit to different parts of 3D CNNs. We name our novel block ‘Spatio-Temporal Channel Correlation’ (STC). By embedding this block into current state-of-the-art architectures such as ResNext and ResNet, we improved the performance by 2-3% on the Kinetics dataset. Our experiments show that adding STC blocks to current state-of-the-art architectures outperforms the state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. Another issue is that 3D CNNs are trained from scratch on a huge labeled dataset to reach reasonable performance, so the knowledge learned in 2D CNNs is completely ignored. Another contribution of this work is a simple and effective technique to transfer knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for a stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. Thus, by fine-tuning this network, we beat the performance of generic and recent methods in 3D CNNs, which were trained on large video datasets, e.g. Sports-1M, and fine-tuned on the target datasets, e.g. HMDB51/UCF101.
Tasks Action Classification
Published 2018-06-19
URL http://arxiv.org/abs/1806.07754v3
PDF http://arxiv.org/pdf/1806.07754v3.pdf
PWC https://paperswithcode.com/paper/spatio-temporal-channel-correlation-networks
Repo
Framework

Resource-Constrained Simultaneous Detection and Labeling of Objects in High-Resolution Satellite Images

Title Resource-Constrained Simultaneous Detection and Labeling of Objects in High-Resolution Satellite Images
Authors Gilbert Rotich, Rodrigo Minetto, Sudeep Sarkar
Abstract We describe a strategy for detection and classification of man-made objects in large high-resolution satellite photos under computational resource constraints. We detect and classify candidate objects by using five pipelines of convolutional neural network (CNN) processing, run in parallel. Each pipeline has its own unique strategy for fine-tuning parameters, proposal region filtering, and dealing with image scales. The conflicting region proposals are merged based on region confidence and not just based on overlap areas, which improves the quality of the final bounding-box regions selected. We demonstrate this strategy using the recent xView challenge, which is a complex benchmark with more than 1,100 high-resolution images, spanning 800,000 aerial objects around the world covering a total area of 1,400 square kilometers at 0.3 meter ground sample distance. To tackle the resource-constrained problem posed by the xView challenge, where inference is restricted to a CPU with an 8GB memory limit, we used lightweight CNNs trained with the single shot detector algorithm. Our approach was competitive on sequestered sets; it was ranked third.
Tasks
Published 2018-10-23
URL http://arxiv.org/abs/1810.10110v1
PDF http://arxiv.org/pdf/1810.10110v1.pdf
PWC https://paperswithcode.com/paper/resource-constrained-simultaneous-detection
Repo
Framework

Tiling and Stitching Segmentation Output for Remote Sensing: Basic Challenges and Recommendations

Title Tiling and Stitching Segmentation Output for Remote Sensing: Basic Challenges and Recommendations
Authors Bohao Huang, Daniel Reichman, Leslie M. Collins, Kyle Bradbury, Jordan M. Malof
Abstract In this work we consider the application of convolutional neural networks (CNNs) for pixel-wise labeling (a.k.a., semantic segmentation) of remote sensing imagery (e.g., aerial color or hyperspectral imagery). Remote sensing imagery is usually stored in the form of very large images, referred to as “tiles”, which are too large to be segmented directly using most CNNs and their associated hardware. As a result, during label inference, smaller sub-images, called “patches”, are processed individually and then “stitched” (concatenated) back together to create a tile-sized label map. This approach suffers from computational inefficiency and can result in discontinuities at output boundaries. We propose a simple alternative approach in which the input size of the CNN is dramatically increased only during label inference. This does not avoid stitching altogether, but substantially mitigates its limitations. We evaluate the performance of the proposed approach against a conventional stitching approach using two popular segmentation CNN models and two large-scale remote sensing imagery datasets. The results suggest that the proposed approach substantially reduces label inference time, while also yielding modest overall label accuracy increases. This approach contributed to our winning entry (overall performance) in the INRIA building labeling competition.
Tasks Segmentation Of Remote Sensing Imagery, Semantic Segmentation
Published 2018-05-30
URL http://arxiv.org/abs/1805.12219v3
PDF http://arxiv.org/pdf/1805.12219v3.pdf
PWC https://paperswithcode.com/paper/tiling-and-stitching-segmentation-output-for
Repo
Framework
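
The patch-and-stitch procedure described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: `threshold_model` is a hypothetical per-pixel stand-in for a segmentation CNN, patches do not overlap, and the function returns the number of forward passes so the effect of a larger inference-time input size is visible.

```python
import numpy as np

def threshold_model(block):
    """Hypothetical stand-in for a segmentation CNN: label pixels above 0.5."""
    return (block > 0.5).astype(np.int64)

def stitch_predict(tile, predict, patch):
    """Label a large tile patch by patch and stitch the outputs together."""
    h, w = tile.shape[:2]
    out = np.zeros((h, w), dtype=np.int64)
    calls = 0
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            # Slices at the tile border are simply smaller than `patch`.
            block = tile[r:r + patch, c:c + patch]
            out[r:r + patch, c:c + patch] = predict(block)
            calls += 1
    return out, calls
```

With this pixel-wise stand-in the stitched label map is identical for any patch size, and only the number of calls changes. A real CNN has a receptive field that is cut off at patch borders, which is the source of the boundary discontinuities the abstract mentions and is not modeled here.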

fMRI: preprocessing, classification and pattern recognition

Title fMRI: preprocessing, classification and pattern recognition
Authors Maxim Sharaev, Alexander Andreev, Alexey Artemov, Alexander Bernstein, Evgeny Burnaev, Ekaterina Kondratyeva, Svetlana Sushchinskaya, Renat Akzhigitov
Abstract As machine learning continues to gain momentum in the neuroscience community, we witness the emergence of novel applications such as diagnostics, characterization, and treatment outcome prediction for psychiatric and neurological disorders, for instance, epilepsy and depression. Systematic research into these mental disorders increasingly involves drawing clinical conclusions on the basis of data-driven approaches; to this end, structural and functional neuroimaging serve as key source modalities. Identification of informative neuroimaging markers requires establishing a comprehensive preparation pipeline for data which may be severely corrupted by artifactual signal fluctuations. In this work, we review a large body of literature to provide ample evidence for the advantages of pattern recognition approaches in clinical applications, overview advanced graph-based pattern recognition approaches, and propose a noise-aware neuroimaging data processing pipeline. To demonstrate the effectiveness of our approach, we provide results from a pilot study, which show a significant improvement in classification accuracy, indicating a promising research direction.
Tasks
Published 2018-04-26
URL http://arxiv.org/abs/1804.10167v1
PDF http://arxiv.org/pdf/1804.10167v1.pdf
PWC https://paperswithcode.com/paper/fmri-preprocessing-classification-and-pattern
Repo
Framework

On the Acceleration of L-BFGS with Second-Order Information and Stochastic Batches

Title On the Acceleration of L-BFGS with Second-Order Information and Stochastic Batches
Authors Jie Liu, Yu Rong, Martin Takac, Junzhou Huang
Abstract This paper proposes a framework of L-BFGS based on the (approximate) second-order information with stochastic batches, as a novel approach to the finite-sum minimization problems. Unlike classical L-BFGS, where stochastic batches lead to instability, we use a smooth estimate for the evaluations of the gradient differences while achieving acceleration by well-scaling the initial Hessians. We provide theoretical analyses for both convex and nonconvex cases. In addition, we demonstrate that within the popular applications of least-squares and cross-entropy losses, the algorithm admits a simple implementation in the distributed environment. Numerical experiments support the efficiency of our algorithms.
Tasks
Published 2018-07-14
URL http://arxiv.org/abs/1807.05328v1
PDF http://arxiv.org/pdf/1807.05328v1.pdf
PWC https://paperswithcode.com/paper/on-the-acceleration-of-l-bfgs-with-second
Repo
Framework

Generalizing Procrustes Analysis for Better Bilingual Dictionary Induction

Title Generalizing Procrustes Analysis for Better Bilingual Dictionary Induction
Authors Yova Kementchedjhieva, Sebastian Ruder, Ryan Cotterell, Anders Søgaard
Abstract Most recent approaches to bilingual dictionary induction find a linear alignment between the word vector spaces of two languages. We show that projecting the two languages onto a third, latent space, rather than directly onto each other, while equivalent in terms of expressivity, makes it easier to learn approximate alignments. Our modified approach also allows for supporting languages to be included in the alignment process, to obtain an even better performance in low resource settings.
Tasks
Published 2018-08-31
URL https://arxiv.org/abs/1809.00064v3
PDF https://arxiv.org/pdf/1809.00064v3.pdf
PWC https://paperswithcode.com/paper/generalizing-procrustes-analysis-for-better
Repo
Framework
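
For context, the classical baseline that the abstract above generalizes is orthogonal Procrustes analysis: given paired word vectors from two languages, find the orthogonal map that best aligns one space to the other. The sketch below shows only that baseline via the standard SVD solution; it does not implement the paper's latent-space variant, and the function name and matrix layout (one word vector per row) are assumptions made for illustration.

```python
import numpy as np

def orthogonal_procrustes_map(X, Y):
    """Return the orthogonal W minimizing ||X @ W - Y||_F for row-paired X, Y."""
    # SVD of the cross-covariance between the two embedding spaces.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```

Here `X` holds source-language vectors and `Y` the vectors of their dictionary translations, row for row. Once `W` is fitted on a seed dictionary, `X_new @ W` maps unseen source words into the target space, where nearest neighbors serve as translation candidates.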

Modeling Naive Psychology of Characters in Simple Commonsense Stories

Title Modeling Naive Psychology of Characters in Simple Commonsense Stories
Authors Hannah Rashkin, Antoine Bosselut, Maarten Sap, Kevin Knight, Yejin Choi
Abstract Understanding a narrative requires reading between the lines and reasoning about the unspoken but obvious implications about events and people’s mental states - a capability that is trivial for humans but remarkably hard for machines. To facilitate research addressing this challenge, we introduce a new annotation framework to explain naive psychology of story characters as fully-specified chains of mental states with respect to motivations and emotional reactions. Our work presents a new large-scale dataset with rich low-level annotations and establishes baseline performance on several new tasks, suggesting avenues for future research.
Tasks
Published 2018-05-16
URL http://arxiv.org/abs/1805.06533v1
PDF http://arxiv.org/pdf/1805.06533v1.pdf
PWC https://paperswithcode.com/paper/modeling-naive-psychology-of-characters-in
Repo
Framework

Cognition in Dynamical Systems, Second Edition

Title Cognition in Dynamical Systems, Second Edition
Authors Jack Hall
Abstract Cognition is the process of knowing. As carried out by a dynamical system, it is the process by which the system absorbs information into its state. A complex network of agents cognizes knowledge about its environment, internal dynamics and initial state by forming emergent, macro-level patterns. Such patterns require each agent to find its place while partially aware of the whole pattern. Such partial awareness can be achieved by separating the system dynamics into two parts by timescale: the propagation dynamics and the pattern dynamics. The fast propagation dynamics describe the spread of signals across the network. If they converge to a fixed point for any quasi-static state of the slow pattern dynamics, that fixed point represents an aggregate of macro-level information. On longer timescales, agents coordinate via positive feedback to form patterns, which are defined using closed walks in the graph of agents. Patterns can be coherent, in that every part of the pattern depends on every other part for context. Coherent patterns are acausal, in that (a) they cannot be predicted and (b) no part of the stored knowledge can be mapped to any part of the pattern, or vice versa. A cognitive network’s knowledge is encoded or embodied by the selection of patterns which emerge. The theory of cognition summarized here can model autocatalytic reaction-diffusion systems, artificial neural networks, market economies and ant colony optimization, among many other real and virtual systems. This theory suggests a new understanding of complexity as a lattice of contexts rather than a single measure.
Tasks
Published 2018-04-09
URL http://arxiv.org/abs/1805.00787v1
PDF http://arxiv.org/pdf/1805.00787v1.pdf
PWC https://paperswithcode.com/paper/cognition-in-dynamical-systems-second-edition
Repo
Framework

Composable Probabilistic Inference Networks Using MRAM-based Stochastic Neurons

Title Composable Probabilistic Inference Networks Using MRAM-based Stochastic Neurons
Authors Ramtin Zand, Kerem Y. Camsari, Supriyo Datta, Ronald F. DeMara
Abstract Magnetoresistive random access memory (MRAM) technologies with thermally unstable nanomagnets are leveraged to develop an intrinsic stochastic neuron as a building block for restricted Boltzmann machines (RBMs) to form deep belief networks (DBNs). The embedded MRAM-based neuron is modeled using precise physics equations. The simulation results exhibit the desired sigmoidal relation between the input voltages and probability of the output state. A probabilistic inference network simulator (PIN-Sim) is developed to realize a circuit-level model of an RBM utilizing resistive crossbar arrays along with differential amplifiers to implement the positive and negative weight values. The PIN-Sim is composed of five main blocks to train a DBN, evaluate its accuracy, and measure its power consumption. The MNIST dataset is leveraged to investigate the energy and accuracy tradeoffs of seven distinct network topologies in SPICE using the 14nm HP-FinFET technology library with the nominal voltage of 0.8V, in which an MRAM-based neuron is used as the activation function. The software and hardware level simulations indicate that a $784\times200\times10$ topology can achieve less than 5% error rates with $\sim400 pJ$ energy consumption. The error rates can be reduced to 2.5% by using a $784\times500\times500\times500\times10$ DBN at the cost of $\sim10\times$ higher energy consumption and significant area overhead. Finally, the effects of specific hardware-level parameters on power dissipation and accuracy tradeoffs are identified via the developed PIN-Sim framework.
Tasks
Published 2018-11-28
URL http://arxiv.org/abs/1811.11390v1
PDF http://arxiv.org/pdf/1811.11390v1.pdf
PWC https://paperswithcode.com/paper/composable-probabilistic-inference-networks
Repo
Framework

Path-Specific Counterfactual Fairness

Title Path-Specific Counterfactual Fairness
Authors Silvia Chiappa, Thomas P. S. Gillam
Abstract We consider the problem of learning fair decision systems in complex scenarios in which a sensitive attribute might affect the decision along both fair and unfair pathways. We introduce a causal approach to disregard effects along unfair pathways that simplifies and generalizes previous literature. Our method corrects observations adversely affected by the sensitive attribute, and uses these to form a decision. This avoids disregarding fair information, and does not require an often intractable computation of the path-specific effect. We leverage recent developments in deep learning and approximate inference to achieve a solution that is widely applicable to complex, non-linear scenarios.
Tasks
Published 2018-02-22
URL http://arxiv.org/abs/1802.08139v1
PDF http://arxiv.org/pdf/1802.08139v1.pdf
PWC https://paperswithcode.com/paper/path-specific-counterfactual-fairness
Repo
Framework

Fast Object Detection in Compressed Video

Title Fast Object Detection in Compressed Video
Authors Shiyao Wang, Hongchao Lu, Zhidong Deng
Abstract Object detection in videos has drawn increasing attention since it is more practical in real scenarios. Most of the deep learning methods use CNNs to process each decoded frame in a video stream individually. However, the free yet valuable motion information already embedded in the video compression format is usually overlooked. In this paper, we propose a fast object detection method by taking advantage of this with a novel Motion aided Memory Network (MMNet). The MMNet has two major advantages: 1) It significantly accelerates the procedure of feature extraction for compressed videos. It only needs to run a complete recognition network for I-frames, i.e. a few reference frames in a video, and it produces the features for the following P frames (predictive frames) with a lightweight memory network, which runs fast; 2) Unlike existing methods that establish an additional network to model motion of frames, we take full advantage of both motion vectors and residual errors that are freely available in video streams. To the best of our knowledge, the MMNet is the first work that investigates a deep convolutional detector on compressed videos. Our method is evaluated on the large-scale ImageNet VID dataset, and the results show that it is 3x faster than single image detector R-FCN and 10x faster than high-performance detector MANet at a minor accuracy loss.
Tasks Object Detection, Real-Time Object Detection, Video Compression
Published 2018-11-27
URL https://arxiv.org/abs/1811.11057v3
PDF https://arxiv.org/pdf/1811.11057v3.pdf
PWC https://paperswithcode.com/paper/fast-object-detection-in-compressed-video
Repo
Framework

Brain-Computer Interface with Corrupted EEG Data: A Tensor Completion Approach

Title Brain-Computer Interface with Corrupted EEG Data: A Tensor Completion Approach
Authors Jordi Sole-Casals, Cesar F. Caiafa, Qibin Zhao, Andrzej Cichocki
Abstract One of the current issues in Brain-Computer Interface is how to deal with noisy Electroencephalography measurements organized as multidimensional datasets. Meanwhile, significant advances have recently been made in multidimensional signal completion algorithms that exploit tensor decomposition models to capture the intricate relationship among entries in a multidimensional signal. We propose to use tensor completion applied to EEG data for improving the classification performance in a motor imagery BCI system with corrupted measurements. Noisy measurements are considered as unknowns that are inferred from a tensor decomposition model. We evaluate the performance of four recently proposed tensor completion algorithms plus a simple interpolation strategy, first with random missing entries and then with missing samples constrained to have a specific structure (random missing channels), which is a more realistic assumption in BCI applications. We measured the ability of these algorithms to reconstruct the tensor from observed data. Then, we tested the classification accuracy of imagined movement in a BCI experiment with missing samples. We show that for random missing entries, all tensor completion algorithms can recover missing samples, increasing the classification performance compared to a simple interpolation approach. For the random missing channels case, we show that tensor completion algorithms help to reconstruct missing channels, significantly improving the accuracy in the classification of motor imagery, although not to the same level as with clean data. Tensor completion algorithms are useful in real BCI applications. The proposed strategy could allow using motor imagery BCI systems even when EEG data is highly affected by missing channels and/or samples, avoiding the need for new acquisitions in the calibration stage.
Tasks Calibration, EEG
Published 2018-06-13
URL http://arxiv.org/abs/1806.05017v2
PDF http://arxiv.org/pdf/1806.05017v2.pdf
PWC https://paperswithcode.com/paper/brain-computer-interface-with-corrupted-eeg
Repo
Framework