Paper Group ANR 55
Geometry of Optimization and Implicit Regularization in Deep Learning
Title | Geometry of Optimization and Implicit Regularization in Deep Learning |
Authors | Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro |
Abstract | We argue that the optimization plays a crucial role in generalization of deep learning models through implicit regularization. We do this by demonstrating that generalization ability is not controlled by network size but rather by some other implicit control. We then demonstrate how changing the empirical optimization procedure can improve generalization, even if actual optimization quality is not affected. We do so by studying the geometry of the parameter space of deep networks, and devising an optimization algorithm attuned to this geometry. |
Tasks | |
Published | 2017-05-08 |
URL | http://arxiv.org/abs/1705.03071v1 |
http://arxiv.org/pdf/1705.03071v1.pdf | |
PWC | https://paperswithcode.com/paper/geometry-of-optimization-and-implicit |
Repo | |
Framework | |
MAT: A Multimodal Attentive Translator for Image Captioning
Title | MAT: A Multimodal Attentive Translator for Image Captioning |
Authors | Chang Liu, Fuchun Sun, Changhu Wang, Feng Wang, Alan Yuille |
Abstract | In this work we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural network (RNN) model for image caption generation. Different from most existing work where the whole image is represented by a convolutional neural network (CNN) feature, we propose to represent the input image as a sequence of detected objects which serves as the source sequence of the RNN model. In this way, the sequential representation of an image can be naturally translated to a sequence of words, as the target sequence of the RNN model. To represent the image in a sequential way, we extract the object features in the image and arrange them in an order using convolutional neural networks. To further leverage the visual information from the encoded objects, a sequential attention layer is introduced to selectively attend to the objects that are relevant to generating the corresponding words in the sentences. Extensive experiments are conducted to validate the proposed approach on a popular benchmark dataset, MS COCO, and the proposed model surpasses the state-of-the-art methods in all metrics following the dataset splits of previous work. The proposed approach is also evaluated by the evaluation server of the MS COCO captioning challenge, and achieves very competitive results, e.g., a CIDEr of 1.029 (c5) and 1.064 (c40). |
Tasks | Image Captioning, Machine Translation |
Published | 2017-02-18 |
URL | http://arxiv.org/abs/1702.05658v3 |
http://arxiv.org/pdf/1702.05658v3.pdf | |
PWC | https://paperswithcode.com/paper/mat-a-multimodal-attentive-translator-for |
Repo | |
Framework | |
Zero-Shot Fine-Grained Classification by Deep Feature Learning with Semantics
Title | Zero-Shot Fine-Grained Classification by Deep Feature Learning with Semantics |
Authors | Aoxue Li, Zhiwu Lu, Liwei Wang, Tao Xiang, Xinqi Li, Ji-Rong Wen |
Abstract | Fine-grained image classification, which aims to distinguish images with subtle distinctions, is a challenging task due to two main issues: lack of sufficient training data for every class and difficulty in learning discriminative features for representation. In this paper, to address the two issues, we propose a two-phase framework for recognizing images from unseen fine-grained classes, i.e. zero-shot fine-grained classification. In the first feature learning phase, we finetune deep convolutional neural networks using hierarchical semantic structure among fine-grained classes to extract discriminative deep visual features. Meanwhile, a domain adaptation structure is induced into deep convolutional neural networks to avoid domain shift from training data to test data. In the second label inference phase, a semantic directed graph is constructed over attributes of fine-grained classes. Based on this graph, we develop a label propagation algorithm to infer the labels of images in the unseen classes. Experimental results on two benchmark datasets demonstrate that our model outperforms the state-of-the-art zero-shot learning models. In addition, the features obtained by our feature learning model also yield significant gains when they are used by other zero-shot learning models, which shows the flexibility of our model in zero-shot fine-grained classification. |
Tasks | Domain Adaptation, Fine-Grained Image Classification, Image Classification, Zero-Shot Learning |
Published | 2017-07-04 |
URL | http://arxiv.org/abs/1707.00785v1 |
http://arxiv.org/pdf/1707.00785v1.pdf | |
PWC | https://paperswithcode.com/paper/zero-shot-fine-grained-classification-by-deep |
Repo | |
Framework | |
Understanding the visual speech signal
Title | Understanding the visual speech signal |
Authors | Helen L Bear |
Abstract | For machines to lipread, or understand speech from lip movement, they decode lip-motions (known as visemes) into the spoken sounds. We investigate the visual speech channel to further our understanding of visemes. This has applications beyond machine lipreading; speech therapists, animators, and psychologists can benefit from this work. We explain the influence of speaker individuality, and demonstrate how one can use visemes to boost lipreading. |
Tasks | Lipreading |
Published | 2017-10-03 |
URL | http://arxiv.org/abs/1710.01351v1 |
http://arxiv.org/pdf/1710.01351v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-the-visual-speech-signal |
Repo | |
Framework | |
3D seismic data denoising using two-dimensional sparse coding scheme
Title | 3D seismic data denoising using two-dimensional sparse coding scheme |
Authors | Ming-Jun Su, Jingbo Chang, Feng Qian, Guangmin Hu, Xiao-Yang Liu |
Abstract | Seismic data denoising is vital to geophysical applications and the transform-based function method is one of the most widely used techniques. However, it is challenging to design a suitable sparse representation to express a transform-based function group due to the complexity of seismic data. In this paper, we apply a seismic data denoising method based on learning-type overcomplete dictionaries which uses two-dimensional sparse coding (2DSC). First, we model the input seismic data and dictionaries as third-order tensors and introduce tensor-linear combinations for data approximation. Second, we apply learning-type overcomplete dictionary, i.e., optimal sparse data representation is achieved through learning and training. Third, we exploit the alternating minimization algorithm to solve the optimization problem of seismic denoising. Finally we evaluate its denoising performance on synthetic seismic data and land data survey. Experiment results show that the two-dimensional sparse coding scheme reduces computational costs and enhances the signal-to-noise ratio. |
Tasks | Denoising |
Published | 2017-04-08 |
URL | http://arxiv.org/abs/1704.04429v1 |
http://arxiv.org/pdf/1704.04429v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-seismic-data-denoising-using-two |
Repo | |
Framework | |
Replicator Equation: Applications Revisited
Title | Replicator Equation: Applications Revisited |
Authors | Tinsae G. Dulecha |
Abstract | The replicator equation is a simple model of evolution that leads to a stable form of Nash equilibrium, the Evolutionary Stable Strategy (ESS). It has been studied in connection with Evolutionary Game Theory and was originally developed for symmetric games. Beyond its initial emphasis on biological use, evolutionary game theory has been extended to social studies for behavioral analysis, to machine learning, to computer vision, and to other fields. Its several applications in the fields of machine learning and computer vision have drawn my attention, which is the reason for writing this extended abstract. |
Tasks | |
Published | 2017-04-16 |
URL | http://arxiv.org/abs/1704.04805v2 |
http://arxiv.org/pdf/1704.04805v2.pdf | |
PWC | https://paperswithcode.com/paper/replicator-equation-applications-revisited |
Repo | |
Framework | |
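The replicator equation mentioned in the abstract above can be illustrated with a small sketch. This is not code from the paper: it assumes the common discrete-time form of the replicator update for a symmetric two-player game with strictly positive payoffs, and the function names and the 2x2 payoff matrix are hypothetical choices made only for illustration.

```python
def replicator_step(x, payoff):
    """One discrete-time replicator update for a symmetric game.

    x      : list of strategy frequencies (non-negative, sums to 1)
    payoff : square matrix, payoff[i][j] = payoff of strategy i against j
             (assumed strictly positive so the update stays well defined)
    """
    n = len(x)
    # Fitness of each pure strategy against the current population mix.
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Average fitness of the whole population.
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    if mean_fitness == 0:
        return list(x)
    # Strategies with above-average fitness grow, the others shrink.
    return [x[i] * fitness[i] / mean_fitness for i in range(n)]


def run_replicator(x, payoff, steps=200):
    """Iterate the update a fixed number of times and return the final mix."""
    for _ in range(steps):
        x = replicator_step(x, payoff)
    return x


if __name__ == "__main__":
    # Hypothetical anti-coordination game: each strategy does best against the other.
    payoff = [[1.0, 3.0], [2.0, 1.0]]
    print(run_replicator([0.5, 0.5], payoff))
```

In this illustrative game both strategies earn the same payoff when the first is played with frequency 2/3, so the iteration settles on that mixed state; other payoff matrices may instead converge to a pure strategy.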
Designing Autonomous Vehicles: Evaluating the Role of Human Emotions and Social Norms
Title | Designing Autonomous Vehicles: Evaluating the Role of Human Emotions and Social Norms |
Authors | Faisal Riaz, Muaz A. Niazi |
Abstract | Humans are going to delegate the right to drive to autonomous vehicles in the near future. However, to fulfill this complicated task, there is a need for a mechanism that makes autonomous vehicles obey the road and social rules practiced by well-behaved drivers. This can be achieved by introducing a social norms compliance mechanism into autonomous vehicles. This paper proposes an artificial society of autonomous vehicles (AVs) as an analogy of human society. Each AV is assigned a social personality with a different social influence. Social norms are introduced which help the AVs make decisions, influenced by emotions, regarding road collision avoidance. Furthermore, a social norms compliance mechanism for artificial social AVs is proposed using a prospect-based emotion, i.e. fear, which is derived from the OCC model. Fuzzy logic is employed to compute the emotions quantitatively. Then, using the SimConnect approach, fuzzy values of fear are provided to the NetLogo simulation environment to simulate the artificial society of AVs. Extensive testing has been performed using the behavior space tool to measure the performance of the proposed approach in terms of the number of collisions. For comparison, a random-walk-based artificial society of AVs has been proposed as well. A comparative study with the random walk shows that the proposed approach provides a better option for tailoring the autopilots of future AVs, which will be more socially acceptable and trusted by their riders in terms of safe road travel. |
Tasks | Autonomous Vehicles |
Published | 2017-08-06 |
URL | http://arxiv.org/abs/1708.01925v1 |
http://arxiv.org/pdf/1708.01925v1.pdf | |
PWC | https://paperswithcode.com/paper/designing-autonomous-vehicles-evaluating-the |
Repo | |
Framework | |
Background Subtraction via Fast Robust Matrix Completion
Title | Background Subtraction via Fast Robust Matrix Completion |
Authors | Behnaz Rezaei, Sarah Ostadabbas |
Abstract | Background subtraction is the primary task of the majority of video inspection systems. The most important part of background subtraction, common to different algorithms, is background modeling. In this regard, our paper addresses the problem of background modeling in a computationally efficient way, which is important for the current explosion of “big data” processing coming from high resolution multi-channel videos. Our model is based on the assumption that background in natural images lies on a low-dimensional subspace. We formulated and solved this problem in a low-rank matrix completion framework. In modeling the background, we benefited from the in-face extended Frank-Wolfe algorithm for solving a defined convex optimization problem. We evaluated our fast robust matrix completion (fRMC) method on both background models challenge (BMC) and Stuttgart artificial background subtraction (SABS) datasets. The results were compared with the robust principal component analysis (RPCA) and low-rank robust matrix completion (RMC) methods, both solved by inexact augmented Lagrangian multiplier (IALM). The results showed computation at least twice as fast as with the IALM solver, with comparable accuracy, and even better accuracy in some challenges, in subtracting the backgrounds in order to detect moving objects in the scene. |
Tasks | Low-Rank Matrix Completion, Matrix Completion |
Published | 2017-11-03 |
URL | http://arxiv.org/abs/1711.01218v1 |
http://arxiv.org/pdf/1711.01218v1.pdf | |
PWC | https://paperswithcode.com/paper/background-subtraction-via-fast-robust-matrix |
Repo | |
Framework | |
A Novel Experimental Platform for In-Vessel Multi-Chemical Molecular Communications
Title | A Novel Experimental Platform for In-Vessel Multi-Chemical Molecular Communications |
Authors | Nariman Farsad, David Pan, Andrea Goldsmith |
Abstract | This work presents a new multi-chemical experimental platform for molecular communication where the transmitter can release different chemicals. This platform is designed to be inexpensive and accessible, and it can be expanded to simulate different environments including the cardiovascular system and complex network of pipes in industrial complexes and city infrastructures. To demonstrate the capabilities of the platform, we implement a time-slotted binary communication system where a bit-0 is represented by an acid pulse, a bit-1 by a base pulse, and information is carried via pH signals. The channel model for this system, which is nonlinear and has long memories, is unknown. Therefore, we devise novel detection algorithms that use techniques from machine learning and deep learning to train a maximum-likelihood detector. Using these algorithms the bit error rate improves by an order of magnitude relative to the approach used in previous works. Moreover, our system achieves a data rate that is an order of magnitude higher than any of the previous molecular communication platforms. |
Tasks | |
Published | 2017-04-16 |
URL | http://arxiv.org/abs/1704.04810v1 |
http://arxiv.org/pdf/1704.04810v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-experimental-platform-for-in-vessel |
Repo | |
Framework | |
Representation Learning using Event-based STDP
Title | Representation Learning using Event-based STDP |
Authors | Amirhossein Tavanaei, Timothee Masquelier, Anthony Maida |
Abstract | Although representation learning methods developed within the framework of traditional neural networks are relatively mature, developing a spiking representation model remains a challenging problem. This paper proposes an event-based method to train a feedforward spiking neural network (SNN) layer for extracting visual features. The method introduces a novel spike-timing-dependent plasticity (STDP) learning rule and a threshold adjustment rule both derived from a vector quantization-like objective function subject to a sparsity constraint. The STDP rule is obtained by the gradient of a vector quantization criterion that is converted to spike-based, spatio-temporally local update rules in a spiking network of leaky, integrate-and-fire (LIF) neurons. Independence and sparsity of the model are achieved by the threshold adjustment rule and by a softmax function implementing inhibition in the representation layer consisting of WTA-thresholded spiking neurons. Together, these mechanisms implement a form of spike-based, competitive learning. Two sets of experiments are performed on the MNIST and natural image datasets. The results demonstrate a sparse spiking visual representation model with low reconstruction loss comparable with state-of-the-art visual coding approaches, yet our rule is local in both time and space, thus biologically plausible and hardware friendly. |
Tasks | Quantization, Representation Learning |
Published | 2017-06-20 |
URL | http://arxiv.org/abs/1706.06699v3 |
http://arxiv.org/pdf/1706.06699v3.pdf | |
PWC | https://paperswithcode.com/paper/representation-learning-using-event-based |
Repo | |
Framework | |
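The leaky integrate-and-fire (LIF) neurons named in the abstract above can be sketched in a few lines. This is a generic textbook-style discretization, not the paper's model or its STDP rule: the function name, the parameter values, and the simple per-step leak are assumptions chosen for illustration.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.1, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    input_current : sequence of input values, one per time step
    threshold     : membrane potential at which the neuron fires
    leak          : fraction of the potential lost at every step
    v_reset       : potential the neuron returns to after a spike
    Returns the list of time-step indices at which the neuron spiked.
    """
    v = v_reset
    spikes = []
    for t, i_t in enumerate(input_current):
        # Leak a fraction of the current potential, then integrate the input.
        v += -leak * v + i_t
        if v >= threshold:
            # Emit a spike event and reset the membrane potential.
            spikes.append(t)
            v = v_reset
    return spikes


if __name__ == "__main__":
    # A constant drive produces a regular spike train.
    print(simulate_lif([0.3] * 12))
```

Event-based learning rules such as STDP act on the spike times returned here rather than on continuous activations, which is what makes the updates local in time.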
Using English as Pivot to Extract Persian-Italian Parallel Sentences from Non-Parallel Corpora
Title | Using English as Pivot to Extract Persian-Italian Parallel Sentences from Non-Parallel Corpora |
Authors | Ebrahim Ansari, M. H. Sadreddini, Mostafa Sheikhalishahi, Richard Wallace, Fatemeh Alimardani |
Abstract | The effectiveness of a statistical machine translation system (SMT) is very dependent upon the amount of parallel corpus used in the training phase. For low-resource language pairs there are not enough parallel corpora to build an accurate SMT. In this paper, a novel approach is presented to extract bilingual Persian-Italian parallel sentences from a non-parallel (comparable) corpus. In this study, English is used as the pivot language to compute the matching scores between source and target sentences and in the candidate selection phase. Additionally, a new monolingual sentence similarity metric, Normalized Google Distance (NGD), is proposed to improve the matching process. Moreover, some extensions of the baseline system are applied to improve the quality of extracted sentences measured with BLEU. Experimental results show that using the new pivot-based extraction can increase the quality of the bilingual corpus significantly and consequently improves the performance of the Persian-Italian SMT system. |
Tasks | Machine Translation |
Published | 2017-01-29 |
URL | http://arxiv.org/abs/1701.08339v1 |
http://arxiv.org/pdf/1701.08339v1.pdf | |
PWC | https://paperswithcode.com/paper/using-english-as-pivot-to-extract-persian |
Repo | |
Framework | |
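For readers unfamiliar with the Normalized Google Distance named in the abstract above, the sketch below computes the standard term-level NGD from occurrence counts. How the paper extends this to whole sentences is not stated in the abstract, so that step is not shown; the function name, argument names, and the example counts are hypothetical.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard Normalized Google Distance between two terms.

    fx, fy : number of documents containing term x and term y respectively
    fxy    : number of documents containing both terms
    n      : total number of documents in the collection
    Returns 0.0 for terms that always co-occur; larger values mean the
    terms are less related. Returns infinity if any count is zero.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(normalized_google_distance(fx=1000, fy=100, fxy=10, n=1_000_000))
```

Because the measure depends only on counts, it can be computed against any indexed collection, not just a web search engine.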
Joint Epipolar Tracking (JET): Simultaneous optimization of epipolar geometry and feature correspondences
Title | Joint Epipolar Tracking (JET): Simultaneous optimization of epipolar geometry and feature correspondences |
Authors | Henry Bradler, Matthias Ochs, Rudolf Mester |
Abstract | Traditionally, pose estimation is considered a two-step problem. First, feature correspondences are determined by direct comparison of image patches, or by associating feature descriptors. In a second step, the relative pose and the coordinates of corresponding points are estimated, most often by minimizing the reprojection error (RPE). RPE optimization is based on a loss function that is merely aware of the feature pixel positions but not of the underlying image intensities. In this paper, we propose a sparse direct method which introduces a loss function that allows simultaneous optimization of the unscaled relative pose and the set of feature correspondences, directly considering the image intensity values. Furthermore, we show how to integrate statistical prior information on the motion into the optimization process. This constructive inclusion of a Bayesian bias term is particularly efficient in application cases with strongly predictable (short-term) dynamics, e.g. in a driving scenario. In our experiments, we demonstrate that the JET algorithm we propose outperforms the classical reprojection error optimization on two synthetic datasets and on the KITTI dataset. The JET algorithm runs in real-time on a single CPU thread. |
Tasks | Pose Estimation |
Published | 2017-03-15 |
URL | http://arxiv.org/abs/1703.05065v1 |
http://arxiv.org/pdf/1703.05065v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-epipolar-tracking-jet-simultaneous |
Repo | |
Framework | |
A brain signature highly predictive of future progression to Alzheimer’s dementia
Title | A brain signature highly predictive of future progression to Alzheimer’s dementia |
Authors | Christian Dansereau, Angela Tam, AmanPreet Badhwar, Sebastian Urchs, Pierre Orban, Pedro Rosa-Neto, Pierre Bellec |
Abstract | Early prognosis of Alzheimer’s dementia is hard. Mild cognitive impairment (MCI) typically precedes Alzheimer’s dementia, yet only a fraction of MCI individuals will progress to dementia, even when screened using biomarkers. We propose here to identify a subset of individuals who share a common brain signature highly predictive of oncoming dementia. This signature was composed of brain atrophy and functional dysconnectivity and discovered using a machine learning model in patients suffering from dementia. The model recognized the same brain signature in MCI individuals, 90% of whom progressed to dementia within three years. This result is a marked improvement on the state-of-the-art in prognostic precision, while the brain signature still identified 47% of all MCI progressors. We thus discovered a sizable MCI subpopulation which represents an excellent recruitment target for clinical trials at the prodromal stage of Alzheimer’s disease. |
Tasks | |
Published | 2017-12-21 |
URL | http://arxiv.org/abs/1712.08058v2 |
http://arxiv.org/pdf/1712.08058v2.pdf | |
PWC | https://paperswithcode.com/paper/a-brain-signature-highly-predictive-of-future |
Repo | |
Framework | |
Optimized Cost per Click in Taobao Display Advertising
Title | Optimized Cost per Click in Taobao Display Advertising |
Authors | Han Zhu, Junqi Jin, Chang Tan, Fei Pan, Yifan Zeng, Han Li, Kun Gai |
Abstract | Taobao, as the largest online retail platform in the world, provides billions of online display advertising impressions for millions of advertisers every day. For commercial purposes, the advertisers bid for specific spots and target crowds to compete for business traffic. The platform chooses the most suitable ads to display in tens of milliseconds. Common pricing methods include cost per mille (CPM) and cost per click (CPC). Traditional advertising systems target certain traits of users and ad placements with fixed bids, essentially regarded as coarse-grained matching of bid and traffic quality. However, the fixed bids set by the advertisers competing for different quality requests cannot fully optimize the advertisers’ key requirements. Moreover, the platform has to be responsible for the business revenue and user experience. Thus, we proposed a bid optimizing strategy called optimized cost per click (OCPC) which automatically adjusts the bid to achieve finer matching of bid and traffic quality at page view (PV) request granularity. Our approach optimizes advertisers’ demands, platform business revenue and user experience, and as a whole improves traffic allocation efficiency. We have validated our approach in the Taobao display advertising system in production. The online A/B test shows our algorithm yields substantially better results than the previous fixed-bid approach. |
Tasks | |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1703.02091v4 |
http://arxiv.org/pdf/1703.02091v4.pdf | |
PWC | https://paperswithcode.com/paper/optimized-cost-per-click-in-taobao-display |
Repo | |
Framework | |
OPEB: Open Physical Environment Benchmark for Artificial Intelligence
Title | OPEB: Open Physical Environment Benchmark for Artificial Intelligence |
Authors | Hamid Mirzaei, Mona Fathollahi, Tony Givargis |
Abstract | Artificial Intelligence methods to solve continuous-control tasks have made significant progress in recent years. However, these algorithms have important limitations and still need significant improvement to be used in industry and real-world applications. This means that this area is still in an active research phase. To involve a large number of research groups, standard benchmarks are needed to evaluate and compare proposed algorithms. In this paper, we propose a physical environment benchmark framework to facilitate collaborative research in this area by enabling different research groups to integrate their designed benchmarks in a unified cloud-based repository and also share their actual implemented benchmarks via the cloud. We demonstrate the proposed framework using an actual implementation of the classical mountain-car example and present the results obtained using a Reinforcement Learning algorithm. |
Tasks | Continuous Control |
Published | 2017-07-04 |
URL | http://arxiv.org/abs/1707.00790v1 |
http://arxiv.org/pdf/1707.00790v1.pdf | |
PWC | https://paperswithcode.com/paper/opeb-open-physical-environment-benchmark-for |
Repo | |
Framework | |