Paper Group ANR 648
Random matrix approach for primal-dual portfolio optimization problems
Title | Random matrix approach for primal-dual portfolio optimization problems |
Authors | Daichi Tada, Hisashi Yamamoto, Takashi Shinzato |
Abstract | In this paper, we revisit the portfolio optimization problems of the minimization/maximization of investment risk under constraints of budget and investment concentration (primal problem) and the maximization/minimization of investment concentration under constraints of budget and investment risk (dual problem) for the case that the variances of the return rates of the assets are identical. We analyze both optimization problems by using the Lagrange multiplier method and the random matrix approach. Thereafter, we compare the results obtained from our proposed approach with the results obtained in previous work. Moreover, we use numerical experiments to validate the results obtained from the replica approach and the random matrix approach as methods for analyzing both the primal and dual portfolio optimization problems. |
Tasks | Portfolio Optimization |
Published | 2017-09-14 |
URL | http://arxiv.org/abs/1709.04620v2 |
PDF | http://arxiv.org/pdf/1709.04620v2.pdf |
PWC | https://paperswithcode.com/paper/random-matrix-approach-for-primal-dual |
Repo | |
Framework | |
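A minimal numerical sketch of the primal problem described in the abstract: minimize the investment risk w'Cw under a budget constraint (the weights sum to N) and an investment-concentration constraint (the squared weights sum to tau*N). The toy return data, the value tau = 1.5, and the use of a generic SLSQP solver are illustrative assumptions, not the paper's analytical Lagrange/random-matrix treatment.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, p = 50, 200                             # assets, return observations
X = rng.standard_normal((p, N))            # i.i.d. return rates (identical variances)
C = X.T @ X / p                            # sample (Wishart-type) covariance

budget = lambda w: w.sum() - N             # sum_i w_i = N
concentration = lambda w: w @ w - 1.5 * N  # sum_i w_i^2 = tau*N with tau = 1.5

res = minimize(
    lambda w: 0.5 * w @ C @ w,             # investment risk
    x0=np.ones(N),                         # equal-weight start satisfies the budget
    constraints=[{"type": "eq", "fun": budget},
                 {"type": "eq", "fun": concentration}],
    method="SLSQP",
)
print("minimal risk per asset:", res.fun / N)
```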
Evaluation of Hashing Methods Performance on Binary Feature Descriptors
Title | Evaluation of Hashing Methods Performance on Binary Feature Descriptors |
Authors | Jacek Komorowski, Tomasz Trzcinski |
Abstract | In this paper, we evaluate the performance of data-dependent hashing methods on binary data. The goal is to find a hashing method that can effectively produce a lower-dimensional binary representation of 512-bit FREAK descriptors. A representative sample of recent unsupervised, semi-supervised and supervised hashing methods was experimentally evaluated on large datasets of labelled binary FREAK feature descriptors. |
Tasks | |
Published | 2017-07-21 |
URL | http://arxiv.org/abs/1707.06825v1 |
PDF | http://arxiv.org/pdf/1707.06825v1.pdf |
PWC | https://paperswithcode.com/paper/evaluation-of-hashing-methods-performance-on |
Repo | |
Framework | |
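For context on the task's input/output format, here is a sketch of the simplest data-independent baseline, random-hyperplane hashing, applied to 512-bit binary descriptors. The paper itself evaluates data-dependent methods; the random descriptors below are placeholders for real FREAK features, and the code length k = 64 is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
descriptors = rng.integers(0, 2, size=(10000, 512)).astype(np.float32)

k = 64                                         # target code length
W = rng.standard_normal((512, k))              # random hyperplanes
codes = (descriptors @ W > 0).astype(np.uint8) # k-bit binary codes

# Hamming distance between two codes, as used for nearest-neighbour search
hamming = np.count_nonzero(codes[0] != codes[1])
print(codes.shape, hamming)
```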
Justifications in Constraint Handling Rules for Logical Retraction in Dynamic Algorithms
Title | Justifications in Constraint Handling Rules for Logical Retraction in Dynamic Algorithms |
Authors | Thom Fruehwirth |
Abstract | We present a straightforward source-to-source transformation that introduces justifications for user-defined constraints into the CHR programming language. Then a scheme of two rules suffices to allow for logical retraction (deletion, removal) of constraints during computation. Without the need to recompute from scratch, these rules remove not only the constraint but also undo all consequences of the rule applications that involved the constraint. We prove a confluence result concerning the rule scheme and show its correctness. When algorithms are written in CHR, constraints represent both data and operations. CHR is already incremental by nature, i.e. constraints can be added at runtime. Logical retraction adds decrementality. Hence any algorithm written in CHR with justifications will become fully dynamic. Operations can be undone and data can be removed at any point in the computation without compromising the correctness of the result. We present two classical examples of dynamic algorithms, written in our prototype implementation of CHR with justifications that is available online: maintaining the minimum of a changing set of numbers and shortest paths in a graph whose edges change. |
Tasks | |
Published | 2017-06-24 |
URL | http://arxiv.org/abs/1706.07946v2 |
PDF | http://arxiv.org/pdf/1706.07946v2.pdf |
PWC | https://paperswithcode.com/paper/justifications-in-constraint-handling-rules |
Repo | |
Framework | |
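A toy Python re-implementation of the justification idea, using the dynamic-minimum example from the abstract: the derived minimum records the identifier of the input constraint that justifies it, so retracting that input undoes the derived fact and triggers re-derivation from the surviving constraints only. This is an illustrative sketch, not the CHR source-to-source transformation.

```python
class MinStore:
    """Constraint store where the derived fact carries a justification:
    the id of the input constraint it depends on."""
    def __init__(self):
        self.numbers = {}          # id -> value (user-added constraints)
        self.derived_min = None    # (value, justifying_id) or None
        self.next_id = 0

    def add(self, value):
        cid = self.next_id
        self.next_id += 1
        self.numbers[cid] = value
        if self.derived_min is None or value < self.derived_min[0]:
            self.derived_min = (value, cid)   # new minimum, justified by cid
        return cid                            # token for later retraction

    def retract(self, cid):
        del self.numbers[cid]
        if self.derived_min and self.derived_min[1] == cid:
            # the retracted constraint justified the minimum: undo that
            # consequence and re-derive from the surviving constraints
            self.derived_min = None
            if self.numbers:
                c, v = min(self.numbers.items(), key=lambda kv: kv[1])
                self.derived_min = (v, c)

s = MinStore()
a, b = s.add(3), s.add(1)
print(s.derived_min[0])    # 1
s.retract(b)               # logical retraction of the justifying constraint
print(s.derived_min[0])    # 3 -- the consequence of b has been undone
```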
Investigating the Impact of Data Volume and Domain Similarity on Transfer Learning Applications
Title | Investigating the Impact of Data Volume and Domain Similarity on Transfer Learning Applications |
Authors | Michael Bernico, Yuntao Li, Dingchao Zhang |
Abstract | Transfer learning allows practitioners to recognize and apply knowledge learned in previous tasks (source task) to new tasks or new domains (target task), which share some commonality. The two important factors impacting the performance of transfer learning models are: (a) the size of the target dataset, and (b) the similarity in distribution between source and target domains. Thus far, there has been little investigation into just how important these factors are. In this paper, we investigate the impact of target dataset size and source/target domain similarity on model performance through a series of experiments. We find that more data is always beneficial, and model performance improves linearly with the log of data size, until we are out of data. As source/target domains differ, more data is required and fine-tuning will render better performance than feature extraction. When source/target domains are similar and data size is small, fine-tuning and feature extraction render equivalent performance. Our hope is that by beginning this quantitative investigation on the effect of data volume and domain similarity in transfer learning we might inspire others to explore the significance of data in developing more accurate statistical models. |
Tasks | Transfer Learning |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.04008v4 |
PDF | http://arxiv.org/pdf/1712.04008v4.pdf |
PWC | https://paperswithcode.com/paper/investigating-the-impact-of-data-volume-and |
Repo | |
Framework | |
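A small sketch of the log-linear relationship the abstract reports, with performance improving linearly in the log of target-dataset size. The accuracy values below are hypothetical placeholders, used only to show how such a curve could be fit and extrapolated.

```python
import numpy as np

n   = np.array([100, 300, 1000, 3000, 10000])    # target-set sizes
acc = np.array([0.61, 0.68, 0.74, 0.80, 0.86])   # hypothetical accuracies

slope, intercept = np.polyfit(np.log(n), acc, deg=1)  # acc ~ a*log(n) + b
predict = lambda m: slope * np.log(m) + intercept
print(f"predicted accuracy at n=30000: {predict(30000):.3f}")
```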
The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions
Title | The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions |
Authors | George Philipp, Dawn Song, Jaime G. Carbonell |
Abstract | Whereas it is believed that techniques such as Adam, batch normalization and, more recently, SELU nonlinearities “solve” the exploding gradient problem, we show that this is not the case in general: in a range of popular MLP architectures, exploding gradients exist and limit the depth to which networks can be effectively trained, both in theory and in practice. We explain why exploding gradients occur and highlight the collapsing domain problem, which can arise in architectures that avoid exploding gradients. ResNets have significantly lower gradients and thus can circumvent the exploding gradient problem, enabling the effective training of much deeper networks; we show this is a direct consequence of the Pythagorean equation. By noticing that any neural network is a residual network, we devise the residual trick, which reveals that introducing skip connections simplifies the network mathematically, and that this simplicity may be the major cause for their success. |
Tasks | |
Published | 2017-12-15 |
URL | http://arxiv.org/abs/1712.05577v4 |
PDF | http://arxiv.org/pdf/1712.05577v4.pdf |
PWC | https://paperswithcode.com/paper/the-exploding-gradient-problem-demystified |
Repo | |
Framework | |
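A minimal sketch illustrating the phenomenon: in a plain deep ReLU MLP, the norm of the Jacobian of the output with respect to the input can grow exponentially with depth. The width, depth, and the deliberately too-large initialisation gain are illustrative assumptions, chosen to make the explosion visible.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 256, 50
x = rng.standard_normal(width)

grad = np.eye(width)                       # accumulated Jacobian d out / d in
for _ in range(depth):
    # gain 1.5 > sqrt(2), the critical value for ReLU: gradients explode
    W = rng.standard_normal((width, width)) * (1.5 / np.sqrt(width))
    pre = W @ x
    x = np.maximum(pre, 0.0)               # ReLU forward pass
    J = W * (pre > 0)[:, None]             # layer Jacobian: diag(mask) @ W
    grad = J @ grad

print("input-output Jacobian norm after", depth, "layers:", np.linalg.norm(grad))
```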
An Infinite Hidden Markov Model With Similarity-Biased Transitions
Title | An Infinite Hidden Markov Model With Similarity-Biased Transitions |
Authors | Colin Reimer Dawson, Chaofan Huang, Clayton T. Morrison |
Abstract | We describe a generalization of the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) which is able to encode prior information that state transitions are more likely between “nearby” states. This is accomplished by defining a similarity function on the state space and scaling transition probabilities by pair-wise similarities, thereby inducing correlations among the transition distributions. We present an augmented data representation of the model as a Markov Jump Process in which: (1) some jump attempts fail, and (2) the probability of success is proportional to the similarity between the source and destination states. This augmentation restores conditional conjugacy and admits a simple Gibbs sampler. We evaluate the model and inference method on a speaker diarization task and a “harmonic parsing” task using four-part chorale data, as well as on several synthetic datasets, achieving favorable comparisons to existing models. |
Tasks | Speaker Diarization |
Published | 2017-07-21 |
URL | http://arxiv.org/abs/1707.06756v1 |
PDF | http://arxiv.org/pdf/1707.06756v1.pdf |
PWC | https://paperswithcode.com/paper/an-infinite-hidden-markov-model-with |
Repo | |
Framework | |
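A minimal sketch of the transition-scaling idea in the abstract: start from base transition weights and multiply each row by a pairwise similarity between states before normalising, so transitions between "nearby" states become more likely. The Gaussian similarity kernel and the latent state locations are illustrative assumptions, not the paper's augmented Markov Jump Process construction.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5
base = rng.dirichlet(np.ones(K), size=K)   # base transition matrix (rows sum to 1)
loc = rng.standard_normal((K, 2))          # latent state locations

# similarity(i, j) = exp(-||loc_i - loc_j||^2)
d2 = ((loc[:, None, :] - loc[None, :, :]) ** 2).sum(-1)
sim = np.exp(-d2)

P = base * sim                             # bias each row by pairwise similarity
P /= P.sum(axis=1, keepdims=True)          # renormalise each row
print(np.round(P, 3))
```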
A Latent Variable Model for Two-Dimensional Canonical Correlation Analysis and its Variational Inference
Title | A Latent Variable Model for Two-Dimensional Canonical Correlation Analysis and its Variational Inference |
Authors | Mehran Safayani, Saeid Momenzadeh |
Abstract | Describing dimension reduction (DR) techniques by means of probabilistic models has recently received special attention. Probabilistic models, in addition to offering better interpretability of DR methods, provide a framework for further extensions of such algorithms. One of the new approaches to probabilistic DR is to preserve the internal structure of the data, meaning that the data need not first be converted from matrix or tensor format to vector format during dimensionality reduction. In this paper, a latent variable model for matrix-variate data for canonical correlation analysis (CCA) is proposed. Since, in general, there is no analytical maximum-likelihood solution for this model, we present two approaches for learning the parameters. The proposed methods are evaluated on synthetic data in terms of convergence and quality of mappings. A real dataset is also employed to assess the proposed methods against several probabilistic and non-probabilistic CCA-based approaches. The results confirm the superiority of the proposed methods with respect to the competing algorithms. Moreover, this model can be considered as a framework for further extensions. |
Tasks | Dimensionality Reduction |
Published | 2017-08-04 |
URL | http://arxiv.org/abs/1708.01519v1 |
PDF | http://arxiv.org/pdf/1708.01519v1.pdf |
PWC | https://paperswithcode.com/paper/a-latent-variable-model-for-two-dimensional |
Repo | |
Framework | |
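For background, a sketch of classical (vector-variate) CCA via whitening and SVD, which is the objective that the paper's matrix-variate latent variable model generalises. The probabilistic, two-dimensional extension itself is beyond this sketch, and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 500, 6, 4
X = rng.standard_normal((n, p))
Y = X[:, :q] + 0.5 * rng.standard_normal((n, q))   # correlated second view

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Cxx, Cyy = Xc.T @ Xc / n, Yc.T @ Yc / n
Cxy = Xc.T @ Yc / n

# whiten each view; the singular values of the whitened cross-covariance
# are the canonical correlations
Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
corr = np.linalg.svd(Wx @ Cxy @ Wy.T, compute_uv=False)
print("canonical correlations:", np.round(corr, 3))
```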
Accelerated Block Coordinate Proximal Gradients with Applications in High Dimensional Statistics
Title | Accelerated Block Coordinate Proximal Gradients with Applications in High Dimensional Statistics |
Authors | Tsz Kit Lau, Yuan Yao |
Abstract | Nonconvex optimization problems arise in many research fields and have attracted considerable attention in signal processing, statistics and machine learning. In this work, we explore the accelerated proximal gradient method and some of its variants, which have recently been shown to converge in nonconvex settings. We show that a novel variant proposed here, which exploits adaptive momentum and block coordinate updates with specific update rules, further improves performance on a broad class of nonconvex problems. In applications to sparse linear regression with regularizations such as Lasso, grouped Lasso, capped $\ell_1$ and SCAD, the proposed scheme enjoys provable local linear convergence, with experimental justification. |
Tasks | |
Published | 2017-10-15 |
URL | http://arxiv.org/abs/1710.05338v7 |
PDF | http://arxiv.org/pdf/1710.05338v7.pdf |
PWC | https://paperswithcode.com/paper/accelerated-block-coordinate-proximal |
Repo | |
Framework | |
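A simplified sketch, under stated assumptions, of an accelerated proximal gradient iteration for the Lasso with a randomised block-coordinate update, in the spirit of the abstract. The step size, the Nesterov-style momentum rule, and the problem instance are illustrative rather than the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 100, 0.1
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d) * (rng.random(d) < 0.1)   # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(n)

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)  # prox of t*||.||_1
L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the gradient
x, x_prev = np.zeros(d), np.zeros(d)
blocks = np.array_split(np.arange(d), 10)

for k in range(1, 500):
    y = x + (k - 1) / (k + 2) * (x - x_prev)  # momentum (extrapolation) step
    g = A.T @ (A @ y - b)                     # gradient of the smooth part
    x_prev = x.copy()
    x = y.copy()
    blk = blocks[rng.integers(len(blocks))]   # update one random block only
    x[blk] = soft(y[blk] - g[blk] / L, lam / L)

print("nonzeros recovered:", np.count_nonzero(np.abs(x) > 1e-6))
```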
Decentralized Online Learning with Kernels
Title | Decentralized Online Learning with Kernels |
Authors | Alec Koppel, Santiago Paternain, Cedric Richard, Alejandro Ribeiro |
Abstract | We consider multi-agent stochastic optimization problems over reproducing kernel Hilbert spaces (RKHS). In this setting, a network of interconnected agents aims to learn decision functions, i.e., nonlinear statistical models, that are optimal in terms of a global convex functional that aggregates data across the network, with access only to locally and sequentially observed samples. We propose solving this problem by allowing each agent to learn a local regression function while enforcing consensus constraints. We use a penalized variant of functional stochastic gradient descent operating simultaneously with low-dimensional subspace projections. These subspaces are constructed greedily by applying orthogonal matching pursuit to the sequence of kernel dictionaries and weights. By tuning the projection-induced bias, we propose an algorithm that allows each individual agent to learn, based upon its locally observed data stream and message passing with its neighbors only, a regression function that is close to the globally optimal regression function. That is, we establish that with constant step-size selections, agents’ functions converge to a neighborhood of the globally optimal one while satisfying the consensus constraints as the penalty parameter is increased. Moreover, the complexity of the learned regression functions is guaranteed to remain finite. On both multi-class kernel logistic regression and multi-class kernel support vector classification with data generated from class-dependent Gaussian mixture models, we observe stable function estimation and state-of-the-art performance for distributed online multi-class classification. Experiments on the Brodatz textures further substantiate the empirical validity of this approach. |
Tasks | Stochastic Optimization |
Published | 2017-10-11 |
URL | http://arxiv.org/abs/1710.04062v1 |
PDF | http://arxiv.org/pdf/1710.04062v1.pdf |
PWC | https://paperswithcode.com/paper/decentralized-online-learning-with-kernels |
Repo | |
Framework | |
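A heavily simplified sketch of penalized functional stochastic gradient descent for two agents: each runs online kernel regression on its own stream, and a consensus penalty pulls their function values together. The paper additionally controls complexity by projecting onto low-dimensional subspaces built with orthogonal matching pursuit; this sketch omits that step, so the dictionaries below grow without bound. All hyperparameters are assumptions.

```python
import numpy as np

rbf = lambda a, b: np.exp(-0.5 * np.sum((a - b) ** 2))   # Gaussian RKHS kernel

class Agent:
    def __init__(self):
        self.centers, self.weights = [], []
    def __call__(self, x):
        return sum(w * rbf(c, x) for c, w in zip(self.centers, self.weights))
    def sgd_step(self, x, y, neighbor, eta=0.3, penalty=0.5):
        # functional gradient of 0.5*(f(x)-y)^2 + 0.5*penalty*(f(x)-g(x))^2
        # adds a single new kernel centre at x
        err = self(x) - y
        gap = self(x) - neighbor(x)
        self.centers.append(x)
        self.weights.append(-eta * (err + penalty * gap))

rng = np.random.default_rng(0)
a1, a2 = Agent(), Agent()
for _ in range(300):
    for agent, other in ((a1, a2), (a2, a1)):   # each agent has its own stream
        x = rng.uniform(-3, 3, size=1)
        y = np.sin(x[0]) + 0.1 * rng.standard_normal()
        agent.sgd_step(x, y, other)

q = np.array([1.0])
print(a1(q), a2(q), np.sin(1.0))   # both agents should land near sin(1)
```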
A Sequential Neural Encoder with Latent Structured Description for Modeling Sentences
Title | A Sequential Neural Encoder with Latent Structured Description for Modeling Sentences |
Authors | Yu-Ping Ruan, Qian Chen, Zhen-Hua Ling |
Abstract | In this paper, we propose a sequential neural encoder with latent structured description (SNELSD) for modeling sentences. This model introduces latent chunk-level representations into conventional sequential neural encoders, i.e., recurrent neural networks (RNNs) with long short-term memory (LSTM) units, to consider the compositionality of languages in semantic modeling. An SNELSD model has a hierarchical structure that includes a detection layer and a description layer. The detection layer predicts the boundaries of latent word chunks in an input sentence and derives a chunk-level vector for each word. The description layer utilizes modified LSTM units to process these chunk-level vectors in a recurrent manner and produces sequential encoding outputs. These output vectors are further concatenated with word vectors or the outputs of a chain LSTM encoder to obtain the final sentence representation. All the model parameters are learned in an end-to-end manner without a dependency on additional text chunking or syntax parsing. A natural language inference (NLI) task and a sentiment analysis (SA) task are adopted to evaluate the performance of our proposed model. The experimental results demonstrate the effectiveness of the proposed SNELSD model on exploring task-dependent chunking patterns during the semantic modeling of sentences. Furthermore, the proposed method achieves better performance than conventional chain LSTMs and tree-structured LSTMs on both tasks. |
Tasks | Chunking, Natural Language Inference, Sentence Embeddings, Sentiment Analysis |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05433v1 |
PDF | http://arxiv.org/pdf/1711.05433v1.pdf |
PWC | https://paperswithcode.com/paper/a-sequential-neural-encoder-with-latent |
Repo | |
Framework | |
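A toy sketch of the detection-layer idea: given word vectors and a predicted probability that each word starts a new chunk, derive a soft chunk-level vector per word by blending the word vector with a running chunk summary that softly resets at predicted boundaries. This is an illustrative simplification of SNELSD, not the paper's exact equations; the random embeddings and boundary probabilities stand in for learned quantities.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 4
words = rng.standard_normal((T, d))      # word embeddings
boundary = rng.random(T)                 # p(word t starts a new chunk)

chunk = np.zeros(d)
chunk_vectors = []
for t in range(T):
    # soft reset: with probability boundary[t] the chunk summary restarts
    # at the current word, otherwise it keeps averaging within the chunk
    chunk = boundary[t] * words[t] + (1 - boundary[t]) * (chunk + words[t]) / 2
    chunk_vectors.append(chunk.copy())

print(np.stack(chunk_vectors).shape)     # (T, d): one chunk-level vector per word
```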
Vocabulary-informed Extreme Value Learning
Title | Vocabulary-informed Extreme Value Learning |
Authors | Yanwei Fu, HanZe Dong, Yu-feng Ma, Zhengjun Zhang, Xiangyang Xue |
Abstract | Novel unseen classes can be formulated as the extreme values of known classes. This has inspired recent work on open-set recognition (Scheirer et al. 2013; Scheirer et al. 2014; EVM), which, however, has no way of naming the novel unseen classes. To solve this problem, we propose the Extreme Value Learning (EVL) formulation to learn the mapping from visual features to the semantic space. To model the margin and coverage distributions of each class, Vocabulary-informed Learning (ViL) is adopted, using a vast open vocabulary in the semantic space. Essentially, by combining EVL and ViL, we propose for the first time a novel semantic embedding paradigm, Vocabulary-informed Extreme Value Learning (ViEVL), which embeds visual features into the semantic space in a probabilistic way. The learned embedding can be directly used to solve supervised learning, zero-shot and open-set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed frameworks. |
Tasks | Open Set Learning |
Published | 2017-05-28 |
URL | http://arxiv.org/abs/1705.09887v2 |
PDF | http://arxiv.org/pdf/1705.09887v2.pdf |
PWC | https://paperswithcode.com/paper/vocabulary-informed-extreme-value-learning |
Repo | |
Framework | |
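A minimal sketch of the extreme-value idea behind this line of work: fit a Weibull model to the largest distances between a class centre and its own training samples, then flag a test point as potentially unseen when its distance falls in the extreme tail. The data, tail size, and single-class setup are illustrative assumptions, not the paper's ViEVL model.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)
train = rng.standard_normal((500, 16))            # one known class
centre = train.mean(axis=0)
dist = np.linalg.norm(train - centre, axis=1)

tail = np.sort(dist)[-50:]                        # extreme distances only
shape, loc, scale = weibull_min.fit(tail)

def p_unseen(x):
    # CDF of the tail model ~ probability the point lies beyond the class margin
    return weibull_min.cdf(np.linalg.norm(x - centre), shape, loc=loc, scale=scale)

print(p_unseen(centre + 0.1))    # near the centre: ~0
print(p_unseen(centre + 10.0))   # far away: ~1
```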
Criteria Sliders: Learning Continuous Database Criteria via Interactive Ranking
Title | Criteria Sliders: Learning Continuous Database Criteria via Interactive Ranking |
Authors | James Tompkin, Kwang In Kim, Hanspeter Pfister, Christian Theobalt |
Abstract | Large databases are often organized by hand-labeled metadata, or criteria, which are expensive to collect. We can use unsupervised learning to model database variation, but these models are often high dimensional, complex to parameterize, or require expert knowledge. We learn low-dimensional continuous criteria via interactive ranking, so that a novice user need only describe the relative ordering of examples. We frame this as semi-supervised label propagation in which we maximize the information gained from a limited number of examples. Further, we actively suggest data points for the user to rank in a more informative way than existing work. Our efficient approach allows users to interactively organize thousands of data points along 1D and 2D continuous sliders. We experiment with datasets of imagery and geometry to demonstrate that our tool is useful for quickly assessing and organizing the content of large databases. |
Tasks | |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03863v1 |
PDF | http://arxiv.org/pdf/1706.03863v1.pdf |
PWC | https://paperswithcode.com/paper/criteria-sliders-learning-continuous-database |
Repo | |
Framework | |
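A minimal sketch of semi-supervised label propagation for a continuous criterion, in the spirit of the abstract: a few user-ranked examples receive scalar slider values, and the criterion is propagated to the rest of an affinity graph by solving a Laplacian-regularised least-squares system. The dense RBF graph, the anchor values, and the penalty weight are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.standard_normal((n, 5))

# dense RBF affinity graph (a sparse k-NN graph would be used at scale)
d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
W = np.exp(-d2)
np.fill_diagonal(W, 0)
L = np.diag(W.sum(1)) - W                  # graph Laplacian

labeled = np.array([0, 1, 2, 3])           # user-ranked examples
y = np.zeros(n)
y[labeled] = [0.0, 0.33, 0.66, 1.0]        # slider positions from the ranking
M = np.zeros(n)
M[labeled] = 1.0                           # mask of labeled points

# minimize f'Lf + lam * sum_labeled (f_i - y_i)^2
lam = 10.0
f = np.linalg.solve(L + lam * np.diag(M), lam * M * y)
print(f[:8])   # propagated criterion values for the first few points
```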
Too Far to See? Not Really! — Pedestrian Detection with Scale-aware Localization Policy
Title | Too Far to See? Not Really! — Pedestrian Detection with Scale-aware Localization Policy |
Authors | Xiaowei Zhang, Li Cheng, Bo Li, Hai-Miao Hu |
Abstract | A major bottleneck of pedestrian detection lies in the sharp performance deterioration in the presence of small-size pedestrians that are relatively far from the camera. Motivated by the observation that pedestrians of disparate spatial scales exhibit distinct visual appearances, we propose in this paper an active pedestrian detector that explicitly operates over multiple-layer neuronal representations of the input still image. More specifically, convolutional neural nets such as ResNet and Faster R-CNN are exploited to provide a rich and discriminative hierarchy of feature representations as well as initial pedestrian proposals. Here each pedestrian observation of distinct size could be best characterized in terms of the ResNet feature representation at a certain layer of the hierarchy; meanwhile, initial pedestrian proposals are attained by Faster R-CNN techniques, i.e. a region proposal network and a follow-up region-of-interest pooling layer employed right after the specific ResNet convolutional layer of interest, to produce joint predictions on the bounding-box proposals’ locations and categories (i.e. pedestrian or not). This is engaged as input to our active detector where, for each initial pedestrian proposal, a sequence of coordinate transformation actions is carried out to determine its proper x-y 2D location and layer of feature representation, or it is eventually terminated as being background. Empirically our approach is demonstrated to produce overall lower detection errors on widely-used benchmarks, and it works particularly well with far-scale pedestrians. For example, compared with the 60.51% log-average miss rate of the state-of-the-art MS-CNN for far-scale pedestrians (those below 80 pixels in bounding-box height) on the Caltech benchmark, the miss rate of our approach is 41.85%, a notable reduction of 18.68%. |
Tasks | Pedestrian Detection |
Published | 2017-09-01 |
URL | http://arxiv.org/abs/1709.00235v1 |
PDF | http://arxiv.org/pdf/1709.00235v1.pdf |
PWC | https://paperswithcode.com/paper/too-far-to-see-not-really-pedestrian |
Repo | |
Framework | |
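A small helper sketch for the metric quoted in the abstract, the Caltech-style log-average miss rate: the geometric mean of the miss rate sampled at nine FPPI reference points spaced logarithmically in [1e-2, 1e0]. The toy detector curve below is made up, and the sampling rule assumes a monotone curve.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """Geometric mean of miss rate at 9 log-spaced FPPI reference points."""
    ref = np.logspace(-2.0, 0.0, num=9)
    # for each reference point take the miss rate at the largest achieved
    # FPPI not exceeding it (equals the min over that range when monotone)
    samples = [miss_rate[fppi <= r].min() if np.any(fppi <= r) else 1.0
               for r in ref]
    return np.exp(np.mean(np.log(np.maximum(samples, 1e-10))))

fppi = np.logspace(-3, 0, 50)                        # toy detector curve
miss = 0.9 * np.exp(-2.0 * np.log10(fppi * 1e3) / 3.0)
print(f"log-average miss rate: {log_average_miss_rate(fppi, miss):.3f}")
```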
The Absent-Minded Driver Problem Redux
Title | The Absent-Minded Driver Problem Redux |
Authors | Subhash Kak |
Abstract | This paper reconsiders the problem of the absent-minded driver, who must choose between alternatives with different payoffs under imperfect recall and varying degrees of knowledge of the system. The classical absent-minded driver problem represents the case with limited information, and it has bearing on the general areas of communication and learning, social choice, mechanism design, auctions, and theories of knowledge, belief, and rational agency. Within the framework of extensive games, this problem has applications to many artificial intelligence scenarios. The performance of the agent improves as the available information increases. It is shown that a non-uniform assignment strategy for successive choices does better than a fixed-probability strategy. We consider both classical and quantum approaches to the problem. We argue that the superior performance of quantum decisions with access to entanglement cannot be fairly compared to a classical algorithm. If the cognitive systems of agents are taken to have access to quantum resources, or to have a quantum mechanical basis, then that can be leveraged into superior performance. |
Tasks | |
Published | 2017-02-19 |
URL | http://arxiv.org/abs/1702.05778v1 |
PDF | http://arxiv.org/pdf/1702.05778v1.pdf |
PWC | https://paperswithcode.com/paper/the-absent-minded-driver-problem-redux |
Repo | |
Framework | |
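A worked sketch of the classical absent-minded driver game referenced in the abstract, with the standard Piccione-Rubinstein payoffs (0 for the first exit, 4 for the second, 1 for driving past both). Under imperfect recall the driver must use the same continue-probability p at both junctions; the optimal fixed p is 2/3, with expected payoff 4/3. The payoff values are the textbook ones, not necessarily those used in this paper.

```python
import numpy as np

def expected_payoff(p):
    # exit at 1st junction: prob (1-p),    payoff 0
    # exit at 2nd junction: prob p*(1-p),  payoff 4
    # continue past both:   prob p*p,      payoff 1
    return (1 - p) * 0 + p * (1 - p) * 4 + p * p * 1

ps = np.linspace(0, 1, 10001)
best = ps[np.argmax(expected_payoff(ps))]
print(best, expected_payoff(best))   # ~0.6667, ~1.3333

# with perfect recall (more information, as the abstract discusses), the
# driver could play "continue" then "exit" and secure the payoff of 4
```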
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
Title | A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community |
Authors | John E. Ball, Derek T. Anderson, Chee Seng Chan |
Abstract | In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL models. |
Tasks | Speech Recognition, Transfer Learning |
Published | 2017-09-01 |
URL | http://arxiv.org/abs/1709.00308v2 |
PDF | http://arxiv.org/pdf/1709.00308v2.pdf |
PWC | https://paperswithcode.com/paper/a-comprehensive-survey-of-deep-learning-in |
Repo | |
Framework | |