January 26, 2020


Paper Group ANR 1599

Creating Auxiliary Representations from Charge Definitions for Criminal Charge Prediction. Knowledge-rich Image Gist Understanding Beyond Literal Meaning. Web Stereo Video Supervision for Depth Prediction from Dynamic Scenes. Maximum entropy methods for texture synthesis: theory and practice. Semi-supervised Complex-valued GAN for Polarimetric SAR …

Creating Auxiliary Representations from Charge Definitions for Criminal Charge Prediction


Title	Creating Auxiliary Representations from Charge Definitions for Criminal Charge Prediction
Authors	Liangyi Kang, Jie Liu, Lingqiao Liu, Qinfeng Shi, Dan Ye
Abstract	Charge prediction, determining charges for criminal cases by analyzing the textual fact descriptions, is a promising technology in legal assistant systems. In practice, the fact descriptions could exhibit a significant intra-class variation due to factors like non-normative use of language, which makes the prediction task very challenging, especially for charge classes with too few samples to cover the expression variation. In this work, we explore using the charge definitions from criminal law to alleviate this issue. The key idea is that the expressions in a fact description should have corresponding formal terms in charge definitions, and those terms are shared across classes and could account for the diversity in the fact descriptions. Thus, we propose to create auxiliary fact representations from charge definitions to augment the fact description representations. The auxiliary representations are generated through the interaction of the fact description with the relevant charge definitions, and with the terms in those definitions, by an integrated sentence- and word-level attention scheme. Experimental results on two datasets show that our model achieves significant improvement over baselines, especially for classes with few samples.
Tasks
Published	2019-11-12
URL	https://arxiv.org/abs/1911.05202v1
PDF	https://arxiv.org/pdf/1911.05202v1.pdf
PWC	https://paperswithcode.com/paper/creating-auxiliary-representations-from
Repo
Framework

Knowledge-rich Image Gist Understanding Beyond Literal Meaning


Title	Knowledge-rich Image Gist Understanding Beyond Literal Meaning
Authors	Lydia Weiland, Ioana Hulpus, Simone Paolo Ponzetto, Wolfgang Effelsberg, Laura Dietz
Abstract	We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8,000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of signals from heterogeneous sources, namely image and text. The best result, a Mean Average Precision (MAP) of 0.69, indicates that by combining both dimensions we are able to better understand the meaning of our image-caption pairs than when using language or vision information alone. We test the robustness of our gist detection approach when receiving automatically generated input, i.e., using automatically generated image tags or generated captions, and prove the feasibility of an end-to-end automated process.
Tasks
Published	2019-04-18
URL	http://arxiv.org/abs/1904.08709v1
PDF	http://arxiv.org/pdf/1904.08709v1.pdf
PWC	https://paperswithcode.com/paper/knowledge-rich-image-gist-understanding
Repo
Framework
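
The abstract above reports its headline result as Mean Average Precision over ranked concept lists. As a reminder of how that metric is usually computed, here is a minimal Python sketch. The function names and the toy concept lists are illustrative assumptions, not taken from the paper or its evaluation code.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precisions."""
    runs = list(runs)
    if not runs:
        raise ValueError("at least one query is required")
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy queries: each pairs a ranked concept list with its gold gists.
toy_runs = [
    (["peace", "dove", "bird", "war"], {"peace", "war"}),  # AP = 0.75
    (["forest", "climate"], {"climate"}),                  # AP = 0.5
]
toy_map = mean_average_precision(toy_runs)  # 0.625
```

Relevant items that never appear in the ranking still count in the denominator, so a system is penalized for gists it fails to retrieve.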

Web Stereo Video Supervision for Depth Prediction from Dynamic Scenes


Title	Web Stereo Video Supervision for Depth Prediction from Dynamic Scenes
Authors	Chaoyang Wang, Simon Lucey, Federico Perazzi, Oliver Wang
Abstract	We present a fully data-driven method to compute depth from diverse monocular video sequences that contain large amounts of non-rigid objects, e.g., people. In order to learn reconstruction cues for non-rigid scenes, we introduce a new dataset consisting of stereo videos scraped in-the-wild. This dataset has a wide variety of scene types, and features large amounts of non-rigid objects, especially people. From this, we compute disparity maps to be used as supervision to train our approach. We propose a loss function that allows us to generate a depth prediction even with unknown camera intrinsics and stereo baselines in the dataset. We validate the use of large amounts of Internet video by evaluating our method on existing video datasets with depth supervision, including SINTEL and KITTI, and show that our approach generalizes better to natural scenes.
Tasks	Depth Estimation
Published	2019-04-25
URL	http://arxiv.org/abs/1904.11112v1
PDF	http://arxiv.org/pdf/1904.11112v1.pdf
PWC	https://paperswithcode.com/paper/web-stereo-video-supervision-for-depth
Repo
Framework

Maximum entropy methods for texture synthesis: theory and practice


Title	Maximum entropy methods for texture synthesis: theory and practice
Authors	Valentin De Bortoli, Agnes Desolneux, Alain Durmus, Bruno Galerne, Arthur Leclaire
Abstract	Recent years have seen the rise of convolutional neural network techniques in exemplar-based image synthesis. These methods often rely on the minimization of some variational formulation on the image space for which the minimizers are assumed to be the solutions of the synthesis problem. In this paper we investigate, both theoretically and experimentally, another framework to deal with this problem using an alternate sampling/minimization scheme. First, we use results from information geometry to assess that our method yields a probability measure which has maximum entropy under some constraints in expectation. Then, we turn to the analysis of our method and we show, using recent results from the Markov chain literature, that its error can be explicitly bounded with constants which depend polynomially in the dimension even in the non-convex setting. This includes the case where the constraints are defined via a differentiable neural network. Finally, we present an extensive experimental study of the model, including a comparison with state-of-the-art methods and an extension to style transfer.
Tasks	Image Generation, Style Transfer, Texture Synthesis
Published	2019-12-03
URL	https://arxiv.org/abs/1912.01691v1
PDF	https://arxiv.org/pdf/1912.01691v1.pdf
PWC	https://paperswithcode.com/paper/maximum-entropy-methods-for-texture-synthesis
Repo
Framework

Semi-supervised Complex-valued GAN for Polarimetric SAR Image Classification


Title	Semi-supervised Complex-valued GAN for Polarimetric SAR Image Classification
Authors	Qigong Sun, Xiufang Li, Lingling Li, Xu Liu, Fang Liu, Licheng Jiao
Abstract	Polarimetric synthetic aperture radar (PolSAR) images are widely used in applications such as disaster detection and military reconnaissance. However, their interpretation faces challenges, e.g., a deficiency of labeled data and inadequate utilization of data information. In this paper, a complex-valued generative adversarial network (GAN) is proposed for the first time to address these issues. The complex-valued form of the model complies with the physical mechanism of PolSAR data and helps utilize and retain the amplitude and phase information of PolSAR data. The GAN architecture and semi-supervised learning are combined to handle the deficiency of labeled data: the GAN expands the training data, and semi-supervised learning is used to train the network with generated, labeled and unlabeled data. Experimental results on two benchmark data sets show that our model outperforms existing state-of-the-art models, especially when fewer labeled data are available.
Tasks	Image Classification
Published	2019-06-09
URL	https://arxiv.org/abs/1906.03605v1
PDF	https://arxiv.org/pdf/1906.03605v1.pdf
PWC	https://paperswithcode.com/paper/semi-supervised-complex-valued-gan-for
Repo
Framework

A Review of Deep Learning with Special Emphasis on Architectures, Applications and Recent Trends


Title	A Review of Deep Learning with Special Emphasis on Architectures, Applications and Recent Trends
Authors	Saptarshi Sengupta, Sanchita Basak, Pallabi Saikia, Sayak Paul, Vasilios Tsalavoutis, Frederick Atiah, Vadlamani Ravi, Alan Peters
Abstract	Deep learning has solved a problem that as little as five years ago was thought by many to be intractable - the automatic recognition of patterns in data; and it can do so with accuracy that often surpasses human beings. It has solved problems beyond the realm of traditional, hand-crafted machine learning algorithms and captured the imagination of practitioners trying to make sense out of the flood of data that now inundates our society. As public awareness of the efficacy of DL increases so does the desire to make use of it. But even for highly trained professionals it can be daunting to approach the rapidly increasing body of knowledge produced by experts in the field. Where does one start? How does one determine if a particular model is applicable to their problem? How does one train and deploy such a network? A primer on the subject can be a good place to start. With that in mind, we present an overview of some of the key multilayer ANNs that comprise DL. We also discuss some new automatic architecture optimization protocols that use multi-agent approaches. Further, since guaranteeing system uptime is becoming critical to many computer applications, we include a section on using neural networks for fault detection and subsequent mitigation. This is followed by an exploratory survey of several application areas where DL has emerged as a game-changing technology: anomalous behavior detection in financial applications or in financial time-series forecasting, predictive and prescriptive analytics, medical image processing and analysis and power systems research. The thrust of this review is to outline emerging areas of application-oriented research within the DL community as well as to provide a reference to researchers seeking to use it in their work for what it does best: statistical pattern recognition with unparalleled learning capacity with the ability to scale with information.
Tasks	Fault Detection, Time Series, Time Series Forecasting
Published	2019-05-30
URL	https://arxiv.org/abs/1905.13294v3
PDF	https://arxiv.org/pdf/1905.13294v3.pdf
PWC	https://paperswithcode.com/paper/a-review-of-deep-learning-with-special
Repo
Framework

Investigations of the Influences of a CNN’s Receptive Field on Segmentation of Subnuclei of Bilateral Amygdalae


Title	Investigations of the Influences of a CNN’s Receptive Field on Segmentation of Subnuclei of Bilateral Amygdalae
Authors	Han Bao
Abstract	Segmentation of objects with various sizes is relatively less explored in medical imaging, and has been very challenging in computer vision tasks in general. We hypothesize that the receptive field of a deep model corresponds closely to the size of the object to be segmented, which could critically influence the segmentation accuracy of objects with varied sizes. In this study, we employed “AmygNet”, a dual-branch fully convolutional neural network (FCNN) with two different sizes of receptive fields, to investigate the effects of receptive field on segmenting four major subnuclei of bilateral amygdalae. The experiment was conducted on 3-dimensional MRI human brain images of 14 subjects. Since the scales of the subnuclear groups differ, investigating the accuracy for each subnuclear group while using receptive fields of various sizes may reveal which receptive field suits objects of which scale. Under the given conditions, AmygNet with multiple receptive fields shows great potential in segmenting objects of different sizes.
Tasks
Published	2019-11-07
URL	https://arxiv.org/abs/1911.02761v1
PDF	https://arxiv.org/pdf/1911.02761v1.pdf
PWC	https://paperswithcode.com/paper/investigations-of-the-influences-of-a-cnns
Repo
Framework

Entropic Regularization of Markov Decision Processes


Title	Entropic Regularization of Markov Decision Processes
Authors	Boris Belousov, Jan Peters
Abstract	An optimal feedback controller for a given Markov decision process (MDP) can in principle be synthesized by value or policy iteration. However, if the system dynamics and the reward function are unknown, a learning agent must discover an optimal controller via direct interaction with the environment. Such interactive data gathering commonly leads to divergence towards dangerous or uninformative regions of the state space unless additional regularization measures are taken. Prior works proposed bounding the information loss measured by the Kullback-Leibler (KL) divergence at every policy improvement step to eliminate instability in the learning dynamics. In this paper, we consider a broader family of $f$-divergences, and more concretely $\alpha$-divergences, which inherit the beneficial property of providing the policy improvement step in closed form while at the same time yielding a corresponding dual objective for policy evaluation. Such an entropic proximal policy optimization view gives a unified perspective on compatible actor-critic architectures. In particular, common least-squares value function estimation coupled with advantage-weighted maximum likelihood policy improvement is shown to correspond to the Pearson $\chi^2$-divergence penalty. Other actor-critic pairs arise for various choices of the penalty-generating function $f$. On a concrete instantiation of our framework with the $\alpha$-divergence, we carry out asymptotic analysis of the solutions for different values of $\alpha$ and demonstrate the effects of the divergence function choice on standard reinforcement learning problems.
Tasks
Published	2019-07-06
URL	https://arxiv.org/abs/1907.04214v2
PDF	https://arxiv.org/pdf/1907.04214v2.pdf
PWC	https://paperswithcode.com/paper/entropic-regularization-of-markov-decision
Repo
Framework
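
The abstract above mentions that a divergence penalty can give the policy improvement step in closed form. For the familiar KL-penalized case on a single state with a discrete action set, maximizing expected advantage minus `eta` times the KL divergence from the old policy yields an exponential reweighting of the old policy. The sketch below shows only that well-known KL special case. The function name, the single-state setting and the parameter `eta` are illustrative assumptions; the paper's $\alpha$-divergence updates are not reproduced here.

```python
import math
from typing import List, Sequence


def kl_regularized_update(
    old_policy: Sequence[float], advantages: Sequence[float], eta: float
) -> List[float]:
    """Return pi_new(a) proportional to pi_old(a) * exp(A(a) / eta).

    This is the maximizer of E_pi[A] - eta * KL(pi || pi_old) over
    distributions on a finite action set (assumed setting for this sketch).
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    if len(old_policy) != len(advantages):
        raise ValueError("policy and advantages must have the same length")
    # Subtract the maximum advantage for numerical stability; the constant
    # factor cancels in the normalization below.
    max_adv = max(advantages)
    weights = [
        p * math.exp((a - max_adv) / eta) for p, a in zip(old_policy, advantages)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# A small eta moves probability mass aggressively toward high-advantage
# actions; a large eta keeps the new policy close to the old one.
greedy_like = kl_regularized_update([0.5, 0.5], [1.0, 0.0], eta=0.1)
cautious = kl_regularized_update([0.5, 0.5], [1.0, 0.0], eta=100.0)
```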

Semi-Supervised Regression using Cluster Ensemble and Low-Rank Co-Association Matrix Decomposition under Uncertainties


Title	Semi-Supervised Regression using Cluster Ensemble and Low-Rank Co-Association Matrix Decomposition under Uncertainties
Authors	Vladimir Berikov, Alexander Litvinenko
Abstract	In this paper, we solve a semi-supervised regression problem. Due to the lack of knowledge about the data structure and the presence of random noise, the considered data model is uncertain. We propose a method which combines graph Laplacian regularization and cluster ensemble methodologies. The co-association matrix of the ensemble is calculated on both labeled and unlabeled data; this matrix is used as a similarity matrix in the regularization framework to derive the predicted outputs. We use the low-rank decomposition of the co-association matrix to significantly speed up calculations and reduce memory usage. Numerical experiments using the Monte Carlo approach demonstrate robustness, efficiency, and scalability of the proposed method.
Tasks
Published	2019-01-13
URL	http://arxiv.org/abs/1901.03919v1
PDF	http://arxiv.org/pdf/1901.03919v1.pdf
PWC	https://paperswithcode.com/paper/semi-supervised-regression-using-cluster
Repo
Framework
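
The abstract above uses the co-association matrix of a cluster ensemble as a similarity matrix. In its plain unweighted form, entry (i, j) is the fraction of clusterings that place points i and j in the same cluster. The sketch below computes that form in pure Python. The function name and the toy partitions are illustrative assumptions, and the paper's own construction (including any weighting and the low-rank decomposition) may differ.

```python
from typing import List, Sequence


def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of clusterings in which each pair of points shares a cluster."""
    if not partitions:
        raise ValueError("at least one clustering is required")
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all clusterings must label the same points")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(partitions)
    return [[value / m for value in row] for row in counts]


# Two hypothetical clusterings of three points (labeled or unlabeled alike).
toy_matrix = co_association([[0, 0, 1], [0, 1, 1]])
# toy_matrix[0][1] is 0.5: points 0 and 1 share a cluster in one of two runs.
```

The dense n-by-n matrix built here costs quadratic memory, which is the cost the abstract's low-rank decomposition is meant to avoid.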

What Will Your Child Look Like? DNA-Net: Age and Gender Aware Kin Face Synthesizer


Title	What Will Your Child Look Like? DNA-Net: Age and Gender Aware Kin Face Synthesizer
Authors	Pengyu Gao, Siyu Xia, Joseph Robinson, Junkang Zhang, Chao Xia, Ming Shao, Yun Fu
Abstract	Visual kinship recognition aims to identify blood relatives from facial images. Its practical applications, such as law enforcement, video surveillance, and automatic family album management, have recently motivated many researchers to work on the topic. In this paper, we focus on a new view of visual kinship technology: kin-based face generation. Specifically, we propose a two-stage kin-face generation model to predict the appearance of a child given a pair of parents. The first stage includes a deep generative adversarial autoencoder conditioned on ages and genders to map between facial appearance and high-level features. The second stage is our proposed DNA-Net, which serves as a transformation between the deep and genetic features based on a random selection process to fuse genes of a parent pair to form the genes of a child. We demonstrate the effectiveness of the proposed method quantitatively and qualitatively: quantitatively, pre-trained models and human subjects perform kinship verification on the generated images of children; qualitatively, we show photo-realistic face images of children that closely resemble the given pair of parents. In the end, experiments validate that the proposed model synthesizes convincing kin-faces using both subjective and objective standards.
Tasks	Face Generation
Published	2019-11-16
URL	https://arxiv.org/abs/1911.07014v1
PDF	https://arxiv.org/pdf/1911.07014v1.pdf
PWC	https://paperswithcode.com/paper/what-will-your-child-look-like-dna-net-age
Repo
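Framework

Many could be better than all: A novel instance-oriented algorithm for Multi-modal Multi-label problem


Title	Many could be better than all: A novel instance-oriented algorithm for Multi-modal Multi-label problem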
Authors	Yi Zhang, Cheng Zeng, Hao Cheng, Chongjun Wang, Lei Zhang
Abstract	With the emergence of diverse data collection techniques, objects in real applications can be represented as multi-modal features. Moreover, objects may have multiple semantic meanings. The Multi-modal and Multi-label (MMML) problem has thus become ubiquitous. The quality of data collected from different channels is inconsistent, and some of the data may not benefit prediction. In real life, not all the modalities are needed for prediction. As a result, we propose a novel instance-oriented Multi-modal Classifier Chains (MCC) algorithm for the MMML problem, which can make convincing predictions with partial modalities. MCC extracts different modalities for different instances in the testing phase. Extensive experiments are performed on one real-world herbs dataset and two public datasets to validate our proposed algorithm, and the results reveal that it may be better to extract many instead of all of the modalities at hand.
Tasks
Published	2019-07-27
URL	https://arxiv.org/abs/1907.11857v1
PDF	https://arxiv.org/pdf/1907.11857v1.pdf
PWC	https://paperswithcode.com/paper/many-could-be-better-than-all-a-novel
Repo
Framework

k-Nearest Neighbor Optimization via Randomized Hyperstructure Convex Hull


Title	k-Nearest Neighbor Optimization via Randomized Hyperstructure Convex Hull
Authors	Jasper Kyle Catapang
Abstract	In the k-nearest neighbor algorithm (k-NN), the determination of classes for test instances is usually performed via a majority vote system, which may ignore the similarities among data. In this research, the researcher proposes an approach to fine-tune the selection of neighbors to be passed to the majority vote system through the construction of a random n-dimensional hyperstructure around the test instance by introducing a new threshold parameter. On the Haberman’s Cancer Survival dataset, the proposed k-NN algorithm reaches an accuracy of 85.71%, compared with 80.95% for the conventional k-NN algorithm; on the Seeds dataset, it reaches 94.44%, compared with 88.89%. The proposed k-NN algorithm is also on par with the conventional support vector machine in accuracy on the Banknote Authentication and Iris datasets, and surpasses it on the Seeds dataset.
Tasks
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04559v1
PDF	https://arxiv.org/pdf/1906.04559v1.pdf
PWC	https://paperswithcode.com/paper/k-nearest-neighbor-optimization-via
Repo
Framework
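
For reference, the conventional majority-vote k-NN that the abstract above takes as its baseline can be sketched in a few lines of Python. This is the textbook baseline only: the paper's randomized hyperstructure and threshold parameter are not reproduced, and the function name and toy data are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(
    train_points: Sequence[Sequence[float]],
    train_labels: Sequence[Hashable],
    query: Sequence[float],
    k: int,
) -> Hashable:
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training label with its Euclidean distance to the query.
    scored = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    scored.sort(key=lambda pair: pair[0])
    # Every one of the k neighbors gets an equal vote, regardless of distance.
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data with two well-separated classes.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
labels = ["a", "a", "b", "b", "b"]
prediction = knn_predict(points, labels, (0.2, 0.1), k=3)  # "a"
```

Because every neighbor votes with equal weight, a distant third neighbor counts as much as the closest one, which is the limitation the abstract points to.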

Improving Fictitious Play Reinforcement Learning with Expanding Models


Title	Improving Fictitious Play Reinforcement Learning with Expanding Models
Authors	Rong-Jun Qin, Jing-Cheng Pang, Yang Yu
Abstract	Fictitious play with reinforcement learning is a general and effective framework for zero-sum games. However, using the current deep neural network models, the implementation of fictitious play faces crucial challenges. Neural network model training employs gradient descent approaches to update all connection weights, and is thus prone to forgetting old opponents after training to beat new ones. Existing approaches often maintain a pool of historical policy models to avoid this forgetting. However, learning to beat a pool in stochastic games, i.e., a wide distribution over policy models, is either sample-consuming or insufficient to exploit all models with a limited amount of samples. In this paper, we propose a learning process with neural fictitious play to alleviate the above issues. We train a single model as our policy model, which consists of sub-models and a selector. Every time it faces a new opponent, the model is expanded by adding a new sub-model, and only the new sub-model is updated instead of the whole model. At the same time, the selector is also updated to mix up the new sub-model with the previous ones at the state-level, so that the model is maintained as a behavior strategy instead of a wide distribution over policy models. Experiments on Kuhn poker, a grid-world Treasure Hunting game, and Mini-RTS environments show that the proposed approach alleviates the forgetting problem, and consequently improves the learning efficiency and the robustness of neural fictitious play.
Tasks
Published	2019-11-27
URL	https://arxiv.org/abs/1911.11928v2
PDF	https://arxiv.org/pdf/1911.11928v2.pdf
PWC	https://paperswithcode.com/paper/improving-fictitious-play-reinforcement
Repo
Framework

Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond


Title	Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond
Authors	Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab S. Mirrokni
Abstract	Computing approximate nearest neighbors in high dimensional spaces is a central problem in large-scale data mining with a wide range of applications in machine learning and data science. A popular and effective technique in computing nearest neighbors approximately is the locality-sensitive hashing (LSH) scheme. In this paper, we aim to develop LSH schemes for distance functions that measure the distance between two probability distributions, particularly for f-divergences as well as a generalization to capture mutual information loss. First, we provide a general framework to design LSH schemes for f-divergence distance functions and develop LSH schemes for the generalized Jensen-Shannon divergence and triangular discrimination in this framework. We show a two-sided approximation result for approximation of the generalized Jensen-Shannon divergence by the Hellinger distance, which may be of independent interest. Next, we show a general method of reducing the problem of designing an LSH scheme for a Krein kernel (which can be expressed as the difference of two positive definite kernels) to the problem of maximum inner product search. We exemplify this method by applying it to the mutual information loss, due to its several important applications such as model compression.
Tasks	Model Compression
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12414v1
PDF	https://arxiv.org/pdf/1910.12414v1.pdf
PWC	https://paperswithcode.com/paper/locality-sensitive-hashing-for-f-divergences
Repo
Framework
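
The distance functions named in the abstract above are all f-divergences between discrete probability distributions. The sketch below computes three of them for reference: the squared Hellinger distance, the standard (not generalized) Jensen-Shannon divergence, and the triangular discrimination. The function names are illustrative, the normalization conventions (a factor of 1/2 in the Hellinger distance, natural logarithms) are assumptions that vary between sources, and no LSH scheme from the paper is implemented.

```python
import math
from typing import Sequence


def squared_hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """H^2(P, Q) = 1/2 * sum_i (sqrt(p_i) - sqrt(q_i))^2, ranging over [0, 1]."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(P || Q) in nats; terms with p_i == 0 contribute zero."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """JS(P, Q) = 1/2 KL(P || M) + 1/2 KL(Q || M) with M = (P + Q) / 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def triangular_discrimination(p: Sequence[float], q: Sequence[float]) -> float:
    """Delta(P, Q) = sum_i (p_i - q_i)^2 / (p_i + q_i), skipping empty bins."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


# Two distributions with disjoint support reach each divergence's maximum
# under the conventions above: 1, ln 2 and 2 respectively.
p, q = [1.0, 0.0], [0.0, 1.0]
extremes = (
    squared_hellinger(p, q),
    jensen_shannon(p, q),
    triangular_discrimination(p, q),
)
```

In `jensen_shannon`, the mixture M is positive wherever P or Q is, so the KL terms never divide by zero.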

VIFIDEL: Evaluating the Visual Fidelity of Image Descriptions


Title	VIFIDEL: Evaluating the Visual Fidelity of Image Descriptions
Authors	Pranava Madhyastha, Josiah Wang, Lucia Specia
Abstract	We address the task of evaluating image description generation systems. We propose a novel image-aware metric for this task: VIFIDEL. It estimates the faithfulness of a generated caption with respect to the content of the actual image, based on the semantic similarity between labels of objects depicted in images and words in the description. The metric is also able to take into account the relative importance of objects mentioned in human reference descriptions during evaluation. Even if these human reference descriptions are not available, VIFIDEL can still reliably evaluate system descriptions. The metric achieves high correlation with human judgments on two well-known datasets and is competitive with metrics that depend on human references.
Tasks	Semantic Similarity, Semantic Textual Similarity
Published	2019-07-22
URL	https://arxiv.org/abs/1907.09340v1
PDF	https://arxiv.org/pdf/1907.09340v1.pdf
PWC	https://paperswithcode.com/paper/vifidel-evaluating-the-visual-fidelity-of
Repo
Framework