Paper Group AWR 155
Pros and Cons of GAN Evaluation Measures. Distributed linear regression by averaging. Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain. A Simple Recurrent Unit with Reduced Tensor Product Representations. Regularization Learning Networks: Deep Learning for Tabular Datasets. Canonical Tensor Decomposition for Knowledge Base Completion. Gradient Energy Matching for Distributed Asynchronous Gradient Descent. SiCloPe: Silhouette-Based Clothed People. Automatic brain tumor grading from MRI data using convolutional neural networks and quality assessment. GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story Generation. Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings. Correlation Tracking via Joint Discrimination and Reliability Learning. Stochastic Gradient Descent on Highly-Parallel Architectures. Binacox: automatic cut-point detection in high-dimensional Cox model with applications in genetics. Market Self-Learning of Signals, Impact and Optimal Trading: Invisible Hand Inference with Free Energy.
Pros and Cons of GAN Evaluation Measures
Title | Pros and Cons of GAN Evaluation Measures |
Authors | Ali Borji |
Abstract | Generative models, in particular generative adversarial networks (GANs), have received significant attention recently. A number of GAN variants have been proposed and have been utilized in many applications. Despite large strides in terms of theoretical progress, evaluating and comparing GANs remains a daunting task. While several measures have been introduced, as of yet, there is no consensus as to which measure best captures the strengths and limitations of models and should be used for fair model comparison. As in other areas of computer vision and machine learning, it is critical to settle on one or a few good measures to steer the progress in this field. In this paper, I review and critically discuss more than 24 quantitative and 5 qualitative measures for evaluating generative models with a particular emphasis on GAN-derived models. I also provide a set of 7 desiderata followed by an evaluation of whether a given measure or a family of measures is compatible with them. |
Tasks | |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03446v5 |
http://arxiv.org/pdf/1802.03446v5.pdf | |
PWC | https://paperswithcode.com/paper/pros-and-cons-of-gan-evaluation-measures |
Repo | https://github.com/suhitd1729/Action-cameras-evaluating-image-quality-and-suitability-for-machine-learning |
Framework | pytorch |
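Among the quantitative measures this survey covers, the Fréchet Inception Distance (FID) is one of the most widely used. Below is a minimal numpy/scipy sketch of the FID formula, assuming Inception activations have already been extracted for the real and generated image sets; it is an illustrative implementation, not code from the paper.

```python
# Sketch of FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2}),
# computed on pre-extracted Inception feature matrices (n_samples x dim).
import numpy as np
from scipy import linalg

def fid(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    if np.iscomplexobj(covmean):   # numerical noise can produce tiny
        covmean = covmean.real     # imaginary parts; drop them
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```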
Distributed linear regression by averaging
Title | Distributed linear regression by averaging |
Authors | Edgar Dobriban, Yue Sheng |
Abstract | Distributed statistical learning problems arise commonly when dealing with large datasets. In this setup, datasets are partitioned over machines, which compute locally, and communicate short messages. Communication is often the bottleneck. In this paper, we study one-step and iterative weighted parameter averaging in statistical linear models under data parallelism. We do linear regression on each machine, send the results to a central server, and take a weighted average of the parameters. Optionally, we iterate, sending back the weighted average and doing local ridge regressions centered at it. How does this work compared to doing linear regression on the full data? Here we study the performance loss in estimation, test error, and confidence interval length in high dimensions, where the number of parameters is comparable to the training data size. We find the performance loss in one-step weighted averaging, and also give results for iterative averaging. We also find that different problems are affected differently by the distributed framework. Estimation error and confidence interval length increase a lot, while prediction error increases much less. We rely on recent results from random matrix theory, where we develop a new calculus of deterministic equivalents as a tool of broader interest. |
Tasks | |
Published | 2018-09-30 |
URL | https://arxiv.org/abs/1810.00412v2 |
https://arxiv.org/pdf/1810.00412v2.pdf | |
PWC | https://paperswithcode.com/paper/distributed-linear-regression-by-averaging |
Repo | https://github.com/dobriban/Dist |
Framework | none |
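The one-step averaging scheme in the abstract admits a very short sketch: fit ordinary least squares on each machine's shard and combine the local estimates at a central server. The version below uses uniform weights for simplicity, whereas the paper analyzes optimally weighted averages and an iterative ridge-centered variant.

```python
# Minimal sketch of one-step parameter averaging for distributed OLS.
import numpy as np

def local_ols(X, y):
    # each "machine" solves its own least-squares problem
    return np.linalg.lstsq(X, y, rcond=None)[0]

def one_step_average(parts, weights=None):
    """parts: list of (X_k, y_k) shards held on separate machines."""
    betas = np.stack([local_ols(X, y) for X, y in parts])
    if weights is None:                      # uniform weights for simplicity
        weights = np.full(len(parts), 1.0 / len(parts))
    return weights @ betas                   # weighted average of local fits
```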
Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain
Title | Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain |
Authors | Yu-Xiang Wang |
Abstract | We revisit the problem of linear regression under a differential privacy constraint. By consolidating existing pieces in the literature, we clarify the correct dependence of the feature, label and coefficient domains in the optimization error and estimation error, hence revealing the delicate price of differential privacy in statistical estimation and statistical learning. Moreover, we propose simple modifications of two existing DP algorithms: (a) posterior sampling, (b) sufficient statistics perturbation, and show that they can be upgraded into adaptive algorithms that are able to exploit data-dependent quantities and behave nearly optimally for every instance. Extensive experiments are conducted on both simulated data and real data, which show that both AdaOPS and AdaSSP outperform the existing techniques on nearly all 36 data sets that we test on. |
Tasks | |
Published | 2018-03-07 |
URL | http://arxiv.org/abs/1803.02596v2 |
http://arxiv.org/pdf/1803.02596v2.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-differentially-private-linear |
Repo | https://github.com/yuxiangw/optimal_dp_linear_regression |
Framework | none |
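The sufficient statistics perturbation (SSP) idea underlying AdaSSP can be sketched in a few lines: perturb X^T X and X^T y with calibrated Gaussian noise, then solve ridge-regularized normal equations. The noise calibration and budget split below are schematic assumptions for illustration; AdaSSP additionally chooses the ridge parameter adaptively from privately estimated eigenvalues.

```python
# Hedged sketch of SSP for (eps, delta)-DP linear regression.
import numpy as np

def ssp_linear_regression(X, y, eps, delta, x_bound, y_bound, lam=1e-3):
    d = X.shape[1]
    # Gaussian-mechanism noise scale (schematic; budget splitting omitted)
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    # symmetric noise for X^T X, whose sensitivity scales with x_bound^2
    E = np.random.normal(0.0, sigma * x_bound**2, size=(d, d))
    noisy_xx = X.T @ X + np.triu(E) + np.triu(E, 1).T
    noisy_xy = X.T @ y + np.random.normal(0.0, sigma * x_bound * y_bound, d)
    return np.linalg.solve(noisy_xx + lam * np.eye(d), noisy_xy)
```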
A Simple Recurrent Unit with Reduced Tensor Product Representations
Title | A Simple Recurrent Unit with Reduced Tensor Product Representations |
Authors | Shuai Tang, Paul Smolensky, Virginia R. de Sa |
Abstract | Widely used recurrent units, including Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), perform well on natural language tasks, but their ability to learn structured representations is still questionable. Exploiting reduced Tensor Product Representations (TPRs) — distributed representations of symbolic structure in which vector-embedded symbols are bound to vector-embedded structural positions — we propose the TPRU, a simple recurrent unit that, at each time step, explicitly executes structural-role binding and unbinding operations to incorporate structural information into learning. A gradient analysis of our proposed TPRU is conducted to support our model design, and its performance on multiple datasets shows the effectiveness of our design choices. Furthermore, observations on a linguistically grounded study demonstrate the interpretability of our TPRU. |
Tasks | Natural Language Inference |
Published | 2018-10-29 |
URL | https://arxiv.org/abs/1810.12456v6 |
https://arxiv.org/pdf/1810.12456v6.pdf | |
PWC | https://paperswithcode.com/paper/learning-distributed-representations-of-1 |
Repo | https://github.com/shuaitang/TPRU |
Framework | pytorch |
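The binding and unbinding operations at the heart of the TPRU come directly from classical TPR algebra and are easy to sketch: a filler is bound to a role by an outer product, bindings are summed into one tensor, and a filler is recovered by contracting with its role. The numpy toy below illustrates the algebra only, not the recurrent unit itself.

```python
# Minimal sketch of structural-role binding/unbinding in a TPR.
import numpy as np

def bind(fillers, roles):
    """Sum of outer products: T = sum_i f_i r_i^T."""
    return sum(np.outer(f, r) for f, r in zip(fillers, roles))

def unbind(T, role):
    """Recover the filler bound to `role` (exact for orthonormal roles)."""
    return T @ role

rng = np.random.default_rng(0)
roles, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # orthonormal role vectors
fillers = rng.normal(size=(2, 4))
T = bind(fillers, roles[:2])
assert np.allclose(unbind(T, roles[0]), fillers[0])
```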
Regularization Learning Networks: Deep Learning for Tabular Datasets
Title | Regularization Learning Networks: Deep Learning for Tabular Datasets |
Authors | Ira Shavitt, Eran Segal |
Abstract | Despite their impressive performance, Deep Neural Networks (DNNs) typically underperform Gradient Boosting Trees (GBTs) on many tabular-dataset learning tasks. We propose that applying a different regularization coefficient to each weight might boost the performance of DNNs by allowing them to make more use of the more relevant inputs. However, this will lead to an intractable number of hyperparameters. Here, we introduce Regularization Learning Networks (RLNs), which overcome this challenge by introducing an efficient hyperparameter tuning scheme which minimizes a new Counterfactual Loss. Our results show that RLNs significantly improve DNNs on tabular datasets, and achieve comparable results to GBTs, with the best performance achieved with an ensemble that combines GBTs and RLNs. RLNs produce extremely sparse networks, eliminating up to 99.8% of the network edges and 82% of the input features, thus providing more interpretable models and revealing the importance that the network assigns to different inputs. RLNs can efficiently learn a single network on datasets that comprise both tabular and unstructured data, such as in the setting of medical imaging accompanied by electronic health records. An open source implementation of RLN can be found at https://github.com/irashavitt/regularization_learning_networks. |
Tasks | |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06440v3 |
http://arxiv.org/pdf/1805.06440v3.pdf | |
PWC | https://paperswithcode.com/paper/regularization-learning-networks-deep |
Repo | https://github.com/petteriTeikari/RLN_tabularData |
Framework | none |
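The core idea, one learnable regularization coefficient per weight, can be sketched in a few lines of PyTorch. This toy only shows the per-weight L1 penalty; the Counterfactual Loss that the paper uses to tune the coefficients efficiently is omitted, and all names and sizes are illustrative.

```python
# Hedged sketch of per-weight regularization coefficients (RLN idea).
import torch
import torch.nn as nn

class RLNLinear(nn.Module):
    def __init__(self, d_in, d_out, init_log_lambda=-4.0):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        # one coefficient per weight, kept positive via exp of a log-parameter
        self.log_lambda = nn.Parameter(torch.full((d_out, d_in), init_log_lambda))

    def forward(self, x):
        return self.linear(x)

    def penalty(self):
        return (self.log_lambda.exp() * self.linear.weight.abs()).sum()

layer = RLNLinear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(layer(x), y) + layer.penalty()
loss.backward()
```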
Canonical Tensor Decomposition for Knowledge Base Completion
Title | Canonical Tensor Decomposition for Knowledge Base Completion |
Authors | Timothée Lacroix, Nicolas Usunier, Guillaume Obozinski |
Abstract | The problem of Knowledge Base Completion can be framed as a 3rd-order binary tensor completion problem. In this light, the Canonical Tensor Decomposition (CP) (Hitchcock, 1927) seems like a natural solution; however, current implementations of CP on standard Knowledge Base Completion benchmarks are lagging behind their competitors. In this work, we attempt to understand the limits of CP for knowledge base completion. First, we motivate and test a novel regularizer, based on tensor nuclear $p$-norms. Then, we present a reformulation of the problem that makes it invariant to arbitrary choices in the inclusion of predicates or their reciprocals in the dataset. These two methods combined allow us to beat the current state of the art on several datasets with a CP decomposition, and obtain even better results using the more advanced ComplEx model. |
Tasks | Knowledge Base Completion, Link Prediction |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07297v1 |
http://arxiv.org/pdf/1806.07297v1.pdf | |
PWC | https://paperswithcode.com/paper/canonical-tensor-decomposition-for-knowledge |
Repo | https://github.com/facebookresearch/kbc |
Framework | pytorch |
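CP scoring for knowledge base completion and the nuclear 3-norm surrogate the paper proposes are both short enough to sketch. The PyTorch fragment below scores a triple as a trilinear product of head, relation, and tail embeddings; table sizes are illustrative, and the training loop is omitted.

```python
# Sketch of CP scoring with an N3 (nuclear 3-norm surrogate) regularizer.
import torch

n_entities, n_relations, rank = 1000, 50, 64
H = torch.randn(n_entities, rank, requires_grad=True)   # head embeddings
R = torch.randn(n_relations, rank, requires_grad=True)  # relation embeddings
T = torch.randn(n_entities, rank, requires_grad=True)   # tail embeddings

def score(h, r, t):
    # trilinear product: sum_k H[h,k] * R[r,k] * T[t,k]
    return (H[h] * R[r] * T[t]).sum(dim=-1)

def n3_reg(h, r, t):
    # sum of cubed l3-norms of the factors involved in the batch
    return (H[h].abs().pow(3) + R[r].abs().pow(3) + T[t].abs().pow(3)).sum()
```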
Gradient Energy Matching for Distributed Asynchronous Gradient Descent
Title | Gradient Energy Matching for Distributed Asynchronous Gradient Descent |
Authors | Joeri Hermans, Gilles Louppe |
Abstract | Distributed asynchronous SGD has become widely used for deep learning in large-scale systems, but remains notorious for its instability when increasing the number of workers. In this work, we study the dynamics of distributed asynchronous SGD under the lens of Lagrangian mechanics. Using this description, we introduce the concept of energy to describe the optimization process and derive a sufficient condition ensuring its stability as long as the collective energy induced by the active workers remains below the energy of a target synchronous process. Making use of this criterion, we derive a stable distributed asynchronous optimization procedure, GEM, that estimates and maintains the energy of the asynchronous system below or equal to the energy of sequential SGD with momentum. Experimental results highlight the stability and speedup of GEM compared to existing schemes, even when scaling to one hundred asynchronous workers. Results also indicate better generalization compared to the targeted SGD with momentum. |
Tasks | |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08469v1 |
http://arxiv.org/pdf/1805.08469v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-energy-matching-for-distributed |
Repo | https://github.com/vlimant/NNLO |
Framework | tf |
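The stability criterion suggests a simple per-worker rescaling rule: shrink a worker's update whenever its energy would exceed that of the target momentum process. The numpy fragment below is a heavily simplified sketch of that idea; the energy proxy and scaling rule are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of gradient rescaling toward a target energy (GEM idea).
import numpy as np

def gem_scale(grad, target_velocity, eps=1e-12):
    """Rescale `grad` so its squared norm (a kinetic-energy proxy) matches
    the energy of the target momentum-SGD velocity."""
    target_energy = float(np.dot(target_velocity, target_velocity))
    worker_energy = float(np.dot(grad, grad)) + eps
    return grad * np.sqrt(target_energy / worker_energy)
```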
SiCloPe: Silhouette-Based Clothed People
Title | SiCloPe: Silhouette-Based Clothed People |
Authors | Ryota Natsume, Shunsuke Saito, Zeng Huang, Weikai Chen, Chongyang Ma, Hao Li, Shigeo Morishima |
Abstract | We introduce a new silhouette-based representation for modeling clothed human bodies using deep generative models. Our method can reconstruct a complete and textured 3D model of a person wearing clothes from a single input picture. Inspired by the visual hull algorithm, our implicit representation uses 2D silhouettes and 3D joints of a body pose to describe the immense shape complexity and variations of clothed people. Given a segmented 2D silhouette of a person and its inferred 3D joints from the input picture, we first synthesize consistent silhouettes from novel viewpoints around the subject. The synthesized silhouettes which are the most consistent with the input segmentation are fed into a deep visual hull algorithm for robust 3D shape prediction. We then infer the texture of the subject’s back view using the frontal image and segmentation mask as input to a conditional generative adversarial network. Our experiments demonstrate that our silhouette-based model is an effective representation and the appearance of the back view can be predicted reliably using an image-to-image translation network. While classic methods based on parametric models often fail for single-view images of subjects with challenging clothing, our approach can still produce successful results, which are comparable to those obtained from multi-view input. |
Tasks | Image-to-Image Translation |
Published | 2018-12-31 |
URL | http://arxiv.org/abs/1901.00049v2 |
http://arxiv.org/pdf/1901.00049v2.pdf | |
PWC | https://paperswithcode.com/paper/siclope-silhouette-based-clothed-people |
Repo | https://github.com/shunsukesaito/PIFu |
Framework | pytorch |
Automatic brain tumor grading from MRI data using convolutional neural networks and quality assessment
Title | Automatic brain tumor grading from MRI data using convolutional neural networks and quality assessment |
Authors | Sergio Pereira, Raphael Meier, Victor Alves, Mauricio Reyes, Carlos A. Silva |
Abstract | Glioblastoma Multiforme is a high grade, very aggressive, brain tumor, with patients having a poor prognosis. Lower grade gliomas are less aggressive, but they can evolve into higher grade tumors over time. Patient management and treatment can vary considerably with tumor grade, ranging from tumor resection followed by a combined radio- and chemotherapy to a “wait and see” approach. Hence, tumor grading is important for adequate treatment planning and monitoring. The gold standard for tumor grading relies on histopathological diagnosis of biopsy specimens. However, this procedure is invasive, time consuming, and prone to sampling error. Given these disadvantages, automatic tumor grading from widely used MRI protocols would be clinically important, as a way to expedite treatment planning and assessment of tumor evolution. In this paper, we propose to use Convolutional Neural Networks for predicting tumor grade directly from imaging data. In this way, we overcome the need for expert annotations of regions of interest. We evaluate two prediction approaches: from the whole brain, and from an automatically defined tumor region. Finally, we employ interpretability methodologies as a quality assurance stage to check if the method is using image regions indicative of tumor grade for classification. |
Tasks | |
Published | 2018-09-25 |
URL | http://arxiv.org/abs/1809.09468v1 |
http://arxiv.org/pdf/1809.09468v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-brain-tumor-grading-from-mri-data |
Repo | https://github.com/sergiormpereira/brain_tumor_grading |
Framework | none |
GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story Generation
Title | GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story Generation |
Authors | Taehyeong Kim, Min-Oh Heo, Seonil Son, Kyoung-Wha Park, Byoung-Tak Zhang |
Abstract | The task of multi-image cued story generation, such as the visual storytelling dataset (VIST) challenge, is to compose multiple coherent sentences from a given sequence of images. The main difficulty is how to generate image-specific sentences within the context of the overall images. Here we propose a deep learning network model, GLAC Net, that generates visual stories by combining global-local (glocal) attention and context cascading mechanisms. The model incorporates two levels of attention, i.e., overall encoding level and image feature level, to construct image-dependent sentences. While a standard attention configuration needs a large number of parameters, the GLAC Net implements them in a very simple way via hard connections from the outputs of encoders or image features onto the sentence generators. The coherency of the generated story is further improved by conveying (cascading) the information of the previous sentence to the next sentence serially. We evaluate the performance of the GLAC Net on the visual storytelling dataset (VIST) and achieve very competitive results compared to the state-of-the-art techniques. Our code and pre-trained models are available at https://github.com/tkim-snu/GLACNet. |
Tasks | Visual Storytelling |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.10973v3 |
http://arxiv.org/pdf/1805.10973v3.pdf | |
PWC | https://paperswithcode.com/paper/glac-net-glocal-attention-cascading-networks |
Repo | https://github.com/tkim-snu/GLACNet |
Framework | pytorch |
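The context-cascading mechanism lends itself to a short sketch: the decoder hidden state that ends one sentence initializes the decoder for the next, so later sentences stay coherent with earlier ones. The PyTorch toy below is a hedged illustration only; module sizes, the fused glocal feature, and the decoding loop are assumptions, not the authors' implementation (see the linked repo for that).

```python
# Hedged sketch of context cascading across per-image sentence decoders.
import torch
import torch.nn as nn

vocab, embed, hidden = 5000, 256, 512
decoder = nn.GRU(embed + hidden, hidden, batch_first=True)

def decode_story(glocal_feats, tokens_per_image):
    """glocal_feats: (n_images, hidden) fused global+local image features;
    tokens_per_image: list of already-embedded inputs, each (1, T, embed)."""
    h = torch.zeros(1, 1, hidden)            # cascaded context state
    outputs = []
    for feat, tokens in zip(glocal_feats, tokens_per_image):
        # condition every decoding step on this image's glocal feature
        cond = feat.expand(tokens.size(1), -1).unsqueeze(0)
        out, h = decoder(torch.cat([tokens, cond], dim=-1), h)  # carry h over
        outputs.append(out)
    return outputs
```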
Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings
Title | Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings |
Authors | Da-Rong Liu, Kuan-Yu Chen, Hung-Yi Lee, Lin-shan Lee |
Abstract | Unsupervised discovery of acoustic tokens from audio corpora without annotation and learning vector representations for these tokens have been widely studied. Although these techniques have been shown successful in some applications such as query-by-example Spoken Term Detection (STD), the lack of a mapping between these discovered tokens and real phonemes has limited downstream applications. This paper represents probably the first attempt towards the goal of completely unsupervised phoneme recognition, i.e., mapping audio signals to phoneme sequences without phoneme-labeled audio data. The basic idea is to cluster the embedded acoustic tokens and learn the mapping between the cluster sequences and the unknown phoneme sequences with a Generative Adversarial Network (GAN). An unsupervised phoneme recognition accuracy of 36% was achieved in the preliminary experiments. |
Tasks | |
Published | 2018-04-01 |
URL | http://arxiv.org/abs/1804.00316v1 |
http://arxiv.org/pdf/1804.00316v1.pdf | |
PWC | https://paperswithcode.com/paper/completely-unsupervised-phoneme-recognition |
Repo | https://github.com/thtang/aMMAI2018-paper-summary |
Framework | none |
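The first stage of the pipeline, clustering unlabeled audio-segment embeddings into discrete acoustic tokens, is easy to sketch. The fragment below uses k-means as a stand-in clusterer; the embedding dimensions, the cluster count, and the random data are illustrative assumptions, and the GAN that maps cluster sequences to phonemes is omitted.

```python
# Sketch of the token-discovery stage: cluster audio embeddings.
import numpy as np
from sklearn.cluster import KMeans

embeddings = np.random.randn(10000, 128)   # stand-in audio-segment embeddings
k = 60                                      # roughly phoneme-inventory sized
tokens = KMeans(n_clusters=k, n_init=10).fit_predict(embeddings)
# `tokens` assigns one discrete acoustic-token id per segment; the
# adversarial stage then learns which token maps to which phoneme.
```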
Correlation Tracking via Joint Discrimination and Reliability Learning
Title | Correlation Tracking via Joint Discrimination and Reliability Learning |
Authors | Chong Sun, Dong Wang, Huchuan Lu, Ming-Hsuan Yang |
Abstract | For visual tracking, an ideal filter learned by the correlation filter (CF) method should capture both discrimination and reliability information. However, existing attempts usually focus on the former while paying less attention to reliability learning. This can cause the learned filter to be dominated by unexpected salient regions on the feature map, thereby resulting in model degradation. To address this issue, we propose a novel CF-based optimization problem to jointly model the discrimination and reliability information. First, we treat the filter as the element-wise product of a base filter and a reliability term. The base filter aims to learn the discrimination information between the target and backgrounds, and the reliability term encourages the final filter to focus on more reliable regions. Second, we introduce a local response consistency regularization term to emphasize equal contributions of different regions and avoid the tracker being dominated by unreliable regions. The proposed optimization problem can be solved using the alternating direction method and accelerated in the Fourier domain. We conduct extensive experiments on the OTB-2013, OTB-2015 and VOT-2016 datasets to evaluate the proposed tracker. Experimental results show that our tracker performs favorably against other state-of-the-art trackers. |
Tasks | Visual Tracking |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.08965v1 |
http://arxiv.org/pdf/1804.08965v1.pdf | |
PWC | https://paperswithcode.com/paper/correlation-tracking-via-joint-discrimination |
Repo | https://github.com/cswaynecool/DRT |
Framework | none |
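The filter decomposition described in the abstract can be sketched directly: the final filter is the element-wise product of a base filter and a reliability map, and the tracking response is evaluated in the Fourier domain. The numpy fragment below illustrates only this forward evaluation; the joint learning of both terms (solved with ADMM in the paper) is omitted.

```python
# Sketch of response evaluation for w = base_filter * reliability.
import numpy as np

def response(feature_map, base_filter, reliability):
    w = base_filter * reliability                 # element-wise product filter
    F = np.fft.fft2(feature_map)
    W = np.fft.fft2(w, s=feature_map.shape)       # zero-pad filter to map size
    return np.real(np.fft.ifft2(F * np.conj(W)))  # circular correlation
```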
Stochastic Gradient Descent on Highly-Parallel Architectures
Title | Stochastic Gradient Descent on Highly-Parallel Architectures |
Authors | Yujing Ma, Florin Rusu, Martin Torres |
Abstract | There is an increased interest in building data analytics frameworks with advanced algebraic capabilities both in industry and academia. Many of these frameworks, e.g., TensorFlow and BIDMach, implement their compute-intensive primitives in two flavors—as multi-thread routines for multi-core CPUs and as highly-parallel kernels executed on GPU. Stochastic gradient descent (SGD) is the most popular optimization method for model training implemented extensively on modern data analytics platforms. While the data-intensive properties of SGD are well-known, there is an intense debate on which of the many SGD variants is better in practice. In this paper, we perform a comprehensive study of parallel SGD for training generalized linear models. We consider the impact of three factors – computing architecture (multi-core CPU or GPU), synchronous or asynchronous model updates, and data sparsity – on three measures—hardware efficiency, statistical efficiency, and time to convergence. In the process, we design an optimized asynchronous SGD algorithm for GPU that leverages warp shuffling and cache coalescing for data and model access. We draw several interesting findings from our extensive experiments with logistic regression (LR) and support vector machines (SVM) on five real datasets. For synchronous SGD, GPU always outperforms parallel CPU—they both outperform a sequential CPU solution by more than 400X. For asynchronous SGD, parallel CPU is the safest choice while GPU with data replication is better in certain situations. The choice between synchronous GPU and asynchronous CPU depends on the task and the characteristics of the data. As a reference, our best implementation outperforms TensorFlow and BIDMach consistently. We hope that our insights provide a useful guide for applying parallel SGD to generalized linear models. |
Tasks | |
Published | 2018-02-24 |
URL | http://arxiv.org/abs/1802.08800v1 |
http://arxiv.org/pdf/1802.08800v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-gradient-descent-on-highly |
Repo | https://github.com/Shashank-Ojha/ParallelGradientDescent |
Framework | none |
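Asynchronous SGD with unsynchronized model updates, one of the variants the paper benchmarks, follows the lock-free "Hogwild" pattern sketched below for logistic regression. This CPU toy only shows the update pattern; real implementations shard data per thread and, on GPU, exploit warp shuffling and coalesced memory access as the paper describes.

```python
# Minimal sketch of lock-free asynchronous SGD for logistic regression.
import numpy as np
from threading import Thread

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def worker(X, y, w, lr, steps, rng):
    for _ in range(steps):
        i = rng.integers(len(y))
        g = (sigmoid(X[i] @ w) - y[i]) * X[i]
        w -= lr * g                     # unsynchronized in-place update

X = np.random.randn(10000, 20)
y = (np.random.rand(10000) > 0.5).astype(float)
w = np.zeros(20)                        # shared model, updated without locks
threads = [Thread(target=worker,
                  args=(X, y, w, 0.01, 5000, np.random.default_rng(s)))
           for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```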
Binacox: automatic cut-point detection in high-dimensional Cox model with applications in genetics
Title | Binacox: automatic cut-point detection in high-dimensional Cox model with applications in genetics |
Authors | Simon Bussy, Mokhtar Z. Alaya, Anne-Sophie Jannot, Agathe Guilloux |
Abstract | We introduce the binacox, a prognostic method to deal with the problem of detecting multiple cut-points per feature in a multivariate setting where a large number of continuous features are available. The method is based on the Cox model and combines one-hot encoding with the binarsity penalty, which uses total-variation regularization together with an extra linear constraint, and enables feature selection. Original nonasymptotic oracle inequalities for prediction (in terms of Kullback-Leibler divergence) and estimation with a fast rate of convergence are established. The statistical performance of the method is examined in an extensive Monte Carlo simulation study, and then illustrated on three publicly available genetic cancer datasets. On these high-dimensional datasets, our proposed method significantly outperforms state-of-the-art survival models regarding risk prediction in terms of the C-index, with a computing time orders of magnitude faster. In addition, it provides powerful interpretability from a clinical perspective by automatically pinpointing significant cut-points in relevant variables. |
Tasks | Feature Selection, Survival Analysis |
Published | 2018-07-25 |
URL | https://arxiv.org/abs/1807.09813v4 |
https://arxiv.org/pdf/1807.09813v4.pdf | |
PWC | https://paperswithcode.com/paper/binacox-automatic-cut-points-detection-in |
Repo | https://github.com/SimonBussy/binacox |
Framework | none |
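Two ingredients the abstract combines, one-hot encoding of binned continuous features and a total-variation penalty on consecutive coefficients within each feature's block, can be sketched briefly. Jumps in the fitted step function mark candidate cut-points; the Cox partial likelihood and the extra linear constraint are omitted, and the quantile binning below is an illustrative choice.

```python
# Sketch of binacox ingredients: quantile binning and a TV penalty.
import numpy as np

def one_hot_bins(x, n_bins=10):
    """Binarize one continuous feature into quantile-interval indicators."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    idx = np.searchsorted(edges, x)
    return np.eye(n_bins)[idx]            # (n_samples, n_bins) one-hot matrix

def tv_penalty(beta):
    """Total variation within one feature's coefficient block."""
    return np.abs(np.diff(beta)).sum()
```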
Market Self-Learning of Signals, Impact and Optimal Trading: Invisible Hand Inference with Free Energy
Title | Market Self-Learning of Signals, Impact and Optimal Trading: Invisible Hand Inference with Free Energy |
Authors | Igor Halperin, Ilya Feldshteyn |
Abstract | We present a simple model of a non-equilibrium self-organizing market where asset prices are partially driven by investment decisions of a bounded-rational agent. The agent acts in a stochastic market environment driven by various exogenous “alpha” signals, the agent’s own actions (via market impact), and noise. Unlike traditional agent-based models, our agent aggregates all traders in the market, rather than being a representative agent. Therefore, it can be identified with a bounded-rational component of the market itself, providing a particular implementation of an Invisible Hand market mechanism. In such a setting, market dynamics are modeled as a fictitious self-play of such a bounded-rational market-agent in its adversarial stochastic environment. As rewards obtained by such a self-playing market agent are not observed from market data, we formulate and solve a simple model of such market dynamics based on a neuroscience-inspired Bounded Rational Information Theoretic Inverse Reinforcement Learning (BRIT-IRL). This results in effective asset price dynamics with a non-linear mean reversion, which in our model is generated dynamically rather than being postulated. We argue that our model can be used in a similar way to the Black-Litterman model. In particular, it represents, in a simple modeling framework, market views of common predictive signals, market impacts and implied optimal dynamic portfolio allocations, and can be used to assess values of private signals. Moreover, it allows one to quantify a “market-implied” optimal investment strategy, along with a measure of market rationality. Our approach is numerically light, and can be implemented using standard off-the-shelf software such as TensorFlow. |
Tasks | |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06126v1 |
http://arxiv.org/pdf/1805.06126v1.pdf | |
PWC | https://paperswithcode.com/paper/market-self-learning-of-signals-impact-and |
Repo | https://github.com/harshit0511/Inv-RL |
Framework | none |