January 30, 2020


Paper Group ANR 468

AffWild Net and Aff-Wild Database. Heuristic solutions to robust variants of the minimum-cost integer flow problem. Doubly Aligned Incomplete Multi-view Clustering. Realizing Continual Learning through Modeling a Learning System as a Fiber Bundle. Graph-Driven Generative Models for Heterogeneous Multi-Task Learning. An Edit-centric Approach for Wik …

AffWild Net and Aff-Wild Database


Title	AffWild Net and Aff-Wild Database
Authors	Alvertos Benroumpi, Dimitrios Kollias
Abstract	Emotion recognition is the task of recognizing people’s emotions. It is usually achieved by analyzing people’s facial expressions. There are two ways of representing emotions: the categorical approach and the dimensional approach, which uses valence and arousal values. Valence shows how negative or positive an emotion is, and arousal shows how strongly it is activated. Recent deep learning models for emotion recognition use the second approach, valence and arousal. Moreover, a more interesting concept, which is useful in real life, is “in the wild” emotion recognition. “In the wild” means that the images analyzed for the recognition task come from real-life sources (online videos, online photos, etc.) and not from staged experiments. They therefore introduce unpredictable situations in the images, which have to be modeled. The purpose of this project is to study previous work on “in the wild” emotion recognition, design a new dataset that takes the “Aff-wild” database as its standard, implement new deep learning models, and evaluate the results. First, existing databases and deep learning models are presented. Then, inspired by them, a new database is created, which includes 507,208 frames in total from 106 videos gathered from online sources. Then, the data are tested on a CNN model based on the CNN-M architecture to confirm their usability. Next, the main model of this project is implemented: a Regression GAN that can perform unsupervised and supervised learning at the same time. More specifically, it keeps the main functionality of GANs, which is to produce fake images that look as good as the real ones, while it can also predict valence and arousal values for both real and fake images. Finally, the database created earlier is applied to this model, and the results are presented and evaluated.
Tasks
Published	2019-10-11
URL	https://arxiv.org/abs/1910.05376v2
PDF	https://arxiv.org/pdf/1910.05376v2.pdf
PWC	https://paperswithcode.com/paper/affwild-net-and-aff-wild-database
Repo
Framework

Heuristic solutions to robust variants of the minimum-cost integer flow problem


Title	Heuristic solutions to robust variants of the minimum-cost integer flow problem
Authors	Marko Špoljarec, Robert Manger
Abstract	This paper deals with robust optimization applied to network flows. Two robust variants of the minimum-cost integer flow problem are considered. Thereby, uncertainty in problem formulation is limited to arc unit costs and expressed by a finite set of explicitly given scenarios. It is shown that both problem variants are NP-hard. To solve the considered variants, several heuristics based on local search or evolutionary computing are proposed. The heuristics are experimentally evaluated on appropriate problem instances.
Tasks
Published	2019-07-21
URL	https://arxiv.org/abs/1907.09468v1
PDF	https://arxiv.org/pdf/1907.09468v1.pdf
PWC	https://paperswithcode.com/paper/heuristic-solutions-to-robust-variants-of-the
Repo
Framework
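
The scenario-based uncertainty described in this abstract can be illustrated with a small sketch. It evaluates a fixed integer flow against a finite set of arc-cost scenarios and reports the worst-case cost. The worst-case (min-max) criterion, the toy network, and the names `flow_cost` and `worst_case_cost` are assumptions chosen for illustration; the abstract does not say which two robust variants the paper studies.

```python
from typing import Dict, List, Tuple

Arc = Tuple[str, str]


def flow_cost(flow: Dict[Arc, int], unit_costs: Dict[Arc, int]) -> int:
    """Total cost of an integer flow under one scenario of arc unit costs."""
    return sum(units * unit_costs[arc] for arc, units in flow.items())


def worst_case_cost(flow: Dict[Arc, int], scenarios: List[Dict[Arc, int]]) -> int:
    """Largest cost of the flow over all explicitly given scenarios."""
    return max(flow_cost(flow, costs) for costs in scenarios)


# Hypothetical toy instance: send 2 units from s to t via a or via b.
flow_a = {("s", "a"): 2, ("a", "t"): 2}
flow_b = {("s", "b"): 2, ("b", "t"): 2}
scenarios = [
    {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 2, ("b", "t"): 2},
    {("s", "a"): 5, ("a", "t"): 1, ("s", "b"): 2, ("b", "t"): 2},
]

# A robust choice prefers the flow whose worst scenario is cheapest.
robust_flow = min([flow_a, flow_b], key=lambda f: worst_case_cost(f, scenarios))
```

In this toy instance the route via a is cheaper in the first scenario, but the route via b has the lower worst-case cost, so a min-max criterion would select it.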

Doubly Aligned Incomplete Multi-view Clustering


Title	Doubly Aligned Incomplete Multi-view Clustering
Authors	Menglei Hu, Songcan Chen
Abstract	Nowadays, multi-view clustering has attracted more and more attention. To date, almost all the previous studies assume that views are complete. However, in reality, it is often the case that each view may contain some missing instances. Such incompleteness makes it impossible to directly use traditional multi-view clustering methods. In this paper, we propose a Doubly Aligned Incomplete Multi-view Clustering algorithm (DAIMC) based on weighted semi-nonnegative matrix factorization (semi-NMF). Specifically, on the one hand, DAIMC utilizes the given instance alignment information to learn a common latent feature matrix for all the views. On the other hand, DAIMC establishes a consensus basis matrix with the help of $L_{2,1}$-norm regularized regression for reducing the influence of missing instances. Consequently, compared with existing methods, besides inheriting the strength of semi-NMF, namely its ability to handle negative entries, DAIMC has two unique advantages: 1) solving the incomplete view problem by introducing a respective weight matrix for each view, making it able to easily adapt to the case with more than two views; 2) reducing the influence of view incompleteness on clustering by enforcing the basis matrices of individual views to be aligned with the help of regression. Experiments on four real-world datasets demonstrate its advantages.
Tasks
Published	2019-03-07
URL	http://arxiv.org/abs/1903.02785v1
PDF	http://arxiv.org/pdf/1903.02785v1.pdf
PWC	https://paperswithcode.com/paper/doubly-aligned-incomplete-multi-view
Repo
Framework
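
For readers unfamiliar with the $L_{2,1}$ norm mentioned in this abstract, the following minimal sketch computes it under the common row-wise convention (the sum of the Euclidean norms of the rows). The function name `l21_norm` and the example matrix are illustrative assumptions, and the paper may orient the norm differently.

```python
import math
from typing import List


def l21_norm(matrix: List[List[float]]) -> float:
    """Sum of the Euclidean (L2) norms of the rows of a matrix."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


# Hypothetical example: row norms are 5, 0 and 5.
B = [[3.0, 4.0], [0.0, 0.0], [0.0, 5.0]]
value = l21_norm(B)
```

Because each row contributes its unsquared norm, penalizing this quantity tends to drive entire rows to zero, which is why it is often used as a row-sparsity regularizer.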

Realizing Continual Learning through Modeling a Learning System as a Fiber Bundle


Title	Realizing Continual Learning through Modeling a Learning System as a Fiber Bundle
Authors	Zhenfeng Cao
Abstract	A human brain is capable of continual learning by nature; however the current mainstream deep neural networks suffer from a phenomenon named catastrophic forgetting (i.e., learning a new set of patterns suddenly and completely would result in fully forgetting what has already been learned). In this paper we propose a generic learning model, which regards a learning system as a fiber bundle. By comparing the learning performance of our model with conventional ones whose neural networks are multilayer perceptrons through a variety of machine-learning experiments, we found our proposed model not only enjoys a distinguished capability of continual learning but also bears a high information capacity. In addition, we found in some learning scenarios the learning performance can be further enhanced by making the learning time-aware to mimic the episodic memory in human brain. Last but not least, we found that the properties of forgetting in our model correspond well to those of human memory. This work may shed light on how a human brain learns.
Tasks	Continual Learning
Published	2019-02-16
URL	http://arxiv.org/abs/1903.03511v1
PDF	http://arxiv.org/pdf/1903.03511v1.pdf
PWC	https://paperswithcode.com/paper/realizing-continual-learning-through-modeling
Repo
Framework

Graph-Driven Generative Models for Heterogeneous Multi-Task Learning


Title	Graph-Driven Generative Models for Heterogeneous Multi-Task Learning
Authors	Wenlin Wang, Hongteng Xu, Zhe Gan, Bai Li, Guoyin Wang, Liqun Chen, Qian Yang, Wenqi Wang, Lawrence Carin
Abstract	We propose a novel graph-driven generative model that unifies multiple heterogeneous learning tasks into the same framework. The proposed model is based on the fact that heterogeneous learning tasks, which correspond to different generative processes, often rely on data with a shared graph structure. Accordingly, our model combines a graph convolutional network (GCN) with multiple variational autoencoders, thus embedding the nodes of the graph (i.e., samples for the tasks) in a uniform manner while specializing their organization and usage to different tasks. With a focus on healthcare applications (tasks), including clinical topic modeling, procedure recommendation and admission-type prediction, we demonstrate that our method successfully leverages information across different tasks, boosting performance in all tasks and outperforming existing state-of-the-art approaches.
Tasks	Multi-Task Learning
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08709v1
PDF	https://arxiv.org/pdf/1911.08709v1.pdf
PWC	https://paperswithcode.com/paper/graph-driven-generative-models-for
Repo
Framework

An Edit-centric Approach for Wikipedia Article Quality Assessment


Title	An Edit-centric Approach for Wikipedia Article Quality Assessment
Authors	Edison Marrese-Taylor, Pablo Loyola, Yutaka Matsuo
Abstract	We propose an edit-centric approach to assess Wikipedia article quality as a complementary alternative to current full document-based techniques. Our model consists of a main classifier equipped with an auxiliary generative module which, for a given edit, jointly provides an estimation of its quality and generates a description in natural language. We performed an empirical study to assess the feasibility of the proposed model and its cost-effectiveness in terms of data and quality requirements.
Tasks
Published	2019-09-19
URL	https://arxiv.org/abs/1909.08880v1
PDF	https://arxiv.org/pdf/1909.08880v1.pdf
PWC	https://paperswithcode.com/paper/an-edit-centric-approach-for-wikipedia
Repo
Framework

A fully 3D multi-path convolutional neural network with feature fusion and feature weighting for automatic lesion identification in brain MRI images


Title	A fully 3D multi-path convolutional neural network with feature fusion and feature weighting for automatic lesion identification in brain MRI images
Authors	Yunzhe Xue, Meiyan Xie, Fadi G. Farhat, Olga Boukrina, A. M. Barrett, Jeffrey R. Binder, Usman W. Roshan, William W. Graves
Abstract	We propose a fully 3D multi-path convolutional network to predict stroke lesions from 3D brain MRI images. Our multi-path model has independent encoders for different modalities containing residual convolutional blocks, weighted multi-path feature fusion from different modalities, and weighted fusion modules to combine encoder and decoder features. Compared to existing 3D CNNs like DeepMedic, 3D U-Net, and AnatomyNet, our network achieves the highest statistically significant cross-validation accuracy of 60.5% on the large ATLAS benchmark of 220 patients. We also test our model on multi-modal images from the Kessler Foundation and Medical College of Wisconsin and achieve a statistically significant cross-validation accuracy of 65%, significantly outperforming the multi-modal 3D U-Net and DeepMedic. Overall, our model offers a principled, extensible multi-path approach that outperforms multi-channel alternatives and achieves high Dice accuracies on existing benchmarks.
Tasks
Published	2019-07-17
URL	https://arxiv.org/abs/1907.07807v2
PDF	https://arxiv.org/pdf/1907.07807v2.pdf
PWC	https://paperswithcode.com/paper/a-fully-3d-multi-path-convolutional-neural
Repo
Framework
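
The abstract reports Dice accuracies. As a reminder of what that metric measures, here is a minimal sketch of the Dice coefficient between a predicted and a ground-truth binary mask, flattened to lists. The function name `dice_coefficient`, the toy masks, and the choice to return 1.0 when both masks are empty are assumptions made for illustration; the paper's exact evaluation protocol is not given in the abstract.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical flattened masks: 3 predicted voxels, 3 true voxels, 2 shared.
pred_mask = [1, 1, 0, 0, 1]
true_mask = [1, 0, 0, 1, 1]
score = dice_coefficient(pred_mask, true_mask)
```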

Incremental Reading for Question Answering


Title	Incremental Reading for Question Answering
Authors	Samira Abnar, Tania Bedrax-Weiss, Tom Kwiatkowski, William W. Cohen
Abstract	Any system which performs goal-directed continual learning must not only learn incrementally but process and absorb information incrementally. Such a system also has to understand when its goals have been achieved. In this paper, we consider these issues in the context of question answering. Current state-of-the-art question answering models reason over an entire passage, not incrementally. As we will show, naive approaches to incremental reading, such as restriction to unidirectional language models in the model, perform poorly. We present extensions to the DocQA [2] model to allow incremental reading without loss of accuracy. The model also jointly learns to provide the best answer given the text that is seen so far and predict whether this best-so-far answer is sufficient.
Tasks	Continual Learning, Question Answering
Published	2019-01-15
URL	http://arxiv.org/abs/1901.04936v1
PDF	http://arxiv.org/pdf/1901.04936v1.pdf
PWC	https://paperswithcode.com/paper/incremental-reading-for-question-answering
Repo
Framework

Learning sparse linear dynamic networks in a hyper-parameter free setting


Title	Learning sparse linear dynamic networks in a hyper-parameter free setting
Authors	Arun Venkitaraman, Håkan Hjalmarsson, Bo Wahlberg
Abstract	We address the issue of estimating the topology and dynamics of sparse linear dynamic networks in a hyperparameter-free setting. We propose a method to estimate the network dynamics in a computationally efficient and parameter tuning-free iterative framework known as SPICE (Sparse Iterative Covariance Estimation). The estimated dynamics directly reveal the underlying topology. Our approach does not assume that the network is undirected and is applicable even with varying noise levels across the modules of the network. We also do not assume any explicit prior knowledge on the network dynamics. Numerical experiments with realistic dynamic networks illustrate the usefulness of our method.
Tasks
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11553v1
PDF	https://arxiv.org/pdf/1911.11553v1.pdf
PWC	https://paperswithcode.com/paper/learning-sparse-linear-dynamic-networks-in-a
Repo
Framework

4K-Memristor Analog-Grade Passive Crossbar Circuit


Title	4K-Memristor Analog-Grade Passive Crossbar Circuit
Authors	Hyungjin Kim, Hussein Nili, Mahmood Mahmoodi, Dmitri Strukov
Abstract	The superior density of passive analog-grade memristive crossbars may enable storing large synaptic weight matrices directly on specialized neuromorphic chips, thus avoiding costly off-chip communication. To ensure efficient use of such crossbars in neuromorphic computing circuits, variations of current-voltage characteristics of crosspoint devices must be substantially lower than those of memory cells with select transistors. Apparently, this requirement explains why there were so few demonstrations of neuromorphic system prototypes using passive crossbars. Here we report a 64x64 passive metal-oxide memristor crossbar circuit with ~99% device yield, based on a foundry-compatible fabrication process featuring etch-down patterning and low-temperature budget, conducive to vertical integration. The achieved ~26% variations of switching voltages of our devices were sufficient for programming 4K-pixel gray-scale patterns with an average tuning error smaller than 4%. The analog properties were further verified by experimentally demonstrating MNIST pattern classification with a fidelity close to the software-modeled limit for a network of this size, with an ~1% average error of import of ex-situ-calculated synaptic weights. We believe that our work is a significant improvement over the state-of-the-art passive crossbar memories in both complexity and analog properties.
Tasks
Published	2019-06-27
URL	https://arxiv.org/abs/1906.12045v1
PDF	https://arxiv.org/pdf/1906.12045v1.pdf
PWC	https://paperswithcode.com/paper/4k-memristor-analog-grade-passive-crossbar
Repo
Framework
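
To make the role of a crossbar as a stored weight matrix concrete, the sketch below models an idealized crossbar as a conductance matrix and computes the column output currents as a vector-matrix product (Ohm's law per device, currents summed along each column). This is a textbook idealization and an assumption of this note, not the authors' circuit model: it ignores wire resistance, sneak paths, and device variation. The function name `crossbar_currents` and the numeric values are hypothetical.

```python
from typing import List


def crossbar_currents(voltages: List[float], conductances: List[List[float]]) -> List[float]:
    """Ideal crossbar readout: I_j = sum_i V_i * G_ij (rows driven, columns sensed)."""
    rows = len(conductances)
    if rows == 0 or len(voltages) != rows:
        raise ValueError("one input voltage is required per crossbar row")
    cols = len(conductances[0])
    return [
        sum(voltages[i] * conductances[i][j] for i in range(rows))
        for j in range(cols)
    ]


# Hypothetical 2x2 example (volts and siemens).
V = [0.1, 0.2]
G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
I = crossbar_currents(V, G)

# A 64x64 array holds 64 * 64 = 4096 devices, hence "4K".
devices_in_64x64 = 64 * 64
```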

Lifelong and Interactive Learning of Factual Knowledge in Dialogues


Title	Lifelong and Interactive Learning of Factual Knowledge in Dialogues
Authors	Sahisnu Mazumder, Bing Liu, Shuai Wang, Nianzu Ma
Abstract	Dialogue systems are increasingly using knowledge bases (KBs) storing real-world facts to help generate quality responses. However, because KBs are inherently incomplete and remain fixed during conversation, dialogue systems are limited in their ability to answer questions and to handle questions involving entities or relations that are not in the KB. In this paper, we propose an engine for Continuous and Interactive Learning of Knowledge (CILK) for dialogue systems to give them the ability to continuously and interactively learn and infer new knowledge during conversations. With more knowledge accumulated over time, they will be able to learn better and answer more questions. Our empirical evaluation shows that CILK is promising.
Tasks
Published	2019-07-31
URL	https://arxiv.org/abs/1907.13295v2
PDF	https://arxiv.org/pdf/1907.13295v2.pdf
PWC	https://paperswithcode.com/paper/lifelong-and-interactive-learning-of-factual
Repo
Framework

Neural Inverse Rendering of an Indoor Scene from a Single Image


Title	Neural Inverse Rendering of an Indoor Scene from a Single Image
Authors	Soumyadip Sengupta, Jinwei Gu, Kihwan Kim, Guilin Liu, David W. Jacobs, Jan Kautz
Abstract	Inverse rendering aims to estimate physical attributes of a scene, e.g., reflectance, geometry, and lighting, from image(s). Inverse rendering has been studied primarily for single objects or with methods that solve for only one of the scene attributes. We propose the first learning-based approach that jointly estimates albedo, normals, and lighting of an indoor scene from a single image. Our key contribution is the Residual Appearance Renderer (RAR), which can be trained to synthesize complex appearance effects (e.g., inter-reflection, cast shadows, near-field illumination, and realistic shading), which would be neglected otherwise. This enables us to perform self-supervised learning on real data using a reconstruction loss, based on re-synthesizing the input image from the estimated components. We finetune with real data after pretraining with synthetic data. To this end, we use physically-based rendering to create a large-scale synthetic dataset, which is a significant improvement over prior datasets. Experimental results show that our approach outperforms state-of-the-art methods that estimate one or more scene attributes.
Tasks
Published	2019-01-08
URL	https://arxiv.org/abs/1901.02453v3
PDF	https://arxiv.org/pdf/1901.02453v3.pdf
PWC	https://paperswithcode.com/paper/neural-inverse-rendering-of-an-indoor-scene
Repo
Framework

TracKlinic: Diagnosis of Challenge Factors in Visual Tracking


Title	TracKlinic: Diagnosis of Challenge Factors in Visual Tracking
Authors	Heng Fan, Fan Yang, Peng Chu, Lin Yuan, Haibin Ling
Abstract	Generic visual tracking is difficult due to many challenge factors (e.g., occlusion, blur, etc.). Each of these factors may cause serious problems for a tracking algorithm, and when they work together they can make things even more complicated. Despite a great deal of effort devoted to understanding the behavior of tracking algorithms, reliable and quantifiable ways of studying per-factor tracking behavior remain barely available. Addressing this issue, in this paper we contribute to the community a tracking diagnosis toolkit, TracKlinic, for diagnosis of challenge factors of tracking algorithms. TracKlinic consists of two novel components focusing on the data and analysis aspects, respectively. For the data component, we carefully prepare a set of 2,390 annotated videos, each involving one and only one major challenge factor. When analyzing an algorithm for a specific challenge factor, such a one-factor-per-sequence rule greatly inhibits the disturbance from other factors and consequently leads to more faithful analysis. For the analysis component, given the tracking results on all sequences, it investigates the behavior of the tracker under each individual factor and generates the report automatically. With TracKlinic, a thorough study is conducted on ten state-of-the-art trackers on nine challenge factors (including two compound ones). The results suggest that heavy shape variation and occlusion are the two most challenging factors faced by most trackers. Besides, out-of-view, though it does not happen frequently, is often fatal. By sharing TracKlinic, we expect to make it much easier to diagnose tracking algorithms, and thus to facilitate developing better ones.
Tasks	Visual Tracking
Published	2019-11-18
URL	https://arxiv.org/abs/1911.07959v2
PDF	https://arxiv.org/pdf/1911.07959v2.pdf
PWC	https://paperswithcode.com/paper/tracklinic-diagnosis-of-challenge-factors-in
Repo
Framework

Pseudolikelihood Reranking with Masked Language Models


Title	Pseudolikelihood Reranking with Masked Language Models
Authors	Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff
Abstract	We rerank with scores from pretrained masked language models like BERT to improve ASR and NMT performance. These log-pseudolikelihood scores (LPLs) can outperform large, autoregressive language models (GPT-2) in out-of-the-box scoring. RoBERTa reduces WER by up to 30% relative on an end-to-end LibriSpeech system and adds up to +1.7 BLEU on state-of-the-art baselines for TED Talks low-resource pairs, with further gains from domain adaptation. In the multilingual setting, a single XLM can be used to rerank translation outputs in multiple languages. The numerical and qualitative properties of LPL scores suggest that LPLs capture sentence fluency better than autoregressive scores. Finally, we finetune BERT to estimate sentence LPLs without masking, enabling scoring in a single, non-recurrent inference pass.
Tasks	Domain Adaptation
Published	2019-10-31
URL	https://arxiv.org/abs/1910.14659v1
PDF	https://arxiv.org/pdf/1910.14659v1.pdf
PWC	https://paperswithcode.com/paper/pseudolikelihood-reranking-with-masked
Repo
Framework
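
The log-pseudolikelihood score named in this abstract is usually defined as the sum, over positions, of the log-probability of each token given all other tokens with that position masked. The sketch below shows that computation and a bare reranking step. The callable `masked_log_prob`, the toy model `toy_masked_log_prob`, and the `[MASK]` placeholder are hypothetical stand-ins, not the API of any real library, and reranking by the score alone (without combining it with the original ASR or NMT model score) is a simplification.

```python
import math
from typing import Callable, List

MASK = "[MASK]"  # placeholder symbol, assumed for illustration

# Signature: (tokens with one position masked, masked position, true token) -> log-probability
MaskedScorer = Callable[[List[str], int, str], float]


def pseudo_log_likelihood(tokens: List[str], masked_log_prob: MaskedScorer) -> float:
    """Sum of log P(token_i | all other tokens), masking one position at a time."""
    total = 0.0
    for i, target in enumerate(tokens):
        masked = list(tokens)
        masked[i] = MASK
        total += masked_log_prob(masked, i, target)
    return total


def toy_masked_log_prob(masked_tokens: List[str], position: int, target: str) -> float:
    """Hypothetical stand-in for a masked language model; it ignores the context."""
    vocab_prob = {"the": 0.5, "cat": 0.25, "sat": 0.25}
    return math.log(vocab_prob.get(target, 0.01))


# Rerank two hypothetical candidate transcriptions by their scores.
hypotheses = [["the", "cat", "sat"], ["the", "cat", "zat"]]
best = max(hypotheses, key=lambda h: pseudo_log_likelihood(h, toy_masked_log_prob))
```

Note that scoring a sentence of n tokens this way takes n masked forward passes, which is the cost the abstract's final sentence (estimating scores without masking) aims to avoid.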

SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking


Title	SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking
Authors	Dongyan Guo, Jun Wang, Ying Cui, Zhenhua Wang, Shengyong Chen
Abstract	By decomposing the visual tracking task into two subproblems, classification of the pixel category and regression of the object bounding box at that pixel, we propose a novel fully convolutional Siamese network to solve visual tracking end-to-end in a per-pixel manner. The proposed framework SiamCAR consists of two simple subnetworks: one Siamese subnetwork for feature extraction and one classification-regression subnetwork for bounding box prediction. Our framework takes ResNet-50 as backbone. Different from state-of-the-art trackers like Siamese-RPN, SiamRPN++ and SPM, which are based on region proposal, the proposed framework is both proposal and anchor free. Consequently, we are able to avoid the tricky hyper-parameter tuning of anchors and reduce human intervention. The proposed framework is simple, neat and effective. Extensive experiments and comparisons with state-of-the-art trackers are conducted on many challenging benchmarks like GOT-10K, LaSOT, UAV123 and OTB-50. Without bells and whistles, our SiamCAR achieves the leading performance with a considerable real-time speed.
Tasks	Visual Tracking
Published	2019-11-17
URL	https://arxiv.org/abs/1911.07241v2
PDF	https://arxiv.org/pdf/1911.07241v2.pdf
PWC	https://paperswithcode.com/paper/siamcar-siamese-fully-convolutional
Repo
Framework
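
The per-pixel classification and regression idea in this abstract can be sketched as follows: pick the location with the highest classification score, then turn the regressed values at that location into a box. The sketch assumes the regression output is four distances (left, top, right, bottom) from the pixel to the box edges, as in typical anchor-free detectors, and that a fixed stride maps score-map cells to image coordinates. Both are assumptions of this note; the abstract does not specify the parameterization, and the names `decode_box` and `best_box` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def decode_box(x: float, y: float, offsets: Tuple[float, float, float, float]) -> Box:
    """Convert (left, top, right, bottom) distances at pixel (x, y) into box corners."""
    left, top, right, bottom = offsets
    return (x - left, y - top, x + right, y + bottom)


def best_box(scores: List[List[float]],
             offsets: List[List[Tuple[float, float, float, float]]],
             stride: int = 8) -> Box:
    """Decode the box at the highest-scoring cell of the classification map."""
    best_row, best_col = max(
        ((r, c) for r in range(len(scores)) for c in range(len(scores[0]))),
        key=lambda rc: scores[rc[0]][rc[1]],
    )
    # Assumed mapping from score-map cell to image coordinates.
    x, y = best_col * stride, best_row * stride
    return decode_box(x, y, offsets[best_row][best_col])


# Hypothetical 2x2 score map; the bottom-right cell wins.
scores = [[0.1, 0.2],
          [0.3, 0.9]]
offsets = [[(1, 1, 1, 1), (1, 1, 1, 1)],
           [(1, 1, 1, 1), (4, 3, 6, 5)]]
box = best_box(scores, offsets)
```

Because the box is read directly from one location, no anchor shapes or proposal stage need to be configured, which is the property the abstract calls proposal and anchor free.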