January 31, 2020

Paper Group AWR 408

Reasoning Visual Dialogs with Structural and Partial Observations


Title	Reasoning Visual Dialogs with Structural and Partial Observations
Authors	Zilong Zheng, Wenguan Wang, Siyuan Qi, Song-Chun Zhu
Abstract	We propose a novel model to address the task of Visual Dialog which exhibits complex dialog structures. To obtain a reasonable answer based on the current question and the dialog history, the underlying semantic dependencies between dialog entities are essential. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and unknown graph structures (relations in dialog). The given dialog entities are viewed as the observed nodes. The answer to a given question is represented by a node with missing value. We first introduce an Expectation Maximization algorithm to infer both the underlying dialog structures and the missing node values (desired answers). Based on this, we proceed to propose a differentiable graph neural network (GNN) solution that approximates this process. Experiment results on the VisDial and VisDial-Q datasets show that our model outperforms comparative methods. It is also observed that our method can infer the underlying dialog structure for better dialog reasoning.
Tasks	Visual Dialog
Published	2019-04-11
URL	https://arxiv.org/abs/1904.05548v2
PDF	https://arxiv.org/pdf/1904.05548v2.pdf
PWC	https://paperswithcode.com/paper/reasoning-visual-dialogs-with-structural-and
Repo	https://github.com/zilongzheng/visdial-gnn
Framework	pytorch

Multimodal Deep Learning for Finance: Integrating and Forecasting International Stock Markets


Title	Multimodal Deep Learning for Finance: Integrating and Forecasting International Stock Markets
Authors	Sang Il Lee, Seong Joon Yoo
Abstract	In today’s increasingly international economy, return and volatility spillover effects across international equity markets are major macroeconomic drivers of stock dynamics. Thus, information regarding foreign markets is one of the most important factors in forecasting domestic stock prices. However, the cross-correlation between domestic and foreign markets is highly complex. Hence, it is extremely difficult to explicitly express this cross-correlation with a dynamical equation. In this study, we develop stock return prediction models that can jointly consider international markets, using multimodal deep learning. Our contributions are three-fold: (1) we visualize the transfer information between South Korea and US stock markets by using scatter plots; (2) we incorporate the information into the stock prediction models with the help of multimodal deep learning; (3) we conclusively demonstrate that the early and intermediate fusion models achieve a significant performance boost in comparison with the late fusion and single modality models. Our study indicates that jointly considering international stock markets can improve the prediction accuracy and deep neural networks are highly effective for such tasks.
Tasks	Stock Prediction
Published	2019-03-15
URL	https://arxiv.org/abs/1903.06478v2
PDF	https://arxiv.org/pdf/1903.06478v2.pdf
PWC	https://paperswithcode.com/paper/multimodal-deep-learning-for-finance
Repo	https://github.com/koos808/Papers_books_summary
Framework	none

Memorizing Normality to Detect Anomaly: Memory-augmented Deep Autoencoder for Unsupervised Anomaly Detection


Title	Memorizing Normality to Detect Anomaly: Memory-augmented Deep Autoencoder for Unsupervised Anomaly Detection
Authors	Dong Gong, Lingqiao Liu, Vuong Le, Budhaditya Saha, Moussa Reda Mansour, Svetha Venkatesh, Anton van den Hengel
Abstract	Deep autoencoders have been extensively used for anomaly detection. Trained on normal data, the autoencoder is expected to produce higher reconstruction error for abnormal inputs than for normal ones, which is adopted as a criterion for identifying anomalies. However, this assumption does not always hold in practice. It has been observed that sometimes the autoencoder “generalizes” so well that it can also reconstruct anomalies well, leading to missed detection of anomalies. To mitigate this drawback of autoencoder-based anomaly detectors, we propose to augment the autoencoder with a memory module and develop an improved autoencoder called the memory-augmented autoencoder, i.e. MemAE. Given an input, MemAE first obtains the encoding from the encoder and then uses it as a query to retrieve the most relevant memory items for reconstruction. At the training stage, the memory contents are updated and are encouraged to represent the prototypical elements of the normal data. At the test stage, the learned memory is fixed, and the reconstruction is obtained from a few selected memory records of the normal data. The reconstruction will thus tend to be close to a normal sample, so the reconstruction errors on anomalies are strengthened for anomaly detection. MemAE is free of assumptions on the data type and is thus general enough to be applied to different tasks. Experiments on various datasets prove the excellent generalization and high effectiveness of the proposed MemAE.
Tasks	Anomaly Detection, Unsupervised Anomaly Detection
Published	2019-04-04
URL	https://arxiv.org/abs/1904.02639v2
PDF	https://arxiv.org/pdf/1904.02639v2.pdf
PWC	https://paperswithcode.com/paper/memorizing-normality-to-detect-anomaly-memory
Repo	https://github.com/h19920918/memae
Framework	pytorch
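
The retrieval step described in the abstract (use the encoding as a query, read back a combination of memory items) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cosine similarity, the softmax weighting, and the names `cosine` and `memory_read` are all assumptions, and the selection of only "a few" memory records mentioned in the abstract is omitted here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def memory_read(query, memory):
    """Return (addressing weights, readout) for a query against a list of memory items."""
    sims = [cosine(query, item) for item in memory]
    # Numerically stable softmax over the similarities.
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The readout is a convex combination of the memory items.
    dim = len(memory[0])
    readout = [sum(w * item[d] for w, item in zip(weights, memory)) for d in range(dim)]
    return weights, readout

# Hypothetical two-slot memory; the query matches the first slot exactly.
weights, readout = memory_read([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Because the readout is built only from stored items, an input unlike anything in memory is pulled toward the stored prototypes, which is the intuition the abstract gives for larger reconstruction errors on anomalies.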

Learning Transformation Synchronization


Title	Learning Transformation Synchronization
Authors	Xiangru Huang, Zhenxiao Liang, Xiaowei Zhou, Yao Xie, Leonidas Guibas, Qixing Huang
Abstract	Reconstructing the 3D model of a physical object typically requires us to align the depth scans obtained from different camera poses into the same coordinate system. Solutions to this global alignment problem usually proceed in two steps. The first step estimates relative transformations between pairs of scans using an off-the-shelf technique. Due to limited information presented between pairs of scans, the resulting relative transformations are generally noisy. The second step then jointly optimizes the relative transformations among all input depth scans. A natural constraint used in this step is the cycle-consistency constraint, which allows us to prune incorrect relative transformations by detecting inconsistent cycles. The performance of such approaches, however, heavily relies on the quality of the input relative transformations. Instead of merely using the relative transformations as the input to perform transformation synchronization, we propose to use a neural network to learn the weights associated with each relative transformation. Our approach alternates between transformation synchronization using weighted relative transformations and predicting new weights of the input relative transformations using a neural network. We demonstrate the usefulness of this approach across a wide range of datasets.
Tasks
Published	2019-01-27
URL	https://arxiv.org/abs/1901.09458v2
PDF	https://arxiv.org/pdf/1901.09458v2.pdf
PWC	https://paperswithcode.com/paper/learning-transformation-synchronization
Repo	https://github.com/xiangruhuang/Learning2Sync
Framework	none
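
The cycle-consistency constraint mentioned in the abstract says that composing relative transformations around a closed loop of scans should give the identity. A toy sketch, assuming planar rotations represented by angles (the paper works with 3D transformations, and the function name `cycle_error` is hypothetical):

```python
import math

def cycle_error(relative_angles):
    """Residual rotation (radians, in [0, pi]) after composing 2D rotations around a loop."""
    total = sum(relative_angles)
    # Wrap the composed angle into (-pi, pi] before taking its magnitude.
    return abs(math.atan2(math.sin(total), math.cos(total)))

# Relative rotations derived from absolute poses 0.0, 0.5, 1.2 close the loop exactly.
consistent = cycle_error([0.5, 0.7, -1.2])
# Corrupting one relative estimate leaves a residual that flags the cycle as inconsistent.
inconsistent = cycle_error([0.5, 0.7, -0.9])
```

A large residual only says that some edge in the cycle is wrong, not which one, which is one reason methods in this area aggregate evidence over many cycles or, as proposed here, learn per-edge weights.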

miniSAM: A Flexible Factor Graph Non-linear Least Squares Optimization Framework


Title	miniSAM: A Flexible Factor Graph Non-linear Least Squares Optimization Framework
Authors	Jing Dong, Zhaoyang Lv
Abstract	Many problems in computer vision and robotics can be phrased as non-linear least squares optimization problems represented by factor graphs, for example, simultaneous localization and mapping (SLAM), structure from motion (SfM), motion planning, and control. We have developed an open-source C++/Python framework, miniSAM, for solving such factor graph based least squares problems. Compared to most existing frameworks for least squares solvers, miniSAM has (1) a full Python/NumPy API, which enables more agile development and easy binding with existing Python projects, and (2) a wide list of sparse linear solvers, including CUDA-enabled sparse linear solvers. Our benchmarking results show that miniSAM offers comparable performance on various types of problems, with a more flexible and smoother development experience.
Tasks	Motion Planning, Simultaneous Localization and Mapping
Published	2019-09-03
URL	https://arxiv.org/abs/1909.00903v1
PDF	https://arxiv.org/pdf/1909.00903v1.pdf
PWC	https://paperswithcode.com/paper/minisam-a-flexible-factor-graph-non-linear
Repo	https://github.com/dongjing3309/minisam
Framework	none

Enforcing temporal consistency in Deep Learning segmentation of brain MR images


Title	Enforcing temporal consistency in Deep Learning segmentation of brain MR images
Authors	Malav Bateriwala, Pierrick Bourgeat
Abstract	Longitudinal analysis has great potential to reveal developmental trajectories and monitor disease progression in medical imaging. This process relies on consistent and robust joint 4D segmentation. Traditional techniques are dependent on the similarity of images over time and the use of subject-specific priors to reduce random variation and improve the robustness and sensitivity of the overall longitudinal analysis. This is, however, slow and computationally intensive, as subject-specific templates need to be rebuilt every time. The focus of this work is to accelerate this analysis with the use of deep learning. The proposed approach is based on deep CNNs, incorporates semantic segmentation, and provides a longitudinal relationship for the same subject. The state of the art, using 3D patches as inputs to a modified UNet, provides results around ${0.91 \pm 0.5}$ Dice, and using a multi-view atlas in CNNs provides around the same results. In this work, different models are explored, each offering better accuracy and fast results while increasing the segmentation quality. These methods are evaluated on 135 scans from the EADC-ADNI Harmonized Hippocampus Protocol. The proposed CNN-based segmentation approaches demonstrate how 2D segmentation using prior slices can provide similar results to 3D segmentation while maintaining good continuity in the 3D dimension and improved speed. Just using 2D modified sagittal slices provides a better Dice and longitudinal analysis for a given subject. For the ADNI dataset, using the simple UNet CNN technique gives ${0.84 \pm 0.5}$, while using modified CNN techniques on the same input yields ${0.89 \pm 0.5}$. Rate of atrophy and RMS error are calculated for several test cases using various methods and analyzed.
Tasks	3D Medical Imaging Segmentation, 4D Spatio Temporal Semantic Segmentation, Brain Image Segmentation, Semantic Segmentation
Published	2019-06-13
URL	https://arxiv.org/abs/1906.07160v1
PDF	https://arxiv.org/pdf/1906.07160v1.pdf
PWC	https://paperswithcode.com/paper/enforcing-temporal-consistency-in-deep
Repo	https://github.com/zhusiling/UNets
Framework	pytorch
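
The Dice scores quoted in the abstract measure overlap between a predicted and a reference segmentation as twice the intersection divided by the sum of the two mask sizes. A minimal sketch on flattened binary masks; the function name `dice` and the convention of returning 1.0 for two empty masks are assumptions for illustration:

```python
def dice(pred, target):
    """Dice coefficient between two equal-length binary masks given as lists of 0/1."""
    intersection = sum(p * t for p, t in zip(pred, target))
    size_sum = sum(pred) + sum(target)
    # Two empty masks agree perfectly; this convention is an assumption.
    if size_sum == 0:
        return 1.0
    return 2.0 * intersection / size_sum

# Hypothetical 4-voxel example: one voxel overlaps, mask sizes are 2 and 1.
score = dice([1, 1, 0, 0], [1, 0, 0, 0])
```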

Med3D: Transfer Learning for 3D Medical Image Analysis


Title	Med3D: Transfer Learning for 3D Medical Image Analysis
Authors	Sihong Chen, Kai Ma, Yefeng Zheng
Abstract	The performance of deep learning is significantly affected by the volume of training data. Models pre-trained on massive datasets such as ImageNet have become a powerful weapon for speeding up training convergence and improving accuracy. Similarly, models based on large datasets are important for the development of deep learning in 3D medical images. However, it is extremely challenging to build a sufficiently large dataset due to the difficulty of data acquisition and annotation in 3D medical imaging. We aggregate the datasets from several medical challenges to build the 3DSeg-8 dataset with diverse modalities, target organs, and pathologies. To extract general medical three-dimensional (3D) features, we design a heterogeneous 3D network called Med3D to co-train on multi-domain 3DSeg-8 so as to make a series of pre-trained models. We transfer Med3D pre-trained models to lung segmentation on the LIDC dataset, pulmonary nodule classification on the LIDC dataset, and liver segmentation on the LiTS challenge. Experiments show that Med3D can accelerate the training convergence speed of target 3D medical tasks 2 times compared with a model pre-trained on the Kinetics dataset, and 10 times compared with training from scratch, as well as improve accuracy by 3% to 20%. Transferring our Med3D model to the state-of-the-art DenseASPP segmentation network, in the case of a single model, we achieve a 94.6% Dice coefficient, which approaches the result of top-ranked algorithms on the LiTS challenge.
Tasks	3D Medical Imaging Segmentation, Liver Segmentation, Transfer Learning
Published	2019-04-01
URL	https://arxiv.org/abs/1904.00625v4
PDF	https://arxiv.org/pdf/1904.00625v4.pdf
PWC	https://paperswithcode.com/paper/med3d-transfer-learning-for-3d-medical-image
Repo	https://github.com/Tencent/MedicalNet
Framework	pytorch

Clustering in graphs and hypergraphs with categorical edge labels


Title	Clustering in graphs and hypergraphs with categorical edge labels
Authors	Ilya Amburg, Nate Veldt, Austin R. Benson
Abstract	Modern graph or network datasets often contain rich structure that goes beyond simple pairwise connections between nodes. This calls for complex representations that can capture, for instance, edges of different types as well as so-called “higher-order interactions” that involve more than two nodes at a time. However, we have fewer rigorous methods that can provide insight from such representations. Here, we develop a computational framework for the problem of clustering hypergraphs with categorical edge labels (that is, different interaction types), where clusters correspond to groups of nodes that frequently participate in the same type of interaction. Our methodology is based on a combinatorial objective function that is related to correlation clustering on graphs but enables the design of much more efficient algorithms that also seamlessly generalize to hypergraphs. When there are only two label types, our objective can be optimized in polynomial time, using an algorithm based on minimum cuts. Minimizing our objective becomes NP-hard with more than two label types, but we develop fast approximation algorithms based on linear programming relaxations that have theoretical cluster quality guarantees. We demonstrate the efficacy of our algorithms and the scope of the model through problems in edge-label community detection, clustering with temporal data, and exploratory data analysis.
Tasks	Community Detection
Published	2019-10-22
URL	https://arxiv.org/abs/1910.09943v2
PDF	https://arxiv.org/pdf/1910.09943v2.pdf
PWC	https://paperswithcode.com/paper/hypergraph-clustering-with-categorical-edge
Repo	https://github.com/nveldt/CategoricalEdgeClustering
Framework	none
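
One plausible reading of the combinatorial objective in the abstract is: give each node one label, and count a hyperedge as a mistake unless every node in it carries that edge's label. The sketch below only evaluates such an objective for a given assignment; the exact form of the objective is an assumption not spelled out in the abstract, and `count_mistakes` and the toy data are hypothetical.

```python
def count_mistakes(hyperedges, node_labels):
    """Count hyperedges whose nodes are not all assigned the edge's own label.

    hyperedges: list of (tuple_of_nodes, edge_label) pairs.
    node_labels: dict mapping each node to its assigned label.
    """
    mistakes = 0
    for nodes, edge_label in hyperedges:
        if any(node_labels[v] != edge_label for v in nodes):
            mistakes += 1
    return mistakes

# Toy hypergraph with two interaction types.
edges = [((0, 1, 2), "red"), ((2, 3), "blue"), ((0, 1), "red")]
assignment = {0: "red", 1: "red", 2: "red", 3: "blue"}
n_mistakes = count_mistakes(edges, assignment)  # the (2, 3) "blue" edge is unsatisfied
```

Evaluating the objective is easy; the abstract's point is that minimizing it is polynomial for two labels and NP-hard beyond that, which this sketch does not attempt.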

Unsupervised Co-Learning on $\mathcal{G}$-Manifolds Across Irreducible Representations


Title	Unsupervised Co-Learning on $\mathcal{G}$-Manifolds Across Irreducible Representations
Authors	Yifeng Fan, Tingran Gao, Zhizhen Zhao
Abstract	We introduce a novel co-learning paradigm for manifolds naturally equipped with a group action, motivated by recent developments on learning a manifold from attached fibre bundle structures. We utilize a representation theoretic mechanism that canonically associates multiple independent vector bundles over a common base manifold, which provides multiple views for the geometry of the underlying manifold. The consistency across these fibre bundles provides a common base for performing unsupervised manifold co-learning through the redundancy created artificially across irreducible representations of the transformation group. We demonstrate the efficacy of the proposed algorithmic paradigm through drastically improved robust nearest neighbor search and community detection on rotation-invariant cryo-electron microscopy image analysis.
Tasks	Community Detection
Published	2019-06-06
URL	https://arxiv.org/abs/1906.02707v3
PDF	https://arxiv.org/pdf/1906.02707v3.pdf
PWC	https://paperswithcode.com/paper/unsupervised-co-learning-on-mathcalg
Repo	https://github.com/frankfyf/G-manifold-learning
Framework	none

In Search of Credible News


Title	In Search of Credible News
Authors	Momchil Hardalov, Ivan Koychev, Preslav Nakov
Abstract	We study the problem of finding fake online news. This is an important problem as news of questionable credibility have recently been proliferating in social media at an alarming scale. As this is an understudied problem, especially for languages other than English, we first collect and release to the research community three new balanced credible vs. fake news datasets derived from four online sources. We then propose a language-independent approach for automatically distinguishing credible from fake news, based on a rich feature set. In particular, we use linguistic (n-gram), credibility-related (capitalization, punctuation, pronoun use, sentiment polarity), and semantic (embeddings and DBPedia data) features. Our experiments on three different testsets show that our model can distinguish credible from fake news with very high accuracy.
Tasks
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08125v1
PDF	https://arxiv.org/pdf/1911.08125v1.pdf
PWC	https://paperswithcode.com/paper/in-search-of-credible-news
Repo	https://github.com/mhardalov/news-credibility
Framework	none

A Machine Learning Framework for Solving High-Dimensional Mean Field Game and Mean Field Control Problems


Title	A Machine Learning Framework for Solving High-Dimensional Mean Field Game and Mean Field Control Problems
Authors	Lars Ruthotto, Stanley Osher, Wuchen Li, Levon Nurbekyan, Samy Wu Fung
Abstract	Mean field games (MFG) and mean field control (MFC) are critical classes of multi-agent models for efficient analysis of massive populations of interacting agents. Their areas of application span topics in economics, finance, game theory, industrial engineering, crowd motion, and more. In this paper, we provide a flexible machine learning framework for the numerical solution of potential MFG and MFC models. State-of-the-art numerical methods for solving such problems utilize spatial discretization that leads to a curse-of-dimensionality. We approximately solve high-dimensional problems by combining Lagrangian and Eulerian viewpoints and leveraging recent advances from machine learning. More precisely, we work with a Lagrangian formulation of the problem and enforce the underlying Hamilton-Jacobi-Bellman (HJB) equation that is derived from the Eulerian formulation. Finally, a tailored neural network parameterization of the MFG/MFC solution helps us avoid any spatial discretization. Our numerical results include the approximate solution of 100-dimensional instances of optimal transport and crowd motion problems on a standard work station and a validation using an Eulerian solver in two dimensions. These results open the door to much-anticipated applications of MFG and MFC models that were beyond reach with existing numerical methods.
Tasks
Published	2019-12-04
URL	https://arxiv.org/abs/1912.01825v3
PDF	https://arxiv.org/pdf/1912.01825v3.pdf
PWC	https://paperswithcode.com/paper/a-machine-learning-framework-for-solving-high
Repo	https://github.com/EmoryMLIP/MFGnet.jl
Framework	none

On the Utility of Learning about Humans for Human-AI Coordination


Title	On the Utility of Learning about Humans for Human-AI Coordination
Authors	Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan
Abstract	While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves. Agents that assume their partner to be optimal or similar to them can converge to coordination protocols that fail to understand and be understood by humans. To demonstrate this, we introduce a simple environment that requires challenging coordination, based on the popular game Overcooked, and learn a simple model that mimics human play. We evaluate the performance of agents trained via self-play and population-based training. These agents perform very well when paired with themselves, but when paired with our human model, they are significantly worse than agents designed to play with the human model. An experiment with a planning algorithm yields the same conclusion, though only when the human-aware planner is given the exact human model that it is playing with. A user study with real humans shows this pattern as well, though less strongly. Qualitatively, we find that the gains come from having the agent adapt to the human’s gameplay. Given this result, we suggest several approaches for designing agents that learn about humans in order to better coordinate with them. Code is available at https://github.com/HumanCompatibleAI/overcooked_ai.
Tasks
Published	2019-10-13
URL	https://arxiv.org/abs/1910.05789v2
PDF	https://arxiv.org/pdf/1910.05789v2.pdf
PWC	https://paperswithcode.com/paper/on-the-utility-of-learning-about-humans-for
Repo	https://github.com/HumanCompatibleAI/overcooked_ai
Framework	none

Individual common dolphin identification via metric embedding learning


Title	Individual common dolphin identification via metric embedding learning
Authors	Soren Bouma, Matthew D. M. Pawley, Krista Hupman, Andrew Gilman
Abstract	Photo-identification (photo-id) of dolphin individuals is a commonly used technique in ecological sciences to monitor state and health of individuals, as well as to study the social structure and distribution of a population. Traditional photo-id involves a laborious manual process of matching each dolphin fin photograph captured in the field to a catalogue of known individuals. We examine this problem in the context of open-set recognition and utilise a triplet loss function to learn a compact representation of fin images in a Euclidean embedding, where the Euclidean distance metric represents fin similarity. We show that this compact representation can be successfully learnt from a fairly small (in deep learning context) training set and still generalise well to out-of-sample identities (completely new dolphin individuals), with top-1 and top-5 test set (37 individuals) accuracy of $90.5\pm2$ and $93.6\pm1$ percent. In the presence of 1200 distractors, top-1 accuracy dropped by $12\%$; however, top-5 accuracy saw only a $2.8\%$ drop.
Tasks	Open Set Learning
Published	2019-01-09
URL	http://arxiv.org/abs/1901.03662v1
PDF	http://arxiv.org/pdf/1901.03662v1.pdf
PWC	https://paperswithcode.com/paper/individual-common-dolphin-identification-via
Repo	https://github.com/omallo/kaggle-whale
Framework	pytorch
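
The triplet loss named in the abstract pushes an anchor image's embedding closer to another image of the same individual than to an image of a different individual, by at least a margin. A minimal sketch with plain Euclidean distance; the margin value, the use of unsquared distances, and the function names are assumptions rather than details taken from the paper:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin (margin is an assumed value)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Well-separated triplet: the positive is much closer than the negative, so the loss is zero.
easy = triplet_loss([0.0, 0.0], [0.0, 0.1], [1.0, 0.0])
# Violating triplet: the negative is closer than the positive, so the loss is positive.
hard = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.5])
```

Since identification then reduces to nearest-neighbour search in the embedding, new individuals can be added to the catalogue without retraining, which fits the open-set framing in the abstract.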

Accept Synthetic Objects as Real: End-to-End Training of Attentive Deep Visuomotor Policies for Manipulation in Clutter


Title	Accept Synthetic Objects as Real: End-to-End Training of Attentive Deep Visuomotor Policies for Manipulation in Clutter
Authors	Pooya Abolghasemi, Ladislau Bölöni
Abstract	Recent research demonstrated that it is feasible to end-to-end train multi-task deep visuomotor policies for robotic manipulation using variations of learning from demonstration (LfD) and reinforcement learning (RL). In this paper, we extend the capabilities of end-to-end LfD architectures to object manipulation in clutter. We start by introducing a data augmentation procedure called Accept Synthetic Objects as Real (ASOR). Using ASOR we develop two network architectures: implicit attention ASOR-IA and explicit attention ASOR-EA. Both architectures use the same training data (demonstrations in uncluttered environments) as previous approaches. Experimental results show that ASOR-IA and ASOR-EA succeed in a significant fraction of trials in cluttered environments where previous approaches never succeed. In addition, we find that both ASOR-IA and ASOR-EA outperform previous approaches even in uncluttered environments, with ASOR-EA performing better even in clutter compared to the previous best baseline in an uncluttered environment.
Tasks	Data Augmentation, Imitation Learning, Robotic Grasping
Published	2019-09-24
URL	https://arxiv.org/abs/1909.11128v2
PDF	https://arxiv.org/pdf/1909.11128v2.pdf
PWC	https://paperswithcode.com/paper/accept-synthetic-objects-as-real-end-to-end
Repo	https://github.com/pouyaAB/Accept_Synthetic_Objects_as_Real
Framework	none

Adversarial Attacks on GMM i-vector based Speaker Verification Systems


Title	Adversarial Attacks on GMM i-vector based Speaker Verification Systems
Authors	Xu Li, Jinghua Zhong, Xixin Wu, Jianwei Yu, Xunying Liu, Helen Meng
Abstract	This work investigates the vulnerability of Gaussian Mixture Model (GMM) i-vector based speaker verification systems to adversarial attacks, and the transferability of adversarial samples crafted from GMM i-vector based systems to x-vector based systems. In detail, we formulate the GMM i-vector system as a scoring function of enrollment and testing utterance pairs. Then we leverage the fast gradient sign method (FGSM) to optimize testing utterances for adversarial sample generation. These adversarial samples are used to attack both GMM i-vector and x-vector systems. We measure the system vulnerability by the degradation of equal error rate and false acceptance rate. Experiment results show that GMM i-vector systems are seriously vulnerable to adversarial attacks, and the crafted adversarial samples prove to be transferable and pose threats to neural network speaker embedding based systems (e.g. x-vector systems).
Tasks	Speaker Verification
Published	2019-11-08
URL	https://arxiv.org/abs/1911.03078v2
PDF	https://arxiv.org/pdf/1911.03078v2.pdf
PWC	https://paperswithcode.com/paper/adversarial-attacks-on-gmm-i-vector-based
Repo	https://github.com/lixucuhk/adversarial-attack-on-GMM-i-vector-based-speaker-verification-systems
Framework	pytorch
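
The fast gradient sign method referenced in the abstract perturbs an input by a fixed step in the direction of the sign of the gradient of some objective with respect to that input. The sketch below applies one such step to a toy linear scoring function, which stands in for the verification score purely for illustration; the real attack differentiates through the GMM i-vector scoring pipeline, and every name here is hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def fgsm_step(x, grad, epsilon):
    """One FGSM step: move each input coordinate by epsilon along the gradient's sign."""
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

def linear_score(x, w):
    """Toy scoring function (an assumption); its gradient with respect to x is simply w."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical features and weights; the step raises the score by epsilon * sum(|w|).
x = [0.5, -0.2, 0.1]
w = [1.0, -2.0, 0.0]
x_adv = fgsm_step(x, grad=w, epsilon=0.1)
score_before = linear_score(x, w)
score_after = linear_score(x_adv, w)
```

Stepping along the positive sign raises the score, which would correspond to pushing an impostor trial toward acceptance; lowering the score of a genuine trial would use the negated gradient.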