October 18, 2019


Paper Group ANR 543


MMFNet: A Multi-modality MRI Fusion Network for Segmentation of Nasopharyngeal Carcinoma


Title	MMFNet: A Multi-modality MRI Fusion Network for Segmentation of Nasopharyngeal Carcinoma
Authors	Huai Chen, Yuxiao Qi, Yong Yin, Tengxiang Li, Xiaoqing Liu, Xiuli Li, Guanzhong Gong, Lisheng Wang
Abstract	Segmentation of nasopharyngeal carcinoma (NPC) from magnetic resonance images (MRI) is a crucial prerequisite for NPC radiotherapy. However, manual segmentation of NPC is time-consuming and labor-intensive. Additionally, single-modality MRI generally cannot provide enough information for accurate delineation. Therefore, a multi-modality MRI fusion network (MMFNet) based on three MRI modalities (T1, T2 and contrast-enhanced T1) is proposed for accurate segmentation of NPC. The backbone of MMFNet is a multi-encoder network, consisting of several encoders that capture modality-specific features and a single decoder that fuses them and obtains high-level features for NPC segmentation. A fusion block is presented to effectively fuse features from multi-modality MRI. It first recalibrates the low-level features captured by the modality-specific encoders to highlight both informative features and regions of interest, then fuses the weighted features through a residual fusion block to balance the fused features against the high-level features from the decoder. Moreover, a training strategy named self-transfer, which uses pre-trained modality-specific encoders to initialize the multi-encoder network, is proposed to fully exploit the information in the different MRI modalities. The proposed method based on multi-modality MRI can effectively segment NPC, and its advantages are validated by extensive experiments.
Tasks
Published	2018-12-25
URL	https://arxiv.org/abs/1812.10033v6
PDF	https://arxiv.org/pdf/1812.10033v6.pdf
PWC	https://paperswithcode.com/paper/mmfnet-a-multi-modality-mri-fusion-network
Repo
Framework

Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks


Title	Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
Authors	Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg
Abstract	Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. However, it is non-trivial to manually design a robot controller that combines modalities with very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy on real robots due to sample complexity. We use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. We evaluate our method on a peg insertion task, generalizing over different geometry, configurations, and clearances, while being robust to external perturbations. Results for simulated and real robot experiments are presented.
Tasks
Published	2018-10-24
URL	http://arxiv.org/abs/1810.10191v2
PDF	http://arxiv.org/pdf/1810.10191v2.pdf
PWC	https://paperswithcode.com/paper/making-sense-of-vision-and-touch-self
Repo
Framework

Unary and Binary Classification Approaches and their Implications for Authorship Verification


Title	Unary and Binary Classification Approaches and their Implications for Authorship Verification
Authors	Oren Halvani, Christian Winter, Lukas Graner
Abstract	Retrieving indexed documents not by their topical content but by their writing style opens the door to a number of applications in information retrieval (IR). One application is to retrieve textual content of a certain author X, where the queried IR system is provided beforehand with a set of reference texts of X. Authorship verification (AV), which is a research subject in the field of digital text forensics, is suitable for this purpose. The task of AV is to determine whether two documents (i.e. an indexed and a reference document) have been written by the same author X. Even though AV represents a unary classification problem, a number of existing approaches consider it as a binary classification task. However, the underlying classification model of an AV method has a number of serious implications regarding its prerequisites, evaluability, and applicability. In our comprehensive literature review, we observed several misunderstandings regarding the differentiation of unary and binary AV approaches that require consideration. The objective of this paper is, therefore, to clarify these by proposing clear criteria and new properties that aim to improve the characterization of existing and future AV approaches. Given both, we investigate the applicability of eleven existing unary and binary AV methods as well as four generic unary classification algorithms on two self-compiled corpora. Furthermore, we highlight an important issue concerning the evaluation of AV methods based on fixed decision criteria, which has received no attention in previous AV studies.
Tasks	Information Retrieval
Published	2018-12-31
URL	http://arxiv.org/abs/1901.00399v1
PDF	http://arxiv.org/pdf/1901.00399v1.pdf
PWC	https://paperswithcode.com/paper/unary-and-binary-classification-approaches
Repo
Framework

Natural Language Multitasking: Analyzing and Improving Syntactic Saliency of Hidden Representations


Title	Natural Language Multitasking: Analyzing and Improving Syntactic Saliency of Hidden Representations
Authors	Gino Brunner, Yuyi Wang, Roger Wattenhofer, Michael Weigelt
Abstract	We train multi-task autoencoders on linguistic tasks and analyze the learned hidden sentence representations. The representations change significantly when translation and part-of-speech decoders are added. The more decoders a model employs, the better it clusters sentences according to their syntactic similarity, as the representation space becomes less entangled. We explore the structure of the representation space by interpolating between sentences, which yields interesting pseudo-English sentences, many of which have recognizable syntactic structure. Lastly, we point out an interesting property of our models: The difference-vector between two sentences can be added to change a third sentence with similar features in a meaningful way.
Tasks
Published	2018-01-18
URL	http://arxiv.org/abs/1801.06024v1
PDF	http://arxiv.org/pdf/1801.06024v1.pdf
PWC	https://paperswithcode.com/paper/natural-language-multitasking-analyzing-and
Repo
Framework
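
The interpolation and difference-vector operations mentioned in the abstract can be sketched on plain vectors. This is only an illustration under stated assumptions: the three-dimensional vectors below are made up and stand in for the hidden sentence representations, the function names `interpolate` and `apply_difference` are hypothetical, and the decoding step that would turn a vector back into a sentence is not shown.

```python
from typing import List

Vector = List[float]


def interpolate(a: Vector, b: Vector, steps: int) -> List[Vector]:
    """Return `steps` evenly spaced points on the segment from a to b (endpoints included)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    points = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 to 1.0
        points.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    return points


def apply_difference(a: Vector, b: Vector, c: Vector) -> Vector:
    """Add the difference vector (b - a) to a third vector c."""
    return [z + (y - x) for x, y, z in zip(a, b, c)]


# Made-up stand-ins for three sentence representations (assumption).
s1 = [0.0, 1.0, 2.0]
s2 = [2.0, 1.0, 0.0]
s3 = [1.0, 0.0, 1.0]

path = interpolate(s1, s2, 5)           # s1, three intermediate points, s2
shifted = apply_difference(s1, s2, s3)  # s3 moved by the s1 -> s2 direction
```

In a real model each point of `path` and the vector `shifted` would be passed through a decoder to obtain text.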

How You See Me


Title	How You See Me
Authors	Rohit Gandikota, Deepak Mishra
Abstract	Convolutional Neural Networks (CNNs) are among the most powerful tools in present-day science. Much research has gone into improving their performance and robustness, while their internal workings have been left largely unexplored. They are often described as black boxes that can map non-linear data very effectively. This paper tries to show how a CNN has learned to look at an image. The proposed algorithm exploits the basic math of a CNN to backtrack the important pixels it considers when making a prediction. It is a simple algorithm that requires no training of its own and works on top of a pre-trained CNN classifier.
Tasks
Published	2018-11-20
URL	http://arxiv.org/abs/1811.08152v1
PDF	http://arxiv.org/pdf/1811.08152v1.pdf
PWC	https://paperswithcode.com/paper/how-you-see-me
Repo
Framework

Escaping Saddle Points in Constrained Optimization


Title	Escaping Saddle Points in Constrained Optimization
Authors	Aryan Mokhtari, Asuman Ozdaglar, Ali Jadbabaie
Abstract	In this paper, we study the problem of escaping from saddle points in smooth nonconvex optimization problems subject to a convex set $\mathcal{C}$. We propose a generic framework that yields convergence to a second-order stationary point of the problem, if the convex set $\mathcal{C}$ is simple for a quadratic objective function. Specifically, our results hold if one can find a $\rho$-approximate solution of a quadratic program subject to $\mathcal{C}$ in polynomial time, where $\rho<1$ is a positive constant that depends on the structure of the set $\mathcal{C}$. Under this condition, we show that the sequence of iterates generated by the proposed framework reaches an $(\epsilon,\gamma)$-second order stationary point (SOSP) in at most $\mathcal{O}(\max\{\epsilon^{-2},\rho^{-3}\gamma^{-3}\})$ iterations. We further characterize the overall complexity of reaching an SOSP when the convex set $\mathcal{C}$ can be written as a set of quadratic constraints and the objective function Hessian has a specific structure over the convex set $\mathcal{C}$. Finally, we extend our results to the stochastic setting and characterize the number of stochastic gradient and Hessian evaluations to reach an $(\epsilon,\gamma)$-SOSP.
Tasks
Published	2018-09-06
URL	http://arxiv.org/abs/1809.02162v2
PDF	http://arxiv.org/pdf/1809.02162v2.pdf
PWC	https://paperswithcode.com/paper/escaping-saddle-points-in-constrained
Repo
Framework
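
As background for the abstract above, the unconstrained notion of an $(\epsilon,\gamma)$-SOSP is usually stated as below. This is the standard textbook condition, given here as an assumption; the paper's own definition over the convex set $\mathcal{C}$ is not reproduced in the abstract and may differ. The second display is a worked instance of the dominant term in the iteration bound, using hypothetical values $\epsilon = 10^{-2}$, $\gamma = 10^{-1}$, $\rho = 1/2$ chosen only for the arithmetic.

```latex
% Unconstrained (epsilon, gamma)-second-order stationarity (standard background, assumed)
\[
\|\nabla f(x)\| \le \epsilon
\quad\text{and}\quad
\lambda_{\min}\!\left(\nabla^2 f(x)\right) \ge -\gamma .
\]

% Dominant term of the iteration bound for hypothetical values
% epsilon = 10^{-2}, gamma = 10^{-1}, rho = 1/2
\[
\max\{\epsilon^{-2},\ \rho^{-3}\gamma^{-3}\}
= \max\{10^{4},\ 2^{3}\cdot 10^{3}\}
= \max\{10000,\ 8000\}
= 10^{4}.
\]
```

With these values the gradient term $\epsilon^{-2}$ dominates; a smaller $\rho$ or $\gamma$ would shift the balance toward the second term.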

Pyramid Person Matching Network for Person Re-identification


Title	Pyramid Person Matching Network for Person Re-identification
Authors	Chaojie Mao, Yingming Li, Zhongfei Zhang, Yaqing Zhang, Xi Li
Abstract	In this work, we present a deep convolutional pyramid person matching network (PPMN) with specially designed Pyramid Matching Module to address the problem of person re-identification. The architecture takes a pair of RGB images as input, and outputs a similarity value indicating whether the two input images represent the same person or not. Based on deep convolutional neural networks, our approach first learns the discriminative semantic representation with the semantic-component-aware features for persons and then employs the Pyramid Matching Module to match the common semantic-components of persons, which is robust to the variation of spatial scales and misalignment of locations posed by viewpoint changes. The above two processes are jointly optimized via a unified end-to-end deep learning scheme. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our approach against the state-of-the-art approaches, especially on the rank-1 recognition rate.
Tasks	Person Re-Identification
Published	2018-03-07
URL	http://arxiv.org/abs/1803.02547v1
PDF	http://arxiv.org/pdf/1803.02547v1.pdf
PWC	https://paperswithcode.com/paper/pyramid-person-matching-network-for-person-re
Repo
Framework

Robust Maximization of Non-Submodular Objectives


Title	Robust Maximization of Non-Submodular Objectives
Authors	Ilija Bogunovic, Junyao Zhao, Volkan Cevher
Abstract	We study the problem of maximizing a monotone set function subject to a cardinality constraint $k$ in the setting where some number of elements $\tau$ is deleted from the returned set. The focus of this work is on the worst-case adversarial setting. While there exist constant-factor guarantees when the function is submodular, there are no guarantees for non-submodular objectives. In this work, we present a new algorithm Oblivious-Greedy and prove the first constant-factor approximation guarantees for a wider class of non-submodular objectives. The obtained theoretical bounds are the first constant-factor bounds that also hold in the linear regime, i.e. when the number of deletions $\tau$ is linear in $k$. Our bounds depend on established parameters such as the submodularity ratio and some novel ones such as the inverse curvature. We bound these parameters for two important objectives including support selection and variance reduction. Finally, we numerically demonstrate the robust performance of Oblivious-Greedy for these two objectives on various datasets.
Tasks
Published	2018-02-20
URL	http://arxiv.org/abs/1802.07073v2
PDF	http://arxiv.org/pdf/1802.07073v2.pdf
PWC	https://paperswithcode.com/paper/robust-maximization-of-non-submodular
Repo
Framework
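
The problem setting in the abstract (pick at most $k$ elements, then lose $\tau$ of them to an adversary) can be made concrete with a small sketch. Note the assumptions: this is the plain greedy baseline plus a brute-force worst-case check, not the paper's Oblivious-Greedy algorithm, and the coverage objective and the `COVERS` data are made up for illustration (coverage happens to be submodular, so it only demonstrates the setting, not the non-submodular case).

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, Set

SetFunction = Callable[[FrozenSet[int]], float]


def greedy(ground: Iterable[int], f: SetFunction, k: int) -> Set[int]:
    """Plain greedy: repeatedly add the element that gives the largest value of f."""
    chosen: Set[int] = set()
    remaining = set(ground)
    for _ in range(min(k, len(remaining))):
        # sorted() makes tie-breaking deterministic (smallest index wins)
        best = max(sorted(remaining), key=lambda e: f(frozenset(chosen | {e})))
        chosen.add(best)
        remaining.remove(best)
    return chosen


def worst_case_value(selected: Iterable[int], f: SetFunction, tau: int) -> float:
    """Value left after an adversary deletes the most damaging tau elements (brute force)."""
    selected = set(selected)
    tau = min(tau, len(selected))
    return min(
        f(frozenset(selected - set(removed)))
        for removed in combinations(sorted(selected), tau)
    )


# Hypothetical toy objective: number of items covered by the chosen elements.
COVERS: Dict[int, Set[int]] = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 2}}


def coverage(s: FrozenSet[int]) -> float:
    covered: Set[int] = set()
    for e in s:
        covered |= COVERS[e]
    return float(len(covered))


picked = greedy(COVERS.keys(), coverage, k=2)
robust_value = worst_case_value(picked, coverage, tau=1)
```

The brute-force check enumerates every deletion of size $\tau$, so it is only practical for tiny sets; it is meant to show what "worst-case adversarial" means, not how to optimize for it.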

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification


Title	Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification
Authors	Zilong Zhong, Jonathan Li
Abstract	High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small amount of training data.
Tasks	Hyperspectral Image Classification, Image Classification
Published	2018-02-10
URL	http://arxiv.org/abs/1802.03495v1
PDF	http://arxiv.org/pdf/1802.03495v1.pdf
PWC	https://paperswithcode.com/paper/generative-adversarial-networks-and-1
Repo
Framework

Ontology Alignment in the Biomedical Domain Using Entity Definitions and Context


Title	Ontology Alignment in the Biomedical Domain Using Entity Definitions and Context
Authors	Lucy Lu Wang, Chandra Bhagavatula, Mark Neumann, Kyle Lo, Chris Wilhelm, Waleed Ammar
Abstract	Ontology alignment is the task of identifying semantically equivalent entities from two given ontologies. Different ontologies have different representations of the same entity, resulting in a need to de-duplicate entities when merging ontologies. We propose a method for enriching entities in an ontology with external definition and context information, and use this additional information for ontology alignment. We develop a neural architecture capable of encoding the additional information when available, and show that the addition of external data results in an F1-score of 0.69 on the Ontology Alignment Evaluation Initiative (OAEI) largebio SNOMED-NCI subtask, comparable with the entity-level matchers in a SOTA system.
Tasks
Published	2018-06-20
URL	http://arxiv.org/abs/1806.07976v1
PDF	http://arxiv.org/pdf/1806.07976v1.pdf
PWC	https://paperswithcode.com/paper/ontology-alignment-in-the-biomedical-domain
Repo
Framework

Multiagent Soft Q-Learning


Title	Multiagent Soft Q-Learning
Authors	Ermo Wei, Drew Wicke, David Freelan, Sean Luke
Abstract	Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.
Tasks	Policy Gradient Methods, Q-Learning
Published	2018-04-25
URL	http://arxiv.org/abs/1804.09817v1
PDF	http://arxiv.org/pdf/1804.09817v1.pdf
PWC	https://paperswithcode.com/paper/multiagent-soft-q-learning
Repo
Framework
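
Relative overgeneralization, the pathology named in the abstract, can be illustrated with a small cooperative matrix game. The sketch below is an assumption-laden toy and does not reproduce the paper's algorithm or experiments: the payoff matrix is illustrative, the partner is assumed to act uniformly at random, and `soft_value` is the generic log-sum-exp "soft" maximum that soft Q-learning-style methods use in place of a hard max.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative shared payoff: PAYOFF[i][j] is the reward when agent 1 plays i
# and agent 2 plays j (values are an assumption, not taken from the paper).
PAYOFF: List[List[float]] = [
    [11.0, -30.0, 0.0],
    [-30.0, 7.0, 6.0],
    [0.0, 0.0, 5.0],
]


def average_payoff(row: Sequence[float]) -> float:
    """Expected reward of an action when the partner acts uniformly at random."""
    return sum(row) / len(row)


def soft_value(values: Sequence[float], temperature: float) -> float:
    """temperature * log(sum(exp(v / temperature))), computed in a numerically stable way."""
    m = max(values)
    return m + temperature * math.log(
        sum(math.exp((v - m) / temperature) for v in values)
    )


averages = [average_payoff(row) for row in PAYOFF]
choice_by_average = max(range(len(PAYOFF)), key=lambda i: averages[i])

best_joint: Tuple[int, int] = max(
    ((i, j) for i in range(3) for j in range(3)),
    key=lambda ij: PAYOFF[ij[0]][ij[1]],
)

# At a low temperature the soft value approaches the row maximum.
soft_values = [soft_value(row, temperature=0.01) for row in PAYOFF]
choice_by_soft_value = max(range(len(PAYOFF)), key=lambda i: soft_values[i])
```

Averaging over the partner's actions favors action 2 (the "safe" row), even though the best joint action is (0, 0); ranking rows by a low-temperature soft value favors action 0 instead.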

Zipf’s law in 50 languages: its structural pattern, linguistic interpretation, and cognitive motivation


Title	Zipf’s law in 50 languages: its structural pattern, linguistic interpretation, and cognitive motivation
Authors	Shuiyuan Yu, Chunshan Xu, Haitao Liu
Abstract	Zipf’s law has been found in many human-related fields, including language, where the frequency of a word is persistently found to be a power-law function of its frequency rank. However, there is much dispute over whether it is a universal law or a statistical artifact, and little is known about what mechanisms may have shaped it. To answer these questions, this study conducted a large-scale cross-language investigation into Zipf’s law. The statistical results show that the word frequency distributions of 50 languages all share a 3-segment structural pattern, with each segment demonstrating distinctive linguistic properties and the lower segment invariably bending downwards to deviate from theoretical expectation. This finding indicates that this deviation is a fundamental and universal feature of word frequency distributions in natural languages, not the statistical error of low-frequency words. A computer simulation based on the dual-process theory yields Zipf’s law with the same structural pattern, suggesting that Zipf’s law in natural languages is motivated by common cognitive mechanisms. These results show that Zipf’s law in languages is motivated by cognitive mechanisms like dual-processing that govern human verbal behaviors.
Tasks
Published	2018-07-05
URL	http://arxiv.org/abs/1807.01855v1
PDF	http://arxiv.org/pdf/1807.01855v1.pdf
PWC	https://paperswithcode.com/paper/zipfs-law-in-50-languages-its-structural
Repo
Framework
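
The rank-frequency relationship behind Zipf’s law can be checked on any token list with a few lines of code. The sketch below is a minimal illustration, not the study's methodology: the token list is made up so that frequencies are exactly proportional to 1/rank, and fitting a single least-squares slope in log-log space is a simplification, since the abstract reports a 3-segment pattern rather than one straight line.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def rank_frequency(tokens: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (rank, frequency) pairs, with rank 1 for the most frequent token."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(pairs: List[Tuple[int, int]]) -> float:
    """Least-squares slope of log(frequency) against log(rank)."""
    if len(pairs) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up corpus with frequencies 12, 6, 4, 3, i.e. 12 / rank (assumption).
tokens = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
pairs = rank_frequency(tokens)
slope = loglog_slope(pairs)
```

A slope close to -1 corresponds to the classic form of the law; on real corpora one would inspect the log-log curve segment by segment rather than rely on a single fit.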

Semi-Semantic Line-Cluster Assisted Monocular SLAM for Indoor Environments


Title	Semi-Semantic Line-Cluster Assisted Monocular SLAM for Indoor Environments
Authors	Ting Sun, Dezhen Song, Dit-Yan Yeung, Ming Liu
Abstract	This paper presents a novel method to reduce the scale drift for indoor monocular simultaneous localization and mapping (SLAM). We leverage the prior knowledge that in the indoor environment, the line segments form tight clusters, e.g. many door frames in a straight corridor are of the same shape, size and orientation, so the same edges of these door frames form a tight line segment cluster. We implement our method in the popular ORB-SLAM2, which also serves as our baseline. In the front end we detect the line segments in each frame and incrementally cluster them in the 3D space. In the back end, we optimize the map imposing the constraint that the line segments of the same cluster should be the same. Experimental results show that our proposed method successfully reduces the scale drift for indoor monocular SLAM.
Tasks	Simultaneous Localization and Mapping
Published	2018-11-05
URL	http://arxiv.org/abs/1811.01592v1
PDF	http://arxiv.org/pdf/1811.01592v1.pdf
PWC	https://paperswithcode.com/paper/semi-semantic-line-cluster-assisted-monocular
Repo
Framework

Action Anticipation By Predicting Future Dynamic Images


Title	Action Anticipation By Predicting Future Dynamic Images
Authors	Cristian Rodriguez, Basura Fernando, Hongdong Li
Abstract	Human action-anticipation methods predict the future action by observing only a small portion of an action in progress. This is critical for applications where computers have to react to human actions as early as possible, such as autonomous driving, human-robot interaction, and assistive robotics, among others. In this paper, we present a method for human action anticipation by predicting the most plausible future human motion. We represent human motion using Dynamic Images and make use of tailored loss functions to encourage a generative model to produce accurate future motion prediction. Our method outperforms the currently best performing action-anticipation methods by 4% on JHMDB-21, 5.2% on UT-Interaction and 5.1% on UCF 101-24 benchmarks.
Tasks	Autonomous Driving, motion prediction
Published	2018-08-01
URL	http://arxiv.org/abs/1808.00141v1
PDF	http://arxiv.org/pdf/1808.00141v1.pdf
PWC	https://paperswithcode.com/paper/action-anticipation-by-predicting-future
Repo
Framework
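
A dynamic image summarizes a short clip as a single weighted sum of its frames. The sketch below is a hedged illustration of that idea only: it assumes the simplified linear weights $\alpha_t = 2t - T - 1$ known from the approximate rank pooling literature, represents frames as nested lists of pixel values, and uses hypothetical function names; the paper's exact construction, its generative model, and its loss functions are not shown.

```python
from typing import List

Frame = List[List[float]]  # rows of pixel values


def rank_pooling_weights(num_frames: int) -> List[int]:
    """Simplified linear weights alpha_t = 2t - T - 1 for t = 1..T (assumed form)."""
    return [2 * t - num_frames - 1 for t in range(1, num_frames + 1)]


def dynamic_image(frames: List[Frame]) -> Frame:
    """Weighted sum of equally sized frames; later frames get larger weights."""
    if not frames:
        raise ValueError("need at least one frame")
    weights = rank_pooling_weights(len(frames))
    height, width = len(frames[0]), len(frames[0][0])
    out = [[0.0] * width for _ in range(height)]
    for weight, frame in zip(weights, frames):
        for r in range(height):
            for c in range(width):
                out[r][c] += weight * frame[r][c]
    return out


# Toy clip of three 1x2 frames: the left pixel is static, the right one brightens.
clip = [[[5.0, 1.0]], [[5.0, 2.0]], [[5.0, 3.0]]]
summary = dynamic_image(clip)
```

Because the weights sum to zero, static pixels cancel out and only pixels that change over time leave a trace in the result.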

Simultaneous Localization and Layout Model Selection in Manhattan Worlds


Title	Simultaneous Localization and Layout Model Selection in Manhattan Worlds
Authors	Armon Shariati, Bernd Pfrommer, Camillo J. Taylor
Abstract	In this paper, we will demonstrate how Manhattan structure can be exploited to transform the Simultaneous Localization and Mapping (SLAM) problem, which is typically solved by a nonlinear optimization over feature positions, into a model selection problem solved by a convex optimization over higher order layout structures, namely walls, floors, and ceilings. Furthermore, we show how our novel formulation leads to an optimization procedure that automatically performs data association and loop closure and which ultimately produces the simplest model of the environment that is consistent with the available measurements. We verify our method on real world data sets collected with various sensing modalities.
Tasks	Model Selection, Simultaneous Localization and Mapping
Published	2018-09-11
URL	http://arxiv.org/abs/1809.04135v3
PDF	http://arxiv.org/pdf/1809.04135v3.pdf
PWC	https://paperswithcode.com/paper/simultaneous-localization-and-layout-model
Repo
Framework