Paper Group AWR 275
We Built a Fake News & Click-bait Filter: What Happened Next Will Blow Your Mind!
Title | We Built a Fake News & Click-bait Filter: What Happened Next Will Blow Your Mind! |
Authors | Georgi Karadzhov, Pepa Gencheva, Preslav Nakov, Ivan Koychev |
Abstract | It is completely amazing! Fake news and click-baits have totally invaded the cyber space. Let us face it: everybody hates them for three simple reasons. Reason #2 will absolutely amaze you. What these can achieve at the time of election will completely blow your mind! Now, we all agree, this cannot go on, you know, somebody has to stop it. So, we did this research on fake news/click-bait detection and trust us, it is totally great research, it really is! Make no mistake. This is the best research ever! Seriously, come have a look, we have it all: neural networks, attention mechanism, sentiment lexicons, author profiling, you name it. Lexical features, semantic features, we absolutely have it all. And we have totally tested it, trust us! We have results, and numbers, really big numbers. The best numbers ever! Oh, and analysis, absolutely top notch analysis. Interested? Come read the shocking truth about fake news and click-bait in the Bulgarian cyber space. You won’t believe what we have found! |
Tasks | |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03786v1 |
http://arxiv.org/pdf/1803.03786v1.pdf | |
PWC | https://paperswithcode.com/paper/we-built-a-fake-news-click-bait-filter-what |
Repo | https://github.com/gkaradzhov/ClickbaitRANLP |
Framework | none |
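The entry above combines lexical features, semantic features, and neural attention for click-bait/fake-news detection. As a hedged point of reference (not the authors' system), here is a minimal lexical-only baseline: TF-IDF n-grams with logistic regression on a few hypothetical English headlines (the paper works on Bulgarian news).

```python
# Minimal lexical-features baseline for click-bait headline detection.
# NOT the paper's full system (which adds semantic features, attention,
# and author profiling); this only illustrates the lexical component.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the paper uses Bulgarian news articles.
headlines = [
    "You won't believe what this politician said next",
    "Parliament passes the 2018 national budget",
    "Ten shocking facts doctors don't want you to know",
    "Central bank keeps interest rates unchanged",
]
labels = [1, 0, 1, 0]  # 1 = click-bait, 0 = legitimate

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
clf.fit(headlines, labels)
print(clf.predict(["This one weird trick will change your life"]))
```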
COSMO: Contextualized Scene Modeling with Boltzmann Machines
Title | COSMO: Contextualized Scene Modeling with Boltzmann Machines |
Authors | Ilker Bozcan, Sinan Kalkan |
Abstract | Scene modeling is crucial for robots that need to perceive, reason about and manipulate the objects in their environments. In this paper, we adapt and extend Boltzmann Machines (BMs) for contextualized scene modeling. Although there are many models on the subject, ours is the first to bring together objects, relations, and affordances in a highly-capable generative model. To this end, we introduce a hybrid version of BMs where relations and affordances are introduced with shared, tri-way connections into the model. Moreover, we contribute a dataset for relation estimation and modeling studies. We evaluate our method in comparison with several baselines on object estimation, out-of-context object detection, relation estimation, and affordance estimation tasks. Finally, to illustrate the generative capability of the model, we show several example scenes that the model is able to generate. |
Tasks | Object Detection |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00511v2 |
http://arxiv.org/pdf/1807.00511v2.pdf | |
PWC | https://paperswithcode.com/paper/cosmo-contextualized-scene-modeling-with |
Repo | https://github.com/bozcani/COSMO |
Framework | tf |
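Since COSMO extends Boltzmann Machines, the sketch below recalls only the standard machinery it builds on: a single Gibbs sampling step of a plain binary RBM in numpy. The paper's hybrid model with shared tri-way connections for relations and affordances is not reproduced here.

```python
# One Gibbs step (v -> h -> v') for a plain binary RBM, using the
# conditional Bernoulli distributions; biases and weights are random
# placeholders rather than trained scene-model parameters.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 12, 8          # e.g. 12 object-presence units
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)            # visible biases
b_h = np.zeros(n_hidden)             # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v):
    p_h = sigmoid(v @ W + b_h)
    h = (rng.random(n_hidden) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b_v)
    v_new = (rng.random(n_visible) < p_v).astype(float)
    return v_new, h

v0 = (rng.random(n_visible) < 0.5).astype(float)   # random scene encoding
v1, h1 = gibbs_step(v0)
print(v1)
```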
Deep Association Learning for Unsupervised Video Person Re-identification
Title | Deep Association Learning for Unsupervised Video Person Re-identification |
Authors | Yanbei Chen, Xiatian Zhu, Shaogang Gong |
Abstract | Deep learning methods have started to dominate the research progress of video-based person re-identification (re-id). However, existing methods mostly consider supervised learning, which requires exhaustive manual efforts for labelling cross-view pairwise data. Therefore, they severely lack scalability and practicality in real-world video surveillance applications. In this work, to address the video person re-id task, we formulate a novel Deep Association Learning (DAL) scheme, the first end-to-end deep learning method using none of the identity labels in model initialisation and training. DAL learns a deep re-id matching model by jointly optimising two margin-based association losses in an end-to-end manner, which effectively constrains the association of each frame to the best-matched intra-camera representation and cross-camera representation. Existing standard CNNs can be readily employed within our DAL scheme. Experiment results demonstrate that our proposed DAL significantly outperforms current state-of-the-art unsupervised video person re-id methods on three benchmarks: PRID 2011, iLIDS-VID and MARS. |
Tasks | Person Re-Identification, Video-Based Person Re-Identification |
Published | 2018-08-22 |
URL | http://arxiv.org/abs/1808.07301v1 |
http://arxiv.org/pdf/1808.07301v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-association-learning-for-unsupervised |
Repo | https://github.com/yanbeic/Deep-Association-Learning |
Framework | tf |
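One plausible reading of the margin-based association idea, written as a hedged numpy sketch: pull a frame feature toward its best-matched anchor representation and keep a margin to the runner-up. The paper's actual intra-camera and cross-camera losses and its end-to-end TensorFlow training differ in detail.

```python
# Hedged sketch of a margin-based association loss in the spirit of DAL.
import numpy as np

def association_margin_loss(frame_feat, anchors, margin=0.5):
    """frame_feat: (d,), anchors: (n_anchors, d); all assumed L2-normalised."""
    dists = np.linalg.norm(anchors - frame_feat, axis=1)
    order = np.argsort(dists)
    best, second = dists[order[0]], dists[order[1]]
    # hinge: the best-matched anchor should beat the runner-up by a margin
    return max(0.0, margin + best - second)

rng = np.random.default_rng(1)
feat = rng.standard_normal(128); feat /= np.linalg.norm(feat)
anchors = rng.standard_normal((10, 128))
anchors /= np.linalg.norm(anchors, axis=1, keepdims=True)
print(association_margin_loss(feat, anchors))
```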
Concurrent Learning of Semantic Relations
Title | Concurrent Learning of Semantic Relations |
Authors | Georgios Balikas, Gaël Dias, Rumen Moraliyski, Massih-Reza Amini |
Abstract | Discovering whether words are semantically related and identifying the specific semantic relation that holds between them is of crucial importance for NLP as it is essential for tasks like query expansion in IR. Within this context, different methodologies have been proposed that either exclusively focus on a single lexical relation (e.g. hypernymy vs. random) or learn specific classifiers capable of identifying multiple semantic relations (e.g. hypernymy vs. synonymy vs. random). In this paper, we propose another way to look at the problem that relies on the multi-task learning paradigm. In particular, we want to study whether the learning process of a given semantic relation (e.g. hypernymy) can be improved by the concurrent learning of another semantic relation (e.g. co-hyponymy). Within this context, we particularly examine the benefits of semi-supervised learning where the training of a prediction function is performed over few labeled data jointly with many unlabeled ones. Preliminary results based on simple learning strategies and state-of-the-art distributional feature representations show that concurrent learning can lead to improvements in a vast majority of tested situations. |
Tasks | Multi-Task Learning |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10076v3 |
http://arxiv.org/pdf/1807.10076v3.pdf | |
PWC | https://paperswithcode.com/paper/concurrent-learning-of-semantic-relations |
Repo | https://github.com/Houssam93/MultiTask-Learning-NLP |
Framework | none |
Learning Local RGB-to-CAD Correspondences for Object Pose Estimation
Title | Learning Local RGB-to-CAD Correspondences for Object Pose Estimation |
Authors | Georgios Georgakis, Srikrishna Karanam, Ziyan Wu, Jana Kosecka |
Abstract | We consider the problem of 3D object pose estimation. While much recent work has focused on the RGB domain, the reliance on accurately annotated images limits their generalizability and scalability. On the other hand, the easily available CAD models of objects are rich sources of data, providing a large number of synthetically rendered images. In this paper, we solve this key problem of existing methods requiring expensive 3D pose annotations by proposing a new method that matches RGB images to CAD models for object pose estimation. Our key innovations compared to existing work include removing the need for either real-world textures for CAD models or explicit 3D pose annotations for RGB images. We achieve this through a series of objectives that learn how to select keypoints and enforce viewpoint and modality invariance across RGB images and CAD model renderings. We conduct extensive experiments to demonstrate that the proposed method can reliably estimate object pose in RGB images, as well as generalize to object instances not seen during training. |
Tasks | Pose Estimation |
Published | 2018-11-18 |
URL | https://arxiv.org/abs/1811.07249v4 |
https://arxiv.org/pdf/1811.07249v4.pdf | |
PWC | https://paperswithcode.com/paper/matching-rgb-images-to-cad-models-for-object |
Repo | https://github.com/YoungXIAO13/PoseFromShape |
Framework | pytorch |
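To make the modality-invariance objective concrete, here is a minimal triplet-margin sketch that ties an RGB keypoint descriptor to the descriptor of the corresponding CAD rendering and pushes away a non-matching one. It illustrates the general idea only; the paper's full set of objectives, including keypoint selection, is not reproduced.

```python
# Cross-modality triplet margin loss: anchor = RGB descriptor,
# positive = matching CAD-rendering descriptor, negative = non-matching one.
import numpy as np

def triplet_loss(anchor_rgb, pos_cad, neg_cad, margin=0.2):
    d_pos = np.linalg.norm(anchor_rgb - pos_cad)
    d_neg = np.linalg.norm(anchor_rgb - neg_cad)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
anchor = rng.standard_normal(64)
positive = anchor + 0.05 * rng.standard_normal(64)   # matching CAD view
negative = rng.standard_normal(64)                    # non-matching CAD view
print(triplet_loss(anchor, positive, negative))
```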
Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling
Title | Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling |
Authors | Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman |
Abstract | Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which are pretrained on variants of language modeling. We conduct the first large-scale systematic study of candidate pretraining tasks, comparing 19 different tasks both as alternatives and complements to language modeling. Our primary results support the use of language modeling, especially when combined with pretraining on additional labeled-data tasks. However, our results are mixed across pretraining tasks and show some concerning trends: In ELMo’s pretrain-then-freeze paradigm, random baselines are worryingly strong and results vary strikingly across target tasks. In addition, fine-tuning BERT on an intermediate task often negatively impacts downstream transfer. In a more positive trend, we see modest gains from multitask training, suggesting the development of more sophisticated multitask and transfer learning techniques as an avenue for further research. |
Tasks | Language Modelling, Transfer Learning |
Published | 2018-12-28 |
URL | https://arxiv.org/abs/1812.10860v5 |
https://arxiv.org/pdf/1812.10860v5.pdf | |
PWC | https://paperswithcode.com/paper/looking-for-elmos-friends-sentence-level |
Repo | https://github.com/nyu-mll/jiant |
Framework | pytorch |
Enhanced Optic Disk and Cup Segmentation with Glaucoma Screening from Fundus Images using Position encoded CNNs
Title | Enhanced Optic Disk and Cup Segmentation with Glaucoma Screening from Fundus Images using Position encoded CNNs |
Authors | Vismay Agrawal, Avinash Kori, Varghese Alex, Ganapathy Krishnamurthi |
Abstract | In this manuscript, we present a robust method for glaucoma screening from fundus images using an ensemble of convolutional neural networks (CNNs). The pipeline first segments the optic disk and optic cup from the fundus image, then extracts a patch centered around the optic disk, which is subsequently fed to the classification network to differentiate the image as diseased or healthy. In the segmentation network, apart from the image, we make use of the spatial coordinate (X & Y) space so as to better learn the structure of interest. The classification network is composed of a DenseNet201 and a ResNet18 which were pre-trained on a large cohort of natural images. On the REFUGE validation data (n=400), the segmentation network achieved a dice score of 0.88 and 0.64 for optic disc and optic cup respectively. For the task of differentiating images affected by glaucoma from healthy images, the area under the ROC curve was observed to be 0.85. |
Tasks | |
Published | 2018-09-14 |
URL | http://arxiv.org/abs/1809.05216v1 |
http://arxiv.org/pdf/1809.05216v1.pdf | |
PWC | https://paperswithcode.com/paper/enhanced-optic-disk-and-cup-segmentation-with |
Repo | https://github.com/koriavinash1/Optic-Disk-Cup-Segmentation |
Framework | pytorch |
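The abstract mentions feeding spatial X & Y coordinates alongside the image so the segmentation network can exploit position. Below is a minimal sketch of building such position channels (CoordConv-style), assuming a simple normalised-grid encoding that may differ from the authors' exact scheme.

```python
# Append normalised x/y coordinate maps as extra input channels.
import numpy as np

def add_position_channels(image):
    """image: (H, W, C) float array -> (H, W, C + 2) with normalised x/y maps."""
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w), indexing="ij")
    return np.concatenate([image, xs[..., None], ys[..., None]], axis=-1)

fundus = np.zeros((512, 512, 3), dtype=np.float32)   # placeholder fundus image
print(add_position_channels(fundus).shape)            # (512, 512, 5)
```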
DeepTerramechanics: Terrain Classification and Slip Estimation for Ground Robots via Deep Learning
Title | DeepTerramechanics: Terrain Classification and Slip Estimation for Ground Robots via Deep Learning |
Authors | Ramon Gonzalez, Karl Iagnemma |
Abstract | Terramechanics plays a critical role in the areas of ground vehicles and ground mobile robots since understanding and estimating the variables influencing the vehicle-terrain interaction may mean the success or the failure of an entire mission. This research applies state-of-the-art algorithms in deep learning to two key problems: estimating wheel slip and classifying the terrain being traversed by a ground robot. Three data sets collected by ground robotic platforms (MIT single-wheel testbed, MSL Curiosity rover, and tracked robot Fitorobot) are employed in order to compare the performance of traditional machine learning methods (i.e. Support Vector Machine (SVM) and Multi-layer Perceptron (MLP)) against Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs). This work also shows the impact that certain tuning parameters and the network architecture (MLP, DNN and CNN) have on the performance of those methods. The paper also contributes an in-depth discussion of the lessons learned in implementing DNNs and CNNs and of how these methods can be extended to solve other problems. |
Tasks | |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.07379v1 |
http://arxiv.org/pdf/1806.07379v1.pdf | |
PWC | https://paperswithcode.com/paper/deepterramechanics-terrain-classification-and |
Repo | https://github.com/ntseng450/DeepTerra |
Framework | none |
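As a toy version of the comparison described above, the sketch below trains a classical SVM regressor and a small MLP to predict wheel slip from synthetic proprioceptive features; the actual experiments use the MIT single-wheel testbed, MSL Curiosity, and Fitorobot data, not this generated toy set.

```python
# SVM vs MLP slip-estimation comparison on synthetic data (scikit-learn).
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))    # e.g. wheel torque, sinkage, IMU features
slip = 0.3 * X[:, 0] - 0.1 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(500)

X_tr, X_te, y_tr, y_te = train_test_split(X, slip, random_state=0)
for name, model in [("SVR", SVR(kernel="rbf")),
                    ("MLP", MLPRegressor(hidden_layer_sizes=(64, 64),
                                         max_iter=2000, random_state=0))]:
    model.fit(X_tr, y_tr)
    print(name, mean_absolute_error(y_te, model.predict(X_te)))
```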
Training Complex Models with Multi-Task Weak Supervision
Title | Training Complex Models with Multi-Task Weak Supervision |
Authors | Alexander Ratner, Braden Hancock, Jared Dunnmon, Frederic Sala, Shreyash Pandey, Christopher Ré |
Abstract | Snorkel MeTaL: A framework for training models with multi-task weak supervision |
Tasks | Matrix Completion |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.02840v2 |
http://arxiv.org/pdf/1810.02840v2.pdf | |
PWC | https://paperswithcode.com/paper/training-complex-models-with-multi-task-weak |
Repo | https://github.com/HazyResearch/metal |
Framework | pytorch |
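Snorkel MeTaL learns the accuracies of weak supervision sources with a matrix-completion-style label model. As a much simpler point of reference (not the paper's method), the sketch below implements only the majority-vote baseline over labeling-function outputs, with 0 denoting abstention.

```python
# Majority-vote baseline over weak labeling-function votes.
import numpy as np

def majority_vote(L, num_classes):
    """L: (n_examples, n_sources) int matrix of votes; returns predicted labels."""
    preds = []
    for row in L:
        votes = row[row != 0]                       # drop abstentions
        if votes.size == 0:
            preds.append(0)                         # no signal: abstain
            continue
        counts = np.bincount(votes, minlength=num_classes + 1)
        preds.append(int(np.argmax(counts[1:]) + 1))
    return np.array(preds)

L = np.array([[1, 1, 0],
              [2, 0, 2],
              [1, 2, 2],
              [0, 0, 0]])
print(majority_vote(L, num_classes=2))   # -> [1 2 2 0]
```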
Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search
Title | Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search |
Authors | Anji Liu, Jianshu Chen, Mingze Yu, Yu Zhai, Xuewen Zhou, Ji Liu |
Abstract | Monte Carlo Tree Search (MCTS) algorithms have achieved great success on many challenging benchmarks (e.g., Computer Go). However, they generally require a large number of rollouts, making their applications costly. Furthermore, it is also extremely challenging to parallelize MCTS due to its inherent sequential nature: each rollout heavily relies on the statistics (e.g., node visitation counts) estimated from previous simulations to achieve an effective exploration-exploitation tradeoff. In spite of these difficulties, we develop an algorithm, WU-UCT, to effectively parallelize MCTS, which achieves linear speedup and exhibits only limited performance loss with an increasing number of workers. The key idea in WU-UCT is a set of statistics that we introduce to track the number of on-going yet incomplete simulation queries (termed unobserved samples). These statistics are used to modify the UCT tree policy in the selection steps in a principled manner to retain an effective exploration-exploitation tradeoff when we parallelize the most time-consuming expansion and simulation steps. Experiments on a proprietary benchmark and the Atari Game benchmark demonstrate the linear speedup and the superior performance of WU-UCT compared to existing techniques. |
Tasks | |
Published | 2018-10-28 |
URL | https://arxiv.org/abs/1810.11755v5 |
https://arxiv.org/pdf/1810.11755v5.pdf | |
PWC | https://paperswithcode.com/paper/p-mcgs-parallel-monte-carlo-acyclic-graph |
Repo | https://github.com/liuanji/P-UCT |
Framework | pytorch |
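A hedged sketch of the core WU-UCT idea: fold the count of on-going, not-yet-completed simulations ("unobserved samples") into the UCT visit counts so that parallel workers spread out across the tree. The exact statistics and constants follow the paper and repo; the +1 smoothing below is an assumption made only to keep the toy example well-defined.

```python
# UCT selection score with on-going simulation counts added to the visits.
import math

def wu_uct_score(q_value, n_completed, n_ongoing,
                 n_parent_completed, n_parent_ongoing, c=1.414):
    n_parent = n_parent_completed + n_parent_ongoing
    n_child = n_completed + n_ongoing
    # Ongoing simulations inflate the counts, shrinking the exploration bonus
    # of nodes that other workers are already busy evaluating.
    return q_value + c * math.sqrt(math.log(n_parent + 1) / (n_child + 1))

# A node that 2 workers are already simulating receives a smaller bonus:
print(wu_uct_score(0.5, n_completed=10, n_ongoing=0,
                   n_parent_completed=100, n_parent_ongoing=0))
print(wu_uct_score(0.5, n_completed=10, n_ongoing=2,
                   n_parent_completed=100, n_parent_ongoing=2))
```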
Gradient Masking Causes CLEVER to Overestimate Adversarial Perturbation Size
Title | Gradient Masking Causes CLEVER to Overestimate Adversarial Perturbation Size |
Authors | Ian Goodfellow |
Abstract | A key problem in research on adversarial examples is that vulnerability to adversarial examples is usually measured by running attack algorithms. Because the attack algorithms are not optimal, the attack algorithms are prone to overestimating the size of perturbation needed to fool the target model. In other words, the attack-based methodology provides an upper-bound on the size of a perturbation that will fool the model, but security guarantees require a lower bound. CLEVER is a proposed scoring method to estimate a lower bound. Unfortunately, an estimate of a bound is not a bound. In this report, we show that gradient masking, a common problem that causes attack methodologies to provide only a very loose upper bound, causes CLEVER to overestimate the size of perturbation needed to fool the model. In other words, CLEVER does not resolve the key problem with the attack-based methodology, because it fails to provide a lower bound. |
Tasks | |
Published | 2018-04-21 |
URL | http://arxiv.org/abs/1804.07870v1 |
http://arxiv.org/pdf/1804.07870v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-masking-causes-clever-to |
Repo | https://github.com/huanzhang12/CLEVER |
Framework | tf |
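A toy numpy illustration of the report's argument: CLEVER-like scores bound the minimum perturbation by margin / L̂, with L̂ estimated from gradient norms sampled around the input. If the surface is locally flat (gradient-masked), L̂ comes out too small and the estimated "safe" radius too large. This is a caricature for intuition, not the actual CLEVER estimator.

```python
# Sampling-based Lipschitz estimate fooled by a locally flat margin surface.
import numpy as np

rng = np.random.default_rng(0)

def margin(x):
    # Classifier margin that is flat (zero gradient) for ||x|| < 1, then
    # drops sharply -- a caricature of a gradient-masked model.
    r = np.linalg.norm(x)
    return 1.0 if r < 1.0 else max(0.0, 1.0 - 20.0 * (r - 1.0))

def sampled_max_grad_norm(x, radius=0.1, n=200, eps=1e-3):
    """Finite-difference estimate of the largest gradient norm near x."""
    norms = []
    for _ in range(n):
        p = x + radius * rng.standard_normal(x.shape)
        g = np.array([(margin(p + eps * e) - margin(p - eps * e)) / (2 * eps)
                      for e in np.eye(x.size)])
        norms.append(np.linalg.norm(g))
    return max(norms)

x = np.zeros(2)
L_hat = sampled_max_grad_norm(x)
# The true distance to the decision boundary is ~1.05, but whenever every
# sample lands on the flat region the estimated "safe" radius is far larger.
print("estimated safe radius:", margin(x) / max(L_hat, 1e-9))
```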
Interactive Text2Pickup Network for Natural Language based Human-Robot Collaboration
Title | Interactive Text2Pickup Network for Natural Language based Human-Robot Collaboration |
Authors | Hyemin Ahn, Sungjoon Choi, Nuri Kim, Geonho Cha, Songhwai Oh |
Abstract | In this paper, we propose the Interactive Text2Pickup (IT2P) network for human-robot collaboration which enables an effective interaction with a human user despite the ambiguity in user’s commands. We focus on the task where a robot is expected to pick up an object instructed by a human, and to interact with the human when the given instruction is vague. The proposed network understands the command from the human user and estimates the position of the desired object first. To handle the inherent ambiguity in human language commands, a suitable question which can resolve the ambiguity is generated. The user’s answer to the question is combined with the initial command and given back to the network, resulting in more accurate estimation. The experiment results show that given unambiguous commands, the proposed method can estimate the position of the requested object with an accuracy of 98.49% based on our test dataset. Given ambiguous language commands, we show that the accuracy of the pick up task increases by 1.94 times after incorporating the information obtained from the interaction. |
Tasks | |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.10799v1 |
http://arxiv.org/pdf/1805.10799v1.pdf | |
PWC | https://paperswithcode.com/paper/interactive-text2pickup-network-for-natural |
Repo | https://github.com/hiddenmaze/InteractivePickup |
Framework | tf |
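A schematic version of the interaction loop described above: estimate a position heatmap from the command and, when the estimate is too ambiguous (high entropy), ask a clarifying question and fuse the answer back in. The models and the entropy threshold here are placeholders, not the paper's IT2P networks.

```python
# Ambiguity-driven clarification loop over a pick-up position heatmap.
import numpy as np

def entropy(p):
    p = p / p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def pickup_interaction(heatmap, ask_user, threshold=0.5):
    """heatmap: (H, W) non-negative scores over candidate pick-up positions."""
    while entropy(heatmap.ravel()) > threshold:
        answer_map = ask_user("Which one do you mean, the left or the right object?")
        heatmap = heatmap * answer_map            # fuse the answer with the estimate
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)

# Toy example: two equally likely objects, disambiguated by one answer.
h = np.zeros((4, 4)); h[1, 0] = h[1, 3] = 1.0
answer = np.zeros((4, 4)); answer[:, :2] = 1.0     # "the left one"
print(pickup_interaction(h + 1e-6, lambda q: answer))
```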
Semi-Supervised Deep Learning for Abnormality Classification in Retinal Images
Title | Semi-Supervised Deep Learning for Abnormality Classification in Retinal Images |
Authors | Bruno Lecouat, Ken Chang, Chuan-Sheng Foo, Balagopal Unnikrishnan, James M. Brown, Houssam Zenati, Andrew Beers, Vijay Chandrasekhar, Jayashree Kalpathy-Cramer, Pavitra Krishnaswamy |
Abstract | Supervised deep learning algorithms have enabled significant performance gains in medical image classification tasks. But these methods rely on large labeled datasets that require resource-intensive expert annotation. Semi-supervised generative adversarial network (GAN) approaches offer a means to learn from limited labeled data alongside larger unlabeled datasets, but have not been applied to discern fine-scale, sparse or localized features that define medical abnormalities. To overcome these limitations, we propose a patch-based semi-supervised learning approach and evaluate performance on classification of diabetic retinopathy from funduscopic images. Our semi-supervised approach achieves high AUC with just 10-20 labeled training images, and outperforms the supervised baselines by up to 15% when less than 30% of the training dataset is labeled. Further, our method implicitly enables interpretation of the SSL predictions. As this approach enables good accuracy, resolution and interpretability with lower annotation burden, it sets the pathway for scalable applications of deep learning in clinical imaging. |
Tasks | Image Classification |
Published | 2018-12-19 |
URL | http://arxiv.org/abs/1812.07832v1 |
http://arxiv.org/pdf/1812.07832v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-deep-learning-for-abnormality |
Repo | https://github.com/theidentity/Improved-GAN-PyTorch |
Framework | pytorch |
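The listed repo implements an Improved-GAN style semi-supervised objective, where the discriminator has K real-class logits and an implicit (K+1)-th "fake" class. Assuming that family of objectives, below is a minimal numpy sketch of the unsupervised loss term for a single example; the paper's patch-based pipeline and the supervised term are not shown.

```python
# Unsupervised (K+1)-class discriminator loss for one unlabeled and one
# generated patch, using log-sum-exp of the K real-class logits.
import numpy as np

def unsupervised_loss(logits_unlabeled, logits_generated):
    """logits_*: (K,) real-class logits for one unlabeled / one generated patch."""
    z_unl = np.logaddexp.reduce(logits_unlabeled)   # log sum_k exp(logit_k)
    z_gen = np.logaddexp.reduce(logits_generated)
    log_p_real_unl = z_unl - np.logaddexp(z_unl, 0.0)   # log Z / (Z + 1)
    log_p_fake_gen = -np.logaddexp(z_gen, 0.0)          # log 1 / (Z + 1)
    return -(log_p_real_unl + log_p_fake_gen)

print(unsupervised_loss(np.array([2.0, -1.0]), np.array([-3.0, -2.0])))
```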
UnrealROX: An eXtremely Photorealistic Virtual Reality Environment for Robotics Simulations and Synthetic Data Generation
Title | UnrealROX: An eXtremely Photorealistic Virtual Reality Environment for Robotics Simulations and Synthetic Data Generation |
Authors | Pablo Martinez-Gonzalez, Sergiu Oprea, Alberto Garcia-Garcia, Alvaro Jover-Alvarez, Sergio Orts-Escolano, Jose Garcia-Rodriguez |
Abstract | Data-driven algorithms have surpassed traditional techniques in almost every aspect in robotic vision problems. Such algorithms need vast amounts of quality data to be able to work properly after their training process. Gathering and annotating that sheer amount of data in the real world is a time-consuming and error-prone task. Those problems limit scale and quality. Synthetic data generation has become increasingly popular since it is faster to generate and automatic to annotate. However, most of the current datasets and environments lack realism, interactions, and details from the real world. UnrealROX is an environment built over Unreal Engine 4 which aims to reduce that reality gap by leveraging hyperrealistic indoor scenes that are explored by robot agents which also interact with objects in a visually realistic manner in that simulated world. Photorealistic scenes and robots are rendered by Unreal Engine into a virtual reality headset which captures gaze so that a human operator can move the robot and use controllers for the robotic hands; scene information is dumped on a per-frame basis so that it can be reproduced offline to generate raw data and ground truth annotations. This virtual reality environment enables robotic vision researchers to generate realistic and visually plausible data with full ground truth for a wide variety of problems such as class and instance semantic segmentation, object detection, depth estimation, visual grasping, and navigation. |
Tasks | Depth Estimation, Object Detection, Semantic Segmentation, Synthetic Data Generation |
Published | 2018-10-16 |
URL | https://arxiv.org/abs/1810.06936v2 |
https://arxiv.org/pdf/1810.06936v2.pdf | |
PWC | https://paperswithcode.com/paper/unrealrox-an-extremely-photorealistic-virtual |
Repo | https://github.com/3dperceptionlab/unrealrox |
Framework | none |
Multilingual Constituency Parsing with Self-Attention and Pre-Training
Title | Multilingual Constituency Parsing with Self-Attention and Pre-Training |
Authors | Nikita Kitaev, Steven Cao, Dan Klein |
Abstract | We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions. We first compare the benefits of no pre-training, fastText, ELMo, and BERT for English and find that BERT outperforms ELMo, in large part due to increased model capacity, whereas ELMo in turn outperforms the non-contextual fastText embeddings. We also find that pre-training is beneficial across all 11 languages tested; however, large model sizes (more than 100 million parameters) make it computationally expensive to train separate models for each language. To address this shortcoming, we show that joint multilingual pre-training and fine-tuning allows sharing all but a small number of parameters between ten languages in the final model. The 10x reduction in model size compared to fine-tuning one model per language causes only a 3.2% relative error increase in aggregate. We further explore the idea of joint fine-tuning and show that it gives low-resource languages a way to benefit from the larger datasets of other languages. Finally, we demonstrate new state-of-the-art results for 11 languages, including English (95.8 F1) and Chinese (91.8 F1). |
Tasks | Constituency Parsing |
Published | 2018-12-31 |
URL | https://arxiv.org/abs/1812.11760v2 |
https://arxiv.org/pdf/1812.11760v2.pdf | |
PWC | https://paperswithcode.com/paper/multilingual-constituency-parsing-with-self |
Repo | https://github.com/dpfried/rnng-bert |
Framework | tf |