February 1, 2020

3293 words 16 mins read

Paper Group AWR 232

Lemotif: An Affective Visual Journal Using Deep Neural Networks. DEDUCE: Diverse scEne Detection methods in Unseen Challenging Environments. Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework. EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search. An Updated Duet Mo …

Lemotif: An Affective Visual Journal Using Deep Neural Networks

Title Lemotif: An Affective Visual Journal Using Deep Neural Networks
Authors X. Alice Li, Devi Parikh
Abstract We present Lemotif, an integrated natural language processing and image generation system that uses machine learning to (1) parse a text-based input journal entry describing the user’s day for salient themes and emotions and (2) visualize the detected themes and emotions in creative and appealing image motifs. Synthesizing approaches from artificial intelligence and psychology, Lemotif acts as an affective visual journal, encouraging users to regularly write and reflect on their daily experiences through visual reinforcement. By making patterns in emotions and their sources more apparent, Lemotif aims to help users better understand their emotional lives, identify opportunities for action, and track the effectiveness of behavioral changes over time. We verify via human studies that prospective users prefer motifs generated by Lemotif over corresponding baselines, find the motifs representative of their journal entries, and think they would be more likely to journal regularly using a Lemotif-based app.
Tasks Image Generation
Published 2019-03-18
URL https://arxiv.org/abs/1903.07766v3
PDF https://arxiv.org/pdf/1903.07766v3.pdf
PWC https://paperswithcode.com/paper/lemotif-abstract-visual-depictions-of-your
Repo https://github.com/xaliceli/lemotif
Framework tf
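Lemotif's first stage parses a free-text journal entry into salient themes and emotions. As a rough, purely illustrative stand-in for that stage (the paper uses trained NLP models, not fixed lexicons), here is a minimal keyword-based sketch; the lexicons and function below are hypothetical.

```python
# Hypothetical keyword lexicons standing in for Lemotif's learned parsers;
# the real system uses trained models, not fixed word lists.
THEME_WORDS = {
    "work": {"meeting", "deadline", "office", "project"},
    "health": {"run", "gym", "sleep", "doctor"},
    "family": {"dinner", "kids", "parents", "home"},
}
EMOTION_WORDS = {
    "happy": {"glad", "excited", "great", "proud"},
    "anxious": {"worried", "nervous", "stressed"},
    "tired": {"exhausted", "drained", "sleepy"},
}

def parse_entry(text):
    """Return (theme, emotion) pairs detected in a journal entry."""
    tokens = {w.strip(".,!?") for w in text.lower().split()}
    themes = [t for t, words in THEME_WORDS.items() if tokens & words]
    emotions = [e for e, words in EMOTION_WORDS.items() if tokens & words]
    return [(t, e) for t in themes for e in emotions]

entry = "Long meeting at the office, felt stressed but proud of the project."
print(parse_entry(entry))   # e.g. [('work', 'happy'), ('work', 'anxious')]
```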

DEDUCE: Diverse scEne Detection methods in Unseen Challenging Environments

Title DEDUCE: Diverse scEne Detection methods in Unseen Challenging Environments
Authors Anwesan Pal, Carlos Nieto-Granda, Henrik I. Christensen
Abstract In recent years, there has been a rapid increase in the number of service robots deployed for aiding people in their daily activities. Unfortunately, most of these robots require human input for training in order to do tasks in indoor environments. Successful domestic navigation often requires access to semantic information about the environment, which can be learned without human guidance. In this paper, we propose DEDUCE (Diverse scEne Detection methods in Unseen Challenging Environments), a set of algorithms which incorporate deep fusion models derived from scene recognition systems and object detectors. The five methods described here have been evaluated on several popular recent image datasets, as well as real-world videos acquired through multiple mobile platforms. The final results show an improvement over the existing state-of-the-art visual place recognition systems.
Tasks Scene Recognition, Visual Place Recognition
Published 2019-08-01
URL https://arxiv.org/abs/1908.00191v1
PDF https://arxiv.org/pdf/1908.00191v1.pdf
PWC https://paperswithcode.com/paper/deduce-diverse-scene-detection-methods-in
Repo https://github.com/anwesanpal/DEDUCE
Framework pytorch
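The abstract describes fusing a scene-recognition system with an object detector. A generic late-fusion sketch of that idea is below; the scene list, object-to-scene prior, and blending weight are invented for illustration and are not the paper's fusion models.

```python
import numpy as np

# Hypothetical object-to-scene prior P(scene | object seen), made up for illustration.
SCENES = ["kitchen", "office", "bedroom"]
OBJECT_PRIOR = {
    "microwave": np.array([0.85, 0.10, 0.05]),
    "monitor":   np.array([0.05, 0.90, 0.05]),
    "bed":       np.array([0.05, 0.05, 0.90]),
}

def fuse(scene_probs, detected_objects, alpha=0.6):
    """Blend scene-classifier probabilities with an object-derived prior.

    scene_probs: softmax output of a scene classifier over SCENES.
    detected_objects: labels returned by an object detector.
    alpha: weight on the scene classifier (a tunable assumption here).
    """
    if detected_objects:
        prior = np.mean([OBJECT_PRIOR[o] for o in detected_objects], axis=0)
    else:
        prior = np.full(len(SCENES), 1.0 / len(SCENES))
    fused = alpha * np.asarray(scene_probs) + (1.0 - alpha) * prior
    return SCENES[int(np.argmax(fused))], fused / fused.sum()

print(fuse([0.4, 0.35, 0.25], ["monitor"]))   # the object evidence tips it to 'office'
```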

Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework

Title Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework
Authors Zirui Wang, Jiateng Xie, Ruochen Xu, Yiming Yang, Graham Neubig, Jaime Carbonell
Abstract Learning multilingual representations of text has proven a successful method for many cross-lingual transfer learning tasks. There are two main paradigms for learning such representations: (1) alignment, which maps different independently trained monolingual representations into a shared space, and (2) joint training, which directly learns unified multilingual representations using monolingual and cross-lingual objectives jointly. In this paper, we first conduct direct comparisons of representations learned using both of these methods across diverse cross-lingual tasks. Our empirical results reveal a set of pros and cons for both methods, and show that the relative performance of alignment versus joint training is task-dependent. Stemming from this analysis, we propose a simple and novel framework that combines these two previously mutually-exclusive approaches. Extensive experiments demonstrate that our proposed framework alleviates limitations of both approaches, and outperforms existing methods on the MUSE bilingual lexicon induction (BLI) benchmark. We further show that this framework can generalize to contextualized representations such as Multilingual BERT, and produces state-of-the-art results on the CoNLL cross-lingual NER benchmark.
Tasks Cross-Lingual Transfer, Transfer Learning
Published 2019-10-10
URL https://arxiv.org/abs/1910.04708v4
PDF https://arxiv.org/pdf/1910.04708v4.pdf
PWC https://paperswithcode.com/paper/cross-lingual-alignment-vs-joint-training-a
Repo https://github.com/thespectrewithin/joint-align
Framework pytorch
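The alignment paradigm discussed above maps independently trained monolingual embeddings into a shared space; a standard baseline instance is orthogonal Procrustes over a seed dictionary. The sketch below shows only that baseline, not the unified alignment-plus-joint-training framework proposed in the paper.

```python
import numpy as np

def procrustes_align(X_src, Y_tgt):
    """Orthogonal Procrustes: the orthogonal W minimizing ||X_src @ W - Y_tgt||_F.

    X_src, Y_tgt: (n, d) arrays of embeddings for seed-dictionary word pairs.
    """
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

# Toy check with random vectors standing in for monolingual word embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))
true_W, _ = np.linalg.qr(rng.normal(size=(300, 300)))   # a random orthogonal map
Y = X @ true_W
W = procrustes_align(X, Y)
print(np.allclose(X @ W, Y, atol=1e-6))   # True: the mapping is recovered
```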

EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search

Title EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search
Authors Jiemin Fang, Yukang Chen, Xinbang Zhang, Qian Zhang, Chang Huang, Gaofeng Meng, Wenyu Liu, Xinggang Wang
Abstract Neural architecture search (NAS) methods have been proposed to release human experts from tedious architecture engineering. However, most current methods are constrained to small-scale search due to limited computational resources. Meanwhile, directly applying architectures searched on small datasets to large datasets often bears no performance guarantee. This limitation impedes the wide use of NAS on large-scale tasks. To overcome this obstacle, we propose an elastic architecture transfer mechanism for accelerating large-scale neural architecture search (EAT-NAS). In our implementation, architectures are first searched on a small dataset, e.g., CIFAR-10. The best one is chosen as the basic architecture. The search process on the large dataset, e.g., ImageNet, is initialized with the basic architecture as the seed. The large-scale search process is accelerated with the help of the basic architecture. What we propose is not only a NAS method but a mechanism for architecture-level transfer. In our experiments, we obtain two final models, EATNet-A and EATNet-B, that achieve competitive accuracies of 74.7% and 74.2% on ImageNet, respectively, and also surpass the models searched from scratch on ImageNet under the same settings. In terms of computational cost, EAT-NAS takes less than 5 days on 8 TITAN X GPUs, which is significantly less than the computational consumption of state-of-the-art large-scale NAS methods.
Tasks Neural Architecture Search
Published 2019-01-17
URL http://arxiv.org/abs/1901.05884v3
PDF http://arxiv.org/pdf/1901.05884v3.pdf
PWC https://paperswithcode.com/paper/eat-nas-elastic-architecture-transfer-for
Repo https://github.com/JaminFong/EAT-NAS
Framework mxnet
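The core idea is to seed the large-scale search with the best architecture found on the small dataset rather than starting from scratch. The toy evolutionary loop below illustrates that seeding; the architecture encoding, mutation rule, and fitness function are placeholders, not the EAT-NAS search space or training procedure.

```python
import random

# Toy architecture encoding: a list of block choices per layer.
OPS = ["conv3x3", "conv5x5", "mbconv3", "mbconv6", "skip"]

def mutate(arch, p=0.2):
    """Elastically perturb the seed architecture instead of sampling from scratch."""
    return [random.choice(OPS) if random.random() < p else op for op in arch]

def toy_fitness(arch):
    # Stand-in for training and evaluating a candidate on the large dataset.
    return sum(op.startswith("mbconv") for op in arch) - 0.1 * arch.count("conv5x5")

def eat_nas_sketch(basic_arch, population=8, generations=5):
    """Seeded evolutionary search: the small-dataset winner initializes the population."""
    pop = [basic_arch] + [mutate(basic_arch) for _ in range(population - 1)]
    for _ in range(generations):
        pop.sort(key=toy_fitness, reverse=True)
        parents = pop[: population // 2]
        pop = parents + [mutate(random.choice(parents)) for _ in range(population - len(parents))]
    return max(pop, key=toy_fitness)

seed = ["conv3x3"] * 6          # pretend this was found on CIFAR-10
print(eat_nas_sketch(seed))
```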

An Updated Duet Model for Passage Re-ranking

Title An Updated Duet Model for Passage Re-ranking
Authors Bhaskar Mitra, Nick Craswell
Abstract We propose several small modifications to Duet—a deep neural ranking model—and evaluate the updated model on the MS MARCO passage ranking task. We report significant improvements from the proposed changes based on an ablation study.
Tasks Passage Re-Ranking
Published 2019-03-18
URL http://arxiv.org/abs/1903.07666v1
PDF http://arxiv.org/pdf/1903.07666v1.pdf
PWC https://paperswithcode.com/paper/an-updated-duet-model-for-passage-re-ranking
Repo https://github.com/dfcf93/MSMARCO
Framework none

Monte-Carlo Tree Search for Efficient Visually Guided Rearrangement Planning

Title Monte-Carlo Tree Search for Efficient Visually Guided Rearrangement Planning
Authors Yann Labbé, Sergey Zagoruyko, Igor Kalevatykh, Ivan Laptev, Justin Carpentier, Mathieu Aubry, Josef Sivic
Abstract We address the problem of visually guided rearrangement planning with many movable objects, i.e., finding a sequence of actions to move a set of objects from an initial arrangement to a desired one, while relying on visual inputs coming from an RGB camera. To do so, we introduce a complete pipeline relying on two key contributions. First, we introduce an efficient and scalable rearrangement planning method, based on a Monte-Carlo Tree Search exploration strategy. We demonstrate that, thanks to its good trade-off between exploration and exploitation, our method (i) scales well with the number of objects and (ii) finds solutions requiring fewer moves than other state-of-the-art approaches. Note that, contrary to many approaches, we do not require any buffer space to be available. Second, to precisely localize movable objects in the scene, we develop an integrated approach for robust multi-object workspace state estimation from a single uncalibrated RGB camera using a deep neural network trained only with synthetic data. We validate our multi-object visually guided manipulation pipeline with several experiments on a real UR-5 robotic arm by solving various rearrangement planning instances, requiring only 60 ms to compute the plan to rearrange 25 objects. In addition, we show that our system is insensitive to camera movements and can successfully recover from external perturbations. Supplementary video, source code and pre-trained models are available at https://ylabbe.github.io/rearrangement-planning.
Tasks Calibration
Published 2019-04-23
URL https://arxiv.org/abs/1904.10348v2
PDF https://arxiv.org/pdf/1904.10348v2.pdf
PWC https://paperswithcode.com/paper/monte-carlo-tree-search-for-efficient
Repo https://github.com/ylabbe/rearrangement-planning
Framework pytorch
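To make the exploration/exploitation trade-off mentioned in the abstract concrete, here is a compact, generic UCT (Monte-Carlo Tree Search) skeleton on a toy rearrangement-style puzzle; the state, moves, and reward are stand-ins, not the paper's planner or visual front end.

```python
import math, random

GOAL = (0, 1, 2, 3)                      # toy goal arrangement (object i in slot i)

def actions(s):                          # a "move": swap the contents of two slots
    return [(i, j) for i in range(len(s)) for j in range(i + 1, len(s))]

def step(s, a):
    i, j = a
    s = list(s); s[i], s[j] = s[j], s[i]
    return tuple(s)

def reward(s):                           # fraction of objects already in place
    return sum(x == g for x, g in zip(s, GOAL)) / len(GOAL)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def uct_search(root_state, iters=2000, c=1.4, horizon=6):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1. Selection: descend with UCB1 while the node is fully expanded.
        while node.children and len(node.children) == len(actions(node.state)):
            node = max(node.children.values(), key=lambda n: n.value / n.visits
                       + c * math.sqrt(math.log(node.visits) / n.visits))
        # 2. Expansion: add one untried action.
        untried = [a for a in actions(node.state) if a not in node.children]
        if untried:
            a = random.choice(untried)
            node.children[a] = Node(step(node.state, a), node)
            node = node.children[a]
        # 3. Rollout: random moves up to a horizon.
        s = node.state
        for _ in range(horizon):
            if s == GOAL:
                break
            s = step(s, random.choice(actions(s)))
        r = reward(s)
        # 4. Backpropagation.
        while node:
            node.visits += 1
            node.value += r
            node = node.parent
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(uct_search((2, 0, 1, 3)))           # the most-visited first move
```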

Sequence Modeling with Unconstrained Generation Order

Title Sequence Modeling with Unconstrained Generation Order
Authors Dmitrii Emelianenko, Elena Voita, Pavel Serdyukov
Abstract The dominant approach to sequence generation is to produce a sequence in some predefined order, e.g. left to right. In contrast, we propose a more general model that can generate the output sequence by inserting tokens in any arbitrary order. Our model learns decoding order as a result of its training procedure. Our experiments show that this model is superior to fixed order models on a number of sequence generation tasks, such as Machine Translation, Image-to-LaTeX and Image Captioning.
Tasks Image Captioning, Machine Translation
Published 2019-11-01
URL https://arxiv.org/abs/1911.00176v1
PDF https://arxiv.org/pdf/1911.00176v1.pdf
PWC https://paperswithcode.com/paper/sequence-modeling-with-unconstrained
Repo https://github.com/TIXFeniks/neurips2019_intrus
Framework tf
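The model generates by inserting tokens at arbitrary positions rather than strictly left to right. The toy greedy insertion decoder below illustrates the mechanics only; the hand-made scorer stands in for the trained model's (position, token) distribution.

```python
VOCAB = ["the", "cat", "sat", "<eos>"]
TARGET = ["the", "cat", "sat"]          # pretend reference, used only by the toy scorer

def toy_score(seq, pos, token):
    """Stand-in scorer: prefer insertions that keep the sequence a subsequence of
    TARGET and move it closer to completion; <eos> only scores well once complete."""
    if token == "<eos>":
        return 1.0 if seq == TARGET else -1.0
    cand = seq[:pos] + [token] + seq[pos:]
    it = iter(TARGET)
    return len(cand) if all(tok in it for tok in cand) else -1.0

def insertion_decode(max_steps=10):
    seq = []
    for _ in range(max_steps):
        options = [(toy_score(seq, p, t), p, t)
                   for p in range(len(seq) + 1) for t in VOCAB]
        score, pos, token = max(options)
        if token == "<eos>" or score < 0:
            break
        seq = seq[:pos] + [token] + seq[pos:]
    return seq

print(insertion_decode())   # builds "the cat sat", not necessarily left to right
```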

Dense Depth Estimation in Monocular Endoscopy with Self-supervised Learning Methods

Title Dense Depth Estimation in Monocular Endoscopy with Self-supervised Learning Methods
Authors Xingtong Liu, Ayushi Sinha, Masaru Ishii, Gregory D. Hager, Austin Reiter, Russell H. Taylor, Mathias Unberath
Abstract We present a self-supervised approach to training convolutional neural networks for dense depth estimation from monocular endoscopy data without a priori modeling of anatomy or shading. Our method only requires monocular endoscopic videos and a multi-view stereo method, e.g., structure from motion, to supervise learning in a sparse manner. Consequently, our method requires neither manual labeling nor patient computed tomography (CT) scans in the training and application phases. In a cross-patient experiment using CT scans as groundtruth, the proposed method achieved submillimeter mean residual error. In a comparison study to recent self-supervised depth estimation methods designed for natural video on in vivo sinus endoscopy data, we demonstrate that the proposed approach outperforms the previous methods by a large margin. The source code for this work is publicly available online at https://github.com/lppllppl920/EndoscopyDepthEstimation-Pytorch.
Tasks Computed Tomography (CT), Depth Estimation
Published 2019-02-20
URL https://arxiv.org/abs/1902.07766v2
PDF https://arxiv.org/pdf/1902.07766v2.pdf
PWC https://paperswithcode.com/paper/self-supervised-learning-for-dense-depth
Repo https://github.com/lppllppl920/EndoscopyDepthEstimation-Pytorch
Framework pytorch
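Supervision comes from sparse structure-from-motion points rather than dense ground truth. A minimal PyTorch sketch of a masked, sparse depth loss is below; it illustrates only the sparse-supervision idea and omits the scale handling and additional loss terms the paper uses.

```python
import torch

def sparse_depth_loss(pred_depth, sfm_depth, valid_mask):
    """Penalize predicted depth only where structure-from-motion produced a point.

    pred_depth, sfm_depth: (B, 1, H, W) tensors; valid_mask: same shape, 1 at sparse
    SfM points and 0 elsewhere. A simplified stand-in for the paper's loss terms.
    """
    diff = (pred_depth - sfm_depth) * valid_mask
    return diff.abs().sum() / valid_mask.sum().clamp(min=1)

# Toy usage with random tensors standing in for network output and SfM points.
pred = torch.rand(2, 1, 64, 64)
gt = torch.rand(2, 1, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.99).float()   # roughly 1% of pixels supervised
print(sparse_depth_loss(pred, gt, mask))
```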

Lidar-Camera Co-Training for Semi-Supervised Road Detection

Title Lidar-Camera Co-Training for Semi-Supervised Road Detection
Authors Luca Caltagirone, Lennart Svensson, Mattias Wahde, Martin Sanfridson
Abstract Recent advances in the field of machine learning and computer vision have enabled the development of fast and accurate road detectors. Commonly such systems are trained within a supervised learning paradigm where both an input sensor’s data and the corresponding ground truth label must be provided. The task of generating labels is commonly carried out by human annotators and it is notoriously time-consuming and expensive. In this work, it is shown that a semi-supervised approach known as co-training can provide significant F1-score average improvements compared to supervised learning. In co-training, two classifiers acting on different views of the data cooperatively improve each other’s performance by leveraging unlabeled examples. Depending on the amount of labeled data used, the improvements ranged from 1.12 to 6.10 percentage points for a camera-based road detector and from 1.04 to 8.14 percentage points for a lidar-based road detector. Lastly, the co-training algorithm is validated on the KITTI road benchmark, achieving high performance using only 36 labeled training examples together with several thousand unlabeled ones.
Tasks
Published 2019-11-28
URL https://arxiv.org/abs/1911.12597v1
PDF https://arxiv.org/pdf/1911.12597v1.pdf
PWC https://paperswithcode.com/paper/lidar-camera-co-training-for-semi-supervised
Repo https://github.com/luca-caltagirone/cotrain
Framework pytorch
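In co-training, a camera-view and a lidar-view classifier pseudo-label unlabeled examples for each other. The sketch below shows one simple variant with scikit-learn classifiers on generic feature vectors; the pseudo-labeling policy, confidence threshold, and random features are assumptions, not the paper's detector networks.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(Xc_l, Xl_l, y_l, Xc_u, Xl_u, rounds=5, conf=0.9):
    """Camera-view and lidar-view classifiers exchange confident pseudo-labels.

    A deliberately simple variant: the confidence threshold, the 'label with the
    more confident view' rule, and logistic-regression models are all assumptions.
    """
    cam = LogisticRegression(max_iter=1000)
    lid = LogisticRegression(max_iter=1000)
    Xc, Xl, y = Xc_l.copy(), Xl_l.copy(), y_l.copy()
    for _ in range(rounds):
        cam.fit(Xc, y)
        lid.fit(Xl, y)
        if len(Xc_u) == 0:
            break
        p_cam, p_lid = cam.predict_proba(Xc_u), lid.predict_proba(Xl_u)
        # Take unlabeled examples where either view is confident, labeled by the
        # more confident of the two views.
        take = (p_cam.max(axis=1) > conf) | (p_lid.max(axis=1) > conf)
        if not take.any():
            break
        pseudo = np.where(p_cam.max(axis=1) >= p_lid.max(axis=1),
                          p_cam.argmax(axis=1), p_lid.argmax(axis=1))
        Xc = np.vstack([Xc, Xc_u[take]])
        Xl = np.vstack([Xl, Xl_u[take]])
        y = np.concatenate([y, pseudo[take]])
        Xc_u, Xl_u = Xc_u[~take], Xl_u[~take]
    return cam, lid

# Toy usage with random features standing in for camera and lidar descriptors.
rng = np.random.default_rng(0)
Xc_l, Xl_l, y_l = rng.normal(size=(20, 8)), rng.normal(size=(20, 6)), rng.integers(0, 2, 20)
Xc_u, Xl_u = rng.normal(size=(200, 8)), rng.normal(size=(200, 6))
cam, lid = co_train(Xc_l, Xl_l, y_l, Xc_u, Xl_u)
```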

Fire Now, Fire Later: Alarm-Based Systems for Prescriptive Process Monitoring

Title Fire Now, Fire Later: Alarm-Based Systems for Prescriptive Process Monitoring
Authors Stephan A. Fahrenkrog-Petersen, Niek Tax, Irene Teinemaa, Marlon Dumas, Massimiliano de Leoni, Fabrizio Maria Maggi, Matthias Weidlich
Abstract Predictive process monitoring is a family of techniques to analyze events produced during the execution of a business process in order to predict the future state or the final outcome of running process instances. Existing techniques in this field are able to predict, at each step of a process instance, the likelihood that it will lead to an undesired outcome. These techniques, however, focus on generating predictions and do not prescribe when and how process workers should intervene to decrease the cost of undesired outcomes. This paper proposes a framework for prescriptive process monitoring, which extends predictive monitoring with the ability to generate alarms that trigger interventions to prevent an undesired outcome or mitigate its effect. The framework incorporates a parameterized cost model to assess the cost-benefit trade-off of generating alarms. We show how to optimize the generation of alarms given an event log of past process executions and a set of cost model parameters. The proposed approaches are empirically evaluated using a range of real-life event logs. The experimental results show that the net cost of undesired outcomes can be minimized by changing the threshold for generating alarms, as the process instance progresses. Moreover, introducing delays for triggering alarms, instead of triggering them as soon as the probability of an undesired outcome exceeds a threshold, leads to lower net costs.
Tasks
Published 2019-05-23
URL https://arxiv.org/abs/1905.09568v1
PDF https://arxiv.org/pdf/1905.09568v1.pdf
PWC https://paperswithcode.com/paper/fire-now-fire-later-alarm-based-systems-for
Repo https://github.com/samadeusfp/alarmBasedPrescriptiveProcessMonitoring
Framework none
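The framework rests on a parameterized cost model that weighs intervention cost against the cost of an undesired outcome, and on choosing an alarm threshold that minimizes net cost on historical logs. A minimal sketch of that calculation follows; the cost values and the simple grid search are illustrative placeholders.

```python
def net_cost(cases, threshold, c_in=10.0, c_out=100.0, effectiveness=0.7):
    """Net cost of an alarming policy on historical cases.

    cases: list of (predicted_probability, had_undesired_outcome) pairs.
    c_in: cost of an intervention; c_out: cost of an undesired outcome;
    effectiveness: fraction of outcomes an intervention prevents.
    All parameter values here are illustrative placeholders.
    """
    total = 0.0
    for prob, bad in cases:
        if prob >= threshold:                       # alarm fires -> intervene
            total += c_in + (1 - effectiveness) * c_out * bad
        else:                                       # no alarm
            total += c_out * bad
    return total

def best_threshold(cases, grid=None):
    """Pick the alarm threshold minimizing net cost on a historical event log."""
    grid = grid or [i / 100 for i in range(0, 101, 5)]
    return min(grid, key=lambda t: net_cost(cases, t))

history = [(0.9, True), (0.8, False), (0.3, False), (0.6, True), (0.1, False)]
print(best_threshold(history))
```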

Learning Bodily and Temporal Attention in Protective Movement Behavior Detection

Title Learning Bodily and Temporal Attention in Protective Movement Behavior Detection
Authors Chongyang Wang, Min Peng, Temitayo A. Olugbade, Nicholas D. Lane, Amanda C. De C. Williams, Nadia Bianchi-Berthouze
Abstract For people with chronic pain, the assessment of protective behavior during physical functioning is essential to understand their subjective pain-related experiences (e.g., fear and anxiety toward pain and injury) and how they deal with such experiences (avoidance or reliance on specific body joints), with the ultimate goal of guiding intervention. Advances in deep learning (DL) can enable the development of such interventions. Using the EmoPain MoCap dataset, we investigate how attention-based DL architectures can be used to improve the detection of protective behavior by capturing the most informative temporal and body-configurational cues characterizing specific movements and the strategies used to perform them. We propose an end-to-end deep learning architecture named BodyAttentionNet (BANet). BANet is designed to learn the temporal and bodily parts that are most informative for the detection of protective behavior. The approach addresses the variety of ways people execute a movement (including healthy people) independently of the type of movement analyzed. Through extensive comparison experiments with other state-of-the-art machine learning techniques used with motion capture data, we show statistically significant improvements achieved by using these attention mechanisms. In addition, the BANet architecture requires far fewer parameters than the state of the art for comparable, if not higher, performance.
Tasks Motion Capture
Published 2019-04-24
URL https://arxiv.org/abs/1904.10824v3
PDF https://arxiv.org/pdf/1904.10824v3.pdf
PWC https://paperswithcode.com/paper/learning-bodily-and-temporal-attention-in
Repo https://github.com/CodeShareBot/BodyAttentionNetwork
Framework none
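BANet learns which body joints and which time steps are most informative. The toy PyTorch module below sketches bodily and temporal softmax attention followed by a binary classifier; its dimensions and structure are assumptions, not the published BANet architecture.

```python
import torch
import torch.nn as nn

class ToyBodyTimeAttention(nn.Module):
    """Minimal bodily + temporal attention sketch (not the published BANet).

    Input: (batch, time, joints, feat) motion-capture features.
    """
    def __init__(self, feat_dim, hidden=32):
        super().__init__()
        self.joint_score = nn.Linear(feat_dim, 1)      # scores each body joint
        self.time_score = nn.Linear(feat_dim, 1)       # scores each time step
        self.classifier = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                        nn.Linear(hidden, 1))

    def forward(self, x):
        joint_att = torch.softmax(self.joint_score(x), dim=2)       # attend over joints
        per_step = (joint_att * x).sum(dim=2)                       # (B, T, F)
        time_att = torch.softmax(self.time_score(per_step), dim=1)  # attend over time
        pooled = (time_att * per_step).sum(dim=1)                   # (B, F)
        return torch.sigmoid(self.classifier(pooled)).squeeze(-1)   # protective-behavior prob.

x = torch.randn(4, 180, 22, 3)     # 4 clips, 180 frames, 22 joints, xyz per joint
print(ToyBodyTimeAttention(feat_dim=3)(x).shape)   # torch.Size([4])
```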

Semantic-Aware Scene Recognition

Title Semantic-Aware Scene Recognition
Authors Alejandro López-Cifuentes, Marcos Escudero-Viñolo, Jesús Bescós, Álvaro García-Martín
Abstract Scene recognition is currently one of the top-challenging research fields in computer vision. This may be due to the ambiguity between classes: images of several scene classes may share similar objects, which causes confusion among them. The problem is aggravated when images of a particular scene class are notably different. Convolutional Neural Networks (CNNs) have significantly boosted performance in scene recognition, although it still falls well below that of other recognition tasks (e.g., object or image recognition). In this paper, we describe a novel approach for scene recognition based on an end-to-end multi-modal CNN that combines image and context information by means of an attention module. Context information, in the shape of semantic segmentation, is used to gate features extracted from the RGB image by leveraging information encoded in the semantic representation: the set of scene objects and stuff, and their relative locations. This gating process reinforces the learning of indicative scene content and enhances scene disambiguation by refocusing the receptive fields of the CNN towards them. Experimental results on four publicly available datasets show that the proposed approach outperforms every other state-of-the-art method while significantly reducing the number of network parameters. All the code and data used in this paper are available at https://github.com/vpulab/Semantic-Aware-Scene-Recognition
Tasks Scene Recognition, Semantic Segmentation
Published 2019-09-05
URL https://arxiv.org/abs/1909.02410v3
PDF https://arxiv.org/pdf/1909.02410v3.pdf
PWC https://paperswithcode.com/paper/semantic-aware-scene-recognition
Repo https://github.com/vpulab/Semantic-Aware-Scene-Recognition
Framework pytorch
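The key mechanism is gating RGB features with an attention map derived from semantic-segmentation features. A minimal PyTorch sketch of such a gate is below; the channel sizes and the 1x1-conv-plus-sigmoid design are assumptions, not the paper's exact attention module.

```python
import torch
import torch.nn as nn

class SemanticGate(nn.Module):
    """Gate RGB feature maps with an attention map derived from semantic features.

    A minimal sketch of the gating idea only; the paper's full network uses
    dedicated RGB and semantic branches with a specific attention module.
    """
    def __init__(self, rgb_ch, sem_ch):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(sem_ch, rgb_ch, kernel_size=1), nn.Sigmoid())

    def forward(self, rgb_feats, sem_feats):
        return rgb_feats * self.attn(sem_feats)   # element-wise gating by semantic attention

rgb = torch.randn(2, 256, 14, 14)      # features from an RGB backbone
sem = torch.randn(2, 64, 14, 14)       # features from a semantic-segmentation branch
print(SemanticGate(256, 64)(rgb, sem).shape)   # torch.Size([2, 256, 14, 14])
```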

Impact of Fully Connected Layers on Performance of Convolutional Neural Networks for Image Classification

Title Impact of Fully Connected Layers on Performance of Convolutional Neural Networks for Image Classification
Authors S. H. Shabbeer Basha, Shiv Ram Dubey, Viswanath Pulabaigari, Snehasis Mukherjee
Abstract Convolutional Neural Networks (CNNs) have, in domains like computer vision, largely reduced the need for handcrafted features thanks to their ability to learn problem-specific features from raw input data. However, selecting a dataset-specific CNN architecture, which is mostly done by experience or expertise, is a time-consuming and error-prone process. To automate the process of learning a CNN architecture, this paper attempts to find the relationship between Fully Connected (FC) layers and some characteristics of the datasets. CNN architectures, and recently datasets as well, are categorized as deep, shallow, wide, etc. This paper tries to formalize these terms and to answer the following questions: (i) What is the impact of deeper/shallower architectures on the performance of the CNN w.r.t. FC layers? (ii) How do deeper/wider datasets influence the performance of the CNN w.r.t. FC layers? (iii) Which kind of architecture (deeper/shallower) is better suited to which kind of dataset (deeper/wider)? To address these questions, we performed experiments with three CNN architectures of different depths. The experiments are conducted by varying the number of FC layers. We used four widely used datasets, namely CIFAR-10, CIFAR-100, Tiny ImageNet, and CRCHistoPhenotypes, to justify our findings in the context of the image classification problem. The source code of this research is available at https://github.com/shabbeersh/Impact-of-FC-layers.
Tasks Image Classification
Published 2019-01-21
URL https://arxiv.org/abs/1902.02771v3
PDF https://arxiv.org/pdf/1902.02771v3.pdf
PWC https://paperswithcode.com/paper/impact-of-fully-connected-layers-on
Repo https://github.com/shabbeersh/Impact-of-FC-layers
Framework tf
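Since the experiments vary the number of FC layers on top of CNNs of different depths, a small helper like the following could be used to build classifier heads for such a sweep; the hidden width and activation are arbitrary choices, not taken from the paper.

```python
import torch
import torch.nn as nn

def fc_head(in_features, num_classes, num_fc_layers, width=512):
    """Build a classifier head with a configurable number of fully connected layers.

    Widths and activations are arbitrary illustrative choices; the paper sweeps
    the FC-layer count across CNNs of different depths and datasets of different sizes.
    """
    layers, d = [], in_features
    for _ in range(max(num_fc_layers - 1, 0)):
        layers += [nn.Linear(d, width), nn.ReLU(inplace=True)]
        d = width
    layers.append(nn.Linear(d, num_classes))
    return nn.Sequential(*layers)

for n in (1, 2, 3):
    head = fc_head(in_features=2048, num_classes=10, num_fc_layers=n)
    print(n, head(torch.randn(4, 2048)).shape)   # always torch.Size([4, 10])
```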

DFPENet-geology: A Deep Learning Framework for High Precision Recognition and Segmentation of Co-seismic Landslides

Title DFPENet-geology: A Deep Learning Framework for High Precision Recognition and Segmentation of Co-seismic Landslides
Authors Qingsong Xu, Chaojun Ouyang, Tianhai Jiang, Xuanmei Fan, Duoxiang Cheng
Abstract The following are the two main reasons for withdrawing this manuscript. 1. There are some problems in the method and results, and there is considerable room for improvement. In terms of the method, “Pre-trained Datasets (PD)” means selecting a small amount of data from the online test set, which easily causes the model to overfit the online test set and prevents robust performance; more importantly, the proposed DFPENet is highly redundant in combining the Attention Gate Mechanism and Gate Convolution Networks, and the section on geological feature fusion needs to be revisited. In terms of the results, further improvement and refinement are needed. 2. arXiv is an open-access repository of electronic preprints without peer review. However, for our own research we need experts to provide comments on this work, whether negative or positive, and we would use those comments to significantly improve the manuscript. We have therefore decided to withdraw this manuscript from arXiv and will update arXiv with the final accepted manuscript, so that more researchers can use our proposed comprehensive and general scheme to recognize and segment seismic landslides more efficiently.
Tasks Scene Recognition, Transfer Learning
Published 2019-08-28
URL https://arxiv.org/abs/1908.10907v2
PDF https://arxiv.org/pdf/1908.10907v2.pdf
PWC https://paperswithcode.com/paper/dfpenet-geology-a-deep-learning-framework-for
Repo https://github.com/xupine/DFPENet
Framework pytorch

Stroke extraction for offline handwritten mathematical expression recognition

Title Stroke extraction for offline handwritten mathematical expression recognition
Authors Chungkwong Chan
Abstract Offline handwritten mathematical expression recognition is often considered much harder than its online counterpart due to the absence of temporal information. In order to take advantage of the more mature methods for online recognition and save resources, an oversegmentation approach is proposed to recover strokes from textual bitmap images automatically. The proposed algorithm first breaks down the skeleton of a binarized image into junctions and segments, then segments are merged to form strokes, finally stroke order is normalized by using recursive projection and topological sort. Good offline accuracy was obtained in combination with ordinary online recognizers, which are not specially designed for extracted strokes. Given a ready-made state-of-the-art online handwritten mathematical expression recognizer, the proposed procedure correctly recognized 58.22%, 65.65%, and 65.22% of the offline formulas rendered from the datasets of the Competitions on Recognition of Online Handwritten Mathematical Expressions (CROHME) in 2014, 2016, and 2019 respectively. Furthermore, given a trainable online recognition system, retraining it with extracted strokes resulted in an offline recognizer with the same level of accuracy. On the other hand, the speed of the entire pipeline was fast enough to facilitate on-device recognition on mobile phones with limited resources. To conclude, stroke extraction provides an attractive way to build optical character recognition software.
Tasks Optical Character Recognition
Published 2019-05-16
URL https://arxiv.org/abs/1905.06749v2
PDF https://arxiv.org/pdf/1905.06749v2.pdf
PWC https://paperswithcode.com/paper/stroke-extraction-for-offline-handwritten
Repo https://github.com/chungkwong/mathocr-myscript-android
Framework none
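After segments are merged into strokes, stroke order is normalized with recursive projection and topological sorting. The sketch below shows only the topological-sort step (Kahn's algorithm) over given precedence edges; in the paper those edges come from the projection step, here they are supplied directly.

```python
from collections import defaultdict, deque

def order_strokes(num_strokes, precedence):
    """Topologically sort strokes given 'stroke a should precede stroke b' edges.

    In the paper the edges come from recursive projection of the strokes; here
    they are given directly, so this only illustrates the ordering step.
    """
    indeg = [0] * num_strokes
    succ = defaultdict(list)
    for a, b in precedence:
        succ[a].append(b)
        indeg[b] += 1
    queue = deque(i for i in range(num_strokes) if indeg[i] == 0)
    order = []
    while queue:
        s = queue.popleft()
        order.append(s)
        for t in succ[s]:
            indeg[t] -= 1
            if indeg[t] == 0:
                queue.append(t)
    return order

# Strokes 0 and 1 form the left symbol, stroke 2 the right one.
print(order_strokes(3, [(0, 1), (0, 2), (1, 2)]))   # [0, 1, 2]
```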