January 30, 2020

3028 words 15 mins read

Paper Group ANR 467

Poly-GAN: Multi-Conditioned GAN for Fashion Synthesis

Title Poly-GAN: Multi-Conditioned GAN for Fashion Synthesis
Authors Nilesh Pandey, Andreas Savakis
Abstract We present Poly-GAN, a novel conditional GAN architecture that is motivated by Fashion Synthesis, an application where garments are automatically placed on images of human models at an arbitrary pose. Poly-GAN allows conditioning on multiple inputs and is suitable for many tasks, including image alignment, image stitching, and inpainting. Existing methods have a similar pipeline where three different networks are used to first align garments with the human pose, then perform stitching of the aligned garment and finally refine the results. Poly-GAN is the first instance where a common architecture is used to perform all three tasks. Our novel architecture enforces the conditions at all layers of the encoder and utilizes skip connections from the coarse layers of the encoder to the respective layers of the decoder. Poly-GAN is able to perform a spatial transformation of the garment based on the RGB skeleton of the model at an arbitrary pose. Additionally, Poly-GAN can perform image stitching, regardless of the garment orientation, and inpainting on the garment mask when it contains irregular holes. Our system achieves state-of-the-art quantitative results on the Structural Similarity Index and Inception Score metrics on the DeepFashion dataset.
Tasks Image Stitching
Published 2019-09-05
URL https://arxiv.org/abs/1909.02165v1
PDF https://arxiv.org/pdf/1909.02165v1.pdf
PWC https://paperswithcode.com/paper/poly-gan-multi-conditioned-gan-for-fashion
Repo
Framework
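
The two architectural ideas the abstract highlights, re-injecting the conditions at every level of the encoder and taking skip connections only from the coarse encoder layers, are easy to illustrate. The module below is a minimal PyTorch sketch of those two ideas, not the authors' released architecture; the layer widths, the single coarse skip connection, and the condition passed as an RGB tensor (e.g. the pose skeleton) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): the condition is re-injected at every
# encoder level, and only a coarse encoder feature map is skip-connected.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolyGANGeneratorSketch(nn.Module):
    def __init__(self, in_ch=3, cond_ch=3, base=32):
        super().__init__()
        # Each encoder stage sees its input concatenated with a resized copy
        # of the condition (e.g. the RGB pose skeleton).
        self.enc1 = nn.Conv2d(in_ch + cond_ch, base, 4, stride=2, padding=1)
        self.enc2 = nn.Conv2d(base + cond_ch, base * 2, 4, stride=2, padding=1)
        self.enc3 = nn.Conv2d(base * 2 + cond_ch, base * 4, 4, stride=2, padding=1)
        self.dec3 = nn.ConvTranspose2d(base * 4, base * 2, 4, stride=2, padding=1)
        # Skip connection only from the coarse encoder stage (enc2 here).
        self.dec2 = nn.ConvTranspose2d(base * 2 + base * 2, base, 4, stride=2, padding=1)
        self.dec1 = nn.ConvTranspose2d(base, in_ch, 4, stride=2, padding=1)

    def forward(self, x, cond):
        def cat_cond(t):  # resize condition to the feature map size and concat
            c = F.interpolate(cond, size=t.shape[-2:], mode="bilinear",
                              align_corners=False)
            return torch.cat([t, c], dim=1)

        e1 = F.relu(self.enc1(cat_cond(x)))
        e2 = F.relu(self.enc2(cat_cond(e1)))
        e3 = F.relu(self.enc3(cat_cond(e2)))
        d3 = F.relu(self.dec3(e3))
        d2 = F.relu(self.dec2(torch.cat([d3, e2], dim=1)))  # coarse skip
        return torch.tanh(self.dec1(d2))

g = PolyGANGeneratorSketch()
out = g(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```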

Neural Semantic Parsing in Low-Resource Settings with Back-Translation and Meta-Learning

Title Neural Semantic Parsing in Low-Resource Settings with Back-Translation and Meta-Learning
Authors Yibo Sun, Duyu Tang, Nan Duan, Yeyun Gong, Xiaocheng Feng, Bing Qin, Daxin Jiang
Abstract Neural semantic parsing has achieved impressive results in recent years, yet its success relies on the availability of large amounts of supervised data. Our goal is to learn a neural semantic parser when only prior knowledge about a limited number of simple rules is available, without access to either annotated programs or execution results. Our approach is initialized by rules and improved in a back-translation paradigm using question-program pairs generated by the semantic parser and the question generator. A phrase table of frequent mapping patterns is automatically derived, and updated as training progresses, to measure the quality of generated instances. We train the model with model-agnostic meta-learning to guarantee accuracy and stability on examples covered by rules, while acquiring the versatility to generalize well to examples not covered by rules. Results on three benchmark datasets with different domains and programs show that our approach incrementally improves accuracy. On WikiSQL, our best model is comparable to the SOTA system learned from denotations.
Tasks Meta-Learning, Semantic Parsing
Published 2019-09-12
URL https://arxiv.org/abs/1909.05438v1
PDF https://arxiv.org/pdf/1909.05438v1.pdf
PWC https://paperswithcode.com/paper/neural-semantic-parsing-in-low-resource
Repo
Framework
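
The phrase-table quality check is the most concrete step of this pipeline and can be sketched in a few lines. The toy below is our reading of the idea, not the paper's implementation: it derives frequent question-token to program-token mappings from generated pairs and scores each pair by how well the table supports its program tokens; the whitespace tokenization and the min_count threshold are assumptions.

```python
# Toy sketch (our illustration): a phrase table of frequent question-token to
# program-token co-occurrences, used to score generated (question, program) pairs.
from collections import Counter
from itertools import product

def build_phrase_table(pairs, min_count=2):
    """Keep (question token, program token) pairs that co-occur often enough."""
    counts = Counter()
    for question, program in pairs:
        for q_tok, p_tok in product(question.split(), program.split()):
            counts[(q_tok, p_tok)] += 1
    return {k for k, c in counts.items() if c >= min_count}

def pair_quality(question, program, table):
    """Fraction of program tokens supported by some question token."""
    p_toks = program.split()
    hits = sum(any((q, p) in table for q in question.split()) for p in p_toks)
    return hits / max(len(p_toks), 1)

generated = [
    ("how many rivers", "count ( rivers )"),
    ("how many lakes", "count ( lakes )"),
    ("how many states", "count ( states )"),
]
table = build_phrase_table(generated)
for q, p in generated:
    print(q, "->", p, round(pair_quality(q, p, table), 2))
```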

TextTubes for Detecting Curved Text in the Wild

Title TextTubes for Detecting Curved Text in the Wild
Authors Joël Seytre, Jon Wu, Alessandro Achille
Abstract We present a detector for curved text in natural images. We model scene text instances as tubes around their medial axes and introduce a parametrization-invariant loss function. We train a two-stage curved text detector, and evaluate it on the curved text benchmarks CTW-1500 and Total-Text. Our approach achieves state-of-the-art results or improves upon them, notably for CTW-1500 by over 8 percentage points in F-score.
Tasks
Published 2019-12-19
URL https://arxiv.org/abs/1912.08990v1
PDF https://arxiv.org/pdf/1912.08990v1.pdf
PWC https://paperswithcode.com/paper/texttubes-for-detecting-curved-text-in-the
Repo
Framework

3D Scene Reconstruction with Multi-layer Depth and Epipolar Transformers

Title 3D Scene Reconstruction with Multi-layer Depth and Epipolar Transformers
Authors Daeyun Shin, Zhile Ren, Erik B. Sudderth, Charless C. Fowlkes
Abstract We tackle the problem of automatically reconstructing a complete 3D model of a scene from a single RGB image. This challenging task requires inferring the shape of both visible and occluded surfaces. Our approach utilizes a viewer-centered, multi-layer representation of scene geometry adapted from recent methods for single object shape completion. To improve the accuracy of view-centered representations for complex scenes, we introduce a novel “Epipolar Feature Transformer” that transfers convolutional network features from an input view to other virtual camera viewpoints, and thus better covers the 3D scene geometry. Unlike existing approaches that first detect and localize objects in 3D, and then infer object shape using category-specific models, our approach is fully convolutional, end-to-end differentiable, and avoids the resolution and memory limitations of voxel representations. We demonstrate the advantages of multi-layer depth representations and epipolar feature transformers on the reconstruction of a large database of indoor scenes.
Tasks 3D Scene Reconstruction
Published 2019-02-18
URL https://arxiv.org/abs/1902.06729v2
PDF https://arxiv.org/pdf/1902.06729v2.pdf
PWC https://paperswithcode.com/paper/multi-layer-depth-and-epipolar-feature
Repo
Framework

Improved Mix-up with KL-Entropy for Learning From Noisy Labels

Title Improved Mix-up with KL-Entropy for Learning From Noisy Labels
Authors Qian Zhang, Feifei Lee, Ya-Gang Wang, Qiu Chen
Abstract Although deep neural networks (DNNs) have achieved excellent performance in image classification research, training DNNs requires large amounts of clean data with accurate annotations. Collecting a dataset is easy, but annotating the collected data is difficult. The web holds plenty of image data with inaccurate annotations, but training on such datasets makes networks prone to over-fitting the noisy labels, causing performance degradation. In this work, we propose an improved joint optimization framework that mixes the mix-up entropy and Kullback-Leibler (KL) entropy as the loss function. The new loss function enables better fine-tuning after the framework updates both the network parameters and the label annotations. We conduct experiments on the CIFAR-10 and Clothing1M datasets. The results show the advantageous performance of our approach compared with other state-of-the-art methods.
Tasks Image Classification
Published 2019-08-15
URL https://arxiv.org/abs/1908.05488v2
PDF https://arxiv.org/pdf/1908.05488v2.pdf
PWC https://paperswithcode.com/paper/improved-mix-up-with-kl-entropy-for-learning
Repo
Framework
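
A hedged sketch of the kind of loss the abstract describes: a mix-up cross-entropy term combined with a KL-entropy regularizer, as commonly used in joint-optimization frameworks for noisy labels. The exact formulation, the uniform prior in the KL term, and the weighting `beta` are our assumptions, not the paper's released loss.

```python
# Sketch (our reading of the abstract, not the released loss): mix-up
# cross-entropy plus a KL term; the uniform prior and beta are assumptions.
import torch
import torch.nn.functional as F

def mixup_kl_loss(model, x, y_soft, beta=1.0, mix_alpha=1.0):
    # Mix-up: convex combination of two shuffled examples and their labels.
    lam = torch.distributions.Beta(mix_alpha, mix_alpha).sample().item()
    idx = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y_soft + (1 - lam) * y_soft[idx]

    log_p = F.log_softmax(model(x_mix), dim=1)
    ce = -(y_mix * log_p).sum(dim=1).mean()  # mix-up cross-entropy term

    # KL term between a prior class distribution and the average prediction,
    # discouraging the network from collapsing onto a few (noisy) classes.
    prior = torch.full((log_p.size(1),), 1.0 / log_p.size(1))
    mean_p = F.softmax(model(x), dim=1).mean(dim=0)
    kl = (prior * (prior.log() - mean_p.log())).sum()
    return ce + beta * kl

model = torch.nn.Linear(10, 3)
x = torch.randn(8, 10)
y = F.one_hot(torch.randint(0, 3, (8,)), 3).float()
print(mixup_kl_loss(model, x, y).item())
```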

Co-Attention Hierarchical Network: Generating Coherent Long Distractors for Reading Comprehension

Title Co-Attention Hierarchical Network: Generating Coherent Long Distractors for Reading Comprehension
Authors Xiaorui Zhou, Senlin Luo, Yunfang Wu
Abstract In reading comprehension, generating sentence-level distractors is a significant task, which requires a deep understanding of the article and question. Traditional entity-centered methods can only generate word-level or phrase-level distractors. Although recently proposed neural methods like the sequence-to-sequence (Seq2Seq) model show great potential in generating creative text, previous neural methods for distractor generation ignore two important aspects. First, they did not model the interactions between the article and question, so the generated distractors tend to be too general or not relevant to the question context. Second, they did not emphasize the relationship between the distractor and the article, so the generated distractors are not semantically relevant to the article and thus fail to form a set of meaningful options. To solve the first problem, we propose a co-attention enhanced hierarchical architecture to better capture the interactions between the article and question, thereby guiding the decoder to generate more coherent distractors. To alleviate the second problem, we add an additional semantic similarity loss to push the generated distractors to be more relevant to the article. Experimental results show that our model outperforms several strong baselines on automatic metrics, achieving state-of-the-art performance. Further human evaluation indicates that our generated distractors are more coherent and more educative compared with those generated by baselines.
Tasks Reading Comprehension, Semantic Similarity, Semantic Textual Similarity
Published 2019-11-20
URL https://arxiv.org/abs/1911.08648v1
PDF https://arxiv.org/pdf/1911.08648v1.pdf
PWC https://paperswithcode.com/paper/co-attention-hierarchical-network-generating
Repo
Framework
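
The co-attention mechanism at the core of such a model can be shown compactly: an affinity matrix between article and question states, normalized in each direction so that each side attends to the other. This is a generic sketch of the mechanism, not the authors' hierarchical architecture; the bilinear map `W` is one common, assumed parameterization.

```python
# Generic co-attention sketch (our illustration, not the paper's architecture).
import torch
import torch.nn.functional as F

def co_attention(article, question, W):
    # article: (n, d), question: (m, d), W: (d, d) learned bilinear map.
    affinity = article @ W @ question.T           # (n, m)
    a2q = F.softmax(affinity, dim=1) @ question   # article attends to question
    q2a = F.softmax(affinity, dim=0).T @ article  # question attends to article
    return a2q, q2a

d = 16
article, question = torch.randn(20, d), torch.randn(5, d)
W = torch.randn(d, d) * 0.1
a2q, q2a = co_attention(article, question, W)
print(a2q.shape, q2a.shape)  # torch.Size([20, 16]) torch.Size([5, 16])
```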

Quick and (not so) Dirty: Unsupervised Selection of Justification Sentences for Multi-hop Question Answering

Title Quick and (not so) Dirty: Unsupervised Selection of Justification Sentences for Multi-hop Question Answering
Authors Vikas Yadav, Steven Bethard, Mihai Surdeanu
Abstract We propose an unsupervised strategy for the selection of justification sentences for multi-hop question answering (QA) that (a) maximizes the relevance of the selected sentences, (b) minimizes the overlap between the selected facts, and (c) maximizes the coverage of both question and answer. This unsupervised sentence selection method can be coupled with any supervised QA approach. We show that the sentences selected by our method improve the performance of a state-of-the-art supervised QA model on two multi-hop QA datasets: AI2’s Reasoning Challenge (ARC) and Multi-Sentence Reading Comprehension (MultiRC). We obtain new state-of-the-art performance on both datasets among approaches that do not use external resources for training the QA system: 56.82% F1 on ARC (41.24% on Challenge and 64.49% on Easy) and 26.1% EM0 on MultiRC. Our justification sentences have higher quality than the justifications selected by a strong information retrieval baseline, e.g., by 5.4% F1 in MultiRC. We also show that our unsupervised selection of justification sentences is more stable across domains than a state-of-the-art supervised sentence selection method.
Tasks Information Retrieval, Question Answering, Reading Comprehension
Published 2019-11-17
URL https://arxiv.org/abs/1911.07176v1
PDF https://arxiv.org/pdf/1911.07176v1.pdf
PWC https://paperswithcode.com/paper/quick-and-not-so-dirty-unsupervised-selection-1
Repo
Framework
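
The three criteria (a)-(c) suggest a simple greedy selector. The toy below is our sketch, not the paper's exact formulation: it scores candidate sentences by lexical overlap with the question and answer, rewards newly covered target tokens, and penalizes overlap with already-selected sentences; the scoring weights and whitespace tokenization are assumptions.

```python
# Toy greedy selector (our sketch of the abstract's criteria (a)-(c)).
def select_justifications(question, answer, sentences, k=2):
    target = set(question.split()) | set(answer.split())
    selected, covered = [], set()
    for _ in range(k):
        def score(s):
            toks = set(s.split())
            relevance = len(toks & target)                 # (a) relevance
            redundancy = len(toks & covered)               # (b) overlap penalty
            new_coverage = len((toks & target) - covered)  # (c) coverage gain
            return relevance + new_coverage - redundancy
        best = max((s for s in sentences if s not in selected), key=score)
        selected.append(best)
        covered |= set(best.split())
    return selected

sents = [
    "a magnet attracts iron",
    "iron is a metal",
    "magnets are fun toys",
]
print(select_justifications("what attracts iron", "a magnet", sents, k=2))
```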

A modified Genetic Algorithm for continuous estimation of CPR quality parameters from wrist-worn inertial sensor data

Title A modified Genetic Algorithm for continuous estimation of CPR quality parameters from wrist-worn inertial sensor data
Authors Christian Lins, Björn Friedrich, Andreas Hein, Sebastian Fudickar
Abstract Cardiopulmonary resuscitation (CPR) is the most important emergency intervention for sudden cardiac arrest. In this paper, we present a robust sinusoidal model fitting method, based on a modified Genetic Algorithm, for CPR quality parameters, namely chest compression frequency and depth, as measured by an inertial sensor placed at the wrist. Once included in a smartphone or smartwatch app, the proposed algorithm will enable bystanders to improve CPR (as part of a continuous closed-loop support system). By evaluating the precision of the model with both simulated data and data recorded with a Laerdal Resusci Anne mannequin as reference standard, a variance in compression frequency of ±3.7 cpm was found for the sensor placed at the wrist. This previously unconsidered sensor position, and consequently the use of smartwatches, was thus shown to be a suitable alternative to the typical placement of phones in the hand for CPR training.
Tasks
Published 2019-10-11
URL https://arxiv.org/abs/1910.06250v1
PDF https://arxiv.org/pdf/1910.06250v1.pdf
PWC https://paperswithcode.com/paper/a-modified-genetic-algorithm-for-continuous
Repo
Framework
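
The core fitting problem, recovering the frequency and amplitude of a noisy sinusoid from a window of inertial data, can be demonstrated with a plain genetic algorithm. The sketch below is our illustration, not the authors' modified GA; the population size, mutation scale, and the 3 s / 100 Hz window are assumptions.

```python
# Plain-GA sketch (our illustration, not the authors' modified GA): fit the
# frequency and amplitude of a sinusoid to a window of accelerometer data,
# the quantities that map to compression rate and depth.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 3, 300)                       # 3 s window at 100 Hz
signal = 2.0 * np.sin(2 * np.pi * 1.8 * t) + 0.3 * rng.normal(size=t.size)

def fitness(pop):
    # pop: (n, 2) rows of (frequency in Hz, amplitude); lower error is fitter.
    pred = pop[:, 1:2] * np.sin(2 * np.pi * pop[:, 0:1] * t)
    return -np.mean((pred - signal) ** 2, axis=1)

pop = np.column_stack([rng.uniform(0.5, 3.5, 50), rng.uniform(0.5, 4.0, 50)])
for _ in range(60):
    f = fitness(pop)
    parents = pop[np.argsort(f)[-10:]]           # selection: keep the best 10
    children = parents[rng.integers(0, 10, 40)] + rng.normal(0, 0.05, (40, 2))
    pop = np.vstack([parents, children])         # elitism + mutated offspring

freq, amp = pop[np.argmax(fitness(pop))]
print(f"~{freq * 60:.0f} compressions/min, amplitude {amp:.2f}")
```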

Sample Amplification: Increasing Dataset Size even when Learning is Impossible

Title Sample Amplification: Increasing Dataset Size even when Learning is Impossible
Authors Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant
Abstract Given data drawn from an unknown distribution, $D$, to what extent is it possible to “amplify” this dataset and output an even larger set of samples that appear to have been drawn from $D$? We formalize this question as follows: an $(n,m)$ $\text{amplification procedure}$ takes as input $n$ independent draws from an unknown distribution $D$, and outputs a set of $m > n$ “samples”. An amplification procedure is valid if no algorithm can distinguish the set of $m$ samples produced by the amplifier from a set of $m$ independent draws from $D$, with probability greater than $2/3$. Perhaps surprisingly, in many settings, a valid amplification procedure exists, even when the size of the input dataset, $n$, is significantly less than what would be necessary to learn $D$ to non-trivial accuracy. Specifically we consider two fundamental settings: the case where $D$ is an arbitrary discrete distribution supported on $\le k$ elements, and the case where $D$ is a $d$-dimensional Gaussian with unknown mean, and fixed covariance. In the first case, we show that an $\left(n, n + \Theta(\frac{n}{\sqrt{k}})\right)$ amplifier exists. In particular, given $n=O(\sqrt{k})$ samples from $D$, one can output a set of $m=n+1$ datapoints, whose total variation distance from the distribution of $m$ i.i.d. draws from $D$ is a small constant, despite the fact that one would need quadratically more data, $n=\Theta(k)$, to learn $D$ up to small constant total variation distance. In the Gaussian case, we show that an $\left(n,n+\Theta(\frac{n}{\sqrt{d}} )\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $\Theta(d)$ samples. In both the discrete and Gaussian settings, we show that these results are tight, to constant factors. Beyond these results, we formalize a number of curious directions for future research along this vein.
Tasks
Published 2019-04-26
URL https://arxiv.org/abs/1904.12053v2
PDF https://arxiv.org/pdf/1904.12053v2.pdf
PWC https://paperswithcode.com/paper/sample-amplification-increasing-dataset-size
Repo
Framework
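
For intuition, the natural discrete amplifier simply returns the input together with a few extra draws from the empirical distribution, shuffled so the resampled points are not identifiable. The sketch below is our illustration consistent with the abstract's $n \to n + \Theta(n/\sqrt{k})$ regime, not the paper's full construction or its analysis.

```python
# Sketch of the natural discrete amplifier (our illustration, not the paper's
# full construction): input plus extra empirical resamples, shuffled.
import numpy as np

def amplify(samples, m, rng):
    n = len(samples)
    assert m > n, "amplification must output more samples than it received"
    extra = rng.choice(samples, size=m - n, replace=True)  # empirical resample
    out = np.concatenate([samples, extra])
    rng.shuffle(out)  # hide which points were resampled
    return out

rng = np.random.default_rng(1)
k = 100                                   # support size
n = int(np.sqrt(k)) * 4                   # n = O(sqrt(k)) input samples
samples = rng.integers(0, k, size=n)
print(amplify(samples, n + 1, rng))       # one "free" extra sample
```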

Quantitative Programming by Examples

Title Quantitative Programming by Examples
Authors Sumit Gulwani, Kunal Pathak, Arjun Radhakrishna, Ashish Tiwari, Abhishek Udupa
Abstract Programming-by-Example (PBE) systems synthesize an intended program in some (relatively constrained) domain-specific language from a small number of input-output examples provided by the user. In this paper, we motivate and define the problem of quantitative PBE (qPBE) that relates to synthesizing an intended program over an underlying (real world) programming language that also minimizes a given quantitative cost function. We present a modular approach for solving qPBE that consists of three phases: intent disambiguation, global search, and local search. On two concrete objectives, namely program performance and size, our qPBE procedure achieves $1.53 X$ and $1.26 X$ improvement respectively over the baseline FlashFill PBE system, averaged over $701$ benchmarks. Our detailed experiments validate the design of our procedure and show the value of combining global and local search for qPBE.
Tasks
Published 2019-09-12
URL https://arxiv.org/abs/1909.05964v1
PDF https://arxiv.org/pdf/1909.05964v1.pdf
PWC https://paperswithcode.com/paper/quantitative-programming-by-examples
Repo
Framework

Deep Esophageal Clinical Target Volume Delineation using Encoded 3D Spatial Context of Tumors, Lymph Nodes, and Organs At Risk

Title Deep Esophageal Clinical Target Volume Delineation using Encoded 3D Spatial Context of Tumors, Lymph Nodes, and Organs At Risk
Authors Dakai Jin, Dazhou Guo, Tsung-Ying Ho, Adam P. Harrison, Jing Xiao, Chen-kan Tseng, Le Lu
Abstract Clinical target volume (CTV) delineation from radiotherapy computed tomography (RTCT) images is used to define the treatment areas containing the gross tumor volume (GTV) and/or sub-clinical malignant disease for radiotherapy (RT). High intra- and inter-user variability makes this a particularly difficult task for esophageal cancer. This motivates automated solutions, which is the aim of our work. Because CTV delineation is highly context-dependent–it must encompass the GTV and regional lymph nodes (LNs) while also avoiding excessive exposure to the organs at risk (OARs)–we formulate it as a deep contextual appearance-based problem using encoded spatial contexts of these anatomical structures. This allows the deep network to better learn from and emulate the margin- and appearance-based delineation performed by human physicians. Additionally, we develop domain-specific data augmentation to inject robustness to our system. Finally, we show that a simple 3D progressive holistically nested network (PHNN), which avoids computationally heavy decoding paths while still aggregating features at different levels of context, can outperform more complicated networks. Cross-validated experiments on a dataset of 135 esophageal cancer patients demonstrate that our encoded spatial context approach can produce concrete performance improvements, with an average Dice score of 83.9% and an average surface distance of 4.2 mm, representing improvements of 3.8% and 2.4 mm, respectively, over the state-of-the-art approach.
Tasks Data Augmentation
Published 2019-09-04
URL https://arxiv.org/abs/1909.01526v2
PDF https://arxiv.org/pdf/1909.01526v2.pdf
PWC https://paperswithcode.com/paper/deep-esophageal-clinical-target-volume
Repo
Framework

Perturbed Proximal Descent to Escape Saddle Points for Non-convex and Non-smooth Objective Functions

Title Perturbed Proximal Descent to Escape Saddle Points for Non-convex and Non-smooth Objective Functions
Authors Zhishen Huang, Stephen Becker
Abstract We consider the problem of finding local minimizers in non-convex and non-smooth optimization. Under the strict saddle point assumption, positive results have been derived for first-order methods. We present the first known results for the non-smooth case, which requires a different analysis and a different algorithm.
Tasks
Published 2019-01-24
URL http://arxiv.org/abs/1901.08958v1
PDF http://arxiv.org/pdf/1901.08958v1.pdf
PWC https://paperswithcode.com/paper/perturbed-proximal-descent-to-escape-saddle
Repo
Framework
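
The algorithm family the abstract refers to can be sketched as proximal gradient descent on $f(x) + \lambda \lVert x \rVert_1$, with a random perturbation injected whenever the proximal-gradient step stalls, i.e. near a stationary (possibly saddle) point. The code below is our illustration under those assumptions, not the paper's exact method; the step size, perturbation radius, and test function are ours.

```python
# Sketch (our illustration, not the paper's method): proximal gradient on
# f(x) + lam * ||x||_1, perturbed when the step stalls near a stationary point.
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def perturbed_prox_descent(grad_f, x0, lam, step, iters=500, tol=1e-6,
                           radius=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        # Proximal gradient step: prox of the L1 term is soft-thresholding.
        x_new = soft_threshold(x - step * grad_f(x), step * lam)
        if np.linalg.norm(x_new - x) < tol:
            # Stalled: add a small random perturbation to escape the saddle.
            # (A practical method perturbs only occasionally; kept simple here.)
            x_new = x + rng.uniform(-radius, radius, size=x.shape)
        x = x_new
    return x

# f(x) = 0.5 * x^T A x + 0.25 * ||x||^4 with indefinite A: the origin is a
# strict saddle, and minimizers lie along the negative-curvature direction.
A = np.diag([1.0, -1.0])
grad_f = lambda x: A @ x + (x @ x) * x
print(perturbed_prox_descent(grad_f, np.zeros(2), lam=0.001, step=0.1))
```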

hood2vec: Identifying Similar Urban Areas Using Mobility Networks

Title hood2vec: Identifying Similar Urban Areas Using Mobility Networks
Authors Xin Liu, Konstantinos Pelechrinis, Alexandros Labrinidis
Abstract Which area in NYC is the most similar to Lower East Side? What about the NoHo Arts District in Los Angeles? Traditionally this task utilizes information about the type of places located within the areas and some popularity/quality metric. We take a different approach. In particular, urban dwellers’ time-variant mobility is a reflection of how they interact with their city over time. Hence, in this paper, we introduce an approach, namely hood2vec, to identify the similarity between urban areas through learning a node embedding of the mobility network captured through Foursquare check-ins. We compare the pairwise similarities obtained from hood2vec with the ones obtained from comparing the types of venues in the different areas. The low correlation between the two indicates that the mobility dynamics and the venue types potentially capture different aspects of similarity between urban areas.
Tasks
Published 2019-07-17
URL https://arxiv.org/abs/1907.11951v1
PDF https://arxiv.org/pdf/1907.11951v1.pdf
PWC https://paperswithcode.com/paper/hood2vec-identifying-similar-urban-areas
Repo
Framework
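
The general recipe named in the abstract, embedding the nodes of a mobility network, can be illustrated DeepWalk/node2vec style: build a directed graph from check-in transitions, run random walks, and train a skip-gram model on the walks. The toy below is our sketch, not the authors' pipeline; the three example neighborhoods, the walk parameters, and the use of gensim's Word2Vec are assumptions.

```python
# Toy node-embedding sketch (our illustration, not the hood2vec pipeline):
# random walks over a check-in transition graph, embedded with skip-gram.
import random
from collections import defaultdict
from gensim.models import Word2Vec

transitions = [("les", "noho"), ("noho", "soho"), ("les", "soho"),
               ("soho", "les"), ("noho", "les"), ("soho", "noho")]
graph = defaultdict(list)
for src, dst in transitions:
    graph[src].append(dst)

def random_walks(graph, num_walks=20, length=8, seed=0):
    rnd = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for node in list(graph):
            walk = [node]
            for _ in range(length - 1):
                walk.append(rnd.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks

model = Word2Vec(random_walks(graph), vector_size=16, window=3,
                 min_count=0, sg=1, seed=0)
print(model.wv.most_similar("les", topn=2))
```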

Object-driven Text-to-Image Synthesis via Adversarial Training

Title Object-driven Text-to-Image Synthesis via Adversarial Training
Authors Wenbo Li, Pengchuan Zhang, Lei Zhang, Qiuyuan Huang, Xiaodong He, Siwei Lyu, Jianfeng Gao
Abstract In this paper, we propose Object-driven Attentive Generative Adversarial Networks (Obj-GANs) that allow object-centered text-to-image synthesis for complex scenes. Following the two-step (layout-image) generation process, a novel object-driven attentive image generator is proposed to synthesize salient objects by paying attention to the most relevant words in the text description and the pre-generated semantic layout. In addition, a new Fast R-CNN based object-wise discriminator is proposed to provide rich object-wise discrimination signals on whether the synthesized object matches the text description and the pre-generated layout. The proposed Obj-GAN significantly outperforms the previous state of the art in various metrics on the large-scale COCO benchmark, increasing the Inception score by 27% and decreasing the FID score by 11%. A thorough comparison between the traditional grid attention and the new object-driven attention is provided through analyzing their mechanisms and visualizing their attention layers, providing insights into how the proposed model generates complex scenes in high quality.
Tasks Image Generation
Published 2019-02-27
URL http://arxiv.org/abs/1902.10740v1
PDF http://arxiv.org/pdf/1902.10740v1.pdf
PWC https://paperswithcode.com/paper/object-driven-text-to-image-synthesis-via
Repo
Framework

The SWAX Benchmark: Attacking Biometric Systems with Wax Figures

Title The SWAX Benchmark: Attacking Biometric Systems with Wax Figures
Authors Rafael Henrique Vareto, Araceli Marcia Sandanha, William Robson Schwartz
Abstract A face spoofing attack occurs when an intruder attempts to impersonate someone who carries a gainful authentication clearance. It is a trending topic due to the increasing demand for biometric authentication on mobile devices, in high-security areas, and elsewhere. This work introduces a new database named the Sense Wax Attack dataset (SWAX), comprised of real human and wax figure images and videos that support the problem of face spoofing detection. The dataset consists of more than 1800 face images and 110 videos of 55 people/waxworks, arranged in training, validation and test sets with a large range of expression, illumination and pose variations. Experiments performed with baseline methods show that, despite the progress in recent years, advanced spoofing-detection methods are still vulnerable to high-quality violation attempts.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09642v1
PDF https://arxiv.org/pdf/1910.09642v1.pdf
PWC https://paperswithcode.com/paper/the-swax-benchmark-attacking-biometric
Repo
Framework