Paper Group AWR 163
PageNet: Page Boundary Extraction in Historical Handwritten Documents. STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset. 3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks. Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness. DualGAN: Unsupervised Dual Learning for Image-to-Image Translation. Kernel Graph Convolutional Neural Networks. Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms. Interactive Exploration and Discovery of Scientific Publications with PubVis. Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation. Device Placement Optimization with Reinforcement Learning. CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity. Multi-rendezvous Spacecraft Trajectory Optimization with Beam P-ACO. On the challenges of learning with inference networks on sparse, high-dimensional data. Deep Semantic Role Labeling with Self-Attention. Dilated Convolutions for Modeling Long-Distance Genomic Dependencies.
PageNet: Page Boundary Extraction in Historical Handwritten Documents
Title | PageNet: Page Boundary Extraction in Historical Handwritten Documents |
Authors | Chris Tensmeyer, Brian Davis, Curtis Wigington, Iain Lee, Bill Barrett |
Abstract | When digitizing a document into an image, it is common to include a surrounding border region to visually indicate that the entire document is present in the image. However, this border should be removed prior to automated processing. In this work, we present a deep-learning-based system, PageNet, which identifies the main page region in an image in order to segment content from both textual and non-textual border noise. In PageNet, a Fully Convolutional Network obtains a pixel-wise segmentation which is post-processed into the output quadrilateral region. We evaluate PageNet on 4 collections of historical handwritten documents and obtain over 94% mean intersection over union on all datasets and approach human performance on 2 of these collections. Additionally, we show that PageNet can segment documents that are overlaid on top of other documents. |
Tasks | |
Published | 2017-09-05 |
URL | http://arxiv.org/abs/1709.01618v1 |
PDF | http://arxiv.org/pdf/1709.01618v1.pdf |
PWC | https://paperswithcode.com/paper/pagenet-page-boundary-extraction-in |
Repo | https://github.com/ctensmeyer/pagenet |
Framework | caffe2 |
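
The abstract above describes an FCN that produces a pixel-wise page mask which is then post-processed into a quadrilateral. As a minimal illustration (not PageNet's actual post-processing step), the sketch below turns a binary mask into four page corners with OpenCV; the synthetic test mask and the min-area-rectangle fit are assumptions made here for demonstration.

```python
# Hypothetical post-processing sketch: turn a binary page mask (e.g. an FCN
# output thresholded at 0.5) into a single quadrilateral page region.
# This is one plausible implementation, not PageNet's exact procedure.
# Assumes the OpenCV 4.x return signature of cv2.findContours.
import cv2
import numpy as np

def mask_to_quad(mask: np.ndarray) -> np.ndarray:
    """mask: HxW array with 1 for 'page' pixels. Returns a 4x2 corner array."""
    # Keep only the largest connected component, assuming it is the page.
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    page = max(contours, key=cv2.contourArea)
    # Fit a minimum-area rotated rectangle and return its 4 corners.
    rect = cv2.minAreaRect(page)
    return cv2.boxPoints(rect)          # shape (4, 2), float32 corner coordinates

# Example: a synthetic 100x100 mask containing a tilted page region.
mask = np.zeros((100, 100), np.uint8)
cv2.fillConvexPoly(mask, np.array([[20, 10], [90, 20], [80, 85], [15, 75]], np.int32), 1)
print(mask_to_quad(mask))
```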
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset
Title | STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset |
Authors | Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi |
Abstract | In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we particularly consider generating Japanese captions for images. Since most available caption datasets have been constructed for the English language, there are few datasets for Japanese. To tackle this problem, we construct a large-scale Japanese image caption dataset based on images from MS-COCO, which is called STAIR Captions. STAIR Captions consists of 820,310 Japanese captions for 164,062 images. In the experiment, we show that a neural network trained using STAIR Captions can generate more natural and better Japanese captions, compared to those generated using English-Japanese machine translation after generating English captions. |
Tasks | Image Captioning, Machine Translation |
Published | 2017-05-02 |
URL | http://arxiv.org/abs/1705.00823v1 |
PDF | http://arxiv.org/pdf/1705.00823v1.pdf |
PWC | https://paperswithcode.com/paper/stair-captions-constructing-a-large-scale |
Repo | https://github.com/STAIR-Lab-CIT/STAIR-captions |
Framework | none |
3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks
Title | 3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks |
Authors | Zhaoliang Lun, Matheus Gadelha, Evangelos Kalogerakis, Subhransu Maji, Rui Wang |
Abstract | We propose a method for reconstructing 3D shapes from 2D sketches in the form of line drawings. Our method takes as input a single sketch, or multiple sketches, and outputs a dense point cloud representing a 3D reconstruction of the input sketch(es). The point cloud is then converted into a polygon mesh. At the heart of our method lies a deep, encoder-decoder network. The encoder converts the sketch into a compact representation encoding shape information. The decoder converts this representation into depth and normal maps capturing the underlying surface from several output viewpoints. The multi-view maps are then consolidated into a 3D point cloud by solving an optimization problem that fuses depth and normals across all viewpoints. Based on our experiments, compared to other methods, such as volumetric networks, our architecture offers several advantages, including more faithful reconstruction, higher output surface resolution, and better preservation of topology and shape structure. |
Tasks | 3D Reconstruction |
Published | 2017-07-20 |
URL | http://arxiv.org/abs/1707.06375v3 |
PDF | http://arxiv.org/pdf/1707.06375v3.pdf |
PWC | https://paperswithcode.com/paper/3d-shape-reconstruction-from-sketches-via |
Repo | https://github.com/aghinsa/SketchTo3D |
Framework | tf |
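
To make the encoder-decoder described above concrete, here is a minimal PyTorch sketch that maps a single sketch image to per-view depth and normal maps; the layer sizes, 12-view count, and 64x64 output resolution are illustrative assumptions rather than the paper's architecture, and the point-cloud fusion step is omitted.

```python
# Minimal PyTorch sketch of the encoder-decoder idea: a sketch image is encoded
# to a latent code, then decoded into depth + normal maps for several viewpoints.
import torch
import torch.nn as nn

class SketchToMultiView(nn.Module):
    def __init__(self, n_views: int = 12, latent: int = 512):
        super().__init__()
        self.n_views = n_views
        self.encoder = nn.Sequential(            # 1x256x256 sketch -> latent code
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, latent), nn.ReLU(),
        )
        # The decoder emits 4 channels per view: 1 depth + 3 normal components.
        self.decoder = nn.Sequential(
            nn.Linear(latent, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, n_views * 4, 4, stride=2, padding=1),
        )

    def forward(self, sketch):                   # sketch: (B, 1, 256, 256)
        maps = self.decoder(self.encoder(sketch))
        b, _, h, w = maps.shape
        maps = maps.view(b, self.n_views, 4, h, w)
        return maps[:, :, :1], maps[:, :, 1:]    # per-view depth and normal maps

depth, normals = SketchToMultiView()(torch.randn(2, 1, 256, 256))
print(depth.shape, normals.shape)                # (2, 12, 1, 64, 64) and (2, 12, 3, 64, 64)
```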
Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness
Title | Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness |
Authors | Michael Kearns, Seth Neel, Aaron Roth, Zhiwei Steven Wu |
Abstract | The most prevalent notions of fairness in machine learning are statistical definitions: they fix a small collection of pre-defined groups, and then ask for parity of some statistic of the classifier across these groups. Constraints of this form are susceptible to intentional or inadvertent “fairness gerrymandering”, in which a classifier appears to be fair on each individual group, but badly violates the fairness constraint on one or more structured subgroups defined over the protected attributes. We propose instead to demand statistical notions of fairness across exponentially (or infinitely) many subgroups, defined by a structured class of functions over the protected attributes. This interpolates between statistical definitions of fairness and recently proposed individual notions of fairness, but raises several computational challenges. It is no longer clear how to audit a fixed classifier to see if it satisfies such a strong definition of fairness. We prove that the computational problem of auditing subgroup fairness for both equality of false positive rates and statistical parity is equivalent to the problem of weak agnostic learning, which means it is computationally hard in the worst case, even for simple structured subclasses. We then derive two algorithms that provably converge to the best fair classifier, given access to oracles which can solve the agnostic learning problem. The algorithms are based on a formulation of subgroup fairness as a two-player zero-sum game between a Learner and an Auditor. Our first algorithm provably converges in a polynomial number of steps. Our second algorithm enjoys only provably asymptotic convergence, but has the merit of simplicity and faster per-step computation. We implement the simpler algorithm using linear regression as a heuristic oracle, and show that we can effectively both audit and learn fair classifiers on real datasets. |
Tasks | |
Published | 2017-11-14 |
URL | http://arxiv.org/abs/1711.05144v5 |
PDF | http://arxiv.org/pdf/1711.05144v5.pdf |
PWC | https://paperswithcode.com/paper/preventing-fairness-gerrymandering-auditing |
Repo | https://github.com/SaeedSharifiMa/FairDP |
Framework | none |
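
The paper's implemented heuristic replaces the agnostic-learning oracle with regression; the sketch below is a simplified illustration of that auditing idea for false-positive rates, using scikit-learn's LinearRegression. The residual construction, the `audit_fpr` helper, and the toy data are assumptions for demonstration, not the authors' exact algorithm.

```python
# Simplified auditing sketch in the spirit of the heuristic oracle: restrict to
# negative examples, regress the classifier's (prediction - overall FPR) residual
# on the protected attributes, and read off the subgroup where the fitted
# function is positive.
import numpy as np
from sklearn.linear_model import LinearRegression

def audit_fpr(y_true, y_pred, protected):
    """y_true, y_pred: 0/1 arrays; protected: (n, d) protected-attribute matrix.
    Returns (subgroup mask over all points, FPR disparity on that subgroup)."""
    neg = y_true == 0                          # false positives only exist here
    overall_fpr = y_pred[neg].mean()
    residual = y_pred[neg] - overall_fpr       # positive where FPR is elevated
    reg = LinearRegression().fit(protected[neg], residual)
    subgroup = reg.predict(protected) > 0      # linear-threshold subgroup
    in_sub = subgroup & neg
    disparity = y_pred[in_sub].mean() - overall_fpr if in_sub.any() else 0.0
    return subgroup, disparity

# Toy example with two binary protected attributes and a biased classifier.
rng = np.random.default_rng(0)
prot = rng.integers(0, 2, size=(2000, 2))
y = rng.integers(0, 2, size=2000)
# Classifier that over-predicts positives on the intersection prot0=1 & prot1=1.
pred = ((rng.random(2000) < 0.2) | ((prot[:, 0] == 1) & (prot[:, 1] == 1))).astype(int)
print(audit_fpr(y, pred, prot)[1])             # FPR gap found on the flagged subgroup
```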
DualGAN: Unsupervised Dual Learning for Image-to-Image Translation
Title | DualGAN: Unsupervised Dual Learning for Image-to-Image Translation |
Authors | Zili Yi, Hao Zhang, Ping Tan, Minglun Gong |
Abstract | Conditional Generative Adversarial Networks (GANs) for cross-domain image-to-image translation have made much progress recently. Depending on the task complexity, thousands to millions of labeled image pairs are needed to train a conditional GAN. However, human labeling is expensive, even impractical, and large quantities of data may not always be available. Inspired by dual learning from natural language translation, we develop a novel dual-GAN mechanism, which enables image translators to be trained from two sets of unlabeled images from two domains. In our architecture, the primal GAN learns to translate images from domain U to those in domain V, while the dual GAN learns to invert the task. The closed loop made by the primal and dual tasks allows images from either domain to be translated and then reconstructed. Hence a loss function that accounts for the reconstruction error of images can be used to train the translators. Experiments on multiple image translation tasks with unlabeled data show a considerable performance gain of DualGAN over a single GAN. For some tasks, DualGAN can even achieve comparable or slightly better results than a conditional GAN trained on fully labeled data. |
Tasks | Image-to-Image Translation |
Published | 2017-04-08 |
URL | http://arxiv.org/abs/1704.02510v4 |
PDF | http://arxiv.org/pdf/1704.02510v4.pdf |
PWC | https://paperswithcode.com/paper/dualgan-unsupervised-dual-learning-for-image |
Repo | https://github.com/lyj0823/GANs |
Framework | none |
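
The closed-loop training signal described above can be written down in a few lines. The sketch below shows only the L1 reconstruction term with placeholder one-layer generators (`G_uv`, `G_vu`) and illustrative loss weights; the adversarial discriminators and the paper's actual generator architecture are omitted.

```python
# Minimal sketch of the closed-loop reconstruction loss: images translated
# U -> V -> U (and V -> U -> V) should come back unchanged, so an L1
# reconstruction term can supervise both translators without paired labels.
import torch
import torch.nn as nn

G_uv = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))   # placeholder U -> V translator
G_vu = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))   # placeholder V -> U translator
l1 = nn.L1Loss()

def reconstruction_loss(u_batch, v_batch, lam_u=20.0, lam_v=20.0):
    u_rec = G_vu(G_uv(u_batch))       # U -> V -> U round trip
    v_rec = G_uv(G_vu(v_batch))       # V -> U -> V round trip
    return lam_u * l1(u_rec, u_batch) + lam_v * l1(v_rec, v_batch)

u = torch.randn(4, 3, 64, 64)         # unlabeled images from domain U
v = torch.randn(4, 3, 64, 64)         # unlabeled images from domain V
loss = reconstruction_loss(u, v)      # added to the usual adversarial losses
loss.backward()
```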
Kernel Graph Convolutional Neural Networks
Title | Kernel Graph Convolutional Neural Networks |
Authors | Giannis Nikolentzos, Polykarpos Meladianos, Antoine Jean-Pierre Tixier, Konstantinos Skianis, Michalis Vazirgiannis |
Abstract | Graph kernels have been successfully applied to many graph classification problems. Typically, a kernel is first designed, and then an SVM classifier is trained based on the features defined implicitly by this kernel. This two-stage approach decouples data representation from learning, which is suboptimal. On the other hand, Convolutional Neural Networks (CNNs) have the capability to learn their own features directly from the raw data during training. Unfortunately, they cannot handle irregular data such as graphs. We address this challenge by using graph kernels to embed meaningful local neighborhoods of the graphs in a continuous vector space. A set of filters is then convolved with these patches, pooled, and the output is then passed to a feedforward network. With limited parameter tuning, our approach outperforms strong baselines on 7 out of 10 benchmark datasets. |
Tasks | Graph Classification |
Published | 2017-10-29 |
URL | http://arxiv.org/abs/1710.10689v2 |
PDF | http://arxiv.org/pdf/1710.10689v2.pdf |
PWC | https://paperswithcode.com/paper/kernel-graph-convolutional-neural-networks |
Repo | https://github.com/giannisnik/cnn-graph-classification |
Framework | pytorch |
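
As a rough illustration of the pipeline in the abstract, the sketch below embeds each node's neighborhood subgraph by comparing it, via a kernel, to a set of prototype neighborhoods, producing a continuous patch matrix that a 1-D convolution could consume. The degree-histogram kernel and the prototype sampling are deliberate simplifications, not the graph kernels used in the paper.

```python
# Illustrative sketch: neighborhood subgraphs ("patches") are embedded by kernel
# comparison against prototype subgraphs; the resulting matrix can be fed to a
# Conv1d + pooling + MLP stack. The kernel here is a toy stand-in for a real
# graph kernel such as Weisfeiler-Lehman.
import networkx as nx
import numpy as np

def degree_feature(g, max_degree=10):
    hist = np.zeros(max_degree + 1)
    for _, d in g.degree():
        hist[min(d, max_degree)] += 1
    return hist / max(g.number_of_nodes(), 1)

def patch_kernel(g1, g2):
    return float(degree_feature(g1) @ degree_feature(g2))   # linear kernel on degree histograms

def graph_to_patches(g, radius=1):
    return [nx.ego_graph(g, n, radius=radius) for n in g.nodes()]

def embed_graph(g, prototypes):
    # One row per neighborhood patch, one column per prototype subgraph.
    return np.array([[patch_kernel(p, q) for q in prototypes]
                     for p in graph_to_patches(g)])

graphs = [nx.erdos_renyi_graph(20, 0.2, seed=s) for s in range(5)]
prototypes = graph_to_patches(graphs[0])[:8]    # sampled prototype neighborhoods
emb = embed_graph(graphs[1], prototypes)        # (n_patches, n_prototypes) matrix
print(emb.shape)                                # ready for a Conv1d/pooling stack
```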
Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms
Title | Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms |
Authors | Taejun Kim, Jongpil Lee, Juhan Nam |
Abstract | Recent work has shown that the end-to-end approach using convolutional neural network (CNN) is effective in various types of machine learning tasks. For audio signals, the approach takes raw waveforms as input using a 1-D convolution layer. In this paper, we improve the 1-D CNN architecture for music auto-tagging by adopting building blocks from state-of-the-art image classification models, ResNets and SENets, and adding multi-level feature aggregation to it. We compare different combinations of the modules in building CNN architectures. The results show that they achieve significant improvements over previous state-of-the-art models on the MagnaTagATune dataset and comparable results on the Million Song Dataset. Furthermore, we analyze and visualize our model to show how the 1-D CNN operates. |
Tasks | Music Auto-Tagging |
Published | 2017-10-28 |
URL | http://arxiv.org/abs/1710.10451v2 |
PDF | http://arxiv.org/pdf/1710.10451v2.pdf |
PWC | https://paperswithcode.com/paper/sample-level-cnn-architectures-for-music-auto |
Repo | https://github.com/Dohppak/Music_Genre_Classification |
Framework | pytorch |
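
A minimal PyTorch sketch of one building block in the spirit of the abstract follows: a 1-D convolution over raw audio with a squeeze-and-excitation (SE) channel re-weighting. Filter counts, the pool size, and the input length are illustrative assumptions rather than the paper's exact configuration.

```python
# One sample-level block: Conv1d over raw waveform samples, then an SE module
# that rescales each channel by a learned gate in [0, 1].
import torch
import torch.nn as nn

class SEBlock1d(nn.Module):
    def __init__(self, in_ch, out_ch, reduction=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm1d(out_ch), nn.ReLU(),
            nn.MaxPool1d(3),                     # downsample the time axis by 3
        )
        self.se = nn.Sequential(                 # squeeze: global pooling over time
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(out_ch, out_ch // reduction), nn.ReLU(),
            nn.Linear(out_ch // reduction, out_ch), nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (batch, channels, samples)
        h = self.conv(x)
        scale = self.se(h).unsqueeze(-1)         # per-channel excitation in [0, 1]
        return h * scale

block = SEBlock1d(1, 128)
wave = torch.randn(4, 1, 59049)                  # a few seconds of raw audio samples
print(block(wave).shape)                         # torch.Size([4, 128, 19683])
```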
Interactive Exploration and Discovery of Scientific Publications with PubVis
Title | Interactive Exploration and Discovery of Scientific Publications with PubVis |
Authors | Franziska Horn |
Abstract | With an exponentially growing number of scientific papers published each year, advanced tools for exploring and discovering publications of interest are becoming indispensable. To empower users beyond a simple keyword search provided e.g. by Google Scholar, we present the novel web application PubVis. Powered by a variety of machine learning techniques, it combines essential features to help researchers find the content most relevant to them. An interactive visualization of a large collection of scientific publications provides an overview of the field and encourages the user to explore articles beyond a narrow research focus. This is augmented by personalized content-based article recommendations as well as an advanced full text search to discover relevant references. The open-sourced implementation of the app can be easily set up and run locally on a desktop computer to provide access to content tailored to the specific needs of individual users. Additionally, a PubVis demo with access to a collection of 10,000 papers can be tested online. |
Tasks | |
Published | 2017-06-25 |
URL | https://arxiv.org/abs/1706.08094v1 |
PDF | https://arxiv.org/pdf/1706.08094v1.pdf |
PWC | https://paperswithcode.com/paper/interactive-exploration-and-discovery-of |
Repo | https://github.com/cod3licious/pubvis |
Framework | none |
Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation
Title | Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation |
Authors | Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang, Fei Qiao, Huazhong Yang |
Abstract | Hand pose estimation from monocular depth images is an important and challenging problem for human-computer interaction. Recently deep convolutional networks (ConvNet) with sophisticated design have been employed to address it, but the improvement over traditional methods is not so apparent. To promote the performance of direct 3D coordinate regression, we propose a tree-structured Region Ensemble Network (REN), which partitions the convolution outputs into regions and integrates the results from multiple regressors on each region. Compared with a multi-model ensemble, our model is trained completely end-to-end. The experimental results demonstrate that our approach achieves the best performance among state-of-the-art methods on two public datasets. |
Tasks | Hand Pose Estimation, Pose Estimation |
Published | 2017-02-08 |
URL | http://arxiv.org/abs/1702.02447v2 |
PDF | http://arxiv.org/pdf/1702.02447v2.pdf |
PWC | https://paperswithcode.com/paper/region-ensemble-network-improving |
Repo | https://github.com/guohengkai/region-ensemble-network |
Framework | none |
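
The region-ensemble idea above can be sketched compactly: split the last convolutional feature map into a 2x2 grid, give each region its own fully connected branch, and regress 3-D joint coordinates from the concatenated region features. The backbone, feature sizes, and joint count below are placeholder assumptions, not the paper's network.

```python
# Illustrative PyTorch sketch of a region-ensemble regressor for hand pose.
import torch
import torch.nn as nn

class RegionEnsemble(nn.Module):
    def __init__(self, n_joints=14, feat_ch=64, feat_hw=12):
        super().__init__()
        self.n_joints = n_joints
        self.backbone = nn.Sequential(           # depth image -> (feat_ch, 12, 12)
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(feat_hw),
        )
        region_dim = feat_ch * (feat_hw // 2) ** 2
        self.region_fc = nn.ModuleList(
            [nn.Sequential(nn.Flatten(), nn.Linear(region_dim, 512), nn.ReLU())
             for _ in range(4)])
        self.head = nn.Linear(4 * 512, n_joints * 3)

    def forward(self, depth):                    # depth: (B, 1, 96, 96)
        f = self.backbone(depth)
        h = f.shape[-1] // 2
        regions = [f[:, :, :h, :h], f[:, :, :h, h:],   # 2x2 spatial partition
                   f[:, :, h:, :h], f[:, :, h:, h:]]
        feats = [fc(r) for fc, r in zip(self.region_fc, regions)]
        return self.head(torch.cat(feats, dim=1)).view(-1, self.n_joints, 3)

print(RegionEnsemble()(torch.randn(2, 1, 96, 96)).shape)   # torch.Size([2, 14, 3])
```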
Device Placement Optimization with Reinforcement Learning
Title | Device Placement Optimization with Reinforcement Learning |
Authors | Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean |
Abstract | The past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture of hardware devices such as CPUs and GPUs. Importantly, the decision of placing parts of the neural models on devices is often made by human experts based on simple heuristics and intuitions. In this paper, we propose a method which learns to optimize device placement for TensorFlow computational graphs. Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices. The execution time of the predicted placements is then used as the reward signal to optimize the parameters of the sequence-to-sequence model. Our main result is that on Inception-V3 for ImageNet classification, and on RNN LSTM, for language modeling and neural machine translation, our model finds non-trivial device placements that outperform hand-crafted heuristics and traditional algorithmic methods. |
Tasks | Language Modelling, Machine Translation |
Published | 2017-06-13 |
URL | http://arxiv.org/abs/1706.04972v2 |
PDF | http://arxiv.org/pdf/1706.04972v2.pdf |
PWC | https://paperswithcode.com/paper/device-placement-optimization-with |
Repo | https://github.com/indrajeet95/Device-Placement-Optimization-with-Reinforcement-Learning |
Framework | tf |
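
The training signal described above is plain REINFORCE with measured runtime as (negative) reward. The toy sketch below uses a flat per-operation policy and a fictitious `simulated_runtime` function in place of the paper's sequence-to-sequence placer and real TensorFlow measurements.

```python
# Toy REINFORCE sketch: sample a device for every operation, score the placement
# with a (simulated) runtime, and update the policy against a moving-average baseline.
import torch

n_ops, n_devices = 20, 4
logits = torch.zeros(n_ops, n_devices, requires_grad=True)    # per-op placement policy
opt = torch.optim.Adam([logits], lr=0.1)
op_cost = torch.rand(n_ops)                                    # fictitious per-op costs

def simulated_runtime(placement):
    # Pretend runtime = load of the busiest device (encourages balanced placements).
    loads = torch.zeros(n_devices).scatter_add(0, placement, op_cost)
    return loads.max()

baseline = None
for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    placement = dist.sample()                                  # one device per op
    reward = -simulated_runtime(placement)
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    loss = -(reward - baseline) * dist.log_prob(placement).sum()
    opt.zero_grad(); loss.backward(); opt.step()

print(float(simulated_runtime(torch.distributions.Categorical(logits=logits).sample())))
```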
CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity
Title | CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity |
Authors | Jeremy Ferrero, Frederic Agnes, Laurent Besacier, Didier Schwab |
Abstract | We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity by a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in unsupervised and supervised ways. Our best run ranked 1st on track 4a with a correlation of 83.02% with human annotations. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2017-04-05 |
URL | http://arxiv.org/abs/1704.01346v1 |
PDF | http://arxiv.org/pdf/1704.01346v1.pdf |
PWC | https://paperswithcode.com/paper/compilig-at-semeval-2017-task-1-cross |
Repo | https://github.com/SimengSun/CIS530-project |
Framework | pytorch |
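
The supervised combination mentioned in the abstract amounts to regressing gold STS scores from the scores of the individual methods. The sketch below shows that step with a Ridge regressor and random stand-in features; the four feature columns and the toy targets are assumptions, and the base similarity methods themselves are not reimplemented.

```python
# Minimal sketch of combining base similarity scores into a 0-5 STS prediction.
import numpy as np
from sklearn.linear_model import Ridge

# Each row: [syntax_score, dictionary_score, context_score, mt_score] for a pair.
rng = np.random.default_rng(0)
X_train = rng.random((500, 4))
y_train = np.clip(5 * X_train.mean(axis=1) + rng.normal(0, 0.3, 500), 0, 5)  # toy gold scores

model = Ridge(alpha=1.0).fit(X_train, y_train)
X_new = rng.random((3, 4))                          # three unseen sentence pairs
print(np.clip(model.predict(X_new), 0, 5))          # predicted similarity in [0, 5]
```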
Multi-rendezvous Spacecraft Trajectory Optimization with Beam P-ACO
Title | Multi-rendezvous Spacecraft Trajectory Optimization with Beam P-ACO |
Authors | Luís F. Simões, Dario Izzo, Evert Haasdijk, A. E. Eiben |
Abstract | The design of spacecraft trajectories for missions visiting multiple celestial bodies is here framed as a multi-objective bilevel optimization problem. A comparative study is performed to assess the performance of different Beam Search algorithms at tackling the combinatorial problem of finding the ideal sequence of bodies. Special focus is placed on the development of a new hybridization between Beam Search and the Population-based Ant Colony Optimization algorithm. An experimental evaluation shows all algorithms achieving exceptional performance on a hard benchmark problem. It is found that a properly tuned deterministic Beam Search always outperforms the remaining variants. Beam P-ACO, however, demonstrates lower parameter sensitivity, while offering superior worst-case performance. Being an anytime algorithm, it is then found to be the preferable choice for certain practical applications. |
Tasks | bilevel optimization |
Published | 2017-04-03 |
URL | http://arxiv.org/abs/1704.00702v1 |
PDF | http://arxiv.org/pdf/1704.00702v1.pdf |
PWC | https://paperswithcode.com/paper/multi-rendezvous-spacecraft-trajectory |
Repo | https://github.com/lfsimoes/beam_paco__gtoc5 |
Framework | none |
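
A toy sketch of the Beam Search / P-ACO hybrid follows: each partial visit sequence in the beam is extended by sampling successors with probabilities that mix an inverse-cost heuristic with pheromone values, and the best complete sequence reinforces its edges for the next run. The random cost matrix, parameter values, and update rule are illustrative, not the paper's GTOC5 setup.

```python
# Toy pheromone-biased beam search over a body-visit sequencing problem.
import numpy as np

rng = np.random.default_rng(1)
n_bodies = 12
cost = rng.random((n_bodies, n_bodies)) + 0.1        # fictitious transfer costs
pheromone = np.ones((n_bodies, n_bodies))

def run_beam(beam_width=8, branch=3, alpha=1.0, beta=2.0):
    beam = [([0], 0.0)]                              # start every tour at body 0
    for _ in range(n_bodies - 1):
        candidates = []
        for seq, total in beam:
            last = seq[-1]
            unvisited = [b for b in range(n_bodies) if b not in seq]
            score = (pheromone[last, unvisited] ** alpha) * (cost[last, unvisited] ** -beta)
            probs = score / score.sum()
            picks = rng.choice(unvisited, size=min(branch, len(unvisited)),
                               replace=False, p=probs)
            candidates += [(seq + [int(b)], total + cost[last, b]) for b in picks]
        beam = sorted(candidates, key=lambda c: c[1])[:beam_width]   # prune by cost
    return beam[0]                                   # best complete sequence found

best_seq, best_cost = run_beam()
for a, b in zip(best_seq, best_seq[1:]):             # pheromone update from the best tour
    pheromone[a, b] += 1.0 / best_cost
print(best_seq, round(best_cost, 3))
```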
On the challenges of learning with inference networks on sparse, high-dimensional data
Title | On the challenges of learning with inference networks on sparse, high-dimensional data |
Authors | Rahul G. Krishnan, Dawen Liang, Matthew Hoffman |
Abstract | We study parameter estimation in Nonlinear Factor Analysis (NFA) where the generative model is parameterized by a deep neural network. Recent work has focused on learning such models using inference (or recognition) networks; we identify a crucial problem when modeling large, sparse, high-dimensional datasets – underfitting. We study the extent of underfitting, highlighting that its severity increases with the sparsity of the data. We propose methods to tackle it via iterative optimization inspired by stochastic variational inference (Hoffman et al., 2013) and improvements in the sparse data representation used for inference. The proposed techniques drastically improve the ability of these powerful models to fit sparse data, achieving state-of-the-art results on a benchmark text-count dataset and excellent results on the task of top-N recommendation. |
Tasks | |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06085v1 |
PDF | http://arxiv.org/pdf/1710.06085v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-challenges-of-learning-with-inference |
Repo | https://github.com/rahulk90/vae_sparse |
Framework | none |
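
The iterative optimization mentioned above can be sketched as follows: the inference network only initializes per-datapoint variational parameters, which are then refined with a few gradient steps on the ELBO before the model parameters are updated. The Gaussian likelihood, model sizes, and step counts below are simplifying assumptions, not the paper's NFA setup.

```python
# Minimal sketch: encoder output initializes local variational parameters, which
# are then refined by gradient steps on the (negative) ELBO for the batch.
import torch
import torch.nn as nn

x_dim, z_dim = 200, 10
encoder = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 2 * z_dim))
decoder = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))

def neg_elbo(x, mu, logvar):
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterized sample
    recon = ((decoder(z) - x) ** 2).sum(-1)                   # Gaussian likelihood term
    kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum(-1)  # KL(q || N(0, I))
    return (recon + kl).mean()

x = (torch.rand(32, x_dim) < 0.05).float()                    # sparse toy batch
mu, logvar = encoder(x).chunk(2, dim=-1)
mu, logvar = mu.detach().requires_grad_(), logvar.detach().requires_grad_()
local_opt = torch.optim.Adam([mu, logvar], lr=0.05)
for _ in range(20):                                           # refine local variational params
    local_opt.zero_grad()
    neg_elbo(x, mu, logvar).backward()
    local_opt.step()
# The refined (mu, logvar) would now drive the decoder/encoder updates.
print(float(neg_elbo(x, mu, logvar)))
```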
Deep Semantic Role Labeling with Self-Attention
Title | Deep Semantic Role Labeling with Self-Attention |
Authors | Zhixing Tan, Mingxuan Wang, Jun Xie, Yidong Chen, Xiaodong Shi |
Abstract | Semantic Role Labeling (SRL) is believed to be a crucial step towards natural language understanding and has been widely studied. In recent years, end-to-end SRL with recurrent neural networks (RNNs) has gained increasing attention. However, it remains a major challenge for RNNs to handle structural information and long range dependencies. In this paper, we present a simple and effective architecture for SRL which aims to address these problems. Our model is based on self-attention which can directly capture the relationships between two tokens regardless of their distance. Our single model achieves F$_1=83.4$ on the CoNLL-2005 shared task dataset and F$_1=82.7$ on the CoNLL-2012 shared task dataset, which outperforms the previous state-of-the-art results by $1.8$ and $1.0$ F$_1$ score respectively. Besides, our model is computationally efficient, and the parsing speed is 50K tokens per second on a single Titan X GPU. |
Tasks | Semantic Role Labeling |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01586v1 |
PDF | http://arxiv.org/pdf/1712.01586v1.pdf |
PWC | https://paperswithcode.com/paper/deep-semantic-role-labeling-with-self |
Repo | https://github.com/XMUNLP/Tagger |
Framework | tf |
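
At the core of the model described above is scaled dot-product self-attention, sketched minimally below; the dimensions are illustrative, and the paper's full tagger additionally uses multiple heads, position information, feed-forward sublayers, and a softmax over SRL labels.

```python
# Minimal scaled dot-product self-attention: every token attends to every other
# token in the sentence regardless of distance.
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model=128):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.scale = math.sqrt(d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.scale, dim=-1)
        return attn @ v                         # each position mixes the whole sentence

tokens = torch.randn(2, 30, 128)                # a batch of 30-token sentences
print(SelfAttention()(tokens).shape)            # torch.Size([2, 30, 128])
```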
Dilated Convolutions for Modeling Long-Distance Genomic Dependencies
Title | Dilated Convolutions for Modeling Long-Distance Genomic Dependencies |
Authors | Ankit Gupta, Alexander M. Rush |
Abstract | We consider the task of detecting regulatory elements in the human genome directly from raw DNA. Past work has focused on small snippets of DNA, making it difficult to model long-distance dependencies that arise from DNA’s 3-dimensional conformation. In order to study long-distance dependencies, we develop and release a novel dataset for a larger-context modeling task. Using this new dataset, we model long-distance interactions using dilated convolutional neural networks, and compare them to standard convolutions and recurrent neural networks. We show that dilated convolutions are effective at modeling the locations of regulatory markers in the human genome, such as transcription factor binding sites, histone modifications, and DNase hypersensitivity sites. |
Tasks | |
Published | 2017-10-03 |
URL | http://arxiv.org/abs/1710.01278v1 |
PDF | http://arxiv.org/pdf/1710.01278v1.pdf |
PWC | https://paperswithcode.com/paper/dilated-convolutions-for-modeling-long |
Repo | https://github.com/harvardnlp/regulatory-prediction |
Framework | tf |
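
The modeling idea above is a stack of 1-D convolutions whose dilation grows exponentially, so each genomic position can condition on bases thousands of steps away. The sketch below is an illustrative PyTorch version; the channel counts, depth, and five output marker tracks are assumptions rather than the paper's configuration.

```python
# One-hot DNA -> stacked dilated 1-D convolutions -> per-position marker predictions.
import torch
import torch.nn as nn

def dilated_dna_model(n_marks=5, channels=64, n_layers=6):
    layers = [nn.Conv1d(4, channels, kernel_size=3, padding=1), nn.ReLU()]
    for i in range(1, n_layers):
        d = 2 ** i                               # dilation 2, 4, 8, ... widens the receptive field
        layers += [nn.Conv1d(channels, channels, kernel_size=3, padding=d, dilation=d),
                   nn.ReLU()]
    layers += [nn.Conv1d(channels, n_marks, kernel_size=1)]   # per-position marker logits
    return nn.Sequential(*layers)

model = dilated_dna_model()
dna = torch.zeros(1, 4, 10000)                   # one-hot A/C/G/T over a 10 kb window
dna[0, torch.randint(0, 4, (10000,)), torch.arange(10000)] = 1.0
probs = torch.sigmoid(model(dna))                # (1, n_marks, 10000) per-position predictions
print(probs.shape)
```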