July 29, 2019

Paper Group AWR 163

PageNet: Page Boundary Extraction in Historical Handwritten Documents

Title PageNet: Page Boundary Extraction in Historical Handwritten Documents
Authors Chris Tensmeyer, Brian Davis, Curtis Wigington, Iain Lee, Bill Barrett
Abstract When digitizing a document into an image, it is common to include a surrounding border region to visually indicate that the entire document is present in the image. However, this border should be removed prior to automated processing. In this work, we present a deep learning based system, PageNet, which identifies the main page region in an image in order to segment content from both textual and non-textual border noise. In PageNet, a Fully Convolutional Network obtains a pixel-wise segmentation which is post-processed into the output quadrilateral region. We evaluate PageNet on 4 collections of historical handwritten documents and obtain over 94% mean intersection over union on all datasets and approach human performance on 2 of these collections. Additionally, we show that PageNet can segment documents that are overlaid on top of other documents.
Tasks
Published 2017-09-05
URL http://arxiv.org/abs/1709.01618v1
PDF http://arxiv.org/pdf/1709.01618v1.pdf
PWC https://paperswithcode.com/paper/pagenet-page-boundary-extraction-in
Repo https://github.com/ctensmeyer/pagenet
Framework caffe2
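
The mean intersection over union quoted in the abstract compares a predicted page region with a ground-truth region. The following is a minimal sketch of that metric on binary masks, assuming masks are nested lists of 0/1 values. The names `iou` and `mean_iou` are illustrative and are not taken from the PageNet repository.

```python
def iou(pred, truth):
    """Intersection over union of two equally sized binary masks."""
    inter = union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) mask pairs."""
    scores = [iou(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


# Toy 3x3 masks: the prediction misses the bottom row of the page.
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
truth = [[0, 1, 1],
         [0, 1, 1],
         [0, 1, 1]]
print(iou(pred, truth))  # 4 shared pixels out of 6 in the union
```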

STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset

Title STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset
Authors Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi
Abstract In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we particularly consider generating Japanese captions for images. Since most available caption datasets have been constructed for the English language, there are few datasets for Japanese. To tackle this problem, we construct a large-scale Japanese image caption dataset based on images from MS-COCO, which is called STAIR Captions. STAIR Captions consists of 820,310 Japanese captions for 164,062 images. In the experiment, we show that a neural network trained using STAIR Captions can generate more natural and better Japanese captions, compared to those generated using English-Japanese machine translation after generating English captions.
Tasks Image Captioning, Machine Translation
Published 2017-05-02
URL http://arxiv.org/abs/1705.00823v1
PDF http://arxiv.org/pdf/1705.00823v1.pdf
PWC https://paperswithcode.com/paper/stair-captions-constructing-a-large-scale
Repo https://github.com/STAIR-Lab-CIT/STAIR-captions
Framework none

3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks

Title 3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks
Authors Zhaoliang Lun, Matheus Gadelha, Evangelos Kalogerakis, Subhransu Maji, Rui Wang
Abstract We propose a method for reconstructing 3D shapes from 2D sketches in the form of line drawings. Our method takes as input a single sketch, or multiple sketches, and outputs a dense point cloud representing a 3D reconstruction of the input sketch(es). The point cloud is then converted into a polygon mesh. At the heart of our method lies a deep, encoder-decoder network. The encoder converts the sketch into a compact representation encoding shape information. The decoder converts this representation into depth and normal maps capturing the underlying surface from several output viewpoints. The multi-view maps are then consolidated into a 3D point cloud by solving an optimization problem that fuses depth and normals across all viewpoints. Based on our experiments, compared to other methods, such as volumetric networks, our architecture offers several advantages, including more faithful reconstruction, higher output surface resolution, better preservation of topology and shape structure.
Tasks 3D Reconstruction
Published 2017-07-20
URL http://arxiv.org/abs/1707.06375v3
PDF http://arxiv.org/pdf/1707.06375v3.pdf
PWC https://paperswithcode.com/paper/3d-shape-reconstruction-from-sketches-via
Repo https://github.com/aghinsa/SketchTo3D
Framework tf

Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness

Title Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness
Authors Michael Kearns, Seth Neel, Aaron Roth, Zhiwei Steven Wu
Abstract The most prevalent notions of fairness in machine learning are statistical definitions: they fix a small collection of pre-defined groups, and then ask for parity of some statistic of the classifier across these groups. Constraints of this form are susceptible to intentional or inadvertent “fairness gerrymandering”, in which a classifier appears to be fair on each individual group, but badly violates the fairness constraint on one or more structured subgroups defined over the protected attributes. We propose instead to demand statistical notions of fairness across exponentially (or infinitely) many subgroups, defined by a structured class of functions over the protected attributes. This interpolates between statistical definitions of fairness and recently proposed individual notions of fairness, but raises several computational challenges. It is no longer clear how to audit a fixed classifier to see if it satisfies such a strong definition of fairness. We prove that the computational problem of auditing subgroup fairness for both equality of false positive rates and statistical parity is equivalent to the problem of weak agnostic learning, which means it is computationally hard in the worst case, even for simple structured subclasses. We then derive two algorithms that provably converge to the best fair classifier, given access to oracles which can solve the agnostic learning problem. The algorithms are based on a formulation of subgroup fairness as a two-player zero-sum game between a Learner and an Auditor. Our first algorithm provably converges in a polynomial number of steps. Our second algorithm enjoys only provably asymptotic convergence, but has the merit of simplicity and faster per-step computation. We implement the simpler algorithm using linear regression as a heuristic oracle, and show that we can effectively both audit and learn fair classifiers on real datasets.
Tasks
Published 2017-11-14
URL http://arxiv.org/abs/1711.05144v5
PDF http://arxiv.org/pdf/1711.05144v5.pdf
PWC https://paperswithcode.com/paper/preventing-fairness-gerrymandering-auditing
Repo https://github.com/SaeedSharifiMa/FairDP
Framework none
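
The "fairness gerrymandering" effect described in the abstract can be reproduced on a toy example: a classifier whose positive rate is identical across each single protected attribute, yet differs sharply on a conjunction of attributes. The sketch below audits by brute-force enumeration of conjunctions, which is an assumption made for illustration. It is not the oracle-based algorithm from the paper, and it does not weight the disparity by subgroup size. All names (`people`, `predict`, `audit`) are hypothetical.

```python
from itertools import product

# Toy population: one person per combination of two binary protected attributes.
people = [{"a": a, "b": b} for a, b in product([0, 1], repeat=2)]


def predict(person):
    """Toy classifier: positive exactly when the two attributes agree."""
    return int(person["a"] == person["b"])


def positive_rate(group):
    return sum(predict(p) for p in group) / len(group)


def audit(people, attrs):
    """Return (subgroup condition, gap) for the conjunction whose positive
    rate deviates most from the overall positive rate."""
    overall = positive_rate(people)
    worst = (None, 0.0)
    # Each attribute is either unconstrained (None) or fixed to 0 or 1.
    for values in product([None, 0, 1], repeat=len(attrs)):
        cond = {k: v for k, v in zip(attrs, values) if v is not None}
        group = [p for p in people if all(p[k] == v for k, v in cond.items())]
        if not group:
            continue
        gap = abs(positive_rate(group) - overall)
        if gap > worst[1]:
            worst = (cond, gap)
    return worst


# Every single-attribute group looks fair ...
for attr in ("a", "b"):
    for value in (0, 1):
        group = [p for p in people if p[attr] == value]
        print(attr, value, positive_rate(group))
# ... but a conjunction of the two attributes does not.
print(audit(people, ["a", "b"]))
```

Enumeration grows exponentially with the number of attributes, which is consistent with the abstract's point that auditing is hard in the worst case.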

DualGAN: Unsupervised Dual Learning for Image-to-Image Translation

Title DualGAN: Unsupervised Dual Learning for Image-to-Image Translation
Authors Zili Yi, Hao Zhang, Ping Tan, Minglun Gong
Abstract Conditional Generative Adversarial Networks (GANs) for cross-domain image-to-image translation have made much progress recently. Depending on the task complexity, thousands to millions of labeled image pairs are needed to train a conditional GAN. However, human labeling is expensive, even impractical, and large quantities of data may not always be available. Inspired by dual learning from natural language translation, we develop a novel dual-GAN mechanism, which enables image translators to be trained from two sets of unlabeled images from two domains. In our architecture, the primal GAN learns to translate images from domain U to those in domain V, while the dual GAN learns to invert the task. The closed loop made by the primal and dual tasks allows images from either domain to be translated and then reconstructed. Hence a loss function that accounts for the reconstruction error of images can be used to train the translators. Experiments on multiple image translation tasks with unlabeled data show considerable performance gain of DualGAN over a single GAN. For some tasks, DualGAN can even achieve comparable or slightly better results than conditional GAN trained on fully labeled data.
Tasks Image-to-Image Translation
Published 2017-04-08
URL http://arxiv.org/abs/1704.02510v4
PDF http://arxiv.org/pdf/1704.02510v4.pdf
PWC https://paperswithcode.com/paper/dualgan-unsupervised-dual-learning-for-image
Repo https://github.com/lyj0823/GANs
Framework none
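
The closed loop described in the abstract (translate to the other domain, translate back, and penalize the reconstruction error) can be sketched without any neural network. In the example below, simple list-scaling functions stand in for the two generators, the adversarial terms are omitted, and the L1 distance is an assumption. None of the names come from the DualGAN code.

```python
def l1(x, y):
    """Mean absolute difference between two equally long vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def reconstruction_loss(u, v, g_uv, g_vu):
    """Cycle both samples through the primal and dual translators and
    sum the two reconstruction errors."""
    u_rec = g_vu(g_uv(u))  # U -> V -> U
    v_rec = g_uv(g_vu(v))  # V -> U -> V
    return l1(u, u_rec) + l1(v, v_rec)


# Hypothetical stand-ins for the generators: domain V is domain U scaled by 2.
def to_v(x):
    return [2.0 * a for a in x]


def to_u(x):
    return [0.5 * a for a in x]


def to_u_bad(x):
    # An imperfect inverse, so the loop no longer closes.
    return [0.4 * a for a in x]


u = [1.0, 2.0]  # unlabeled sample from domain U
v = [2.0, 4.0]  # unlabeled, unpaired sample from domain V

print(reconstruction_loss(u, v, to_v, to_u))      # translators invert each other
print(reconstruction_loss(u, v, to_v, to_u_bad))  # mismatch is penalized
```

No paired (u, v) labels are needed: each sample is only ever compared with its own reconstruction.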

Kernel Graph Convolutional Neural Networks

Title Kernel Graph Convolutional Neural Networks
Authors Giannis Nikolentzos, Polykarpos Meladianos, Antoine Jean-Pierre Tixier, Konstantinos Skianis, Michalis Vazirgiannis
Abstract Graph kernels have been successfully applied to many graph classification problems. Typically, a kernel is first designed, and then an SVM classifier is trained based on the features defined implicitly by this kernel. This two-stage approach decouples data representation from learning, which is suboptimal. On the other hand, Convolutional Neural Networks (CNNs) have the capability to learn their own features directly from the raw data during training. Unfortunately, they cannot handle irregular data such as graphs. We address this challenge by using graph kernels to embed meaningful local neighborhoods of the graphs in a continuous vector space. A set of filters is then convolved with these patches, pooled, and the output is then passed to a feedforward network. With limited parameter tuning, our approach outperforms strong baselines on 7 out of 10 benchmark datasets.
Tasks Graph Classification
Published 2017-10-29
URL http://arxiv.org/abs/1710.10689v2
PDF http://arxiv.org/pdf/1710.10689v2.pdf
PWC https://paperswithcode.com/paper/kernel-graph-convolutional-neural-networks
Repo https://github.com/giannisnik/cnn-graph-classification
Framework pytorch

Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms

Title Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms
Authors Taejun Kim, Jongpil Lee, Juhan Nam
Abstract Recent work has shown that the end-to-end approach using convolutional neural networks (CNNs) is effective in various types of machine learning tasks. For audio signals, the approach takes raw waveforms as input using a 1-D convolution layer. In this paper, we improve the 1-D CNN architecture for music auto-tagging by adopting building blocks from state-of-the-art image classification models, ResNets and SENets, and adding multi-level feature aggregation to it. We compare different combinations of the modules in building CNN architectures. The results show that they achieve significant improvements over previous state-of-the-art models on the MagnaTagATune dataset and comparable results on the Million Song Dataset. Furthermore, we analyze and visualize our model to show how the 1-D CNN operates.
Tasks Music Auto-Tagging
Published 2017-10-28
URL http://arxiv.org/abs/1710.10451v2
PDF http://arxiv.org/pdf/1710.10451v2.pdf
PWC https://paperswithcode.com/paper/sample-level-cnn-architectures-for-music-auto
Repo https://github.com/Dohppak/Music_Genre_Classification
Framework pytorch

Interactive Exploration and Discovery of Scientific Publications with PubVis

Title Interactive Exploration and Discovery of Scientific Publications with PubVis
Authors Franziska Horn
Abstract With an exponentially growing number of scientific papers published each year, advanced tools for exploring and discovering publications of interest are becoming indispensable. To empower users beyond a simple keyword search provided e.g. by Google Scholar, we present the novel web application PubVis. Powered by a variety of machine learning techniques, it combines essential features to help researchers find the content most relevant to them. An interactive visualization of a large collection of scientific publications provides an overview of the field and encourages the user to explore articles beyond a narrow research focus. This is augmented by personalized content-based article recommendations as well as an advanced full-text search to discover relevant references. The open-source implementation of the app can be easily set up and run locally on a desktop computer to provide access to content tailored to the specific needs of individual users. Additionally, a PubVis demo with access to a collection of 10,000 papers can be tested online.
Tasks
Published 2017-06-25
URL https://arxiv.org/abs/1706.08094v1
PDF https://arxiv.org/pdf/1706.08094v1.pdf
PWC https://paperswithcode.com/paper/interactive-exploration-and-discovery-of
Repo https://github.com/cod3licious/pubvis
Framework none

Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation

Title Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation
Authors Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang, Fei Qiao, Huazhong Yang
Abstract Hand pose estimation from monocular depth images is an important and challenging problem for human-computer interaction. Recently, deep convolutional networks (ConvNets) with sophisticated designs have been employed to address it, but the improvement over traditional methods is not so apparent. To improve the performance of direct 3D coordinate regression, we propose a tree-structured Region Ensemble Network (REN), which partitions the convolution outputs into regions and integrates the results from multiple regressors on each region. Compared with a multi-model ensemble, our model can be trained completely end-to-end. The experimental results demonstrate that our approach achieves the best performance among state-of-the-art methods on two public datasets.
Tasks Hand Pose Estimation, Pose Estimation
Published 2017-02-08
URL http://arxiv.org/abs/1702.02447v2
PDF http://arxiv.org/pdf/1702.02447v2.pdf
PWC https://paperswithcode.com/paper/region-ensemble-network-improving
Repo https://github.com/guohengkai/region-ensemble-network
Framework none

Device Placement Optimization with Reinforcement Learning

Title Device Placement Optimization with Reinforcement Learning
Authors Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean
Abstract The past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture of hardware devices such as CPUs and GPUs. Importantly, the decision of placing parts of the neural models on devices is often made by human experts based on simple heuristics and intuitions. In this paper, we propose a method which learns to optimize device placement for TensorFlow computational graphs. Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices. The execution time of the predicted placements is then used as the reward signal to optimize the parameters of the sequence-to-sequence model. Our main result is that on Inception-V3 for ImageNet classification, and on RNN LSTM, for language modeling and neural machine translation, our model finds non-trivial device placements that outperform hand-crafted heuristics and traditional algorithmic methods.
Tasks Language Modelling, Machine Translation
Published 2017-06-13
URL http://arxiv.org/abs/1706.04972v2
PDF http://arxiv.org/pdf/1706.04972v2.pdf
PWC https://paperswithcode.com/paper/device-placement-optimization-with
Repo https://github.com/indrajeet95/Device-Placement-Optimization-with-Reinforcement-Learning
Framework tf
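
The training signal described in the abstract, where the execution time of a placement serves as the reward for the policy that proposed it, can be illustrated with a policy-gradient update on a tiny problem. The sketch below makes several assumptions that differ from the paper: a table of independent per-operation logits replaces the sequence-to-sequence model, the runtimes are a made-up lookup table rather than measurements, and the gradient is computed exactly over all four placements instead of being estimated from sampled placements.

```python
import math

# Hypothetical runtimes (seconds) for placing two ops on two devices;
# the key is (device of op 0, device of op 1).
RUNTIME = {(0, 0): 4.0, (0, 1): 2.0, (1, 0): 5.0, (1, 1): 3.0}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def placement_prob(logits, placement):
    """Probability of a placement under independent per-op device choices."""
    prob = 1.0
    for op, device in enumerate(placement):
        prob *= softmax(logits[op])[device]
    return prob


def expected_runtime(logits):
    return sum(placement_prob(logits, p) * t for p, t in RUNTIME.items())


def train(steps=200, lr=0.5):
    logits = [[0.0, 0.0], [0.0, 0.0]]  # uniform policy to start
    for _ in range(steps):
        baseline = -expected_runtime(logits)  # reward is negative runtime
        probs = [softmax(row) for row in logits]
        grads = [[0.0, 0.0], [0.0, 0.0]]
        for placement, runtime in RUNTIME.items():
            advantage = -runtime - baseline
            weight = placement_prob(logits, placement) * advantage
            for op, device in enumerate(placement):
                for d in range(2):
                    indicator = 1.0 if d == device else 0.0
                    # Gradient of log-softmax w.r.t. logit d is (indicator - prob).
                    grads[op][d] += weight * (indicator - probs[op][d])
        for op in range(2):
            for d in range(2):
                logits[op][d] += lr * grads[op][d]  # gradient ascent on reward
    return logits


trained = train()
greedy = tuple(max(range(2), key=lambda d: row[d]) for row in trained)
print(greedy, expected_runtime(trained))
```

On this table the policy drifts towards the placement with the lowest runtime, because placements that run faster than the current average receive a positive weight.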

CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity

Title CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity
Authors Jeremy Ferrero, Frederic Agnes, Laurent Besacier, Didier Schwab
Abstract We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity by a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in unsupervised and supervised ways. Our best run ranked 1st on track 4a with a correlation of 83.02% with human annotations.
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2017-04-05
URL http://arxiv.org/abs/1704.01346v1
PDF http://arxiv.org/pdf/1704.01346v1.pdf
PWC https://paperswithcode.com/paper/compilig-at-semeval-2017-task-1-cross
Repo https://github.com/SimengSun/CIS530-project
Framework pytorch
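
The reported figure is a correlation between system scores and human annotations on the 0 to 5 scale. The abstract does not name the coefficient; the sketch below assumes the Pearson correlation. The score lists are invented for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical similarity scores for four sentence pairs.
system_scores = [4.8, 1.0, 3.2, 0.4]
human_scores = [5.0, 1.5, 3.0, 0.0]
print(pearson(system_scores, human_scores))
```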

Multi-rendezvous Spacecraft Trajectory Optimization with Beam P-ACO

Title Multi-rendezvous Spacecraft Trajectory Optimization with Beam P-ACO
Authors Luís F. Simões, Dario Izzo, Evert Haasdijk, A. E. Eiben
Abstract The design of spacecraft trajectories for missions visiting multiple celestial bodies is here framed as a multi-objective bilevel optimization problem. A comparative study is performed to assess the performance of different Beam Search algorithms at tackling the combinatorial problem of finding the ideal sequence of bodies. Special focus is placed on the development of a new hybridization between Beam Search and the Population-based Ant Colony Optimization algorithm. An experimental evaluation shows all algorithms achieving exceptional performance on a hard benchmark problem. It is found that a properly tuned deterministic Beam Search always outperforms the remaining variants. Beam P-ACO, however, demonstrates lower parameter sensitivity, while offering superior worst-case performance. Being an anytime algorithm, it is then found to be the preferable choice for certain practical applications.
Tasks Bilevel Optimization
Published 2017-04-03
URL http://arxiv.org/abs/1704.00702v1
PDF http://arxiv.org/pdf/1704.00702v1.pdf
PWC https://paperswithcode.com/paper/multi-rendezvous-spacecraft-trajectory
Repo https://github.com/lfsimoes/beam_paco__gtoc5
Framework none
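
The combinatorial part of the problem, choosing the order in which bodies are visited, is what Beam Search addresses. The sketch below is a plain deterministic beam search over a made-up transfer-cost table. It is not Beam P-ACO and has no connection to the benchmark used in the paper; the cost values and the function name are assumptions.

```python
def beam_search(cost, start, beam_width):
    """Find a low-cost order for visiting every body once, keeping only the
    `beam_width` cheapest partial sequences after each extension step."""
    n = len(cost)
    beam = [(0.0, [start])]
    for _ in range(n - 1):
        candidates = []
        for total, path in beam:
            for nxt in range(n):
                if nxt not in path:
                    step = cost[path[-1]][nxt]
                    candidates.append((total + step, path + [nxt]))
        candidates.sort(key=lambda c: c[0])  # stable sort on accumulated cost
        beam = candidates[:beam_width]
    return beam[0]


# Hypothetical transfer costs: cost[i][j] is the cost of going from body i to j.
cost = [
    [0, 1, 2, 9],
    [1, 0, 10, 10],
    [2, 1, 0, 1],
    [9, 1, 1, 0],
]

print(beam_search(cost, start=0, beam_width=1))  # greedy: (12.0, [0, 1, 2, 3])
print(beam_search(cost, start=0, beam_width=2))  # wider beam: (4.0, [0, 2, 3, 1])
```

With a width of 1 the search commits to the cheapest first leg and pays for it later; a width of 2 keeps the alternative alive long enough to find the cheaper sequence.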

On the challenges of learning with inference networks on sparse, high-dimensional data

Title On the challenges of learning with inference networks on sparse, high-dimensional data
Authors Rahul G. Krishnan, Dawen Liang, Matthew Hoffman
Abstract We study parameter estimation in Nonlinear Factor Analysis (NFA) where the generative model is parameterized by a deep neural network. Recent work has focused on learning such models using inference (or recognition) networks; we identify a crucial problem when modeling large, sparse, high-dimensional datasets: underfitting. We study the extent of underfitting, highlighting that its severity increases with the sparsity of the data. We propose methods to tackle it via iterative optimization inspired by stochastic variational inference (Hoffman et al., 2013) and improvements in the sparse data representation used for inference. The proposed techniques drastically improve the ability of these powerful models to fit sparse data, achieving state-of-the-art results on a benchmark text-count dataset and excellent results on the task of top-N recommendation.
Tasks
Published 2017-10-17
URL http://arxiv.org/abs/1710.06085v1
PDF http://arxiv.org/pdf/1710.06085v1.pdf
PWC https://paperswithcode.com/paper/on-the-challenges-of-learning-with-inference
Repo https://github.com/rahulk90/vae_sparse
Framework none

Deep Semantic Role Labeling with Self-Attention

Title Deep Semantic Role Labeling with Self-Attention
Authors Zhixing Tan, Mingxuan Wang, Jun Xie, Yidong Chen, Xiaodong Shi
Abstract Semantic Role Labeling (SRL) is believed to be a crucial step towards natural language understanding and has been widely studied. In recent years, end-to-end SRL with recurrent neural networks (RNNs) has gained increasing attention. However, it remains a major challenge for RNNs to handle structural information and long-range dependencies. In this paper, we present a simple and effective architecture for SRL which aims to address these problems. Our model is based on self-attention, which can directly capture the relationships between two tokens regardless of their distance. Our single model achieves an F1 score of 83.4 on the CoNLL-2005 shared task dataset and 82.7 on the CoNLL-2012 shared task dataset, which outperforms the previous state-of-the-art results by 1.8 and 1.0 F1 points, respectively. In addition, our model is computationally efficient, and the parsing speed is 50K tokens per second on a single Titan X GPU.
Tasks Semantic Role Labeling
Published 2017-12-05
URL http://arxiv.org/abs/1712.01586v1
PDF http://arxiv.org/pdf/1712.01586v1.pdf
PWC https://paperswithcode.com/paper/deep-semantic-role-labeling-with-self
Repo https://github.com/XMUNLP/Tagger
Framework tf
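
The claim that self-attention relates two tokens regardless of their distance can be checked on a toy input. The sketch below assumes scaled dot-product attention with identity projections (queries, keys, and values are the token vectors themselves) and a single head. The abstract does not specify the variant used in the paper, so this is an illustration rather than a reproduction of the model.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Return (outputs, weights) for single-head scaled dot-product
    self-attention with identity query/key/value projections."""
    d = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in tokens
        ]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of all token vectors.
        outputs.append(
            [sum(wj * value[i] for wj, value in zip(w, tokens)) for i in range(d)]
        )
    return outputs, weights


# The first and last tokens are identical but far apart in the sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
print(weights[0])  # weight on position 0 equals weight on position 4
```

Position plays no role in the score, so the distant matching token receives exactly the same weight as the token itself.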

Dilated Convolutions for Modeling Long-Distance Genomic Dependencies

Title Dilated Convolutions for Modeling Long-Distance Genomic Dependencies
Authors Ankit Gupta, Alexander M. Rush
Abstract We consider the task of detecting regulatory elements in the human genome directly from raw DNA. Past work has focused on small snippets of DNA, making it difficult to model long-distance dependencies that arise from DNA’s 3-dimensional conformation. In order to study long-distance dependencies, we develop and release a novel dataset for a larger-context modeling task. Using this new dataset, we model long-distance interactions using dilated convolutional neural networks, and compare them to standard convolutions and recurrent neural networks. We show that dilated convolutions are effective at modeling the locations of regulatory markers in the human genome, such as transcription factor binding sites, histone modifications, and DNase hypersensitivity sites.
Tasks
Published 2017-10-03
URL http://arxiv.org/abs/1710.01278v1
PDF http://arxiv.org/pdf/1710.01278v1.pdf
PWC https://paperswithcode.com/paper/dilated-convolutions-for-modeling-long
Repo https://github.com/harvardnlp/regulatory-prediction
Framework tf
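
Dilated convolutions reach long-distance context because each kernel tap skips over `dilation - 1` positions, so stacked layers widen the receptive field much faster than standard convolutions. The sketch below shows a 1-D dilated convolution (no padding, stride 1) and the receptive-field arithmetic. The layer configuration is an illustrative assumption and is not the architecture from the paper.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid (unpadded) 1-D convolution whose kernel taps are `dilation` apart."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Input positions seen by one output of a stack of stride-1 layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))  # [4, 6, 8]
print(receptive_field(3, [1, 1, 1, 1]))  # standard convolutions: 9
print(receptive_field(3, [1, 2, 4, 8]))  # doubling dilations: 31
```

Four layers with the same number of weights cover 31 input positions instead of 9 once the dilation doubles at each layer.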