Paper Group AWR 163
PageNet: Page Boundary Extraction in Historical Handwritten Documents. STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset. 3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks. Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness. DualGAN: Unsupervised Dual Learning for Image-to-Image Translation. Kernel Graph Convolutional Neural Networks. Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms. Interactive Exploration and Discovery of Scientific Publications with PubVis. Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation. Device Placement Optimization with Reinforcement Learning. CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity. Multi-rendezvous Spacecraft Trajectory Optimization with Beam P-ACO. On the challenges of learning with inference networks on sparse, high-dimensional data. Deep Semantic Role Labeling with Self-Attention. Dilated Convolutions for Modeling Long-Distance Genomic Dependencies.
PageNet: Page Boundary Extraction in Historical Handwritten Documents
Title | PageNet: Page Boundary Extraction in Historical Handwritten Documents |
Authors | Chris Tensmeyer, Brian Davis, Curtis Wigington, Iain Lee, Bill Barrett |
Abstract | When digitizing a document into an image, it is common to include a surrounding border region to visually indicate that the entire document is present in the image. However, this border should be removed prior to automated processing. In this work, we present a deep-learning-based system, PageNet, which identifies the main page region in an image in order to segment content from both textual and non-textual border noise. In PageNet, a Fully Convolutional Network obtains a pixel-wise segmentation which is post-processed into the output quadrilateral region. We evaluate PageNet on 4 collections of historical handwritten documents and obtain over 94% mean intersection over union on all datasets and approach human performance on 2 of these collections. Additionally, we show that PageNet can segment documents that are overlaid on top of other documents. |
Tasks | |
Published | 2017-09-05 |
URL | http://arxiv.org/abs/1709.01618v1 |
PDF | http://arxiv.org/pdf/1709.01618v1.pdf |
PWC | https://paperswithcode.com/paper/pagenet-page-boundary-extraction-in |
Repo | https://github.com/ctensmeyer/pagenet |
Framework | caffe2 |
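
The abstract above describes an FCN that produces a pixel-wise page mask which is then post-processed into a quadrilateral. As a minimal illustration (not PageNet's actual post-processing step), the sketch below turns a binary mask into four page corners with OpenCV; the synthetic test mask and the min-area-rectangle fit are assumptions made here for demonstration.

```python
# Hypothetical post-processing sketch: turn a binary page mask (e.g. an FCN
# output thresholded at 0.5) into a single quadrilateral page region.
# This is one plausible implementation, not PageNet's exact procedure.
# Assumes the OpenCV 4.x return signature of cv2.findContours.
import cv2
import numpy as np

def mask_to_quad(mask: np.ndarray) -> np.ndarray:
    """mask: HxW array with 1 for 'page' pixels. Returns a 4x2 corner array."""
    # Keep only the largest connected component, assuming it is the page.
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    page = max(contours, key=cv2.contourArea)
    # Fit a minimum-area rotated rectangle and return its 4 corners.
    rect = cv2.minAreaRect(page)
    return cv2.boxPoints(rect)          # shape (4, 2), float32 corner coordinates

# Example: a synthetic 100x100 mask containing a tilted page region.
mask = np.zeros((100, 100), np.uint8)
cv2.fillConvexPoly(mask, np.array([[20, 10], [90, 20], [80, 85], [15, 75]], np.int32), 1)
print(mask_to_quad(mask))
```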
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset
Title | STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset |
Authors | Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi |
Abstract | In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we particularly consider generating Japanese captions for images. Since most available caption datasets have been constructed for the English language, there are few datasets for Japanese. To tackle this problem, we construct a large-scale Japanese image caption dataset based on images from MS-COCO, which is called STAIR Captions. STAIR Captions consists of 820,310 Japanese captions for 164,062 images. In the experiment, we show that a neural network trained using STAIR Captions can generate more natural and better Japanese captions, compared to those generated using English-Japanese machine translation after generating English captions. |
Tasks | Image Captioning, Machine Translation |
Published | 2017-05-02 |
URL | http://arxiv.org/abs/1705.00823v1 |
PDF | http://arxiv.org/pdf/1705.00823v1.pdf |
PWC | https://paperswithcode.com/paper/stair-captions-constructing-a-large-scale |
Repo | https://github.com/STAIR-Lab-CIT/STAIR-captions |
Framework | none |
3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks
Title | 3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks |
Authors | Zhaoliang Lun, Matheus Gadelha, Evangelos Kalogerakis, Subhransu Maji, Rui Wang |
Abstract | We propose a method for reconstructing 3D shapes from 2D sketches in the form of line drawings. Our method takes as input a single sketch, or multiple sketches, and outputs a dense point cloud representing a 3D reconstruction of the input sketch(es). The point cloud is then converted into a polygon mesh. At the heart of our method lies a deep, encoder-decoder network. The encoder converts the sketch into a compact representation encoding shape information. The decoder converts this representation into depth and normal maps capturing the underlying surface from several output viewpoints. The multi-view maps are then consolidated into a 3D point cloud by solving an optimization problem that fuses depth and normals across all viewpoints. Based on our experiments, compared to other methods, such as volumetric networks, our architecture offers several advantages, including more faithful reconstruction, higher output surface resolution, and better preservation of topology and shape structure. |
Tasks | 3D Reconstruction |
Published | 2017-07-20 |
URL | http://arxiv.org/abs/1707.06375v3 |
PDF | http://arxiv.org/pdf/1707.06375v3.pdf |
PWC | https://paperswithcode.com/paper/3d-shape-reconstruction-from-sketches-via |
Repo | https://github.com/aghinsa/SketchTo3D |
Framework | tf |
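
To make the encoder-decoder described above concrete, here is a minimal PyTorch sketch that maps a single sketch image to per-view depth and normal maps; the layer sizes, 12-view count, and 64x64 output resolution are illustrative assumptions rather than the paper's architecture, and the point-cloud fusion step is omitted.

```python
# Minimal PyTorch sketch of the encoder-decoder idea: a sketch image is encoded
# to a latent code, then decoded into depth + normal maps for several viewpoints.
import torch
import torch.nn as nn

class SketchToMultiView(nn.Module):
    def __init__(self, n_views: int = 12, latent: int = 512):
        super().__init__()
        self.n_views = n_views
        self.encoder = nn.Sequential(            # 1x256x256 sketch -> latent code
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, latent), nn.ReLU(),
        )
        # The decoder emits 4 channels per view: 1 depth + 3 normal components.
        self.decoder = nn.Sequential(
            nn.Linear(latent, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, n_views * 4, 4, stride=2, padding=1),
        )

    def forward(self, sketch):                   # sketch: (B, 1, 256, 256)
        maps = self.decoder(self.encoder(sketch))
        b, _, h, w = maps.shape
        maps = maps.view(b, self.n_views, 4, h, w)
        return maps[:, :, :1], maps[:, :, 1:]    # per-view depth and normal maps

depth, normals = SketchToMultiView()(torch.randn(2, 1, 256, 256))
print(depth.shape, normals.shape)                # (2, 12, 1, 64, 64) and (2, 12, 3, 64, 64)
```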
Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness
Title | Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness |
Authors | Michael Kearns, Seth Neel, Aaron Roth, Zhiwei Steven Wu |
Abstract | The most prevalent notions of fairness in machine learning are statistical definitions: they fix a small collection of pre-defined groups, and then ask for parity of some statistic of the classifier across these groups. Constraints of this form are susceptible to intentional or inadvertent “fairness gerrymandering”, in which a classifier appears to be fair on each individual group, but badly violates the fairness constraint on one or more structured subgroups defined over the protected attributes. We propose instead to demand statistical notions of fairness across exponentially (or infinitely) many subgroups, defined by a structured class of functions over the protected attributes. This interpolates between statistical definitions of fairness and recently proposed individual notions of fairness, but raises several computational challenges. It is no longer clear how to audit a fixed classifier to see if it satisfies such a strong definition of fairness. We prove that the computational problem of auditing subgroup fairness for both equality of false positive rates and statistical parity is equivalent to the problem of weak agnostic learning, which means it is computationally hard in the worst case, even for simple structured subclasses. We then derive two algorithms that provably converge to the best fair classifier, given access to oracles which can solve the agnostic learning problem. The algorithms are based on a formulation of subgroup fairness as a two-player zero-sum game between a Learner and an Auditor. Our first algorithm provably converges in a polynomial number of steps. Our second algorithm enjoys only provably asymptotic convergence, but has the merit of simplicity and faster per-step computation. We implement the simpler algorithm using linear regression as a heuristic oracle, and show that we can effectively both audit and learn fair classifiers on real datasets. |
Tasks | |
Published | 2017-11-14 |
URL | http://arxiv.org/abs/1711.05144v5 |
PDF | http://arxiv.org/pdf/1711.05144v5.pdf |
PWC | https://paperswithcode.com/paper/preventing-fairness-gerrymandering-auditing |
Repo | https://github.com/SaeedSharifiMa/FairDP |
Framework | none |
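
The paper's implemented heuristic replaces the agnostic-learning oracle with regression; the sketch below is a simplified illustration of that auditing idea for false-positive rates, using scikit-learn's LinearRegression. The residual construction, the `audit_fpr` helper, and the toy data are assumptions for demonstration, not the authors' exact algorithm.

```python
# Simplified auditing sketch in the spirit of the heuristic oracle: restrict to
# negative examples, regress the classifier's (prediction - overall FPR) residual
# on the protected attributes, and read off the subgroup where the fitted
# function is positive.
import numpy as np
from sklearn.linear_model import LinearRegression

def audit_fpr(y_true, y_pred, protected):
    """y_true, y_pred: 0/1 arrays; protected: (n, d) protected-attribute matrix.
    Returns (subgroup mask over all points, FPR disparity on that subgroup)."""
    neg = y_true == 0                          # false positives only exist here
    overall_fpr = y_pred[neg].mean()
    residual = y_pred[neg] - overall_fpr       # positive where FPR is elevated
    reg = LinearRegression().fit(protected[neg], residual)
    subgroup = reg.predict(protected) > 0      # linear-threshold subgroup
    in_sub = subgroup & neg
    disparity = y_pred[in_sub].mean() - overall_fpr if in_sub.any() else 0.0
    return subgroup, disparity

# Toy example with two binary protected attributes and a biased classifier.
rng = np.random.default_rng(0)
prot = rng.integers(0, 2, size=(2000, 2))
y = rng.integers(0, 2, size=2000)
# Classifier that over-predicts positives on the intersection prot0=1 & prot1=1.
pred = ((rng.random(2000) < 0.2) | ((prot[:, 0] == 1) & (prot[:, 1] == 1))).astype(int)
print(audit_fpr(y, pred, prot)[1])             # FPR gap found on the flagged subgroup
```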
DualGAN: Unsupervised Dual Learning for Image-to-Image Translation
Title | DualGAN: Unsupervised Dual Learning for Image-to-Image Translation |
Authors | Zili Yi, Hao Zhang, Ping Tan, Minglun Gong |
Abstract | Conditional Generative Adversarial Networks (GANs) for cross-domain image-to-image translation have made much progress recently. Depending on the task complexity, thousands to millions of labeled image pairs are needed to train a conditional GAN. However, human labeling is expensive, even impractical, and large quantities of data may not always be available. Inspired by dual learning from natural language translation, we develop a novel dual-GAN mechanism, which enables image translators to be trained from two sets of unlabeled images from two domains. In our architecture, the primal GAN learns to translate images from domain U to those in domain V, while the dual GAN learns to invert the task. The closed loop made by the primal and dual tasks allows images from either domain to be translated and then reconstructed. Hence a loss function that accounts for the reconstruction error of images can be used to train the translators. Experiments on multiple image translation tasks with unlabeled data show a considerable performance gain of DualGAN over a single GAN. For some tasks, DualGAN can even achieve comparable or slightly better results than a conditional GAN trained on fully labeled data. |
Tasks | Image-to-Image Translation |
Published | 2017-04-08 |
URL | http://arxiv.org/abs/1704.02510v4 |
PDF | http://arxiv.org/pdf/1704.02510v4.pdf |
PWC | https://paperswithcode.com/paper/dualgan-unsupervised-dual-learning-for-image |
Repo | https://github.com/lyj0823/GANs |
Framework | none |
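
The closed-loop training signal described above can be written down in a few lines. The sketch below shows only the L1 reconstruction term with placeholder one-layer generators (`G_uv`, `G_vu`) and illustrative loss weights; the adversarial discriminators and the paper's actual generator architecture are omitted.

```python
# Minimal sketch of the closed-loop reconstruction loss: images translated
# U -> V -> U (and V -> U -> V) should come back unchanged, so an L1
# reconstruction term can supervise both translators without paired labels.
import torch
import torch.nn as nn

G_uv = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))   # placeholder U -> V translator
G_vu = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))   # placeholder V -> U translator
l1 = nn.L1Loss()

def reconstruction_loss(u_batch, v_batch, lam_u=20.0, lam_v=20.0):
    u_rec = G_vu(G_uv(u_batch))       # U -> V -> U round trip
    v_rec = G_uv(G_vu(v_batch))       # V -> U -> V round trip
    return lam_u * l1(u_rec, u_batch) + lam_v * l1(v_rec, v_batch)

u = torch.randn(4, 3, 64, 64)         # unlabeled images from domain U
v = torch.randn(4, 3, 64, 64)         # unlabeled images from domain V
loss = reconstruction_loss(u, v)      # added to the usual adversarial losses
loss.backward()
```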
Kernel Graph Convolutional Neural Networks
Title | Kernel Graph Convolutional Neural Networks |
Authors | Giannis Nikolentzos, Polykarpos Meladianos, Antoine Jean-Pierre Tixier, Konstantinos Skianis, Michalis Vazirgiannis |
Abstract | Graph kernels have been successfully applied to many graph classification problems. Typically, a kernel is first designed, and then an SVM classifier is trained based on the features defined implicitly by this kernel. This two-stage approach decouples data representation from learning, which is suboptimal. On the other hand, Convolutional Neural Networks (CNNs) have the capability to learn their own features directly from the raw data during training. Unfortunately, they cannot handle irregular data such as graphs. We address this challenge by using graph kernels to embed meaningful local neighborhoods of the graphs in a continuous vector space. A set of filters is then convolved with these patches, pooled, and the output is then passed to a feedforward network. With limited parameter tuning, our approach outperforms strong baselines on 7 out of 10 benchmark datasets. |
Tasks | Graph Classification |
Published | 2017-10-29 |
URL | http://arxiv.org/abs/1710.10689v2 |
PDF | http://arxiv.org/pdf/1710.10689v2.pdf |
PWC | https://paperswithcode.com/paper/kernel-graph-convolutional-neural-networks |
Repo | https://github.com/giannisnik/cnn-graph-classification |
Framework | pytorch |
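
As a rough illustration of the pipeline in the abstract, the sketch below embeds each node's neighborhood subgraph by comparing it, via a kernel, to a set of prototype neighborhoods, producing a continuous patch matrix that a 1-D convolution could consume. The degree-histogram kernel and the prototype sampling are deliberate simplifications, not the graph kernels used in the paper.

```python
# Illustrative sketch: neighborhood subgraphs ("patches") are embedded by kernel
# comparison against prototype subgraphs; the resulting matrix can be fed to a
# Conv1d + pooling + MLP stack. The kernel here is a toy stand-in for a real
# graph kernel such as Weisfeiler-Lehman.
import networkx as nx
import numpy as np

def degree_feature(g, max_degree=10):
    hist = np.zeros(max_degree + 1)
    for _, d in g.degree():
        hist[min(d, max_degree)] += 1
    return hist / max(g.number_of_nodes(), 1)

def patch_kernel(g1, g2):
    return float(degree_feature(g1) @ degree_feature(g2))   # linear kernel on degree histograms

def graph_to_patches(g, radius=1):
    return [nx.ego_graph(g, n, radius=radius) for n in g.nodes()]

def embed_graph(g, prototypes):
    # One row per neighborhood patch, one column per prototype subgraph.
    return np.array([[patch_kernel(p, q) for q in prototypes]
                     for p in graph_to_patches(g)])

graphs = [nx.erdos_renyi_graph(20, 0.2, seed=s) for s in range(5)]
prototypes = graph_to_patches(graphs[0])[:8]    # sampled prototype neighborhoods
emb = embed_graph(graphs[1], prototypes)        # (n_patches, n_prototypes) matrix
print(emb.shape)                                # ready for a Conv1d/pooling stack
```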
Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms
Title | Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms |
Authors | Taejun Kim, Jongpil Lee, Juhan Nam |
Abstract | Recent work has shown that the end-to-end approach using convolutional neural network (CNN) is effective in various types of machine learning tasks. For audio signals, the approach takes raw waveforms as input using a 1-D convolution layer. In this paper, we improve the 1-D CNN architecture for music auto-tagging by adopting building blocks from state-of-the-art image classification models, ResNets and SENets, and adding multi-level feature aggregation to it. We compare different combinations of the modules in building CNN architectures. The results show that they achieve significant improvements over previous state-of-the-art models on the MagnaTagATune dataset and comparable results on the Million Song Dataset. Furthermore, we analyze and visualize our model to show how the 1-D CNN operates. |
Tasks | Music Auto-Tagging |
Published | 2017-10-28 |
URL | http://arxiv.org/abs/1710.10451v2 |
PDF | http://arxiv.org/pdf/1710.10451v2.pdf |
PWC | https://paperswithcode.com/paper/sample-level-cnn-architectures-for-music-auto |
Repo | https://github.com/Dohppak/Music_Genre_Classification |
Framework | pytorch |
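
A minimal PyTorch sketch of one building block in the spirit of the abstract follows: a 1-D convolution over raw audio with a squeeze-and-excitation (SE) channel re-weighting. Filter counts, the pool size, and the input length are illustrative assumptions rather than the paper's exact configuration.

```python
# One sample-level block: Conv1d over raw waveform samples, then an SE module
# that rescales each channel by a learned gate in [0, 1].
import torch
import torch.nn as nn

class SEBlock1d(nn.Module):
    def __init__(self, in_ch, out_ch, reduction=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm1d(out_ch), nn.ReLU(),
            nn.MaxPool1d(3),                     # downsample the time axis by 3
        )
        self.se = nn.Sequential(                 # squeeze: global pooling over time
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(out_ch, out_ch // reduction), nn.ReLU(),
            nn.Linear(out_ch // reduction, out_ch), nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (batch, channels, samples)
        h = self.conv(x)
        scale = self.se(h).unsqueeze(-1)         # per-channel excitation in [0, 1]
        return h * scale

block = SEBlock1d(1, 128)
wave = torch.randn(4, 1, 59049)                  # a few seconds of raw audio samples
print(block(wave).shape)                         # torch.Size([4, 128, 19683])
```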
Interactive Exploration and Discovery of Scientific Publications with PubVis
Title | Interactive Exploration and Discovery of Scientific Publications with PubVis |
Authors | Franziska Horn |
Abstract | With an exponentially growing number of scientific papers published each year, advanced tools for exploring and discovering publications of interest are becoming indispensable. To empower users beyond a simple keyword search provided e.g. by Google Scholar, we present the novel web application PubVis. Powered by a variety of machine learning techniques, it combines essential features to help researchers find the content most relevant to them. An interactive visualization of a large collection of scientific publications provides an overview of the field and encourages the user to explore articles beyond a narrow research focus. This is augmented by personalized content-based article recommendations as well as an advanced full text search to discover relevant references. The open-sourced implementation of the app can be easily set up and run locally on a desktop computer to provide access to content tailored to the specific needs of individual users. Additionally, a PubVis demo with access to a collection of 10,000 papers can be tested online. |
Tasks | |
Published | 2017-06-25 |
URL | https://arxiv.org/abs/1706.08094v1 |
PDF | https://arxiv.org/pdf/1706.08094v1.pdf |
PWC | https://paperswithcode.com/paper/interactive-exploration-and-discovery-of |
Repo | https://github.com/cod3licious/pubvis |
Framework | none |
Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation
Title | Region Ensemble Network: Improving Convolutional Network for Hand Pose Estimation |
Authors | Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang, Fei Qiao, Huazhong Yang |
Abstract | Hand pose estimation from monocular depth images is an important and challenging problem for human-computer interaction. Recently deep convolutional networks (ConvNet) with sophisticated design have been employed to address it, but the improvement over traditional methods is not so apparent. To promote the performance of direct 3D coordinate regression, we propose a tree-structured Region Ensemble Network (REN), which partitions the convolution outputs into regions and integrates the results from multiple regressors on each region. Compared with a multi-model ensemble, our model is trained completely end-to-end. The experimental results demonstrate that our approach achieves the best performance among state-of-the-art methods on two public datasets. |
Tasks | Hand Pose Estimation, Pose Estimation |
Published | 2017-02-08 |
URL | http://arxiv.org/abs/1702.02447v2 |
PDF | http://arxiv.org/pdf/1702.02447v2.pdf |
PWC | https://paperswithcode.com/paper/region-ensemble-network-improving |
Repo | https://github.com/guohengkai/region-ensemble-network |
Framework | none |
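
The region-ensemble idea above can be sketched compactly: split the last convolutional feature map into a 2x2 grid, give each region its own fully connected branch, and regress 3-D joint coordinates from the concatenated region features. The backbone, feature sizes, and joint count below are placeholder assumptions, not the paper's network.

```python
# Illustrative PyTorch sketch of a region-ensemble regressor for hand pose.
import torch
import torch.nn as nn

class RegionEnsemble(nn.Module):
    def __init__(self, n_joints=14, feat_ch=64, feat_hw=12):
        super().__init__()
        self.n_joints = n_joints
        self.backbone = nn.Sequential(           # depth image -> (feat_ch, 12, 12)
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(feat_hw),
        )
        region_dim = feat_ch * (feat_hw // 2) ** 2
        self.region_fc = nn.ModuleList(
            [nn.Sequential(nn.Flatten(), nn.Linear(region_dim, 512), nn.ReLU())
             for _ in range(4)])
        self.head = nn.Linear(4 * 512, n_joints * 3)

    def forward(self, depth):                    # depth: (B, 1, 96, 96)
        f = self.backbone(depth)
        h = f.shape[-1] // 2
        regions = [f[:, :, :h, :h], f[:, :, :h, h:],   # 2x2 spatial partition
                   f[:, :, h:, :h], f[:, :, h:, h:]]
        feats = [fc(r) for fc, r in zip(self.region_fc, regions)]
        return self.head(torch.cat(feats, dim=1)).view(-1, self.n_joints, 3)

print(RegionEnsemble()(torch.randn(2, 1, 96, 96)).shape)   # torch.Size([2, 14, 3])
```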
Device Placement Optimization with Reinforcement Learning
Title | Device Placement Optimization with Reinforcement Learning |
Authors | Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean |
Abstract | The past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture of hardware devices such as CPUs and GPUs. Importantly, the decision of placing parts of the neural models on devices is often made by human experts based on simple heuristics and intuitions. In this paper, we propose a method which learns to optimize device placement for TensorFlow computational graphs. Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices. The execution time of the predicted placements is then used as the reward signal to optimize the parameters of the sequence-to-sequence model. Our main result is that on Inception-V3 for ImageNet classification, and on RNN LSTM, for language modeling and neural machine translation, our model finds non-trivial device placements that outperform hand-crafted heuristics and traditional algorithmic methods. |
Tasks | Language Modelling, Machine Translation |
Published | 2017-06-13 |
URL | http://arxiv.org/abs/1706.04972v2 |
PDF | http://arxiv.org/pdf/1706.04972v2.pdf |
PWC | https://paperswithcode.com/paper/device-placement-optimization-with |
Repo | https://github.com/indrajeet95/Device-Placement-Optimization-with-Reinforcement-Learning |
Framework | tf |
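
The training signal described above is plain REINFORCE with measured runtime as (negative) reward. The toy sketch below uses a flat per-operation policy and a fictitious `simulated_runtime` function in place of the paper's sequence-to-sequence placer and real TensorFlow measurements.

```python
# Toy REINFORCE sketch: sample a device for every operation, score the placement
# with a (simulated) runtime, and update the policy against a moving-average baseline.
import torch

n_ops, n_devices = 20, 4
logits = torch.zeros(n_ops, n_devices, requires_grad=True)    # per-op placement policy
opt = torch.optim.Adam([logits], lr=0.1)
op_cost = torch.rand(n_ops)                                    # fictitious per-op costs

def simulated_runtime(placement):
    # Pretend runtime = load of the busiest device (encourages balanced placements).
    loads = torch.zeros(n_devices).scatter_add(0, placement, op_cost)
    return loads.max()

baseline = None
for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    placement = dist.sample()                                  # one device per op
    reward = -simulated_runtime(placement)
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    loss = -(reward - baseline) * dist.log_prob(placement).sum()
    opt.zero_grad(); loss.backward(); opt.step()

print(float(simulated_runtime(torch.distributions.Categorical(logits=logits).sample())))
```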
CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity
Title | CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity |
Authors | Jeremy Ferrero, Frederic Agnes, Laurent Besacier, Didier Schwab |
Abstract | We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity by a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in unsupervised and supervised ways. Our best run ranked 1st on track 4a with a correlation of 83.02% with human annotations. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2017-04-05 |
URL | http://arxiv.org/abs/1704.01346v1 |
PDF | http://arxiv.org/pdf/1704.01346v1.pdf |
PWC | https://paperswithcode.com/paper/compilig-at-semeval-2017-task-1-cross |
Repo | https://github.com/SimengSun/CIS530-project |
Framework | pytorch |
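
The supervised combination mentioned in the abstract amounts to regressing gold STS scores from the scores of the individual methods. The sketch below shows that step with a Ridge regressor and random stand-in features; the four feature columns and the toy targets are assumptions, and the base similarity methods themselves are not reimplemented.

```python
# Minimal sketch of combining base similarity scores into a 0-5 STS prediction.
import numpy as np
from sklearn.linear_model import Ridge

# Each row: [syntax_score, dictionary_score, context_score, mt_score] for a pair.
rng = np.random.default_rng(0)
X_train = rng.random((500, 4))
y_train = np.clip(5 * X_train.mean(axis=1) + rng.normal(0, 0.3, 500), 0, 5)  # toy gold scores

model = Ridge(alpha=1.0).fit(X_train, y_train)
X_new = rng.random((3, 4))                          # three unseen sentence pairs
print(np.clip(model.predict(X_new), 0, 5))          # predicted similarity in [0, 5]
```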
Multi-rendezvous Spacecraft Trajectory Optimization with Beam P-ACO
Title | Multi-rendezvous Spacecraft Trajectory Optimization with Beam P-ACO |
Authors | Luís F. Simões, Dario Izzo, Evert Haasdijk, A. E. Eiben |
Abstract | The design of spacecraft trajectories for missions visiting multiple celestial bodies is here framed as a multi-objective bilevel optimization problem. A comparative study is performed to assess the performance of different Beam Search algorithms at tackling the combinatorial problem of finding the ideal sequence of bodies. Special focus is placed on the development of a new hybridization between Beam Search and the Population-based Ant Colony Optimization algorithm. An experimental evaluation shows all algorithms achieving exceptional performance on a hard benchmark problem. It is found that a properly tuned deterministic Beam Search always outperforms the remaining variants. Beam P-ACO, however, demonstrates lower parameter sensitivity, while offering superior worst-case performance. Being an anytime algorithm, it is then found to be the preferable choice for certain practical applications. |
Tasks | bilevel optimization |
Published | 2017-04-03 |
URL | http://arxiv.org/abs/1704.00702v1 |
PDF | http://arxiv.org/pdf/1704.00702v1.pdf |
PWC | https://paperswithcode.com/paper/multi-rendezvous-spacecraft-trajectory |
Repo | https://github.com/lfsimoes/beam_paco__gtoc5 |
Framework | none |
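
A toy sketch of the Beam Search / P-ACO hybrid follows: each partial visit sequence in the beam is extended by sampling successors with probabilities that mix an inverse-cost heuristic with pheromone values, and the best complete sequence reinforces its edges for the next run. The random cost matrix, parameter values, and update rule are illustrative, not the paper's GTOC5 setup.

```python
# Toy pheromone-biased beam search over a body-visit sequencing problem.
import numpy as np

rng = np.random.default_rng(1)
n_bodies = 12
cost = rng.random((n_bodies, n_bodies)) + 0.1        # fictitious transfer costs
pheromone = np.ones((n_bodies, n_bodies))

def run_beam(beam_width=8, branch=3, alpha=1.0, beta=2.0):
    beam = [([0], 0.0)]                              # start every tour at body 0
    for _ in range(n_bodies - 1):
        candidates = []
        for seq, total in beam:
            last = seq[-1]
            unvisited = [b for b in range(n_bodies) if b not in seq]
            score = (pheromone[last, unvisited] ** alpha) * (cost[last, unvisited] ** -beta)
            probs = score / score.sum()
            picks = rng.choice(unvisited, size=min(branch, len(unvisited)),
                               replace=False, p=probs)
            candidates += [(seq + [int(b)], total + cost[last, b]) for b in picks]
        beam = sorted(candidates, key=lambda c: c[1])[:beam_width]   # prune by cost
    return beam[0]                                   # best complete sequence found

best_seq, best_cost = run_beam()
for a, b in zip(best_seq, best_seq[1:]):             # pheromone update from the best tour
    pheromone[a, b] += 1.0 / best_cost
print(best_seq, round(best_cost, 3))
```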
On the challenges of learning with inference networks on sparse, high-dimensional data
Title | On the challenges of learning with inference networks on sparse, high-dimensional data |
Authors | Rahul G. Krishnan, Dawen Liang, Matthew Hoffman |
Abstract | We study parameter estimation in Nonlinear Factor Analysis (NFA) where the generative model is parameterized by a deep neural network. Recent work has focused on learning such models using inference (or recognition) networks; we identify a crucial problem when modeling large, sparse, high-dimensional datasets – underfitting. We study the extent of underfitting, highlighting that its severity increases with the sparsity of the data. We propose methods to tackle it via iterative optimization inspired by stochastic variational inference (Hoffman et al., 2013) and improvements in the sparse data representation used for inference. The proposed techniques drastically improve the ability of these powerful models to fit sparse data, achieving state-of-the-art results on a benchmark text-count dataset and excellent results on the task of top-N recommendation. |
Tasks | |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06085v1 |
PDF | http://arxiv.org/pdf/1710.06085v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-challenges-of-learning-with-inference |
Repo | https://github.com/rahulk90/vae_sparse |
Framework | none |
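
The iterative optimization mentioned above can be sketched as follows: the inference network only initializes per-datapoint variational parameters, which are then refined with a few gradient steps on the ELBO before the model parameters are updated. The Gaussian likelihood, model sizes, and step counts below are simplifying assumptions, not the paper's NFA setup.

```python
# Minimal sketch: encoder output initializes local variational parameters, which
# are then refined by gradient steps on the (negative) ELBO for the batch.
import torch
import torch.nn as nn

x_dim, z_dim = 200, 10
encoder = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 2 * z_dim))
decoder = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))

def neg_elbo(x, mu, logvar):
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterized sample
    recon = ((decoder(z) - x) ** 2).sum(-1)                   # Gaussian likelihood term
    kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum(-1)  # KL(q || N(0, I))
    return (recon + kl).mean()

x = (torch.rand(32, x_dim) < 0.05).float()                    # sparse toy batch
mu, logvar = encoder(x).chunk(2, dim=-1)
mu, logvar = mu.detach().requires_grad_(), logvar.detach().requires_grad_()
local_opt = torch.optim.Adam([mu, logvar], lr=0.05)
for _ in range(20):                                           # refine local variational params
    local_opt.zero_grad()
    neg_elbo(x, mu, logvar).backward()
    local_opt.step()
# The refined (mu, logvar) would now drive the decoder/encoder updates.
print(float(neg_elbo(x, mu, logvar)))
```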
Deep Semantic Role Labeling with Self-Attention
Title | Deep Semantic Role Labeling with Self-Attention |
Authors | Zhixing Tan, Mingxuan Wang, Jun Xie, Yidong Chen, Xiaodong Shi |
Abstract | Semantic Role Labeling (SRL) is believed to be a crucial step towards natural language understanding and has been widely studied. In recent years, end-to-end SRL with recurrent neural networks (RNNs) has gained increasing attention. However, it remains a major challenge for RNNs to handle structural information and long range dependencies. In this paper, we present a simple and effective architecture for SRL which aims to address these problems. Our model is based on self-attention which can directly capture the relationships between two tokens regardless of their distance. Our single model achieves F$_1=83.4$ on the CoNLL-2005 shared task dataset and F$_1=82.7$ on the CoNLL-2012 shared task dataset, which outperforms the previous state-of-the-art results by $1.8$ and $1.0$ F$_1$ score respectively. Besides, our model is computationally efficient, and the parsing speed is 50K tokens per second on a single Titan X GPU. |
Tasks | Semantic Role Labeling |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01586v1 |
PDF | http://arxiv.org/pdf/1712.01586v1.pdf |
PWC | https://paperswithcode.com/paper/deep-semantic-role-labeling-with-self |
Repo | https://github.com/XMUNLP/Tagger |
Framework | tf |
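
At the core of the model described above is scaled dot-product self-attention, sketched minimally below; the dimensions are illustrative, and the paper's full tagger additionally uses multiple heads, position information, feed-forward sublayers, and a softmax over SRL labels.

```python
# Minimal scaled dot-product self-attention: every token attends to every other
# token in the sentence regardless of distance.
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model=128):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.scale = math.sqrt(d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.scale, dim=-1)
        return attn @ v                         # each position mixes the whole sentence

tokens = torch.randn(2, 30, 128)                # a batch of 30-token sentences
print(SelfAttention()(tokens).shape)            # torch.Size([2, 30, 128])
```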
Dilated Convolutions for Modeling Long-Distance Genomic Dependencies
Title | Dilated Convolutions for Modeling Long-Distance Genomic Dependencies |
Authors | Ankit Gupta, Alexander M. Rush |
Abstract | We consider the task of detecting regulatory elements in the human genome directly from raw DNA. Past work has focused on small snippets of DNA, making it difficult to model long-distance dependencies that arise from DNA’s 3-dimensional conformation. In order to study long-distance dependencies, we develop and release a novel dataset for a larger-context modeling task. Using this new dataset, we model long-distance interactions using dilated convolutional neural networks, and compare them to standard convolutions and recurrent neural networks. We show that dilated convolutions are effective at modeling the locations of regulatory markers in the human genome, such as transcription factor binding sites, histone modifications, and DNase hypersensitivity sites. |
Tasks | |
Published | 2017-10-03 |
URL | http://arxiv.org/abs/1710.01278v1 |
PDF | http://arxiv.org/pdf/1710.01278v1.pdf |
PWC | https://paperswithcode.com/paper/dilated-convolutions-for-modeling-long |
Repo | https://github.com/harvardnlp/regulatory-prediction |
Framework | tf |
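
The modeling idea above is a stack of 1-D convolutions whose dilation grows exponentially, so each genomic position can condition on bases thousands of steps away. The sketch below is an illustrative PyTorch version; the channel counts, depth, and five output marker tracks are assumptions rather than the paper's configuration.

```python
# One-hot DNA -> stacked dilated 1-D convolutions -> per-position marker predictions.
import torch
import torch.nn as nn

def dilated_dna_model(n_marks=5, channels=64, n_layers=6):
    layers = [nn.Conv1d(4, channels, kernel_size=3, padding=1), nn.ReLU()]
    for i in range(1, n_layers):
        d = 2 ** i                               # dilation 2, 4, 8, ... widens the receptive field
        layers += [nn.Conv1d(channels, channels, kernel_size=3, padding=d, dilation=d),
                   nn.ReLU()]
    layers += [nn.Conv1d(channels, n_marks, kernel_size=1)]   # per-position marker logits
    return nn.Sequential(*layers)

model = dilated_dna_model()
dna = torch.zeros(1, 4, 10000)                   # one-hot A/C/G/T over a 10 kb window
dna[0, torch.randint(0, 4, (10000,)), torch.arange(10000)] = 1.0
probs = torch.sigmoid(model(dna))                # (1, n_marks, 10000) per-position predictions
print(probs.shape)
```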