Paper Group ANR 1492
Multi-Scale Convolutions for Learning Context Aware Feature Representations. The Zero Resource Speech Challenge 2019: TTS without T. Distributionally Robust Language Modeling. Interpretable Automated Machine Learning in Maana(TM) Knowledge Platform. Cross-Domain Adaptation for Animal Pose Estimation. A Summarization System for Scientific Documents. …
Multi-Scale Convolutions for Learning Context Aware Feature Representations
Title | Multi-Scale Convolutions for Learning Context Aware Feature Representations |
Authors | Nikolai Ufer, Kam To Lui, Katja Schwarz, Paul Warkentin, Björn Ommer |
Abstract | Finding semantic correspondences is a challenging problem. With the breakthrough of CNNs, stronger features are available for tasks like classification, but not ones tailored to the requirements of semantic matching. In the following we present a weakly supervised metric learning approach which generates stronger features by encoding far more context than previous methods. First, we generate more suitable training data using a geometrically informed correspondence mining method which is less prone to spurious matches and requires only image category labels as supervision. Second, we introduce a new convolutional layer which is a learned mixture of differently strided convolutions and allows the network to implicitly encode more context while preserving matching accuracy. The strong geometric encoding on the feature side enables us to learn a semantic flow network, which generates more natural deformations than parametric transformation-based models and jointly predicts foreground regions. Our semantic flow network outperforms the current state of the art on several semantic matching benchmarks, and the learned features show astonishing performance under simple nearest-neighbor matching. |
Tasks | Metric Learning |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.06978v1 |
https://arxiv.org/pdf/1906.06978v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-scale-convolutions-for-learning-context |
Repo | |
Framework | |
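The abstract above describes a convolutional layer that is a learned mixture of differently strided convolutions. The abstract does not spell the layer out, so the following is only a minimal PyTorch-style sketch of the idea, assuming parallel 3x3 branches at several strides whose outputs are upsampled back to a common resolution and blended with learned softmax weights; the name `MixedStrideConv`, the choice of strides, and the blending scheme are illustrative assumptions, not the authors' exact layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedStrideConv(nn.Module):
    """Illustrative sketch: learned mixture of convolutions applied at different strides."""
    def __init__(self, in_ch, out_ch, strides=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=s, padding=1) for s in strides
        )
        self.mix = nn.Parameter(torch.zeros(len(strides)))  # mixture logits, learned end to end

    def forward(self, x):
        h, w = x.shape[-2:]
        # Bring every branch back to full resolution so the outputs can be blended.
        outs = [F.interpolate(b(x), size=(h, w), mode="bilinear", align_corners=False)
                for b in self.branches]
        weights = torch.softmax(self.mix, dim=0)
        return sum(w * o for w, o in zip(weights, outs))
```

In such a design the stride-1 branch preserves localization while the larger-stride branches see more context; the softmax weights let training decide the trade-off.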
The Zero Resource Speech Challenge 2019: TTS without T
Title | The Zero Resource Speech Challenge 2019: TTS without T |
Authors | Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux |
Abstract | We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text). We provide raw audio for a target voice in an unknown language (the Voice dataset), but no alignment, text or labels. Participants must discover subword units in an unsupervised way (using the Unit Discovery dataset) and align them to the voice recordings in a way that works best for the purpose of synthesizing novel utterances from novel speakers, similar to the target speaker’s voice. We describe the metrics used for evaluation, a baseline system consisting of unsupervised subword unit discovery plus a standard TTS system, and a topline TTS using gold phoneme transcriptions. We present an overview of the 19 submitted systems from 10 teams and discuss the main results. |
Tasks | |
Published | 2019-04-25 |
URL | https://arxiv.org/abs/1904.11469v2 |
https://arxiv.org/pdf/1904.11469v2.pdf | |
PWC | https://paperswithcode.com/paper/the-zero-resource-speech-challenge-2019-tts |
Repo | |
Framework | |
Distributionally Robust Language Modeling
Title | Distributionally Robust Language Modeling |
Authors | Yonatan Oren, Shiori Sagawa, Tatsunori B. Hashimoto, Percy Liang |
Abstract | Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews). In this paper, we first show that training on text outside the test distribution can degrade test performance when using standard maximum likelihood (MLE) training. To remedy this without knowledge of the test distribution, we propose an approach which trains a model that performs well over a wide range of potential test distributions. In particular, we derive a new distributionally robust optimization (DRO) procedure which minimizes the loss of the model over the worst-case mixture of topics with sufficient overlap with the training distribution. Our approach, called topic conditional value at risk (topic CVaR), obtains a 5.5 point perplexity reduction over MLE when the language models are trained on a mixture of Yelp reviews and news and tested only on reviews. |
Tasks | Language Modelling |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.02060v1 |
https://arxiv.org/pdf/1909.02060v1.pdf | |
PWC | https://paperswithcode.com/paper/distributionally-robust-language-modeling |
Repo | |
Framework | |
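The topic-CVaR objective described in the abstract above minimizes the loss under the worst-case mixture of topics that still overlaps sufficiently with the training mixture. Below is a minimal numpy sketch of the inner worst-case step only, assuming per-topic losses have already been computed; the greedy allocation caps each topic's weight at p_i / alpha in the standard CVaR fashion, and `alpha` is an assumed hyperparameter name, not notation taken from the paper.

```python
import numpy as np

def topic_cvar_loss(topic_losses, topic_probs, alpha=0.5):
    """Worst-case mixture of per-topic losses (a CVaR-style sketch).

    topic_losses: per-topic average loss under the current model
    topic_probs:  training mixture proportions p_i (summing to 1)
    alpha:        assumed overlap level; worst-case weights are capped at p_i / alpha
    """
    topic_losses = np.asarray(topic_losses, float)
    topic_probs = np.asarray(topic_probs, float)
    order = np.argsort(topic_losses)[::-1]          # worst topics first
    weights = np.zeros_like(topic_probs)
    budget = 1.0
    for i in order:
        w = min(topic_probs[i] / alpha, budget)      # spend the remaining probability mass
        weights[i] = w
        budget -= w
        if budget <= 1e-12:
            break
    return float(weights @ topic_losses)

# Example: three topics with equal training proportions.
print(topic_cvar_loss([2.0, 5.0, 3.0], [1/3, 1/3, 1/3], alpha=0.5))
```

The outer training loop would then take gradient steps on this worst-case weighted loss instead of the plain MLE average.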
Interpretable Automated Machine Learning in Maana(TM) Knowledge Platform
Title | Interpretable Automated Machine Learning in Maana(TM) Knowledge Platform |
Authors | Alexander Elkholy, Fangkai Yang, Steven Gustafson |
Abstract | Machine learning is becoming an essential part of developing solutions for many industrial applications, but the lack of interpretability hinders wide industry adoption in rapidly building, testing, deploying and validating machine learning models, in the sense that the insights gained while developing machine learning solutions are not structurally encoded, justified and transferred. In this paper we describe the Maana Meta-learning Service, an interpretable and interactive automated machine learning service residing in the Maana Knowledge Platform that performs machine-guided, user-assisted pipeline search and hyper-parameter tuning and generates structured knowledge about decisions for pipeline profiling and selection. The service is shipped with the Maana Knowledge Platform and is validated on benchmark datasets. Furthermore, its capability to derive knowledge from pipeline search facilitates various inference tasks and transfer to similar data science projects. |
Tasks | Meta-Learning |
Published | 2019-05-06 |
URL | https://arxiv.org/abs/1905.02168v1 |
https://arxiv.org/pdf/1905.02168v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-automated-machine-learning-in |
Repo | |
Framework | |
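As a rough illustration of machine-guided pipeline search that records its decisions as structured knowledge, here is a hedged scikit-learn sketch; the candidate pipelines, the iris dataset, and the `knowledge` record format are placeholders chosen for brevity and have no connection to Maana's actual implementation.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
candidates = {
    "scaler+logreg": Pipeline([("scale", StandardScaler()),
                               ("clf", LogisticRegression(max_iter=1000))]),
    "random_forest": Pipeline([("clf", RandomForestClassifier(n_estimators=100))]),
}

knowledge = []  # structured record of each decision, in the spirit of the service described above
for name, pipe in candidates.items():
    score = cross_val_score(pipe, X, y, cv=5).mean()
    knowledge.append({"pipeline": name, "cv_accuracy": round(float(score), 3)})

best = max(knowledge, key=lambda k: k["cv_accuracy"])
print(knowledge)
print("selected:", best["pipeline"])
```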
Cross-Domain Adaptation for Animal Pose Estimation
Title | Cross-Domain Adaptation for Animal Pose Estimation |
Authors | Jinkun Cao, Hongyang Tang, Hao-Shu Fang, Xiaoyong Shen, Cewu Lu, Yu-Wing Tai |
Abstract | In this paper, we are interested in pose estimation of animals. Animals usually exhibit a wide range of variation in pose, and there is no available animal pose dataset for training and testing. To address this problem, we build an animal pose dataset to facilitate training and evaluation. Since labeling such a dataset is labor-intensive and it is impossible to label data for all animal species of interest, we propose a novel cross-domain adaptation method to transfer animal pose knowledge from labeled animal classes to unlabeled animal classes. We use the modest animal pose dataset to adapt the learned knowledge to multiple animal species. Moreover, humans also share skeletal similarities with some animals (especially four-footed mammals). Therefore, the easily available human pose dataset, which is of a much larger scale than our labeled animal dataset, provides important prior knowledge to boost performance on animal pose estimation. Experiments show that our proposed method leverages these pieces of prior knowledge well and achieves convincing results on animal pose estimation. |
Tasks | Animal Pose Estimation, Domain Adaptation, Pose Estimation |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05806v2 |
https://arxiv.org/pdf/1908.05806v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-domain-adaptation-for-animal-pose |
Repo | |
Framework | |
A Summarization System for Scientific Documents
Title | A Summarization System for Scientific Documents |
Authors | Shai Erera, Michal Shmueli-Scheuer, Guy Feigenblat, Ora Peled Nakash, Odellia Boni, Haggai Roitman, Doron Cohen, Bar Weiner, Yosi Mass, Or Rivlin, Guy Lev, Achiya Jerbi, Jonathan Herzig, Yufang Hou, Charles Jochim, Martin Gleize, Francesca Bonin, David Konopnicki |
Abstract | We present a novel system providing summaries for Computer Science publications. Through a qualitative user study, we identified the most valuable scenarios for discovery, exploration and understanding of scientific documents. Based on these findings, we built a system that retrieves and summarizes scientific documents for a given information need, either in the form of a free-text query or by choosing categorized values such as scientific tasks, datasets and more. Our system ingested 270,000 papers, and its summarization module aims to generate concise yet detailed summaries. We validated our approach with human experts. |
Tasks | |
Published | 2019-08-29 |
URL | https://arxiv.org/abs/1908.11152v1 |
https://arxiv.org/pdf/1908.11152v1.pdf | |
PWC | https://paperswithcode.com/paper/a-summarization-system-for-scientific |
Repo | |
Framework | |
Replication of the Keyword Extraction part of the paper “‘Without the Clutter of Unimportant Words’: Descriptive Keyphrases for Text Visualization”
Title | Replication of the Keyword Extraction part of the paper “‘Without the Clutter of Unimportant Words’: Descriptive Keyphrases for Text Visualization” |
Authors | Shibamouli Lahiri |
Abstract | “Keyword Extraction” refers to the task of automatically identifying the most relevant and informative phrases in natural language text. As we are deluged with large amounts of text data in many different forms - emails, blogs, tweets, Facebook posts, academic papers, news articles - the task of “making sense” of all this text by somehow summarizing it into a coherent structure assumes paramount importance. Keyword extraction - a well-established problem in Natural Language Processing - can help us here. In this report, we construct and test three different hypotheses (all related to the task of keyword extraction) that take us one step closer to understanding how to meaningfully identify and extract “descriptive” keyphrases. The work reported here was done as part of replicating the study by Chuang et al. [3]. |
Tasks | Keyword Extraction |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.07818v1 |
https://arxiv.org/pdf/1908.07818v1.pdf | |
PWC | https://paperswithcode.com/paper/190807818 |
Repo | |
Framework | |
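Chuang et al. score descriptive keyphrases by combining frequency statistics with grammatical features; the report above replicates that study. Purely as a generic, hedged baseline (not the replicated method itself), a tf-idf keyphrase ranking can be sketched with scikit-learn:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy documents; real inputs would be the emails, blogs, papers, etc. mentioned above.
docs = [
    "Keyword extraction identifies the most informative phrases in text.",
    "Descriptive keyphrases help readers make sense of large text collections.",
]

vec = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vec.fit_transform(docs)
terms = vec.get_feature_names_out()

# Rank candidate unigrams/bigrams per document by tf-idf weight.
for row in X.toarray():
    top = sorted(zip(terms, row), key=lambda p: p[1], reverse=True)[:3]
    print([term for term, score in top if score > 0])
```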
Development of Mobile-Interfaced Machine Learning-Based Predictive Models for Improving Students Performance in Programming Courses
Title | Development of Mobile-Interfaced Machine Learning-Based Predictive Models for Improving Students Performance in Programming Courses |
Authors | Temitayo Matthew Fagbola, Ibrahim Adepoju Adeyanju, Olatayo Olaniyan, Adebimpe Esan, Bolaji Omodunbi, Ayodele Oloyede, Funmilola Egbetola |
Abstract | Student performance modelling (SPM) is a critical step in assessing and improving students' performance in their learning discourse. However, most existing SPM approaches are statistical: on the one hand they are probabilistic, so their results are estimates; on the other hand, the actual influence of hidden factors peculiar to students, lecturers, the learning environment and the family, together with their overall effect on student performance, has not been exhaustively investigated. In this paper, Student Performance Models (SPM) for improving students' performance in programming courses were developed using an M5P Decision Tree (MDT) and a Linear Regression Classifier (LRC). The data were gathered using a structured questionnaire from 295 students at the 200 and 300 levels of study who took Web Programming, C or Java at Federal University, Oye-Ekiti, Nigeria between 2012 and 2016. Hidden factors that are significant to students' performance in programming were identified. The data gathered were normalized, coded and prepared as variable and factor datasets, and fed into the MDT algorithm and LRC to develop the predictive models. The evaluation results indicate that the variable-based LRC produced the best model in terms of MAE, RMSE, RAE and RRSE, having yielded the lowest values in all the evaluations conducted. Further results established the strong significance of the attitude of students and lecturers, fearful perception of students, erratic power supply, university facilities, student health and student attendance to the performance of students in programming courses. The variable-based LRC model presented in this paper could provide baseline information about students' performance, thereby supporting better decision making towards improving teaching and learning outcomes in programming courses. |
Tasks | Decision Making |
Published | 2019-01-09 |
URL | http://arxiv.org/abs/1901.06252v1 |
http://arxiv.org/pdf/1901.06252v1.pdf | |
PWC | https://paperswithcode.com/paper/development-of-mobile-interfaced-machine |
Repo | |
Framework | |
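The evaluation above reports MAE, RMSE, RAE and RRSE. Under the standard (Weka-style) definitions of the relative errors, these can be computed with a small helper; the toy arrays in the usage line are placeholders, not the study's data.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """MAE, RMSE, RAE and RRSE (relative errors measured against the mean predictor)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    base = y_true - y_true.mean()                 # errors of the trivial mean predictor
    return {
        "MAE":  np.abs(err).mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "RAE":  np.abs(err).sum() / np.abs(base).sum(),
        "RRSE": np.sqrt((err ** 2).sum() / (base ** 2).sum()),
    }

print(regression_report([3, 2, 4, 5], [2.5, 2.0, 4.5, 4.0]))
```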
A Top-down Approach to Articulated Human Pose Estimation and Tracking
Title | A Top-down Approach to Articulated Human Pose Estimation and Tracking |
Authors | Guanghan Ning, Ping Liu, Xiaochuan Fan, Chi Zhang |
Abstract | Both the tasks of multi-person human pose estimation and pose tracking in videos are quite challenging. Existing methods can be categorized into two groups: top-down and bottom-up approaches. In this paper, following the top-down approach, we aim to build a strong baseline system with three modules: human candidate detector, single-person pose estimator and human pose tracker. Firstly, we choose a generic object detector among state-of-the-art methods to detect human candidates. Then, the cascaded pyramid network is used to estimate the corresponding human pose. Finally, we use a flow-based pose tracker to associate keypoints across frames, i.e., to assign each human candidate a unique and temporally consistent ID, for multi-target pose tracking. We conduct extensive ablative experiments to validate various choices of models and configurations. We take part in two ECCV'18 PoseTrack challenges: pose estimation and pose tracking. |
Tasks | Pose Estimation, Pose Tracking |
Published | 2019-01-23 |
URL | http://arxiv.org/abs/1901.07680v1 |
http://arxiv.org/pdf/1901.07680v1.pdf | |
PWC | https://paperswithcode.com/paper/a-top-down-approach-to-articulated-human-pose |
Repo | |
Framework | |
Pairwise Teacher-Student Network for Semi-Supervised Hashing
Title | Pairwise Teacher-Student Network for Semi-Supervised Hashing |
Authors | Shifeng Zhang, Jianmin Li, Bo Zhang |
Abstract | Hashing methods map similar high-dimensional data to binary hash codes with small Hamming distance, and they have received broad attention due to their low storage cost and fast retrieval speed. Pairwise similarity is easily obtained and widely used for retrieval, and most supervised hashing algorithms are carefully designed for pairwise supervision. As labeling all data pairs is difficult, semi-supervised hashing has been proposed, which aims at learning efficient codes with limited labeled pairs and abundant unlabeled ones. Existing methods build graphs to capture the structure of the dataset, but they do not work well for complex data because the graph is built from the data representations, and determining good representations for complex data is difficult. In this paper, we propose a novel teacher-student semi-supervised hashing framework in which the student is trained with the pairwise information produced by the teacher network. The network follows the smoothness assumption, which enforces consistent distances for similar data pairs so that the retrieval results are similar for neighborhood queries. Experiments on large-scale datasets show that the proposed method achieves impressive gains over supervised baselines and is superior to state-of-the-art semi-supervised hashing methods. |
Tasks | |
Published | 2019-02-02 |
URL | http://arxiv.org/abs/1902.00643v1 |
http://arxiv.org/pdf/1902.00643v1.pdf | |
PWC | https://paperswithcode.com/paper/pairwise-teacher-student-network-for-semi |
Repo | |
Framework | |
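The framework above trains the student with pairwise information produced by a teacher network. A minimal PyTorch sketch of one plausible pairwise distillation loss follows; the use of cosine similarities over relaxed (tanh) hash codes and the mean-squared matching objective are assumptions made for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pairwise_distill_loss(student_codes, teacher_feats):
    """Match the student's pairwise code similarities to the teacher's pairwise targets.

    student_codes: (N, B) relaxed hash codes, e.g. tanh outputs in [-1, 1]
    teacher_feats: (N, D) teacher embeddings defining the pairwise supervision
    """
    with torch.no_grad():
        t = F.normalize(teacher_feats, dim=1)
        target_sim = t @ t.t()                 # teacher pairwise cosine similarities
    s = F.normalize(student_codes, dim=1)
    student_sim = s @ s.t()
    return F.mse_loss(student_sim, target_sim)
```

At retrieval time the relaxed codes would be binarized, e.g. with `torch.sign`, to obtain the final hash codes.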
P-CapsNets: a General Form of Convolutional Neural Networks
Title | P-CapsNets: a General Form of Convolutional Neural Networks |
Authors | Zhenhua Chen, Xiwen Li, Chuhua Wang, David Crandall |
Abstract | We propose Pure CapsNets (P-CapsNets), which are a structural generalization of standard CNNs. Specifically, we make three modifications to current CapsNets. First, we remove routing procedures from CapsNets based on the observation that the coupling coefficients can be learned implicitly. Second, we replace the convolutional layers in CapsNets to improve efficiency. Third, we package the capsules into rank-3 tensors to further improve efficiency. Experiments show that P-CapsNets achieve better performance than CapsNets with various routing procedures while using significantly fewer parameters on MNIST and CIFAR-10. The high efficiency of P-CapsNets is even comparable to some deep compression models. For example, we achieve more than 99% accuracy on MNIST using only 3888 parameters. We visualize the capsules as well as the corresponding correlation matrix to show a possible way of initializing CapsNets in the future. We also explore the adversarial robustness of P-CapsNets compared to CNNs. |
Tasks | |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.08367v1 |
https://arxiv.org/pdf/1912.08367v1.pdf | |
PWC | https://paperswithcode.com/paper/p-capsnets-a-general-form-of-convolutional |
Repo | |
Framework | |
Joint Face Super-Resolution and Deblurring Using a Generative Adversarial Network
Title | Joint Face Super-Resolution and Deblurring Using a Generative Adversarial Network |
Authors | Jung Un Yun, In Kyu Park |
Abstract | Facial image super-resolution (SR) is an important preprocessing step for facial image analysis, face recognition, and image-based 3D face reconstruction. Recent convolutional neural network (CNN) based methods have shown excellent performance by learning the mapping relation from pairs of low-resolution (LR) and high-resolution (HR) facial images. However, since HR facial image reconstruction using CNNs is conventionally aimed at increasing the PSNR and SSIM metrics, the reconstructed HR image might not be realistic even with high scores. An adversarial framework is proposed in this study to reconstruct the HR facial image by simultaneously generating an HR image with and without blur. First, the spatial resolution of the LR facial image is increased by eight times using a five-layer CNN. Then, the encoder extracts the features of the up-scaled image. These features are finally sent to two branches (decoders) to generate an HR facial image with and without blur. In addition, local and global discriminators are combined to focus on the reconstruction of HR facial structures. Experiment results show that the proposed algorithm generates a realistic HR facial image. Furthermore, the proposed method can generate a variety of different facial images. |
Tasks | 3D Face Reconstruction, Deblurring, Face Recognition, Face Reconstruction, Image Reconstruction, Image Super-Resolution, Super-Resolution |
Published | 2019-12-22 |
URL | https://arxiv.org/abs/1912.10427v1 |
https://arxiv.org/pdf/1912.10427v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-face-super-resolution-and-deblurring |
Repo | |
Framework | |
On the space-time expressivity of ResNets
Title | On the space-time expressivity of ResNets |
Authors | Johannes Müller |
Abstract | Residual networks (ResNets) are a deep learning architecture that substantially improved state-of-the-art performance in certain supervised learning tasks. Since then, they have received continuously growing attention. ResNets have a recursive structure $x_{k+1} = x_k + R_k(x_k)$ where $R_k$ is a neural network called a residual block. This structure can be seen as the Euler discretisation of an associated ordinary differential equation (ODE) which is called a neural ODE. Recently, ResNets were proposed as the space-time approximation of ODEs which are not of this neural type. To elaborate on this connection we show that by increasing the number of residual blocks as well as their expressivity the solution of an arbitrary ODE can be approximated in space and time simultaneously by deep ReLU ResNets. Further, we derive estimates on the complexity of the residual blocks required to obtain a prescribed accuracy under certain regularity assumptions. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09599v4 |
https://arxiv.org/pdf/1910.09599v4.pdf | |
PWC | https://paperswithcode.com/paper/universal-flow-approximation-with-deep |
Repo | |
Framework | |
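The recursion $x_{k+1} = x_k + R_k(x_k)$ quoted above is exactly an explicit Euler step when $R_k(x) = h\, f(x, t_k)$ for step size $h$. A tiny numpy sketch makes the correspondence concrete; the vector field `f` here is an arbitrary example, and increasing `steps` plays the role of adding residual blocks.

```python
import numpy as np

def f(x, t):
    # Example vector field; the paper's statement concerns general (sufficiently regular) ODEs.
    return -x + np.sin(t)

def euler(x0, t0, t1, steps):
    """Explicit Euler solve of dx/dt = f(x, t)."""
    x, t = x0, t0
    h = (t1 - t0) / steps
    for _ in range(steps):
        x = x + h * f(x, t)   # identical in form to a ResNet update x_{k+1} = x_k + R_k(x_k)
        t += h
    return x

# More steps (more residual blocks) give a finer space-time approximation of the ODE solution.
print(euler(np.array([1.0]), 0.0, 1.0, steps=10))
print(euler(np.array([1.0]), 0.0, 1.0, steps=1000))
```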
Attractive versus truncated repulsive supercooled liquids: The dynamics is encoded in the pair correlation function
Title | Attractive versus truncated repulsive supercooled liquids: The dynamics is encoded in the pair correlation function |
Authors | François P. Landes, Giulio Biroli, Olivier Dauchot, Andrea J. Liu, David R. Reichman |
Abstract | We compare glassy dynamics in two liquids that differ in the form of their interaction potentials. Both systems have the same repulsive interactions, but one also has an attractive part in the potential. These two systems exhibit very different dynamics despite having nearly identical pair correlation functions. We demonstrate that a properly weighted integral of the pair correlation function, which amplifies the subtle differences between the two systems, correctly captures their dynamical differences. The weights are obtained from a standard machine learning algorithm. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.01103v2 |
https://arxiv.org/pdf/1906.01103v2.pdf | |
PWC | https://paperswithcode.com/paper/attractive-vs-truncated-repulsive-supercooled |
Repo | |
Framework | |
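The "properly weighted integral of the pair correlation function" described above can be pictured as learning per-bin weights for a discretized g(r) with a standard linear method and then reading off sum_i w_i g(r_i) * dr. The sketch below uses synthetic placeholder data and logistic regression purely for illustration; the actual weighting scheme and learning algorithm used in the paper may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Discretized pair correlation functions: one g(r) vector per configuration (toy data only).
r = np.linspace(0.5, 3.0, 50)                                  # radial bins
g_samples = 1.0 + 0.1 * rng.standard_normal((200, r.size))     # placeholder g(r) measurements
labels = rng.integers(0, 2, size=200)                          # e.g. attractive vs. truncated system

# Learn per-bin weights with a standard linear classifier.
clf = LogisticRegression(max_iter=1000).fit(g_samples, labels)
weights = clf.coef_.ravel()

# The weighted integral collapses each g(r) into a single dynamical descriptor.
dr = r[1] - r[0]
weighted_integral = g_samples @ weights * dr
print(weighted_integral[:5])
```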
PASS3D: Precise and Accelerated Semantic Segmentation for 3D Point Cloud
Title | PASS3D: Precise and Accelerated Semantic Segmentation for 3D Point Cloud |
Authors | Xin Kong, Guangyao Zhai, Baoquan Zhong, Yong Liu |
Abstract | In this paper, we propose PASS3D to achieve point-wise semantic segmentation of 3D point clouds. Our framework combines the efficiency of traditional geometric methods with the robustness of deep learning methods and consists of two stages. In stage 1, our accelerated cluster-proposal algorithm generates refined cluster proposals by segmenting the point cloud with the ground removed, producing less redundant proposals with higher recall in an extremely short time. In stage 2, we amplify and further process these proposals with a neural network to estimate a semantic label for each point, and we propose a novel data augmentation method to enhance the network's recognition capability for all categories, especially non-rigid objects. Evaluated on the KITTI raw dataset, PASS3D stands out against the state of the art on several results, making it well suited to 3D perception in autonomous driving systems. Our source code will be open-sourced. A video demonstration is available at https://www.youtube.com/watch?v=cukEqDuP_Qw. |
Tasks | Autonomous Driving, Data Augmentation, Semantic Segmentation |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01643v1 |
https://arxiv.org/pdf/1909.01643v1.pdf | |
PWC | https://paperswithcode.com/paper/pass3d-precise-and-accelerated-semantic |
Repo | |
Framework | |