Paper Group ANR 240
On-demand Relational Concept Analysis. Depth Reconstruction of Translucent Objects from a Single Time-of-Flight Camera using Deep Residual Networks. Clustering - What Both Theoreticians and Practitioners are Doing Wrong. Sample Compression for Real-Valued Learners. Evaluating Syntactic Properties of Seq2seq Output with a Broad Coverage HPSG: A Case …
On-demand Relational Concept Analysis
Title | On-demand Relational Concept Analysis |
Authors | Alexandre Bazin, Jessie Carbonnel, Marianne Huchard, Giacomo Kahn |
Abstract | Formal Concept Analysis and its associated conceptual structures have been used to support exploratory search through conceptual navigation. Relational Concept Analysis (RCA) is an extension of Formal Concept Analysis to process relational datasets. RCA and its multiple interconnected structures are good candidates to support exploratory search in relational datasets, as they enable navigation within a structure as well as between the connected structures. However, building the entire structures is not an efficient way to explore a small localised area of the dataset, for instance to retrieve the closest alternatives to a given query. In such cases, generating only a concept and its neighbour concepts at each navigation step is a less costly alternative. In this paper, we propose an algorithm to compute a concept and its neighbourhood in extended concept lattices. The concepts are generated directly from the relational context family, and possess both formal and relational attributes. The algorithm takes into account two RCA scaling operators. We illustrate it on an example. |
Tasks | |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.07847v1 |
PDF | http://arxiv.org/pdf/1803.07847v1.pdf |
PWC | https://paperswithcode.com/paper/on-demand-relational-concept-analysis |
Repo | |
Framework | |
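As background for the structures being navigated, the sketch below computes a single formal concept from a plain binary context in Python: closing an object set with the two derivation operators yields an (extent, intent) pair. This is toy data and ordinary FCA only; the paper's on-demand algorithm, relational attributes, and scaling operators are not reproduced.

```python
# Minimal formal-concept computation over a binary context.
context = {  # object -> set of attributes (toy data, not from the paper)
    "o1": {"a", "b"},
    "o2": {"b", "c"},
    "o3": {"a", "b", "c"},
}

all_attributes = set().union(*context.values())

def intent(objects):
    """Attributes shared by all given objects (derivation on objects)."""
    return set.intersection(*(context[o] for o in objects)) if objects else set(all_attributes)

def extent(attributes):
    """Objects possessing all given attributes (derivation on attributes)."""
    return {o for o, attrs in context.items() if attributes <= attrs}

# Closing an object set yields a formal concept (extent, intent); neighbour
# concepts would be generated by minimally growing or shrinking the intent.
concept_intent = intent({"o1"})
concept_extent = extent(concept_intent)
print(concept_extent, concept_intent)  # {'o1', 'o3'} {'a', 'b'}
```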
Depth Reconstruction of Translucent Objects from a Single Time-of-Flight Camera using Deep Residual Networks
Title | Depth Reconstruction of Translucent Objects from a Single Time-of-Flight Camera using Deep Residual Networks |
Authors | Seongjong Song, Hyunjung Shim |
Abstract | We propose a novel approach to recovering the depth of translucent objects from a single time-of-flight (ToF) depth camera using deep residual networks. When recording translucent objects with a ToF depth camera, their depth values are severely contaminated due to complex light interactions with the surrounding environment. While existing methods have suggested new capture systems or developed depth distortion models, their solutions are less practical because of strict assumptions or heavy computational complexity. In this paper, we adopt deep residual networks to model the ToF depth distortion caused by translucency. To fully utilize both the local and semantic information of objects, multi-scale patches are used to predict the depth value. Based on a quantitative and qualitative evaluation on our benchmark database, we show the effectiveness and robustness of the proposed algorithm. |
Tasks | |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.10917v1 |
PDF | http://arxiv.org/pdf/1809.10917v1.pdf |
PWC | https://paperswithcode.com/paper/depth-reconstruction-of-translucent-objects |
Repo | |
Framework | |
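As a rough illustration of the network's building block, here is a generic residual block in PyTorch; the paper's actual depth-regression architecture, patch sizes, and training setup are not detailed in the abstract, so every dimension below is a placeholder.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Generic residual block: output = x + F(x)."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # identity shortcut eases optimization

# Multi-scale idea from the abstract: crop patches of several sizes around a
# pixel, resize them to a common resolution, and let the network combine
# local detail with wider context when regressing the corrected depth.
out = ResidualBlock()(torch.randn(1, 64, 32, 32))  # shape preserved
```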
Clustering - What Both Theoreticians and Practitioners are Doing Wrong
Title | Clustering - What Both Theoreticians and Practitioners are Doing Wrong |
Authors | Shai Ben-David |
Abstract | Unsupervised learning is widely recognized as one of the most important challenges facing machine learning nowadays. However, in spite of hundreds of papers on the topic being published every year, current theoretical understanding and practical implementations of such tasks, in particular of clustering, are very rudimentary. This note focuses on clustering. I claim that the most significant challenge for clustering is model selection. In contrast with other common computational tasks, for clustering, different algorithms often yield drastically different outcomes. Therefore, the choice of a clustering algorithm and its parameters (like the number of clusters) may play a crucial role in the usefulness of the output clustering solution. However, there currently exists no methodical guidance for clustering tool selection for a given clustering task. Practitioners pick the algorithms they use without awareness of the implications of their choices, and the vast majority of theory-of-clustering papers focus on providing savings in the resources needed to solve the optimization problems that arise from picking some concrete clustering objective. Those savings pale in comparison to the costs of a mismatch between those objectives and the intended use of the clustering results. I argue the severity of this problem and describe some recent proposals aiming to address this crucial lacuna. |
Tasks | Model Selection |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08838v1 |
PDF | http://arxiv.org/pdf/1805.08838v1.pdf |
PWC | https://paperswithcode.com/paper/clustering-what-both-theoreticians-and |
Repo | |
Framework | |
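The claim that different algorithms yield drastically different outcomes on the same data is easy to reproduce. A minimal scikit-learn sketch on toy two-moons data (parameter choices are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db = DBSCAN(eps=0.3).fit_predict(X)

# k-means cuts each moon in half (it prefers convex clusters), while DBSCAN
# recovers the two moons: same data, drastically different partitions, and
# nothing in either tool tells the practitioner which one to trust.
print("k-means labels:", np.unique(km), " DBSCAN labels:", np.unique(db))
```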
Sample Compression for Real-Valued Learners
Title | Sample Compression for Real-Valued Learners |
Authors | Steve Hanneke, Aryeh Kontorovich, Menachem Sadigurschi |
Abstract | We give an algorithmically efficient version of the learner-to-compression scheme conversion in Moran and Yehudayoff (2016). In extending this technique to real-valued hypotheses, we also obtain an efficient regression-to-bounded sample compression converter. To our knowledge, this is the first general compressed regression result (regardless of efficiency or boundedness) guaranteeing uniform approximate reconstruction. Along the way, we develop a generic procedure for constructing weak real-valued learners out of abstract regressors; this may be of independent interest. In particular, this result sheds new light on an open question of H. Simon (1997). We show applications to two regression problems: learning Lipschitz and bounded-variation functions. |
Tasks | |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08254v1 |
PDF | http://arxiv.org/pdf/1805.08254v1.pdf |
PWC | https://paperswithcode.com/paper/sample-compression-for-real-valued-learners |
Repo | |
Framework | |
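For intuition about what a sample compression scheme is, here is a toy sketch for one-dimensional threshold classifiers, assuming a realizable (consistently labelled) sample; it illustrates only the compress/reconstruct contract, not the paper's construction.

```python
# Threshold classifiers h_t(x) = 1[x >= t]: a consistent sample compresses
# to at most two boundary points, from which a consistent hypothesis is
# rebuilt. Real-valued compression (the paper's subject) is subtler.

def compress(sample):
    """Keep the largest negative and smallest positive example."""
    neg = [x for x, y in sample if y == 0]
    pos = [x for x, y in sample if y == 1]
    return ([(max(neg), 0)] if neg else []) + ([(min(pos), 1)] if pos else [])

def reconstruct(kept):
    """Return a threshold classifier consistent with the kept points."""
    pos = [x for x, y in kept if y == 1]
    t = min(pos) if pos else float("inf")
    return lambda x: int(x >= t)

sample = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
h = reconstruct(compress(sample))
assert all(h(x) == y for x, y in sample)  # reconstruction is consistent
```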
Evaluating Syntactic Properties of Seq2seq Output with a Broad Coverage HPSG: A Case Study on Machine Translation
Title | Evaluating Syntactic Properties of Seq2seq Output with a Broad Coverage HPSG: A Case Study on Machine Translation |
Authors | Johnny Tian-Zheng Wei, Khiem Pham, Brian Dillon, Brendan O’Connor |
Abstract | Sequence to sequence (seq2seq) models are often employed in settings where the target output is natural language. However, the syntactic properties of the language generated by these models are not well understood. We explore whether such output belongs to a formal and realistic grammar, by employing the English Resource Grammar (ERG), a broad-coverage, linguistically precise HPSG-based grammar of English. From a French-to-English parallel corpus, we analyze the parseability of, and the grammatical constructions occurring in, the output of a seq2seq translation model. Over 93% of the model translations are parseable, suggesting that the model learns to generate output conforming to a grammar. The model has trouble learning the distribution of rarer syntactic rules, and we pinpoint several constructions that differentiate the reference translations from our model's output. |
Tasks | Machine Translation |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02035v1 |
PDF | http://arxiv.org/pdf/1809.02035v1.pdf |
PWC | https://paperswithcode.com/paper/evaluating-syntactic-properties-of-seq2seq |
Repo | |
Framework | |
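The headline parseability number reduces to a simple loop. In the sketch below, `parse_with_erg` is a hypothetical stand-in for an ERG-based parser call (it is not an API from the paper):

```python
def parseability(sentences, parse_with_erg):
    """Fraction of sentences that receive at least one parse."""
    parsed = sum(1 for s in sentences if parse_with_erg(s) is not None)
    return parsed / len(sentences)

# Hypothetical usage: rate = parseability(model_translations, parse_with_erg)
# The paper reports such a rate above 93% for its seq2seq translations.
```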
ATHENA: Automated Tuning of Genomic Error Correction Algorithms using Language Models
Title | ATHENA: Automated Tuning of Genomic Error Correction Algorithms using Language Models |
Authors | Mustafa Abdallah, Ashraf Mahgoub, Saurabh Bagchi, Somali Chaterji |
Abstract | The performance of most error-correction algorithms that operate on genomic sequencer reads is dependent on the proper choice of their configuration parameters, such as the value of k in k-mer based techniques. In this work, we target the problem of finding the best values of these configuration parameters to optimize error correction. We perform this in a data-driven manner, due to the observation that different configuration parameters are optimal for different datasets, i.e., from different instruments and organisms. We use language modeling techniques from the Natural Language Processing (NLP) domain in our algorithmic suite, Athena, to automatically tune the performance-sensitive configuration parameters. Through the use of N-Gram and Recurrent Neural Network (RNN) language modeling, we validate the intuition that error-correction (EC) performance can be computed quantitatively and efficiently using the perplexity metric, prevalent in NLP. After training the language model, we show that the perplexity metric calculated for runtime data has a strong negative correlation with the quality of correction of the erroneous next-generation sequencing (NGS) reads. Therefore, we use the perplexity metric to guide a hill climbing-based search, converging toward the best $k$-value. Our approach is suitable for both de novo and comparative sequencing (resequencing), eliminating the need for a reference genome to serve as the ground truth. This is important because the use of a reference genome often carries forward the biases along the stages of the pipeline. |
Tasks | Language Modelling |
Published | 2018-12-30 |
URL | http://arxiv.org/abs/1812.11467v1 |
PDF | http://arxiv.org/pdf/1812.11467v1.pdf |
PWC | https://paperswithcode.com/paper/athena-automated-tuning-of-genomic-error |
Repo | |
Framework | |
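A minimal sketch of the perplexity-guided hill climbing the abstract describes. `perplexity_of` is a hypothetical callable (run the error corrector with a given k, then score the corrected reads under the trained language model); the k range and step are illustrative.

```python
def hill_climb_k(perplexity_of, k_min=15, k_max=31, start=21, step=2):
    """Greedily move to the neighbouring k with lower perplexity."""
    k = start
    best = perplexity_of(k)
    while True:
        neighbors = [n for n in (k - step, k + step) if k_min <= n <= k_max]
        scored = sorted((perplexity_of(n), n) for n in neighbors)
        if not scored or scored[0][0] >= best:
            return k                  # local optimum in perplexity
        best, k = scored[0]           # lower perplexity ~ better correction
```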
Data-driven polynomial chaos expansion for machine learning regression
Title | Data-driven polynomial chaos expansion for machine learning regression |
Authors | E. Torre, S. Marelli, P. Embrechts, B. Sudret |
Abstract | We present a regression technique for data-driven problems based on polynomial chaos expansion (PCE). PCE is a popular technique in the field of uncertainty quantification (UQ), where it is typically used to replace a runnable but expensive computational model subject to random inputs with an inexpensive-to-evaluate polynomial function. The metamodel obtained enables a reliable estimation of the statistics of the output, provided that a suitable probabilistic model of the input is available. Machine learning (ML) regression is a research field that focuses on providing purely data-driven input-output maps, with the focus on pointwise prediction accuracy. We show that a PCE metamodel purely trained on data can yield pointwise predictions whose accuracy is comparable to that of other ML regression models, such as neural networks and support vector machines. The comparisons are performed on benchmark datasets available from the literature. The methodology also enables the quantification of the output uncertainties, and is robust to noise. Furthermore, it enjoys additional desirable properties, such as good performance for small training sets and simplicity of construction, with only little parameter tuning required. |
Tasks | |
Published | 2018-08-09 |
URL | http://arxiv.org/abs/1808.03216v2 |
PDF | http://arxiv.org/pdf/1808.03216v2.pdf |
PWC | https://paperswithcode.com/paper/data-driven-polynomial-chaos-expansion-for |
Repo | |
Framework | |
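A one-dimensional sketch of data-driven PCE as least-squares regression on an orthogonal polynomial basis, using NumPy's Legendre tools; the degree, input scaling, and toy data are illustrative choices, not the paper's setup.

```python
import numpy as np
from numpy.polynomial.legendre import legvander

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)                 # inputs scaled to [-1, 1]
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(200)

degree = 6
Psi = legvander(x, degree)                  # basis evaluations, one column per P_k
coeffs, *_ = np.linalg.lstsq(Psi, y, rcond=None)

y_hat = Psi @ coeffs                        # the polynomial metamodel
print("train RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
# With a Legendre basis and inputs uniform on [-1, 1], coeffs[0] directly
# estimates the output mean, so output statistics come almost for free.
```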
Process Discovery using Classification Tree Hidden Semi-Markov Model
Title | Process Discovery using Classification Tree Hidden Semi-Markov Model |
Authors | Yihuang Kang, Vladimir Zadorozhny |
Abstract | Various and ubiquitous information systems are being used to monitor, exchange, and collect information. These systems generate massive amounts of event sequence logs that may help us understand underlying phenomena. By analyzing these logs, we can learn process models that describe system procedures, predict the development of the system, or check whether changes are expected. In this paper, we consider a novel technique that models these sequences of events in a temporal-probabilistic manner. Specifically, we propose a probabilistic process model that combines a hidden semi-Markov model with classification tree learning. Our experimental results show that the proposed approach can answer questions of the kind: “What are the most frequent sequences of system dynamics relevant to a given sequence of observable events?” For example, “Given a series of medical treatments, what are the most relevant changes in patients’ health condition patterns at different times?” |
Tasks | |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04415v1 |
PDF | http://arxiv.org/pdf/1807.04415v1.pdf |
PWC | https://paperswithcode.com/paper/process-discovery-using-classification-tree |
Repo | |
Framework | |
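The question posed at the end of the abstract is, in its simplest form, a decoding problem. Below is a plain-HMM Viterbi sketch with made-up numbers; the paper's hidden semi-Markov model with classification trees is considerably richer, so this only illustrates the kind of query being answered.

```python
import numpy as np

pi = np.array([0.6, 0.4])                          # initial state probs
A = np.array([[0.7, 0.3], [0.4, 0.6]])             # state transitions
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])   # emission probs
obs = [0, 1, 2, 2]                                 # observed event indices

T, S = len(obs), len(pi)
delta = np.zeros((T, S))                           # best path scores
back = np.zeros((T, S), dtype=int)                 # best predecessors
delta[0] = pi * B[:, obs[0]]
for t in range(1, T):
    scores = delta[t - 1][:, None] * A * B[:, obs[t]]
    back[t] = scores.argmax(axis=0)
    delta[t] = scores.max(axis=0)

path = [int(delta[-1].argmax())]                   # backtrack from the end
for t in range(T - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
print(list(reversed(path)))                        # most likely state sequence
```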
SlimNets: An Exploration of Deep Model Compression and Acceleration
Title | SlimNets: An Exploration of Deep Model Compression and Acceleration |
Authors | Ini Oguntola, Subby Olubeko, Christopher Sweeney |
Abstract | Deep neural networks have achieved increasingly accurate results on a wide variety of complex tasks. However, much of this improvement is due to the growing use and availability of computational resources (e.g., use of GPUs, more layers, more parameters, etc.). Most state-of-the-art deep networks, despite performing well, over-parameterize the functions they approximate and take a significant amount of time to train. With increased focus on deploying deep neural networks on resource-constrained devices like smartphones, there has been a push to evaluate why these models are so resource-hungry and how they can be made more efficient. This work evaluates and compares three distinct methods for deep model compression and acceleration: weight pruning, low-rank factorization, and knowledge distillation. Comparisons on VGG nets trained on CIFAR-10 show that each method on its own is effective, but that the true power lies in combining them. We show that by combining pruning and knowledge distillation we can create a compressed network 85 times smaller than the original, all while retaining 96% of the original model’s accuracy. |
Tasks | Model Compression |
Published | 2018-08-01 |
URL | http://arxiv.org/abs/1808.00496v1 |
PDF | http://arxiv.org/pdf/1808.00496v1.pdf |
PWC | https://paperswithcode.com/paper/slimnets-an-exploration-of-deep-model |
Repo | |
Framework | |
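One of the three compared methods, magnitude-based weight pruning, fits in a few lines of PyTorch; the toy model and the 90% sparsity level are illustrative, and the distillation and factorization components are not shown.

```python
import torch
import torch.nn as nn

def magnitude_prune(model, sparsity=0.9):
    """Zero out the globally smallest-magnitude weights (biases kept)."""
    weights = torch.cat([p.abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(weights, sparsity)
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p.mul_((p.abs() > threshold).float())

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
magnitude_prune(model, sparsity=0.9)   # 90% of weights are now exactly zero
# In the paper's pipeline, a pruned network would then be fine-tuned and
# combined with knowledge distillation to recover accuracy.
```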
Best of many worlds: Robust model selection for online supervised learning
Title | Best of many worlds: Robust model selection for online supervised learning |
Authors | Vidya Muthukumar, Mitas Ray, Anant Sahai, Peter L. Bartlett |
Abstract | We introduce algorithms for online, full-information prediction that are competitive with contextual tree experts of unknown complexity, in both probabilistic and adversarial settings. We show that by incorporating a probabilistic framework of structural risk minimization into existing adaptive algorithms, we can robustly learn not only the presence of stochastic structure when it exists (leading to constant as opposed to $\mathcal{O}(\sqrt{T})$ regret), but also the correct model order. We thus obtain regret bounds that are competitive with the regret of an optimal algorithm that possesses strong side information about both the complexity of the optimal contextual tree expert and whether the process generating the data is stochastic or adversarial. These are the first constructive guarantees on simultaneous adaptivity to the model and the presence of stochasticity. |
Tasks | Model Selection |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08562v1 |
PDF | http://arxiv.org/pdf/1805.08562v1.pdf |
PWC | https://paperswithcode.com/paper/best-of-many-worlds-robust-model-selection |
Repo | |
Framework | |
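For readers unfamiliar with the setting, the basic full-information forecaster that such adaptive algorithms build on is the exponentially weighted average (Hedge). A minimal NumPy sketch over a finite expert class; the contextual-tree experts and the paper's adaptivity mechanisms are not reproduced.

```python
import numpy as np

def hedge(expert_losses, eta=0.5):
    """expert_losses: (T, K) array of per-round losses in [0, 1]."""
    T, K = expert_losses.shape
    log_w = np.zeros(K)
    total = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                       # distribution over experts
        total += p @ expert_losses[t]      # forecaster's expected loss
        log_w -= eta * expert_losses[t]    # multiplicative-weights update
    regret = total - expert_losses.sum(axis=0).min()
    return total, regret                   # vs. the best expert in hindsight

losses = np.random.default_rng(0).uniform(size=(1000, 8))
print(hedge(losses))
```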
Understanding and Predicting the Memorability of Outdoor Natural Scenes
Title | Understanding and Predicting the Memorability of Outdoor Natural Scenes |
Authors | Jiaxin Lu, Mai Xu, Ren Yang, Zulin Wang |
Abstract | Memorability measures how easily an image is memorized after a glance, which may inform the design of magazine covers, tourism publicity materials, and so forth. Recent works have shed light on the visual features that make generic images, object images, or face photographs memorable. However, these methods are not able to effectively predict the memorability of outdoor natural scene images. To overcome this shortcoming of previous works, in this paper we attempt to answer: “what exactly makes outdoor natural scenes memorable?”. To this end, we first establish a large-scale outdoor natural scene image memorability (LNSIM) database, containing 2,632 outdoor natural scene images with their ground-truth memorability scores and multi-label scene category annotations. Then, similar to previous works, we mine our database to investigate how low-, middle- and high-level handcrafted features affect the memorability of outdoor natural scenes. In particular, we find that the high-level feature of scene category is strongly correlated with outdoor natural scene memorability, and the deep features learnt by a deep neural network (DNN) are also effective in predicting the memorability scores. Moreover, combining the deep features with the category feature can further boost the performance of memorability prediction. Therefore, we propose an end-to-end DNN-based outdoor natural scene memorability (DeepNSM) predictor, which takes advantage of the learned category-related features. The experimental results validate the effectiveness of our DeepNSM model, which exceeds the state-of-the-art methods. Finally, we analyze why our DeepNSM model performs well, and study the cases in which it succeeds or fails to accurately predict the memorability of outdoor natural scenes. |
Tasks | |
Published | 2018-10-09 |
URL | https://arxiv.org/abs/1810.06679v5 |
PDF | https://arxiv.org/pdf/1810.06679v5.pdf |
PWC | https://paperswithcode.com/paper/understanding-and-predicting-the-memorability |
Repo | |
Framework | |
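The fusion step the abstract describes, combining deep features with the category feature, amounts to concatenation before the regressor. A PyTorch sketch with placeholder dimensions (not DeepNSM's actual ones):

```python
import torch
import torch.nn as nn

class MemorabilityHead(nn.Module):
    """Fuse image features with a scene-category vector, regress a score."""
    def __init__(self, feat_dim=2048, n_categories=20):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feat_dim + n_categories, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 1),             # scalar memorability score
        )

    def forward(self, deep_feats, category_probs):
        return self.fc(torch.cat([deep_feats, category_probs], dim=1))

head = MemorabilityHead()
score = head(torch.randn(4, 2048), torch.rand(4, 20))  # shape: (4, 1)
```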
Crowdsourcing for Reminiscence Chatbot Design
Title | Crowdsourcing for Reminiscence Chatbot Design |
Authors | Svetlana Nikitina, Florian Daniel, Marcos Baez, Fabio Casati |
Abstract | In this work-in-progress paper we discuss the challenges in identifying effective and scalable crowd-based strategies for designing content, conversation logic, and meaningful metrics for a reminiscence chatbot targeted at older adults. We formalize the problem and outline the main research questions that drive the research agenda in chatbot design for reminiscence and for relational agents for older adults in general. |
Tasks | Chatbot |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12346v1 |
PDF | http://arxiv.org/pdf/1805.12346v1.pdf |
PWC | https://paperswithcode.com/paper/crowdsourcing-for-reminiscence-chatbot-design |
Repo | |
Framework | |
OpenEDGAR: Open Source Software for SEC EDGAR Analysis
Title | OpenEDGAR: Open Source Software for SEC EDGAR Analysis |
Authors | Michael J Bommarito II, Daniel Martin Katz, Eric M Detterman |
Abstract | OpenEDGAR is an open source Python framework designed to rapidly construct research databases based on the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system operated by the US Securities and Exchange Commission (SEC). OpenEDGAR is built on the Django application framework, supports distributed compute across one or more servers, and includes functionality to (i) retrieve and parse index and filing data from EDGAR, (ii) build tables for key metadata like form type and filer, (iii) retrieve, parse, and update CIK to ticker and industry mappings, (iv) extract content and metadata from filing documents, and (v) search filing document contents. OpenEDGAR is designed for use in both academic research and industrial applications, and is distributed under MIT License at https://github.com/LexPredict/openedgar. |
Tasks | |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.04973v1 |
PDF | http://arxiv.org/pdf/1806.04973v1.pdf |
PWC | https://paperswithcode.com/paper/openedgar-open-source-software-for-sec-edgar |
Repo | |
Framework | |
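To illustrate the kind of data OpenEDGAR retrieves and parses, here is a plain-`requests` sketch that fetches one quarterly form index. It does not use the OpenEDGAR API; the index URL pattern and the fixed-width layout are assumed from EDGAR's public conventions, and the SEC asks that the User-Agent identify the requester.

```python
import requests

url = "https://www.sec.gov/Archives/edgar/full-index/2018/QTR1/form.idx"
text = requests.get(url, headers={"User-Agent": "research name@example.com"}).text

# The index is fixed-width (Form Type, Company Name, CIK, Date Filed,
# File Name) with a dashed separator line after the header block.
lines = text.splitlines()
start = next(i for i, l in enumerate(lines) if l.startswith("---")) + 1
for line in lines[start:start + 5]:       # first five filings, as a sample
    print(line.rstrip())
```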
SGAD: Soft-Guided Adaptively-Dropped Neural Network
Title | SGAD: Soft-Guided Adaptively-Dropped Neural Network |
Authors | Zhisheng Wang, Fangxuan Sun, Jun Lin, Zhongfeng Wang, Bo Yuan |
Abstract | Deep neural networks (DNNs) have been proven to contain many redundancies, and many efforts have therefore been made to compress them. However, existing model compression methods treat all input samples equally, ignoring the fact that input samples differ in how difficult they are to classify correctly. To address this problem, DNNs with an adaptive dropping mechanism are explored in this work. To inform the DNN how difficult an input sample is to classify, a guideline that contains information about the input sample is introduced to improve performance. Based on the developed guideline and the adaptive dropping mechanism, an innovative soft-guided adaptively-dropped (SGAD) neural network is proposed in this paper. Compared with a 32-layer residual neural network, the presented SGAD can reduce FLOPs by 77% with less than a 1% drop in accuracy on CIFAR-10. |
Tasks | Model Compression |
Published | 2018-07-04 |
URL | http://arxiv.org/abs/1807.01430v1 |
PDF | http://arxiv.org/pdf/1807.01430v1.pdf |
PWC | https://paperswithcode.com/paper/sgad-soft-guided-adaptively-dropped-neural |
Repo | |
Framework | |
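A common stand-in for an adaptive dropping mechanism is confidence-gated early exit: skip the remaining blocks when an early prediction is already confident. The sketch below gates a whole batch on a fixed threshold, which is cruder than SGAD's per-sample soft guideline; it only illustrates the idea of saving computation on easy inputs.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, dim=64, n_classes=10, threshold=0.9):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.exit1 = nn.Linear(dim, n_classes)     # early classifier
        self.block2 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.exit2 = nn.Linear(dim, n_classes)     # final classifier
        self.threshold = threshold

    def forward(self, x):
        h = self.block1(x)
        early = self.exit1(h).softmax(dim=1)
        if early.max(dim=1).values.min() >= self.threshold:
            return early                           # drop the rest of the net
        return self.exit2(self.block2(h)).softmax(dim=1)

probs = EarlyExitNet()(torch.randn(4, 64))
```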
Hunting the Ethereum Smart Contract: Color-inspired Inspection of Potential Attacks
Title | Hunting the Ethereum Smart Contract: Color-inspired Inspection of Potential Attacks |
Authors | TonTon Hsien-De Huang |
Abstract | Blockchain and cryptocurrencies are gaining unprecedented popularity and understanding. Meanwhile, Ethereum is gaining significant popularity in the blockchain community, mainly because it is designed in a way that enables developers to write smart contracts and decentralized applications (Dapps). This new paradigm of applications opens the door to many possibilities and opportunities. However, the security of Ethereum smart contracts has not received much attention; several malfunctioning Ethereum smart contracts have recently been reported. Unlike many previous works that have applied static and dynamic analyses to find bugs in smart contracts, we do not attempt to define and extract any features; instead, we focus on reducing the expert’s labor costs. We first present a new methodology for in-depth analysis of potential attacks, and then translate the compiled Solidity bytecode into RGB color codes. After that, we transform them into a fixed-size encoded image. Finally, the encoded image is fed to a convolutional neural network (CNN) for automatic feature extraction and learning, to detect compiler bugs in Ethereum smart contracts. |
Tasks | |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.01868v1 |
PDF | http://arxiv.org/pdf/1807.01868v1.pdf |
PWC | https://paperswithcode.com/paper/hunting-the-ethereum-smart-contract-color |
Repo | |
Framework | |
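The bytecode-to-image step is straightforward to sketch with NumPy; the 32x32 target size and the zero padding are illustrative choices, not necessarily the paper's parameters.

```python
import numpy as np

def bytecode_to_image(hex_code, side=32):
    """Map EVM bytecode bytes onto a fixed-size RGB image for a CNN."""
    if hex_code.startswith("0x"):
        hex_code = hex_code[2:]
    pixels = np.frombuffer(bytes.fromhex(hex_code), dtype=np.uint8)
    n = side * side * 3                   # 3 channels: R, G, B
    pixels = pixels[:n]                   # truncate long contracts ...
    pixels = np.pad(pixels, (0, n - len(pixels)))  # ... zero-pad short ones
    return pixels.reshape(side, side, 3)  # feed this array to the CNN

img = bytecode_to_image("0x6080604052348015600f57600080fd5b50")
print(img.shape)  # (32, 32, 3)
```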