Paper Group ANR 201
Addressing Algorithmic Bottlenecks in Elastic Machine Learning with Chicle
Title | Addressing Algorithmic Bottlenecks in Elastic Machine Learning with Chicle |
Authors | Michael Kaufmann, Kornilios Kourtis, Celestine Mendler-Dünner, Adrian Schüpbach, Thomas Parnell |
Abstract | Distributed machine learning training is one of the most common and important workloads running in data centers today, but it is rarely executed alone. Instead, to reduce costs, computing resources are consolidated and shared by different applications. In this scenario, elasticity and proper load balancing are vital to maximize efficiency, fairness, and utilization. Currently, most distributed training frameworks do not support these properties. The few exceptions that do support elasticity imitate generic distributed frameworks and use micro-tasks. In this paper, we show that micro-tasks are problematic for machine learning applications because they require a high degree of parallelism, which hinders the convergence of distributed training at a purely algorithmic level (i.e., ignoring overheads and scalability limitations). To address this, we propose Chicle, a new elastic distributed training framework that exploits the nature of machine learning algorithms to implement elasticity and load balancing without micro-tasks. We use Chicle to train deep neural networks as well as generalized linear models, and show that Chicle achieves performance competitive with state-of-the-art rigid frameworks while efficiently enabling elastic execution and dynamic load balancing. |
Tasks | |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04885v1 |
https://arxiv.org/pdf/1909.04885v1.pdf | |
PWC | https://paperswithcode.com/paper/addressing-algorithmic-bottlenecks-in-elastic |
Repo | |
Framework | |
Online Linear Programming: Dual Convergence, New Algorithms, and Regret Bounds
Title | Online Linear Programming: Dual Convergence, New Algorithms, and Regret Bounds |
Authors | Xiaocheng Li, Yinyu Ye |
Abstract | We study an online linear programming (OLP) problem under a random input model in which the columns of the constraint matrix, along with the corresponding coefficients in the objective function, are generated i.i.d. from an unknown distribution and revealed sequentially over time. Virtually all current online algorithms are based on learning the dual optimal solutions/prices of the linear programs (LPs), and their analyses focus on the aggregate objective value and on solving the packing LP, where all coefficients in the constraint matrix and objective are nonnegative. However, two major open questions are: (i) Does the set of LP optimal dual prices in OLP converge to those of the “offline” LP? (ii) Can the results be extended to general LP problems where the coefficients can be either positive or negative? We resolve these two questions by establishing convergence results for the dual prices under moderate regularity conditions for general LP problems. We then propose a new type of OLP algorithm, the Action-History-Dependent Learning Algorithm, which improves on the performance of previous algorithms by taking into account both the past input data and the decisions/actions already made. We derive an $O(\log n \log \log n)$ regret bound for the proposed algorithm, against the $O(\sqrt{n})$ bound for typical dual-price learning algorithms, and show that no dual-based thresholding algorithm achieves a worst-case regret smaller than $O(\log n)$, where $n$ is the number of decision variables. Numerical experiments demonstrate the superior performance of the proposed algorithms and the effectiveness of our action-history-dependent design. Our results also indicate that, for solving online optimization problems with constraints, it is better to use a non-stationary policy than a stationary one. |
Tasks | |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05499v2 |
https://arxiv.org/pdf/1909.05499v2.pdf | |
PWC | https://paperswithcode.com/paper/online-linear-programming-dual-convergence |
Repo | |
Framework | |
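The abstract turns on two moving parts: learning dual prices from the history of observed columns, and a thresholding rule that accepts an order when its reward exceeds the priced-out resource cost. Below is a minimal sketch of that dual-price thresholding loop for the packing case. The capacity rescaling used for the action-history-dependent step is one plausible reading (scale to the average remaining capacity), not the paper's exact schedule, and all names are illustrative.

```python
# Sketch of dual-price learning for online LP (packing case, 0 <= x_j <= 1).
# Dual of  max r.x  s.t.  A x <= b  is  min b.p + 1.s  s.t.  A_j.p + s_j >= r_j.
import numpy as np
from scipy.optimize import linprog

def dual_prices(A, r, b_rhs):
    t, m = A.shape
    c = np.concatenate([b_rhs, np.ones(t)])
    A_ub = np.hstack([-A, -np.eye(t)])          # -(A_j.p + s_j) <= -r_j
    res = linprog(c, A_ub=A_ub, b_ub=-r,
                  bounds=[(0, None)] * (m + t), method="highs")
    return res.x[:m]

def action_history_dependent(r, A, b, n):
    remaining = b.astype(float)
    x = np.zeros(n)
    for t in range(1, n):
        # Re-estimate prices from history; the right-hand side reflects the
        # *remaining* capacity per remaining period (assumed scaling).
        b_t = remaining / (n - t) * t
        p = dual_prices(A[:t], r[:t], b_t)
        if r[t] > A[t] @ p and np.all(A[t] <= remaining):
            x[t] = 1.0
            remaining -= A[t]
    return x

rng = np.random.default_rng(0)
n, m = 200, 3
A, r, b = rng.uniform(0, 1, (n, m)), rng.uniform(0, 1, n), np.full(m, 25.0)
decisions = action_history_dependent(r, A, b, n)
```

A plain dual-price learner would compute `p` once (or on a doubling schedule) from a fixed right-hand side `b * t / n`; making `b_t` depend on past acceptances is what "action-history-dependent" adds.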
Deep learning-based development of personalized human head model with non-uniform conductivity for brain stimulation
Title | Deep learning-based development of personalized human head model with non-uniform conductivity for brain stimulation |
Authors | Essam A. Rashed, Jose Gomez-Tames, Akimasa Hirata |
Abstract | Electromagnetic stimulation of the human brain is a key tool for the neurophysiological characterization and diagnosis of several neurological disorders. Transcranial magnetic stimulation (TMS) is one procedure that is commonly used clinically. However, personalized TMS requires a pipeline for accurate head model generation to provide target-specific stimulation. This process includes intensive segmentation of several head tissues based on magnetic resonance imaging (MRI), which has significant potential for segmentation error, especially for low-contrast tissues. Additionally, a uniform electrical conductivity is assigned to each tissue in the model, an unrealistic assumption made in conventional volume conductor modeling. This paper proposes a novel approach to the automatic estimation of electric conductivity in the human head for volume conductor models without anatomical segmentation. A convolutional neural network is designed to estimate personalized electrical conductivity values based on anatomical information obtained from T1- and T2-weighted MRI scans. This approach avoids the time-consuming process of tissue segmentation and maximizes the advantages of position-dependent conductivity assignment based on water content values estimated from MRI intensities. The computational results of the proposed approach provide similar but smoother electric field distributions for the brain when compared with conventional approaches. |
Tasks | |
Published | 2019-10-06 |
URL | https://arxiv.org/abs/1910.02420v2 |
https://arxiv.org/pdf/1910.02420v2.pdf | |
PWC | https://paperswithcode.com/paper/non-uniform-conductivity-estimation-for |
Repo | |
Framework | |
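Architecturally, the proposal is a segmentation-free image-to-image regression: the network reads co-registered T1/T2 volumes and emits a conductivity value per voxel. A toy sketch of that mapping follows; the layer sizes, the Softplus positivity constraint, and the loss note are illustrative choices, not the authors' network.

```python
# Minimal sketch (not the authors' architecture): a small 3D CNN mapping
# stacked T1/T2 intensity volumes to a per-voxel conductivity estimate.
import torch
import torch.nn as nn

class ConductivityNet(nn.Module):
    def __init__(self, width=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, width, 3, padding=1), nn.ReLU(),    # T1 + T2 channels
            nn.Conv3d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv3d(width, 1, 1),                          # conductivity map
            nn.Softplus(),                                   # keep values positive
        )

    def forward(self, t1_t2):            # t1_t2: (batch, 2, D, H, W)
        return self.net(t1_t2)

# Training would regress against conductivities derived from water content,
# e.g. with an L1/L2 loss restricted to voxels inside the head mask.
model = ConductivityNet()
sigma = model(torch.randn(1, 2, 32, 32, 32))   # -> (1, 1, 32, 32, 32)
```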
Greedy AutoAugment
Title | Greedy AutoAugment |
Authors | Alireza Naghizadeh, Mohammadsajad Abavisani, Dimitris N. Metaxas |
Abstract | A major problem in data augmentation is the number of possibilities in the search space of operations. The search space includes mixtures of all possible data augmentation techniques, the magnitudes of these operations, and the probability of applying data augmentation to each image. In this paper, we propose Greedy AutoAugment, a highly efficient search algorithm for finding the best augmentation policies. We combine the search process with a simple procedure to increase the size of the training data. Our experiments show that the proposed method can be used as a reliable addition to ANN infrastructures for increasing the accuracy of classification results. |
Tasks | Data Augmentation |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.00704v1 |
https://arxiv.org/pdf/1908.00704v1.pdf | |
PWC | https://paperswithcode.com/paper/greedy-autoaugment |
Repo | |
Framework | |
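The greedy idea is to avoid scoring the full combinatorial policy space: extend the current policy one (operation, magnitude) step at a time, keeping whichever extension scores best. A minimal sketch under that reading follows; the operation set, magnitude grid, and the `evaluate` callback (e.g. proxy-training a small model and returning validation accuracy) are all assumptions, not the paper's exact procedure.

```python
# Hedged sketch of a greedy augmentation-policy search.
import itertools

OPS = ["rotate", "shear_x", "color", "invert"]    # toy operation set
MAGS = range(1, 11)                               # discretised magnitudes

def greedy_search(evaluate, depth=3):
    """Grow a policy one (op, magnitude) step at a time, keeping the
    extension that maximises the score returned by `evaluate`."""
    policy = []
    for _ in range(depth):
        best = max(itertools.product(OPS, MAGS),
                   key=lambda step: evaluate(policy + [step]))
        policy.append(best)
    return policy

# Toy evaluate: prefer policies using diverse operations (a stand-in for
# validation accuracy of a proxy model trained under the policy).
policy = greedy_search(lambda p: len({op for op, _ in p}), depth=3)
```

In practice `evaluate` dominates the cost, so the greedy loop cuts the number of candidate policies from exponential in depth to `depth * len(OPS) * len(MAGS)` evaluations.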
On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach
Title | On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach |
Authors | Yuanhao Wang, Guodong Zhang, Jimmy Ba |
Abstract | Many tasks in modern machine learning can be formulated as finding equilibria in \emph{sequential} games. In particular, two-player zero-sum sequential games, also known as minimax optimization, have received growing interest. It is tempting to apply gradient descent to solve minimax optimization, given its popularity and success in supervised learning. However, it has been noted that naive application of gradient descent fails to find some local minimax points and can converge to points that are not local minimax. In this paper, we propose \emph{Follow-the-Ridge} (FR), a novel algorithm that provably converges to, and only converges to, local minimax points. We show theoretically that the algorithm addresses the notorious rotational behaviour of gradient dynamics and is compatible with preconditioning and \emph{positive} momentum. Empirically, FR solves toy minimax problems and improves the convergence of GAN training compared to recent minimax optimization algorithms. |
Tasks | |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07512v2 |
https://arxiv.org/pdf/1910.07512v2.pdf | |
PWC | https://paperswithcode.com/paper/on-solving-minimax-optimization-locally-a-1 |
Repo | |
Framework | |
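For reference, the FR update for $\min_x \max_y f(x, y)$ takes a gradient descent step in $x$, a gradient ascent step in $y$, and adds a second-order correction that keeps the iterate on the ridge $\nabla_y f = 0$ (stated here from the paper up to notation; $\eta_x, \eta_y$ are the two learning rates):

```latex
\begin{aligned}
x_{t+1} &= x_t - \eta_x \nabla_x f(x_t, y_t),\\
y_{t+1} &= y_t + \eta_y \nabla_y f(x_t, y_t)
          + \eta_x \left(\nabla^2_{yy} f\right)^{-1} \nabla^2_{yx} f \,
            \nabla_x f(x_t, y_t).
\end{aligned}
```

The correction term cancels the drift of $y$ off the ridge caused by the $x$-step, which is what suppresses the rotational dynamics mentioned in the abstract.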
RGBD Based Dimensional Decomposition Residual Network for 3D Semantic Scene Completion
Title | RGBD Based Dimensional Decomposition Residual Network for 3D Semantic Scene Completion |
Authors | Jie Li, Yu Liu, Dong Gong, Qinfeng Shi, Xia Yuan, Chunxia Zhao, Ian Reid |
Abstract | RGB images differ from depth images in that they carry more detail about color and texture, which can serve as a vital complement to depth for boosting the performance of 3D semantic scene completion (SSC). SSC is composed of 3D shape completion (SC) and semantic scene labeling, yet most existing methods use depth as the sole input, which creates a performance bottleneck. Moreover, the state-of-the-art methods employ 3D CNNs, which yield cumbersome networks with enormous numbers of parameters. We introduce a light-weight Dimensional Decomposition Residual network (DDR) for 3D dense prediction tasks. The novel factorized convolution layer is effective at reducing network parameters, and the proposed multi-scale fusion mechanism for depth and color images improves completion and segmentation accuracy simultaneously. Our method demonstrates excellent performance on two public datasets. Compared with the latest method, SSCNet, we achieve 5.9% gains in SC-IoU and 5.7% gains in SSC-IoU, while using only 21% of the network parameters and 16.6% of the FLOPs of SSCNet. |
Tasks | Scene Labeling |
Published | 2019-03-02 |
URL | http://arxiv.org/abs/1903.00620v2 |
http://arxiv.org/pdf/1903.00620v2.pdf | |
PWC | https://paperswithcode.com/paper/rgbd-based-dimensional-decomposition-residual |
Repo | |
Framework | |
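The parameter saving comes from the factorized convolution: a dense k × k × k kernel is replaced by three 1D kernels applied sequentially along each dimension, cutting parameters per layer from O(k³) to O(3k). A sketch of one such block is below; the residual wiring and activation placement are assumptions, not the paper's exact block.

```python
# Sketch of a dimension-decomposed 3D convolution in the spirit of DDR.
import torch
import torch.nn as nn

class DDRBlock(nn.Module):
    def __init__(self, channels, k=3):
        super().__init__()
        p = k // 2
        self.decomposed = nn.Sequential(
            nn.Conv3d(channels, channels, (k, 1, 1), padding=(p, 0, 0)), nn.ReLU(),
            nn.Conv3d(channels, channels, (1, k, 1), padding=(0, p, 0)), nn.ReLU(),
            nn.Conv3d(channels, channels, (1, 1, k), padding=(0, 0, p)),
        )

    def forward(self, x):
        return torch.relu(x + self.decomposed(x))   # assumed residual connection

x = torch.randn(1, 8, 16, 16, 16)
print(DDRBlock(8)(x).shape)   # torch.Size([1, 8, 16, 16, 16])
```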
CoAPI: An Efficient Two-Phase Algorithm Using Core-Guided Over-Approximate Cover for Prime Compilation of Non-Clausal Formulae
Title | CoAPI: An Efficient Two-Phase Algorithm Using Core-Guided Over-Approximate Cover for Prime Compilation of Non-Clausal Formulae |
Authors | Weilin Luo, Hai Wan, Hongzhen Zhong, Ou Wei |
Abstract | Prime compilation, i.e., the generation of all prime implicates or implicants (primes for short) of formulae, is a prominent fundamental issue for AI. Recently, prime compilation for non-clausal formulae has received great attention. The state-of-the-art approaches generate all primes along with a prime cover constructed from prime implicates using dual rail encoding. However, dual rail encoding potentially expands the search space. In addition, constructing a prime cover, which is necessary for these methods, is time-consuming. To address these issues, we propose a novel two-phase method, CoAPI. The two phases are the key to constructing a cover without dual rail encoding. Specifically, given a non-clausal formula, we first propose a core-guided method that rewrites the formula into a cover constructed from over-approximate implicates in the first phase. Then, we generate all the primes based on this cover in the second phase. To reduce the size of the cover, we provide a multi-order-based shrinking method that compresses the cover considerably, with a good tradeoff between size and efficiency. The experimental results show that CoAPI outperforms state-of-the-art approaches. In particular, for generating all prime implicates, CoAPI consumes about one order of magnitude less time. |
Tasks | |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03085v1 |
https://arxiv.org/pdf/1906.03085v1.pdf | |
PWC | https://paperswithcode.com/paper/coapi-an-efficient-two-phase-algorithm-using |
Repo | |
Framework | |
Penalized k-means algorithms for finding the correct number of clusters in a dataset
Title | Penalized k-means algorithms for finding the correct number of clusters in a dataset |
Authors | Behzad Kamgar-Parsi, Behrooz Kamgar-Parsi |
Abstract | In many applications we want to find the number of clusters in a dataset. A common approach is to use the penalized k-means algorithm with an additive penalty term linear in the number of clusters. An open problem is estimating the value of the coefficient of the penalty term. Since estimating this coefficient in a principled manner appears to be intractable for general clusters, we investigate “ideal clusters”, i.e., identical spherical clusters with no overlaps and no outlier background noise. In this paper: (a) We derive, for the case of ideal clusters, rigorous bounds for the coefficient of the additive penalty. Unsurprisingly, the bounds depend on the correct number of clusters, which is what we want to find in the first place. We further show that the additive penalty, even in this simplest case of ideal clusters, typically produces a weak and often ambiguous signature for the correct number of clusters. (b) As an alternative, we examine k-means with a multiplicative penalty, and show that this parameter-free formulation has a stronger, and less often ambiguous, signature for the correct number of clusters. We also empirically investigate certain types of deviations from the ideal-cluster assumption and show that a combination of k-means with additive and multiplicative penalties can resolve ambiguous solutions. |
Tasks | |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06741v1 |
https://arxiv.org/pdf/1911.06741v1.pdf | |
PWC | https://paperswithcode.com/paper/penalized-k-means-algorithms-for-finding-the |
Repo | |
Framework | |
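The additive-penalty selection rule in (a) is easy to state concretely: for each candidate k, add λk to the within-cluster sum of squares and take the minimizing k. A sketch follows; choosing `lam` is exactly the open problem the paper bounds for ideal clusters, and the multiplicative variant is not reproduced here.

```python
# Sketch of additive-penalty k-means model selection: run k-means for each
# candidate k and pick the k minimising (within-cluster SSE + lam * k).
import numpy as np
from sklearn.cluster import KMeans

def penalized_kmeans(X, lam, k_max=10):
    costs = {}
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        costs[k] = km.inertia_ + lam * k     # additive penalty, linear in k
    return min(costs, key=costs.get), costs

# Three well-separated spherical clusters ("ideal clusters" in the paper).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1.0, (50, 2)) for c in ([0, 0], [6, 0], [0, 6])])
k_star, _ = penalized_kmeans(X, lam=30.0)
print(k_star)   # ideally 3 for a suitable lam
```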
Improving Stochastic Neighbour Embedding fundamentally with a well-defined data-dependent kernel
Title | Improving Stochastic Neighbour Embedding fundamentally with a well-defined data-dependent kernel |
Authors | Ye Zhu, Kai Ming Ting |
Abstract | We identify a fundamental issue in the popular Stochastic Neighbour Embedding algorithms (SNE and t-SNE): the “learned” similarity of any two points in high-dimensional space is not defined and cannot be computed. This underlies two previously unexplored issues in the algorithm that have undermined the quality of its final visualisation output and its ability to process large datasets. The issues are: (a) the reference probability in high-dimensional space is set based on entropy, which has an undefined relation to local density; and (b) the use of a data-independent kernel leads to the need to determine n bandwidths for a dataset of n points. This paper establishes a principle for setting the reference probability via a data-dependent kernel whose well-defined characteristic is linked directly to local density. A solution based on a recent data-dependent kernel called the Isolation Kernel addresses the fundamental issue as well as its two ensuing issues. As a result, it significantly improves the quality of the final visualisation output and removes one obstacle that prevents t-SNE from processing large datasets. The solution is extremely simple: replace the existing data-independent kernel with the Isolation Kernel, leaving the rest of the t-SNE procedure unchanged. |
Tasks | |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.09744v2 |
https://arxiv.org/pdf/1906.09744v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-stochastic-neighbour-embedding |
Repo | |
Framework | |
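A hedged sketch of the Isolation Kernel itself, via the random-Voronoi-partitioning construction from the Isolation Kernel literature: similarity(x, y) is the fraction of random partitionings in which x and y fall into the same cell, which is automatically larger in dense regions. Plugging such a similarity matrix into t-SNE's high-dimensional affinities, in place of the Gaussian kernel, is the paper's proposal; the parameters below (`psi`, `t`) are illustrative.

```python
# Isolation Kernel via random Voronoi partitionings (nearest-neighbour form).
import numpy as np

def isolation_kernel(X, psi=16, t=200, seed=None):
    rng = np.random.default_rng(seed)
    n = len(X)
    K = np.zeros((n, n))
    for _ in range(t):
        centers = X[rng.choice(n, size=psi, replace=False)]
        # Assign every point to its nearest sampled center (Voronoi cell).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        cell = d.argmin(axis=1)
        K += (cell[:, None] == cell[None, :])
    return K / t      # data-dependent similarity in [0, 1]

X = np.random.randn(100, 5)
K = isolation_kernel(X)
```

No per-point bandwidths are fitted: `psi` plays the role of a single sharpness parameter, which is what removes the n-bandwidth problem (b) in the abstract.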
Characterising Third Party Cookie Usage in the EU after GDPR
Title | Characterising Third Party Cookie Usage in the EU after GDPR |
Authors | Xuehui Hu, Nishanth Sastry |
Abstract | The recently introduced General Data Protection Regulation (GDPR) requires that when information that could be used to identify individuals is obtained online, their consent must be obtained. Among other things, this affects many common forms of cookies, and users in the EU have been presented with notices asking for their approval of data collection. This paper examines the prevalence of third-party cookies before and after GDPR using two datasets: accesses to the top 500 websites according to Alexa.com, and weekly data on cookies placed in users’ browsers by websites accessed by 16 users in the UK and China over one year. We find that on average the number of third parties dropped by more than 10% after GDPR, but when we examine real users’ browsing histories over a year, we find no material reduction in the long-term numbers of third-party cookies, suggesting that users are not making use of the choices offered by GDPR for increased privacy. Also, among websites that offer users a choice in whether and how they are tracked, accepting the default choices typically ends up storing more cookies on average than visiting websites that only provide a notice of the cookies stored, without offering a choice, or websites that provide no cookie notice at all. We also find that top non-EU websites have fewer cookie notices, suggesting higher levels of tracking when visiting international sites. Our findings have deep implications both for understanding compliance with GDPR and for understanding the evolution of tracking on the web. |
Tasks | |
Published | 2019-05-03 |
URL | https://arxiv.org/abs/1905.01267v1 |
https://arxiv.org/pdf/1905.01267v1.pdf | |
PWC | https://paperswithcode.com/paper/characterising-third-party-cookie-usage-in |
Repo | |
Framework | |
Dual-Attention Graph Convolutional Network
Title | Dual-Attention Graph Convolutional Network |
Authors | Xueya Zhang, Tong Zhang, Wenting Zhao, Zhen Cui, Jian Yang |
Abstract | Graph convolutional networks (GCNs) have shown a powerful ability to represent text structure and effectively facilitate the task of text classification. However, challenges remain in adapting GCNs to learn discriminative features from text, mainly due to the variability of graphs incurred by textual complexity and diversity. In this paper, we propose a dual-attention GCN that models the structural information of various texts and tackles this graph-variability problem by embedding two types of attention mechanisms, i.e., connection-attention and hop-attention, into the classic GCN. To encode various connection patterns between neighbouring words, connection-attention adaptively imposes different weights on the neighbourhood of each word, capturing short-term dependencies. Hop-attention, on the other hand, applies scaled coefficients to different scopes during the graph diffusion process so that the model learns more about the distribution of context, capturing long-term semantics in an adaptive way. Extensive experiments are conducted on five widely used datasets to evaluate our dual-attention GCN, and the state-of-the-art performance achieved verifies the effectiveness of the dual-attention mechanisms. |
Tasks | Text Classification |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12486v1 |
https://arxiv.org/pdf/1911.12486v1.pdf | |
PWC | https://paperswithcode.com/paper/dual-attention-graph-convolutional-network |
Repo | |
Framework | |
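Of the two mechanisms, hop-attention is the easier to sketch: learn one coefficient per diffusion hop and mix the hop-wise features. The sketch below is only an illustrative reading of that idea; connection-attention and the paper's exact parameterisation are not reproduced.

```python
# Illustrative hop-attention: weight successive hops (powers of the
# normalised adjacency) with learned, softmax-normalised coefficients.
import torch
import torch.nn as nn

class HopAttention(nn.Module):
    def __init__(self, in_dim, out_dim, hops=3):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        self.alpha = nn.Parameter(torch.zeros(hops))   # one weight per hop

    def forward(self, A_hat, X):        # A_hat: normalised adjacency (n, n)
        w = torch.softmax(self.alpha, dim=0)
        H, out = self.lin(X), 0.0
        for k in range(len(self.alpha)):
            H = A_hat @ H               # diffuse one more hop
            out = out + w[k] * H        # scaled contribution of hop k+1
        return out

A_hat = torch.eye(5)                    # toy graph: self-loops only
print(HopAttention(8, 16)(A_hat, torch.randn(5, 8)).shape)  # (5, 16)
```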
Semi-Supervised Learning for Text Classification by Layer Partitioning
Title | Semi-Supervised Learning for Text Classification by Layer Partitioning |
Authors | Alexander Hanbo Li, Abhinav Sethy |
Abstract | Most recent neural semi-supervised learning algorithms rely on adding small perturbations to either the input vectors or their representations. These methods have been successful on computer vision tasks, where images form a continuous manifold, but they are not appropriate for discrete input such as sentences. To adapt these methods to text input, we propose to decompose a neural network $M$ into two components $F$ and $U$ so that $M = U\circ F$. The layers in $F$ are then frozen and only the layers in $U$ are updated during most of the training. In this way, $F$ serves as a feature extractor that maps the input to a high-level representation and adds systematic noise using dropout. We can then train $U$ using any state-of-the-art SSL algorithm, such as the $\Pi$-model, temporal ensembling, or mean teacher. Furthermore, this gradual unfreezing schedule also prevents a pretrained model from catastrophic forgetting. The experimental results demonstrate that our approach provides improvements over state-of-the-art methods, especially on short texts. |
Tasks | Text Classification |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11756v1 |
https://arxiv.org/pdf/1911.11756v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-learning-for-text |
Repo | |
Framework | |
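A minimal sketch of the layer-partitioning idea: split $M = U \circ F$, freeze $F$, and train $U$ with a $\Pi$-model-style consistency loss. $F$'s dropout is kept active so two forward passes give two noisy views, which is the perturbation the method relies on. Module names, the split point, and the pooling are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as Fn

encoder = nn.Sequential(nn.Embedding(10000, 128), nn.Dropout(0.3),
                        nn.Linear(128, 128), nn.ReLU())              # F (frozen)
head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))  # U

for p in encoder.parameters():
    p.requires_grad = False            # only U is updated

opt = torch.optim.Adam(head.parameters(), lr=1e-3)

def ssl_step(token_ids):               # unlabeled batch of token ids
    encoder.train()                    # keep dropout on: two noisy views
    with torch.no_grad():
        z1, z2 = encoder(token_ids), encoder(token_ids)
    p1, p2 = head(z1.mean(1)), head(z2.mean(1))       # pool over tokens
    loss = Fn.mse_loss(p1.softmax(-1), p2.softmax(-1))  # consistency loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

ssl_step(torch.randint(0, 10000, (4, 12)))
```

Gradual unfreezing would then move layers from `encoder` to `head` over the course of training.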
Behavioral Petri Net Mining and Automated Analysis for Human-Computer Interaction Recommendations in Multi-Application Environments
Title | Behavioral Petri Net Mining and Automated Analysis for Human-Computer Interaction Recommendations in Multi-Application Environments |
Authors | Julian Theis, Houshang Darabi |
Abstract | Process Mining is a well-established technique that is frequently applied to software development processes but has been neglected in Human-Computer Interaction (HCI) recommendation applications. Organizations usually train employees to interact with required IT systems. Often, employees, or users in general, develop their own strategies for solving repetitive tasks and processes. However, organizations find it hard to detect whether employees interact efficiently with IT systems. Hence, we have developed a method that detects inefficient behavior, assuming that at least one optimal HCI strategy is known. This method provides recommendations to gradually adapt users’ behavior towards the optimal way of interacting while taking user satisfaction into account. Based on users’ behavior logs tracked by a Java application suitable for multi-application and multi-instance environments, we demonstrate the method’s applicability to a specific task in a common Windows environment using realistically simulated user behavior. |
Tasks | |
Published | 2019-02-23 |
URL | http://arxiv.org/abs/1902.08740v2 |
http://arxiv.org/pdf/1902.08740v2.pdf | |
PWC | https://paperswithcode.com/paper/behavioral-petri-net-mining-and-automated |
Repo | |
Framework | |
Thick-Net: Parallel Network Structure for Sequential Modeling
Title | Thick-Net: Parallel Network Structure for Sequential Modeling |
Authors | Yu-Xuan Li, Jin-Yuan Liu, Liang Li, Xiang Guan |
Abstract | Recurrent neural networks have been widely used in sequence learning tasks. In previous studies, model performance has typically been improved with either wider or deeper structures. However, the former is more prone to overfitting, while the latter is difficult to optimize. In this paper, we propose a simple new model named Thick-Net, which expands the network along another dimension: thickness. Multiple parallel values are obtained via additional sets of parameters in each hidden state, and the maximum of these parallel intermediate outputs is selected as the final output. Notably, Thick-Net can efficiently avoid overfitting and is easier to optimize than the vanilla structures due to the large dropout associated with it. Our model is evaluated on four sequential tasks: the adding problem, permuted sequential MNIST, text classification, and language modeling. The results on these tasks demonstrate that our model not only improves accuracy with faster convergence but also achieves better generalization. |
Tasks | Language Modelling, Text Classification |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08074v1 |
https://arxiv.org/pdf/1911.08074v1.pdf | |
PWC | https://paperswithcode.com/paper/thick-net-parallel-network-structure-for |
Repo | |
Framework | |
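The "thickness" mechanism amounts to giving each output unit several candidate activations and keeping only the largest. A minimal sketch of such a layer follows; the names and its placement inside an RNN cell are illustrative assumptions.

```python
# Sketch of a "thick" layer: k parallel parameter sets per output unit,
# reduced by elementwise max over the parallel candidates.
import torch
import torch.nn as nn

class ThickLinear(nn.Module):
    def __init__(self, in_dim, out_dim, thickness=4):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim * thickness)
        self.out_dim, self.k = out_dim, thickness

    def forward(self, x):
        z = self.lin(x).view(*x.shape[:-1], self.out_dim, self.k)
        return z.max(dim=-1).values     # keep the max of k parallel values

layer = ThickLinear(32, 16, thickness=4)
print(layer(torch.randn(8, 32)).shape)  # torch.Size([8, 16])
```

Because only the winning slice receives gradient for each unit on each step, the max-selection behaves like a structured dropout, which matches the regularization claim in the abstract.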
Single-shot Channel Pruning Based on Alternating Direction Method of Multipliers
Title | Single-shot Channel Pruning Based on Alternating Direction Method of Multipliers |
Authors | Chengcheng Li, Zi Wang, Xiangyang Wang, Hairong Qi |
Abstract | Channel pruning has been identified as an effective approach to constructing efficient network structures. Its typical pipeline requires iterative pruning and fine-tuning. In this work, we propose a novel single-shot channel pruning approach based on the alternating direction method of multipliers (ADMM), which eliminates the need for a complex iterative pruning and fine-tuning procedure and achieves a target compression ratio with only one run of pruning and fine-tuning. To the best of our knowledge, this is the first study of single-shot channel pruning. The proposed method introduces filter-level sparsity during training and can achieve competitive performance with a simple heuristic pruning criterion (L1 norm). Extensive evaluations have been conducted with various widely used benchmark architectures and image datasets for object classification. The experimental results on classification accuracy show that the proposed method can outperform state-of-the-art network pruning methods under various scenarios. |
Tasks | Network Pruning, Object Classification |
Published | 2019-02-18 |
URL | http://arxiv.org/abs/1902.06382v1 |
http://arxiv.org/pdf/1902.06382v1.pdf | |
PWC | https://paperswithcode.com/paper/single-shot-channel-pruning-based-on |
Repo | |
Framework | |
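The single-shot pipeline ends with a one-time pruning pass using the L1-norm criterion mentioned in the abstract. The sketch below covers only that criterion (rank filters by L1 norm, zero out the rest); the ADMM regularization that induces filter-level sparsity during training, and the follow-up fine-tuning, are not reproduced.

```python
# One-shot channel pruning by L1-norm filter ranking.
import torch
import torch.nn as nn

def prune_channels_l1(conv: nn.Conv2d, keep_ratio: float) -> torch.Tensor:
    """Zero out filters with the smallest L1 norm; return kept indices."""
    with torch.no_grad():
        scores = conv.weight.abs().sum(dim=(1, 2, 3))   # L1 per out-channel
        n_keep = max(1, int(keep_ratio * conv.out_channels))
        keep = scores.topk(n_keep).indices
        mask = torch.zeros_like(scores, dtype=torch.bool)
        mask[keep] = True
        conv.weight[~mask] = 0.0
        if conv.bias is not None:
            conv.bias[~mask] = 0.0
    return keep

conv = nn.Conv2d(3, 64, 3)
kept = prune_channels_l1(conv, keep_ratio=0.5)   # 32 channels survive
```

In a real compression pass, the zeroed channels (and the matching input channels of the next layer) would then be physically removed to realize the FLOP savings.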