Subventions et des contributions :
Subvention ou bourse octroyée s'appliquant à plus d'un exercice financier. (2017-2018 à 2022-2023)
Several important tasks in machine learning can be formulated as combinatorial optimization problems. For example, learning the structure of a Bayesian network from data can be formulated as a combinatorial optimization problem, where a score is defined that measures how well a candidate structure is supported by the observed data and the task is to find the structure with the lowest score. As a second example, learning a decision tree from labeled data can be formulated as a combinatorial optimization problem, where the aim is to find the decision tree that best predicts the data subject to regularization constraints. Both of these problems are NP-Hard in general to solve optimally but are also NP-Hard to solve approximately to within a reasonable factor. Thus, advanced search techniques are needed. This research proposal is an investigation into formulating and improving constraint programming and other advanced constraint-based search approaches for solving combinatorial optimization problems that arise in machine learning.
In a constraint programming approach, one models a problem by specifying constraints on acceptable solutions and search is then used to find a solution that satisfies the constraints and optimizes a cost function. An important feature of constraint programming is that one can first focus on a declarative constraint model and then develop an efficient algorithm for that model, either a complete and optimal algorithm based on backtracking search or an incomplete and approximate algorithm based on local search. In general, the research will be application-driven and will address practical, important problems. The research will be guided by the two important applications alluded to above: learning the structure of a Bayesian network from data and learning a decision tree from data subject to regularization constraints. The scientific approach will include: developing improved constraint models, improving the upper and lower bounds used during the search, investigating alternative search spaces, investigating the incorporation of prior domain knowledge in the form of constraints, methods for model averaging by generating the k-best models, extending the methods to other directed acyclic probabilistic graphical models such as sigmoid belief networks, and extending the methods to other decision tree models such as multi-variate decision trees.
The primary goals of the research projects are to develop faster algorithms for finding solutions, algorithms that find optimal or higher quality solutions, and algorithms that are more widely applicable in practice. Any improvements to the underlying solving algorithms have the potential to improve many applications of these machine learning approaches. A secondary goal of this work is to further develop constraint programming techniques that have general applicability to similar optimization problems.