Recent from talks
Nothing was collected or created yet.
Inductive logic programming
View on Wikipedia
Inductive logic programming (ILP) is a subfield of symbolic artificial intelligence which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. The term "inductive" here refers to philosophical (i.e. suggesting a theory to explain observed facts) rather than mathematical (i.e. proving a property for all members of a well-ordered set) induction. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.
- Schema: positive examples + negative examples + background knowledge ⇒ hypothesis.
Inductive logic programming is particularly useful in bioinformatics and natural language processing.[citation needed]
History
[edit]Building on earlier work on Inductive inference, Gordon Plotkin was the first to formalise induction in a clausal setting around 1970, adopting an approach of generalising from examples.[1][2] In 1981, Ehud Shapiro introduced several ideas that would shape the field in his new approach of model inference, an algorithm employing refinement and backtracing to search for a complete axiomatisation of given examples.[1][3] His first implementation was the Model Inference System in 1981:[4][5] a Prolog program that inductively inferred Horn clause logic programs from positive and negative examples.[1] The term Inductive Logic Programming was first introduced in a paper by Stephen Muggleton in 1990, defined as the intersection of machine learning and logic programming.[1] Muggleton and Wray Buntine introduced predicate invention and inverse resolution in 1988.[1][6]
Several inductive logic programming systems that proved influential appeared in the early 1990s. FOIL, introduced by Ross Quinlan in 1990[7] was based on upgrading propositional learning algorithms AQ and ID3.[8] Golem, introduced by Muggleton and Feng in 1990, went back to a restricted form of Plotkin's least generalisation algorithm.[8][9] The Progol system, introduced by Muggleton in 1995, first implemented inverse entailment, and inspired many later systems.[8][10][11] Aleph, a descendant of Progol introduced by Ashwin Srinivasan in 2001, is still one of the most widely used systems as of 2022[update].[10]
At around the same time, the first practical applications emerged, particularly in bioinformatics, where by 2000 inductive logic programming had been successfully applied to drug design, carcinogenicity and mutagenicity prediction, and elucidation of the structure and function of proteins.[12] Unlike the focus on automatic programming inherent in the early work, these fields used inductive logic programming techniques from a viewpoint of relational data mining. The success of those initial applications and the lack of progress in recovering larger traditional logic programs shaped the focus of the field.[13]
Recently, classical tasks from automated programming have moved back into focus, as the introduction of meta-interpretative learning makes predicate invention and learning recursive programs more feasible. This technique was pioneered with the Metagol system introduced by Muggleton, Dianhuan Lin, Niels Pahlavi and Alireza Tamaddoni-Nezhad in 2014.[14] This allows ILP systems to work with fewer examples, and brought successes in learning string transformation programs, answer set grammars and general algorithms.[15]
Setting
[edit]Inductive logic programming has adopted several different learning settings, the most common of which are learning from entailment and learning from interpretations.[16] In both cases, the input is provided in the form of background knowledge B, a logical theory (commonly in the form of clauses used in logic programming), as well as positive and negative examples, denoted and respectively. The output is given as a hypothesis H, itself a logical theory that typically consists of one or more clauses.
The two settings differ in the format of examples presented.
Learning from entailment
[edit]As of 2022[update], learning from entailment is by far the most popular setting for inductive logic programming.[16] In this setting, the positive and negative examples are given as finite sets and of positive and negated ground literals, respectively. A correct hypothesis H is a set of clauses satisfying the following requirements, where the turnstile symbol stands for logical entailment:[16][17][18] Completeness requires any generated hypothesis H to explain all positive examples , and consistency forbids generation of any hypothesis H that is inconsistent with the negative examples , both given the background knowledge B.
In Muggleton's setting of concept learning,[19] "completeness" is referred to as "sufficiency", and "consistency" as "strong consistency". Two further conditions are added: "Necessity", which postulates that B does not entail , does not impose a restriction on H, but forbids any generation of a hypothesis as long as the positive facts are explainable without it. "Weak consistency", which states that no contradiction can be derived from , forbids generation of any hypothesis H that contradicts the background knowledge B. Weak consistency is implied by strong consistency; if no negative examples are given, both requirements coincide. Weak consistency is particularly important in the case of noisy data, where completeness and strong consistency cannot be guaranteed.[19]
Learning from interpretations
[edit]In learning from interpretations, the positive and negative examples are given as a set of complete or partial Herbrand structures, each of which are themselves a finite set of ground literals. Such a structure e is said to be a model of the set of clauses if for any substitution and any clause in such that , also holds. The goal is then to output a hypothesis that is complete, meaning every positive example is a model of , and consistent, meaning that no negative example is a model of .[16]
Approaches to ILP
[edit]An inductive logic programming system is a program that takes as an input logic theories and outputs a correct hypothesis H with respect to theories . A system is complete if and only if for any input logic theories any correct hypothesis H with respect to these input theories can be found with its hypothesis search procedure. Inductive logic programming systems can be roughly divided into two classes, search-based and meta-interpretative systems.
Search-based systems exploit that the space of possible clauses forms a complete lattice under the subsumption relation, where one clause subsumes another clause if there is a substitution such that , the result of applying to , is a subset of . This lattice can be traversed either bottom-up or top-down.
Bottom-up search
[edit]Bottom-up methods to search the subsumption lattice have been investigated since Plotkin's first work on formalising induction in clausal logic in 1970.[1][2] Techniques used include least general generalisation, based on anti-unification, and inverse resolution, based on inverting the resolution inference rule.
Least general generalisation
[edit]A least general generalisation algorithm takes as input two clauses and and outputs the least general generalisation of and , that is, a clause that subsumes and , and that is subsumed by every other clause that subsumes and . The least general generalisation can be computed by first computing all selections from and , which are pairs of literals sharing the same predicate symbol and negated/unnegated status. Then, the least general generalisation is obtained as the disjunction of the least general generalisations of the individual selections, which can be obtained by first-order syntactical anti-unification.[20]
To account for background knowledge, inductive logic programming systems employ relative least general generalisations, which are defined in terms of subsumption relative to a background theory. In general, such relative least general generalisations are not guaranteed to exist; however, if the background theory B is a finite set of ground literals, then the negation of B is itself a clause. In this case, a relative least general generalisation can be computed by disjoining the negation of B with both and and then computing their least general generalisation as before.[21]
Relative least general generalisations are the foundation of the bottom-up system Golem.[8][9]
Inverse resolution
[edit]Inverse resolution is an inductive reasoning technique that involves inverting the resolution operator.
Inverse resolution takes information about the resolvent of a resolution step to compute possible resolving clauses. Two types of inverse resolution operator are in use in inductive logic programming: V-operators and W-operators. A V-operator takes clauses and as input and returns a clause such that is the resolvent of and . A W-operator takes two clauses and and returns three clauses , and such that is the resolvent of and and is the resolvent of and .[22]
Inverse resolution was first introduced by Stephen Muggleton and Wray Buntine in 1988 for use in the inductive logic programming system Cigol.[6] By 1993, this spawned a surge of research into inverse resolution operators and their properties.[22]
Top-down search
[edit]The ILP systems Progol,[11] Hail [23] and Imparo [24] find a hypothesis H using the principle of the inverse entailment[11] for theories B, E, H: . First they construct an intermediate theory F called a bridge theory satisfying the conditions and . Then as , they generalize the negation of the bridge theory F with anti-entailment.[25] However, the operation of anti-entailment is computationally more expensive since it is highly nondeterministic. Therefore, an alternative hypothesis search can be conducted using the inverse subsumption (anti-subsumption) operation instead, which is less non-deterministic than anti-entailment.
Questions of completeness of a hypothesis search procedure of specific inductive logic programming system arise. For example, the Progol hypothesis search procedure based on the inverse entailment inference rule is not complete by Yamamoto's example.[26] On the other hand, Imparo is complete by both anti-entailment procedure [27] and its extended inverse subsumption [28] procedure.
Metainterpretive learning
[edit]Rather than explicitly searching the hypothesis graph, metainterpretive or meta-level systems encode the inductive logic programming program as a meta-level logic program which is then solved to obtain an optimal hypothesis. Formalisms used to express the problem specification include Prolog and answer set programming, with existing Prolog systems and answer set solvers used for solving the constraints.[29]
And example of a Prolog-based system is Metagol, which is based on a meta-interpreter in Prolog, while ASPAL and ILASP are based on an encoding of the inductive logic programming problem in answer set programming.[29]
Evolutionary learning
[edit]Evolutionary algorithms in ILP use a population-based approach to evolve hypotheses, refining them through selection, crossover, and mutation. Methods like EvoLearner have been shown to outperform traditional approaches on structured machine learning benchmarks. [30]
List of implementations
[edit]- 1BC and 1BC2: first-order naive Bayesian classifiers:
- ACE (A Combined Engine)
- Aleph
- Atom Archived 2014-03-26 at the Wayback Machine
- Claudien
- DL-Learner Archived 2019-08-15 at the Wayback Machine
- DMax
- FastLAS (Fast Learning from Answer Sets)
- FOIL (First Order Inductive Learner)
- Golem
- ILASP (Inductive Learning of Answer Set Programs)
- Imparo[27]
- Inthelex (INcremental THEory Learner from EXamples) Archived 2011-11-28 at the Wayback Machine
- Lime
- Metagol
- Mio
- MIS (Model Inference System) by Ehud Shapiro
- Ontolearn
- Popper
- PROGOL
- RSD
- Warmr (now included in ACE)
- ProGolem [31][32]
Probabilistic inductive logic programming
[edit]Probabilistic inductive logic programming adapts the setting of inductive logic programming to learning probabilistic logic programs. It can be considered as a form of statistical relational learning within the formalism of probabilistic logic programming.[33][34]
Given
- background knowledge as a probabilistic logic program B, and
- a set of positive and negative examples and
the goal of probabilistic inductive logic programming is to find a probabilistic logic program such that the probability of positive examples according to is maximized and the probability of negative examples is minimized.[34]
This problem has two variants: parameter learning and structure learning. In the former, one is given the structure (the clauses) of H and the goal is to infer the probabilities annotations of the given clauses, while in the latter the goal is to infer both the structure and the probability parameters of H. Just as in classical inductive logic programming, the examples can be given as examples or as (partial) interpretations.[34]
Parameter Learning
[edit]Parameter learning for languages following the distribution semantics has been performed by using an expectation-maximisation algorithm or by gradient descent. An expectation-maximisation algorithm consists of a cycle in which the steps of expectation and maximization are repeatedly performed. In the expectation step, the distribution of the hidden variables is computed according to the current values of the probability parameters, while in the maximisation step, the new values of the parameters are computed. Gradient descent methods compute the gradient of the target function and iteratively modify the parameters moving in the direction of the gradient.[34]
Structure Learning
[edit]Structure learning was pioneered by Daphne Koller and Avi Pfeffer in 1997,[35] where the authors learn the structure of first-order rules with associated probabilistic uncertainty parameters. Their approach involves generating the underlying graphical model in a preliminary step and then applying expectation-maximisation.[34]
In 2008, De Raedt et al. presented an algorithm for performing theory compression on ProbLog programs, where theory compression refers to a process of removing as many clauses as possible from the theory in order to maximize the probability of a given set of positive and negative examples. No new clause can be added to the theory.[34][36]
In the same year, Meert, W. et al. introduced a method for learning parameters and structure of ground probabilistic logic programs by considering the Bayesian networks equivalent to them and applying techniques for learning Bayesian networks.[37][34]
ProbFOIL, introduced by De Raedt and Ingo Thon in 2010, combined the inductive logic programming system FOIL with ProbLog. Logical rules are learned from probabilistic data in the sense that both the examples themselves and their classifications can be probabilistic. The set of rules has to allow one to predict the probability of the examples from their description. In this setting, the parameters (the probability values) are fixed and the structure has to be learned.[38][34]
In 2011, Elena Bellodi and Fabrizio Riguzzi introduced SLIPCASE, which performs a beam search among probabilistic logic programs by iteratively refining probabilistic theories and optimizing the parameters of each theory using expectation-maximisation.[39] Its extension SLIPCOVER, proposed in 2014, uses bottom clauses generated as in Progol to guide the refinement process, thus reducing the number of revisions and exploring the search space more effectively. Moreover, SLIPCOVER separates the search for promising clauses from that of the theory: the space of clauses is explored with a beam search, while the space of theories is searched greedily.[40][34]
See also
[edit]References
[edit]- ^ a b c d e f Nienhuys-Cheng, Shan-hwei; Wolf, Ronald de (1997). Foundations of inductive logic programming. Lecture notes in computer science Lecture notes in artificial intelligence. Berlin Heidelberg: Springer. pp. 174–177. ISBN 978-3-540-62927-6.
- ^ a b Plotkin, G.D. (1970). Automatic Methods of Inductive Inference (PDF) (PhD). University of Edinburgh. hdl:1842/6656.
- ^ Shapiro, Ehud Y. (1981). Inductive inference of theories from facts (PDF) (Technical report). Department of Computer Science, Yale University. 192. Reprinted in Lassez, J.-L.; Plotkin, G., eds. (1991). Computational logic : essays in honor of Alan Robinson. MIT Press. pp. 199–254. ISBN 978-0-262-12156-9.
- ^ Shapiro, Ehud Y. (1981). "The model inference system" (PDF). Proceedings of the 7th international joint conference on Artificial intelligence. Vol. 2. Morgan Kaufmann. p. 1064.
- ^ Shapiro, Ehud Y. (1983). Algorithmic program debugging. MIT Press. ISBN 0-262-19218-7.
- ^ a b Muggleton, S.H.; Buntine, W. (1988). "Machine invention of first-order predicate by inverting resolution". Proceedings of the 5th International Conference on Machine Learning. pp. 339–352. doi:10.1016/B978-0-934613-64-4.50040-2. ISBN 978-0-934613-64-4.
- ^ Quinlan, J. R. (August 1990). "Learning logical definitions from relations". Machine Learning. 5 (3): 239–266. doi:10.1007/bf00117105. ISSN 0885-6125.
- ^ a b c d Nienhuys-Cheng, Shan-hwei; Wolf, Ronald de (1997). Foundations of inductive logic programming. Lecture notes in computer science Lecture notes in artificial intelligence. Berlin Heidelberg: Springer. pp. 354–358. ISBN 978-3-540-62927-6.
- ^ a b Muggleton, Stephen H.; Feng, Cao (1990). "Efficient Induction of Logic Programs". In Arikawa, Setsuo; Goto, Shigeki; Ohsuga, Setsuo; Yokomori, Takashi (eds.). Algorithmic Learning Theory, First International Workshop, ALT '90, Tokyo, Japan, October 8-10, 1990, Proceedings. Springer/Ohmsha. pp. 368–381.
- See also Muggleton, Stephen H.; Feng, Cao (1992). "Efficient Induction of Logic Programs". In Muggleton, Stephen H. (ed.). Inductive Logic Programming. The APIS Series. Vol. 38. Academic Press. p. 281. ISBN 978-0-12-509715-4.
- ^ a b Cropper & Dumančić 2022, p. 808.
- ^ a b c Muggleton, S.H. (1995). "Inverting entailment and Progol". New Generation Computing. 13 (3–4): 245–286. CiteSeerX 10.1.1.31.1630. doi:10.1007/bf03037227. S2CID 12643399.
- ^ Džeroski, Sašo (2001), Džeroski, Sašo; Lavrač, Nada (eds.), "Relational Data Mining Applications: An Overview", Relational Data Mining, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 339–364, doi:10.1007/978-3-662-04599-2_14, ISBN 978-3-642-07604-6, retrieved 2023-11-27
- ^ De Raedt, Luc (2008), Logical and Relational Learning, Cognitive Technologies, Berlin, Heidelberg: Springer, p. 14, Bibcode:2008lrl..book.....D, doi:10.1007/978-3-540-68856-3, ISBN 978-3-540-20040-6
- ^ Muggleton, Stephen H.; Lin, Dianhuan; Pahlavi, Niels; Tamaddoni-Nezhad, Alireza (2013-05-01). "Meta-interpretive learning: application to grammatical inference". Machine Learning. 94 (1): 25–49. doi:10.1007/s10994-013-5358-3. ISSN 0885-6125. S2CID 254738603.
- ^ Cropper, Andrew; Dumančić, Sebastijan; Evans, Richard; Muggleton, Stephen (2022). "Inductive logic programming at 30". Machine Learning. 111 (1): 147–172. doi:10.1007/s10994-021-06089-1. ISSN 0885-6125.
- ^ a b c d Cropper & Dumančić 2022, p. 779–782.
- ^ Džeroski, Sašo (1996). "Inductive Logic Programming and Knowledge Discovery in Databases" (PDF). In Fayyad, U.M.; Piatetsky-Shapiro, G.; Smith, P.; Uthurusamy, R. (eds.). Advances in Knowledge Discovery and Data Mining. MIT Press. pp. 117–152 See §5.2.4. Archived from the original (PDF) on 2021-09-27. Retrieved 2021-09-27.
- ^ De Raedt, Luc (1997). "Logical settings for concept-learning". Artificial Intelligence. 95 (1): 187–201. doi:10.1016/S0004-3702(97)00041-6.
- ^ a b Muggleton, Stephen (1999). "Inductive Logic Programming: Issues, Results and the Challenge of Learning Language in Logic". Artificial Intelligence. 114 (1–2): 283–296. doi:10.1016/s0004-3702(99)00067-3.; here: Sect.2.1
- ^ Nienhuys-Cheng, Shan-hwei; Wolf, Ronald de (1997). Foundations of inductive logic programming. Lecture notes in computer science Lecture notes in artificial intelligence. Berlin Heidelberg: Springer. p. 255. ISBN 978-3-540-62927-6.
- ^ Nienhuys-Cheng, Shan-hwei; Wolf, Ronald de (1997). Foundations of inductive logic programming. Lecture notes in computer science Lecture notes in artificial intelligence. Berlin Heidelberg: Springer. p. 286. ISBN 978-3-540-62927-6.
- ^ a b Nienhuys-Cheng, Shan-hwei; Wolf, Ronald de (1997). Foundations of inductive logic programming. Lecture notes in computer science Lecture notes in artificial intelligence. Berlin Heidelberg: Springer. p. 197. ISBN 978-3-540-62927-6.
- ^ Ray, O.; Broda, K.; Russo, A.M. (2003). "Hybrid abductive inductive learning". Proceedings of the 13th international conference on inductive logic programming. LNCS. Vol. 2835. Springer. pp. 311–328. CiteSeerX 10.1.1.212.6602. doi:10.1007/978-3-540-39917-9_21. ISBN 978-3-540-39917-9.
- ^ Kimber, T.; Broda, K.; Russo, A. (2009). "Induction on failure: learning connected Horn theories". Proceedings of the 10th international conference on logic programing and nonmonotonic reasoning. LNCS. Vol. 575. Springer. pp. 169–181. doi:10.1007/978-3-642-04238-6_16. ISBN 978-3-642-04238-6.
- ^ Yamamoto, Yoshitaka; Inoue, Katsumi; Iwanuma, Koji (2012). "Inverse subsumption for complete explanatory induction" (PDF). Machine Learning. 86: 115–139. doi:10.1007/s10994-011-5250-y. S2CID 11347607.
- ^ Yamamoto, Akihiro (1997). "Which hypotheses can be found with inverse entailment?". International Conference on Inductive Logic Programming. Lecture Notes in Computer Science. Vol. 1297. Springer. pp. 296–308. CiteSeerX 10.1.1.54.2975. doi:10.1007/3540635149_58. ISBN 978-3-540-69587-5.
- ^ a b Kimber, Timothy (2012). Learning definite and normal logic programs by induction on failure (PhD). Imperial College London. ethos 560694. Archived from the original on 2022-10-21. Retrieved 2022-10-21.
- ^ Toth, David (2014). "Imparo is complete by inverse subsumption". arXiv:1407.3836 [cs.AI].
- ^ a b Cropper & Dumančić 2022, p. 795.
- ^ Heindorf, Stefan; Blübaum, Lukas; Düsterhus, Nick; Werner, Till; Golani, Varun Nandkumar; Demir, Caglar; Ngonga Ngomo, Axel-Cyrille (2022). EvoLearner: Learning Description Logics with Evolutionary Algorithms. WWW. arXiv:2111.04879.
- ^ Muggleton, Stephen; Santos, Jose; Tamaddoni-Nezhad, Alireza (2009). "ProGolem: a system based on relative minimal generalization". International Conference on Inductive Logic Programming. Springer. pp. 131–148. CiteSeerX 10.1.1.297.7992. doi:10.1007/978-3-642-13840-9_13. ISBN 978-3-642-13840-9.
- ^ Santos, Jose; Nassif, Houssam; Page, David; Muggleton, Stephen; Sternberg, Mike (2012). "Automated identification of features of protein-ligand interactions using Inductive Logic Programming: a hexose binding case study". BMC Bioinformatics. 13: 162. doi:10.1186/1471-2105-13-162. PMC 3458898. PMID 22783946.
- ^ De Raedt, Luc; Kersting, Kristian (2008), Probabilistic Inductive Logic Programming, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 1–27, doi:10.1007/978-3-540-78652-8_1, ISBN 978-3-540-78651-1, retrieved 2023-12-09
- ^ a b c d e f g h i Riguzzi, Fabrizio; Bellodi, Elena; Zese, Riccardo (2014-09-18). "A History of Probabilistic Inductive Logic Programming". Frontiers in Robotics and AI. 1. doi:10.3389/frobt.2014.00006. ISSN 2296-9144.
- ^ Koller, Daphne; Pfeffer, Avi (August 1997). Learning probabilities for noisy first-order rules (PDF). IJCAI.
- ^ De Raedt, L.; Kersting, K.; Kimmig, A.; Revoredo, K.; Toivonen, H. (March 2008). "Compressing probabilistic Prolog programs". Machine Learning. 70 (2–3): 151–168. doi:10.1007/s10994-007-5030-x. hdl:10138/143971. ISSN 0885-6125.
- ^ Blockeel, Hendrik; Meert, Wannes (2007), "Towards Learning Non-recursive LPADs by Transforming Them into Bayesian Networks", Inductive Logic Programming, Lecture Notes in Computer Science, vol. 4455, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 94–108, doi:10.1007/978-3-540-73847-3_16, ISBN 978-3-540-73846-6, retrieved 2023-12-09
- ^ De Raedt, Luc; Thon, Ingo (2011), Frasconi, Paolo; Lisi, Francesca A. (eds.), "Probabilistic Rule Learning", Inductive Logic Programming, vol. 6489, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 47–58, doi:10.1007/978-3-642-21295-6_9, ISBN 978-3-642-21294-9, S2CID 11727522, retrieved 2023-12-09
- ^ Bellodi, Elena; Riguzzi, Fabrizio (2012), "Learning the Structure of Probabilistic Logic Programs", Inductive Logic Programming, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 61–75, doi:10.1007/978-3-642-31951-8_10, ISBN 978-3-642-31950-1, retrieved 2023-12-09
- ^ Bellodi, Elena; Riguzzi, Fabrizio (2014-01-15). "Structure learning of probabilistic logic programs by searching the clause space". Theory and Practice of Logic Programming. 15 (2): 169–212. arXiv:1309.2080. doi:10.1017/s1471068413000689. hdl:11392/2269014. ISSN 1471-0684. S2CID 17669522.
- Cropper, Andrew; Dumančić, Sebastijan (2022-06-15). "Inductive Logic Programming At 30: A New Introduction". Journal of Artificial Intelligence Research. 74: 765-850. arXiv:2008.07912. doi:10.1613/jair.1.13507. ISSN 1076-9757.
This article incorporates text from a free content work. Licensed under CC-BY 4.0 (license statement/permission). Text taken from A History of Probabilistic Inductive Logic Programming, Fabrizio Riguzzi, Elena Bellodi and Riccardo Zese, Frontiers Media.
Further reading
[edit]- Muggleton, S.; De Raedt, L. (1994). "Inductive Logic Programming: Theory and methods". The Journal of Logic Programming. 19–20: 629–679. doi:10.1016/0743-1066(94)90035-3.
- Lavrac, N.; Dzeroski, S. (1994). Inductive Logic Programming: Techniques and Applications. New York: Ellis Horwood. ISBN 978-0-13-457870-5. Archived from the original on 2004-09-06. Retrieved 2004-09-22.
- Visual example of inducing the grandparenthood relation by the Atom system. http://john-ahlgren.blogspot.com/2014/03/inductive-reasoning-visualized.html Archived 2014-03-26 at the Wayback Machine
Inductive logic programming
View on GrokipediaFundamentals
Definition and objectives
Inductive logic programming (ILP) is a subfield of machine learning that uses logic programming as a uniform representation for examples, background knowledge, and hypotheses, with the goal of inducing first-order clausal theories from positive and negative examples.[2] This approach combines techniques from machine learning and logic programming to automatically synthesize logic programs that generalize over observed data while remaining consistent with prior domain knowledge.[6] As a form of symbolic machine learning, ILP emphasizes the learning of relational and structured knowledge, distinguishing it from statistical or neural methods that often prioritize pattern recognition over explicit rule derivation.[7] The primary objectives of ILP are to produce generalizable hypotheses that cover positive examples, avoid negative examples, and incorporate background knowledge to ensure logical consistency and completeness.[2] These systems aim to handle noisy or incomplete data by allowing for approximate generalizations, such as through weak consistency where hypotheses need only cover most examples rather than all.[8] Additionally, ILP seeks to generate human-interpretable rules in the form of definite clauses, facilitating applications in domains requiring explainable AI, like scientific discovery and knowledge base construction.[7] In contrast to deductive logic programming, which derives specific conclusions from established axioms and facts through logical inference, ILP focuses on induction: learning plausible general rules from limited, empirical observations to predict unseen cases.[2] For example, given positive examples of birds that fly (e.g., tweety(X), bird(X), flies(X)) and negative examples of non-flying birds (e.g., penguin(X), bird(X), not flies(X)), along with background knowledge defining birds, an ILP system might induce the clause:flies(X) :- bird(X).
flies(X) :- bird(X).
Background concepts
In logic programming, knowledge is represented using first-order logic structures. Terms are the basic building blocks, consisting of constants (e.g., names likejohn), variables (e.g., X), or compound terms formed by function symbols applied to other terms (e.g., father(john)).[9] Atoms are predicate symbols applied to terms, representing atomic propositions or relations, such as parent(john, mary).[10] Predicates denote relations with a fixed arity, like parent/2 for binary parent-child relations.[9] Clauses express logical implications as head ← body, where the head is an atom and the body is a conjunction of atoms or conditions; facts are clauses without a body.[10] Horn clauses, a key subset, have at most one positive literal in the head, enabling efficient inference and forming the core of many logic programs.[9] Unification is the matching process that finds the most general substitution making two terms identical, essential for resolution-based proof procedures.[10]
Prolog serves as a foundational implementation language for logic programming, particularly in inductive applications. It is declarative, allowing programmers to specify what relations hold rather than how to compute them, with execution driven by logical inference.[11] Prolog programs consist of Horn clauses, and queries are resolved using SLD resolution (Selective Linear Definite clause resolution), a backward-chaining mechanism that selects a clause, unifies the goal with its head, and recursively resolves the body, exploring depth-first search with backtracking.[11] This operational semantics ensures soundness and completeness for definite clause programs, making Prolog suitable for symbolic reasoning tasks.[10]
Inductive inference in machine learning involves learning general patterns from specific examples to predict unseen data. Generalization refers to a model's ability to perform well on new instances, balancing fit to training data with broader applicability.[12] Overfitting occurs when a model captures noise or idiosyncrasies in the training set, leading to poor generalization, often due to excessive complexity.[12] Empirical risk minimization (ERM) is a foundational principle, selecting a hypothesis that minimizes the average loss on observed data, providing a proxy for true risk minimization under suitable conditions.[13]
In inductive logic programming, background knowledge (denoted as B) plays a crucial role by supplying domain-specific axioms, such as predefined predicates and rules, to constrain the hypothesis space and guide learning from examples.[14] This prior knowledge ensures hypotheses are logically consistent with established facts, reducing overfitting by limiting generalizations to semantically valid ones, and enabling the use of relational structures for more expressive models.[14]
A brief example illustrates these concepts in the family domain. Consider the following Prolog program defining parent relations:
parent(charles1, james1).
parent(elizabeth, james1).
parent(charles2, charles1).
parent(charles1, james1).
parent(elizabeth, james1).
parent(charles2, charles1).
parent/2 is a predicate with facts as ground Horn clauses; querying parent(charles1, X) unifies to yield X = james1 via SLD resolution.[15]
Formal Framework
Learning from entailment
In inductive logic programming (ILP), learning from entailment is a foundational task that seeks to induce a hypothesis , typically represented as a set of logical clauses, from a background theory , a set of positive examples , and a set of negative examples . The objective is to find an such that logically entails every positive example while entailing none of the negative examples. Examples are usually ground definite clauses or atoms, and entailment is assessed via proof procedures in the underlying logic, such as SLD-resolution in Prolog. This setup contrasts with other ILP paradigms by relying on proof-theoretic semantics rather than explicit model enumeration.[1] The coverage condition requires that the hypothesis generalizes the positive examples: , ensuring completeness in explaining the observed positives. Complementarily, the correctness condition demands consistency with the negatives: , preventing overgeneralization that would misclassify negatives as true. A hypothesis satisfying both conditions is deemed correct, though in practice, coverage is often quantified as the difference between entailed positives and entailed negatives to guide search in systems like Aleph. These conditions formalize the inductive bias toward logically sound generalizations within the space defined by .[1] In deterministic settings, learning from entailment assumes complete and noise-free data, aiming for exact solutions where entailment is binary (true or false based on provability). This is common in systems like Progol and Metagol, which enforce strict adherence to the coverage and correctness conditions. Stochastic variants extend this framework to handle uncertainty or incomplete information by introducing probabilistic interpretations of clauses or examples, such as annotating rules with probabilities in a Bayesian network. For instance, probabilistic ILP computes a probability via marginal inference, allowing coverage to be measured as expected log-likelihood rather than strict entailment, which accommodates noisy or partial data where deterministic proofs may fail.[1] A classic illustrative example involves learning a rule for flight from bird-related facts. Suppose includes , , and , with positive example and negative example . A suitable hypothesis is , which satisfies coverage by entailing (since Tweety is a bird but not a penguin) and correctness by not entailing (due to the penguin exception). This demonstrates how exceptions in refine the hypothesis to balance generality and specificity.Learning from interpretations
In inductive logic programming, learning from interpretations is a framework where examples are provided as complete Herbrand interpretations—sets of ground atoms representing explicit models of the world—rather than partial examples or proofs. Given background knowledge (a set of logical clauses), a set of positive interpretations (each is a set of ground atoms denoting true facts in a positive example), and a set of negative interpretations (similarly denoting undesired facts), the task is to induce a hypothesis (a set of clauses) such that every positive interpretation satisfies the theory, i.e., for all , and no negative interpretation satisfies it, i.e., for all .[2] This setting contrasts with learning from entailment by focusing on model satisfaction in given interpretations rather than deriving unobserved consequences from partial background facts.[1] The induced hypothesis ensures that all positive interpretations are models of , while negative interpretations are not, allowing to generalize over complete world descriptions without assuming incompleteness in the examples.[16] This model-intersection approach enables the hypothesis to capture relational structures directly from the provided interpretations, supporting scalable induction in domains where examples represent full configurations.[17] A key advantage of learning from interpretations lies in its suitability for handling explicit world states, particularly in relational data where examples depict complete snapshots, such as graphs or structured databases, avoiding the need to infer hidden facts.[1] For instance, it can induce rules for chess endgames, like the King-Rook vs. King (KRK) scenario, from positive interpretations (board states where a checkmate move is legal) and negative ones (illegal moves), using background knowledge of piece positions to learn constraints such as "a rook attacks if no pieces intervene."[1] This facilitates learning descriptive rules that directly classify or predict based on full relational configurations. This framework also connects to descriptive complexity theory in database theory, where induced hypotheses correspond to logical queries (e.g., Datalog programs) that define views over relational databases, characterizing the computational resources needed to express and learn such queries from example interpretations.[18]Historical Development
Early foundations
The roots of inductive logic programming lie in early artificial intelligence research on inductive inference and structural generalization. In 1970, Gordon D. Plotkin introduced the concept of the least general generalization (LGG), a technique for computing the most specific hypothesis that subsumes two or more given structures or clauses, enabling structural matching in logical representations.[19] This work provided a foundational mechanism for generalization in first-order logic, influencing subsequent approaches to learning from examples without relying on statistical methods. Plotkin's LGG was particularly suited to handling relational data, setting the stage for logic-based induction by addressing subsumption under theta-subsumption.[20] Building on these ideas, several early systems emerged in the mid-1970s that demonstrated practical inductive learning in logical frameworks. The INDUCE system, developed by Ryszard S. Michalski in 1975, was an interactive program for inducing concepts in the predicate calculus from positive and negative examples, using covering rules to generate generalizations while preserving discrimination against counterexamples. Similarly, Tom M. Mitchell's version spaces approach, presented in 1977, formalized candidate elimination for maintaining a set of consistent hypotheses (the version space) bounded by general and specific boundaries, applied to rule learning in attribute-value representations that could extend to logical forms.[21] These systems highlighted the feasibility of heuristic search for inductive generalization, though limited to simpler representations compared to full first-order logic. ILP's development was also shaped by influences from deductive paradigms, including resolution theorem proving and program synthesis. Resolution, formalized by J. Alan Robinson in 1965, provided the inference engine for logic programming, inspiring inverse resolution techniques for "running deduction backwards" to hypothesize rules from observed derivations. Early program synthesis efforts, such as those exploring automated construction of logic programs from specifications in the late 1960s and 1970s, further bridged deduction and induction by treating learning as a form of program completion. Preceding explicit ILP formulations, Ehud Y. Shapiro's 1983 work on algorithmic program debugging employed abductive reasoning to diagnose and correct faulty logic programs by hypothesizing minimal revisions consistent with test cases, demonstrating induction's role in program refinement.[22] A pivotal event in consolidating these strands occurred at the AAAI Workshop on Automating Software Design in 1988, where researchers discussed inductive methods for synthesizing logic programs, catalyzing the formalization of ILP as a distinct field in the early 1990s. This workshop emphasized integrating background knowledge with examples for software design tasks, highlighting the need for first-order inductive frameworks beyond propositional learning.Key advancements
The term "Inductive Logic Programming" (ILP) was first introduced by Stephen Muggleton in 1990. A seminal 1994 survey by Muggleton and Luc de Raedt established ILP as a subfield focused on the inductive construction of first-order clausal theories from examples and background knowledge.[6] This work synthesized earlier efforts, such as the least general generalization (LGG) concept, into a unified framework that emphasized learning in the presence of logical background theories.[6] A major advancement came with the development of the Progol system by Stephen Muggleton in 1995, which introduced inverse entailment as a principled method for hypothesis generation in ILP.[23] Progol operationalized the inversion of logical deduction to produce concise hypotheses from positive and negative examples, demonstrating practical utility in domains like drug design and enabling the learning of rules with high predictive accuracy.[23] Concurrently, theoretical foundations were strengthened by David Haussler's 1989 demonstration of the PAC-learnability of conjunctive concepts in structural domains, providing a probably approximately correct (PAC) framework for bounding the sample complexity of learning simple logic clauses.[24] This was extended by William W. Cohen in 1995, who established PAC-learnability results for classes of recursive logic programs, addressing challenges in learning non-monotonic and recursive structures while highlighting efficiency boundaries for polynomial-time algorithms.[25] Key milestones included the initiation of annual ILP workshops in 1991, which fostered collaboration and propelled the field from niche research to a recognized area of machine learning.[26] In the 2000s, ILP integrated with statistical learning, giving rise to probabilistic ILP (PILP), which combined logical induction with probabilistic inference to handle uncertainty in relational data, as exemplified in early systems like PRISM. By the 2020s, advancements emphasized scalability for big data, with systems like ILASP enabling efficient learning of answer set programs over large datasets through constraint-driven search and brave induction techniques.[27] These developments, including FastLAS's incorporation of domain-specific optimization for faster hypothesis scoring, have expanded ILP's applicability to real-world scenarios involving millions of examples, such as bioinformatics and knowledge graph completion.[28] From 2021 to 2025, ILP has advanced with neural-symbolic hybrids and self-supervised techniques, enhancing applications in program synthesis and knowledge discovery, as seen in ongoing work like the IJCLR conference series.[29]Core Approaches
Bottom-up methods
Bottom-up methods in inductive logic programming operate by starting from specific training examples and systematically generalizing them to form more abstract hypotheses, guided by the background knowledge. This approach generates candidate clauses by covering individual examples or small sets of them, then iteratively generalizes these to broader rules that explain the data while respecting the logical structure of the background theory.[30] The process aligns with learning from entailment, where hypotheses must derive the positive examples as logical consequences.[30] A foundational technique in bottom-up generalization is the least general generalisation (LGG), an algorithm that computes the most specific hypothesis subsuming two given clauses via anti-unification. Developed by Plotkin in 1970, LGG identifies the shared structure between the clauses—such as common predicates and arguments—and replaces non-identical subterms with new variables to form a generalization that is the least general one capable of covering both inputs.[19] The formal steps involve recursively traversing the clause structures: for each pair of corresponding terms, if they are identical, retain the term; if they differ but share the same functor, generalize their arguments; otherwise, introduce a variable to abstract the difference. This produces a clause that theta-subsumes the originals without overgeneralizing beyond their common features.[19] Inverse resolution extends bottom-up generalization by inverting the forward resolution step in theorem proving to derive hypotheses backward from examples and background knowledge. Introduced by Muggleton and Buntine in 1988, it includes operators such as absorption (folding a resolvent into a parent clause), identification (merging clauses with identical bodies), and relative least generalisation (computing LGG relative to background clauses).[31] For instance, given a proof deriving an example fact, inverse resolution reconstructs intermediate clauses by reversing resolutions, yielding rules that explain the example while compressing the theory.[31] These methods excel at handling positive examples, as they build hypotheses directly from data coverage, ensuring completeness in exhaustive searches.[14] However, they are prone to the clause explosion problem, where the space of possible generalizations grows exponentially in complex domains with many examples or rich background knowledge, limiting scalability.[14] A classic example illustrates LGG in action: given background knowledge that Tweety and Oscar are birds (bird(tweety). bird(oscar).), and positive examples flies(tweety). and flies(oscar)., the LGG computes the generalization flies(X) :- bird(X). by unifying the heads and bodies, replacing constants with a variable X.[19] This rule covers both examples with minimal generality, forming a concise hypothesis.
Top-down methods
Top-down methods in inductive logic programming (ILP) initiate the hypothesis search from a maximally general clause, such as a headless clause consisting solely of the target predicate, and progressively specialize it to fit the training examples. This approach employs a refinement operator to generate more specific hypotheses, typically through search strategies like beam search or hill-climbing, ensuring the hypothesis space is explored in a downward direction along the generality ordering (e.g., θ-subsumption). The process continues until the hypothesis adequately covers positive examples while excluding negatives, often using a covering algorithm that learns one clause at a time and removes covered positives iteratively. Refinement operators in top-down methods primarily involve adding new literals to the clause body, instantiating variables with constants or terms from the background knowledge, or occasionally splitting clauses to improve coverage. These operators are constrained by a language bias, such as mode declarations that specify allowable predicate arguments and types, to limit the search space and ensure syntactic well-formedness. For instance, adding a literal likebird(X) to a clause specializes it by requiring the subject to satisfy the bird relation, thereby reducing overgeneralization. Such operators draw from resolution-based techniques in logic programming, where unification guides the addition of compatible literals.[32]
A seminal system exemplifying top-down methods is FOIL, developed by Quinlan in 1990, which extends propositional rule learners like ID3 to first-order Horn clauses. FOIL employs a greedy search, selecting literals that maximize an information gain heuristic—essentially the reduction in uncertainty about whether an example is positive or negative, computed as , where and are positives and negatives covered after adding literal , and subscript 0 denotes before addition. This heuristic prioritizes literals that cover many uncovered positives while excluding negatives. FOIL's algorithm follows a separate-and-conquer strategy, as outlined in the following pseudocode:
FOIL(POS, NEG, Target, Arity):
LearnedRules ← ∅
While POS ≠ ∅ do:
[Clause](/page/Clause) ← Target(V1, ..., VArity) ← // Start with empty body
LocalPOS ← POS, LocalNEG ← NEG
While LocalNEG ≠ ∅ and refinement possible:
Select literal L with highest gain over LocalPOS and LocalNEG
Add L to body of [Clause](/page/Clause)
Update LocalPOS ← {e ∈ LocalPOS | [Clause](/page/Clause) covers e}
Update LocalNEG ← {e ∈ LocalNEG | [Clause](/page/Clause) covers e}
Add [Clause](/page/Clause) to LearnedRules
POS ← POS \ LocalPOS // Remove covered positives
Return LearnedRules
FOIL(POS, NEG, Target, Arity):
LearnedRules ← ∅
While POS ≠ ∅ do:
[Clause](/page/Clause) ← Target(V1, ..., VArity) ← // Start with empty body
LocalPOS ← POS, LocalNEG ← NEG
While LocalNEG ≠ ∅ and refinement possible:
Select literal L with highest gain over LocalPOS and LocalNEG
Add L to body of [Clause](/page/Clause)
Update LocalPOS ← {e ∈ LocalPOS | [Clause](/page/Clause) covers e}
Update LocalNEG ← {e ∈ LocalNEG | [Clause](/page/Clause) covers e}
Add [Clause](/page/Clause) to LearnedRules
POS ← POS \ LocalPOS // Remove covered positives
Return LearnedRules
flies/1 predicate, with positive examples like flies(eagle) and negative examples like flies(penguin). Starting from the general clause flies(X)., which covers all examples indiscriminately, the learner adds the literal bird(X) via refinement, yielding flies(X) :- [bird](/page/Bird)(X).. This specializes the hypothesis to cover flying birds (positives) while excluding non-birds (some negatives), though it may still cover exceptional birds like penguins; further clauses or refinements (e.g., in multi-clause theories) handle such cases by learning exceptions separately. This stepwise specialization demonstrates how top-down methods balance coverage and specificity.[32]
Meta-interpretive learning
Meta-interpretive learning (MIL) is a paradigm within inductive logic programming that enables the automated construction of logic programs by interpreting positive and negative examples through a meta-interpreter, where the resulting proofs are transformed into the learned hypothesis program. This approach leverages higher-order logic to support predicate invention and recursive learning, allowing the system to introduce novel predicates not present in the background knowledge while ensuring the hypothesis covers the examples and avoids the negatives.[33] MIL builds on principles of inverse resolution by using meta-level substitutions to derive clauses, but operates at a higher-order level for more expressive hypothesis spaces. The core mechanism involves applying a set of predefined meta-rules, which are higher-order templates constraining the form of learnable clauses, such as the chain rule ∃PQR ∀xyz P(x,y) ← Q(x,z), R(z,y) or the tail-recursive rule ∃PQ ∀xyz P(x,y) ← Q(x,z), P(z,y).[33] These meta-rules are instantiated via meta-substitution during proof construction in the meta-interpreter, effectively folding and unfolding background predicates to build recursive definitions while respecting order constraints to ensure termination and decidability within the dyadic datalog fragment. Predicate invention occurs naturally when existential variables in meta-rules are substituted with new higher-order Skolem constants, enabling the creation of auxiliary predicates that compress and generalize the examples.[34] A prominent implementation is the Metagol system, which operationalizes MIL in Prolog and has demonstrated efficiency in learning tasks requiring invention, such as defining "ancestor" from kinship examples by inventing a "parent" predicate.[33] Metagol has been extended in variants like MetagolO to optimize for resource efficiency, as in learning an O(n log n) quicksort program from input-output traces of list sorting examples.[35] In this case, MetagolO invents recursive predicates for partitioning and appending, constructing a hypothesis like:sort([],[]) :- !.
sort([X|Xs],Ys) :-
partition(Xs,X,Ls,Gs),
sort(Ls,Ls1),
sort(Gs,Gs1),
append(Ls1,[X|Gs1],Ys).
sort([],[]) :- !.
sort([X|Xs],Ys) :-
partition(Xs,X,Ls,Gs),
sort(Ls,Ls1),
sort(Gs,Gs1),
append(Ls1,[X|Gs1],Ys).
