I. Introduction
Probabilistic soft logic (PSL) is a machine learning framework for developing probabilistic models, developed jointly by the University of California, Santa Cruz and the University of Maryland. At present, its complicated environment setup and Groovy-based model syntax cause many difficulties for beginners such as the author, and its many dependencies mean that models that were already built frequently break with small errors.
After some effort, the author has built a single jar package and added an encoding mechanism so that it supports data in various languages. There are three main contributions:
1. PSL's complex dependencies are packaged into a single jar; add it as a dependency to start using PSL.
2. The unfamiliar Groovy model-building methods are all converted to Java, so a model can be built with a single .java file.
3. An encoding mechanism is added so that PSL can handle languages other than English.
Links: https://pan.baidu.com/s/1PybpNoPpvk4jmSMw7Rm_7A Password: g1cx
There are three files in the linked folder:
PSL_swust1.0.jar: the modified PSL model
SimpleAcquaintances.zip: a modified version of the official PSL SimpleAcquaintances example (excluding weight learning and functions)
Entity_resolution.zip: a modified version of the official PSL entity resolution example (including weight learning and functions)
II. Examples
Take SimpleAcquaintances.java in SimpleAcquaintances as an example.
1. Configuration items
/*
 * ======[Configuration Items]======
 */
Tool tool = new Tool();
DataStore datastore;
HashMap<String, Partition> partitions = new HashMap<String, Partition>();
// Change SimpleAcquaintances to the current class name
String path = tool.getPath(new SimpleAcquaintances().getClass()) + "/../data/";
String[] paths = tool.getFiles(path);
// If a PostgreSQL database is installed, H2 can be changed to postgresql
PSLMODEL psl = new PSLMODEL(paths, "H2");
datastore = psl.getDatastore();
// Whether to encode the data (this flag only controls the data; predicates are encoded by default)
psl.transcoding = false;
Replace SimpleAcquaintances with the current class name so that the data folder of the current project can be located. If PostgreSQL is installed and configured, H2 can be changed to PostgreSQL to use a PostgreSQL database (H2 is the default database and runs in memory). When transcoding is set to true, the data is encoded (in that case the predicates are set to the UniqueIntID type, which improves the model's computational efficiency), so data in any language can be processed. After encoding, however, PSL's built-in similarity functions can no longer be used meaningfully, because the encoded data is no longer the original strings. In practice similarity functions are seldom needed, and custom functions still work normally.
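To make the idea of encoding concrete, here is a minimal sketch of one way string constants could be mapped to integer IDs. The class ConstantEncoder and its methods are hypothetical names made up for this illustration; the article does not show how PSL_swust1.0.jar implements transcoding internally, so treat this only as an assumption about the general technique.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not the actual class inside PSL_swust1.0.jar. */
final class ConstantEncoder {
    private final Map<String, Integer> idByValue = new HashMap<>();
    private final List<String> valueById = new ArrayList<>();

    /** Returns the ID of a string, assigning the next free integer on first sight. */
    int encode(String value) {
        Integer id = idByValue.get(value);
        if (id == null) {
            id = valueById.size();
            idByValue.put(value, id);
            valueById.add(value);
        }
        return id;
    }

    /** Maps an ID back to the original string, e.g. when writing results. */
    String decode(int id) {
        return valueById.get(id);
    }

    public static void main(String[] args) {
        ConstantEncoder encoder = new ConstantEncoder();
        // "\u5317\u4eac" is "Beijing" written in Chinese characters.
        int beijing = encoder.encode("\u5317\u4eac");
        int alice = encoder.encode("Alice");
        System.out.println(beijing + " " + alice + " " + encoder.decode(alice));
    }
}
```

Under this scheme the model only ever sees integers, which is why any language can be handled, and also why a string similarity function has nothing meaningful to compare once the data is encoded.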
2. Define partitions
// Weight learning partitions
// partitions.put("learn_obs", datastore.getPartition("learn_obs"));
// partitions.put("learn_target", datastore.getPartition("learn_target"));
// partitions.put("learn_truth", datastore.getPartition("learn_truth"));
// Experiment partitions
datastore = psl.getDatastore();
partitions.put("obs", datastore.getPartition("obs"));
partitions.put("target", datastore.getPartition("target"));
partitions.put("truth", datastore.getPartition("truth"));
psl.setPartitions(partitions);
When weight learning is needed (that is, when training data is available), the weight learning partitions must also be defined. obs is the partition of known (observed) data, target is the partition that stores the target data to be inferred (with lazy inference, no data needs to be loaded into it), and truth is the ground-truth data partition.
3. Define predicates (and functions)
HashMap<String, ConstantType[]> p = new HashMap<String, ConstantType[]>();
HashMap<String, ExternalFunction> f = new HashMap<String, ExternalFunction>();
// Add predicates
p.put("Lived", new ConstantType[] { ConstantType.UniqueStringID, ConstantType.UniqueStringID });
p.put("Likes", new ConstantType[] { ConstantType.UniqueStringID, ConstantType.UniqueStringID });
p.put("Knows", new ConstantType[] { ConstantType.UniqueStringID, ConstantType.UniqueStringID });
// Add functions
// f.put("SameInitials", new SameInitials());
// f.put("SameNumTokens", new SameNumTokens());
// Pass the predicates and functions to the model
psl.definePredicates(p, f);
Predicate definitions can be replaced and added freely. Common argument types include UniqueStringID, UniqueIntID, and String. The functions can be PSL's own similarity functions (only when transcoding is false).
4. Rule Definition
String[] rules = {
    "20.0: ( LIVED(P1, L) & (P1 != P2) & LIVED(P2, L) ) >> KNOWS(P1, P2) ^2",
    "5.0: ( (L1 != L2) & (P1 != P2) & LIVED(P2, L2) & LIVED(P1, L1) ) >> ~( KNOWS(P1, P2) ) ^2",
    "10.0: ( LIKES(P2, L) & (P1 != P2) & LIKES(P1, L) ) >> KNOWS(P1, P2) ^2",
    "5.0: ( KNOWS(P1, P2) & KNOWS(P2, P3) & (P1 != P3) ) >> KNOWS(P1, P3) ^2",
    "1.0 * KNOWS(P1, P2) + -1.0 * KNOWS(P2, P1) = 0.0 .",
    "5.0: ~( KNOWS(P1, P2) ) ^2"
};
// Pass the rules to the model
psl.defineRules(rules);
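Each weighted rule is written as "weight: rule body >> rule head", and a trailing ^2 means the rule's penalty is squared during optimization. A rule that ends with "." instead of starting with a weight, such as the arithmetic rule in the list, is an unweighted constraint. Add further rules following the rule formats used in the two examples.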
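As a rough illustration of what a weight and the ^2 suffix do, the sketch below computes the penalty of one grounded rule using the Lukasiewicz relaxation that PSL is based on. The class and method names are hypothetical and written for this article only; this is not code from the PSL library or from the jar.

```java
/** Hypothetical illustration of a single grounded rule's penalty. */
final class RulePenalty {
    /** Lukasiewicz conjunction of soft truth values in [0, 1]. */
    static double and(double... values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return Math.max(0.0, sum - (values.length - 1));
    }

    /** Distance to satisfaction of "body >> head": how far the body exceeds the head. */
    static double distance(double body, double head) {
        return Math.max(0.0, body - head);
    }

    /** Weighted penalty; squared = true mirrors the "^2" suffix. */
    static double penalty(double weight, double body, double head, boolean squared) {
        double d = distance(body, head);
        return weight * (squared ? d * d : d);
    }

    public static void main(String[] args) {
        // LIVED(P1, L) = 1, (P1 != P2) = 1, LIVED(P2, L) = 1, KNOWS(P1, P2) = 0.5
        double body = and(1.0, 1.0, 1.0);
        System.out.println(penalty(20.0, body, 0.5, true)); // prints 5.0
    }
}
```

A rule whose head is at least as true as its body has distance 0 and costs nothing; a larger weight makes violating the rule more expensive.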
5. Importing data
/*
 * ======[Import data]======
 * "1-2" means that columns 1 and 2 of the data are transcoded.
 * It only takes effect when transcoding = true.
 */
psl.loadData("Lived", path + "Lived_obs.txt", "obs", "1-2");
psl.loadDataTruth("Likes", path + "likes_obs.txt", "obs", "1-2");
psl.loadData("Knows", path + "knows_obs.txt", "obs", "1-2");
psl.loadData("Knows", path + "knows_targets.txt", "target", "1-2");
psl.loadDataTruth("Knows", path + "knows_truth.txt", "truth", "1-2");
// ArrayList<String[]> likepe = tool.fileToArrayList(path + "likes_obs.txt", "1-2-3");
// psl.insertDataTruth("Likes", likepe, "obs");
// psl.insertData("Likes", likepe, "obs");
Four methods are provided for loading data: loadData, loadDataTruth, insertData, and insertDataTruth. The call format is loadData("predicate", "path of the file for that predicate", "partition to import into", "1-2"). loadData and loadDataTruth differ in that the last column of a loadDataTruth file holds the probability value. insertData and insertDataTruth require the file to be converted into list data first, which suits data files that store several predicates. In tool.fileToArrayList, "1-2-3" selects the columns to extract as the predicate's data; for insertDataTruth, the n-th item of "1-2-...-n" in each extracted row is the probability value. In the load methods, "1-2" means that only columns 1 and 2 of the data file are transcoded, and it only takes effect when transcoding = true. To transcode more columns, append them separated by "-".
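The column specification is easiest to understand with a small example. The helper below is a hypothetical sketch of how a string such as "1-2-3" could be turned into column indices and applied to one line of a data file. The class name ColumnSpec, the tab delimiter, and the 1-based numbering are all assumptions; the article does not show the jar's actual parsing code.

```java
/** Hypothetical illustration; not the parsing code used inside the jar. */
final class ColumnSpec {
    /** Parses a spec such as "1-2" or "1-2-3" into zero-based column indices. */
    static int[] parse(String spec) {
        String[] parts = spec.split("-");
        int[] columns = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Assumption: columns in the spec are numbered from 1.
            columns[i] = Integer.parseInt(parts[i].trim()) - 1;
        }
        return columns;
    }

    /** Picks the listed columns from one tab-separated line (delimiter is an assumption). */
    static String[] select(String line, int[] columns) {
        String[] fields = line.split("\t");
        String[] selected = new String[columns.length];
        for (int i = 0; i < columns.length; i++) {
            selected[i] = fields[columns[i]];
        }
        return selected;
    }

    public static void main(String[] args) {
        String[] row = select("Alice\tBob\t0.9", parse("1-2-3"));
        // With "1-2-3", the last selected item would be the probability value.
        System.out.println(String.join(", ", row));
    }
}
```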
6. Weight learning
// psl.learnWeights("learn_target", "Lived-Likes", "learn_obs", "learn_truth","MaxLikelihoodMPE");
The arguments are ("target partition of the training data", "closed predicates" (i.e., predicates whose atoms are known data and will not be generated during inference), "known-data partition of the training data", "ground-truth data partition", "weight learning method").
When training data is available, the rule weights can be optimized by weight learning. Five weight learning methods are implemented:
"LazyMaxLikelihoodMPE",
"MaxLikelihoodMPE",
"MaxPiecewisePseudoLikelihood",
"MaxPseudoLikelihood",
"SimplexSampler"
Any of these names can be substituted for the method name in the call.
7. Print the model
psl.printModel();
This prints the model that has been defined so that you can inspect it.
8. Run inference
// psl.runLazyInference("known data partition", "target partition (stores results)");
// psl.runLazyInference("obs", "target");
// psl.runInference("known data partition", "closed predicate 1-closed predicate 2", "target partition (including defined target atoms)");
psl.runInference("obs", "Lived-Likes", "target");
Two inference methods are provided: lazy MPE inference (runLazyInference) and MPE inference (runInference).
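Conceptually, MPE inference looks for the truth values of the target atoms that minimize the total weighted penalty of all grounded rules. The toy sketch below does this by brute force for a single unknown atom KNOWS(P1, P2) that is pulled up by one supporting rule (weight 20, body fully true) and pulled down by the negative prior (weight 5). The class is hypothetical, and PSL itself uses a proper convex optimizer rather than a grid search.

```java
/** Hypothetical toy example of MPE inference for one unknown atom. */
final class MpeSketch {
    /** Total weighted squared penalty for a candidate value x of KNOWS(P1, P2). */
    static double objective(double x) {
        double supportDistance = Math.max(0.0, 1.0 - x); // "20.0: body >> KNOWS ^2", body = 1
        double priorDistance = Math.max(0.0, x);          // "5.0: ~KNOWS ^2"
        return 20.0 * supportDistance * supportDistance
             + 5.0 * priorDistance * priorDistance;
    }

    /** Brute-force search over [0, 1] in equal steps; for illustration only. */
    static double argmin(int steps) {
        double bestX = 0.0;
        double bestValue = objective(0.0);
        for (int i = 1; i <= steps; i++) {
            double x = (double) i / steps;
            double value = objective(x);
            if (value < bestValue) {
                bestValue = value;
                bestX = x;
            }
        }
        return bestX;
    }

    public static void main(String[] args) {
        System.out.println(argmin(1000));
    }
}
```

The minimum lies at 20 / (20 + 5) = 0.8, so the inferred value is a compromise between the two rules, weighted by how strongly each one is trusted.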
9. Data Output
psl.writeOutput("target", "Knows", path + "/result/knows_inffer.txt");
The arguments are ("target partition", "predicate 1 to output-predicate 2 to output", output path).
10. Evaluate the experimental results
psl.evalResults("target", "truth", "Knows", path + "/result/evalResults.txt");
The arguments are ("target partition", "ground-truth data partition", "target predicate 1-target predicate 2", path of the evaluation output). Note that "target predicate 1-target predicate 2" must include the predicates for all of the data contained in the ground-truth partition.
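The article does not state which metrics evalResults writes to the output file. As one common way to compare inferred values with ground truth, the sketch below thresholds the soft truth values and computes precision, recall, and F1. The class EvalSketch, the 0.5 threshold, and the choice to raise an error for a truth atom with no inferred value are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not the evaluation code used inside the jar. */
final class EvalSketch {
    /** Returns {precision, recall, f1} after thresholding soft truth values. */
    static double[] evaluate(Map<String, Double> inferred, Map<String, Double> truth, double threshold) {
        int tp = 0;
        int fp = 0;
        int fn = 0;
        for (Map.Entry<String, Double> entry : truth.entrySet()) {
            Double predicted = inferred.get(entry.getKey());
            if (predicted == null) {
                // A truth atom without an inferred counterpart cannot be scored.
                throw new IllegalArgumentException("No inferred value for " + entry.getKey());
            }
            boolean actual = entry.getValue() >= threshold;
            boolean guess = predicted >= threshold;
            if (guess && actual) {
                tp++;
            } else if (guess) {
                fp++;
            } else if (actual) {
                fn++;
            }
        }
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        double f1 = precision + recall == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[] { precision, recall, f1 };
    }

    public static void main(String[] args) {
        Map<String, Double> inferred = new HashMap<>();
        inferred.put("Knows(Alice, Bob)", 0.9);
        Map<String, Double> truth = new HashMap<>();
        truth.put("Knows(Alice, Bob)", 1.0);
        double[] scores = evaluate(inferred, truth, 0.5);
        System.out.println(scores[0] + " " + scores[1] + " " + scores[2]);
    }
}
```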
11. Close the model
psl.closeModel();
When inference is complete, close the model.