The FACTORIE Benchmark uses FACTORIE, a toolkit for deployable probabilistic modeling, to extract topics using Latent Dirichlet Allocation (LDA). This task was selected from among the examples distributed with FACTORIE because, unlike that of many other examples, its input data is readily available. Specifically, the input is drawn from a set of NIPS conference papers and has been only minimally modified to conform to FACTORIE’s expectations. Each workload of the FACTORIE Benchmark extracts topics from a subset of these papers.


According to Tim Vieira, one of FACTORIE’s developers, "[it] is (currently) very stateful so parallelism is a bit difficult". The FACTORIE Benchmark is thus both externally and internally single‐threaded.


We are grateful to Tim Vieira for his support in setting up FACTORIE and in identifying possible input data. We are furthermore deeply grateful to the late Sam Roweis for making this data publicly available.