
Reviewing Elicit

During the semester, the library staff at the Université Libre de Bruxelles offered a series of 15-minute introductions to research tools for students. One of the mini-courses that I attended was about an AI literature review tool called Elicit. Afterwards, I played around with the tool in two sessions, the first in March and the second a few days ago. In this post, I review my experience with Elicit so far.

But first, why did I try Elicit? At the time, I wondered what would happen if we selected research literature based on textual patterns tied to the keywords we put in. Two challenges that I experienced in finding literature through traditional keyword matching were 1/ finding the keywords that researchers use in their historiography of Chinese overseas students and 2/ selecting the threads of literature that focus on the subtopics I cared about. So, the idea of using an LLM that draws higher-level connections between different bodies of literature and makes selections based on a defined context sounded very promising.

After the crash course, I played around with one function of Elicit (free version), which is "finding papers." Basically, you type in a research question and you are taken to a page that lists the top ten articles, accompanied by an AI summary based on those articles. The prompt I used for the experiment asked how researchers have defined mobility and migration in the context of Chinese overseas students in the 20th century.

The results I got were... weird. Firstly, most of the selected articles had nothing to do with the time period I am interested in. Secondly, there were book reviews in my selection, which (yeah) are about research, but are not research themselves.

What likely happened is that the limitations of the underlying data came through. The LLM-based literature search in Elicit seems to be similar to the traditional keyword search in a university database. It likely relies on matching keywords in titles and abstracts, with the key difference that it also ranks results by how similar the abstracts are to the research question in terms of word patterns (which is what the top ten list is based on). Such a setup can be very powerful, unless the pool of texts that the model is trained on is limited. I suspect that, because of Elicit's narrow focus on research papers, the training data cannot cover all of the relevant literature produced in the humanities, and in history in particular, which relies more on books and chapters than the natural and health sciences do.
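To make that guess concrete, here is a minimal Python sketch of what ranking by word-pattern similarity could look like. It is not Elicit's actual method, which I do not know: it assumes plain word counts and cosine similarity, and the function names, the three abstracts, and the question string are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(question, abstracts, top_n=10):
    """Return up to top_n (score, title) pairs, most similar first."""
    q = Counter(tokenize(question))
    scored = [
        (cosine_similarity(q, Counter(tokenize(text))), title)
        for title, text in abstracts.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Hypothetical abstracts, invented for this example only.
ABSTRACTS = {
    "Book review (hypothetical)":
        "A review of a recent book on Chinese students and migration overseas.",
    "Work-study chapter (hypothetical)":
        "Sojourners in the work-study movement travelled to France in 1919.",
    "Unrelated paper (hypothetical)":
        "Protein folding dynamics in yeast cells.",
}
QUESTION = "Chinese overseas students mobility and migration"

for score, title in rank(QUESTION, ABSTRACTS):
    print(f"{score:.2f}  {title}")
```

In this toy setup the book review comes out on top because it repeats the words of the question, while the chapter that is actually on topic scores zero because it shares no vocabulary with the question. A real system presumably uses something more sophisticated than word counts, but the sketch shows how both the book reviews and the keyword gap could arise from similarity ranking alone.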

In other words, when using LLM-based search, I still faced the same problem of “keyword gaps” that I had with traditional keyword search, except that I had the additional burden of figuring out algorithmic idiosyncrasies in order to “prompt my way around them.”

Did I use part of the end result? No.

Will I re-use Elicit? No.