Microsoft-BCS/BCS-IRSG Karen Spärck Jones Award 2020

I am happy to announce that the winner of the 2020 Microsoft-BCS/BCS IRSG Karen Spärck Jones award (to be presented at ECIR 2021 next year) is Dr. Ahmed Hassan Awadallah (Principal Research Manager at Microsoft AI Research in Redmond, WA, USA).

Ahmed has accepted the award. He will give a talk at ECIR 2021 (originally in Lucca, now online only).

I would like to thank the eight independent judges for their valued contributions. 

Report from ECIR 2020 (online)

This post summarizes some impressions from my virtual attendance of ECIR 2020, which was convened as an online-only conference rather than in Lisbon due to the 2019/2020 Coronavirus pandemic.


I set up three machines with monitors side by side and one laptop with a full-screen Sublime window open to take notes. Also important are a bottle of water and some chocolate within easy reach, as a four-day sitting marathon of binge-watching technical talks can be challenging, and it is not something I had done before. Zoom lets you sign in multiple times to different rooms or even the same room, and I also had windows open with Slack (chat) as well as the conference timetable. [I actually discovered some of this set-up only during Wednesday/day 2, as I was still an online conferencing newbie on Tuesday, busy with running our tutorial.]
Interestingly, I ended up not using the proceedings much, but if you are interested they are free online for one year (e.g. Proc. ECIR Vol. 1: LNCS 12035).


The conference has been growing for a while and is still growing. Topic areas include deep learning, entities in search, evaluation, recommendation, information extraction, retrieval, multimedia, queries, question answering, bias, reproducibility and multilinguality. Of all these areas, deep learning constituted the largest (perceived) body of contributions, and I would say the work on replication was the most distinctive and among the most exciting; IR research is based not on measuring laws of nature but on assessing methods embodied as software artifacts, which incorporate a plenitude of important decisions. Therefore, reproducing and replicating past work is even more important than in the natural sciences.
Nine papers were selected for IR Journal publication (Springer).

Three tutorials were offered, “Principle-to-Program: Neural Methods for Similar Question Retrieval in Online Communities”, “Text Meets Space: Geographic Content Extraction, Resolution and Information Retrieval” and “The Role of Entity Repositories in Information Retrieval”.

The four workshops were the “International Workshop on Algorithmic Bias in Search and Recommendation (Bias 2020)”, “Bibliometric-Enhanced Information Retrieval [10th Anniversary Workshop Edition]”, the “3rd International Workshop on Narrative Extraction from Texts: Text2Story” and the “First International Workshop on Semantic Indexing and Information Retrieval for Health from Heterogeneous Content Types and Languages (SIIRH)”.

According to the organizers, “457 submissions were fielded across all tracks from 57 different countries”. In total, 131 papers were presented (55+46+10+8+12): 55 long papers (26% acceptance rate), 46 short papers (28% acceptance rate), 10 demonstration papers (30% acceptance rate), 8 reproducibility papers (38% acceptance rate) and 12 invited CLEF papers.


My collaborators Ross Purves (Zurich), Katie McDonough (Turing Institute), Bruno Martins (Lisbon) and I ran a tutorial entitled Text Meets Space, which was a four-hour event stretched out over a full day to make space for breaks as well as opening keynotes. Tutorials at ECIR 2020 were not recorded, unlike the main conference, and I did not mind in the least, as it was the first time we had run this one (with the benefit of hindsight, things worked very well).

Check out our slides and materials on GitHub

In the keynote slot, I attended Vanessa Murdock’s talk on doing science in an industry setting, which was very insightful as she has seen different environments at Yahoo, Microsoft and now Amazon. Being the co-organizer of one event meant that I could not sneak out and also attend some of the great events running in parallel, like the workshops or the other two tutorials. The parallel bibliometric IR workshop had its 10th anniversary and featured an interesting line-up; sadly, I later learned that they got “Zoom-bombed” (through a combination of human error and poor security design of Zoom: it puts passwords in URLs!) and experienced e-harassment and vitriol from hooligans defacing their screen. A lesson for organizers: lock your Zoom rooms, don’t share links that include passwords, or avoid Zoom altogether.


Udo Kruschwitz (virtually) handed out the Microsoft-BCS Karen Spärck Jones Award 2019 to Chirag Shah from the University of Washington, who gave the award lecture “Task-Based Intelligent Retrieval and Recommendation”. Chirag has tirelessly taken the position in his research that any search activity ought to be seen in the context of the specific task that it is part of. Chirag’s talk also pointed out the importance of making people aware of what they do not know that they do not know (he coined the term “Information Fostering” for system behavior that exploits user knowledge to improve that kind of awareness). Microsoft have kindly extended the award budget, and Udo and the rest of the BCS-IRSG committee have handed over the honour of chairing the award committee to me, so if you have a rising star researcher in your lab who is within 10 years of their Ph.D., consider nominating them!

Fard, Thonet and Gaussier introduced a minimally supervised, deep-learning-based method for clustering in which cluster formation is informed by seeds. The method is applied to 5 data-sets, including Reuters-21578 (there was one other paper that used Reuters data, RCV1/2, also by Grenoble researchers, Doinychko and Amini). Meng et al.’s ReadNet is a neural model for readability scoring (featuring a nice synopsis of past work on p.37). Rebuffel et al., an interdisciplinary team of Sorbonne, BNP Paribas and Criteo researchers, presented a transformer-based model for NLG from structured data, developed as part of the H2020 AI4EU project. Successful academic-industry collaboration was a recurring topic this year, with several KTP projects, EU projects and commercially-funded collaborations presenting their output. Kato et al. is a very relevant paper for the finance sector, which is interested in company named entities. It proposes models to score entities by various criteria, e.g. a country may be assessed by its crime rate, inter alia (such entity ranking has been part of INEX and TREC). Gerritse, Hasibi and de Vries’ paper on entity ranking can likewise be
applied to companies. Giannopoulos, Kaffes and Kostoulas’ paper “Learning Advanced Similarities and Training Features for Toponym Interlinking” asks: given a pair t1, t2 of place names, do they correspond to the same real-world spatial entity? This is of course location named entity disambiguation without full resolution, and as such an alternative may be to just resolve the toponyms and then check the resulting spatial footprints for equality. Their approach is to define a “meta-similarity” (“LGM-Sim”). Saier and Färber model citation contexts in order to improve recommendations
for scientific papers by including claim evidence. Sikka et al. presented a predictive model to estimate a piece of code’s asymptotic complexity, of course an impossible task in general (still, F1=.65). Brochier et al. (U Lyon)’s “Inductive Document Network Embedding with Topic-Word Attention” introduces “Topic-Word Attention” (TWA), a concept for the interplay between word and topic representations, a bit of progress on the topic model front (see example output p.338). Lu, Du and Nie’s VGCN-BERT extends classic BERT with graph convolutional networks for better text classification (p. 405 ff.) by incorporating global and local information about the vocabulary. Camara and Hauff is an important paper, as it shows how and how much BERT can contribute to core retrieval, and how to analyze this using the framework proposed by Rennings et al. last year (Fang’s “retrieval heuristics” + “diagnostic datasets”).

I liked the ECIR programme for its variety, which ranged from Arabic applications to medical retrieval and from disaster tweets to style transfer in NMT. Submissions from 57 countries, including several acceptances with authors from the US, China, India and South Korea, elevate ECIR’s standing as a not-just-European conference.


Jimmy Lin’s reproducibility talk described attempts to get retrieval system setups to work again after several years, which was a great example of technical debt and software ‘rot’ (‘evolution of the system environment’, as some might call it). He pointed out that from Robertson et al. (1994)’s BM25 model to Lucene’s adoption of the same (2015) it took the community 21 years (TWO DECADES!), and he raised the very valid question of how we may be able to expedite tech transfer between R&D labs and open source packages that were not prima facie spun out from the former.


Unni Krishnan presented a collaboration between Microsoft and academic researchers on creating plausible but synthetic query logs in order to enable data sharing for open research and development on query auto-completion. To apply query auto-complete algorithms, and therefore to do research in this space, what is needed is a set of partial query strings that represent states in a user’s query entry process, e.g.

kung f
kung fu
kung fu panda
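
To make the idea of "states" concrete, here is a minimal sketch that enumerates every prefix a user would pass through while typing a query. The function name `typing_states` and the `min_chars` parameter are my own, purely for illustration; a real log would only contain the states at which the front-end actually sent a request.

```python
def typing_states(query: str, min_chars: int = 1) -> list[str]:
    """Return all prefixes of `query`, shortest first (hypothetical helper)."""
    return [query[:i] for i in range(min_chars, len(query) + 1)]


states = typing_states("kung fu panda")

# The three states quoted above are among the generated prefixes.
for state in ("kung f", "kung fu", "kung fu panda"):
    print(state, state in states)
```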

Notably, the paper introduces the notion of a surrogate log based on abstract(ed) QAC logs <4, 2, 9> (a tuple comprising the lengths of the original query words): matching these signatures leads to finding corresponding target signatures in a synthetic target corpus for a seed signature computed from the unsharable source log. The method proposed covers sampling queries from seeds to accommodate empirically found power law distributions, a language model to find good replacement and substitution words that are used to emulate user typos, and a comparison strategy. The authors demonstrated the efficacy of their proposal on 3 data-sets: Wiki-Clickstream and two 2018 logs from the Microsoft Bing Web search engine. The evaluation shows that the power law property of the natural logs is retained, and that Heaps’ law, N-gram frequency and empirical entropy are consistent with the non-synthetic logs (another paper, Jaiswal, Liu and Frommholz, also tackles auto-complete, but of image queries). Krishnan et al.’s paper won the Signal Industry Award because it enables research that would otherwise not happen or be hidden from the open scientific process. Query auto-complete is a practical problem of many information access systems, and making plausible, even synthetic, data-sets available that can be shared builds a bridge between companies and academic researchers to evaluate their methods on a common reference, whilst protecting companies that would like to share log data from privacy issues, as they also have to protect their users. It is interesting that this work came out of SocialNUI, the Microsoft Research Centre for Social Natural User Interfaces, a partnership between Microsoft Research, Microsoft Australia, the University of Melbourne and the Victorian State Government from 2014 to 2018, hosted by the Interaction Design Lab group in the School of Computing and Information Systems at the University of Melbourne, which makes it a success story in academic-industry collaboration.
[Disclosure: I was on the ECIR 2020 Signal Industry Award selection committee.]
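
As a rough illustration of the signature idea described above (and explicitly not the authors' implementation), a query can be abstracted to the tuple of its word lengths, and candidate replacements can be looked up in a shareable corpus by the same tuple. The function names and the tiny `public_corpus` below are made up for the sketch.

```python
from collections import defaultdict


def signature(query: str) -> tuple[int, ...]:
    """Abstract a query to its word lengths, e.g. 'kung fu panda' -> (4, 2, 5)."""
    return tuple(len(word) for word in query.split())


def index_by_signature(corpus: list[str]) -> dict[tuple[int, ...], list[str]]:
    """Group the strings of a shareable corpus by their signature."""
    index = defaultdict(list)
    for text in corpus:
        index[signature(text)].append(text)
    return index


# Hypothetical public corpus standing in for the synthetic target corpus.
public_corpus = ["kung fu panda", "star wars", "lord of rings", "jazz in munich"]
index = index_by_signature(public_corpus)

# Only the signature of the private query would leave the company.
seed = signature("best of europe")
print(seed, index[seed])
```

The point of the abstraction is that the tuple says nothing about what the user actually typed, yet it still constrains the shape of the surrogate query.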

Zhong et al. presented work on mathematical formula search done by the CiteSeerX team (there were also papers on finding tables and chemical compounds). Uprety et al. model users’ decision making, in particular the uncertainty pertaining to it, using the mathematical framework behind quantum physics. Witschel, Riesen and Grether present a question answering system over knowledge graphs: questions are translated into Cypher, an open-source graph query language (p.789).

Wang et al. is a great paper on searching news archives, in particular how to answer event questions for which the temporal dimension must be modelled to do well.

The reproducibility track was dominated by Jimmy Lin’s group, which presented several papers there. My favorite, Lin and Zhang, re-ran old experiments using popular IR engines, and found out that due to technical debt of the platforms, often systems could no longer be executed after a few years; interestingly, this didn’t apply to the Terrier platform, as it is written in Java, whose virtual machine insulates the software from the (changing) external environment. (The reproducibility track at ECIR is always worth stopping by.)

Ghanem et al. presented a method to detect irony in several languages, with F1 between 74% and 80%. Hashemi et al. presented ANTIQUE, an open-domain question answering test collection (|Q|=2,626; |Rel|=34k). They also presented some baseline benchmark results for open-domain QA models (vol. II, p.171), e.g. BERT P@1=70.92. Ishigaki et al. presented a new neural abstractive summarization model, which is query-informed (vol. II, p.210). Machida et al., another paper on summarization, is extractive and uses question-answer pairs (vol. II, p.291).

Once in a while you can find a really clever idea explored in a paper, as Papadakos and Kalipolitis did, in my opinion, with their study of antonyms in Web queries (vol. II, p.356): using query pairs containing antonyms like “capitalism and war” versus “capitalism and peace”, they explore how considering/bringing in the “opposite” query can help systems help users get a better idea of the conceptual result space.

Sanchez et al. is an interesting application, namely keeping tabs on the evolution of legislation. The team of authors from Signal AI and UCL use a combination of learning to rank and BERT (part II, p.372).
Researchers at Bloomberg London presented “Identifying Notable News Stories” (Saravanou, Stefanoni and Meij), a paper in which new stories are compared to past (known) notable events. Devezas and Nunes gave an online demo of Army ANT, a piece of software to conveniently conduct IR experiments that is essentially a Python wrapper around collections, indices and retrieval evaluations. Froebe et al. is a demo of a search engine for German police reports, where a news story about a crime (its URL) is the query and the top-k police reports are retrieved to support the veracity of the story (I also learned from the paper that there was a hostage situation in 2016 in a cinema I used to go to, which had eluded me completely due to my expatriate existence). Martins and Mourão’s system Revisionista.PT tracks post-publication edits across 140k articles by 12 Portuguese news outlets.

At the very interesting Industry Day, Pedro Nogueira of Farfetch gave a talk on faceted search in the fashion industry (I should say Levi Strauss was a sponsor of ECIR 2020, so fashion definitely met retrieval this year). Farfetch provides a global high-end fashion search that gives people anywhere access to luxury fashion. It has 4,500+ staff and an average spend per order of $608 (!). This is a high-end, high-margin niche market play, where trends change fast, but they seem to be doing rather well.

Ashlee Edwards, a scientist at Netflix, presented some of her research and
product innovation work in the areas of consumer insight, design and product
management. Her background includes studies on stress caused by software and on
subtitle usability for blind viewers of movies online.

Overall, I found attending the conference very useful, but not quite fun. Four days
of technical material is challenging to endure in a strange place with jet lag,
but sitting through it alone at home in your room frankly makes one’s back hurt,
and is even more lonely than working from home, where at least you interact
more virtually. I was looking forward to this event as much as to Lisbon
itself, and no Webinar can make up for consuming pastéis de nata in the
conference break and meeting your Portuguese friends again over a dinner with
Fado in a down-to-earth restaurant with Lisbon’s signature blue-white tiled
walls. Normally, one returns home energized and with a set of photos, but a
virtual event does not give you much back emotionally: for example, it is hard
or close to impossible to meet new people at a virtual conference (who would
just double click on someone’s name on Slack to talk to a complete stranger?),
so I feel for all those students for whom this may have been the first conference,
that they did not get the rite of passage that comes with attending your first
one and giving that first ever talk. I liked Tuesday and Friday best, because
the interactive tutorial presentations were engaging and the industry day was
lighter and therefore perhaps easier to sit through remotely.
For completeness’ sake, I should also point out that online-only
conferences of course have advantages: they might be easier to attend for
students struggling with visa issues or small travel budgets, especially if
they are based far from Europe. The environment is less burdened by flights,
and people may join presentations who were not planning to attend the
physical conference (possible because the organizers opened up the event to
the world, which increases the exposure for the work published).
The ECIR organizers deserve a big thanks as they rescued our ECIR 2020
conference by turning it into an online event spontaneously. Organizing any
event is hard enough, but they did it twice, first the real conference and
then the online incarnation (not to mention handing 1,600+ pages of
proceedings down to us!).