Goodreads and the Hard Problem of Book Recommendations
Even aside from the problems of reviewers who have unrealistic standards, who haven't read the book, or who hate the author - Goodreads would still have a hard problem.
When you're one of the leading websites in your niche, there probably is such a thing as bad publicity, and Goodreads was getting it a couple months ago.
First-time author Cecilia Rabess had her debut novel slammed with bad reviews six months before it was to be released. After a plot summary went viral on Twitter (apparently, a Black protagonist falls in love with an initially-bigoted coworker), an Internet flash mob attacked the book as inherently racist and "review-bombed" it with a flood of simultaneous bad reviews. The book did get some buzz, but very few sales. Even more recently, established author Elizabeth Gilbert withdrew her latest book from publication after it was slammed with bad pre-release reviews by people objecting to a novel set in Russia being published during the invasion of Ukraine.
(And then after the publication of these news articles, Goodreads went down for several hours for unclear reasons, and The Onion chimed in with a satire.)
On the one hand, one author has seen her first novel flop, and another author has decided to not publish her novel. That's horrible. We could blame the crowd of anonymous commenters and "reviewers" for that, and insist that Goodreads patch this individual issue and stop things like this from happening. I agree with that much: I'm disturbed by the level of constraints they want to put on future authors, well beyond what would help the story by prompting further thinking. Or, we could blame the publishing industry for prioritizing commercial considerations above good stories; I can see some avenues by which they might have helped create the environment where this could happen.
But let's go even deeper than this. Let's talk about the inherent difficulties with what Goodreads has set themselves to do, which they'd still be faced with even if they magically found a solution to the problem of "review-bombing" online mobs.
Even aside from the problems of reviewers who have unrealistic standards, reviewers who haven't read the book, and reviewers who just hate the author - even pretending that all reviewers are talking about books they've read sincerely and in good faith, Goodreads would still have a hard problem.
The trademark of Goodreads recommendations, which visitors will see at first glance next to the book's title and cover, is the star rating. That's an average of every rating people have given to the book: if it's been rated by four people, and two of them rate it three stars and the other two rate it four stars, Goodreads will show it as having three and a half stars. However, this single number summarizing what everyone thinks about a book isn't actually useful for much.
For example, I know my own taste varies from even my sister's, let alone my parents' or friends'. When I recommend books to my sister, I'll carefully filter for her taste, which gives a very different result from recommending a book in general. I learned to do this back when we were teenagers and I urged her to read Cities in Flight. I loved the book, but she hated it because there was absolutely nothing to grab her interest there. Looking back, I can totally see why - she wanted better characters and didn't care for grandness of science-fiction premises or plots. A Goodreads star rating will merge tens of thousands of perspectives like this. In the end, it's useful for sociological comparisons (like I did once), but not much else.
Detailed reviews - which Goodreads also has, if you click through and scroll down - can go a long way toward ameliorating this problem with star ratings. When I use Goodreads to check out a book that's already been recommended to me, I barely glance at the rating and mostly look at the reviews. If the positive reviews are gushing about things I don't care for, that's bad; if the negative reviews are complaining about things I don't mind, that's good. I don't use them for judgment; I use them to effectively peek inside the book through the reviewers' eyes.
Amazon - the corporate owner of Goodreads - tries to make recommendations more personal in the recommendation engine on its main site: when you're browsing books, you see "Books You Might Like" based on the books you've viewed and bought in the past, and other things that it deems similar. One thing it goes by is what people who've bought the same things as you have also bought. This's a good strategy: if I know someone likes my favorite authors, I'll lend more trust to their other book recommendations. It's too bad Amazon didn't extend this to Goodreads and fold it into the star rating system. According to this one article, they wanted to but were stymied by Goodreads' convoluted programming code; I totally believe it.
But then, I'm not the prototypical target audience for Goodreads. I haven't really gone looking for book recommendations in years, because my "To Read" list is already a few hundred titles long. I get my book recommendations in passing from places I'm looking for other reasons, like getting history book recommendations from /r/AskHistorians, or getting novel recommendations from authors whose own books I like or online forum commenters whose taste I already know. This's a much higher-variance strategy than going by the star rating on Goodreads, and it requires more effort, but it's more exact. I'm not sure why more people don't do this, but I suspect it's because they aren't naturally in so many places where people are discussing books, so they would need to actually put in that effort. Also, it's quite possible I'm unusually self-aware of the ways in which my taste in books differs from the average.
If I were able to design a computerized book recommendation system, what I'd do is weight each Goodreads rating by how much that reviewer's opinion agrees with yours about books you've both read. So, if Alice and Bob both like one book that Carol doesn't like, when Alice goes looking at other books, Bob's rating will have more weight than Carol's in the overall star rating that Alice sees.
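To make the Alice, Bob, and Carol example concrete, here is a minimal sketch of that weighting in Python. Everything in it is an assumption made for illustration: the function names, the sample ratings, and especially the agreement formula (one minus the average star gap on shared books, scaled to the 1-5 range). A real system would likely use something more robust, and nothing here reflects how Goodreads actually works.

```python
from typing import Dict, Optional

# ratings[reader][book] = stars on a 1-5 scale. All names and data are hypothetical.
Ratings = Dict[str, Dict[str, float]]


def agreement(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Weight in [0, 1]: 1 when two readers gave identical stars to every
    book they both rated, 0 when they are as far apart as the scale allows.
    Readers with no books in common get a weight of 0."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    mean_gap = sum(abs(a[t] - b[t]) for t in shared) / len(shared)
    return 1.0 - mean_gap / 4.0  # 4 is the largest possible gap on 1-5 stars


def personalized_rating(ratings: Ratings, viewer: str, book: str) -> Optional[float]:
    """Average of other readers' stars for `book`, weighted by how closely
    each of them agrees with `viewer`. Returns None if nobody with a
    nonzero weight has rated the book."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == viewer or book not in theirs:
            continue
        w = agreement(ratings[viewer], theirs)
        num += w * theirs[book]
        den += w
    return num / den if den > 0 else None


ratings: Ratings = {
    "alice": {"Book A": 5},
    "bob": {"Book A": 5, "Book B": 4},    # agrees with Alice about Book A
    "carol": {"Book A": 2, "Book B": 2},  # disagrees with Alice about Book A
}

# A plain average of Book B would be 3.0; Alice sees a number pulled toward Bob's 4.
print(personalized_rating(ratings, "alice", "Book B"))
```

In this toy data Bob's weight for Alice is 1.0 and Carol's is 0.25, so Bob's four stars count four times as much as Carol's two.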
On the downside, this would be expensive to compute when there're a lot of reviews. On the upside, this's very similar to the sort of thing large language models like ChatGPT already do. ChatGPT is trained to determine what's a likely text to come after the prompt, just like this recommendation system is computing what's a likely book to align with the ones you've read.
Of course, ChatGPT itself would be a very bad book recommendation engine. It's not designed to do it, and when you make it try, it recommends books that don't exist or don't say what it claims they say. This's primarily because it isn't built to recognize individual book titles as things - it just knows words and phrases; it doesn't tokenize titles as units. On top of that, it's trained on as much English text as its trainers can find, rather than focusing on things related to books. But if someone trained a similar AI to make book recommendations - emphasizing the right training data, and tokenizing in the right ways - I think it would do a good job of this.
But, this system of weighting reviews, or even this hypothetical AI trained in a similar manner, doesn't really capture the whole problem. There're many dimensions to a story. I'm reminded of Michael Flynn's Eifelheim: it's badly written stylistically, plot threads go next to nowhere or peter out, and a lot of the major characters are unsympathetic. But I'll pardon all those faults because the concept of aliens in medieval Europe (from the sympathetic perspective of the village priest) is something I personally like just that much. If an algorithm notices I liked Eifelheim and recommends me more books with similar plots and characters - well, when I tried reading two more of Flynn's books, I abandoned both of them a chapter or two in. They appeared to have similar poor writing and unsympathetic characters, and without the great concept and dynamic of Eifelheim, those same Flynn-style things drove me away.
(I noticed a similar trend with the amateur online counterfactual history community, back when I was in it. Most of us knew that the stories we wrote about the "what-if"'s of history were of low literary merit, but we read them anyway because there weren't any published authors that got into the details of history to the same extent. Just like with Eifelheim, I read and enjoyed them for that one element. This was an itch we wanted to scratch. I suspect a lot of low-literary-merit fanfiction is read for the same reason: it scratches an itch its readers have.)
In theory, a review algorithm could ask me not just what I thought of Eifelheim but what I thought of each different element in it, and then recommend me not just based on which books I liked but which elements I appreciated in them. In practice though, even I would get impatient after too many questions like this, and I expect other people who're less introspective about their taste in books would get fed up much faster.
So, I think the least bad practical approach would be the weighted book recommendation system I described above. And then, on top of that, advertise different concepts adjacent in some way to the genres someone likes. For example, perhaps an algorithm could recommend Eifelheim to someone who likes first-contact science fiction and religious fiction even if other adjacencies wouldn't normally lead to it? I checked out Eifelheim, and the online counterfactual history community, the moment I heard their concepts, even though I hadn't considered those particular concepts before.
Bottom line: what should Goodreads do?
Amazon was hoping to bring in more social community elements. That wouldn't be a bad thing in itself, and if done well, it would also provide more personal connection allowing for more personalized book recommendations (like I get from /r/AskHistorians and other web forums). I'm skeptical that would've worked on any large scale, though. And as it was, it was apparently stymied by Goodreads' codebase - which unfortunately also means it couldn't be revised into a weighted recommendation system.
As it is, at the moment, Goodreads is left with the problem of review-bombing. They could prevent bombing of not-yet-released books. I'm sure none of the 505 reviews of George R. R. Martin's The Winds of Winter are legitimate; he hasn't even finished writing the book yet! But that still leaves similar problems with reviews of already-published books from readers whose experience is unrepresentative of the average person who'll read the book, or reviewers who haven’t read the book at all. You could perhaps gate reviews on having read several books in the same genre, but even that wouldn't stop most of the problem. If you can't reform the whole rating system, this's a hard problem.
I don't know what Goodreads or other review aggregation sites might be doing about these immediate issues. But whatever they do about them, they'll still be left with the inherently hard problem of giving book recommendations at scale.
I toyed around with a different solution to this problem - adding more metrics. What if users rated a book with 1-5 stars, but also on its pacing, tone, style, etc? Then you could look for books that have a high rating AND the other elements you enjoy.
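A rough sketch of how that "high rating AND the other elements" lookup could work is below. It assumes each aspect (pacing, tone, style, and so on) is also scored 1-5, with higher meaning the reviewer liked that aspect more. The `Review` class, the function names, and the sample reviews are all hypothetical, and this is not a description of how the prototype linked below is built.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class Review:
    stars: int                                              # overall 1-5 rating
    aspects: Dict[str, int] = field(default_factory=dict)   # e.g. {"pacing": 4}


def aspect_averages(reviews: List[Review]) -> Dict[str, float]:
    """Average each aspect over the reviews that actually scored it."""
    scores: Dict[str, List[int]] = {}
    for review in reviews:
        for name, score in review.aspects.items():
            scores.setdefault(name, []).append(score)
    return {name: mean(values) for name, values in scores.items()}


def matches(reviews: List[Review], min_stars: float, wanted: Dict[str, float]) -> bool:
    """True if the book's average stars reach `min_stars` AND every aspect
    in `wanted` reaches its floor. Aspects nobody scored count as failing."""
    if not reviews:
        return False
    if mean(r.stars for r in reviews) < min_stars:
        return False
    averages = aspect_averages(reviews)
    return all(averages.get(name, 0.0) >= floor for name, floor in wanted.items())


reviews = [
    Review(stars=4, aspects={"pacing": 5, "style": 3}),
    Review(stars=5, aspects={"pacing": 4}),
]

print(matches(reviews, 4.0, {"pacing": 4.0}))  # well rated and well paced
print(matches(reviews, 4.0, {"style": 4.0}))   # well rated, but style falls short
```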
For what it’s worth, I built a prototype of this. Haven’t fully decided if the idea is working or not, but here you go: https://inkhrt.com.
Weighting review-scores by agreement is not actually all that computationally costly, since there are various tricks for reducing the compute needed (using only a random sample of all reviews; binning readers into categories; relying on particularly significant differences, etc.), and it can be done ahead of time. There are a lot of people working very hard to optimize recommendation engines (for ad serving especially) and I'm sure Amazon has a bunch of them on staff. It's most likely that the issue is just Goodreads never having been intended to be so large.
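As an illustration of two of those tricks, here is a small sketch that combines precomputed agreement weights with a random sample of reviewers. The function name, the data layout, and the fixed seed are assumptions made for the example; production recommenders are far more elaborate than this.

```python
import random
from typing import Dict, Optional


def sampled_weighted_rating(
    book_ratings: Dict[str, float],   # reviewer -> stars for one book
    weights: Dict[str, float],        # reviewer -> agreement weight, computed ahead of time
    sample_size: int,
    seed: int = 0,
) -> Optional[float]:
    """Weighted average over at most `sample_size` randomly chosen reviewers.
    Reviewers missing from `weights` count as weight 0."""
    reviewers = sorted(book_ratings)  # sorted so a given seed gives a repeatable sample
    if len(reviewers) > sample_size:
        reviewers = random.Random(seed).sample(reviewers, sample_size)
    num = sum(weights.get(r, 0.0) * book_ratings[r] for r in reviewers)
    den = sum(weights.get(r, 0.0) for r in reviewers)
    return num / den if den > 0 else None


# Hypothetical weights for one viewer, as if stored by an offline job.
weights = {"bob": 1.0, "carol": 0.25}
book_ratings = {"bob": 4, "carol": 2, "dave": 5}  # dave has no overlap, so weight 0

print(sampled_weighted_rating(book_ratings, weights, sample_size=10))
```

The cost per lookup is then bounded by `sample_size` rather than by the number of reviews, at the price of some noise in the displayed number.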
Implementing a good recommender system would solve some of the review bombing problem too, since those bombers would be significantly weighted downwards, I think.