Suggestions for future improvements to our forum software

I edited the above just fine (deleted some words) and no smart quotes in sight. Remember that typographical entity processing only happens after you post, not before.

(now if you are doing something really weird like manually copying text out of the preview window instead of the editor, God Help You)

So… search gets an A+ for locating the only post on the forum that has a unique and obscure string in it. Good work.

But let’s try out some use cases of things I might actually search for in the real world. Say, I just finished a game and want to post my thoughts about it:

It surfaced the right thread just fine, but please explain under what conceivable logic the appropriate arrival point in that thread should be the undistinguished post #42/225, which contains none of my search terms?

Why is it at all surprising that this feels random, @wumpus?

I’ll give this use case a B-. It gets me most of the way to the right place, but requires a pointless extra step to get to the bottom to start replying, following a moment of “why am I here?” disorientation.

I think the disconnect is that my thought process when searching is almost always “Where was the thread about ______ ?” rather than “Where was the post that contained ‘__________’ ?” I naturally want to land at my own personal last-read position in that thread, just as if I had clicked it from one of the other navigation methods.

The obvious fix to me would be to decouple the thread title from the text snippet preview in terms of links. If you want the current functionality, click on the text snippet and go straight to that post. If you want to go to your last-read position, click on the thread title.
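Roughly what I have in mind, as a small Python sketch. Everything here is hypothetical: the `result_links` helper, the `ResultLinks` type, and the `/t/<slug>/<id>/<post_number>` path shape are assumptions for illustration, not how Discourse actually builds its search results.

```python
from typing import NamedTuple, Optional


class ResultLinks(NamedTuple):
    """The two separate link targets for one search result (hypothetical)."""
    title_href: str    # where clicking the thread title would go
    snippet_href: str  # where clicking the text snippet would go


def result_links(base: str, slug: str, topic_id: int,
                 matched_post: int,
                 last_read_post: Optional[int]) -> ResultLinks:
    """Build decoupled links: title -> last-read position, snippet -> matched post.

    The path layout is an assumed URL shape, used only to show the idea.
    """
    topic = f"{base}/t/{slug}/{topic_id}"
    # If the reader has never opened the thread, resume at the first post.
    resume_at = last_read_post if last_read_post is not None else 1
    return ResultLinks(
        title_href=f"{topic}/{resume_at}",
        snippet_href=f"{topic}/{matched_post}",
    )


# Made-up example values: matched post 42, reader last read post 225.
links = result_links("https://forum.example.invalid", "nier-automata", 123, 42, 225)
```

The point is only that one result row carries two destinations instead of one, so neither use case has to win.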

And as long as we’re talking about the search function’s deficiencies, it’s got a severe recency bias that makes it all-but-useless for locating older threads. There’s a valid argument for a moderate preference for more recent results, but not to anywhere near the extent that it’s done here.

Two issues there. First is that the preview text snippet for some reason highlights spurious partial matches like ‘looking’ rather than the verbatim match of the entire search term elsewhere in the same post. More importantly, we have a 500+ post thread chock-full of juicy, detailed discussion, dedicated to that game, but Discourse thinks I want random drive-by mentions in unrelated threads, just because they happen to be more recent?! F-!

Compare with Google, which surfaces no less than 8 dedicated threads before getting to the first match that’s just a mention in a thread on another topic:

That is indeed odd, I can’t explain why post #42 would be the target for a search on “Nier Automata” without the quotes. I’ve definitely never been in that topic, and since we both got shunted there… I dunno, let me look into that one.

Was there something done to the post DB around February? While playing around with it, I noticed that the search results for most active threads wind up sending you somewhere around that timeframe.

In general I do not agree that every search is de facto searching for a topic title, though topic title matches should get a pretty hard bump in the search results. We could debate how strong that effect should be.

However, I do agree that we should have a “scope to just the topic title” advanced search option; this is as close as I can get right now:

https://forum.quartertothree.com/search?expanded=true&q=in%3Afirst%20king%20of%20dragon%20pass

in other words “scope to just the first post” … as it turns out, there are a disturbing number of topics where the first post body text does indeed contain the words “king of dragon pass”.

Yeah, I wouldn’t want to completely take away the ability to pinpoint the most recent usage of a specific phrase, it’s just the opposite of what I tend to use search for, so I find it being the assumed goal of search annoying.

Agreed. But even without the title or scoping to the first post, there’s something to be said for volume and concentration of search matches as a heuristic when trying to find the thread about a particular topic rather than just a random mention.

For example, take ‘trails in the sky’. The top hit is What will you play in the weekend? , which contains four matches of the search term spread across 1,730 posts. But even without the thread title as a clue, a more sophisticated search algorithm could recognize that Excellent classic JRPG on Steam NOW, $16.99 contains a dozen matches in 90 posts, so a much larger percentage of the thread is about the search term topic. But that thread isn’t even in the top 5 results, beaten out by, of all threads, Bethesda finally let the other shoe drop on paid mods at E3 2017, thanks to its several matches of ‘Skyrim’ and none at all of ‘Trails’.
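To put numbers on that, here is a rough Python sketch of the match-density idea using the counts quoted above (taking "a dozen" as 12). The `match_density` helper is made up for illustration, and it uses matches per post as a stand-in for "share of the thread about the search term"; it is not a description of what Discourse search actually computes.

```python
def match_density(match_count: int, post_count: int) -> float:
    """Matches per post in a thread; a hypothetical relevance signal."""
    if post_count <= 0:
        return 0.0
    return match_count / post_count


# (matches of the search term, total posts), as quoted in this post.
threads = {
    "What will you play in the weekend?": (4, 1730),
    "Excellent classic JRPG on Steam NOW, $16.99": (12, 90),
}

# Rank threads by density, densest first.
ranked = sorted(threads, key=lambda title: match_density(*threads[title]),
                reverse=True)

for title in ranked:
    print(f"{match_density(*threads[title]):.4f}  {title}")
```

By this measure the 90-post thread is over fifty times denser in matches than the 1,730-post one, even though neither title contains the search term.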

Did you try it in quotes?

https://forum.quartertothree.com/search?q=%22trails%20in%20the%20sky%22

The highlighting is definitely wonky (it should only highlight the phrase when quotes are used), but I think this works, and should work.

It gets rid of the Skyrim and No Man’s Sky threads, but doesn’t make enough of a difference thanks to the priority given to recent drive-by mentions over older dedicated threads. Again, it’s not that the obviously correct thread results don’t show up at all, it’s just that they’re a ways down in the weeds, beaten out by threads that shouldn’t even be close to the same relevance to the search term.

And honestly, if the correct threads aren’t surfaced in the 5 results visible in the popup box, I’m much more likely to just pop over to Google rather than futzing with advanced search or different search strings to find them. Witness:

Interestingly, I noticed that all the Google results for Qt3 threads claim that they have only 20 posts and even fewer authors. They’re obviously crawling the whole thread given that they do a much better job surfacing relevant threads than Discourse search does, but I wonder if there’s something you can adjust to present accurate stats there.

To chime in, @wumpus, my almost exclusive use case for search is finding whole threads about a topic, not a specific post with an almost unique search term buried in it. Take the default behavior when I search for, say, Skyrim Mod Sales: I get a clickable link to some seemingly random spot in the thread about paid mods for Skyrim (let’s just say, for shits and giggles, that it seemingly randomly surfaces, oh, the 127th post out of the, say, 600 or so posts in this thread I’ve never actually read before, and yes, those numbers are pulled out of my ass cuz I’m on my phone and lazy). Clicking it just drops me, context free, into the middle of a related thread, probably breaking my Last Read position in said thread in the process, when my ideal behavior would have been to click it and be taken to my first unread post (in this theoretical, the very first).

Maintaining the current specific post function you described above is certainly okay for other uses, but I join the others in saying that search functionality that better serves thread-based searching (surfacing threads completely about the topic searched for and then linking you to the top or first unread post) would be enormously useful.

Sure, a solution was already posted for that. If you want only topic title matches, use

in:first

thus don’t search for

random

search for

in:first random

If you forget the words, just visit the full search page at https://forum.quartertothree.com/search and use the UI there which will insert the right words for you.
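If you would rather build the link by hand, here is a minimal Python sketch. It assumes only what the links posted earlier in this thread show: the search page takes the query in a `q` parameter, percent-encoded. The `first_post_search_url` helper name is made up for illustration, and this only shows the URL shape, not what the search will return.

```python
from urllib.parse import quote

SEARCH_URL = "https://forum.quartertothree.com/search"


def first_post_search_url(terms: str, exact_phrase: bool = False) -> str:
    """Return a search URL scoped to first posts via the in:first filter."""
    if exact_phrase:
        # Wrap the terms in double quotes to ask for the verbatim phrase.
        terms = f'"{terms}"'
    query = f"in:first {terms}"
    # safe="" so that ':' and '"' are percent-encoded along with spaces.
    return f"{SEARCH_URL}?q={quote(query, safe='')}"


print(first_post_search_url("random"))
print(first_post_search_url("king of dragon pass"))
```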

(We really need to add an in:title clause because “in the first post” doesn’t really capture the intent of searching by title only.)

The older the topic, the more likely this is to happen due to recency bias. So if you know you are searching for a topic from years (or even a decade) ago, you’ll need to change up your strategy. Regardless, use in:first when you search, since that’s what you seem to want, and in:title doesn’t quite exist yet.

Incidentally this is just one of many reasons why having decade (or even years) long running megatopics is a bad idea.

So then, in summary, what you want is

in:first "trails in the sky"

However this doesn’t seem to work, which I think is a bug.

Note that if people give their topic crazy super k00l l33t clever titles that don’t include the game in the title, you’re better off with the in:first assuming they aren’t so spectacularly clever that the entire first post never mentions the correct game name, either…

Sure… That’s why I complained about the recency bias in the first place. But it’s not a law of nature; you ultimately determine the weight of the recency bias in the search algorithm, right? Did you deliberately tune it to drown out all other measures of relevance? Are the sorts of results I saw for KoDP and TitS working as intended, matching your view of what should be expected for those search terms? If so, then carry on, I suppose. But if not, then I would suggest making some algorithmic weighting tweaks as future improvements to our forum software. It’s only a relatively minor inconvenience for me to go to Google for useful search results, but you have a vested interest in improving Discourse for all its users.

And I agree with retiring the truly huge mega topics, but I’m not sure that applies here. Would that 500 post KoDP conversation have benefited from being spread across a dozen threads as new people picked it up over the years, unable to easily refer to previous impressions? It’s all discussion on one particular topic.

Nah, what I want is sensible default search behavior that weights results closer to how I (and Google) perceive their relevance.

And I don’t think that not getting results there is actually a bug. I picked that thread because the original poster infamously omitted the game title from his gushing recommendation. So it serves as an example of a thread about a subject that’s not actually mentioned in the first post (unless you want extra credit by pulling the words out of the Steam URL). Similar, if less dramatic, examples occur when a thread starts out discussing some topic but after a few posts goes off on a lengthy tangent that ends up outweighing the original discussion. Limiting the search scope by title or first post can’t catch these, but a more sophisticated search algorithm could recognize that more than 10% of the posts in the thread contain the search term, so it is a major topic of discussion, and should be weighted accordingly.

You have to compare the number of employees and market cap of Google to Discourse. This is like looking at a toddler playing basketball and demanding to know why they can’t start as a forward for the New York Knicks in the NBA next Tuesday. Search is not an easy problem. Raise your hand if you regularly use Bing? Yahoo search? Duck Duck Go? Search seems pretty easy, anyone should be able to do it, that’s why there are so many fabulously popular alternatives to Google today, yes?

The simplest workable solution is to

  • require that topics have reasonable names that more or less match the games they are tied to: a topic titled “shit bonerz” that is about BioShock Infinite is going to be a bad time for everyone, for eternity.

  • when searching, because you personally don’t care about post hits at all, indicate that you only care about results that are either matching the first post, or the topic title, with in:first.

In the meantime there are a few things for us to check, we need to fix the quoted search highlighting, and we need to add in:title. But if there is no basic courtesy around topic titles (and/or first post) being recognizably related to the game they are about … you’re gonna have a bad time.

My suggestion was just to adjust the weight placed on three conceptually simple metrics (match recency, match in title vs. body, and number of matches relative to the total length of the thread), not to recreate the entire search algorithm. Agree with the other points though, thanks.
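For what it's worth, this is the kind of thing I mean, as a toy Python sketch. All of it is assumption: the `ThreadStats` fields, the weights, the one-year half-life, and the example numbers are invented for illustration, and none of it reflects how Discourse actually ranks results.

```python
from dataclasses import dataclass


@dataclass
class ThreadStats:
    """Per-thread inputs for the toy score (hypothetical fields)."""
    title_match: bool             # search term appears in the thread title
    match_count: int              # matches of the search term in the thread
    post_count: int               # total posts in the thread
    days_since_last_match: float  # age of the most recent matching post


def relevance(stats: ThreadStats, w_recency: float = 1.0, w_title: float = 3.0,
              w_density: float = 5.0, half_life_days: float = 365.0) -> float:
    """Weighted sum of three signals, each scaled to the range 0..1."""
    # Recency decays exponentially: halves every half_life_days.
    recency = 0.5 ** (stats.days_since_last_match / half_life_days)
    title = 1.0 if stats.title_match else 0.0
    # Matches per post, capped at 1 so very dense threads cannot run away.
    density = (min(1.0, stats.match_count / stats.post_count)
               if stats.post_count > 0 else 0.0)
    return w_recency * recency + w_title * title + w_density * density


# Invented examples: an old dedicated thread vs. a recent drive-by mention.
dedicated = ThreadStats(title_match=True, match_count=12, post_count=90,
                        days_since_last_match=1500)
drive_by = ThreadStats(title_match=False, match_count=4, post_count=1730,
                       days_since_last_match=3)

print(relevance(dedicated), relevance(drive_by))
```

With these made-up weights the dedicated thread outranks the drive-by mention; raise `w_recency` to 10 and the order flips, which is the sort of overweighting I'm complaining about.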

Bug: sometimes when backing out of a thread to return to Latest View (and presumably others), the forum software / browser go crazy and rapidly scroll down through dozens of posts, leaving you far below where you originally were.

Replication: Android 7 with latest Chrome. As you scroll down the Latest view, like the rest of Discourse, it begins loading the next chunk of topics, briefly showing you a spinning wheel beneath the lowest already-loaded content as the new stuff gets fetched. If you click into a topic while this loading is occurring, and then back out to Latest again, sometimes the forum will seemingly “continue” the extra content loading, but in overdrive, loading dozens of older posts and rapidly auto scrolling the view to reach the bottom of the newly loaded content.

Hey it’s not our fault you gave us a wickedly fast, powerful forum search feature that’s actually useful. Now we’re simply demanding that you make it perfect. :)

Aha, good repro. @eviltrout, can you check this on your end?

This is better now