A young computer scientist and two colleagues show that searches within data structures called hash tables can be much faster than previously deemed possible.
After reading through the abstract, I’d say the article is pop-sci bunk: they developed a method to save additional space with constant-time overhead.
Which is certainly novel and nice and all kinds of things, but it’s just a tool in the toolbox: making things more optimal in theory says little about things being faster in practice, because theoretical cost models never match what real-world machines are actually doing. In algorithm classes we learn to analyse sorting algorithms by the number of comparisons, and indeed any comparison-based sort needs Ω(n log n) of them; in the real world, though, it’s the number of cache misses that matters: CPUs can compare numbers basically instantly, while getting the stuff you want to compare from memory to the CPU is where the time is spent. It can very well be faster to make more comparisons if it means fewer, or more regular (so that the CPU can predict and pre-fetch), data transfers.
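To make that concrete, here’s a minimal sketch (my own toy, nothing to do with the paper): a linear scan does O(n) comparisons while a binary search does O(log n), yet on a small, cache-resident array the scan’s sequential, branch-predictable accesses often win. Build with --release and measure on your own machine, because the measurement is the whole point:

```rust
use std::time::Instant;

// Linear scan: O(n) comparisons, but perfectly sequential, easily predicted accesses.
fn linear_lower_bound(haystack: &[u64], needle: u64) -> Option<usize> {
    haystack.iter().position(|&x| x >= needle)
}

// Binary search: O(log n) comparisons, but hard-to-predict branches and jumps through memory.
fn binary_lower_bound(haystack: &[u64], needle: u64) -> Option<usize> {
    match haystack.binary_search(&needle) {
        Ok(i) => Some(i),
        Err(i) if i < haystack.len() => Some(i),
        Err(_) => None,
    }
}

fn main() {
    let n: u64 = 64; // small, cache-resident array: the regime where "more comparisons" can win
    let data: Vec<u64> = (0..n).map(|i| i * 2).collect();
    let queries: Vec<u64> = (0..1_000_000u64).map(|i| (i * 2654435761) % (n * 2)).collect();

    let t = Instant::now();
    let mut acc = 0usize;
    for &q in &queries {
        acc = acc.wrapping_add(linear_lower_bound(&data, q).unwrap_or(0));
    }
    println!("linear scan:   {:?} (checksum {acc})", t.elapsed());

    let t = Instant::now();
    let mut acc = 0usize;
    for &q in &queries {
        acc = acc.wrapping_add(binary_lower_bound(&data, q).unwrap_or(0));
    }
    println!("binary search: {:?} (checksum {acc})", t.elapsed());
}
```

The checksum just keeps the optimiser from deleting the loops; the two printed checksums should match, and the timings are the interesting part.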
Consulting my crystal ball, I see this trickling down at least into the minds of the people who develop the usual KV stores, of database engineers, etc. Maybe it’ll help, maybe it won’t; those things are already incredibly optimized. Never trust a data-structure optimisation you didn’t benchmark. Never trust any optimisation you didn’t benchmark, actually. Do your benchmarks; you’re not smarter than reality. In case it does help, it’s going to trickle down into the standard data-structure implementations that languages ship with.
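In that spirit, a throwaway harness like the one below (again my own sketch, not anything from the paper) is usually all it takes to check whether a supposed hash-table optimisation, here pre-reserving capacity in Rust’s std HashMap to avoid rehashing during growth, actually buys anything on your workload:

```rust
use std::collections::HashMap;
use std::time::Instant;

// The "optimisation" under test: pre-reserving capacity so the table never has to rehash
// while it grows. Whether it pays off, and by how much, is something to measure, not assume.
fn insert_all(pre_reserve: bool, keys: &[u64]) -> HashMap<u64, u64> {
    let mut map = if pre_reserve {
        HashMap::with_capacity(keys.len())
    } else {
        HashMap::new()
    };
    for &k in keys {
        map.insert(k, k.wrapping_mul(31));
    }
    map
}

fn main() {
    // Multiplying by an odd constant is a bijection on u64, so the keys are all distinct.
    let keys: Vec<u64> = (0..2_000_000u64).map(|i| i.wrapping_mul(2654435761)).collect();

    for &pre_reserve in &[false, true] {
        let t = Instant::now();
        let map = insert_all(pre_reserve, &keys);
        println!("pre_reserve={pre_reserve}: {:?} ({} entries)", t.elapsed(), map.len());
    }
}
```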
EDIT: I was looking at this paper, not this. It’s actually disproving a conjecture of Yao, who has a Turing Award; certainly a nice feather to have in your cap. It’s also way more into the theoretical weeds than I’m comfortable with. This may have applications, or it may go the way of the Karatsuba algorithm: faster only if your data is astronomically large, while for (most) real-world applications the constant overhead outweighs the asymptotic speedup.
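To put some (entirely made-up) numbers on “the constant overhead outweighs the asymptotic speedup”, here’s a toy back-of-the-envelope in code; the only real ingredient is Karatsuba’s n^log2(3) ≈ n^1.585 exponent versus schoolbook’s n^2, the cost constants are assumptions for illustration:

```rust
// Toy model: schoolbook multiplication costs roughly c_school * n^2 "basic operations",
// Karatsuba roughly c_kara * n^1.585, with c_kara > c_school because of the extra
// additions, splits, and recursion bookkeeping. Both constants are made up; real
// crossover points depend entirely on the implementation and the hardware.
fn main() {
    let c_school = 1.0_f64;
    let c_kara = 8.0_f64; // assumed per-step overhead of Karatsuba
    let exp_kara = 3.0_f64.log2(); // ≈ 1.585, from T(n) = 3 T(n/2) + O(n)

    let mut n = 2.0_f64;
    while c_kara * n.powf(exp_kara) >= c_school * n * n && n < 1e12 {
        n *= 2.0;
    }
    println!("with these made-up constants, Karatsuba only wins past n ≈ {n}");
}
```

The exponent eventually wins no matter what, but the bigger you assume c_kara to be, the further out the crossover moves; that’s the whole argument in one loop.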
The reason it confused me is that the college student was clearly using the algorithm to accomplish his task, not just designing it theoretically. So it didn’t seem to be a small improvement that would only be noticeable in certain situations.
I’m not smart enough to understand the papers, which is why I asked.
Oh no, it’s definitely a theoretical paper. Even if the theory were fully formalised, and thus executable, it still wouldn’t give much insight into how it’d perform in the real world, because theorem provers aren’t the most performant programming languages.
And, FWIW, CS theorists don’t really care about running programs, the same way theoretical physicists don’t care much about banging rocks together; in both cases, making things work in the real world is up to the engineers.
Also, never even start optimizing until you’ve profiled and are sure the bit you’re trying to optimize even matters to the overall performance of your program.
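A quick Amdahl’s-law-style sanity check (numbers invented for illustration) shows why: if the bit you’re polishing is only 5% of the runtime, even an infinite speed-up of it barely moves the whole program.

```rust
// Amdahl's law: overall speed-up when a fraction of the runtime is sped up locally.
fn overall_speedup(fraction_optimized: f64, local_speedup: f64) -> f64 {
    1.0 / ((1.0 - fraction_optimized) + fraction_optimized / local_speedup)
}

fn main() {
    // (fraction of total runtime, local speed-up) -- all assumed numbers.
    for &(frac, s) in &[(0.05, 10.0), (0.05, f64::INFINITY), (0.60, 2.0)] {
        println!(
            "piece is {:.0}% of runtime, sped up {}x -> whole program {:.2}x faster",
            frac * 100.0,
            s,
            overall_speedup(frac, s)
        );
    }
}
```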
You’ve misunderstood what I said, but whatever.