If you enjoyed this post, you may also like the 2024 book Foundations of Computer Vision: https://visionbook.mit.edu/
prior hn thread: https://news.ycombinator.com/item?id=44281506
I don't have any background in computer vision, but I enjoyed how the introductory chapter gets right into it, illustrating how to build a limited but working vision system.
It may come as a surprise to some, but a lot of industrial computer vision is done in grayscale. In many industrial CV tasks, the only things that matter are cost, speed, and dynamic range, and every approach we have to capturing color images compromises on at least one of those three.
I think this kind of thing might have real, practical use cases in industry if it's fast enough.
Really enjoyed this article, thanks for sharing!
I recently learned about using image pyramids[1] in conjunction with template matching algorithms like SAD (sum of absolute differences) to do simple and efficient object recognition; it was quite fun. A sketch of the idea is below.
1: https://en.wikipedia.org/wiki/Pyramid_%28image_processing%29
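For anyone curious, here's a rough sketch of the coarse-to-fine idea in Python/NumPy. It's my own illustration, not code from any particular library, and the function names, pyramid depth, and refinement radius are all arbitrary: do the expensive exhaustive SAD search only on the smallest pyramid level, then refine the estimate within a small window at each finer level.

    import numpy as np

    def downsample(img):
        # One pyramid level: halve resolution by averaging each 2x2 block.
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = img[:h, :w].astype(float)
        return (img[0::2, 0::2] + img[1::2, 0::2] +
                img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

    def sad_search(image, template):
        # Exhaustive SAD match: slide the template, keep the lowest-cost offset.
        th, tw = template.shape
        best, best_pos = np.inf, (0, 0)
        for y in range(image.shape[0] - th + 1):
            for x in range(image.shape[1] - tw + 1):
                cost = np.abs(image[y:y+th, x:x+tw] - template).sum()
                if cost < best:
                    best, best_pos = cost, (y, x)
        return best_pos

    def pyramid_match(image, template, levels=3, radius=2):
        # Build pyramids for both the image and the template.
        imgs, tmps = [image.astype(float)], [template.astype(float)]
        for _ in range(levels - 1):
            imgs.append(downsample(imgs[-1]))
            tmps.append(downsample(tmps[-1]))
        # Full search only at the coarsest (cheapest) level.
        y, x = sad_search(imgs[-1], tmps[-1])
        # Walk back down, refining in a small window around the estimate.
        for lvl in range(levels - 2, -1, -1):
            y, x = 2 * y, 2 * x
            img, tmp = imgs[lvl], tmps[lvl]
            th, tw = tmp.shape
            y0, x0 = max(y - radius, 0), max(x - radius, 0)
            y1 = min(y + radius, img.shape[0] - th)
            x1 = min(x + radius, img.shape[1] - tw)
            dy, dx = sad_search(img[y0:y1 + th, x0:x1 + tw], tmp)
            y, x = y0 + dy, x0 + dx
        return y, x

The speedup comes from the exhaustive search running on an image roughly 4^(levels-1) times smaller; each refinement step only checks a handful of candidate positions.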
Appreciate the old school non-AI approach.
But have a look at the "Thresholding" section. It appears to me that AI would be much better at this operation.
It really depends on the application. If the illumination is consistent, such as in many machine vision tasks, traditional thresholding is often the better choice. It’s straightforward, debuggable, and produces consistent, predictable results. On the other hand, in more complex and unpredictable scenes with variable lighting, textures, or object sizes, AI-based thresholding can perform better.
That said, I still prefer traditional thresholding in controlled environments because the algorithm is understandable and transparent.
Debugging issues in AI systems can be challenging due to their "black box" nature. If the AI fails, you might need to analyze the model, adjust training data, or retrain, a process that is neither simple nor guaranteed to succeed. Traditional methods, however, allow for more direct tuning and certainty in their behavior. For consistent, explainable results in controlled settings, they are often the better option.
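To make "understandable and transparent" concrete, here's a minimal sketch of Otsu's method, a classic automatic thresholding algorithm (my own NumPy rendition, assuming an 8-bit grayscale input; it's not code from the article). Every intermediate quantity can be printed or plotted when a part fails inspection:

    import numpy as np

    def otsu_threshold(gray):
        # Otsu: pick the threshold that maximizes between-class variance.
        # Assumes `gray` is a uint8 array (values 0..255).
        hist = np.bincount(gray.ravel(), minlength=256).astype(float)
        probs = hist / hist.sum()
        bins = np.arange(256, dtype=float)
        best_t, best_score = 0, -1.0
        for t in range(1, 256):
            w0 = probs[:t].sum()          # weight of the "dark" class
            w1 = 1.0 - w0                 # weight of the "bright" class
            if w0 == 0.0 or w1 == 0.0:
                continue
            mu0 = (probs[:t] * bins[:t]).sum() / w0   # class means
            mu1 = (probs[t:] * bins[t:]).sum() / w1
            score = w0 * w1 * (mu0 - mu1) ** 2        # between-class variance
            if score > best_score:
                best_t, best_score = t, score
        return best_t

    # binary = gray >= otsu_threshold(gray)

If the threshold comes out wrong, the histogram tells you why; there's no model to retrain.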
It indeed would be much better. There’s a reason the old CV methods aren’t used much anymore.
If you want to do anything even moderately complex, deep learning is the only game in town.
Sure, if you don't mind it hallucinating different numbers into your image.
Right, but the non-deep-learning OCR methods also do that, and they have a much, much lower overall accuracy.
There’s a reason deep learning took over computer vision.
You're absolutely right: deep learning OCR often delivers better results for complex tasks like handwriting or noisy text. It uses models like CNNs or CRNNs to learn patterns from large datasets, making it highly versatile in challenging scenarios.
However, if I can’t understand the system, how can I debug it if there are any issues? Part of an engineer's job is to understand the system they’re working with, and deep learning models often act as a "black box," which makes this difficult.
Debugging issues in these systems can be a major challenge. It often requires specialized tools like saliency maps or attention visualizations, analyzing training data for problems, and sometimes retraining the entire model. This process is not only time-consuming but also may not guarantee clear answers.
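For a sense of what those tools look like in practice, here's a minimal vanilla-gradient saliency sketch (assuming a PyTorch image classifier; the function name and input shape are my assumptions). It answers "which input pixels most affect the predicted class?", which is about as close to a stack trace as these models offer:

    import torch

    def saliency_map(model, image):
        # Vanilla gradient saliency: how strongly does each input pixel
        # influence the top-class score?
        model.eval()
        image = image.clone().requires_grad_(True)   # image: (C, H, W)
        scores = model(image.unsqueeze(0))           # add batch dim -> (1, K)
        scores[0, scores.argmax()].backward()        # grad of the top score
        return image.grad.abs().max(dim=0).values    # (H, W) heat map

Even then, the map tells you where the model looked, not why it decided what it did.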
GP is talking about thresholding, and thresholding is used in more than just OCR. Thresholding algorithms do not hallucinate numbers.
I've been working on an image editor in the browser: https://victorribeiro.com/customFilter
Right now, the neat feature it has is the ability to run custom filters with varying window sizes over images, and to use custom formulas to blend several images. A rough sketch of the filter idea is below.
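For a sense of what such a windowed filter does per pixel, here's a tiny Python/NumPy sketch (my own illustration; the editor itself is JavaScript and this is not its code):

    import numpy as np

    def apply_kernel(img, kernel):
        # Naive windowed filter (2D correlation) with edge clamping:
        # each output pixel is a weighted sum of the kernel-sized
        # neighborhood around the corresponding input pixel.
        kh, kw = kernel.shape
        ph, pw = kh // 2, kw // 2
        padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
        out = np.empty(img.shape, dtype=float)
        for y in range(img.shape[0]):
            for x in range(img.shape[1]):
                out[y, x] = (padded[y:y+kh, x:x+kw] * kernel).sum()
        return out

    # e.g. a 5x5 box blur: apply_kernel(gray, np.ones((5, 5)) / 25)

Changing the window size just means passing a different-sized kernel.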
I don't have a tutorial at hand on how to use it, but I have a YouTube video where I show some of its features:
https://youtube.com/playlist?list=PL3pnEx5_eGm9rVr1_u1Hm_LK6...
At some point I would like to add more of the features you described in your article: feature detection, image stitching...
Here's the source code if anyone's interested: https://github.com/victorqribeiro/customFilter
I’m not a “C” person but I’ve really enjoyed reading this, it’s quite approachable and well written. Thank you for writing it.
Referencing "By the power of Grayskull!"
As an aside, "For the honor of grayscale" would work just as well here.
IIIII HHHAAAAAVE THE POWERRRRR
Didn't recognize George Smiley in those photos. Which makes sense, given he's an espiocrat.
From a 70s kid to an 80s kid, well done!
Ditto. I’ve upvoted this based solely on the amazing title. Best toyline ever.
I too applaud this terrible (amazing) pun.
Quality He-Man reference.