
Part 2 of 3...


OK, great. Got it. So how does that apply to grading cards?

The problem of grading cards can be broken down into several distinct classification problems. This could be classifying a card as 'authentic' vs 'counterfeit', or 'trimmed' vs 'not trimmed', or 'recolored' vs 'not recolored', or 'creased' vs 'not creased', etc. The problem of assigning a particular grade, however, is a multi-class classification problem. Theoretically, the computer would "learn" how to recognize what a 10 looks like, what a 9 looks like, an 8, a 7, and so on. Then it would assign probability estimates for each grade. In practice, this would probably look something like 1% 1, 1% 2, 2% 3, 5% 4, 7% 5, 13% 6, 18% 7, 30% 8, 16% 9, 7% 10, after which the card would be classified as an 8, since the highest likelihood associated with it was an 8 at 30%. However, you could also think of it as the model being only 30% confident that it's an 8. That's not super helpful in practice, and it's precisely the type of output we would expect from a multi-class classifier like this. And this is IF it's working well despite all the numerous other challenges I'll outline below. As Rick-Rarecards pointed out, there are both technical challenges and practical challenges that prevent this from being a useful application for "AI" or machine learning. I believe these challenges render this problem borderline futile, and at minimum a considerable waste of resources. I will outline these challenges as I see them below.
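To make that "pick the grade with the highest probability" step concrete, here's a minimal sketch in Python. The probability vector is just the made-up distribution from the example above; nothing about it comes from any real grading model:

```python
import numpy as np

# Hypothetical probability output from a 10-class grading model,
# matching the example distribution above (grades 1 through 10).
probs = np.array([0.01, 0.01, 0.02, 0.05, 0.07, 0.13, 0.18, 0.30, 0.16, 0.07])

grade = int(np.argmax(probs)) + 1  # classes are 0-indexed, grades are 1-indexed
confidence = probs[grade - 1]

print(f"Assigned grade: {grade} (confidence: {confidence:.0%})")
# Assigned grade: 8 (confidence: 30%)
```

Note that the classifier is "right" in the argmax sense while simultaneously saying there's a 70% chance the card is NOT an 8. That's the problem in a nutshell.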

Technical challenges

Identifying a trimmed card:

One of the most important challenges to understand is what's required to build what we call a "training dataset" that the computer can learn from. Let's take the problem of detecting a trimmed card as an example. For a computer to be able to classify a card as either 'trimmed' or 'not trimmed', it first needs to learn what trimmed cards look like as well as what non-trimmed cards look like. How does it accomplish this? It uses computer vision, as I talked about above, leveraging convolutional neural networks and "edge detection" ("edges" in an image like where the grass meets the player in a photo, as I described above, not the physical edges of a card) to detect anomalies in an image or scan. So you create a large database of images (thousands and thousands of images at minimum) where each image is labeled as being either trimmed or not trimmed.

Where do these labels come from? Humans, of course (and likely the graders specifically). They have to sit down and document millions of cards, one by one, marking each as trimmed or not trimmed, recolored or not recolored, scratched or not scratched, creased or not creased, etc. For pretty much anything you want to be able to detect about a card, you need a massive database to learn from, and humans have to physically examine those cards and label the training datasets. Then you feed that dataset to the machine learning algorithms (or rather, a highly skilled, and extremely expensive, team of data scientists does this), and after a bit of black magic, you have a computer that is "capable" of identifying trimmed cards.

I put the word 'capable' in quotes because there's a major caveat that needs to be understood, and it is perhaps the biggest issue of them all. The machine learning algorithms, without question, will not be as good as humans at detecting trimmed cards. In fact, they won't even be remotely close. Here's the problem. Remember that training dataset we created for the ML algorithms to learn from? Humans labeled it! So the model isn't going to learn what a trimmed card looks like; it's going to learn what a trimmed card THAT HUMANS ARE CAPABLE OF DETECTING IN THE FIRST PLACE looks like. Remember, there are countless trimmed cards that humans can't detect. If a card passes through a human undetected and gets fed to the algorithm, the algorithm is told that this is what a non-trimmed card looks like, despite the fact that it was actually trimmed.

But even worse than that, the algorithm is working off of a database of scanned images. Even if the scan is extremely high definition, it's still just an image from one single angle, looking straight at the card. Imagine if it were your job to detect trimmed cards, but you weren't allowed to hold them and feel the edges with your fingers, or even just rotate the card in hand at different angles, catching how the light bounces off the card with every rotation. All you had to work with was a scanned image of the card on your computer screen. If you think you'd be good at detecting trimmed cards just by looking at a scanned image, I promise you you're wrong. You might be great at detecting a botched trim job, but nobody can detect a good one. And if you think measuring a card always (or even often) tells you whether or not a card has been trimmed, again, you're wrong.
It's plausible that an ML algorithm could be trained to detect a trimmed edge on a vintage card that has 3 frayed edges and one super straight, clean, smooth edge. But those aren't exactly difficult to detect to begin with, so this isn't much of a win.
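If you're curious what that training step actually looks like in code, here's a minimal sketch of a binary 'trimmed vs. not trimmed' classifier using Keras. Everything in it is an assumption for illustration: the folder layout, image size, and architecture are all made up. Notice that the labels come straight from a human-labeled folder structure, which is exactly where the garbage-in problem above creeps in:

```python
import tensorflow as tf

# Hypothetical directory of human-labeled scans:
#   scans/trimmed/...   scans/not_trimmed/...
# The labels are only as good as the humans who assigned them --
# undetected trim jobs land in 'not_trimmed' and poison the data.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "scans/", image_size=(256, 256), batch_size=32, label_mode="binary"
)

# A small convolutional network: the conv layers learn edge-like
# filters automatically (image edges, not physical card edges).
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # outputs P(trimmed)
])

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```

The pipeline itself is boringly standard; the hard (and expensive) part is everything around it: assembling and hand-labeling the scans.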

Also, you can't just create one large training dataset of all cards. You have to have separate training data and separate ML models for all the different types of cards, with each training dataset requiring many thousands of cards at minimum, and likely hundreds of thousands of cards to be even remotely performant (good luck with that). All of these are labeled individually by hand. You can't train a model on 1950s Topps cards, then scan a 2018 Topps Chrome Shohei Ohtani and expect it to know whether the Ohtani has been trimmed. It will say it's trimmed every single time, because the Ohtani has sharp edges and the 1950s Topps don't. A grader knows to differentiate these; a computer doesn't. You would need separate datasets for each card type, as sketched below. And this is just the tip of the iceberg. Trying to teach an algorithm to detect trimmed cards is a fool's errand. And if you think all of these hurdles can be overcome simply by scanning every card with some sort of 360-degree spherical scan, LOL. Yeah, good luck with that too. You'd then need a separate ML model for every single angle, and now your problem just exploded 1,000-fold.
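To make the "separate model per card type" point concrete, here's a hypothetical sketch of what that routing would look like. Every filename and dictionary key in it is invented, and note that knowing which issue a raw scan belongs to is itself yet another classification problem you'd have to solve first:

```python
from tensorflow.keras.models import load_model

# Hypothetical registry: one independently trained model per issue,
# each needing its own hand-labeled dataset of thousands of scans.
# None of these model files exist; this just shows the blow-up.
trim_models = {
    "1952-topps":        load_model("trim_1952_topps.keras"),
    "1933-goudey":       load_model("trim_1933_goudey.keras"),
    "t206":              load_model("trim_t206.keras"),
    "2018-topps-chrome": load_model("trim_2018_topps_chrome.keras"),
    # ...and hundreds more, one per issue (or finer)...
}

def predict_trimmed(scan, card_issue):
    """Route a scan to the model trained on its own issue.

    Feed a 2018 Chrome scan to a 1950s Topps model and it flags
    'trimmed' every time: sharp modern edges look like trim jobs
    to a model that has only ever seen 70-year-old card stock.
    """
    model = trim_models.get(card_issue)
    if model is None:
        raise ValueError(f"no trained model for issue: {card_issue!r}")
    return model.predict(scan)
```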

Detecting a recolored card:

This one is tricky, although it is perhaps the most interesting project to work on of all the possible ML applications for detecting altered cards. But it's an insanely large problem to solve. Here, you'd likely need far more training datasets than you would even in the trimming problem. There's more variation in card printing techniques, inks, surfaces, and especially the images within the cards themselves than there is variation in card stocks or edges, as in the trimming example. Here, you'd almost certainly need a separate training set for every single issue (so separate training data for 1952 Topps, another for 1933 Goudey, one for T206, etc.), and depending on performance, you might even need one for each individual card! And remember, every training set requires, at minimum, many thousands of scanned cards, and likely hundreds of thousands to be even remotely performant. So the more granular your requirements become, the less plausible this problem is to solve.

But let's pretend for a moment that we could at least group certain cards together. Perhaps all 1952 through 1956 Topps cards could be used in one training set. Every time the ML algorithm sees a print defect, it's going to think that is suspicious and will flag that card as having a high likelihood of being recolored. And again, this suffers from the same problem as the trimming training data in that it can only learn to detect cards that humans can already detect as being recolored to begin with. I suspect it would perform quite well at picking up the obvious recoloring jobs, but then again, do we really need help with those? Not likely. The hope would be that it could identify cards that were recolored very subtly, ones that might slip through human grading, but if humans can't flag those to begin with, it can't learn what they look like, because the training data doesn't flag them as recolored. It says they're not recolored.

But even if all the cards were correctly labeled in the training data, the algorithm would still have a very difficult time distinguishing between a print defect, a piece of lint on the scanner bed, a damaged card, a card that was in fact recolored, and even cards which just have abnormalities in the image itself. Especially with modern cards. This could be very problematic. Basically, any abnormality in the image could result in that card being flagged as having a high probability of being recolored. And if you tuned the algorithms to be less sensitive to these abnormalities, the tradeoff would be more recolored cards not getting flagged (see the sketch below). There are always tradeoffs in machine learning.
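That sensitivity tradeoff at the end is the classic decision-threshold problem. Here's a hedged sketch with scikit-learn; the data is entirely synthetic (random features with a 5% "recolored" class shifted slightly apart), purely to show the mechanics of the tradeoff:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)

# Synthetic stand-in for scan features: 1,000 cards, ~5% recolored.
X = rng.normal(size=(1000, 20))
y = (rng.random(1000) < 0.05).astype(int)
X[y == 1] += 1.0  # recolored cards sit slightly apart in feature space

clf = LogisticRegression(max_iter=1000).fit(X, y)
probs = clf.predict_proba(X)[:, 1]  # model's P(recolored) per card

# Lowering the threshold catches more recolored cards (recall goes up),
# but flags more print defects and lint too (precision goes down).
for t in (0.5, 0.2, 0.05):
    flagged = probs >= t
    print(f"threshold={t}: "
          f"precision={precision_score(y, flagged, zero_division=0):.2f}, "
          f"recall={recall_score(y, flagged, zero_division=0):.2f}")
```

Wherever you set the threshold, you're choosing which kind of mistake you'd rather make; there's no setting that eliminates both.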

As a reference for how these algorithms work, and what level of performance one might hope for, there was a somewhat infamous competition on a popular data science website several years back that had machine learning experts all over the world competing to come up with the best algorithm to be able to detect whether a picture was of a cat or a dog. The winner was able to code an algorithm that was 97% accurate. On one hand, 97% sounds pretty impressive, but when you weigh its performance against a human, who would get it right ~100% of the time, then it's no longer all that impressive. These algorithms are great for automating away large problems where we just don't have the manpower to be able to go through every photo manually, one at a time. So if we had millions of photos to classify as cat or dog, and we didn't have the time or manpower needed to do it manually, and if a 3% error rate was acceptable, then it would be a huge win, potentially saving some company millions of dollars in costs. But for the problem of grading cards, you can't accept a 3% error rate on a problem as simple as this. And that's just for detecting if an image is a cat or a dog. If you're trying to detect something as challenging as trimming or recoloring, the error rates would be much, much higher. When it comes to grading cards, the need for accuracy far outweighs the need/benefits of automation.
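To put a 3% error rate into grading terms, some quick back-of-the-envelope math. The submission volume here is a made-up round number, and the 10% figure for the harder problems is purely an assumption:

```python
# Hypothetical: one million cards submitted for grading per year.
submissions = 1_000_000

easy_error = 0.03  # the cat-vs-dog error rate from the competition
hard_error = 0.10  # assumed (likely optimistic) rate for trim/recolor detection

print(f"misgraded at  3%: {int(submissions * easy_error):>7,}")  #  30,000
print(f"misgraded at 10%: {int(submissions * hard_error):>7,}")  # 100,000
```

Tens of thousands of wrong calls per year, on the problem where mistakes are most expensive.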

Detecting edge and corner wear:

For a set like 1986 Fleer basketball, with its deep red borders and white paper stock, an ML algorithm could probably learn to identify what good corners look like versus soft or bad corners, because the pixel matrices would show clear edges where the red borders and soft white corners meet in the image. For this problem, edge detection "works". Same with 1971 Topps baseball and its black borders. It could easily detect white chipping along the edges of those cards, because it would show up clearly in the matrix data. However, take a card with white borders and white paper stock, scan it, and you can quickly see how the algorithms would fail to identify the chipping or bad corners: the scanned image simply doesn't have an "edge" to detect (white on white doesn't create an "edge" in an image that can be represented mathematically). For this reason, "AI grading" would certainly underperform the expectations and needs of any TPG.
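You can see the white-on-white problem directly in the pixel math. A tiny sketch with NumPy; the pixel values are invented to mimic a dark border versus a white border meeting white card stock:

```python
import numpy as np

# One row of grayscale pixels (0 = black, 255 = white) crossing
# from the border into worn/chipped white card stock.
black_border = np.array([20, 20, 20, 245, 250, 248], dtype=float)    # 1971 Topps-style
white_border = np.array([250, 252, 249, 245, 250, 248], dtype=float) # white on white

# A simple horizontal gradient (the core of any edge detector):
print(np.diff(black_border))  # [  0.   0. 225.   5.  -2.] -> huge spike, clear edge
print(np.diff(white_border))  # [  2.  -3.  -4.   5.  -2.] -> noise-level, nothing to find
```

The black-border transition produces a gradient spike any detector will catch; the white-on-white transition is indistinguishable from scanner noise.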

Detecting surface issues:

If you've made it this far into my post, then I'm guessing you're already able to anticipate what the issues might be for this problem. First of all, many surface issues wouldn't even show up in a scan because a scan is only taken at one angle, and you often have to rotate a card at just the right angle to be able to see surface flaws. How many times have you bought a raw card on eBay that looked perfect even in a zoomed-in scan, only to have it show up with surface flaws or even wrinkles? It happens all the time because scans often don't pick up on these flaws, especially with modern chrome cards. Feed that image to an ML algorithm and it won't be able to see it either.

But even if it could, it still would need to be able to differentiate between a surface flaw and just some random abnormality in the image itself. It would also need to be able to differentiate between a surface issue and the natural variation in paper stock for vintage cards. 1948 Leaf cards come to mind as those often have dark paper fibers that are visible even through the print.

Also, think of the image in the card itself. Is that little speck a raindrop in the photo or a flaw? Is it from dirt on the photographer's lens? Is it a scratch in the surface? Is it lint on the scanner bed? A scratch on the scanner bed? You get the point. All of these things cause an increase in the error rates that an ML algorithm would produce. And again, it would underperform any human that's even remotely competent.

I should also point out that the use cases above are the ones the TPGs are MOST hopeful about, lol. In a recent interview, Nat Turner explained that they are not currently using Genamint technology for grading cards specifically, and that realistically, they only hope to be able to use it to identify altered cards or cards that have already been submitted for grading before. And even that lofty goal isn't planned to be achieved until the end of the year. "AI grading" isn't coming to PSA anytime soon, and I'll go ahead and go out on a limb here and say it likely never will. The problem of actually assigning a numerical grade to a card is considerably more challenging than any of the binary classification problems above, and would produce considerably higher error rates than any of the issues above. It's just not an ideal application for machine learning. For some tasks, computers can be taught to do something remarkably well. But machine learning is not a magic solution for everything, and you really need to understand the problem you're trying to solve deeply AND have a deep knowledge of how these algorithms work in order to know whether you have a problem that is well suited for machine learning.

This is what happens when executives get excited about technology they don't understand and buzzwords like "AI", simply because everyone else is doing it, so why shouldn't they? It seems like everyone and their brother in the corporate world today thinks "AI" is coming to revolutionize their industry and that they just have to win the race and get there before their competitors do. But in reality, many of the problems they're trying to solve just aren't well-suited applications for machine learning. Hell, even Uber and Lyft both gave up on their automated driving projects, and that's a problem that is extremely well suited for AI. There is no shortage of problems that AI and ML are going to solve in the near future, or of industries that will be disrupted by these technologies. Grading cards just isn't one of them.
