Well, life came up and I couldn’t finish the Kaggle project. On to the next one:
You know, I’ve been playing VRChat quite a bit. And one thing that really annoys me in VRChat is the elbows. Yep, the elbows. Or the lack thereof.
In other words, you can’t do this:
Even though you paid well over $2000 to get full-body VR.
So is there anything I can do to remedy it? Yes: I can buy more trackers. But that’s like $300. I could do that, and I may if I need more data, but in this case I need a portfolio, so I’ll try to use my knowledge to help that community of degenerates that is VRChat.
So the plan right now: I started looking for something in the literature that could help me.
Anyways, there are a couple of thousand Google search results about pose estimation. I read a few here and there, and it turns out that 2D pose estimation (e.g. pinpointing the tail on the donkey, er, I mean the joints in the image) works really well, but 3D pose estimation less so.
So which one do I pick? I started with this one, which was near the top of the search results:
which led me to this one:
And I picked this paper largely because I saw it on the Two Minute Papers YouTube channel and thought it was super cool:
#1 is a bit more recent; I can certainly live with the 213 FPS it claims on benchmarks.
However, I ran into all kinds of issues trying to install it.
#2 is about 10 times faster according to the paper, and it also explains things a little bit more. I don’t think I’m going to have performance problems either way, but the headroom doesn’t hurt.
#3: do we really need to simulate a full-fledged fat guy bouncing around? Probably not. And it looks really complicated and has no source code to boot. So #2 it is.
Now I need a dataset to analyze. Google comes to the rescue:
There are a couple of popular datasets for pose estimation, the most popular being Human3.6M… However, the authors seem to be either picky as to whom they give the dataset to, or completely uninterested.
So I checked out HumanEva. It turned out you had to fiddle with MATLAB in order to use it. Well, I don’t want to pay for MATLAB, and I don’t want to fiddle with Octave just for a one-off project.
3DPW also has its own fat dude and is generally in TMI territory. However, looking at the documentation and just playing around with it, I can extract joint locations and orientations easily enough. That’s exactly what I want.
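For reference, the 3DPW sequences come as pickled dicts, and the joint positions are stored flattened per frame. A minimal extraction sketch, assuming the `jointPositions` key holds one `(n_frames, 72)` array per actor (24 SMPL joints × 3 coordinates) — check the actual files, and note the file path in the comment is just a placeholder:

```python
import numpy as np

def extract_joints(seq):
    """Pull per-frame 3D joint positions out of a 3DPW-style sequence dict.

    Assumes seq["jointPositions"] is a list with one (n_frames, 72)
    array per actor: 24 SMPL joints x 3 coordinates, flattened.
    """
    joints = []
    for actor in seq["jointPositions"]:
        arr = np.asarray(actor)
        # Reshape the flat per-frame vector into (n_frames, 24, 3).
        joints.append(arr.reshape(len(arr), -1, 3))
    return joints

# Loading a real sequence would look something like (path is hypothetical):
# import pickle
# with open("sequenceFiles/train/some_sequence.pkl", "rb") as f:
#     seq = pickle.load(f, encoding="latin1")
# joints = extract_joints(seq)
```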
Now there are several problems I foresee:
1-VideoPose3D doesn’t seem to output orientations of joints. I don’t think this will be a problem in this case. If you look at your elbow and play around with it, it seems that with the 2D position in the camera output, you’d be able to determine it easily enough: there are only so many ways it can twist if you know the location of the hand and the shoulder.
The question is: can I use machine learning or will I have to hardcode it?
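At least the bend itself is pure geometry, no learning required. A minimal sketch of getting the flexion angle at the elbow from three joint positions (the twist around the forearm axis is the part that would need more work, or machine learning):

```python
import numpy as np

def elbow_angle(shoulder, elbow, wrist):
    """Flexion angle at the elbow, in degrees, from three 3D joint positions."""
    upper = np.asarray(shoulder, dtype=float) - np.asarray(elbow, dtype=float)
    fore = np.asarray(wrist, dtype=float) - np.asarray(elbow, dtype=float)
    # Angle between the upper arm and forearm vectors meeting at the elbow.
    cos = np.dot(upper, fore) / (np.linalg.norm(upper) * np.linalg.norm(fore))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```

A bent arm at a right angle gives 90°, a fully extended arm gives 180°.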
2-What kind of architecture do I want? I think I am going to try a simple neural net to begin with, and then maybe try to divide and conquer.
At this point, I don’t want a perfect solution, I just want a solution that kind of works.
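The “simple neural net to begin with” could be as basic as one hidden layer mapping flattened 2D keypoints to a few joint angles. A forward-pass sketch with made-up sizes (17 keypoints in, 3 elbow angles out — both assumptions, not anything fixed by the papers above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 17 2D keypoints in, 3 angles for one elbow out.
n_in, n_hidden, n_out = 17 * 2, 128, 3

# Randomly initialized weights; training (e.g. on 3DPW pairs) comes later.
W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
b2 = np.zeros(n_out)

def forward(keypoints_2d):
    """One hidden layer with ReLU: flattened 2D keypoints -> joint angles."""
    x = np.asarray(keypoints_2d).reshape(-1, n_in)
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

out = forward(rng.normal(size=(4, 17, 2)))  # batch of 4 poses
```

If that baseline plateaus, “divide and conquer” could mean one small net per limb instead of one big one.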