Brie Larson shows her teen body (DFL2 test #2)
I ended up front-loading a lot of the work with the dst masks – the subject kept constantly moving in/out of frame with lots of blurry moments, especially in profile – so relatively happy with the finished product. Still wish I could transfer SAEHD masks during the merging process. I’ll be looking at editors for future videos.
Two biggest bumps I faced:
– Mask uncanniness – This really comes down to the difference in resolution between the dst source (HD/1080p video) versus Brie/src masks (128 res) due to GPU/CPU limitations. Also was working with a FF faceset for src which somewhat was an issue with Lucy’s face shape in the video. More on both below.
– Occasional derpy eyes from Brie – Looking back at my src faceset, I did lack photos of her looking straight at the camera, which explains some of the weird faces ultimately generated.
Here’s more of my process breakdown:
Faceset
Lucy Tyler/dst – WF/512/90 jpg (png source) – Overall very pleased with the dst results. I spent several hours combing through the set since a lot of false positives (e.g. a face detected on her belly or jeans) and then had to manually detect faces because the person kept moving in/out frame in profile/top-down, etc.
Brie Larson/src – FF/256 – Initially went in to mask and train them as WF before remembering they were FF (which IMO contributed to some of the blending issues ultimately). Next version I’ll probably try to go back to a WF set for more flexibility.
Masking (over 100k iter – 285ms / reached 0.0150ish)
dst – IMO a near-flawless masking job given what I was working with. Very time-consuming and tricky as Lucy/dst was barely in the frame in lots of shots, and even moreso in profile. I really wanted as much fluidity and face recognition as possible, so I spent a huge amount of time making sure most faces (nose down) were recognized. At least 100 manual masks throughout to get XSeg kickstarted.
src – More time-consuming than difficult – as mentioned, will try to move back to a WF faceset
SAEHD Training – 128 res/DF/256 AED/64 ED/64 DD/22 DMD/4 batch – 0.800ms
GPU memory is a real bottleneck. I tried to push my equipment to its limit, first by attempting 256 res, then 160, but had to revert back down to 128.
As mentioned, mask uncanniness is probably my biggest gripe with this version, and that’s because of OOM. I ended up running a fairly straightforward SAEHD training with no real specific tweaks (e.g. GAN, true face, etc.) due to said memory limit.
Pre-trained for 120k+ iter with NSFW female dataset
0-50k iter – default values with face flip and random wrap
50k-145k – default values with eyes priority and face flip, no random wrap
145k-160k – default values with yaw, no eyes priority, no random wrap
Ultimately got to 0.1815/0.0944 (a bit of a stagnation in the iter)
Merging
Color and edges were the two biggest factors beyond resolution in making this merging process difficult. Tyler/dst has a tall oval face, which meant that the of Brie/src’s FF mask reached the middle of the dst forehead. Colors also were a bit off and hard to believably blend, in part because of difference in resolution. I went with skin-tone richness over full blend, which in part accentuated the sense of “mask depth” where the src masks sits between the hair and face shadow of the hair. Will have to revisit this problem when the resolutions and mask frames are comparable.