T2M-GPT Code Analysis
T2M-GPT Dataset Structure
T2M-GPT is said to reuse the dataset structure that HumanML3D uses.
The HumanML3D dataset is built from the SMPL+H G data provided by AMASS, and it is assembled by running raw_pose_processing.ipynb, motion_representation.ipynb, and cal_mean_variance.ipynb.
raw_pose_processing.ipynb
The notebook loads the AMASS sub-datasets listed below, then unpacks and preprocesses them (a short sketch of what one raw clip looks like follows the list).
ACCD (ACCD)
HDM05 (MPI_HDM05)
TCDHands (TCD_handMocap)
SFU (SFU)
BMLmovi (BMLmovi)
CMU (CMU)
Mosh (MPI_mosh)
EKUT (EKUT)
KIT (KIT)
Eyes_Janpan_Dataset (Eyes_Janpan_Dataset)
BMLhandball (BMLhandball)
Transitions (Transitions_mocap)
PosePrior (MPI_Limits)
HumanEva (HumanEva)
SSM (SSM_synced)
DFaust (DFaust_67)
TotalCapture (TotalCapture)
BMLrub (BioMotionLab_NTroje)
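To get a feel for the raw input, each AMASS clip is a single .npz archive holding the SMPL-H parameters. A minimal inspection sketch is below; the file path is only illustrative, and the exact key list can vary slightly between AMASS releases.

import numpy as np

# Illustrative path; the notebook iterates over every .npz it finds in the extracted datasets.
bdata = np.load('./amass_data/CMU/01/01_01_poses.npz')
print(list(bdata.keys()))     # typically ['trans', 'gender', 'mocap_framerate', 'betas', 'dmpls', 'poses']
print(bdata['poses'].shape)   # (n_frames, 156): per-frame SMPL-H axis-angle parameters
print(bdata['trans'].shape)   # (n_frames, 3):   per-frame global root translation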
What I was most curious about was how this raw data gets turned into the dataset structure, and cell In [7] of raw_pose_processing.ipynb gives the answer.
# bdata (the loaded AMASS .npz), bm (the gendered BodyModel), comp_device, frame_number,
# down_sample, trans_matrix, save_path, and pose_seq (an empty list) are all defined
# earlier in the notebook, as are the torch / numpy imports.
with torch.no_grad():
    for fId in range(0, frame_number, down_sample):
        root_orient = torch.Tensor(bdata['poses'][fId:fId+1, :3]).to(comp_device)  # controls the global root orientation
        pose_body = torch.Tensor(bdata['poses'][fId:fId+1, 3:66]).to(comp_device)  # controls the body
        pose_hand = torch.Tensor(bdata['poses'][fId:fId+1, 66:]).to(comp_device)   # controls the finger articulation
        betas = torch.Tensor(bdata['betas'][:10][np.newaxis]).to(comp_device)      # controls the body shape
        trans = torch.Tensor(bdata['trans'][fId:fId+1]).to(comp_device)
        body = bm(pose_body=pose_body, pose_hand=pose_hand, betas=betas, root_orient=root_orient)
        joint_loc = body.Jtr[0] + trans
        pose_seq.append(joint_loc.unsqueeze(0))

pose_seq = torch.cat(pose_seq, dim=0)
pose_seq_np = pose_seq.detach().cpu().numpy()
pose_seq_np_n = np.dot(pose_seq_np, trans_matrix)
np.save(save_path, pose_seq_np_n)
This code takes the SMPL parameters root_orient, pose_body, pose_hand, betas, and trans, feeds them through the male_bm or female_bm BodyModel from human_body_prior to obtain a 3D pose, and saves the result as data.
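The slicing indices in the cell above follow directly from how SMPL-H packs its joint rotations: 1 root joint + 21 body joints + 30 hand joints, each stored as a 3-value axis-angle rotation, for 156 values per frame. A small sketch of the split (bdata as loaded above):

poses = bdata['poses']        # (n_frames, 156) axis-angle parameters
root_orient = poses[:, :3]    # 1 root joint            -> 3 values
pose_body = poses[:, 3:66]    # 21 body joints           -> 63 values
pose_hand = poses[:, 66:156]  # 30 hand joints (15/hand) -> 90 values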
As the line

joint_loc = body.Jtr[0] + trans

shows, you can see how the joints are extracted by looking at Jtr inside the class. Checking human_body_prior/body_model/body_model.py, the forward method contains:
verts, Jtr = lbs(betas=shape_components, pose=full_pose, v_template=v_template,
                 shapedirs=shapedirs, posedirs=self.posedirs,
                 J_regressor=self.J_regressor, parents=self.kintree_table[0].long(),
                 lbs_weights=self.weights, joints=joints, v_shaped=v_shaped,
                 dtype=self.dtype)

Jtr = Jtr + trans.unsqueeze(dim=1)
verts = verts + trans.unsqueeze(dim=1)

res = {}
res['v'] = verts
res['f'] = self.f
res['Jtr'] = Jtr
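Putting the two snippets together: body.Jtr from the forward output carries the regressed joint positions for the whole batch, and in the notebook the batch size is 1, which is why body.Jtr[0] is indexed before adding the root translation. A rough sketch of the shapes involved, assuming SMPL-H's 52 joints:

joints = body.Jtr              # (batch, 52, 3) regressed joint positions, batch == 1 here
joint_loc = joints[0] + trans  # (52, 3) + (1, 3): the root translation broadcasts over all joints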
so the vertices and joints come out of the lbs function. lbs stands for Linear Blend Skinning, the step that turns the shape and pose parameters into mesh vertices and joint positions, and the code is as follows.
def lbs(betas, pose, v_template, shapedirs, posedirs, J_regressor, parents,
        lbs_weights, joints=None, pose2rot=True, v_shaped=None, dtype=torch.float32):
    ''' Performs Linear Blend Skinning with the given shape and pose parameters

        Parameters
        ----------
        betas : torch.tensor BxNB
            The tensor of shape parameters
        pose : torch.tensor Bx(J + 1) * 3
            The pose parameters in axis-angle format
        v_template torch.tensor BxVx3
            The template mesh that will be deformed
        shapedirs : torch.tensor 1xNB
            The tensor of PCA shape displacements
        posedirs : torch.tensor Px(V * 3)
            The pose PCA coefficients
        J_regressor : torch.tensor JxV
            The regressor array that is used to calculate the joints from
            the position of the vertices
        parents: torch.tensor J
            The array that describes the kinematic tree for the model
        lbs_weights: torch.tensor N x V x (J + 1)
            The linear blend skinning weights that represent how much the
            rotation matrix of each part affects each vertex
        pose2rot: bool, optional
            Flag on whether to convert the input pose tensor to rotation
            matrices. The default value is True. If False, then the pose tensor
            should already contain rotation matrices and have a size of
            Bx(J + 1)x9
        dtype: torch.dtype, optional

        Returns
        -------
        verts: torch.tensor BxVx3
            The vertices of the mesh after applying the shape and pose
            displacements.
        joints: torch.tensor BxJx3
            The joints of the model
    '''
    batch_size = max(betas.shape[0], pose.shape[0])
    device = betas.device

    # Add shape contribution
    if v_shaped is None:
        v_shaped = v_template + blend_shapes(betas, shapedirs)

    # Get the joints
    # NxJx3 array
    if joints is not None:
        J = joints
    else:
        J = vertices2joints(J_regressor, v_shaped)

    # 3. Add pose blend shapes
    # N x J x 3 x 3
    ident = torch.eye(3, dtype=dtype, device=device)
    if pose2rot:
        rot_mats = batch_rodrigues(
            pose.view(-1, 3), dtype=dtype).view([batch_size, -1, 3, 3])

        pose_feature = (rot_mats[:, 1:, :, :] - ident).view([batch_size, -1])
        # (N x P) x (P, V * 3) -> N x V x 3
        pose_offsets = torch.matmul(pose_feature, posedirs).view(batch_size, -1, 3)
    else:
        pose_feature = pose[:, 1:].view(batch_size, -1, 3, 3) - ident
        rot_mats = pose.view(batch_size, -1, 3, 3)

        pose_offsets = torch.matmul(pose_feature.view(batch_size, -1),
                                    posedirs).view(batch_size, -1, 3)

    v_posed = pose_offsets + v_shaped

    # 4. Get the global joint location
    J_transformed, A = batch_rigid_transform(rot_mats, J, parents, dtype=dtype)

    # 5. Do skinning:
    # W is N x V x (J + 1)
    W = lbs_weights.unsqueeze(dim=0).expand([batch_size, -1, -1])
    # (N x V x (J + 1)) x (N x (J + 1) x 16)
    num_joints = J_regressor.shape[0]
    T = torch.matmul(W, A.view(batch_size, num_joints, 16)) \
        .view(batch_size, -1, 4, 4)

    homogen_coord = torch.ones([batch_size, v_posed.shape[1], 1],
                               dtype=dtype, device=device)
    v_posed_homo = torch.cat([v_posed, homogen_coord], dim=2)
    v_homo = torch.matmul(T, torch.unsqueeze(v_posed_homo, dim=-1))

    verts = v_homo[:, :, :3, 0]

    return verts, J_transformed
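The core of the skinning step is easy to miss in the shape gymnastics: every vertex gets its own 4x4 transform as a blend-weighted sum of the global joint transforms in A, and that transform is applied to the posed vertex in homogeneous coordinates. A standalone shape check with random tensors (not real SMPL buffers; V and J are just plausible SMPL-H sizes):

import torch

B, V, J = 1, 6890, 52                          # batch, vertices, joints (illustrative sizes)
W = torch.rand(B, V, J)                        # per-vertex blend weights
A = torch.eye(4).repeat(B, J, 1, 1)            # global joint transforms, (B, J, 4, 4)
T = torch.matmul(W, A.view(B, J, 16)).view(B, V, 4, 4)  # one blended 4x4 per vertex
v_posed = torch.rand(B, V, 3)
v_homo = torch.cat([v_posed, torch.ones(B, V, 1)], dim=2)   # homogeneous coordinates
verts = torch.matmul(T, v_homo.unsqueeze(-1))[:, :, :3, 0]  # (B, V, 3)
print(verts.shape)                             # torch.Size([1, 6890, 3])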
HumanML3D saves the resulting 3D poses to disk in exactly this way. How many joints you end up with depends on what the joint regressor looks like; checking the shape of the saved data, it was laid out as

[n_frame, n_joint, dim]
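A quick way to confirm this is to load one of the .npy files the notebook writes out; the path below is only illustrative.

import numpy as np

pose_seq = np.load('./pose_data/CMU/01/01_01_poses.npy')  # illustrative path to one saved clip
print(pose_seq.shape)   # (n_frame, n_joint, 3): per-frame 3D joint positions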