Rendering subtitles with libass

Posted by KRAK_JOE on Fri, 24 Dec 2021 15:45:24 +0100

Initialize libass
All libass functions start with the ass_ prefix, which makes them easy to identify.

Initialization needs little attention:

ASS_Library* libass = ass_library_init();
ASS_Renderer* ass_renderer = ass_renderer_init(libass);
ASS_Track* ass_track = ass_read_file(libass, (char*)"subtitle.ass", (char*)"UTF-8");

ass_set_fonts(ass_renderer, NULL, "Arial", ASS_FONTPROVIDER_AUTODETECT, NULL, 0);
ass_set_frame_size(ass_renderer, bgWidth, bgHeight);
subtitle.ass is the path to your subtitle file. The second and third parameters of ass_set_fonts select the default font file and the default font family (used when the font requested by the subtitle cannot be found).

ass_set_frame_size sets the resolution of the video picture. Note that this is the resolution at which the video is rendered, not the original resolution of the video, so that libass returns bitmaps of the correct size.

Render subtitles
libass has a single function for rendering subtitles, ass_render_frame:

long long currentms = 6000; // Timestamp of the frame to render, in milliseconds
int isChange = 0; // Change since the previous call, 0: no change, 1: positions changed, 2: content changed
ASS_Image* assimg = ass_render_frame(ass_renderer, ass_track, currentms, &isChange);
The ASS_Image structure is what we focus on:

/*
 * A linked list of images produced by an ass renderer.
 *
 * These images have to be rendered in-order for the correct screen
 * composition. The libass renderer clips these bitmaps to the frame size.
 * w/h can be zero, in this case the bitmap should not be rendered at all.
 * The last bitmap row is not guaranteed to be padded up to stride size,
 * e.g. in the worst case a bitmap has the size stride * (h - 1) + w.
 */
typedef struct ass_image {
    int w, h;               // Bitmap width/height
    int stride;             // Bitmap stride: number of bytes in each row
    unsigned char *bitmap;  // 1bpp stride*h alpha buffer: holds only the
                            // alpha channel, one byte per pixel.
                            // Note: the last row may not be padded to the
                            // bitmap stride, so read only w bytes from it
    uint32_t color;         // Bitmap color and alpha, RGBA
    int dst_x, dst_y;       // Bitmap placement inside the video frame

    struct ass_image *next; // Next image, or NULL

    enum {
        IMAGE_TYPE_CHARACTER,
        IMAGE_TYPE_OUTLINE,
        IMAGE_TYPE_SHADOW
    } type;

} ASS_Image;
As the next pointer shows, this is a linked list. libass generates several image layers, and we have to render them in order, layer by layer, to get correct subtitles.
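The note about the unpadded last row is easy to get wrong, so here is a minimal sketch of a safe row-by-row copy. AlphaBitmap, copy_tight and min_source_size are hypothetical names that only mirror the ASS_Image fields involved; they are not part of libass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the ASS_Image fields used here,
// so that the sketch compiles without libass.
struct AlphaBitmap {
    int w, h;                     // width/height in pixels
    int stride;                   // bytes between the starts of two rows
    const unsigned char* bitmap;  // one alpha byte per pixel
};

// Smallest source buffer that is guaranteed for a non-empty bitmap.
std::size_t min_source_size(const AlphaBitmap& img) {
    return static_cast<std::size_t>(img.stride) * (img.h - 1) + img.w;
}

// Copies the bitmap into a tightly packed w*h buffer.
// Only w bytes are read from each row, so the unpadded last row is safe.
std::vector<unsigned char> copy_tight(const AlphaBitmap& img) {
    std::vector<unsigned char> out;
    if (img.w <= 0 || img.h <= 0) return out;  // empty bitmap: nothing to render
    assert(img.stride >= img.w);
    out.resize(static_cast<std::size_t>(img.w) * img.h);
    for (int y = 0; y < img.h; ++y) {
        std::memcpy(out.data() + static_cast<std::size_t>(y) * img.w,
                    img.bitmap + static_cast<std::size_t>(y) * img.stride,
                    static_cast<std::size_t>(img.w));
    }
    return out;
}
```

Reading stride bytes from the last row instead of w could run past the end of the buffer, which is why the copy is done per row.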

One might expect libass to return an RGBA bitmap that can simply be drawn somewhere in the picture by conventional means. Instead, each bitmap contains only an alpha channel plus a single color value, which makes things a little trickier.
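To make the single color value concrete, the sketch below unpacks it into separate channels. It assumes the 0xRRGGBBAA layout, in which the low byte is transparency (0 means opaque); Rgba8 and unpack_ass_color are hypothetical helper names, not libass API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: color channels plus opacity, all in 0..255.
struct Rgba8 {
    std::uint8_t r, g, b, opacity;
};

// Splits a color packed as 0xRRGGBBAA.
// The low byte is transparency, so it is inverted to get opacity.
Rgba8 unpack_ass_color(std::uint32_t color) {
    return Rgba8{
        static_cast<std::uint8_t>((color >> 24) & 0xff),
        static_cast<std::uint8_t>((color >> 16) & 0xff),
        static_cast<std::uint8_t>((color >> 8) & 0xff),
        static_cast<std::uint8_t>(0xff - (color & 0xff)),
    };
}
```

For example, 0xFF800040 unpacks to r = 255, g = 128, b = 0 and an opacity of 191.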

First, we create a separate RGBA texture, subTexture, with the same size as the picture and write the subtitles into it. We then render the video picture first and the subtitle texture second, so that the subtitles appear on top of the video. Note that before rendering the subtitle texture you must call OMSetBlendState to tell Direct3D that alpha blending is required; otherwise the transparent pixels are rendered as black instead of showing the video pixels behind them.

Key parts of the implementation:

int isChange = 0;
auto assimg = ass_render_frame(ass_renderer, ass_track, currentms, &isChange);

if (isChange != 0) {
	D3D11_MAPPED_SUBRESOURCE mapped;
	d3dctx->Map(subTexture.Get(), 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
	// Start from a fully transparent texture
	memset(mapped.pData, 0, mapped.DepthPitch);

	while (assimg) {
		auto src = assimg->bitmap;
		// Correctly calculate the starting position of subtitles
		auto dst = (UCHAR*)mapped.pData + assimg->dst_y * mapped.RowPitch + assimg->dst_x * 4;

		// The color is packed as 0xRRGGBBAA; the low byte is transparency (0 = opaque)
		const UCHAR rb = (assimg->color & 0xff000000) >> 24;
		const UCHAR gb = (assimg->color & 0x00ff0000) >> 16;
		const UCHAR bb = (assimg->color & 0x0000ff00) >> 8;
		const double opacity = (0xff - (assimg->color & 0x000000ff)) / 255.0;

		for (int y = 0; y < assimg->h; ++y) {
			for (int x = 0; x < assimg->w; ++x) {
				auto pixel = dst + x * 4;

				// Opacity of this subtitle pixel: glyph coverage scaled by the layer opacity
				const double srcA = src[x] / 255.0 * opacity;
				// Share of the previously written pixel that stays visible
				const double alpha = 1 - srcA;

				// Pixel written by the previous layers
				UCHAR ra = pixel[0];
				UCHAR ga = pixel[1];
				UCHAR ba = pixel[2];
				UCHAR aa = pixel[3];

				pixel[0] = (UCHAR)((1 - alpha) * rb + alpha * ra);
				pixel[1] = (UCHAR)((1 - alpha) * gb + alpha * ga);
				pixel[2] = (UCHAR)((1 - alpha) * bb + alpha * ba);
				pixel[3] = (UCHAR)((1 - alpha) * 255 + alpha * aa);
			}
			// The pointer moves to the next line
			src += assimg->stride;
			dst += mapped.RowPitch;
		}
		assimg = assimg->next;
	}

	// Unmap even when there are no subtitles, so that the texture never stays mapped
	d3dctx->Unmap(subTexture.Get(), 0);
}

ctx->DrawIndexed(indicesSize, 0, 0);
The code loops over every pixel and writes the correct color value. When rendering each layer, it manually alpha-blends with the pixels written by the previous layers. After all layers are written, DrawIndexed is called to render the texture to the D3D surface.
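The manual blending step can be tried without Direct3D. The following is a minimal sketch, not the code used in this article: it applies the standard "over" operator for straight (non-premultiplied) alpha to a plain RGBA8 buffer. Layer, to_byte and blend_layer are hypothetical names, and the 0xRRGGBBAA layout with a transparency low byte is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the ASS_Image fields used by the blend.
struct Layer {
    int w, h, stride;
    const unsigned char* bitmap;  // 8-bit coverage per pixel
    std::uint32_t color;          // 0xRRGGBBAA, AA = transparency
    int dst_x, dst_y;
};

// Rounds and clamps a float to the 0..255 range.
static std::uint8_t to_byte(float v) {
    if (v <= 0.0f) return 0;
    if (v >= 255.0f) return 255;
    return static_cast<std::uint8_t>(std::lround(v));
}

// Blends one layer over a tightly packed RGBA8 frame (straight alpha).
void blend_layer(std::vector<std::uint8_t>& frame, int frame_w, int frame_h,
                 const Layer& l) {
    assert(frame.size() == static_cast<std::size_t>(frame_w) * frame_h * 4);
    const float src[3] = {
        static_cast<float>((l.color >> 24) & 0xff),
        static_cast<float>((l.color >> 16) & 0xff),
        static_cast<float>((l.color >> 8) & 0xff),
    };
    const float opacity = (255 - (l.color & 0xff)) / 255.0f;

    for (int y = 0; y < l.h; ++y) {
        for (int x = 0; x < l.w; ++x) {
            const int px = l.dst_x + x, py = l.dst_y + y;
            if (px < 0 || py < 0 || px >= frame_w || py >= frame_h) continue;

            const float sa =
                l.bitmap[static_cast<std::size_t>(y) * l.stride + x] / 255.0f * opacity;
            std::uint8_t* p = &frame[(static_cast<std::size_t>(py) * frame_w + px) * 4];
            const float da = p[3] / 255.0f;
            const float oa = sa + da * (1.0f - sa);  // resulting alpha
            if (oa <= 0.0f) continue;                // pixel stays fully transparent
            for (int c = 0; c < 3; ++c) {
                p[c] = to_byte((src[c] * sa + p[c] * da * (1.0f - sa)) / oa);
            }
            p[3] = to_byte(oa * 255.0f);
        }
    }
}
```

Calling blend_layer once per list entry, in list order, reproduces the layer-by-layer composition on the CPU.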

The problem with this method is its low efficiency: the intermediate blending is costly and there are far too many iterations. A single layer often needs nearly 10,000 of them. Used to render a large number of bullet comments (danmaku), it would be choppier than a slide show.

There are two ways to improve efficiency: reduce the number of iterations, and hand as much work as possible to the GPU. This requires quite a few changes:

Instead of using the previous full-screen overlay texture, create a small texture for each layer independently.
Copy assimg->bitmap into the caption texture unchanged.
Put assimg->color into a constant buffer, group it with the texture as one set of resources, and bind it to the pipeline.
Calculate the vertex coordinates from assimg->dst_x and assimg->dst_y, add them to the same set of resources, and bind them to the pipeline so that the caption is rendered at the correct position.
Process the pixel color in a shader to make full use of the GPU.
Store these resources in an array. When the caption does not change (isChange == 0), start the D3D rendering directly and skip writing data to the textures and other resources.
Because the size of a caption bitmap cannot be predicted and the size of a D3D texture is fixed, the texture has to be recreated every time the caption changes. The previous texture cannot be reused and must be destroyed.
// Create a read-only texture for the caption layer
void CreateOneTimeTexture(ID3D11Device* d3ddevice, int width, int height, ID3D11Texture2D** subTexture, ID3D11ShaderResourceView** srv, const UCHAR* data, int pitch) {
	D3D11_TEXTURE2D_DESC subDesc = {};
	subDesc.Format = DXGI_FORMAT_R8_UNORM; // One byte per pixel, like the libass alpha bitmap
	subDesc.ArraySize = 1;
	subDesc.MipLevels = 1;
	subDesc.SampleDesc = { 1, 0 };
	subDesc.Width = width;
	subDesc.Height = height;
	subDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
	subDesc.Usage = D3D11_USAGE_IMMUTABLE;

	D3D11_SUBRESOURCE_DATA sd = {};
	sd.pSysMem = data;
	sd.SysMemPitch = pitch; // Distance in bytes between two source rows

	// If the caller does not want the texture itself, keep it alive only until the view exists
	ComPtr<ID3D11Texture2D> tempTexture;
	if (subTexture == NULL) {
		subTexture = tempTexture.GetAddressOf();
	}

	if (FAILED(d3ddevice->CreateTexture2D(&subDesc, &sd, subTexture))) {
		return; // Without a texture there is nothing to create a view for
	}

	if (srv) {
		// Create shader resource
		D3D11_SHADER_RESOURCE_VIEW_DESC const srvDesc = CD3D11_SHADER_RESOURCE_VIEW_DESC(
			*subTexture,
			D3D11_SRV_DIMENSION_TEXTURE2D,
			subDesc.Format
		);

		d3ddevice->CreateShaderResourceView(
			*subTexture,
			&srvDesc,
			srv
		);
	}
}

// D3D resources required for each caption layer
struct SubtitleD3DResource {
	ComPtr<ID3D11Texture2D> tex;
	ComPtr<ID3D11ShaderResourceView> srv;
	ComPtr<ID3D11Buffer> cb_color;
	ComPtr<ID3D11Buffer> vertex;

	SubtitleD3DResource(ID3D11Device* device, int w, int h, const UCHAR* texdata, int pitch, uint32_t color, const vector<Vertex>& vertices) {
		// ... Create the D3D resources of this structure here.
		CreateOneTimeTexture(device, w, h, &tex, &srv, texdata, pitch);

		D3D11_BUFFER_DESC bd = {};
		bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;
		bd.Usage = D3D11_USAGE_IMMUTABLE;
		bd.ByteWidth = (UINT)(vertices.size() * sizeof(Vertex));
		bd.StructureByteStride = sizeof(Vertex);
		D3D11_SUBRESOURCE_DATA sd = {};
		sd.pSysMem = vertices.data();

		device->CreateBuffer(&bd, &sd, &vertex);

		// The size of a constant buffer must be a multiple of 16 bytes, so pad the color.
		// Passing &color directly would make D3D read 12 bytes past the variable.
		const uint32_t cbData[4] = { color, 0, 0, 0 };
		D3D11_BUFFER_DESC cbd = {};
		cbd.Usage = D3D11_USAGE_IMMUTABLE;
		cbd.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
		cbd.ByteWidth = sizeof(cbData);
		D3D11_SUBRESOURCE_DATA csd = {};
		csd.pSysMem = cbData;

		device->CreateBuffer(&cbd, &csd, &cb_color);
	}
};

void Draw() {
	// ...

	int isChange = 0;
	auto assimg = ass_render_frame(ass_renderer, ass_track, currentms + 6300, &isChange);

	if (isChange) {
		// Empty the array of SubtitleD3DResource (releasing the old resources) before rewriting the caption bitmaps
		subsD3DResource.clear();

		for (; assimg; assimg = assimg->next) {
			// libass may return empty bitmaps, which must not be rendered
			if (assimg->w == 0 || assimg->h == 0) {
				continue;
			}

			// Calculate the UV, which is normalized here and converted to [0.0, 1.0]
			float u1 = (float)assimg->dst_x / sub_frame_width;
			float v1 = (float)assimg->dst_y / sub_frame_height;
			float u2 = ((float)assimg->dst_x + assimg->w) / sub_frame_width;
			float v2 = ((float)assimg->dst_y + assimg->h) / sub_frame_height;
			// Calculate the vertex coordinates. Here, map the above results to [-1.0, +1.0]
			float x1 = u1 * 2 - 1;
			float y1 = 1 - v1 * 2;
			float x2 = u2 * 2 - 1;
			float y2 = 1 - v2 * 2;

			vector<Vertex> vertices = {
				{x1,	y1,	0,	0,	0},
				{x2,	y1,	0,	1,	0},
				{x2,	y2,	0,	1,	1},
				{x1,	y2,	0,	0,	1},
			};

			// Copy the bitmap directly to the texture without any extra loops
			subsD3DResource.emplace_back(d3ddevice.Get(), assimg->w, assimg->h, assimg->bitmap, assimg->stride, assimg->color, vertices);
		}
	}

	// Render each layer in order
	for (auto& subRes : subsD3DResource) {
		ID3D11Buffer* vertexBuffers2[] = { subRes.vertex.Get() };
		ctx->IASetVertexBuffers(0, 1, vertexBuffers2, &stride, &offset);

		ID3D11Buffer* cbs2[] = { subRes.cb_color.Get() };
		ctx->PSSetConstantBuffers(0, 1, cbs2);

		ID3D11ShaderResourceView* srvs2[] = { subRes.srv.Get() };
		ctx->PSSetShaderResources(0, 1, srvs2);

		ctx->DrawIndexed(indicesSize, 0, 0);
	}

	// ...
}
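The coordinate conversion inside Draw can be checked in isolation. The sketch below repeats the same arithmetic in a standalone function; Quad and to_ndc are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: corners of the caption quad in clip space.
struct Quad {
    float x1, y1;  // top-left
    float x2, y2;  // bottom-right
};

// Maps a pixel rectangle inside the frame to [-1, +1] coordinates.
// Pixel y grows downwards while clip-space y grows upwards, hence 1 - v * 2.
Quad to_ndc(int dst_x, int dst_y, int w, int h, int frame_w, int frame_h) {
    assert(frame_w > 0 && frame_h > 0);
    const float u1 = static_cast<float>(dst_x) / frame_w;
    const float v1 = static_cast<float>(dst_y) / frame_h;
    const float u2 = static_cast<float>(dst_x + w) / frame_w;
    const float v2 = static_cast<float>(dst_y + h) / frame_h;
    return Quad{u1 * 2 - 1, 1 - v1 * 2, u2 * 2 - 1, 1 - v2 * 2};
}
```

For a 1920x1080 frame, a 960x540 bitmap placed at (480, 270) maps to the corners (-0.5, 0.5) and (0.5, -0.5), and a bitmap covering the whole frame maps to (-1, 1) and (1, -1).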
When rendering a caption texture, use this shader:

Texture2D tex : register(t0);

SamplerState splr;

cbuffer CBuf
{
	uint color;
};

float4 main_PS_ass(float2 tc : TEXCOORD) : SV_TARGET
{
	// Get the alpha value from the single-channel texture
	float alpha = tex.Sample(splr, tc).r;

	// Get rgb value from constant buffer
	float r = ((color & 0xff000000) >> 24) / 255.0;
	float g = ((color & 0x00ff0000) >> 16) / 255.0;
	float b = ((color & 0x0000ff00) >> 8) / 255.0;
	// The low byte of the color is transparency (0 = opaque)
	float opacity = 1.0 - (color & 0x000000ff) / 255.0;

	return float4(r, g, b, alpha * opacity);
}
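The cbuffer above holds a single uint, but Direct3D 11 requires the size of a constant buffer to be a multiple of 16 bytes, so the CPU side has to supply 16 bytes of initial data. One way to keep both sides in sync is a padded mirror struct. ColorConstants and make_color_constants below are hypothetical names for such a sketch.

```cpp
#include <cassert>
#include <cstdint>

// CPU-side mirror of the shader's CBuf, padded to 16 bytes.
struct ColorConstants {
    std::uint32_t color;       // read by the shader as "uint color"
    std::uint32_t padding[3];  // unused, only fills the 16-byte slot
};

static_assert(sizeof(ColorConstants) % 16 == 0,
              "constant buffer size must be a multiple of 16 bytes");

// Builds the initial data for the constant buffer of one caption layer.
ColorConstants make_color_constants(std::uint32_t ass_color) {
    return ColorConstants{ass_color, {0, 0, 0}};
}
```

With this struct, sizeof(ColorConstants) can serve as the buffer's ByteWidth and a pointer to the struct as its initial data, so D3D never reads beyond the memory that was provided.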