A brief look at how VM templating works

https://github.com/kata-containers/documentation/blob/master/how-to/what-is-vm-templating-and-how-do-I-use-it.md

VM templating is a technique that speeds up VM creation and reduces the memory each VM consumes.

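1. Background

We call VMs strongly isolated mainly because each VM has its own kernel. Processes in different VMs run entirely on top of their own kernel and cannot see one another.

As a result, starting a VM means booting a complete kernel and a complete OS.

In traditional VM scenarios this is not much of a problem, since a physical machine may run only a dozen or so VMs.

In container scenarios with kata-containers, however, the unit of deployment is much smaller than a VM. A single physical machine may run hundreds of kata containers, each with its own kernel, so the overhead adds up.

Every kata container uses the same kernel and the same OS image. If the memory occupied by the kernel and OS image could be shared between VMs, a good deal of memory would be saved.

This is where the VM template technique comes in.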

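2. How VM templating works

It relies mainly on the copy-on-write (COW) behavior of fork in the Linux kernel.

COW: when a new process is forked, it appears to get a copy of the parent's entire address space, but that copy is initially only a reference to the same physical pages. A page is actually copied only when one of the processes writes to it.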
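The same copy-on-write behavior can be observed without fork by mapping a file with MAP_PRIVATE, which is closer to how a file-backed guest memory region behaves when it is not shared. The following is a minimal sketch for Linux, not code from kata; the function name privateWrite and the temp file are illustrative assumptions.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// privateWrite maps a one-page file with MAP_PRIVATE, writes to the mapping,
// and returns the first byte as seen through the mapping and as stored in the file.
func privateWrite() (mapped, onDisk byte, err error) {
	f, err := os.CreateTemp("", "cow-demo-*")
	if err != nil {
		return 0, 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	// One page of 'A' stands in for the shared template memory.
	page := make([]byte, os.Getpagesize())
	for i := range page {
		page[i] = 'A'
	}
	if _, err = f.Write(page); err != nil {
		return 0, 0, err
	}

	mem, err := syscall.Mmap(int(f.Fd()), 0, len(page),
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_PRIVATE)
	if err != nil {
		return 0, 0, err
	}
	defer syscall.Munmap(mem)

	// The first write gives this mapping its own copy of the page;
	// the backing file is not modified.
	mem[0] = 'B'

	buf := make([]byte, 1)
	if _, err = f.ReadAt(buf, 0); err != nil {
		return 0, 0, err
	}
	return mem[0], buf[0], nil
}

func main() {
	mapped, onDisk, err := privateWrite()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("mapping sees %q, file still holds %q\n", mapped, onDisk)
}
```

The mapping should report the modified byte while the file keeps the original one, which is why many VMs can map the same template file and only pay for the pages they actually change.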

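So the core idea of VM templating is this: a minimal factory-vm is created in advance, containing the common kernel, the OS image, and kata-agent. When a kata container is created, a new VM is forked from the factory-vm, and hotplug is then used to adjust the VM's size to what the kata container requires.

There is one catch: the size of the template VM is fixed, but the size of a Pod is not. The Pod VM's hotplug and resize capabilities are therefore required to adjust the VM's size.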

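3. System architecture

(System flow diagram to be added.)

3.1. What a template is

A template is essentially a memory snapshot; qemu can bring up a VM directly from such a snapshot.

When the VM template feature is enabled, qemu's launch parameters gain an extra line like this: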

-object memory-backend-file,id=dimm1,size=1024M,mem-path=/run/vc/vm/template/memory

This /run/vc/vm/template/memory file is what we call the VM template.

3.2. Enabling the template feature

Enabling VM templating is simple; it only requires the following kata configuration:

qemu-lite is specified in hypervisor.qemu-> path section
enable_template = true
initrd = is set
image = option is commented out or removed

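In addition, the size of the template is determined by the default_vcpus and default_memory parameters in configuration.toml.

3.3. Creating the template

There are two ways to create a template:

1. Run kata-runtime factory init, which creates the template automatically.
2. If option 1 was not run, sandbox creation will also create a template automatically when it finds that none exists.

The template implementation lives in src/runtime/virtcontainers/factory/template/.

Kata supports two schemes for speeding up VM creation, VM cache and VM template, and both are abstracted as a factory.

3.3.1. Template creation flow

The following describes the flow at the code level.

The call stack is: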

containerd-shim-v2/create.go: create()

-> katautils/create.go: HandleFactory()

-> virtcontainers/factory/factory.go: NewFactory()

-> virtcontainers/factory/template/template.go: New() -> createTemplateVM()

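The third argument of NewFactory is fetchOnly. If it is true, an existing template VM is reused, and an error is returned if none exists. If fetchOnly is false, a new one is created directly.

So the logic here is simple: first check whether a template VM exists; if so, reuse it, otherwise create a new one.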

    factoryConfig := vf.Config{
        Template:        runtimeConfig.FactoryConfig.Template,
        TemplatePath:    runtimeConfig.FactoryConfig.TemplatePath,
        VMCache:         runtimeConfig.FactoryConfig.VMCacheNumber > 0,
        VMCacheEndpoint: runtimeConfig.FactoryConfig.VMCacheEndpoint,
        VMConfig: vc.VMConfig{
            HypervisorType:   runtimeConfig.HypervisorType,
            HypervisorConfig: runtimeConfig.HypervisorConfig,
            AgentConfig:      runtimeConfig.AgentConfig,
        },
    }
    f, err := vf.NewFactory(ctx, factoryConfig, true)
    if err != nil && !factoryConfig.VMCache {
        kataUtilsLogger.WithError(err).Warn("load vm factory failed, about to create new one")
        f, err = vf.NewFactory(ctx, factoryConfig, false)
    }

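For the actual creation, look directly at the implementation of template.createTemplateVM().

The flow is roughly: NewVM creates and starts a real VM; the runtime then disconnects from the agent, pauses the VM, and calls vm.Save() to obtain the memory snapshot, after which the deferred vm.Stop() shuts the VM down. The snapshot is written to MemoryPath, which by default is under /run/vc/vm/template/.

Once this memory snapshot exists, all subsequent VMs can be created directly from it.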

func (t *template) createTemplateVM(ctx context.Context) error {
    // create the template vm
    config := t.config
    config.HypervisorConfig.BootToBeTemplate = true
    config.HypervisorConfig.BootFromTemplate = false
    config.HypervisorConfig.MemoryPath = t.statePath + "/memory"
    config.HypervisorConfig.DevicesStatePath = t.statePath + "/state"

    vm, err := vc.NewVM(ctx, config) // this creates a VM and actually starts it
    if err != nil {
        return err
    }
    defer vm.Stop(ctx)

    if err = vm.Disconnect(ctx); err != nil {
        return err
    }

    // Sleep a bit to let the agent grpc server clean up
    // When we close connection to the agent, it needs sometime to cleanup
    // and restart listening on the communication( serial or vsock) port.
    // That time can be saved if we sleep a bit to wait for the agent to
    // come around and start listening again. The sleep is only done when
    // creating new vm templates and saves time for every new vm that are
    // created from template, so it worth the invest.
    time.Sleep(templateWaitForAgent)

    if err = vm.Pause(ctx); err != nil {
        return err
    }

    if err = vm.Save(); err != nil {
        return err
    }
    return nil
}

3.3.2. Template size

From the code above we can see that the template is created with runtimeConfig.HypervisorConfig.

runtimeConfig.HypervisorConfig is in turn generated by reading configuration.toml.

The call stack for generating the configuration:

loadRuntimeConfig() -> LoadConfiguration -> updateRuntimeConfig -> updateRuntimeConfigHypervisor -> newQemuHypervisorConfig

    return vc.HypervisorConfig{
        HypervisorPath:          hypervisor,
        HypervisorMachineType:   machineType,
        NumVCPUs:                h.defaultVCPUs(),   // from the default_vcpus setting
        DefaultMaxVCPUs:         h.defaultMaxVCPUs(),
        MemorySize:              h.defaultMemSz(),   // from the default_memory setting
        MemSlots:                h.defaultMemSlots(),
        MemOffset:               h.defaultMemOffset(),

3.3. Creating a sandbox from the template

With the template file described above in place, creating a VM through qemu only requires passing the -object memory-backend-file,id=dimm1,size=1024M,mem-path=/run/vc/vm/template/memory parameter.

The relevant code:

virtcontainers/qemu.go: CreateVM()

-> virtcontainers/qemu.go: setupTemplate()

func (q *qemu) setupTemplate(knobs *govmmQemu.Knobs, memory *govmmQemu.Memory) govmmQemu.Incoming {
    incoming := govmmQemu.Incoming{}

    if q.config.BootToBeTemplate || q.config.BootFromTemplate {
        knobs.FileBackedMem = true
        memory.Path = q.config.MemoryPath

        if q.config.BootToBeTemplate {
            knobs.MemShared = true
        }
        if q.config.BootFromTemplate {
            incoming.MigrationType = govmmQemu.MigrationDefer
        }
    }
    return incoming
}
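To make the effect of these flags concrete, here is a minimal sketch of how they could translate into the -object value shown earlier. It is not the govmm implementation: the templateKnobs type and memoryObjectArg function are hypothetical, and the sketch assumes that MemShared corresponds to appending share=on, which is set only when booting the VM that becomes the template.

```go
package main

import "fmt"

// templateKnobs mirrors only the two flags discussed above. It is a
// hypothetical type for this sketch, not the govmm Knobs struct.
type templateKnobs struct {
	BootToBeTemplate bool
	BootFromTemplate bool
}

// memoryObjectArg builds an illustrative -object value for file-backed guest memory.
func memoryObjectArg(k templateKnobs, memPath string, sizeMiB int) string {
	arg := fmt.Sprintf("memory-backend-file,id=dimm1,size=%dM,mem-path=%s", sizeMiB, memPath)
	if k.BootToBeTemplate {
		// Assumption: a shared mapping lets guest writes reach the backing
		// file, so the saved file holds the template's memory contents.
		arg += ",share=on"
	}
	// When booting from the template, share=on is left out, so each clone
	// gets a private copy-on-write view of the same file.
	return arg
}

func main() {
	path := "/run/vc/vm/template/memory"
	fmt.Println(memoryObjectArg(templateKnobs{BootToBeTemplate: true}, path, 1024))
	fmt.Println(memoryObjectArg(templateKnobs{BootFromTemplate: true}, path, 1024))
}
```

The asymmetry is the point of the design: only the template VM writes through to the file, while every VM started from it reads the same pages and copies a page only when it modifies it.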

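3.4. Creating a container

With runc, the sandbox is really just a pause container that uses almost no resources; it basically exists only to hold the pod's namespaces.

In kata, the sandbox is a VM, and all containers in the Pod run inside that VM.

As noted earlier, when the sandbox is created it does not know how many containers the pod will run. It is therefore started at the default size only, and when a container actually needs to be created, the VM has to be enlarged to fit the container's resource requirements.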

// CreateContainer creates a new container in the sandbox
// This should be called only when the sandbox is already created.
// It will add new container config to sandbox.config.Containers
func (s *Sandbox) CreateContainer(ctx context.Context, contConfig ContainerConfig) (VCContainer, error) {
    // Update sandbox config to include the new container's config
    s.config.Containers = append(s.config.Containers, contConfig)

    // Create the container object, add devices to the sandbox's device-manager:
    c, err := newContainer(ctx, s, &s.config.Containers[len(s.config.Containers)-1])
    if err != nil {
        return nil, err
    }

    // create and start the container
    if err = c.create(ctx); err != nil {
        return nil, err
    }

    // Add the container to the containers list in the sandbox.
    if err = s.addContainer(c); err != nil {
        return nil, err
    }

    // Sandbox is responsible to update VM resources needed by Containers
    // Update resources after having added containers to the sandbox, since
    // container status is required to know if more resources should be added.
    if err = s.updateResources(ctx); err != nil {
        return nil, err
    }

    if err = s.cgroupsUpdate(ctx); err != nil {
        return nil, err
    }
    // ....

    return c, nil
}
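The resize performed by updateResources can be pictured as "default size plus whatever the containers ask for". The sketch below is a simplified illustration under that assumption, not kata's implementation; the containerResources type, the targetSize function, and the choice to sum CPU requests in milli-CPUs before rounding up are all hypothetical.

```go
package main

import "fmt"

// containerResources is a hypothetical, simplified view of a container's limits.
type containerResources struct {
	CPUQuota  int64  // CFS quota in microseconds per period; <= 0 means unlimited
	CPUPeriod uint64 // CFS period in microseconds
	MemLimit  int64  // memory limit in bytes; <= 0 means unlimited
}

// milliCPUs converts a CFS quota/period pair into thousandths of a CPU.
func milliCPUs(quota int64, period uint64) uint64 {
	if quota <= 0 || period == 0 {
		return 0
	}
	return uint64(quota) * 1000 / period
}

// targetSize returns the VM size needed for the sandbox defaults plus the
// limits of every container currently in the sandbox.
func targetSize(defaultVCPUs, defaultMemMiB uint32, containers []containerResources) (vcpus uint32, memBytes uint64) {
	var totalMilli uint64
	memBytes = uint64(defaultMemMiB) << 20
	for _, c := range containers {
		totalMilli += milliCPUs(c.CPUQuota, c.CPUPeriod)
		if c.MemLimit > 0 {
			memBytes += uint64(c.MemLimit)
		}
	}
	// Round the summed request up to whole vCPUs, then add the default.
	vcpus = defaultVCPUs + uint32((totalMilli+999)/1000)
	return vcpus, memBytes
}

func main() {
	containers := []containerResources{
		{CPUQuota: 150000, CPUPeriod: 100000, MemLimit: 512 << 20}, // 1.5 CPUs, 512 MiB
		{CPUQuota: 50000, CPUPeriod: 100000, MemLimit: 256 << 20},  // 0.5 CPU, 256 MiB
	}
	vcpus, mem := targetSize(1, 1024, containers)
	fmt.Printf("target: %d vCPUs, %d MiB\n", vcpus, mem>>20)
}
```

The difference between this target and the VM's current size is what would have to be hotplugged, which is why the template VM can stay small and fixed.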

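One thing I do not understand here: why not call updateResources first and then create the container? What happens if resources are insufficient while the container is being created and it gets OOM-killed?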

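4. Technical limitations

4.1. virtio-fs

With virtiofs, the VM is created with -m 4G -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on -numa node,memdev=mem, which means the share=on option has to be enabled.

When a VM is started from a template, however, share=on is dropped. This conflicts with virtiofs and makes it unusable.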
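Since the two features make opposite demands on guest memory sharing, a configuration check can reject the combination early. The sketch below is only an illustration of such a check; the sketchConfig type, its field names, and the validate function are hypothetical and are not kata's configuration API.

```go
package main

import (
	"errors"
	"fmt"
)

// sketchConfig holds just the two settings that conflict. It is a
// hypothetical type for this example.
type sketchConfig struct {
	SharedFS       string // e.g. "virtio-fs" or "virtio-9p"
	EnableTemplate bool
}

var errTemplateWithVirtioFS = errors.New(
	"VM templating drops share=on, but virtio-fs needs shared guest memory")

// validate rejects configurations that enable both VM templating and virtio-fs.
func validate(c sketchConfig) error {
	if c.EnableTemplate && c.SharedFS == "virtio-fs" {
		return errTemplateWithVirtioFS
	}
	return nil
}

func main() {
	for _, c := range []sketchConfig{
		{SharedFS: "virtio-fs", EnableTemplate: true},
		{SharedFS: "virtio-9p", EnableTemplate: true},
	} {
		fmt.Printf("%+v -> %v\n", c, validate(c))
	}
}
```

Failing at configuration time is cheaper than discovering the conflict when the shared filesystem fails to come up inside a VM started from the template.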

virtio-9p currently does not have this problem, but 9p is notorious for its poor performance.
