angr: the cle loading module

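I recently read some analyses of the angr framework, but some parts were explained rather vaguely, so I wrote up a fresh analysis based on my own understanding. If I have misunderstood anything, corrections are welcome.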

In the project module, if the argument passed at initialization is not a cle.Loader instance, the Loader from the cle module is used to load the binary:

 cle.Loader(thing, **load_options)

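This is the first place where angr invokes the Loader class.

What the loader does

Once the loader has run, you can inspect the properties of the loaded file as well as the shared libraries that were loaded with it.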
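As a rough illustration of which properties are meant, the sketch below summarizes a loader-like object. The attribute names `main_object`, `all_objects`, `shared_objects`, `min_addr`, `max_addr` and `binary` follow what cle exposes as far as I know; the helper `summarize_loader` and the stand-in objects are hypothetical and exist only so the example runs without cle installed.

```python
from types import SimpleNamespace

def summarize_loader(ld):
    """Collect a few attributes from a cle.Loader-like object (duck-typed)."""
    return {
        "main": ld.main_object.binary,
        "objects": [obj.binary for obj in ld.all_objects],
        "shared": sorted(ld.shared_objects),
        "range": (hex(ld.min_addr), hex(ld.max_addr)),
    }

# Stand-in objects (assumption): with cle installed, one would pass
# cle.Loader("/path/to/binary") instead of fake_loader.
main = SimpleNamespace(binary="/bin/true")
libc = SimpleNamespace(binary="/lib/libc.so.6")
fake_loader = SimpleNamespace(
    main_object=main,
    all_objects=[main, libc],
    shared_objects={"libc.so.6": libc},
    min_addr=0x400000,
    max_addr=0x7FFFFF,
)

summary = summarize_loader(fake_loader)
print(summary)
```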

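Initializing the Loader class

hasattr() checks whether an object has a given attribute.

The type of the argument passed in is resolved first: it is either a path or a file object.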
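The duck-typing idea behind that check can be sketched as follows. `classify_spec` is a hypothetical helper written for this article, not a cle function; it only mirrors the `hasattr(spec, 'read') and hasattr(spec, 'seek')` test that appears in the loader code.

```python
import io

def classify_spec(spec):
    """Guess what kind of input `spec` is (hypothetical helper)."""
    # Anything with read() and seek() is treated as a file-like stream.
    if hasattr(spec, "read") and hasattr(spec, "seek"):
        return "stream"
    # Plain str/bytes values are treated as filesystem paths.
    if isinstance(spec, (bytes, str)):
        return "path"
    return "unknown"

kinds = [
    classify_spec(io.BytesIO(b"\x7fELF")),
    classify_spec("/bin/ls"),
    classify_spec(42),
]
print(kinds)
```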

__init__(self, 
        main_binary, 
        auto_load_libs=True,
        concrete_target = None,
        force_load_libs=(), 
        skip_libs=(),
        main_opts=None,
        lib_opts=None, 
        ld_path=(), 
        use_system_libs=True,
        ignore_import_version_numbers=True,
        case_insensitive=False, 
        rebase_granularity=0x100000,
        except_missing_libs=False,
        aslr=False,
        perform_relocations=True,
        load_debug_info=False,
        page_size=0x1,
        preload_libs=(),
        arch=None):

After the parameters have been set up, initialization finishes by calling the _internal_load method:

self._internal_load(main_binary, *preload_libs , *force_load_libs, preloading=(main_binary, *preload_libs))

It is defined as follows:

  def _internal_load(self, *args, preloading=()):

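Recursively load dependencies until all dependencies are satisfied
Resolve symbol dependencies
Lay out the address space
Map the objects into memory
Perform relocations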
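The first of these steps, loading until nothing is missing, can be pictured as a simple worklist. This is a toy model under stated assumptions: the dependency table `DEPS` and the function `load_all` are made up for illustration and do not reproduce cle's actual bookkeeping.

```python
# Hypothetical dependency table: object name -> names it depends on.
DEPS = {
    "main": ["libA.so", "libc.so"],
    "libA.so": ["libc.so"],
    "libc.so": [],
}

def load_all(*specs):
    """Load the requested specs, then keep loading until no dependency is missing."""
    loaded = []
    pending = list(specs)
    while pending:
        name = pending.pop(0)
        if name in loaded:
            # Already loaded: skip the request instead of loading twice.
            continue
        loaded.append(name)
        # Loading an object may reveal new dependencies to satisfy.
        pending.extend(DEPS[name])
    return loaded

order = load_all("main")
print(order)
```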

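First, it iterates over the arguments and loads each file. Given the arguments passed in, these are main_binary followed by the other files that need to be processed: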

for main_spec in args:  
    is_preloading = any(spec is main_spec for spec in preloading)
    if self.find_object(main_spec, extra_objects=objects) is not None:
        l.info("Skipping load request %s - already loaded", main_spec)
        continue

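The loop checks whether the current spec is one of the preloaded specs (is_preloading) and skips it if it has already been loaded.

find_object returns the object itself if the given file has already been loaded, otherwise it returns None: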

def find_object(self, spec, extra_objects=()):
        """
        If the given library specification has been loaded, return its object, otherwise return None.
        """
        if isinstance(spec, Backend):
            for obj in self.all_objects:
                if obj is spec:
                    return obj
            return None
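Note that the Backend branch compares with `is`, so it matches the very same instance rather than an equal one. A minimal sketch of the difference, using a hypothetical `FakeBackend` class and a free-standing `find_object` function in place of the method:

```python
class FakeBackend:
    """Hypothetical stand-in for a loaded object; equality is by path."""

    def __init__(self, path):
        self.path = path

    def __eq__(self, other):
        return isinstance(other, FakeBackend) and self.path == other.path

    def __hash__(self):
        return hash(self.path)

def find_object(all_objects, spec):
    """Return the object that *is* spec (identity, not equality), or None."""
    for obj in all_objects:
        if obj is spec:
            return obj
    return None

a = FakeBackend("/lib/libc.so.6")
b = FakeBackend("/lib/libc.so.6")  # equal to a, but a different instance

found_a = find_object([a], a)
found_b = find_object([a], b)
print(found_a is a, found_b)
```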

all_objects is defined during initialization and is empty by default. If the file has not been loaded yet, it is loaded through _load_object_isolated:

         obj = self._load_object_isolated(main_spec)  # load the file

Stepping into _load_object_isolated:

def _load_object_isolated(self, spec):
     if isinstance(spec, Backend):
            return spec
     elif hasattr(spec, 'read') and hasattr(spec, 'seek'):
        .....
     elif type(spec) in (bytes, str):
        binary = self._search_load_path(spec) # this is allowed to cheat and do partial static loading
        l.debug("... using full path %s", binary)
        binary_stream = open(binary, 'rb')

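The Backend, file-stream and str/bytes cases are each handled separately. A Backend instance is returned as-is, while a file stream or a path ends up as binary_stream for the steps that follow.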

 try:
            # STEP 2: collect options
            if self.main_object is None:
                options = dict(self._main_opts)
            else:
                for ident in self._possible_idents(binary_stream if binary is None else binary): # also allowed to cheat
                    if ident in self._lib_opts:
                        options = dict(self._lib_opts[ident])
                        break
                else:
                    options = {}
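The `else` attached to the `for` loop runs only when the loop finishes without hitting `break`, which is how the empty default is chosen. A minimal sketch of that pattern follows; `pick_options` is a hypothetical helper, and the `base_addr` key is used purely as an example option.

```python
def pick_options(idents, lib_opts):
    """Return a copy of the options for the first matching ident, else {}."""
    for ident in idents:
        if ident in lib_opts:
            options = dict(lib_opts[ident])  # copy, so the caller's dict is untouched
            break
    else:
        # Reached only if the loop never executed `break`.
        options = {}
    return options

lib_opts = {"libc.so": {"base_addr": 0x1000}}
hit = pick_options(["libc.so.6", "libc.so"], lib_opts)
miss = pick_options(["libm.so.6"], lib_opts)
print(hit, miss)
```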

Next, step into _possible_idents, which tries to determine the backend from binary_stream:

elif hasattr(spec, 'read') and hasattr(spec, 'seek'):
            backend_cls = self._static_backend(spec, ignore_hints=True)
            if backend_cls is not None:
                soname = backend_cls.extract_soname(spec)
                if soname is not None:
                    yield soname
                    if self._ignore_import_version_numbers:
                        yield soname.rstrip('.0123456789')
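The `rstrip('.0123456789')` call strips any trailing run of dots and digits, which turns a versioned soname into its unversioned form. The sketch below isolates just that part; `possible_idents` here is a simplified, hypothetical stand-in that takes the soname directly.

```python
def possible_idents(soname, ignore_version_numbers=True):
    """Yield the soname and, optionally, the same name without its version suffix."""
    yield soname
    if ignore_version_numbers:
        # rstrip takes a set of characters, not a suffix: every trailing
        # '.' or digit is removed until another character is reached.
        yield soname.rstrip(".0123456789")

idents = list(possible_idents("libc.so.6"))
crypto = list(possible_idents("libcrypto.so.1.0.0"))
print(idents, crypto)
```

Because the argument is a character set, a name that itself ends in digits would lose them too, so the stripped form is best read as a loose match key rather than an exact name.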

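_static_backend returns the correct backend for the file, or None if the argument is of an unknown type or is a blob.

Blob data is meant to be handled by something similar to binwalk.

A blob is raw binary data without a recognized file format.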

with stream_or_path(spec) as stream:
            for rear in ALL_BACKENDS.values():
                if rear.is_default and rear.is_compatible(stream):
                    return rear

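ALL_BACKENDS is initialized as an empty dict in __init__.py.

backends is a subpackage of cle.

It contains modules such as cgc, elf, java, macho, minidump, pe and tls, and each supported file type registers itself through register_backend: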

def register_backend(name, cls):
    ALL_BACKENDS.update({name: cls})

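As a result, ALL_BACKENDS ends up as a dict mapping each file type cle supports to its backend class. The ELF backend, for example, recognizes its files like this: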

    def is_compatible(stream):
        stream.seek(0)
        identstring = stream.read(0x1000)
        stream.seek(0)
        if identstring.startswith(b'\x7fELF'):
            if elftools.elf.elffile.ELFFile(stream).header['e_type'] == 'ET_CORE':
                return False
            return True
        return False

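Every backend has a similar check that inspects the file header to identify the file type, which is how the matching backend is chosen.

With that, the following is easy to understand: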

with stream_or_path(spec) as stream:
            for rear in ALL_BACKENDS.values():
                if rear.is_default and rear.is_compatible(stream):
                    return rear

For the given file, the loop walks over the supported backends, checks each with is_compatible, and returns the first one that matches.
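Putting the registry and the header check together, the whole mechanism can be sketched in a few lines. `FakeELF`, `FakePE` and `static_backend` are hypothetical stand-ins, and the magic-number checks are deliberately simplified compared with what the real backends do.

```python
import io

ALL_BACKENDS = {}

def register_backend(name, cls):
    ALL_BACKENDS.update({name: cls})

class FakeELF:
    is_default = True

    @staticmethod
    def is_compatible(stream):
        stream.seek(0)
        magic = stream.read(4)
        stream.seek(0)  # leave the stream rewound for the next check
        return magic == b"\x7fELF"

class FakePE:
    is_default = True

    @staticmethod
    def is_compatible(stream):
        stream.seek(0)
        magic = stream.read(2)
        stream.seek(0)
        return magic == b"MZ"

register_backend("elf", FakeELF)
register_backend("pe", FakePE)

def static_backend(stream):
    """Return the first registered default backend that accepts the stream, else None."""
    for rear in ALL_BACKENDS.values():
        if rear.is_default and rear.is_compatible(stream):
            return rear
    return None

elf_cls = static_backend(io.BytesIO(b"\x7fELF\x02\x01\x01"))
pe_cls = static_backend(io.BytesIO(b"MZ\x90\x00"))
blob_cls = static_backend(io.BytesIO(b"\x00\x01\x02\x03"))
print(elf_cls, pe_cls, blob_cls)
```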

Back in _load_object_isolated:

  # STEP 4: LOAD!
            l.debug("... loading with %s", backend_cls)

            result = backend_cls(binary, binary_stream, is_main_bin=self.main_object is None, loader=self, **options)
            result.close()
            return result  # the loaded object

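backend_cls is obtained from backend_resolver. Its value is one of the values in the ALL_BACKENDS dict, that is, the backend class matching the file. These classes are defined in the corresponding files under backends.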

            result = backend_cls(binary, binary_stream, is_main_bin=self.main_object is None, loader=self, **options)

This instantiates the matching backend class, and control returns to _internal_load:

            obj = self._load_object_isolated(main_spec)  # load the file
            objects.append(obj)
            objects.extend(obj.child_objects)
            dependencies.extend(obj.deps)

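obj is now an instance of the right class. Each file type is handled differently; put another way, obj is an instance of the matching subclass of Backend.

obj and its child objects are appended to objects,

and its dependencies are appended to dependencies.

Next, initialization based on the chosen backend begins: main_object is set to the object just obtained.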

if self.main_object is None:
                # this is technically the first place we can start to initialize things based on platform
                self.main_object = obj
                self.memory = Clemory(obj.arch, root=True)  # create the memory object

Then the thread manager matching the backend is selected:

                chk_obj = self.main_object if isinstance(self.main_object, ELFCore) or not self.main_object.child_objects else self.main_object.child_objects[0]
                if isinstance(chk_obj, ELFCore):
                    self.tls = ELFCoreThreadManager(self, obj.arch)
                elif isinstance(obj, Minidump):
                    self.tls = MinidumpThreadManager(self, obj.arch)
                elif isinstance(chk_obj, MetaELF):
                    self.tls = ELFThreadManager(self, obj.arch)
                elif isinstance(chk_obj, PE):
                    self.tls = PEThreadManager(self, obj.arch)
                else:
                    self.tls = ThreadManager(self, obj.arch)
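The order of the isinstance checks matters when one class derives from another: the most specific type has to be tested first. The sketch below assumes a hierarchy in which ELFCore is a subclass of MetaELF (my understanding of cle, but treat it as an assumption); the empty classes and `pick_manager` are hypothetical.

```python
# Hypothetical, empty stand-ins for the backend classes.
class MetaELF:
    pass

class ELF(MetaELF):
    pass

class ELFCore(ELF):  # assumption: core files are a specialization of ELF
    pass

class PE:
    pass

def pick_manager(obj):
    """Return the name of the thread manager that would be chosen for obj."""
    if isinstance(obj, ELFCore):    # must come before the MetaELF check
        return "ELFCoreThreadManager"
    elif isinstance(obj, MetaELF):  # would also match ELFCore if tested first
        return "ELFThreadManager"
    elif isinstance(obj, PE):
        return "PEThreadManager"
    return "ThreadManager"

choices = [pick_manager(ELFCore()), pick_manager(ELF()), pick_manager(PE()), pick_manager(object())]
print(choices)
```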

Following one of them, here is __init__ of ELFCoreThreadManager:

def __init__(self, loader, arch, **kwargs):  # pylint: disable=unused-argument
        self.loader = loader
        self.arch = arch
        self.threads = [ELFCoreThread(loader, arch, threadinfo) for threadinfo in loader.main_object._threads]

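loader.main_object._threads is likewise populated in the backend. My guess is that it holds the per-thread information used for TLS (thread-local storage); I will look at it more closely when digging into the backends.

Dependencies are then processed in the same way.

I will not go deeper into dependency handling for now.

Then the objects are mapped into memory: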

        for obj in objects:
            self._map_object(obj)
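Before an object can be mapped it needs a base address that does not collide with what is already mapped. One plausible policy, sketched here as an assumption rather than as cle's exact logic, is to round up past the highest used address to a multiple of rebase_granularity (0x100000 in the __init__ signature shown earlier). `align_up` and `next_base` are hypothetical helpers.

```python
def align_up(addr, granularity):
    """Round addr up to the next multiple of granularity."""
    return (addr + granularity - 1) // granularity * granularity

def next_base(mapped_ranges, granularity=0x100000):
    """Pick a base above every (start, end) range already mapped (hypothetical policy)."""
    highest = max((end for _, end in mapped_ranges), default=0)
    return align_up(highest + 1, granularity)

# One object already occupies 0x400000..0x4a3fff (inclusive).
base = next_base([(0x400000, 0x4A3FFF)])
print(hex(base))
```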

Relocations are performed:

      if self._perform_relocations:
            for obj in ordered_objects:
                obj.relocate()
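Relocation has to wait until mapping is done because the patched values depend on where each object finally landed. As a toy illustration of one common kind, a base-relative relocation stores base + addend at base + offset. The dict-based memory and `apply_relative_relocs` are hypothetical and far simpler than what a real backend's relocate() does.

```python
def apply_relative_relocs(memory, relocs, base):
    """Patch base-relative relocations into a dict-based memory model.

    memory: dict mapping address -> stored value
    relocs: iterable of (offset, addend) pairs, both relative to the object
    base:   address at which the object was mapped
    """
    for offset, addend in relocs:
        memory[base + offset] = base + addend
    return memory

patched = apply_relative_relocs({}, [(0x3000, 0x1139)], base=0x500000)
print({hex(k): hex(v) for k, v in patched.items()})
```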

Finally, the appropriate name-to-object mappings are recorded:

   for obj in objects:
            self.requested_names.update(obj.deps)
            for ident in self._possible_idents(obj):
                self._satisfied_deps[ident] = obj

            if obj.provides is not None:
                self.shared_objects[obj.provides] = obj

        return objects

Once this is done, the loaded objects are returned.

(End)
