* Implement syscall readlinkat
* Implement syscall readv by read syscalls
Since pk lacks kernel-space dynamic memory management, we implement readv with
normal read syscalls rather than forwarding it to spike.
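A minimal sketch of the idea, not pk's actual code: each iovec entry is
filled by its own read call, so no temporary buffer has to be allocated.
The name `readv_by_read` is hypothetical, and the host `read()` stands in
for pk's internal read path.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: emulate readv() with one read() per iovec entry. */
ssize_t readv_by_read(int fd, const struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;

    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len == 0)
            continue;  /* nothing to fill in this entry */

        ssize_t n = read(fd, iov[i].iov_base, iov[i].iov_len);
        if (n < 0)
            return total > 0 ? total : n;  /* report partial progress first */

        total += n;
        if ((size_t)n < iov[i].iov_len)
            break;  /* short read: stop instead of blocking on later entries */
    }
    return total;
}
```

Unlike a real readv, this sketch is not atomic with respect to other
readers of the same file descriptor.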
We previously kernel-panicked because that made it more obvious when a
syscall implementation was missing. These days, it's more common that
the C library will do something sensible in response to returning -ENOSYS.
Favor that approach to avoid frustrating users.
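The behavior described above can be sketched as a dispatch function that
falls back to -ENOSYS. This is an illustration only: the table, the
syscall numbers, and every `demo_` name are hypothetical and do not
match pk's real dispatch code.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*demo_syscall_fn)(long a0, long a1, long a2);

/* Hypothetical handler used only to populate the demo table. */
static long demo_sys_getpid(long a0, long a1, long a2)
{
    (void)a0; (void)a1; (void)a2;
    return 1;
}

#define DEMO_NR_GETPID 0
#define DEMO_NR_MAX    4

/* Entries without an initializer stay NULL, i.e. "not implemented". */
static const demo_syscall_fn demo_table[DEMO_NR_MAX] = {
    [DEMO_NR_GETPID] = demo_sys_getpid,
};

long demo_do_syscall(unsigned long nr, long a0, long a1, long a2)
{
    /* Return -ENOSYS instead of panicking so the C library can react. */
    if (nr >= DEMO_NR_MAX || demo_table[nr] == NULL)
        return -ENOSYS;
    return demo_table[nr](a0, a1, a2);
}
```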
The memory manager maintains the first free page as the page after the
`_end` synthetic emitted by the linker. This value is stored in a
translation-unit-local variable, which is only ever written to from
`init_early_alloc`, which is static and only ever invoked from
`pk_vm_init`. Furthermore, the only value that `first_free_page` is ever
set to is computed as a rounding of the _address_ of `_end`. Because the
address of the symbol cannot change during execution of a normal
program, this is effectively a constant, making the computed value a
"constant" which can be re-materialized. With the knowledge that the
value is effectively a constant that can be re-materialized, and the
fact that the value is only ever written to at a single position, the
compiler can simply re-materialize the value in `free_page_addr` if it
was ever changed. This allows the 8-byte value to be truncated to
1 byte.
Now the compiler can inline `__early_pgalloc_align`, and because the
combination of `__early_alloc` and `__early_pgalloc_align` is small, it
can inline that again at the two local call sites. This changes
`__augment_page_freelist` to re-materialize the constant when needed for
the allocation.
The re-materialization, however, uses pc-relative addressing, which now
computes a different value than expected: the address has become a VA
rather than a PA. As a result, the address computed by
`free_page_addr` (which is the result of `__early_pgalloc_align`) is a
virtual address after the relocation. It then propagates through
`__early_alloc` to the value in `__augment_page_freelist`, which is then
consumed by `__page_alloc`, which treats the VA as a PA and performs an
additional translation to a VA.
Mark the value as `volatile` to indicate that it must be read at every
use. This thwarts the compiler's size optimization, which caused a
mis-compilation and, eventually, an invalid memory access during
the `memset` that follows the allocation.
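A simplified sketch of the fix described above, not the actual pk
source: `demo_end` stands in for the linker-provided `_end`, and the
`demo_` functions are hypothetical stand-ins for `init_early_alloc` and
`free_page_addr`.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PGSIZE 4096UL

/* Stand-in for the linker-provided `_end` symbol (assumption for this demo). */
char demo_end[1];

/* `volatile` forces a real load at every use, so the compiler may not
 * replace the stored value with a re-materialized address computation. */
static volatile uintptr_t first_free_page;

/* Written exactly once: the address of the end symbol, rounded up to a page. */
void demo_init_early_alloc(void)
{
    first_free_page =
        ((uintptr_t)demo_end + DEMO_PGSIZE - 1) & ~(DEMO_PGSIZE - 1);
}

/* Address of the idx-th page after the first free page. */
uintptr_t demo_free_page_addr(size_t idx)
{
    return first_free_page + idx * DEMO_PGSIZE;
}
```

On a hosted build the qualifier changes nothing observable; it only
matters where the stored value and a freshly computed address can differ,
as in the relocation case described above.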
Thanks to @nzmichaelh for the help in tracking this down!
`SYS_getcwd` differs from `getcwd` in that its return value is < 0 on
failure; otherwise it is the length of the string. The proxy kernel
was treating 0 as success and all other values as errors. As a result,
we would never return a valid value for `getcwd`.
The following program now executes properly with the Proxy Kernel:
```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/limits.h>
int main(int argc, char **argv) {
  /* getcwd() expects a plain char buffer */
  char buffer[PATH_MAX + 1] = {0};
  if (getcwd(buffer, PATH_MAX))
    printf("cwd: %s\n", buffer);
  return EXIT_SUCCESS;
}
```
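The return-value convention described above can be sketched as a small
mapping from the raw syscall result to the libc-style result. The helper
name `demo_getcwd_result` is hypothetical; it only illustrates the
convention and is not code from pk or from any C library.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: translate a raw SYS_getcwd result into the
 * pointer-or-NULL convention of getcwd(). */
char *demo_getcwd_result(long ret, char *buf)
{
    if (ret < 0) {
        errno = (int)-ret;  /* negative values carry the error code */
        return NULL;
    }
    return buf;  /* any non-negative value is a length, i.e. success */
}
```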
This replaces use of the old `sbadaddr` CSR name with the current
`stval` name. The old spelling is not supported by the LLVM IAS,
whereas the modern spelling is supported by both LLVM and binutils.
Using recent compilers we get the following error message:
../pk/pk.c: In function 'run_loaded_program.constprop':
../pk/pk.c:177:3: error: both arguments to '__builtin___clear_cache'
must be pointers
177 | __clear_cache(0, 0);
| ^~~~~~~~~~~~~~~~~~~
Let's use the existing function __riscv_flush_icache(),
give it a header with a prototype, and use it to
emit the FENCE.I instruction directly.
See #239
Suggested-by: Andrew Waterman <andrew@sifive.com>
Signed-off-by: Christoph Muellner <cmuellner@linux.com>
This assumes that stval is populated with the opcode on illegal
instruction exceptions. But since we're only using the opcode for
error reporting, it's OK if this assumption is violated.
Previously, the pk would always run from virtual address MEM_START.
Instead, remap it into the negative virtual addresses, allowing user
processes to expand beyond MEM_START.