1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
|
-*-org-*-
* TODO
** Keep exit code of traced process
See https://bugzilla.redhat.com/show_bug.cgi?id=105371 for details.
** Automatic prototype discovery:
*** Use debuginfo if available
Alternatively, use debuginfo to generate configure file.
*** Mangled identifiers contain partial prototypes themselves
They don't contain return type info, which can change the
parameter passing convention. We could use it and hope for the
best. Also they don't include the potentially present hidden this
pointer.
** Automatically update list of syscalls?
** More operating systems (solaris?)
** Get rid of EVENT_ARCH_SYSCALL and EVENT_ARCH_SYSRET
** Implement displaced tracing
A technique used in GDB (and in uprobes, I believe), whereby the
instruction under breakpoint is moved somewhere else, and followed
by a jump back to original place. When the breakpoint hits, the IP
is moved to the displaced instruction, and the process is
continued. We avoid all the fuss with singlestepping and
reenablement.
** Create different ltrace processes to trace different children
** Config file syntax
*** mark some symbols as exported
For PLT hits, only exported prototypes would be considered. For
symtab entry point hits, all would be.
*** named arguments
This would be useful for replacing the arg1, emt2 etc.
*** parameter pack improvements
The above format tweaks require that packs that expand to no types
at all be supported. If this works, then it should be relatively
painless to implement conditionals:
| void ptrace(REQ=enum(PTRACE_TRACEME=0,...),
| if[REQ==0](pack(),pack(pid_t, void*, void *)))
This is of course dangerously close to a programming language, and
I think ltrace should be careful to stay as simple as possible.
(We can hook into Lua, or TinyScheme, or some such if we want more
general scripting capabilities. Implementing something ad-hoc is
undesirable.) But the above can be nicely expressed by pattern
matching:
| void ptrace(REQ=enum[int](...)):
| [REQ==0] => ()
| [REQ==1 or REQ==2] => (pid_t, void*)
| [true] => (pid_t, void*, void*);
Or:
| int open(string, FLAGS=flags[int](O_RDONLY=00,...,O_CREAT=0100,...)):
| [(FLAGS & 0100) != 0] => (flags[int](S_IRWXU,...))
This would still require pretty complete expression evaluation.
_Including_ pointer dereferences and such. And e.g. in accept, we
need subtraction:
| int accept(int, +struct(short, +array(hex(char), X-2))*, (X=uint)*);
Perhaps we should hook to something after all.
*** system call error returns
This is closely related to above. Take the following syscall
prototype:
| long read(int,+string0,ulong);
string0 means the same as string(array(char, zero(retval))*). But
if read returns a negative value, that signifies errno. But zero
takes this at face value and is suspicious:
| read@SYS(3 <no return ...>
| error: maximum array length seems negative
| , "\n\003\224\003\n", 4096) = -11
Ideally we would do what strace does, e.g.:
| read@SYS(3, 0x12345678, 4096) = -EAGAIN
*** errno tracking
Some calls result in setting errno. Somehow mark those, and on
failure, show errno. System calls return errno as a negative
value (see the previous point).
*** second conversions?
This definitely calls for some general scripting. The goal is to
have seconds in adjtimex calls show as e.g. 10s, 1m15s or some
such.
*** format should take arguments like string does
Format should take value argument describing the value that should
be analyzed. The following overwriting rules would then apply:
| format | format(array(char, zero)*) |
| format(LENS) | X=LENS, format[X] |
The latter expanded form would be canonical.
This depends on named arguments and parameter pack improvements
(we need to be able to construct parameter packs that expand to
nothing).
*** More fine-tuned control of right arguments
Combination of named arguments and some extensions could take care
of that:
| void func(X=hide(int*), long*, +pack(X)); |
This would show long* as input argument (i.e. the function could
mangle it), and later show the pre-fetched X. The "pack" syntax is
utterly undeveloped as of now. The general idea is to produce
arguments that expand to some mix of types and values. But maybe
all we need is something like
| void func(out int*, long*); |
ltrace would know that out/inout/in arguments are given in the
right order, but left pass should display in and inout arguments
only, and right pass then out and inout. + would be
backward-compatible syntactic sugar, expanded like so:
| void func(int*, int*, +long*, long*); |
| void func(in int*, in int*, out long*, out long*); |
This is useful in particular for:
| ulong mbsrtowcs(+wstring3_t, string*, ulong, addr); |
| ulong wcsrtombs(+string3, wstring_t*, ulong, addr); |
Where we would like to render arg2 on the way in, and arg1 on the
way out.
But sometimes we may want to see a different type on the way in and
on the way out. E.g. in asprintf, what's interesting on the way in
is the address, but on the way out we want to see buffer contents.
Does something like the following make sense?
| void func(X=void*, long*, out string(X)); |
** Support for functions that never return
This would be useful for __cxa_throw, presumably also for longjmp
(do we handle that at all?) and perhaps a handful of others.
** Support flag fields
enum-like syntax, except disjunction of several values is assumed.
** Support long long
We currently can't define time_t on 32bit machines. That mean we
can't describe a range of time-related functions.
** Support signed char, unsigned char, char
Also, don't format it as characted by default, string lens can do
it. Perhaps introduce byte and ubyte and leave 'char' as alias of
one of those with string lens applied by default.
** Support fixed-width types
Really we should keep everything as {u,}int{8,16,32,64} internally,
and have long, short and others be translated to one of those
according to architecture rules. Maybe this could be achieved by a
per-arch config file with typedefs such as:
| typedef ulong = uint8_t; |
** Support for ARM/AARCH64 types
- ARM and AARCH64 both support half-precision floating point
- there are two different half-precision formats, IEEE 754-2008
and "alternative". Both have 10 bits of mantissa and 5 bits of
exponent, and differ only in how exponent==0x1F is handled. In
IEEE format, we get NaN's and infinities; in alternative
format, this encodes normalized value -1S × 2¹⁶ × (1.mant)
- The Floating-Point Control Register, FPCR, controls: — The
half-precision format where applicable, FPCR.AHP bit.
- AARCH64 supports fixed-point interpretation of {,double}words
- e.g. fixed(int, X) (int interpreted as a decimal number with X
binary digits of fraction).
- AARCH64 supports 128-bit quad words in SIMD
** Some more functions in vect might be made to take const*
Or even marked __attribute__((pure)).
** pretty printer support
GDB supports python pretty printers. We migh want to hook this in
and use it to format certain types.
** support new Linux kernel features
- PTRACE_SIEZE
- /proc/PID/map_files/* (but only root seems to be able to read
this as of now)
* BUGS
** After a clone(), syscalls may be seen as sysrets in s390 (see trace.c:syscall_p())
** leak in regex matching
>> I looked into this. Ltrace is definitely leaking it. The regex is
>> released when filter_destroy() calls filter_rule_destroy(), but those
>> are not called by anything.
>
>Ah, there we go. I just saw that we call regfree, but didn't check
>whether we actually call those. Will you roll this into your change
>set, or should I look into it?
I'd rather you looked at it, if you don't mind.
** unconditional follow of pthread_create
Basically we'd like to follow pthread_create always, and fork only if -f
is given. ltrace now follows nothing, unless -f is given, and then it
follows everything. (Really it follows everything alway and detaches
from newly-created children unless -f is given.)
The thing is, in Linux, we can choose to follow only {v,}forks by
setting PTRACE_O_TRACE{V,}FORK. We can also choose to follow all clones
by setting PTRACE_O_TRACECLONE, but that captures pthread_create as well
as fork (as all these are built on top of the underlying clone system
call), as well as direct clone calls.
So what would make sense would be to tweak the current logic to only
detach if what happened was an actual fork, which we can tell from the
parameters of the system call. That might provide a more useful user
experience. Tracing only a single thread is problematic anyway, because
_all_ the threads will hit the breakpoints that ltrace sets anyway, so
pre-emptively tracing all threads is what you generally need.
|