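I don't know what C looks like in your mind. My own impression was that C is not very expressive: arrays and pointers are its only "advanced" data structures, the standard library is small and carries a lot of historical baggage, there is no package manager, and, worst of all, memory has to be managed by hand. It is hard for younger programmers today to get interested in C when there are so many higher-level languages to choose from, such as Java at the height of its popularity or the rising Go and Rust. Why pick C?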
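I only started thinking about relearning C within the past year. The main reason is that my current work involves database implementation, and established databases such as PostgreSQL and MySQL are written in C (MySQL in a mix of C and C++). Newer databases have more languages to choose from, but C has accumulated decades of experience in this area, which is not easily surpassed. For software as complex as a database, implementations that follow similar ideas can still differ greatly in performance, depending on details such as which hash function to choose, how to resolve collisions, and how to size the hash buckets. Only with a foundation in C is it possible to read and understand how this traditional software is implemented.
Background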
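To get a feel for C development, I first read 21st Century C. The book is short and quick to finish, but it is also fairly shallow: it helps if you are unfamiliar with the C toolchain, but it does little for actually writing C code. So I went back to The C Programming Language, 2nd edition (C程序设计语言(第2版·新版)). Even after all these years it is still the best textbook for learning C: concise, to the point, with well-chosen examples and not a wasted sentence. It is also short; skipping the exercises, it can be read in a week. If the book has a weakness, it is that its examples declare all variables at the top of each function, which modern compilers (C99 and later) no longer require.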
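With K&R as a foundation, it was time for hands-on practice. Over the past month I wrote a project of about 2,000 lines in C, jiacai2050/oh-my-github. It is not entirely trivial, and I mainly wanted to use it to experience the following:
- The C development workflow and the related toolchain
- Language features of C99/C11
- Essentials of C coding style, and how to design an API so that its users avoid pitfalls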
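Let's start with the toolchain. Before developing a real project there are some tedious chores, such as setting up the development and debugging environment and installing dependencies. C has good LSP support. The language server I use is clangd, which supports go-to-definition, find-references, auto-completion, and so on. clangd is configured through compile_commands.json, which some build tools can generate directly; for a simple personal project, a compile_flags.txt file listing one compiler flag per line is enough.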
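As a rough illustration, a compile_flags.txt for a small project might look like the following. The specific flags and the `include` directory are assumptions chosen for the example, not the configuration of the project discussed here; clangd applies the listed flags to every file in the directory tree.

```text
-std=c11
-Wall
-Wextra
-Iinclude
```

The file goes in the project root. Once per-file flags start to differ, a generated compile_commands.json is the better fit.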
```
==609== HEAP SUMMARY:
==609==     in use at exit: 276 bytes in 37 blocks
==609==   total heap usage: 35,019 allocs, 34,982 frees, 279,075,706 bytes allocated
==609==
==609== 48 bytes in 6 blocks are still reachable in loss record 1 of 3
==609==    at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==609==    by 0x57DB83F: ??? (in /usr/lib/aarch64-linux-gnu/libgcrypt.so.20.2.8)
==609==    by 0x57DCE2F: ??? (in /usr/lib/aarch64-linux-gnu/libgcrypt.so.20.2.8)
==609==    by 0x5843C5B: ??? (in /usr/lib/aarch64-linux-gnu/libgcrypt.so.20.2.8)
==609==    by 0x57DB73F: ??? (in /usr/lib/aarch64-linux-gnu/libgcrypt.so.20.2.8)
==609==    by 0x57DC8DB: ??? (in /usr/lib/aarch64-linux-gnu/libgcrypt.so.20.2.8)
==609==    by 0x57D8443: gcry_control (in /usr/lib/aarch64-linux-gnu/libgcrypt.so.20.2.8)
==609==    by 0x4CE9BC3: libssh2_init (in /usr/lib/aarch64-linux-gnu/libssh2.so.1.0.1)
==609==    by 0x48D58CF: ??? (in /usr/lib/aarch64-linux-gnu/libcurl.so.4.7.0)
==609==    by 0x48831FB: curl_global_init (in /usr/lib/aarch64-linux-gnu/libcurl.so.4.7.0)
==609==    by 0x10A0AB: omg_setup_context (omg.c:184)
==609==    by 0x10DD0B: main (cli.c:19)
==609== 84 bytes in 25 blocks are definitely lost in loss record 2 of 3
==609==    at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==609==    by 0x10D86B: omg_parse_trending (omg.c:1116)
==609==    by 0x10DBEF: omg_query_trending (omg.c:1176)
==609==    by 0x10DD6B: main (cli.c:38)
==609==
==609== 144 bytes in 6 blocks are still reachable in loss record 3 of 3
==609==    at 0x4849E4C: malloc (vg_replace_malloc.c:307)
==609==    by 0x57DB83F: ??? (in /usr/lib/aarch64-linux-gnu/libgcrypt.so.20.2.8)
==609==    by 0x57DCE2F: ??? (in /usr/lib/aarch64-linux-gnu/libgcrypt.so.20.2.8)
==609==    by 0x5843C4F: ??? (in /usr/lib/aarch64-linux-gnu/libgcrypt.so.20.2.8)
==609==    by 0x57DB73F: ??? (in /usr/lib/aarch64-linux-gnu/libgcrypt.so.20.2.8)
==609==    by 0x57DC8DB: ??? (in /usr/lib/aarch64-linux-gnu/libgcrypt.so.20.2.8)
==609==    by 0x57D8443: gcry_control (in /usr/lib/aarch64-linux-gnu/libgcrypt.so.20.2.8)
==609==    by 0x4CE9BC3: libssh2_init (in /usr/lib/aarch64-linux-gnu/libssh2.so.1.0.1)
==609==    by 0x48D58CF: ??? (in /usr/lib/aarch64-linux-gnu/libcurl.so.4.7.0)
==609==    by 0x48831FB: curl_global_init (in /usr/lib/aarch64-linux-gnu/libcurl.so.4.7.0)
==609==    by 0x10A0AB: omg_setup_context (omg.c:184)
==609==    by 0x10DD0B: main (cli.c:19)
==609==
==609== LEAK SUMMARY:
==609==    definitely lost: 84 bytes in 25 blocks
==609==    indirectly lost: 0 bytes in 0 blocks
==609==      possibly lost: 0 bytes in 0 blocks
==609==    still reachable: 192 bytes in 12 blocks
==609==         suppressed: 0 bytes in 0 blocks
```
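The leak is plain to see (the "definitely lost" record points at the malloc called from omg_parse_trending at omg.c:1116), and from there it is a matter of following the trail and fixing the corresponding logic. The report after the fix: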
```
==481== LEAK SUMMARY:
==481==    definitely lost: 0 bytes in 0 blocks
==481==    indirectly lost: 0 bytes in 0 blocks
==481==      possibly lost: 0 bytes in 0 blocks
==481==    still reachable: 192 bytes in 12 blocks
==481==         suppressed: 0 bytes in 0 blocks
```
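To make the "definitely lost" category concrete, here is a minimal sketch of the kind of ownership bug that produces it. The names `item_t`, `item_init`, and `item_free` are hypothetical and only stand in for a parsed record; this is not the code from omg.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, standing in for one parsed item. */
typedef struct {
    char *name; /* heap-allocated copy, owned by the record */
} item_t;

/* Copies `src` into a fresh heap buffer; returns NULL if malloc fails. */
static char *copy_string(const char *src) {
    size_t n = strlen(src) + 1;
    char *dst = malloc(n);
    if (dst != NULL) {
        memcpy(dst, src, n);
    }
    return dst;
}

/* Fills `item`; the caller must later call item_free(). Returns 0 on success. */
int item_init(item_t *item, const char *name) {
    item->name = copy_string(name);
    return item->name != NULL ? 0 : -1;
}

/* Releases the buffer owned by `item`. Skipping this call for an item that
 * then goes out of scope leaves no pointer to the buffer, which is what
 * Valgrind reports as "definitely lost". */
void item_free(item_t *item) {
    free(item->name);
    item->name = NULL;
}
```

Pairing every `item_init` with an `item_free` is the whole fix in this sketch; running the program under Valgrind with `--leak-check=full` is one way to confirm that the "definitely lost" count has dropped to zero.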
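Besides using tools to catch memory problems, a more elegant approach is to design the API so that it allocates as little as possible and draws clear boundaries. This comes up again in the API design discussion later, so I won't go into it here. The following links have more discussion of cleanup: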
- Portable equivalent to gcc's __attribute__(cleanup)
- A good and idiomatic way to use GCC and clang __attribute__((cleanup)) and pointer declarations
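For readers who have not met the attribute these links discuss, the following is a minimal sketch of how `__attribute__((cleanup))` can release a buffer on every return path. It assumes GCC or Clang, since the attribute is a compiler extension and not part of ISO C. The `auto_free` macro, the `cleanup_calls` counter, and `greeting_length` are hypothetical names invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts how often the handler ran; present only to make the effect visible. */
int cleanup_calls = 0;

/* A cleanup handler receives a pointer to the annotated variable. */
void free_charp(char **p) {
    free(*p); /* free(NULL) is a no-op, so early exits are safe */
    cleanup_calls++;
}

/* Hypothetical convenience macro; not part of any standard header. */
#define auto_free __attribute__((cleanup(free_charp)))

/* Returns the length of "<prefix><name>", built in a temporary heap buffer.
 * The buffer is freed automatically when `buf` goes out of scope. */
size_t greeting_length(const char *prefix, const char *name) {
    size_t n = strlen(prefix) + strlen(name) + 1;
    auto_free char *buf = malloc(n);
    if (buf == NULL) {
        return 0; /* the handler still runs on this path */
    }
    snprintf(buf, n, "%s%s", prefix, name);
    return strlen(buf); /* evaluated before the handler frees buf */
}
```

The trade-off is portability: code that must build with compilers lacking the extension needs a fallback, which is the subject of the first link.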
```c
#include <assert.h>

int main(void)
{
    static_assert(2 + 2 == 4, "2+2 isn't 4"); // well-formed

    static_assert(sizeof(int) < sizeof(char),
                  "this program requires that int is less than char"); // compile-time error
}
```
Generic selection
This C11 feature provides a limited form of generic programming.
```c
#include <stdio.h>
#include <math.h>

// Possible implementation of the tgmath.h macro cbrt
#define cbrt(X) _Generic((X), \
              long double: cbrtl, \
                  default: cbrt,  \
                    float: cbrtf  \
)(X)

int main(void)
{
    double x = 8.0;
    const float y = 3.375;
    printf("cbrt(8.0) = %f\n", cbrt(x));    // selects the default cbrt
    printf("cbrtf(3.375) = %f\n", cbrt(y)); // converts const float to float,
                                            // then selects cbrtf
}
```
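`_Generic` can select plain values as well as functions. The sketch below maps an expression to a string naming its type; `type_name` is a hypothetical macro written for this example, and the list of types is deliberately short.

```c
/* Hypothetical macro: picks a string based on the static type of X.
 * The selection happens at compile time; X itself is never evaluated. */
#define type_name(X) _Generic((X), \
    int: "int", \
    double: "double", \
    char *: "char *", \
    default: "other" \
)
```

With this definition, `type_name(42)` selects `"int"` and `type_name(1.0f)` falls through to `"other"`, because `float` is not listed. A string literal such as `"s"` selects `"char *"`, since the array decays to a pointer before matching.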
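Multithreading
C11 added the following two headers for multithreading support:
- threads.h, similar to the pthread library
- stdatomic.h, which provides atomic variables and memory orderings similar to those in C++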
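A minimal sketch of the stdatomic.h side is shown below. The counter and function names are hypothetical, and thread creation is left out on purpose to keep the example portable, because threads.h is an optional part of C11 and not every libc ships it.

```c
#include <stdatomic.h>

/* Hypothetical shared counter; atomic_int makes each update indivisible. */
static atomic_int request_count = 0;

/* Adds one to the counter and returns the value it held before the update. */
int count_request(void) {
    /* Relaxed ordering: only atomicity is needed here, not ordering
     * relative to other memory operations. */
    return atomic_fetch_add_explicit(&request_count, 1, memory_order_relaxed);
}

/* Reads the current value with the default, sequentially consistent ordering. */
int current_count(void) {
    return atomic_load(&request_count);
}
```

Several threads could call `count_request` concurrently without a mutex and no increment would be lost. The `memory_order_*` names mirror the `std::memory_order` values in C++.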
Coding style
Error handling
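In traditional C, the usual approach is for a function to return an integer error code, with the error details obtained by reading a global variable. libc is the typical example: a call reports failure through its return value, and the caller then reads errno to find out what went wrong.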
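The pattern can be sketched as follows, assuming a POSIX system where a failed `remove` sets `errno`. The wrapper `remove_file` is a hypothetical helper written for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: deletes `path` and returns 0 on success, or the
 * errno value observed right after the failed call. */
int remove_file(const char *path) {
    errno = 0;
    if (remove(path) != 0) {
        int saved = errno; /* save first: later calls may overwrite errno */
        fprintf(stderr, "remove(%s) failed: %s\n", path, strerror(saved));
        return saved;
    }
    return 0;
}
```

Calling `remove_file` on a path that does not exist returns `ENOENT` on such systems. Note how easily the detail is lost: any library call made between the failure and the read of `errno` may change it, which is why the value is copied immediately.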
How to Write Reusable and Maintainable Code using the C Language.
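Summary
As a language with a long history, C is not obsolete; on the contrary, it keeps evolving with the times. Programmers who have long worked in higher-level languages may find C too bare and unproductive when they first switch, but that is only a surface impression, and it fades as one becomes familiar with the ecosystem. And because there is little black magic, programmers feel more in control of the whole code base.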
In the beginning you always want results.
In the end all you want is CONTROL.
Further reading
- 漫谈 C 语言及如何学习 C 语言 (A casual talk about C and how to learn it)
- C 语言底层开发怎么样? - V2EX (What is low-level development in C like?)
- C 语言该怎么继续提高 - 闲聊灌水 - Emacs China (How to keep improving at C)
- Autotools Introduction (automake)
- Debugging and Profiling · the missing semester of your cs education
- One year of C
- Struct and union initialization - cppreference.com
- Modern C and What We Can Learn From It (YouTube)
- To Save C, We Must Save ABI | The Pasture
- C Isn't A Programming Language Anymore | Lobsters
- Learning C with gdb - Blog - Recurse Center