Nginx

What makes a good architecture

From: https://www.zhihu.com/question/25056737

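I joined Alibaba Cloud in October 2012 as an architect. At that time ECS shipped a new version every two months. Our team pulled 23 all-nighters in a single year, and over two years we moved the 50,000 production servers onto a new architecture, after which we could iterate roughly once a month.

A good architecture really matters. Nginx is a good architecture.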

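A good architecture is one with foresight. Take TCP: its core specification dates back to 1981 (RFC 793), it has seen no major overhaul since, and it still serves all kinds of scenarios. Nginx is similar: since its first release in 2004 its architecture has barely changed. It is modular. Split vertically, the bottom layer is the event-driven core built on epoll, above that is the HTTP framework, above that the HTTP modules, and above those OpenResty's Lua. Split horizontally, the module boundaries are just as clear: a WAF-style restriction, or an IP-based allowlist or denylist, each lives in its own module, and a module such as limit_req is responsible for rate limiting only; it does not work out the client's real IP by itself before limiting.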

That is why I call Nginx a good architecture, and why studying Nginx is also a way to pick up very good architectural thinking.

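Where Nginx appears

Request processing flow: a module's functionality runs in a specific processing phase, so the order in which different modules' directives appear in the conf file does not decide the order in which they run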

  • POST_READ The ngx_http_realip_module registers its handler at this phase to enable substitution of client addresses before any other module is invoked

  • SERVER_REWRITE Rewrite directives defined in a server block (but outside a location block) are processed

  • FIND_CONFIG Special phase where a location is chosen based on the request URI

  • REWRITE For rewrite rules defined in the location, chosen in the FIND_CONFIG phase

  • POST_REWRITE Special phase where the request is redirected to a new location if its URI changed during a rewrite

  • PREACCESS Common phase for different types of handlers, not associated with access control. The standard nginx modules ngx_http_limit_conn_module and ngx_http_limit_req_module register their handlers at this phase

  • ACCESS Phase where it is verified that the client is authorized to make the request. Standard nginx modules such as ngx_http_access_module and ngx_http_auth_basic_module register their handlers at this phase

  • POST_ACCESS Special phase where the satisfy any directive is processed. If some access phase handlers denied access and none explicitly allowed it, the request is finalized. No additional handlers can be registered at this phase

  • PRECONTENT Handlers to be called prior to generating content. Standard modules such as ngx_http_try_files_module and ngx_http_mirror_module register their handlers at this phase

  • CONTENT Response is normally generated. Multiple nginx standard modules register their handlers at this phase, including ngx_http_index_module or ngx_http_static_module

  • LOG Request logging is performed. Currently, only the ngx_http_log_module registers its handler at this stage for access logging. Log phase handlers are called at the very end of request processing, right before freeing the request
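The point that phases, not file order, decide when a module runs can be sketched with a small config. This is only an illustration under assumptions: the zone name, addresses and ports are made up, and the realip directives need an nginx built with --with-http_realip_module. The directives are deliberately written in the reverse of their execution order.

```nginx
# Sketch only: goes inside the http { } block.
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=5r/s;

server {
    listen 8080;

    location /api/ {
        # CONTENT phase: written first, runs last.
        proxy_pass http://127.0.0.1:9000;

        # ACCESS phase (ngx_http_access_module): runs after limit_req.
        allow 192.168.0.0/16;
        deny  all;

        # PREACCESS phase (ngx_http_limit_req_module): runs before allow/deny.
        limit_req zone=per_ip burst=10;
    }

    # POST_READ phase (ngx_http_realip_module): written last, runs first,
    # so the rate limit and the allow/deny rules see the substituted address.
    set_real_ip_from 10.0.0.0/8;
    real_ip_header   X-Forwarded-For;
}
```

Within a single module the order can still matter: allow and deny rules, for example, are checked in the order they are written.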

Modules

  • realip[POST_READ PHASE]

  • rewrite

    • break stops processing the current set of ngx_http_rewrite_module directives; directives of other modules are not affected
    • if the condition can test a variable, a regular expression match, or whether a file, directory or path exists or is executable (-f, -d, -e, -x), much like tests in a shell
    • return code [text]; return code URL; return URL;
      • 301 permanent; some clients incorrectly change the method to GET when following it
      • 302 temporary; some old clients incorrectly changed the method to GET, so the behavior with non-GET methods and 302 is unpredictable on the Web, whereas the behavior with 307 is predictable. For GET requests, their behavior is identical
      • 303 temporary; usually sent back as a result of PUT or POST. The method used to display the redirected page is always GET
      • 307 temporary; guarantees that the method and the body will not be changed
      • 308 permanent; the request method and the body will not be altered
    • rewrite regex replacement [flag]; if the request URI matches the regular expression, the URI is changed to the replacement. Flags include last and break
    • rewrite_log
    • set
  • location

    • match rules
      • nginx first checks locations defined using the prefix strings (prefix locations). Among them, the location with the longest matching prefix is selected and remembered.
      • Then regular expressions are checked, in the order of their appearance in the configuration file. The search of regular expressions terminates on the first match, and the corresponding configuration is used
      • If no match with a regular expression is found then the configuration of the prefix location remembered earlier is used
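The matching rules and the rewrite directives above can be tried out with a config along these lines. It is a sketch, not a recommended setup: the port, paths and response bodies are made up, and the = modifier (an exact match, which ends the search immediately) is an nginx feature that the rules above do not list.

```nginx
# Sketch only: goes inside the http { } block.
server {
    listen 8080;

    location / {                 # prefix location
        return 200 "prefix /\n";
    }
    location /images/ {          # longer prefix location
        return 200 "prefix /images/\n";
    }
    location ~ \.png$ {          # regex, first in the file
        return 200 "regex png\n";
    }
    location ~ \.(png|jpg)$ {    # regex, second in the file
        return 200 "regex png|jpg\n";
    }

    location /old/ {
        # last: stop rewrite processing and search for a location again
        # using the changed URI.
        rewrite ^/old/(.*)$ /images/$1 last;
    }
    location = /moved {
        # 308 keeps the request method and body unchanged.
        return 308 /images/;
    }
}
```

With this config, /images/a.png remembers the prefix /images/ but is answered by the first matching regex ("regex png"); /images/a.gif matches no regex and falls back to "prefix /images/"; /a.jpg is answered by "regex png|jpg"; and /old/readme.txt is rewritten to /images/readme.txt and then answered by "prefix /images/".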

Nginx load-balancing strategies

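  • random picks a backend server at random.

  • round robin (default) requests are handed to the backend servers one by one in turn; a backend that goes down is taken out of rotation automatically.

  • weight sets the round-robin share. A server's share of requests is proportional to its weight, which helps when the backends differ in capacity: the higher the weight, the more likely the server is to be picked.

  • ip_hash useful when requests from the same client need to keep reaching the same backend, for example when session state lives on that backend. Each request is assigned by a hash of the client IP, so a returning visitor is routed to the server that handled them before.

  • fair (third-party) assigns requests by backend response time; servers that respond faster are preferred.

  • url_hash (third-party) sends each URL to the same backend every time, which works well when the backends are caches.

  • least_conn (least connections) picks the server with the fewest active connections. In general, the more connections a server is holding, the more work it has in progress, so the busiest server is skipped and the request goes to another machine.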
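The built-in strategies above map onto upstream blocks roughly as follows. This is a sketch under assumptions: the upstream names and addresses are made up, the third-party fair module is left out, and URL hashing is shown with the built-in hash directive (nginx 1.7.2 and later) instead of the third-party url_hash module. The random directive needs nginx 1.15.1 or later.

```nginx
# Sketch only: goes inside the http { } block.

# Round robin (default) with weights: 10.0.0.11 gets about 3 of every 4 requests.
upstream rr_weighted {
    server 10.0.0.11:8080 weight=3;
    server 10.0.0.12:8080;            # weight defaults to 1
}

# The same client IP keeps reaching the same backend.
upstream sticky_by_ip {
    ip_hash;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

# The backend with the fewest active connections is picked.
upstream fewest_conns {
    least_conn;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

# The same URI keeps reaching the same backend (useful for cache servers).
upstream by_uri {
    hash $request_uri consistent;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

# A backend is picked at random.
upstream pick_random {
    random;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}
```

A location then selects one of them with, for example, proxy_pass http://rr_weighted;.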

Removing and restoring failed nodes
  • max_fails=number
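    The number of failed attempts to reach the backend (counted within fail_timeout) after which the node is paused and receives no new requests. The default is 1. It is used together with fail_timeout.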

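An aside: "failure" can be defined in many ways. Since the focus here is HTTP proxying, the directive to look at is proxy_next_upstream.
proxy_next_upstream: specifies in which cases a request is passed on to another node, which in effect defines what counts as a failed attempt on a node.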

  • fail_timeout=time

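    How long the node stays paused once Nginx considers it unavailable; it is also the window within which the max_fails failures are counted. If not configured, the default is 10s.

Taking the two parameters together, for example with max_fails=3 fail_timeout=30s: once 3 requests sent to the node have failed within 30s, Nginx takes the node out of rotation for 30s, and only after those 30s does it send requests to the node again.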

  • backup

    Like the default branch of a switch statement: when all the primary nodes are down, requests are sent to the backup node. It is the last line of defense.
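Put together, the parameters above might be configured like this. It is a minimal sketch: the upstream name, addresses and the chosen values (3 failures, 30s) are assumptions picked to match the example in the text, not recommendations.

```nginx
# Sketch only: goes inside the http { } block.
upstream app_backend {
    # 3 failed attempts within 30s take the node out of rotation for 30s.
    server 10.0.0.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;

    # Only receives requests when the primary nodes above are unavailable.
    server 10.0.0.13:8080 backup;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;

        # What counts as a failed attempt and triggers a retry on another node.
        # The default is "error timeout"; the 5xx codes count only if listed.
        proxy_next_upstream error timeout http_502 http_503 http_504;
    }
}
```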

Notable third-party Nginx extensions

  1. OpenResty

Getting familiar with HTTP

  1. Web server

