# Fixed-Point and Floating-Point Numbers

CS Basics

2017.01.17

    var x = 0.3;
    var y = 0.2;
    var z = 0.1;

    console.log((x - y) == (y - z));  // => false
    console.log((y + z) == x);        // => false
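Both comparisons fail because 0.1, 0.2 and 0.3 have no exact binary representation, so every operation carries a small rounding error. The sketch below prints the actual results and compares them with a relative tolerance instead of `==`. The helper name `nearlyEqual` and the tolerance of four `Number.EPSILON` are illustrative assumptions, not a universal rule; a suitable tolerance depends on the computation.

```javascript
// Hypothetical helper: compares two doubles using a relative tolerance.
function nearlyEqual(a, b, relTol = 4 * Number.EPSILON) {
  if (a === b) return true; // exact matches, including equal infinities
  const scale = Math.max(Math.abs(a), Math.abs(b));
  return Math.abs(a - b) <= relTol * scale;
}

const x = 0.3, y = 0.2, z = 0.1;

console.log(x - y);                     // 0.09999999999999998
console.log(y + z);                     // 0.30000000000000004
console.log(nearlyEqual(x - y, y - z)); // true
console.log(nearlyEqual(y + z, x));     // true
```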


## Fixed-Point Numbers

### Decimal representation

$123.45 = 1 \times {10^2} + 2 \times {10^1} + 3 \times {10^0} + 4 \times {10^{ - 1}} + 5 \times {10^{ - 2}}$

### Binary representation

$1101.11 = 1 \times {2^3} + 1 \times {2^2} + 0 \times {2^1} + 1 \times {2^0} + 1 \times {2^{ - 1}} + 1 \times {2^{ - 2}}$

${b_7}{b_6}{b_5}{b_4}{b_3}{b_2}{b_1}{b_0}.{b_{ - 1}}{b_{ - 2}}{b_{ - 3}}{b_{ - 4}}{b_{ - 5}}{b_{ - 6}}{b_{ - 7}}{b_{ - 8}}$

### Definition

$x=(1/2^b)\sum_{n=0}^{N-1}2^nx_n$

$x = 1011.1111 = 8 + 2 + 1 + 0.5 + 0.25 + 0.125 + 0.0625 = 11.9375$

$x = 191/{2^4} = 191/16 = 11.9375$

• 0000000100000000 = 00000001 . 00000000 = 1d (1 decimal)
• 0000001000000000 = 00000010 . 00000000 = 2d
• 0000001010000000 = 00000010 . 10000000 = 2.5d
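The definition amounts to reading the bits as an ordinary integer and dividing by $2^b$. A minimal sketch, assuming the bits arrive as a string and `b` is the number of fractional bits (`decodeUnsigned` is a hypothetical name):

```javascript
// Decodes an unsigned fixed-point bit string with b fractional bits:
// x = (integer value of the bits) / 2^b.
function decodeUnsigned(bits, b) {
  const digits = bits.replace(/[\s.]/g, ""); // accept "1011.1111" or "00000010 . 10000000"
  return parseInt(digits, 2) / 2 ** b;
}

console.log(decodeUnsigned("1011.1111", 4));        // 11.9375
console.log(decodeUnsigned("0000001010000000", 8)); // 2.5
```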

### Signed fixed-point numbers

The real value of an N-bit signed (two's complement) binary number in A(a,b) format is given by:

$x=(1/2^b)\Big[-2^{N-1}x_{N-1}+\sum_{n=0}^{N-2}2^nx_n\Big]$

• 0000000100000000 = 00000001 . 00000000 = 1d
• 1000000100000000 = 10000001 . 00000000 = -128 + 1 = -127d
• 0000001000000000 = 00000010 . 00000000 = 2d
• 1000001000000000 = 10000010 . 00000000 = -128 + 2 = -126d
• 0000001010000000 = 00000010 . 10000000 = 2.5d
• 1000001010000000 = 10000010 . 10000000 = -128 + 2.5 = -125.5d
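The signed formula differs from the unsigned one only in the weight of the most significant bit, which becomes $-2^{N-1}$. A small sketch for 16-bit A(7,8) patterns; the function name `decodeSigned` is an assumption made for illustration:

```javascript
// Decodes an N-bit two's complement fixed-point string with b fractional bits.
function decodeSigned(bits, b) {
  const digits = bits.replace(/[\s.]/g, "");
  const n = digits.length;
  let raw = parseInt(digits, 2);
  // The MSB counts as -2^(N-1) instead of +2^(N-1), a difference of 2^N.
  if (digits[0] === "1") raw -= 2 ** n;
  return raw / 2 ** b;
}

console.log(decodeSigned("1000000100000000", 8)); // -127
console.log(decodeSigned("1000001010000000", 8)); // -125.5
console.log(decodeSigned("0000001010000000", 8)); // 2.5
```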

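### Properties

• Range of an unsigned U(a,b) number: $0 \leqslant x \leqslant 2^a-2^{-b}$
• Range of a signed A(a,b) number: $-2^a \leqslant x \leqslant 2^a-2^{-b}$
• To add two numbers in different formats, align their binary points first and then add
• The sum of two A(a,b) numbers has the form A(a+1,b), and the sum of two U(a,b) numbers has the form U(a+1,b)
• The product of a U(a,b) and a U(c,d) number has the form U(a+c,b+d)
• The product of an A(a,b) and an A(c,d) number has the form A(a+c+1,b+d)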
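The multiplication rule can be checked on raw integers: multiplying the stored integers adds the fractional bit counts. The sketch below multiplies a U(8,8) value by a U(4,4) value; `mulFixed` and `toReal` are hypothetical helpers, and the operands are stored as plain JavaScript numbers for simplicity.

```javascript
// Multiplies two unsigned fixed-point values given as raw integers.
// x has bx fractional bits and y has by; the product has bx + by.
function mulFixed(xRaw, bx, yRaw, by) {
  return { raw: xRaw * yRaw, fracBits: bx + by };
}

// Converts a raw integer with the given number of fractional bits to a real value.
const toReal = (raw, fracBits) => raw / 2 ** fracBits;

const x = 640; // 2.5 in U(8,8): 2.5 * 2^8
const y = 24;  // 1.5 in U(4,4): 1.5 * 2^4
const p = mulFixed(x, 8, y, 4);

console.log(p.raw, p.fracBits);         // 15360 12
console.log(toReal(p.raw, p.fracBits)); // 3.75
```

The result fits U(12,12), as the rule U(a+c,b+d) predicts.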

#### Range

    ---+-----------+-----------+-----------+-----------+------- ... -------+---
       0         2^-8       2*2^-8      3*2^-8      4*2^-8             2^8-2^-8
       |                                                                   |
    smallest representable number                                   largest one


#### Resolution

    ---+-----------+-----------+-----------+-----------+------- ... -------+---
       0         2^-8       2*2^-8      3*2^-8      4*2^-8             2^8-2^-8
                   |<--------->|           |<--------->|
                    resolution              resolution


#### Accuracy

$\mathrm{Accuracy}(F) = \mathrm{Resolution}(F)/2$

    ---+-----------+-----------+-----|-----+-----------+------- ... -------+---
       0         2^-8       2*2^-8   |  3*2^-8      4*2^-8             2^8-2^-8
                                     |
                                     | real value we need to represent
                                      <--->
                          Accuracy is the largest such difference
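To make resolution and accuracy concrete, the sketch below rounds a real value to the nearest U(8,8) value and checks that the error stays within half the resolution. The constant and function names are illustrative assumptions, and round-to-nearest is assumed as the quantization rule.

```javascript
const FRAC_BITS = 8;
const RESOLUTION = 2 ** -FRAC_BITS; // 2^-8, the gap between neighbouring values
const ACCURACY = RESOLUTION / 2;    // 2^-9, the worst-case rounding error

// Rounds a real value to the nearest representable U(8,8) value,
// clamping to the range [0, 2^8 - 2^-8].
function quantize(value) {
  const max = 2 ** 8 - RESOLUTION;
  const q = Math.round(value / RESOLUTION) * RESOLUTION;
  return Math.min(Math.max(q, 0), max);
}

const real = 0.01; // lies between 2*2^-8 and 3*2^-8
const stored = quantize(real);

console.log(stored);                              // 0.01171875
console.log(Math.abs(real - stored) <= ACCURACY); // true
```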


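## Floating-Point Numbers

UC Berkeley hosts a web page on the history of the IEEE floating-point standard [3]; have a look if you are interested. I have not read it yet and will save it for leisure reading when I have time.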

$1.23456789\times10^{-19}=12.3456789\times10^{-20}=0.000 000 000 000 000 000 123 456 789\times10^0$

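### IEEE standard floating-point numbers

#### Format

$x=\pm1.bbbbbb...bbb\times2^{bbb...bb}$

• Each b stands for one bit
• $\pm$ is the sign, usually stored in the first bit: 0 marks a positive number and 1 a negative one
• 1.bbbbbb…bbb is the mantissa
• 2 is the base
• The bbb…bb in the superscript is the exponent
• This is a normalized number, with exactly one digit to the left of the binary point
• Because that leading digit is always 1, the computer does not need to store it; it is an implicit bit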

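#### Bit encoding

Encoding a number takes two steps:

1. Normalize it
2. Then adjust its exponent

$y=+1000.100111_2$

$y=+1.000100111_2\times2^3$

Written entirely in binary, the base 2 becomes 10 and the exponent 3 becomes 11:

$y=+1.000100111\times10^{11}$

1. 0 represents the sign, +
2. 000100111 is the mantissa (the 1 to the left of the binary point is the implicit bit described above and is not stored)
3. The exponent, 11, still has to be stored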

     31  30      23 22                    0
    +---+----------+-----------------------+
    | s | exponent |       mantissa        |
    | 1 |    8     |          23           |
    +---+----------+-----------------------+

• The MSB is the sign bit and takes 1 bit
• The next 8 bits store the exponent
• The remaining 23 bits store the mantissa

y = 0 bbbbbbbb 00010011100000000000000

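| actual exponent | stored exponent | note |
|---|---|---|
| -126 | 0 | special case #1 |
| -126 | 1 | |
| -125 | 2 | |
| -124 | 3 | |
| -123 | 4 | |
| ... | ... | |
| -1 | 126 | |
| 0 | 127 | |
| 1 | 128 | |
| 2 | 129 | |
| 3 | 130 | |
| ... | ... | |
| 127 | 254 | |
| 128 | 255 | special case #2 |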

y = 0 10000010 00010011100000000000000
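The encoding can be reproduced with a `DataView`, which writes a number as an IEEE 754 single-precision value and lets the raw bits be read back. This is a sketch for checking the worked example; the function name `float32Fields` is an assumption, and the value 8.609375 is the decimal form of the binary 1000.100111.

```javascript
// Returns the sign, exponent and mantissa fields of a number
// stored as an IEEE 754 single-precision value.
function float32Fields(value) {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, value); // DataView is big-endian by default
  const bits = view.getUint32(0).toString(2).padStart(32, "0");
  return {
    sign: bits.slice(0, 1),
    exponent: bits.slice(1, 9),
    mantissa: bits.slice(9),
  };
}

const y = 8.609375; // 1000.100111 in binary
const f = float32Fields(y);

console.log(`${f.sign} ${f.exponent} ${f.mantissa}`);
// 0 10000010 00010011100000000000000
console.log(parseInt(f.exponent, 2) - 127); // 3, the actual exponent
```

The stored exponent 10000010 is 130, that is, the actual exponent 3 plus the bias 127.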

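#### Special cases

0.0 = 0 00000000 00000000000000000000000

denormalized example = 0 00000000 00100000000000000000000

• The exponent field is 0, which corresponds to an actual exponent of -126
• The mantissa no longer includes the implicit 1; it becomes 0.001000…00, which is 0.125 in decimal
• The real value of this binary floating-point number is therefore $0.125\times2^{-126}\approx1.4693679\times10^{-39}$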

+∞ = 0 11111111 00000000000000000000000

-∞ = 1 11111111 00000000000000000000000
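The special patterns can be checked by writing the raw bits into a buffer and reading them back as a float. In this sketch `fromBits` and `pad` are hypothetical helpers; the last pattern (exponent all ones, non-zero mantissa) is added as an assumption from the IEEE 754 encoding of NaN.

```javascript
// Interprets a "sign exponent mantissa" bit string as a single-precision float.
function fromBits(pattern) {
  const bits = pattern.replace(/\s/g, "");
  const view = new DataView(new ArrayBuffer(4));
  view.setUint32(0, parseInt(bits, 2));
  return view.getFloat32(0);
}

// Pads a mantissa prefix with zeros to the full 23 bits.
const pad = (prefix) => prefix.padEnd(23, "0");

console.log(fromBits("0 00000000 " + pad("")));                 // 0
console.log(fromBits("0 00000000 " + pad("001")) === 2 ** -129); // true: 0.125 * 2^-126
console.log(fromBits("0 11111111 " + pad("")));                 // Infinity
console.log(fromBits("1 11111111 " + pad("")));                 // -Infinity
console.log(Number.isNaN(fromBits("0 11111111 " + pad("1")))); // true
```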


    import java.util.Arrays;

    import static java.lang.Double.NaN;
    import static java.lang.Double.NEGATIVE_INFINITY;
    import static java.lang.Double.POSITIVE_INFINITY;

    public class NaNDemo {
        public static void main(String[] args) {
            // Every expression below evaluates to NaN.
            double[] allNaNs = {
                0D / 0D,
                POSITIVE_INFINITY / POSITIVE_INFINITY,
                POSITIVE_INFINITY / NEGATIVE_INFINITY,
                NEGATIVE_INFINITY / POSITIVE_INFINITY,
                NEGATIVE_INFINITY / NEGATIVE_INFINITY,
                0 * POSITIVE_INFINITY,
                0 * NEGATIVE_INFINITY,
                Math.pow(1, POSITIVE_INFINITY),
                POSITIVE_INFINITY + NEGATIVE_INFINITY,
                NEGATIVE_INFINITY + POSITIVE_INFINITY,
                POSITIVE_INFINITY - POSITIVE_INFINITY,
                NEGATIVE_INFINITY - NEGATIVE_INFINITY,
                Math.sqrt(-1),
                Math.log(-1),
                Math.asin(-2),
                Math.acos(+2),
            };
            System.out.println(Arrays.toString(allNaNs));
            // prints "[NaN, NaN...]"
            // NaN compares unequal to every value, including itself.
            System.out.println(NaN == NaN); // prints "false"
            System.out.println(Double.isNaN(NaN)); // prints "true"
        }
    }


### Floating-point range

| | denormalized | normalized | approximate decimal |
|---|---|---|---|
| single precision | $\pm2^{-149}$ to $(1-2^{-23})\times2^{-126}$ | $\pm2^{-126}$ to $(2-2^{-23})\times2^{127}$ | $\pm\sim10^{-44.85}$ to $\sim10^{38.53}$ |
| double precision | $\pm2^{-1074}$ to $(1-2^{-52})\times2^{-1022}$ | $\pm2^{-1022}$ to $(2-2^{-52})\times2^{1023}$ | $\pm\sim10^{-323.3}$ to $\sim10^{308.3}$ |

| | binary | approximate decimal |
|---|---|---|
| single precision | $\pm(2-2^{-23})\times2^{127}$ | $\sim\pm10^{38.53}$ |
| double precision | $\pm(2-2^{-52})\times2^{1023}$ | $\sim\pm10^{308.25}$ |
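The single-precision limits can be verified numerically. `Math.fround` rounds a double to the nearest single-precision value, so a value that single precision can represent exactly comes back unchanged. The variable names below are illustrative.

```javascript
const minDenormal = 2 ** -149;
const maxDenormal = (1 - 2 ** -23) * 2 ** -126;
const minNormal = 2 ** -126;
const maxNormal = (2 - 2 ** -23) * 2 ** 127;

// Each limit is exactly representable in single precision.
for (const v of [minDenormal, maxDenormal, minNormal, maxNormal]) {
  console.log(v, Math.fround(v) === v);
}

console.log(Math.log10(minDenormal).toFixed(2)); // -44.85
console.log(Math.log10(maxNormal).toFixed(2));   // 38.53

// Stepping outside the range underflows to 0 or overflows to Infinity.
console.log(Math.fround(minDenormal / 2)); // 0
console.log(Math.fround(maxNormal * 2));   // Infinity
```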

#### Spacing

| real value | byte integer | stored [sign exp mantissa] | floating point |
|---|---|---|---|
| -inf | 240 | 1 111 0000 | -inf |
| -15.5 | 239 | 1 110 1111 | -1.9375 * 2^3 |
| -15.0 | 238 | 1 110 1110 | -1.875 * 2^3 |
| -14.5 | 237 | 1 110 1101 | -1.8125 * 2^3 |
| -14.0 | 236 | 1 110 1100 | -1.75 * 2^3 |
| -13.5 | 235 | 1 110 1011 | -1.6875 * 2^3 |
| -13.0 | 234 | 1 110 1010 | -1.625 * 2^3 |
| -12.5 | 233 | 1 110 1001 | -1.5625 * 2^3 |
| -12.0 | 232 | 1 110 1000 | -1.5 * 2^3 |
| -11.5 | 231 | 1 110 0111 | -1.4375 * 2^3 |
| -11.0 | 230 | 1 110 0110 | -1.375 * 2^3 |
| -10.5 | 229 | 1 110 0101 | -1.3125 * 2^3 |
| -10.0 | 228 | 1 110 0100 | -1.25 * 2^3 |
| -9.5 | 227 | 1 110 0011 | -1.1875 * 2^3 |
| -9.0 | 226 | 1 110 0010 | -1.125 * 2^3 |
| -8.5 | 225 | 1 110 0001 | -1.0625 * 2^3 |
| -8.0 | 224 | 1 110 0000 | -1.0 * 2^3 |
| -7.75 | 223 | 1 101 1111 | -1.9375 * 2^2 |
| -7.5 | 222 | 1 101 1110 | -1.875 * 2^2 |
| -7.25 | 221 | 1 101 1101 | -1.8125 * 2^2 |
| -7.0 | 220 | 1 101 1100 | -1.75 * 2^2 |
| -6.75 | 219 | 1 101 1011 | -1.6875 * 2^2 |
| -6.5 | 218 | 1 101 1010 | -1.625 * 2^2 |
| -6.25 | 217 | 1 101 1001 | -1.5625 * 2^2 |
| -6.0 | 216 | 1 101 1000 | -1.5 * 2^2 |
| -5.75 | 215 | 1 101 0111 | -1.4375 * 2^2 |
| -5.5 | 214 | 1 101 0110 | -1.375 * 2^2 |
| -5.25 | 213 | 1 101 0101 | -1.3125 * 2^2 |
| -5.0 | 212 | 1 101 0100 | -1.25 * 2^2 |
| -4.75 | 211 | 1 101 0011 | -1.1875 * 2^2 |
| -4.5 | 210 | 1 101 0010 | -1.125 * 2^2 |
| -4.25 | 209 | 1 101 0001 | -1.0625 * 2^2 |
| -4.0 | 208 | 1 101 0000 | -1.0 * 2^2 |
| -3.875 | 207 | 1 100 1111 | -1.9375 * 2^1 |
| -3.75 | 206 | 1 100 1110 | -1.875 * 2^1 |
| -3.625 | 205 | 1 100 1101 | -1.8125 * 2^1 |
| -3.5 | 204 | 1 100 1100 | -1.75 * 2^1 |
| -3.375 | 203 | 1 100 1011 | -1.6875 * 2^1 |
| -3.25 | 202 | 1 100 1010 | -1.625 * 2^1 |
| -3.125 | 201 | 1 100 1001 | -1.5625 * 2^1 |
| -3.0 | 200 | 1 100 1000 | -1.5 * 2^1 |
| -2.875 | 199 | 1 100 0111 | -1.4375 * 2^1 |
| -2.75 | 198 | 1 100 0110 | -1.375 * 2^1 |
| -2.625 | 197 | 1 100 0101 | -1.3125 * 2^1 |
| -2.5 | 196 | 1 100 0100 | -1.25 * 2^1 |
| -2.375 | 195 | 1 100 0011 | -1.1875 * 2^1 |
| -2.25 | 194 | 1 100 0010 | -1.125 * 2^1 |
| -2.125 | 193 | 1 100 0001 | -1.0625 * 2^1 |
| -2.0 | 192 | 1 100 0000 | -1.0 * 2^1 |
| -1.9375 | 191 | 1 011 1111 | -1.9375 * 2^0 |
| -1.875 | 190 | 1 011 1110 | -1.875 * 2^0 |
| -1.8125 | 189 | 1 011 1101 | -1.8125 * 2^0 |
| -1.75 | 188 | 1 011 1100 | -1.75 * 2^0 |
| -1.6875 | 187 | 1 011 1011 | -1.6875 * 2^0 |
| -1.625 | 186 | 1 011 1010 | -1.625 * 2^0 |
| -1.5625 | 185 | 1 011 1001 | -1.5625 * 2^0 |
| -1.5 | 184 | 1 011 1000 | -1.5 * 2^0 |
| -1.4375 | 183 | 1 011 0111 | -1.4375 * 2^0 |
| -1.375 | 182 | 1 011 0110 | -1.375 * 2^0 |
| -1.3125 | 181 | 1 011 0101 | -1.3125 * 2^0 |
| -1.25 | 180 | 1 011 0100 | -1.25 * 2^0 |
| -1.1875 | 179 | 1 011 0011 | -1.1875 * 2^0 |
| -1.125 | 178 | 1 011 0010 | -1.125 * 2^0 |
| -1.0625 | 177 | 1 011 0001 | -1.0625 * 2^0 |
| -1.0 | 176 | 1 011 0000 | -1.0 * 2^0 |
| -0.96875 | 175 | 1 010 1111 | -1.9375 * 2^-1 |
| -0.9375 | 174 | 1 010 1110 | -1.875 * 2^-1 |
| -0.90625 | 173 | 1 010 1101 | -1.8125 * 2^-1 |
| -0.875 | 172 | 1 010 1100 | -1.75 * 2^-1 |
| -0.84375 | 171 | 1 010 1011 | -1.6875 * 2^-1 |
| -0.8125 | 170 | 1 010 1010 | -1.625 * 2^-1 |
| -0.78125 | 169 | 1 010 1001 | -1.5625 * 2^-1 |
| -0.75 | 168 | 1 010 1000 | -1.5 * 2^-1 |
| -0.71875 | 167 | 1 010 0111 | -1.4375 * 2^-1 |
| -0.6875 | 166 | 1 010 0110 | -1.375 * 2^-1 |
| -0.65625 | 165 | 1 010 0101 | -1.3125 * 2^-1 |
| -0.625 | 164 | 1 010 0100 | -1.25 * 2^-1 |
| -0.59375 | 163 | 1 010 0011 | -1.1875 * 2^-1 |
| -0.5625 | 162 | 1 010 0010 | -1.125 * 2^-1 |
| -0.53125 | 161 | 1 010 0001 | -1.0625 * 2^-1 |
| -0.5 | 160 | 1 010 0000 | -1.0 * 2^-1 |
| -0.484375 | 159 | 1 001 1111 | -1.9375 * 2^-2 |
| -0.46875 | 158 | 1 001 1110 | -1.875 * 2^-2 |
| -0.453125 | 157 | 1 001 1101 | -1.8125 * 2^-2 |
| -0.4375 | 156 | 1 001 1100 | -1.75 * 2^-2 |
| -0.421875 | 155 | 1 001 1011 | -1.6875 * 2^-2 |
| -0.40625 | 154 | 1 001 1010 | -1.625 * 2^-2 |
| -0.390625 | 153 | 1 001 1001 | -1.5625 * 2^-2 |
| -0.375 | 152 | 1 001 1000 | -1.5 * 2^-2 |
| -0.359375 | 151 | 1 001 0111 | -1.4375 * 2^-2 |
| -0.34375 | 150 | 1 001 0110 | -1.375 * 2^-2 |
| -0.328125 | 149 | 1 001 0101 | -1.3125 * 2^-2 |
| -0.3125 | 148 | 1 001 0100 | -1.25 * 2^-2 |
| -0.296875 | 147 | 1 001 0011 | -1.1875 * 2^-2 |
| -0.28125 | 146 | 1 001 0010 | -1.125 * 2^-2 |
| -0.265625 | 145 | 1 001 0001 | -1.0625 * 2^-2 |
| -0.25 | 144 | 1 001 0000 | -1.0 * 2^-2 |
| -0.1171875 | 143 | 1 000 1111 | -0.9375 * 2^-3 |
| -0.109375 | 142 | 1 000 1110 | -0.875 * 2^-3 |
| -0.1015625 | 141 | 1 000 1101 | -0.8125 * 2^-3 |
| -0.09375 | 140 | 1 000 1100 | -0.75 * 2^-3 |
| -0.0859375 | 139 | 1 000 1011 | -0.6875 * 2^-3 |
| -0.078125 | 138 | 1 000 1010 | -0.625 * 2^-3 |
| -0.0703125 | 137 | 1 000 1001 | -0.5625 * 2^-3 |
| -0.0625 | 136 | 1 000 1000 | -0.5 * 2^-3 |
| -0.0546875 | 135 | 1 000 0111 | -0.4375 * 2^-3 |
| -0.046875 | 134 | 1 000 0110 | -0.375 * 2^-3 |
| -0.0390625 | 133 | 1 000 0101 | -0.3125 * 2^-3 |
| -0.03125 | 132 | 1 000 0100 | -0.25 * 2^-3 |
| -0.0234375 | 131 | 1 000 0011 | -0.1875 * 2^-3 |
| -0.015625 | 130 | 1 000 0010 | -0.125 * 2^-3 |
| -0.0078125 | 129 | 1 000 0001 | -0.0625 * 2^-3 |
| 0.0 | 0 | 0 000 0000 | 0.0 |
| 0.0078125 | 1 | 0 000 0001 | +0.0625 * 2^-3 |
| 0.015625 | 2 | 0 000 0010 | +0.125 * 2^-3 |
| 0.0234375 | 3 | 0 000 0011 | +0.1875 * 2^-3 |
| 0.03125 | 4 | 0 000 0100 | +0.25 * 2^-3 |
| 0.0390625 | 5 | 0 000 0101 | +0.3125 * 2^-3 |
| 0.046875 | 6 | 0 000 0110 | +0.375 * 2^-3 |
| 0.0546875 | 7 | 0 000 0111 | +0.4375 * 2^-3 |
| 0.0625 | 8 | 0 000 1000 | +0.5 * 2^-3 |
| 0.0703125 | 9 | 0 000 1001 | +0.5625 * 2^-3 |
| 0.078125 | 10 | 0 000 1010 | +0.625 * 2^-3 |
| 0.0859375 | 11 | 0 000 1011 | +0.6875 * 2^-3 |
| 0.09375 | 12 | 0 000 1100 | +0.75 * 2^-3 |
| 0.1015625 | 13 | 0 000 1101 | +0.8125 * 2^-3 |
| 0.109375 | 14 | 0 000 1110 | +0.875 * 2^-3 |
| 0.1171875 | 15 | 0 000 1111 | +0.9375 * 2^-3 |
| 0.25 | 16 | 0 001 0000 | +1.0 * 2^-2 |
| 0.265625 | 17 | 0 001 0001 | +1.0625 * 2^-2 |
| 0.28125 | 18 | 0 001 0010 | +1.125 * 2^-2 |
| 0.296875 | 19 | 0 001 0011 | +1.1875 * 2^-2 |
| 0.3125 | 20 | 0 001 0100 | +1.25 * 2^-2 |
| 0.328125 | 21 | 0 001 0101 | +1.3125 * 2^-2 |
| 0.34375 | 22 | 0 001 0110 | +1.375 * 2^-2 |
| 0.359375 | 23 | 0 001 0111 | +1.4375 * 2^-2 |
| 0.375 | 24 | 0 001 1000 | +1.5 * 2^-2 |
| 0.390625 | 25 | 0 001 1001 | +1.5625 * 2^-2 |
| 0.40625 | 26 | 0 001 1010 | +1.625 * 2^-2 |
| 0.421875 | 27 | 0 001 1011 | +1.6875 * 2^-2 |
| 0.4375 | 28 | 0 001 1100 | +1.75 * 2^-2 |
| 0.453125 | 29 | 0 001 1101 | +1.8125 * 2^-2 |
| 0.46875 | 30 | 0 001 1110 | +1.875 * 2^-2 |
| 0.484375 | 31 | 0 001 1111 | +1.9375 * 2^-2 |
| 0.5 | 32 | 0 010 0000 | +1.0 * 2^-1 |
| 0.53125 | 33 | 0 010 0001 | +1.0625 * 2^-1 |
| 0.5625 | 34 | 0 010 0010 | +1.125 * 2^-1 |
| 0.59375 | 35 | 0 010 0011 | +1.1875 * 2^-1 |
| 0.625 | 36 | 0 010 0100 | +1.25 * 2^-1 |
| 0.65625 | 37 | 0 010 0101 | +1.3125 * 2^-1 |
| 0.6875 | 38 | 0 010 0110 | +1.375 * 2^-1 |
| 0.71875 | 39 | 0 010 0111 | +1.4375 * 2^-1 |
| 0.75 | 40 | 0 010 1000 | +1.5 * 2^-1 |
| 0.78125 | 41 | 0 010 1001 | +1.5625 * 2^-1 |
| 0.8125 | 42 | 0 010 1010 | +1.625 * 2^-1 |
| 0.84375 | 43 | 0 010 1011 | +1.6875 * 2^-1 |
| 0.875 | 44 | 0 010 1100 | +1.75 * 2^-1 |
| 0.90625 | 45 | 0 010 1101 | +1.8125 * 2^-1 |
| 0.9375 | 46 | 0 010 1110 | +1.875 * 2^-1 |
| 0.96875 | 47 | 0 010 1111 | +1.9375 * 2^-1 |
| 1.0 | 48 | 0 011 0000 | +1.0 * 2^0 |
| 1.0625 | 49 | 0 011 0001 | +1.0625 * 2^0 |
| 1.125 | 50 | 0 011 0010 | +1.125 * 2^0 |
| 1.1875 | 51 | 0 011 0011 | +1.1875 * 2^0 |
| 1.25 | 52 | 0 011 0100 | +1.25 * 2^0 |
| 1.3125 | 53 | 0 011 0101 | +1.3125 * 2^0 |
| 1.375 | 54 | 0 011 0110 | +1.375 * 2^0 |
| 1.4375 | 55 | 0 011 0111 | +1.4375 * 2^0 |
| 1.5 | 56 | 0 011 1000 | +1.5 * 2^0 |
| 1.5625 | 57 | 0 011 1001 | +1.5625 * 2^0 |
| 1.625 | 58 | 0 011 1010 | +1.625 * 2^0 |
| 1.6875 | 59 | 0 011 1011 | +1.6875 * 2^0 |
| 1.75 | 60 | 0 011 1100 | +1.75 * 2^0 |
| 1.8125 | 61 | 0 011 1101 | +1.8125 * 2^0 |
| 1.875 | 62 | 0 011 1110 | +1.875 * 2^0 |
| 1.9375 | 63 | 0 011 1111 | +1.9375 * 2^0 |
| 2.0 | 64 | 0 100 0000 | +1.0 * 2^1 |
| 2.125 | 65 | 0 100 0001 | +1.0625 * 2^1 |
| 2.25 | 66 | 0 100 0010 | +1.125 * 2^1 |
| 2.375 | 67 | 0 100 0011 | +1.1875 * 2^1 |
| 2.5 | 68 | 0 100 0100 | +1.25 * 2^1 |
| 2.625 | 69 | 0 100 0101 | +1.3125 * 2^1 |
| 2.75 | 70 | 0 100 0110 | +1.375 * 2^1 |
| 2.875 | 71 | 0 100 0111 | +1.4375 * 2^1 |
| 3.0 | 72 | 0 100 1000 | +1.5 * 2^1 |
| 3.125 | 73 | 0 100 1001 | +1.5625 * 2^1 |
| 3.25 | 74 | 0 100 1010 | +1.625 * 2^1 |
| 3.375 | 75 | 0 100 1011 | +1.6875 * 2^1 |
| 3.5 | 76 | 0 100 1100 | +1.75 * 2^1 |
| 3.625 | 77 | 0 100 1101 | +1.8125 * 2^1 |
| 3.75 | 78 | 0 100 1110 | +1.875 * 2^1 |
| 3.875 | 79 | 0 100 1111 | +1.9375 * 2^1 |
| 4.0 | 80 | 0 101 0000 | +1.0 * 2^2 |
| 4.25 | 81 | 0 101 0001 | +1.0625 * 2^2 |
| 4.5 | 82 | 0 101 0010 | +1.125 * 2^2 |
| 4.75 | 83 | 0 101 0011 | +1.1875 * 2^2 |
| 5.0 | 84 | 0 101 0100 | +1.25 * 2^2 |
| 5.25 | 85 | 0 101 0101 | +1.3125 * 2^2 |
| 5.5 | 86 | 0 101 0110 | +1.375 * 2^2 |
| 5.75 | 87 | 0 101 0111 | +1.4375 * 2^2 |
| 6.0 | 88 | 0 101 1000 | +1.5 * 2^2 |
| 6.25 | 89 | 0 101 1001 | +1.5625 * 2^2 |
| 6.5 | 90 | 0 101 1010 | +1.625 * 2^2 |
| 6.75 | 91 | 0 101 1011 | +1.6875 * 2^2 |
| 7.0 | 92 | 0 101 1100 | +1.75 * 2^2 |
| 7.25 | 93 | 0 101 1101 | +1.8125 * 2^2 |
| 7.5 | 94 | 0 101 1110 | +1.875 * 2^2 |
| 7.75 | 95 | 0 101 1111 | +1.9375 * 2^2 |
| 8.0 | 96 | 0 110 0000 | +1.0 * 2^3 |
| 8.5 | 97 | 0 110 0001 | +1.0625 * 2^3 |
| 9.0 | 98 | 0 110 0010 | +1.125 * 2^3 |
| 9.5 | 99 | 0 110 0011 | +1.1875 * 2^3 |
| 10.0 | 100 | 0 110 0100 | +1.25 * 2^3 |
| 10.5 | 101 | 0 110 0101 | +1.3125 * 2^3 |
| 11.0 | 102 | 0 110 0110 | +1.375 * 2^3 |
| 11.5 | 103 | 0 110 0111 | +1.4375 * 2^3 |
| 12.0 | 104 | 0 110 1000 | +1.5 * 2^3 |
| 12.5 | 105 | 0 110 1001 | +1.5625 * 2^3 |
| 13.0 | 106 | 0 110 1010 | +1.625 * 2^3 |
| 13.5 | 107 | 0 110 1011 | +1.6875 * 2^3 |
| 14.0 | 108 | 0 110 1100 | +1.75 * 2^3 |
| 14.5 | 109 | 0 110 1101 | +1.8125 * 2^3 |
| 15.0 | 110 | 0 110 1110 | +1.875 * 2^3 |
| 15.5 | 111 | 0 110 1111 | +1.9375 * 2^3 |
| inf | 112 | 0 111 0000 | +inf |
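The rows above follow a simple rule that can be written as a decoder for the 8-bit toy format (1 sign bit, 3 exponent bits, 4 mantissa bits). This is a sketch based on the table's own convention: exponent fields 1 to 6 mean $2^{e-3}$, and the rows with exponent field 0 are listed as 0.mmmm × 2^-3. The name `decodeMini` is hypothetical, and returning NaN for bytes the table omits (exponent 111 with a non-zero mantissa) is an assumption borrowed from IEEE 754.

```javascript
// Decodes one byte of the 8-bit toy format used in the table.
function decodeMini(byte) {
  const sign = byte & 0x80 ? -1 : 1;
  const e = (byte >> 4) & 0x7; // 3-bit exponent field
  const m = byte & 0xf;        // 4-bit mantissa field
  if (e === 7) return m === 0 ? sign * Infinity : NaN; // NaN is an assumption
  if (e === 0) return sign * (m / 16) * 2 ** -3;       // denormal rows as listed
  return sign * (1 + m / 16) * 2 ** (e - 3);           // implicit leading 1
}

console.log(decodeMini(239)); // -15.5
console.log(decodeMini(16));  // 0.25
console.log(decodeMini(1));   // 0.0078125

// The gap between neighbours doubles with every exponent step.
console.log(decodeMini(49) - decodeMini(48)); // 0.0625
console.log(decodeMini(97) - decodeMini(96)); // 0.5
```

The last two lines show the point of this section: the spacing is not uniform, it grows with the magnitude of the number.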

#### Bias

5e-08 = 0 01100110 10101101011111110010101

1 = 0 01111111 00000000000000000000000

65536.5 = 0 10001111 00000000000000001000000
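Going the other way, a normalized value can be rebuilt from its three fields by subtracting the bias of 127 from the stored exponent and restoring the implicit 1. A minimal sketch for normalized single-precision patterns only (it does not handle the special cases); `decodeNormalized` is a hypothetical name.

```javascript
// Rebuilds the value of a normalized single-precision number
// from a "sign exponent mantissa" bit string.
function decodeNormalized(pattern) {
  const [s, e, m] = pattern.trim().split(/\s+/);
  const sign = s === "1" ? -1 : 1;
  const exponent = parseInt(e, 2) - 127;           // remove the bias
  const fraction = parseInt(m, 2) / 2 ** m.length; // 0.mmm...m
  return sign * (1 + fraction) * 2 ** exponent;    // restore the implicit 1
}

console.log(decodeNormalized("0 01111111 00000000000000000000000")); // 1
console.log(decodeNormalized("0 10001111 00000000000000001000000")); // 65536.5
```

For 65536.5 the stored exponent 10001111 is 143, so the actual exponent is 143 - 127 = 16.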

## 林宏

Frank Lin

