布隆过滤器的优缺点概述 优点:
缺点:
布隆过滤器实现 google.guava工具类实现 demo理解
引入Guava pom配置
1 2 3 4 5 <dependency > <groupId > com.google.guava</groupId > <artifactId > guava</artifactId > <version > 29.0-jre</version > </dependency >
代码实现(demo版)
向布隆过滤器中插入100万个数据,使用bloomFilter.mightContain方法判断再添加数据会出现下标重合的误判。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 import com.google.common.hash.BloomFilter;import com.google.common.hash.Funnels;public class BloomFilterCase { private static int size = 1000000 ; private static double fpp = 0.01 ; private static BloomFilter<Integer> bloomFilter = BloomFilter.create(Funnels.integerFunnel(), size, fpp); public static void main (String[] args) { for (int i = 0 ; i < size; i++) { bloomFilter.put(i); } int count = 0 ; for (int i = size; i < size + 100000 ; i++) { if (bloomFilter.mightContain(i)) { count++; System.out.println(i + "误判了" ); } } System.out.println("总共的误判数:" + count); } }
运行结果
10万数据里有947个误判,约等于0.01%,也就是我们代码里设置的误判率:fpp = 0.01。
核心源码分析(guava 29.0-jre 版本)
创建布隆过滤器 :
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 public static <T> BloomFilter<T> create (Funnel<? super T> funnel, int expectedInsertions, double fpp) { return create(funnel, (long )expectedInsertions, fpp); } public static <T> BloomFilter<T> create (Funnel<? super T> funnel, long expectedInsertions, double fpp) { return create(funnel, expectedInsertions, fpp, BloomFilterStrategies.MURMUR128_MITZ_64); } @VisibleForTesting static <T> BloomFilter<T> create (Funnel<? super T> funnel, long expectedInsertions, double fpp, BloomFilter.Strategy strategy) { Preconditions.checkNotNull(funnel); Preconditions.checkArgument(expectedInsertions >= 0L , "Expected insertions (%s) must be >= 0" , expectedInsertions); Preconditions.checkArgument(fpp > 0.0 D, "False positive probability (%s) must be > 0.0" , fpp); Preconditions.checkArgument(fpp < 1.0 D, "False positive probability (%s) must be < 1.0" , fpp); Preconditions.checkNotNull(strategy); if (expectedInsertions == 0L ) { expectedInsertions = 1L ; } long numBits = optimalNumOfBits(expectedInsertions, fpp); int numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, numBits); try { return new BloomFilter(new LockFreeBitArray(numBits), numHashFunctions, funnel, strategy); } catch (IllegalArgumentException var10) { throw new IllegalArgumentException("Could not create BloomFilter of " + numBits + " bits" , var10); } }
所需参数:
funnel
:数据类型(一般是调用Funnels工具类中的)
expectedInsertions
:期望插入的值的个数
fpp
:误判率(默认值为0.03)
strategy
:哈希算法
参数fpp误判率与所占内存
情景一:fpp = 0.01
情景二:fpp=0.03
情景总结
误判率可以通过fpp
参数进行调节
fpp越小,需要的内存空间就越大:0.01需要900多万位数,0.03需要700多万位数。
fpp越小,集合添加数据时,就需要更多的hash函数运算更多的hash值,去存储到对应的数组下标里。(忘了去看上面的布隆过滤存入数据的过程)
上面的numBits
,表示存一百万个int类型数字,需要的位数为7298440,700多万位。理论上存一百万个数,一个int是4字节32位,需要481000000=3200万位。如果使用HashMap去存,按HashMap50%的存储效率,需要6400万位。可以看出BloomFilter的存储空间很小,只有HashMap的1/10左右
上面的numHashFunctions
表示需要几个hash函数运算,去映射不同的下标存这些数字是否存在(0 or 1)。
布隆过滤器中添加、判断是否误判方法:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 enum BloomFilterStrategies implements Strategy {MURMUR128_MITZ_64 { public <T> boolean put (T object, Funnel<? super T> funnel, int numHashFunctions, BloomFilterStrategies.LockFreeBitArray bits) { long bitSize = bits.bitSize(); byte [] bytes = Hashing.murmur3_128().hashObject(object, funnel).getBytesInternal(); long hash1 = this .lowerEight(bytes); long hash2 = this .upperEight(bytes); boolean bitsChanged = false ; long combinedHash = hash1; for (int i = 0 ; i < numHashFunctions; ++i) { bitsChanged |= bits.set((combinedHash & 9223372036854775807L ) % bitSize); combinedHash += hash2; } return bitsChanged; } public <T> boolean mightContain (T object, Funnel<? super T> funnel, int numHashFunctions, BloomFilterStrategies.LockFreeBitArray bits) { long bitSize = bits.bitSize(); byte [] bytes = Hashing.murmur3_128().hashObject(object, funnel).getBytesInternal(); long hash1 = this .lowerEight(bytes); long hash2 = this .upperEight(bytes); long combinedHash = hash1; for (int i = 0 ; i < numHashFunctions; ++i) { if (!bits.get((combinedHash & 9223372036854775807L ) % bitSize)) { return false ; } combinedHash += hash2; } return true ; } .........
Redis实现 上面使用Guava实现的布隆过滤器是把数据放在了本地内存中。分布式的场景中就不合适了,无法共享内存。 我们还可以用Redis来实现布隆过滤器,这里使用Redis封装好的客户端工具Redisson。 其底层是使用数据结构bitMap
代码实现
导入配置
1 2 3 4 5 <dependency > <groupId > org.redisson</groupId > <artifactId > redisson-spring-boot-starter</artifactId > <version > 3.13.4</version > </dependency >
java代码
guava工具类使用mightContain()判断是否存在,可在访问要求查缓存前查询缓存是否存在,避免缓存穿透。1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 public class RedissonBloomFilter { public static void main (String[] args) { Config config = new Config(); config.useSingleServer().setAddress("redis://127.0.0.1:6379" ); config.useSingleServer().setPassword("1234" ); RedissonClient redisson = Redisson.create(config); RBloomFilter<String> bloomFilter = redisson.getBloomFilter("phoneList" ); bloomFilter.tryInit(100000000L ,0.03 ); bloomFilter.add("10086" ); System.out.println(bloomFilter.contains("123456" )); System.out.println(bloomFilter.contains("10086" )); } }
This is copyright.