Remove Duplication | Richrad' Tech

常规元素遍历
在实际工作业务场景中，对于List集合去重是不可避免的一件事情...

// 遍历后判断赋给另一个List集合，保持原来顺序
    public static void ridRepeat1(List<String> list) {
        System.out.println("list = [" + list + "]");
        List<String> listNew = new ArrayList<String>();
        for (String str : list) {
            if (!listNew.contains(str)) {
                listNew.add(str);
            }
        }
        System.out.println("listNew = [" + listNew + "]");
    }

Set集合去重

除了遍历去重，我们常常想到利用Set集合不允许重复元素的特点，通过List和Set互转，来去掉重复元素。

	// Set集合去重，保持原来顺序
    public static void ridRepeat2(List<String> list) {
        System.out.println("list = [" + list + "]");
        List<String> listNew = new ArrayList<String>();
        Set set = new HashSet();
        for (String str : list) {
            if (set.add(str)) {
                listNew.add(str);
            }
        }
        System.out.println("listNew = [" + listNew + "]");
    }

    // Set去重     由于Set(HashSet)的无序性，不会保持原来顺序
    public static void ridRepeat3(List<String> list) {
        System.out.println("list = [" + list + "]");
        Set set = new HashSet();
        List<String> listNew = new ArrayList<String>();
        set.addAll(list);
        listNew.addAll(set);
        System.out.println("listNew = [" + listNew + "]");
    }

    // Set通过HashSet去重（将ridRepeat3方法缩减为一行） 无序
    public static void ridRepeat4(List<String> list) {
        System.out.println("list = [" + list + "]");
        List<String> listNew = new ArrayList<String>(new HashSet(list));
        System.out.println("listNew = [" + listNew + "]");
    }

    // Set通过TreeSet去重   会按字典顺序重排序
    public static void ridRepeat5(List<String> list) {
        System.out.println("list = [" + list + "]");
        List<String> listNew = new ArrayList<String>(new TreeSet<String>(list));
        System.out.println("listNew = [" + listNew + "]");
    }

    // Set通过LinkedHashSet去重  保持原来顺序
    public static void ridRepeat6(List<String> list) {
        System.out.println("list = [" + list + "]");
        List<String> listNew = new ArrayList<String>(new LinkedHashSet<String>(list));
        System.out.println("listNew = [" + listNew + "]");
    }

Java8 Stream去重

除此之外，可以利用Java8的stream来实现去重

  // 利用java8的stream去重 
  // 如果集合是存的是自定义对象，对象实现hashCode()和equals()也可实现
  List uniqueList = list.stream().distinct().collect(Collectors.toList());
  System.out.println(uniqueList.toString());

上面的方法在List元素为基本数据类型及String类型时是可以的，但是如果List集合元素为对象，却不会奏效

public class ObjectRidRepeat {
    
    public static void main(String[] args) {
        List<User> userList = new ArrayList<User>();
        userList.add(new User("小黄",10));
        userList.add(new User("小红",23));
        userList.add(new User("小黄",78));
        userList.add(new User("小黄",10));
        
        //使用HashSet,无序
        Set<User> userSet = new HashSet<User>();
        userSet.addAll(userList);
        System.out.println(userSet);
        
        //使用LinkedHashSet，有序
        List<User> listNew = new ArrayList<User>(new LinkedHashSet(userList));
        System.out.println(listNew.toString());
    }
}

对象去重 Java8

//根据name属性去重
        List<User> unique1 = userList.stream().collect(
                collectingAndThen(
                        toCollection(() -> new TreeSet<>(comparing(User::getName))), ArrayList::new));

        System.out.println(unique1.toString());

        //根据name,age属性去重
        List<User> unique2 = userList.stream().collect(
                collectingAndThen(
                        toCollection(() -> new TreeSet<>(comparing(o -> o.getName() + ";" + o.getAge()))), ArrayList::new)
        );

        System.out.println(unique2.toString());

重写hashCode和equals方法

在User类中重写equals()方法和hashCode()方法：

//重写equals方法
 @Override
    public boolean equals(Object obj) {
        User user = (User) obj;
        return name.equals(user.getName()) && (age==user.getAge());
    }

//重写hashCode方法
    @Override
    public int hashCode() {
        String str = name + age;
        return str.hashCode();
    }

通过最具代表的的String中的equals()方法和hashCode()方法源码来探究两个方法的实现

equals()方法

	/**
     * Compares this string to the specified object.  The result is {@code
     * true} if and only if the argument is not {@code null} and is a {@code
     * String} object that represents the same sequence of characters as this
     * object.
     *
     * @param  anObject
     *         The object to compare this {@code String} against
     *
     * @return  {@code true} if the given object represents a {@code String}
     *          equivalent to this string, {@code false} otherwise
     *
     * @see  #compareTo(String)
     * @see  #equalsIgnoreCase(String)
     */
    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

比较两个对象时，首先先去判断两个对象是否具有相同的地址，如果是同一个对象的引用，则直接放回true；如果地址不一样，则证明不是引用同一个对象，接下来就是挨个去比较两个字符串对象的内容是否一致，完全相等返回true，否则false。

hashCode()方法

	/**
     * Returns a hash code for this string. The hash code for a
     * {@code String} object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using {@code int} arithmetic, where {@code s[i]} is the
     * <i>i</i>th character of the string, {@code n} is the length of
     * the string, and {@code ^} indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

当equals方法被重写时，通常有必要重写 hashCode 方法，以维护 hashCode 方法的常规协定，该协定声明相等对象必须具有相等的哈希码。

根据《Effective Java》第二版的第九条：覆盖equals时总要覆盖hashCode 中的内容，总结如下：

在程序执行期间，只要equals方法的比较操作用到的信息没有被修改，那么对这同一个对象调用多次，hashCode方法必须始终如一地返回同一个整数。
如果两个对象根据equals方法比较是相等的，那么调用两个对象的hashCode方法必须返回相同的整数结果。
如果两个对象根据equals方法比较是不等的，则hashCode方法不一定得返回不同的整数。但是，程序员应该知道，为不相等的对象生成不同整数结果可以提高哈希表的性能。

《Java编程思想》中也有类似总结：

设计hashCode()时最重要的因素就是：无论何时，对同一个对象调用hashCode()都应该产生同样的值。如果在讲一个对象用put()添加进HashMap时产生一个hashCdoe值，而用get()取出时却产生了另一个hashCode值，那么就无法获取该对象了。所以如果你的hashCode方法依赖于对象中易变的数据，用户就要当心了，因为此数据发生变化时，hashCode()方法就会生成一个不同的散列码。