Using the Solr Search Service | 指尖上的记忆

Using the Solr 9.6 search service:
Download solr-9.6.0.tgz from the official website.
1>Extract the archive
tar zxvf ./solr-9.6.0.tgz
  
2>Run the bundled example project
k8s@HPDEV-31:/usr/local/solr-9.6.0$ sudo bin/solr start -e techproducts -p 8985 -force

-force is needed here because the command runs as root (via sudo), and without it Solr refuses to start:
WARNING: Starting Solr as the root user is a security risk and not considered best practice. Exiting.
         Please consult the Reference Guide. To override this check, start with argument '-force'

ERROR: Failed to start Solr using command: "bin/solr" start -p 8985 -s "example/techproducts/solr" Exception : org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
  
3>Stop the previous instance
k8s@HPDEV-31:/usr/local/solr-9.6.0$ sudo bin/solr stop -p 8985 -force
  
4>Start it again
k8s@HPDEV-31:/usr/local/solr-9.6.0$ sudo bin/solr start -p 8985 -force
*** [WARN] *** Your open file limit is currently 1024.  
 It should be set to 65000 to avoid operational disruption. 
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
Java 17 detected. Enabled workaround for SOLR-16463
Waiting up to 180 seconds to see Solr running on port 8985 [|]  
Started Solr server on port 8985 (pid=35159). Happy searching!
  
5>Create a core (Solr must already be running)
k8s@HPDEV-31:/usr/local/solr-9.6.0$ sudo bin/solr create -c conferences_core -p 8985 -force
WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
         To turn off: bin/solr config -c conferences_core -p 8985 -action set-user-property -property update.autoCreateFields -value false

Created new core 'conferences_core'

By default the core is created at:
/usr/local/solr-9.6.0/server/solr/conferences_core


Reference: https://solr.apache.org/guide/solr/latest/deployment-guide/installing-solr.html
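
To confirm from a script that the core exists, or to reload it after a config change, the CoreAdmin API can be called over HTTP. The following is a minimal sketch that only builds the URLs; the host name localhost is an assumption, while the port and core name are the ones used in the steps above.

```python
from urllib.parse import urlencode

# Assumed base URL: localhost is a guess, port 8985 comes from the steps above.
SOLR_BASE = "http://localhost:8985/solr"


def core_admin_url(action: str, core: str) -> str:
    """Build a CoreAdmin API URL (for example STATUS or RELOAD) for one core."""
    params = {"action": action, "core": core, "wt": "json"}
    return f"{SOLR_BASE}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    # Open these with a browser, curl, or urllib.request.urlopen.
    print(core_admin_url("STATUS", "conferences_core"))
    print(core_admin_url("RELOAD", "conferences_core"))
```

The RELOAD URL is a scripted way to trigger the same core reload that can also be done from the Admin UI.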
  
6>Edit the core's schema and add fields
k8s@HPDEV-31:/usr/local/solr-9.6.0/server/solr/conferences_core/conf$ sudo vim managed-schema.xml

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<!-- docValues are enabled by default for long type so we don't need to index the version field  -->
<field name="age" type="pint" indexed="true" stored="true" />
<field name="description" type="text_ik" indexed="true" stored="true"  />
<field name="create_time" type="pdate" indexed="true" stored="true"  />
<field name="update_time" type="pdate" indexed="true" stored="true"  />

Note on the description field: the first time around I had not installed the IK analyzer, and the text_ik type does not seem to exist by default; it became available once IK was installed. So next time, install IK first and import the data afterwards. If this definition is changed later, remember to reload the core.


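After that I reloaded the core and kept getting errors. Renaming create_time to createTime and update_time to updateTime made them go away (XML config files have a lot of pitfalls...).
The official site also documents a Schema API: https://solr.apache.org/guide/solr/latest/indexing-guide/schema-api.html
I added a new field to the schema through the API. The first time I did not set indexed="true", and as a result documents with a description could not be inserted; Solr kept reporting: from index options=docs to inconsistent index options=docs_and_freqs_and_positions. After adding indexed="true" the problem went away. In my experience this attribute has to be given explicitly, even when the value is indexed="false". I also noticed that after a new field is added, documents indexed earlier do not show it in query results; only documents added afterwards that actually carry the field will include it.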
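
The Schema API call described above can be sketched in Python as follows. This is an illustration under assumptions, not the exact request I sent: the host localhost and the helper name add_field_request are made up for the example, and the field shown (createTime, type pdate) is taken from the schema in this step. Note that indexed is always sent explicitly.

```python
import json
from urllib.request import Request

# Assumed endpoint: localhost is a guess; port and core name come from the steps above.
SCHEMA_URL = "http://localhost:8985/solr/conferences_core/schema"


def add_field_request(name: str, field_type: str,
                      indexed: bool = True, stored: bool = True) -> Request:
    """Build (but do not send) a Schema API 'add-field' POST request."""
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": indexed,  # always stated explicitly, see the note above
            "stored": stored,
        }
    }
    return Request(
        SCHEMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = add_field_request("createTime", "pdate")
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```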
  
7>Add data (in the Admin UI choose Documents, Request-Handler /update)
Adding currently behaves like an upsert: if the id of the submitted document does not exist yet, the document is inserted; otherwise the existing document is replaced. For example:

[
{"id": "11","age": 40,"description": "创造了古代浪漫主义文学高峰、歌行体和七绝达到后人难及的高度"},
{"id": "12","age": 31,"description": "唐代伟大的现实主义文学作家，唐诗思想艺术的集大成者"},
{"id": "13","age": 31,"description": "头孢不叫阿莫西林。头孢和阿莫西林属于两种不同的抗菌药物，患者需要在医生指导下服用。若是出现身体不适，要及时就医治疗"},
{"id": "14","age": 68,"description": "阿莫西林属于广谱青霉素类抗生素，对肺炎链球菌、溶血性链球菌、不产青霉素酶葡萄球菌、粪肠球菌、大肠埃希菌等均具有良好的抗菌活性。有片剂、颗粒剂等多种剂型，临床上适用于敏感菌所致的皮肤及软组织感染、呼吸道感染、胃肠道感染、尿路感染、败血症等"},
{"id": "15","age": 52,"description": "如果患者无甲状腺疾病相关症状，且其他检查指标均正常，则说明甲状腺功能正常。如果患者出现甲状腺功能亢进症或甲状腺功能减退症的相关症状，且伴有促甲状腺激素（TSH）、甲状腺抗体等其他甲状腺指标异常，则说明甲状腺功能异常，需要查明原因后进行相应的治疗"}
]


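This failed with an error saying the _version_ field is required. My guess was that every document needs a version field to record its most recent operation; Solr does use _version_ internally to track document versions during updates, and the stock _default configset declares it with type plong. In the Admin UI I opened Schema and added the field (type string was enough in my case), then submitted the JSON again and the error was gone.
I also noticed that after inserting data, the contents of managed-schema.xml had changed when I opened it again with sudo vim. That is expected: changes made through the Schema API are written back to this file, and the _default configset automatically creates fields for unknown field names (the update.autoCreateFields setting mentioned in the warning in step 5).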

Official update document API: https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-update-handlers.html
Description of the individual field properties: https://solr.apache.org/guide/solr/latest/indexing-guide/fields.html
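
Outside the Admin UI, the same documents can be posted to the /update handler as a JSON array. Below is a minimal sketch under assumptions: localhost as host, commit=true on the URL so the documents become searchable right away, and a helper name (add_docs_request) invented for the example. Re-sending a document with an existing id replaces it, which is the upsert behaviour described above.

```python
import json
from urllib.request import Request

# Assumed endpoint: localhost is a guess; commit=true makes the change visible immediately.
UPDATE_URL = "http://localhost:8985/solr/conferences_core/update?commit=true"


def add_docs_request(docs: list) -> Request:
    """Build (but do not send) a POST that adds or replaces the given documents."""
    body = json.dumps(docs, ensure_ascii=False).encode("utf-8")
    return Request(
        UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    docs = [
        {"id": "11", "age": 40, "description": "创造了古代浪漫主义文学高峰、歌行体和七绝达到后人难及的高度"},
        {"id": "12", "age": 31, "description": "唐代伟大的现实主义文学作家，唐诗思想艺术的集大成者"},
    ]
    req = add_docs_request(docs)
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```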
  
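8>Conditional query fails with: Undefined field _text_
Fix: https://opensolr.com/faq/view/opensolr-wiki-q-a/104/undefined-field-_text_
However, after applying that fix other errors appeared, so I reverted it. It is simpler to add a _text_ field directly, which avoids the error. The main effect of adding _text_ is that the q parameter can contain any text without naming a field; the query then goes to the default field, which is _text_ here (the link above shows this, and it is visible when you open the config). But the _text_ field normally holds no data unless copyField rules copy other fields into it, so such queries find nothing. See the initParams configuration: https://solr.apache.org/guide/solr/latest/configuration-guide/initparams.html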

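I found that queries should be written like this.
For example, to search description, set the q parameter to:
description:*古代*
This no longer raises the error, because the field to search is named explicitly.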

Reference: https://blog.csdn.net/zhouzhiwengang/article/details/111028596

To query all documents, use the following q (this is the default value):
*:*

Multiple conditions can be combined:
description:*因为* && age:42

Official query API: https://solr.apache.org/guide/solr/latest/query-guide/json-request-api.html
Here is one of my combined queries: http://localhost:8985/solr/conferences_core/select?fl=* score&indent=true&q.op=OR&q=description:*文*&rows=2&sort=id desc&start=0&useParams=
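
The URL above contains spaces and Chinese characters, which a browser encodes silently but a script has to encode itself. The following sketch builds the same kind of /select URL with the standard library; the helper name select_url and its defaults are assumptions chosen to mirror the query above.

```python
from urllib.parse import urlencode

# Host and port as used in the query URL above.
SELECT_URL = "http://localhost:8985/solr/conferences_core/select"


def select_url(q: str, fl: str = "* score", rows: int = 2, start: int = 0,
               sort: str = "id desc", q_op: str = "OR") -> str:
    """Build a percent-encoded /select URL for the given query string."""
    params = {
        "q": q,
        "fl": fl,
        "q.op": q_op,
        "rows": rows,
        "start": start,
        "sort": sort,
        "indent": "true",
    }
    return f"{SELECT_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(select_url("description:*文*"))
    print(select_url("description:*因为* && age:42", rows=10))
```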
  
9>Delete data
In Documents, set Request-Handler to /update
Set Document Type to XML

Enter the following in the Document(s) box:
<delete><query>*:*</query></delete>
<commit/>

Be aware that the command above deletes all documents.

Delete the document with a given id:
<delete>
<id>11</id>
</delete>

Delete the documents matching a query:
<delete>
<query>description:*唐代*</query>
</delete>
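
When delete commands are generated from a program, it is safer to let an XML library build them so that characters such as < or & in a query are escaped. A minimal sketch with Python's standard library follows; the helper name delete_xml is an assumption, and the output is the same kind of message as the hand-written ones above, ready to be sent to /update.

```python
import xml.etree.ElementTree as ET


def delete_xml(ids=(), queries=()) -> str:
    """Build a Solr XML <delete> message for the given ids and/or queries."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = str(doc_id)
    for query in queries:
        ET.SubElement(root, "query").text = query
    # encoding="unicode" returns a str and keeps Chinese characters readable
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(delete_xml(ids=["11"]))
    print(delete_xml(queries=["description:*唐代*"]))
```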


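10>Update data (Documents, Request-Handler /update)
Updating currently works by adding a document with the same id, which replaces the stored document with the new one. For example, the following updates the document with id 11:
{"id": "11","age": 36,"description": "秋季破鼻子可能是空气干燥、不良习惯导致的，还可能与鼻腔异物、外伤、过敏性鼻炎等情况有关，具体可以前往医院就诊，明确诊断后，在医生指导下进行针对性治疗。"}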
  
11>IK analyzer:
https://mvnrepository.com/artifact/com.github.magese/ik-analyzer
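Pick a suitable version and download the jar from the Files row.

The highest available release line is 8.x, but it also works with Solr 9.x; just take the latest one, 8.5.

Put the jar into the following directory (it contains nothing but jars):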
/usr/local/solr-9.6.0/server/solr-webapp/webapp/WEB-INF/lib

Change to this directory:
/usr/local/solr-9.6.0/server/solr/conferences_core/conf

sudo vim managed-schema.xml

I added the following directly after the last fieldType:
<!-- IK analyzer -->
    <fieldType name="text_ik" class="solr.TextField">
      <analyzer type="index" useSmart="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
      <analyzer type="query" useSmart="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    </fieldType>

Then restart Solr:
k8s@HPDEV-31:/usr/local/solr-9.6.0$ sudo bin/solr restart -p 8985 -force

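Test whether IK works:
Select the core, then click Analysis.

Field Value: enter a string of Chinese text

In Analyse Fieldname / FieldType, select text_ik,
then click Analyse Values on the right to see the tokenization result.

Alternatively, in Analyse Fieldname / FieldType select a field that uses text_ik, such as description here, then click Analyse Values on the right to see the tokenization result.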
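
The same check can be scripted against Solr's field analysis handler (/analysis/field), which I believe is what the Analysis screen uses. This is only a sketch that builds the request URL: localhost and the helper name analysis_url are assumptions, and the handler is assumed to be available at its implicit default path.

```python
from urllib.parse import urlencode

# Assumed base URL for the core; localhost is a guess.
CORE_URL = "http://localhost:8985/solr/conferences_core"


def analysis_url(text: str, field_type: str = "text_ik") -> str:
    """Build a field-analysis URL that tokenizes `text` with `field_type`."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": text,
        "wt": "json",
    }
    return f"{CORE_URL}/analysis/field?{urlencode(params)}"


if __name__ == "__main__":
    # Open the printed URL to see the tokens produced at index time.
    print(analysis_url("头孢和阿莫西林属于两种不同的抗菌药物"))
```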

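Tokenized query:
Set q to description:多种机制 and do not wrap the term in * wildcards; otherwise tokenization has no effect, since wildcard terms are not passed through the tokenizer.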
  
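12>Highlighting
To use highlighting, set the hl and hl.fl parameters: hl turns highlighting on or off, and hl.fl names the field(s) to highlight (several may be given). At least these two were needed here; without them the query returned no highlighting.
Here they are set to hl=true and hl.fl=description: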
http://localhost:8985/solr/conferences_core/select?hl.fl=description&hl=true&indent=true&q.op=OR&q=description:两种人&useParams=

To change the tags that wrap the highlighted terms, use the following two parameters; the default is <em> </em>:
hl.simple.pre
hl.simple.post

For example: hl.simple.pre=<strong>, hl.simple.post=</strong>. In practice the most useful choice is a tag styled with a red color.

This is the highlighting section of the query result:
"highlighting":{
    "16":{
      "description":["创造了古代浪漫主义<em>文学</em>高峰、歌行体和七绝达到后人难及的高度"]
    },
    "11":{
      "description":["创造了古代浪漫主义<em>文学</em>高峰、歌行体和七绝达到后人难及的高度"]
    }
  }

Official reference: https://solr.apache.org/guide/solr/latest/query-guide/highlighting.html
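
On the client side, the highlighting section is a map from document id to field name to a list of snippets, separate from the documents themselves. The sketch below shows one way to assemble the parameters discussed above and to pull the snippets out of a parsed JSON response; both helper names are assumptions for illustration.

```python
def highlight_params(field: str, pre: str = "<em>", post: str = "</em>") -> dict:
    """Query parameters that switch highlighting on for one field."""
    return {
        "hl": "true",
        "hl.fl": field,
        "hl.simple.pre": pre,
        "hl.simple.post": post,
    }


def snippets(response: dict, field: str) -> dict:
    """Map each document id to its highlighted snippets for `field`."""
    highlighting = response.get("highlighting", {})
    return {doc_id: fields.get(field, []) for doc_id, fields in highlighting.items()}


if __name__ == "__main__":
    # Response shaped like the sample output above.
    response = {
        "highlighting": {
            "11": {"description": ["创造了古代浪漫主义<em>文学</em>高峰、歌行体和七绝达到后人难及的高度"]}
        }
    }
    print(highlight_params("description", "<strong>", "</strong>"))
    print(snippets(response, "description"))
```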
  
13>Solr security configuration
Reference: https://solr.apache.org/guide/solr/latest/deployment-guide/authentication-and-authorization-plugins.html
Basic authentication: https://solr.apache.org/guide/solr/latest/deployment-guide/basic-authentication-plugin.html


Create /usr/local/solr-9.6.0/server/solr/security.json with the following content, then restart Solr:
{
"authentication":{ 
   "blockUnknown": true, 
   "class":"solr.BasicAuthPlugin",
   "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}, 
   "realm":"My Solr users", 
   "forwardCredentials": false 
}
}

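Username: solr, password: SolrRocks (these are the credentials encoded in the hash above, which is the example from the official guide)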
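
With blockUnknown set to true, every HTTP request to Solr has to carry a Basic authentication header, including requests made from scripts. A minimal sketch of building that header with the standard library follows; the helper names are assumptions, and the credentials are the ones listed above.

```python
import base64
from urllib.request import Request


def basic_auth_header(user: str, password: str) -> str:
    """Return the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def authed_request(url: str, user: str, password: str) -> Request:
    """Build (but do not send) a GET request that carries Basic credentials."""
    return Request(url, headers={"Authorization": basic_auth_header(user, password)})


if __name__ == "__main__":
    req = authed_request(
        "http://localhost:8985/solr/conferences_core/select?q=*:*",
        "solr",
        "SolrRocks",
    )
    print(req.get_header("Authorization"))
    # To actually send it: urllib.request.urlopen(req)
```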
  
  
General reference:
https://blog.csdn.net/hejiahao_/article/details/133698865